Skip to content

Remove unused hits columns#949

Merged
Luismiguelvillar merged 9 commits intonext-exp:masterfrom
gonzaponte:remove-unused-cols
Feb 24, 2026
Merged

Remove unused hits columns#949
Luismiguelvillar merged 9 commits intonext-exp:masterfrom
gonzaponte:remove-unused-cols

Conversation

@gonzaponte
Copy link
Collaborator

There are several columns that hold dummy values due to historical reasons. This PR removes them and breaks backward compatibility to reduce noise in the output file and save memory. These columns are:

  • nsipm (always 1)
  • Xrms / Yrms (always 0)
  • Qc (always -1)
  • track_id and Ep (always -1, pre-paolina)

@Luismiguelvillar
Copy link
Member

Hi Gonza! I have a qestion regarding this PR. I have been looking into event_model.py and still many of the unused columns are still there, for example in the HitCollection class definition. Is there a particular reason for this?

@Luismiguelvillar Luismiguelvillar self-requested a review February 5, 2026 11:25
@gonzaponte
Copy link
Collaborator Author

In principle, no, just leftovers that I didn't pick up in my work. Well spotted, I'll work on removing them.

@Luismiguelvillar
Copy link
Member

Make sense! There are many "hidden" references across all IC hahaha. I also found a nsimp reference at:https://github.com/gonzaponte/IC/blob/df53e196cb19900463791becec789f29d9a33113/invisible_cities/reco/psf_functions.py#L107C5-L107C30

@gonzaponte
Copy link
Collaborator Author

Thanks! I will search the entire repo for the unused columns.

@Luismiguelvillar
Copy link
Member

Thanks! I will search the entire repo for the unused columns.

Perfect! Sounds good! I also found this reference in buffy: https://github.com/gonzaponte/IC/blob/df53e196cb19900463791becec789f29d9a33113/invisible_cities/cities/buffy.py#L112C1-L112C63

@Luismiguelvillar
Copy link
Member

@Luismiguelvillar
Copy link
Member

Make sense! There are many "hidden" references across all IC hahaha. I also found a nsimp reference at:https://github.com/gonzaponte/IC/blob/df53e196cb19900463791becec789f29d9a33113/invisible_cities/reco/psf_functions.py#L107C5-L107C30

I also saw that nsimp is used in trude.py

@gonzaponte
Copy link
Collaborator Author

I've removed the remaining references to the variables we are discussing, with the exception of those that use the Cluster class directly. I think that bit of work belongs to the removal of the event model as a whole.

The other nsipm references you may encounter are not related to hits, but to other bits of the processing chain.
Same goes for X/Yrms which is used differently in the kdst stuff.

@Luismiguelvillar
Copy link
Member

I've removed the remaining references to the variables we are discussing, with the exception of those that use the Cluster class directly. I think that bit of work belongs to the removal of the event model as a whole.

The other nsipm references you may encounter are not related to hits, but to other bits of the processing chain. Same goes for X/Yrms which is used differently in the kdst stuff.

Great!! I have a quick question. I was looking at the function track_blob_info_creator_extractor()
(https://github.com/gonzaponte/IC/blob/4f448e2579a326c30df9f66e3a6335a66d47369c/invisible_cities/cities/components.py#L1454
) and I’m wondering what the impact would be of removing Ep from HitCollection. This function appears to use Ep explicitly, so I want to make sure I understand how that dependency would be handled after the change.

@gonzaponte
Copy link
Collaborator Author

The Ep field is generated within the cities (esmeralda and isaura). I guess the question is whether we want to keep it in the output. You and @jwaiton are probably the most likely users, I will ask around here as well if anyone uses this quantity.

@Luismiguelvillar
Copy link
Member

The Ep field is generated within the cities (esmeralda and isaura). I guess the question is whether we want to keep it in the output. You and @jwaiton are probably the most likely users, I will ask around here as well if anyone uses this quantity.

Yes, we use Ep at the moment, but changing the algorithms to not rely in Ep is straightforward!

@Luismiguelvillar
Copy link
Member

The Ep field is generated within the cities (esmeralda and isaura). I guess the question is whether we want to keep it in the output. You and @jwaiton are probably the most likely users, I will ask around here as well if anyone uses this quantity.

Yes, we use Ep at the moment, but changing the algorithms to not rely in Ep is straightforward!

I think it will be a good idea to eliminate Ep completly, but maybe @jwaiton has something to say!

@gonzaponte
Copy link
Collaborator Author

When you say eliminate, you mean parametrize the algorithm to be able to choose which one to use? If that's the case I'm ok with that, but that's orthogonal to this PR.

@gonzaponte
Copy link
Collaborator Author

By the way, Ep is used internally to be able to modify the hits' energy in the track-finding algorithm without affecting the event energy.

@Luismiguelvillar
Copy link
Member

When you say eliminate, you mean parametrize the algorithm to be able to choose which one to use? If that's the case I'm ok with that, but that's orthogonal to this PR.

Fair enough!

@Luismiguelvillar
Copy link
Member

By the way, Ep is used internally to be able to modify the hits' energy in the track-finding algorithm without affecting the event energy.

Yes, I understand! In that case it makes sense! I will have a look at the rest and ask more questions when appropiate!

@jwaiton
Copy link
Member

jwaiton commented Feb 6, 2026

By the way, Ep is used internally to be able to modify the hits' energy in the track-finding algorithm without affecting the event energy.

As far as I can tell (I could be very wrong), within the compute_and_write_tracks_info() code, there is no moment in which the Ep is explicitly modified, only called as reference. I have some issues with the way Ep is assigned as well, but as you said this is orthogonal to this PR :)

@Luismiguelvillar
Copy link
Member

By the way, Ep is used internally to be able to modify the hits' energy in the track-finding algorithm without affecting the event energy.

As far as I can tell (I could be very wrong), within the compute_and_write_tracks_info() code, there is no moment in which the Ep is explicitly modified, only called as reference. I have some issues with the way Ep is assigned as well, but as you said this is orthogonal to this PR :)

That is what I thought, maybe we can look into this in the near future!

@gonzaponte
Copy link
Collaborator Author

As far as I can tell (I could be very wrong), within the compute_and_write_tracks_info() code, there is no moment in which the Ep is explicitly modified,

I remembered that the reason for the existence of Ep was that there was something in there that needed to modify the energy. It took me a while, but I think I've found it. It's this line in the voxel-dropping function.

Copy link
Member

@Luismiguelvillar Luismiguelvillar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Everything looks fine and all doubts have been answered. I approve the pull request

@Luismiguelvillar Luismiguelvillar merged commit 0d77b8b into next-exp:master Feb 24, 2026
1 check passed
@gonzaponte
Copy link
Collaborator Author

Thanks, @Luismiguelvillar. For future reference, we don't merge-commit like this. We let a third person do it (to increase the know-how) following the instructions in this file. I'll rewrite the history now.

gonzaponte pushed a commit that referenced this pull request Feb 24, 2026
#949

[author: gonzaponte]

There are several columns that hold dummy values due to historical
reasons. This PR removes them and breaks backward compatibility to
reduce noise in the output file and save memory. These columns are:

- nsipm (always 1)
- Xrms / Yrms (always 0)
- Qc (always -1)
- track_id and Ep (always -1, pre-paolina)

[reviewer: Luismiguelvillar]

Everything looks fine and all doubts have been answered.
@gonzaponte gonzaponte deleted the remove-unused-cols branch February 24, 2026 13:34
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants