Skip to content

Performance cleanup for continuum background estimation#489

Closed
jdbuhler wants to merge 5 commits intocositools:developfrom
McKelvey-Engineering-CSE:faster_background
Closed

Performance cleanup for continuum background estimation#489
jdbuhler wants to merge 5 commits intocositools:developfrom
McKelvey-Engineering-CSE:faster_background

Conversation

@jdbuhler
Copy link
Contributor

The continuum background estimation code is documented as taking several hours to compute the background for a response with ~360 Em x phi bins. There are some obvious opportunities to rewrite the key parts of this code to eliminate data copies and interpreter loops. More importantly, the current implementation re-projects the same large, sparse binned data set for every Em / Phi combo; this is a very expensive operation that need be performed only once, rather than on every pass through the loop.

This patch overhauls the code to address the issues above. After my changes, the code spends around 45s loading and projecting the binned data set once, and then spends negligible time actually computing the background for all Em/Phi. The output should be identical to the previous code, except that I believe the old implementation of in-painting would not use the last pixel in the map because of an >= rather than a > in one test. Hopefully this implementation will be a source of inspiration as we consider how to implement better estimation algorithms.

Because the code is overall 2-3 orders of magnitude faster, there is no longer a need for progress-related printing and logging. I also removed the helper methods to load the response, which respectively consisted of two and one line of code that has nothing to do with the core task of background estimation. These operations will likely change anyway with the adoption of the interfaces branch, so let's just code them as needed in the notebook/script that calls background estimation, consistently with how we load responses in all the other tutorial notebooks.

I updated the background estimation notebook to compute every Em/Phi bin before writing the output.

I want to call out the fact that in this code, much like in the spectral fitting notebooks, projecting away the time axis from a large, sparse binned data set is a serious performance bottleneck. It would be much preferable to have the BinnedData object optionally bin events densely without using the time axis for analyses that don't need the event times. This would avoid the need for expensive projection later on.

Jeremy Buhler and others added 2 commits February 16, 2026 22:55
* remove the helper methods for loading responses, which are not
  specific to this class and can be implemented in at most two lines
  of code in the notebook
* change the notebook to compute the background for all Em / Phi bins,
  in addition to plotting the couple of bins it plotted before
@jdbuhler jdbuhler requested a review from ckarwin February 17, 2026 05:20
@jdbuhler
Copy link
Contributor Author

@ckarwin , should the first line of simple_inpainting compute the mean of only the nonzero entries in m_data? Your comment suggests that this is what you'd prefer.

@codecov
Copy link

codecov bot commented Feb 17, 2026

Codecov Report

❌ Patch coverage is 94.73684% with 3 lines in your changes missing coverage. Please review.
✅ Project coverage is 86.16%. Comparing base (ebb34b1) to head (db122d6).
⚠️ Report is 3 commits behind head on develop.

Files with missing lines Patch % Lines
...osipy/background_estimation/ContinuumEstimation.py 94.73% 3 Missing ⚠️
Files with missing lines Coverage Δ
...osipy/background_estimation/ContinuumEstimation.py 95.94% <94.73%> (+53.19%) ⬆️
🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Jeremy Buhler added 3 commits February 16, 2026 23:49
* name the output written by the notebook with an .h5 extension,
  since it is an HDF5 containing a Histogram.
* name the file saved in the tutorial with an .h5 extension, since
  it is a Histogram in a HDF5 file
@jdbuhler
Copy link
Contributor Author

jdbuhler commented Feb 17, 2026

Additional question for @israelmcmc : should the opening and projection of the binned data be moved out of this class into the notebook? I assume you will want to use your new interfaces to open it.

@ckarwin
Copy link
Contributor

ckarwin commented Feb 17, 2026

@jdbuhler This entire continuum background estimation code has been replaced by a new version in an open PR to @israelmcmc interfaces branch: israelmcmc#10. The new version uses a a GCN for the inpainting, and it also includes the simplified inpainting method presented in the original version, which has been optimized and now takes on the order of seconds.

@jdbuhler
Copy link
Contributor Author

Ah, ok. A quick look at the new code suggests that some variant of my vectorized inpainting code could still be useful in the non-NN case. But I will hold off on this until after interfaces and the new code goes in.

@jdbuhler jdbuhler closed this Feb 17, 2026
@israelmcmc
Copy link
Collaborator

Yeah, let's just wait a little bit. Thanks @jdbuhler!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants