
Make UnbinnedLikelihood use cached density #522

Merged

israelmcmc merged 6 commits into develop from PR_faster_UnbinnedLikelihood on Mar 19, 2026

Conversation

@scipascal
Contributor

The UnbinnedLikelihood still uses fromiter instead of asarray. Instead of using a vectorized: bool argument (see SumExpectationDensity), I replaced it with batch_size: Optional[int]. We should decide whether to align this behavior with SumExpectationDensity.
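For context, the difference can be sketched with plain numpy (using a stand-in array, not the actual cosipy cache): `np.fromiter` always materializes a fresh array, while `np.asarray` reuses the buffer when the cached density is already an ndarray of the right dtype.

```python
import numpy as np

# Hypothetical cached density values (stand-in for the real cache).
cached = np.linspace(0.1, 1.0, 5)

copied = np.fromiter(cached, dtype=np.float64)   # always allocates and copies
aliased = np.asarray(cached, dtype=np.float64)   # no-op: returns the same object

assert copied is not cached
assert aliased is cached
```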

Updated batch_size parameter to be optional and use asarray in case the density is already cached
@scipascal scipascal added the optimization Code can or needs to be optimized label Mar 18, 2026
@codecov

codecov bot commented Mar 18, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 71.52%. Comparing base (4b43a59) to head (67132ee).
⚠️ Report is 12 commits behind head on develop.

Files with missing lines | Coverage Δ
cosipy/statistics/likelihood_functions.py | 100.00% <100.00%> (+34.61%) ⬆️
cosipy/util/iterables.py | 88.88% <100.00%> (+60.31%) ⬆️

@scipascal scipascal added the pull-request-needs-reviewer No reviewer assigned label Mar 18, 2026
@israelmcmc israelmcmc added the Feature / Enhancement New functionality or improvement label Mar 18, 2026
@israelmcmc
Collaborator

Thanks @scipascal.

I like this behavior; we should use it for all similar functions. To summarize from reading your code: if batch_size is None we vectorize the calculation, caching the whole input into an array if it is not an array already. Otherwise we process it in chunks, caching the input in chunks as well if it's an iterable. The vectorize parameter is then not needed.
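A minimal sketch of the dispatch described above, assuming a generic iterable of per-event densities (the names are illustrative, not the actual cosipy API; `batched` stands in for itertools_batched):

```python
from itertools import islice
import numpy as np

def batched(iterable, n):
    # Simplified stand-in for itertools.batched (Python 3.12+).
    it = iter(iterable)
    while chunk := tuple(islice(it, n)):
        yield chunk

def unbinned_log_likelihood(densities, batch_size=None):
    if batch_size is None:
        # Vectorized path: cache the whole input into one array
        # (no copy if it is already an ndarray).
        arr = (densities if isinstance(densities, np.ndarray)
               else np.fromiter(densities, dtype=np.float64))
        return float(np.log(arr).sum())
    # Chunked path: cache and process the iterable in batches
    # to bound peak memory.
    return float(sum(
        np.log(np.asarray(chunk, dtype=np.float64)).sum()
        for chunk in batched(densities, batch_size)
    ))
```

Both paths return the same value; only the memory profile differs.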

As for the failing codecov/patch: if you can add a unit test for the unbinned likelihood I'd appreciate it. Otherwise let me know and I can bypass it, since that code was already not covered (and I'm the one to blame...)

@scipascal
Contributor Author

@israelmcmc

  1. Should we set a batch size by default, or None?
  2. What do we do if the user specifies a batch size but the expectation density is already cached? Does batching even make sense? Should one even use itertools_batched on it? That's basically the only case where I am not sure what to do here, and the same question applies to the similar implementations.

@israelmcmc
Collaborator

@scipascal

  1. Should we set a batch size by default, or None?

Yeah, let's do that, makes sense.

  2. What do we do if the user specifies a batch size but the expectation density is already cached? Does batching even make sense? Should one even use itertools_batched on it? That's basically the only case where I am not sure what to do here, and the same question applies to the similar implementations.

Good point. I think we should handle that case on a function-by-function basis. For some functions batching won't make sense if the density is already cached (such as the case at hand, computing the unbinned likelihood). I can envision other calculations where it would still make sense to process them in batches as a tradeoff between speed and memory, e.g. you need to initialize a large matrix to complete the calculation for a given event, and while it's faster to do it all at once in parallel, the user might choose to do it in batches depending on their system. The docstring of the function should just be clear about what behavior to expect, e.g. the batching parameter will be ignored if the input is already an array rather than a general iterable.
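The per-function policy above could look like this (a sketch with a hypothetical helper name; the real cosipy functions may differ):

```python
from itertools import islice
import numpy as np

def cache_density(densities, batch_size=None):
    """Hypothetical sketch: return the density values as one array.

    batch_size is ignored when the input is already an ndarray,
    since batching an in-memory array would not save any memory here.
    """
    if isinstance(densities, np.ndarray):
        return densities  # already cached; batching is a no-op
    it = iter(densities)
    if batch_size is None:
        # Cache the whole iterable at once.
        return np.fromiter(it, dtype=np.float64)
    # Cache the iterable chunk by chunk, then stitch the chunks together.
    chunks = [np.asarray(c, dtype=np.float64)
              for c in iter(lambda: list(islice(it, batch_size)), [])]
    return np.concatenate(chunks)
```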

pjanowsk and others added 4 commits March 19, 2026 11:24
…he data.

Signed-off-by: Israel Martinez <imc@umd.edu>
…already an array

- Check explicitly for the numpy array subclass, not just __len__ (iterators are not required to implement __len__, but some do)
- Use the recently introduced force_dtype=False to avoid an unnecessary copy if the input is already an array but not of dtype float64

Signed-off-by: Israel Martinez <imc@umd.edu>
@israelmcmc
Collaborator

@scipascal I made a couple of suggestions to your branch in #526

First, I added force_dtype=False to asarray to avoid an unnecessary copy if the input is already an array but not of dtype float64. I already merged this in #524

The main changes are in this commit b685124.

  • Use force_dtype=False
  • I'm checking explicitly for the numpy array subclass, not just __len__ (iterators are not required to implement __len__, but some do)
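A quick illustration of why the explicit type check is safer than probing for __len__ (the class and helper names are hypothetical):

```python
import numpy as np

class CountingIterator:
    # An iterator that happens to implement __len__ (e.g. a sized
    # event reader); hasattr(x, "__len__") would misclassify it as
    # an already-cached, array-like input.
    def __init__(self, values):
        self._values = list(values)
        self._i = 0
    def __len__(self):
        return len(self._values)
    def __iter__(self):
        return self
    def __next__(self):
        if self._i >= len(self._values):
            raise StopIteration
        v = self._values[self._i]
        self._i += 1
        return v

def is_cached(x):
    # Hypothetical helper: only a real ndarray (or subclass) counts
    # as a cached density.
    return isinstance(x, np.ndarray)

assert hasattr(CountingIterator([1.0]), "__len__")  # __len__ alone is misleading
assert not is_cached(CountingIterator([1.0, 2.0]))
assert is_cached(np.zeros(2))
```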

All other changes in #526 are just from syncing with develop.

If you agree with the changes, we can go ahead and merge this.

Thanks for the unit test btw!

@scipascal
Contributor Author

@israelmcmc I guess in this case forcing the dtype is not necessary, but I haven't checked. Overall, I found that when using many events for the unbinned analysis, float64 becomes necessary to ensure that iminuit's EDM can be estimated reliably, especially during the part where you evaluate the flux model and perform the numerical integration. You can share your opinion, but I guess it can be merged then.

I made the same modifications (batch_size instead of vectorize) to the UnbinnedThreeMLModelFolding. But since it was originally related to my CachedUnbinnedThreeMLModelFolding, it is part of my bigger PR.

@israelmcmc
Collaborator

Thanks @scipascal.

Good catch that not using float64 is likely the cause of threeML/threeML#587. But, as you said, it shouldn't matter in this particular case: if the inputs are in e.g. float32 and we cast them to float64, we're not gaining the precision back anyway. If the inputs are already float64, they will remain that way.
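The cast point can be seen directly in numpy: once a value has been rounded to float32, upcasting does not restore the lost digits.

```python
import numpy as np

x64 = np.float64(0.1)   # full double-precision value
x32 = np.float32(x64)   # rounded to single precision
back = np.float64(x32)  # upcast: still the rounded value

assert back != x64                            # precision is not recovered
assert np.float64(np.float32(back)) == back   # values already exact in float32 stay exact
```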

I'll go ahead and merge this with the additional changes then.

I made the same modifications (batch_size instead of vectorize) to the UnbinnedThreeMLModelFolding. But since it was originally related to my CachedUnbinnedThreeMLModelFolding, it is part of my bigger PR.

Sounds good, I'll check it on #515

@israelmcmc israelmcmc merged commit 94518bd into develop Mar 19, 2026
10 checks passed