
Make UnbinnedLikelihood use cached density #522

Merged

israelmcmc merged 6 commits into develop from PR_faster_UnbinnedLikelihood on Mar 19, 2026

Conversation

@scipascal
Contributor

The UnbinnedLikelihood still uses fromiter instead of asarray. Instead of using a vectorized: bool argument (see SumExpectationDensity), I replaced it with batch_size: Optional[int]. We should decide whether to align this behavior with SumExpectationDensity.
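For context, the difference can be sketched with plain numpy (using a stand-in array, not the actual cosipy cache): `np.fromiter` always materializes a fresh array, while `np.asarray` reuses the buffer when the cached density is already an ndarray of the right dtype.

```python
import numpy as np

# Hypothetical cached density values (stand-in for the real cache).
cached = np.linspace(0.1, 1.0, 5)

copied = np.fromiter(cached, dtype=np.float64)   # always allocates and copies
aliased = np.asarray(cached, dtype=np.float64)   # no-op: returns the same object

assert copied is not cached
assert aliased is cached
```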

Updated batch_size parameter to be optional and use asarray in case the density is already cached
@scipascal scipascal added the optimization Code can or needs to be optimized label Mar 18, 2026
@codecov

codecov bot commented Mar 18, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 71.52%. Comparing base (4b43a59) to head (67132ee).
⚠️ Report is 12 commits behind head on develop.

Files with missing lines | Coverage Δ
cosipy/statistics/likelihood_functions.py | 100.00% <100.00%> (+34.61%) ⬆️
cosipy/util/iterables.py | 88.88% <100.00%> (+60.31%) ⬆️

@scipascal scipascal added the pull-request-needs-reviewer No reviewer assigned label Mar 18, 2026
@israelmcmc israelmcmc added the Feature / Enhancement New functionality or improvement label Mar 18, 2026
@israelmcmc
Collaborator

Thanks @scipascal.

I like this behavior; we should use it for all similar functions. To summarize from reading your code: if batch_size is None we vectorize the calculation, caching the whole input into an array if it is not an array already. Otherwise we process it in chunks, caching the input in chunks as well if it's an iterable. The vectorize parameter is then not needed.
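A minimal sketch of the dispatch described above, assuming a generic iterable of per-event densities (the names are illustrative, not the actual cosipy API; `batched` stands in for itertools_batched):

```python
from itertools import islice
import numpy as np

def batched(iterable, n):
    # Simplified stand-in for itertools.batched (Python 3.12+).
    it = iter(iterable)
    while chunk := tuple(islice(it, n)):
        yield chunk

def unbinned_log_likelihood(densities, batch_size=None):
    if batch_size is None:
        # Vectorized path: cache the whole input into one array
        # (no copy if it is already an ndarray).
        arr = (densities if isinstance(densities, np.ndarray)
               else np.fromiter(densities, dtype=np.float64))
        return float(np.log(arr).sum())
    # Chunked path: cache and process the iterable in batches
    # to bound peak memory.
    return float(sum(
        np.log(np.asarray(chunk, dtype=np.float64)).sum()
        for chunk in batched(densities, batch_size)
    ))
```

Both paths return the same value; only the memory profile differs.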

As for the failing codecov/patch: if you can add a unit test for the unbinned likelihood I'd appreciate it. Otherwise let me know and I can bypass it, since that code was already not covered (and I'm the one to blame...)

@scipascal
Contributor Author

@israelmcmc

  1. Should we set a batch size by default, or None?
  2. What do we do if the user specifies a batch size but the expectation density is already cached? Does batching even make sense? Should one even use itertools_batched on it? That's basically the only case where I am not sure what to do here, and the same question applies to the similar implementations.

@israelmcmc
Collaborator

@scipascal

  1. Should we set a batch size by default, or None?

Yeah, let's do that, makes sense.

  2. What do we do if the user specifies a batch size but the expectation density is already cached? Does batching even make sense? Should one even use itertools_batched on it? That's basically the only case where I am not sure what to do here, and the same question applies to the similar implementations.

Good point. I think we should handle that case on a function-by-function basis. For some functions batching won't make sense if the density is already cached (such as the case at hand, computing the unbinned likelihood). I can envision other calculations where it would still make sense to process them in batches as a tradeoff between speed and memory, e.g. you need to initialize a large matrix to complete the calculation for a given event, and while it's faster to do it all at once in parallel, the user might choose to do it in batches depending on their system. The docstring of the function should just be clear about what behavior to expect, e.g. the batching parameter will be ignored if the input is already an array rather than a general iterable.
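The per-function policy above could look like this (a sketch with a hypothetical helper name; the real cosipy functions may differ):

```python
from itertools import islice
import numpy as np

def cache_density(densities, batch_size=None):
    """Hypothetical sketch: return the density values as one array.

    batch_size is ignored when the input is already an ndarray,
    since batching an in-memory array would not save any memory here.
    """
    if isinstance(densities, np.ndarray):
        return densities  # already cached; batching is a no-op
    it = iter(densities)
    if batch_size is None:
        # Cache the whole iterable at once.
        return np.fromiter(it, dtype=np.float64)
    # Cache the iterable chunk by chunk, then stitch the chunks together.
    chunks = [np.asarray(c, dtype=np.float64)
              for c in iter(lambda: list(islice(it, batch_size)), [])]
    return np.concatenate(chunks)
```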

pjanowsk and others added 4 commits March 19, 2026 11:24
…he data.

Signed-off-by: Israel Martinez <imc@umd.edu>
…already an array

- Check explicitly for the numpy array subclass, not just __len__ (iterators are not required to implement __len__, but some do)
- Use the recently introduced force_dtype=False to avoid an unnecessary copy if the input is already an array but not of dtype float64

Signed-off-by: Israel Martinez <imc@umd.edu>
@israelmcmc
Collaborator

@scipascal I made a couple of suggestions to your branch in #526

First, I added force_dtype=False to asarray to avoid an unnecessary copy if the input is already an array but not of dtype float64. I already merged this in #524

The main changes are in this commit b685124.

  • Use force_dtype=False
  • I'm checking explicitly for the numpy array subclass, not just __len__ (iterators are not required to implement __len__, but some do)
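A quick illustration of why the explicit type check is safer than probing for __len__ (the class and helper names are hypothetical):

```python
import numpy as np

class CountingIterator:
    # An iterator that happens to implement __len__ (e.g. a sized
    # event reader); hasattr(x, "__len__") would misclassify it as
    # an already-cached, array-like input.
    def __init__(self, values):
        self._values = list(values)
        self._i = 0
    def __len__(self):
        return len(self._values)
    def __iter__(self):
        return self
    def __next__(self):
        if self._i >= len(self._values):
            raise StopIteration
        v = self._values[self._i]
        self._i += 1
        return v

def is_cached(x):
    # Hypothetical helper: only a real ndarray (or subclass) counts
    # as a cached density.
    return isinstance(x, np.ndarray)

assert hasattr(CountingIterator([1.0]), "__len__")  # __len__ alone is misleading
assert not is_cached(CountingIterator([1.0, 2.0]))
assert is_cached(np.zeros(2))
```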

All other changes in #526 are just from syncing with develop.

If you agree with the changes, we can go ahead and merge this.

Thanks for the unit test btw!

@scipascal
Contributor Author

@israelmcmc I guess in this case forcing the dtype is not necessary, but I haven't checked. Overall, I found that when using many events for the unbinned analysis, float64 becomes necessary to ensure that iminuit's EDM can be estimated reliably, especially during the part where you evaluate the flux model and perform the numerical integration. You can share your opinion, but I guess it can be merged then.

I made the same modifications (batch_size instead of vectorize) to the UnbinnedThreeMLModelFolding. But since it was originally related to my CachedUnbinnedThreeMLModelFolding, it is part of my bigger PR.

@israelmcmc
Collaborator

Thanks @scipascal.

Good catch that not using float64 is likely the cause of threeML/threeML#587. But, as you said, it shouldn't matter in this particular case: if the inputs are in e.g. float32 and we cast them to float64, we're not gaining the precision back anyway. If the inputs are already float64, they will remain that way.
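The cast point can be seen directly in numpy: once a value has been rounded to float32, upcasting does not restore the lost digits.

```python
import numpy as np

x64 = np.float64(0.1)   # full double-precision value
x32 = np.float32(x64)   # rounded to single precision
back = np.float64(x32)  # upcast: still the rounded value

assert back != x64                            # precision is not recovered
assert np.float64(np.float32(back)) == back   # values already exact in float32 stay exact
```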

I'll go ahead and merge this with the additional changes then.

I made the same modifications (batch_size instead of vectorize) to the UnbinnedThreeMLModelFolding. But since it was originally related to my CachedUnbinnedThreeMLModelFolding, it is part of my bigger PR.

Sounds good, I'll check it on #515

@israelmcmc israelmcmc merged commit 94518bd into develop Mar 19, 2026
10 checks passed