Make UnbinnedLikelihood use cached density #522
Conversation
Updated batch_size parameter to be optional and use asarray in case the density is already cached
Codecov Report ✅ All modified and coverable lines are covered by tests.
Thanks @scipascal. I like this behavior, we should use it for all similar functions. To summarize from reading your code: if … As for the failing codecov/patch: if you can add a unit test for the unbinned likelihood I'd appreciate it. Otherwise let me know and I can bypass it, since that code was already not covered (and I'm the one to blame...)
Yeah, let's do that, makes sense.
Good point. I think we should handle that case on a function-by-function basis. For some functions batching won't make sense if the result is already cached (such as the case at hand, computing the unbinned likelihood). I can envision some other calculations where it would still make sense to process them in batches as a tradeoff between speed and memory, e.g. when you need to initialize a large matrix to complete the calculation for a given event: while it's faster to do it all at once in parallel, the user might choose to do it in batches depending on their system. The docstring of the function should just be clear about what behavior to expect, e.g. that the batching parameter will be ignored if the input is already an array rather than a general iterable.
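The behavior described above can be sketched with a toy stand-in (the function name and signature here are illustrative, not the actual project API): `batch_size` is honored only for a general iterable, and silently ignored when an already-materialized array is passed in, which is the case for a cached density.

```python
import itertools

import numpy as np


def evaluate_density(events, batch_size=None):
    """Toy stand-in for a density evaluation (hypothetical, not the real API).

    If `events` is already a NumPy array (e.g. a cached density), process it
    in one shot and ignore `batch_size`. Otherwise, consume the iterable,
    optionally in batches to bound memory usage.
    """
    if isinstance(events, np.ndarray):
        # Already materialized: batching gains nothing here.
        return np.log(events)

    if batch_size is None:
        return np.log(np.fromiter(events, dtype=np.float64))

    # Generic iterable: process in fixed-size chunks.
    out = []
    it = iter(events)
    while True:
        chunk = list(itertools.islice(it, batch_size))
        if not chunk:
            break
        out.append(np.log(np.asarray(chunk, dtype=np.float64)))
    return np.concatenate(out)
```

Documenting this in the docstring (as suggested above) avoids surprising users who pass a cached array together with an explicit `batch_size`.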
…. Also added some tests.
…he data. Signed-off-by: Israel Martinez <imc@umd.edu>
…already an array - Check explicitly for numpy array subclass, not just len (iterators do not need __len__, but can still have one) - Use the recently introduced force_dtype=False to avoid an unnecessary copy if the input is already an array but not of dtype float64 Signed-off-by: Israel Martinez <imc@umd.edu>
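The `__len__` pitfall mentioned in this commit can be shown with a short generic sketch (the helper name is hypothetical, not the project's code): having a length does not make an object a NumPy array, so an explicit `isinstance` check is the reliable test.

```python
import numpy as np


def is_cached_array(x):
    # Explicit type check: lists (and other sized containers) have __len__
    # too, but they are not already-materialized NumPy arrays.
    return isinstance(x, np.ndarray)


# A list has __len__ but is not an ndarray.
assert hasattr([1, 2, 3], "__len__")
assert not is_cached_array([1, 2, 3])

# A generator has neither __len__ nor array semantics.
gen = (i for i in range(3))
assert not hasattr(gen, "__len__")
assert not is_cached_array(gen)

# Only an actual ndarray counts as cached.
assert is_cached_array(np.array([1.0, 2.0]))
```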
@scipascal I made a couple of suggestions in this PR to your branch, #526. First, I added … The main changes are in this commit: b685124.
All other changes in #526 are just from syncing with develop. If you agree with the changes, we can go ahead and merge this. Thanks for the unit test btw!
@israelmcmc I guess in this case forcing the dtype is not necessary, but I haven't checked. Overall, I found that when using many events for the unbinned analysis, float64 becomes necessary to ensure the EDM of iminuit can be estimated reliably, especially during the part where you evaluate the flux model and perform the numerical integration. You can share your opinion, but I guess then it can be merged. I made the same modifications (batch_size instead of vectorize) to the UnbinnedThreeMLModelFolding. But since it was originally related to my CachedUnbinnedThreeMLModelFolding, it is part of my bigger PR.
Thanks @scipascal. The fact that not using float64 is likely the cause of threeML/threeML#587 is a good catch. But, as you said, it shouldn't matter in this particular case: if the inputs are in e.g. float32 and we cast them to float64, we're not gaining the precision back anyway. If the inputs are already float64, they will remain that way. I'll go ahead and merge this with the additional changes then.
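The point about not gaining precision back can be seen with plain NumPy (no project code involved): once a value has been rounded to float32, promoting it back to float64 keeps the rounded value.

```python
import numpy as np

x64 = np.float64(0.1)   # full double precision (~16 significant digits)
x32 = np.float32(x64)   # rounded to single precision (~7 digits)
back = np.float64(x32)  # cast back up: the rounding is NOT undone

# The float32 rounding error persists after the upcast.
assert back != x64

# Inputs that stay float64 all along keep their precision.
assert np.float64(x64) == x64
```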
Sounds good, I'll check it on #515
PR522 Israel changes
The `UnbinnedLikelihood` still uses `fromiter` instead of `asarray`. Instead of using a `vectorized: bool` argument (see `SumExpectationDensity`), I replaced it with `batch_size: Optional[int]`. We should decide whether to align this behavior with `SumExpectationDensity`.
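The practical difference between the two calls can be sketched with plain NumPy (illustrative only; `force_dtype` is a project-specific option not shown here): `np.fromiter` always re-iterates and allocates a fresh array, while `np.asarray` returns the cached array itself when the dtype already matches, avoiding a copy.

```python
import numpy as np

cached = np.linspace(0.0, 1.0, 5)  # already-computed density, float64

# fromiter consumes the input element by element and always allocates.
via_fromiter = np.fromiter(cached, dtype=np.float64)
assert via_fromiter is not cached

# asarray is a no-op on a matching ndarray: same object, no copy.
via_asarray = np.asarray(cached, dtype=np.float64)
assert via_asarray is cached

# With a mismatched dtype, asarray must copy; dropping the dtype
# requirement (cf. force_dtype=False above) keeps the original array.
cached32 = cached.astype(np.float32)
assert np.asarray(cached32, dtype=np.float64) is not cached32
assert np.asarray(cached32) is cached32
```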