Thanks for bringing this up. I checked how pandas handles this case: rather than using an epsilon threshold to deal with numerical instability, its variance computation checks whether every value in the current window is the same (https://github.com/pandas-dev/pandas/blob/1da0d022057862f4352113d884648606efd60099/pandas/_libs/window/aggregations.pyx#L337). This seems like a better solution to me, since a small positive variance is legitimately possible for certain inputs, so rounding it down to zero is not ideal. One downside of this approach is that we'll need to count the number of consecutive equal values, which has a slight performance cost. I don't expect that to make a big difference, though, so I am fine with making the change. Could you copy what you have here into an Issue for tracking? Alternatively, if you are comfortable making the change yourself, your contribution is always welcome.
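A minimal Python sketch of the pandas-style check, for illustration only. `RollingVariance` is a hypothetical class (not part of csp or pandas); it assumes a fixed-size window, uses Welford-style add/remove updates, and skips NaN handling. The part that matters is the `same_run` counter, which mirrors the consecutive-same-value count in the linked pandas code: when the run of identical values covers the whole window, the result is exactly 0.0.

```python
import math
from collections import deque


class RollingVariance:
    """Hypothetical fixed-window rolling sample variance (ddof=1).

    Not csp's or pandas' API. Sketches the idea from pandas' aggregations.pyx:
    count the run of identical consecutive values and return exactly 0.0 when
    that run covers the whole window. NaN handling is omitted for brevity.
    """

    def __init__(self, window: int):
        if window < 2:
            raise ValueError("window must be >= 2 for a sample variance")
        self.window = window
        self.values = deque()
        self.mean = 0.0
        self.m2 = 0.0       # running sum of squared deviations from the mean
        self.prev = None    # most recent value seen
        self.same_run = 0   # length of the current run of identical values

    def _add(self, x: float) -> None:
        # Welford update; len(self.values) already includes x.
        n = len(self.values)
        delta = x - self.mean
        self.mean += delta / n
        self.m2 += delta * (x - self.mean)

    def _remove(self, x: float) -> None:
        # Inverse Welford update; len(self.values) no longer includes x.
        n = len(self.values)
        delta = x - self.mean
        self.mean -= delta / n
        self.m2 -= delta * (x - self.mean)

    def update(self, x: float) -> float:
        # Extend or reset the run of identical consecutive values.
        if x == self.prev:
            self.same_run += 1
        else:
            self.same_run = 1
            self.prev = x
        self.values.append(x)
        self._add(x)
        if len(self.values) > self.window:
            self._remove(self.values.popleft())
        n = len(self.values)
        if n < 2:
            return math.nan
        if self.same_run >= n:
            # Every value in the window is identical: report exactly zero
            # instead of whatever rounding residue m2 has accumulated.
            return 0.0
        return max(self.m2, 0.0) / (n - 1)


rv = RollingVariance(3)
print([rv.update(v) for v in [5.0, 1e8, 0.1, 0.1, 0.1]][-1])  # 0.0
```

The extra work per tick is one equality comparison and one counter update, which is the slight performance cost mentioned above. Large values that have already slid out of the window can leave rounding residue in `m2`; the run check short-circuits exactly the all-identical case without touching windows that have a genuinely small variance.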
I noticed that the variance computation only guards against negative values (cpp link), rather than using epsilon-based thresholding like the other statistics. This leads to a numerical artifact: when all the values in the window are identical, csp.stats.stddev returns small positive numbers instead of exactly zero. I have prepared a self-contained case that reproduces the error. I think we should switch the variance calculation to the same epsilon-based approach. Below is my proposed improvement:
Test case
Expected output from test case
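To make the difference between clamping negatives and epsilon thresholding concrete, here is a small Python sketch. `variance_from_sums` and its `eps=1e-9` default are hypothetical, chosen only for illustration; this is neither csp's implementation nor the proposed patch above. It computes the variance from running sums, where subtracting two nearly equal numbers is what produces the rounding residue.

```python
import math


def variance_from_sums(n: int, s: float, s2: float, ddof: int = 1, eps: float = 1e-9) -> float:
    """Hypothetical helper: sample variance from running sums with an epsilon threshold.

    `eps` is an assumed absolute tolerance for illustration, not a csp parameter.
    """
    if n <= ddof:
        return math.nan
    # Sum of squared deviations via the sum-of-squares identity; this
    # subtraction of nearly equal quantities is where residue comes from.
    m2 = s2 - s * s / n
    var = m2 / (n - ddof)
    # Clamping with max(var, 0.0) only removes negative residue, so a tiny
    # positive residue survives. Thresholding maps anything below eps to 0.0.
    return 0.0 if var < eps else var


xs = [0.1] * 10
print(variance_from_sums(len(xs), sum(xs), sum(x * x for x in xs)))  # 0.0

# A genuine but tiny variance (about 5e-13 here) is zeroed as well.
ys = [1.0, 1.000001]
print(variance_from_sums(len(ys), sum(ys), sum(y * y for y in ys)))  # 0.0
```

The second call shows why an absolute threshold is a judgment call: a legitimately small variance is rounded down to zero along with the numerical residue.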