-
Notifications
You must be signed in to change notification settings - Fork 107
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Possible to implement a Centered moving average? #186
Comments
I am happy to hear that bottleneck has been useful. You made my day. I doubt I will be making your day because I am going to suggest a workaround which might not work for you. Could you shift the output to get what you want? |
We implemented the shifting solution in xarray to do centered moving averages on top of bottleneck. Try |
Oh! Nice. |
Here's the implementation (by @fujiisoup) for anyone that's curious: https://github.com/pydata/xarray/blob/e1dc51572e971567fd3562db0e9f662e3de80898/xarray/core/rolling.py#L281 |
@kwgoodman Well bottleneck has reduced my algorithm's calculation times by quite a significant factor (approx an order of magnitude). For my current application, I'm calculating a moving mean over a couple of billion values, and even though pandas is not slow, bottleneck still manages to save me quite some time. So I can honestly say that you made my day when I discovered your module! @shoyer Thank you for the suggestion, now I know about the existence of xarray as well! I also tried out the suggested method and compared it to bottleneck and pandas using some quickly written example code. However, I have the feeling the bottleneck module is not being recognized by xarray, since the computation times are way higher than for the bottleneck module. I've added the code I used to check this .
import numpy as np N = 30000200 # Number of datapoints def using_bottleneck_mean(data,width): def using_pandas_rolling_mean(data,width): def using_xarray_mean(data,width): start=time.time()
|
@RayPalmerTech thanks for trying it out, and for the reproducible example! We will definitely add something like it to our benchmark suite. Indeed this is way slower than it should be. With a simple fix (see pydata/xarray#1993 for details) I get much more reasonable performance: Bottleneck: 0.06775331497192383 s |
@RayPalmerTech if you don't need the output to be the same length as the input you can slice into the output, something like |
A workaround is to pad the array with NaNs in the filtering axis on the right side and remove the excess elements on the left side to emulate symmetric filtering. This works when no NaNs are expected in the original array.
And here the code I used to generate the results: import time
import numpy as np
from scipy.signal import medfilt2d
import bottleneck as bk
def _medfilt1_np(array, size):
lsize = size // 2
rsize = size - lsize
return np.stack([np.median(array[:, max(i - lsize, 0):i + rsize], axis=1)
for i in range(array.shape[1])]).T
def _medfilt1_bk(array, size):
rsize = size - size // 2 - 1
array = np.pad(array, pad_width=((0, 0), (0, rsize)),
mode='constant', constant_values=np.nan)
return bk.move_median(array, size, min_count=1, axis=1)[:, rsize:]
test = np.random.randn(10000, 200)
start = time.time()
res = bk.move_median(test, 15, min_count=1, axis=1)
print('Bottleneck: {:.4}s'.format(time.time() - start))
start = time.time()
res = _medfilt1_bk(test, 15)
print('Bottleneck (pad): {:.4}s'.format(time.time() - start))
start = time.time()
res = medfilt2d(test, (1, 15))
print('Scipy: {:.4}s'.format(time.time() - start))
start = time.time()
res = _medfilt1_np(test, 15)
print('Numpy: {:.4}s'.format(time.time() - start)) |
Hi Mr Goodman, I am relative new and inexperienced with python an came across your bottleneck Documentation, Release 1.4.0.dev0+102.g933b653 whilst searching for a fast moving average routine. Is it OK for people to use your code freely. Regards |
Dear Devs,
I have used your module in the past with great success in speeding up my code. However, I wonder if it is possible to implement or add perhaps a Boolean to the moving mean function that lets the user specify whether he wants a centered average or not, similar to:
pandas.DataFrame.rolling(window=width,center=True).mean()
Currently I am still using pandas for central moving averages but it is significantly slower than Bottlenecks functions unfortunately.
The text was updated successfully, but these errors were encountered: