Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Possible to implement a Centered moving average? #186

Open
CP-Vub opened this issue Mar 15, 2018 · 9 comments
Open

Possible to implement a Centered moving average? #186

CP-Vub opened this issue Mar 15, 2018 · 9 comments

Comments

@CP-Vub
Copy link

CP-Vub commented Mar 15, 2018

Dear Devs,
I have used your module in the past with great success in speeding up my code. However, I wonder if it is possible to implement or add perhaps a Boolean to the moving mean function that lets the user specify whether he wants a centered average or not, similar to:
pandas.DataFrame.rolling(window=width,center=True).mean()
Currently I am still using pandas for central moving averages but it is significantly slower than Bottlenecks functions unfortunately.

@kwgoodman
Copy link
Collaborator

I am happy to hear that bottleneck has been useful. You made my day.

I doubt I will be making your day because I am going to suggest a workaround which might not work for you. Could you shift the output to get what you want?

@shoyer
Copy link
Member

shoyer commented Mar 15, 2018

We implemented the shifting solution in xarray to do centered moving averages on top of bottleneck. Try xarray.DataArray.rolling(..., center=True).mean()

@kwgoodman
Copy link
Collaborator

Oh! Nice.

@shoyer
Copy link
Member

shoyer commented Mar 15, 2018

@CP-Vub
Copy link
Author

CP-Vub commented Mar 15, 2018

@kwgoodman Well bottleneck has reduced my algorithm's calculation times by quite a significant factor (approx an order of magnitude). For my current application, I'm calculating a moving mean over a couple of billion values, and even though pandas is not slow, bottleneck still manages to save me quite some time. So I can honestly say that you made my day when I discovered your module!

@shoyer Thank you for the suggestion, now I know about the existence of xarray as well! I also tried out the suggested method and compared it to bottleneck and pandas using some quickly written example code. However, I have the feeling the bottleneck module is not being recognized by xarray, since the computation times are way higher than for the bottleneck module. I've added the code I used to check this .

  • Code:

import numpy as np
import pandas as pd
import time
import bottleneck as bn
import xarray
import matplotlib.pyplot as plt

N = 30000200 # Number of datapoints
Fs = 30000 # sample rate
T=1/Fs # sample period
duration = N/Fs # duration in s
t = np.arange(0,duration,T) # time vector
DATA = np.random.randn(N,)+5
np.sin(2np.pi0.01t) # Example noisy sine data and window size
w = 3
30000

def using_bottleneck_mean(data,width):
return bn.move_mean(a=data,window=width,min_count = 1)

def using_pandas_rolling_mean(data,width):
return np.asarray(pd.DataFrame(data).rolling(window=width,center=True,min_periods=1).mean()).ravel()

def using_xarray_mean(data,width):
return xarray.DataArray(data,dims='x').rolling(x=width,min_periods=1, center=True).mean()

start=time.time()
A = using_bottleneck_mean(DATA,w)
print('Bottleneck: ', time.time()-start, 's')
start=time.time()
B = using_pandas_rolling_mean(DATA,w)
print('Pandas: ',time.time()-start,'s')
start=time.time()
C = using_xarray_mean(DATA,w)
print('Xarray: ',time.time()-start,'s')

  • Output for a single iteration, time taken:
    Bottleneck: 0.033s
    Pandas: 0.255s
    Xarray: 5.217s

@shoyer
Copy link
Member

shoyer commented Mar 15, 2018

@RayPalmerTech thanks for trying it out, and for the reproducible example! We will definitely add something like it to our benchmark suite.

Indeed this is way slower than it should be. With a simple fix (see pydata/xarray#1993 for details) I get much more reasonable performance:

Bottleneck: 0.06775331497192383 s
Pandas: 0.48262882232666016 s
Xarray: 0.1723031997680664 s

@kwgoodman
Copy link
Collaborator

@RayPalmerTech if you don't need the output to be the same length as the input you can slice into the output, something like a[half_window:-half_window]. Slicing is very fast as there are no copies made.

@Banus
Copy link

Banus commented Jun 5, 2020

A workaround is to pad the array with NaNs in the filtering axis on the right side and remove the excess elements on the left side to emulate symmetric filtering. This works when no NaNs are expected in the original array.
There is a ~30% overhead for it, but it's still blazingly fast compared to numpy or even scipy's medfilt2d and I'm able to replicate Matlab's medfilt1 with the 'truncate' option (only 'numpy' and 'bottleneck pad' give the same result). Here the times for a 10000 x 200 array and move_median with size 15:

Bottleneck: 0.106s
Bottleneck (pad): 0.134s
Scipy: 0.472s
Numpy: 1.033s

And here the code I used to generate the results:

import time
import numpy as np
from scipy.signal import medfilt2d
import bottleneck as bk

def _medfilt1_np(array, size):
    lsize = size // 2
    rsize = size - lsize
    return np.stack([np.median(array[:, max(i - lsize, 0):i + rsize], axis=1)
                     for i in range(array.shape[1])]).T

def _medfilt1_bk(array, size):
    rsize = size - size // 2 - 1
    array = np.pad(array, pad_width=((0, 0), (0, rsize)),
                   mode='constant', constant_values=np.nan)
    return bk.move_median(array, size, min_count=1, axis=1)[:, rsize:]

test = np.random.randn(10000, 200)

start = time.time()
res = bk.move_median(test, 15, min_count=1, axis=1)
print('Bottleneck: {:.4}s'.format(time.time() - start))

start = time.time()
res = _medfilt1_bk(test, 15)
print('Bottleneck (pad): {:.4}s'.format(time.time() - start))

start = time.time()
res = medfilt2d(test, (1, 15))
print('Scipy: {:.4}s'.format(time.time() - start))

start = time.time()
res = _medfilt1_np(test, 15)
print('Numpy: {:.4}s'.format(time.time() - start))

@CharalapML
Copy link

Hi Mr Goodman,

I am relative new and inexperienced with python an came across your bottleneck Documentation, Release 1.4.0.dev0+102.g933b653 whilst searching for a fast moving average routine.

Is it OK for people to use your code freely.

Regards

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants