-
Notifications
You must be signed in to change notification settings - Fork 88
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Update internals to vectorised SciPy #94
Conversation
Note to @cjhopp and @d-chambers this is not yet ready for a full review, but @cjhopp may want to test this on PAN to see if this gets around the memory issues. So far I have run the tests for synthetic data, but not for real data (haven't had the time at JpGU), so wait for CI etc. |
Fingers crossed. I'll have a look and see in the next day or so. |
ci fails look like install issues, but tests don't run at the moment on other machines - running into the bottleneck issue here: bottleneck: 164 |
Crashes when installing unnecessary packages...
eqcorrscan/core/match_filter.py
Outdated
templates.std(axis=-1, keepdims=True) * template_length)) | ||
norm_sum = norm.sum(axis=-1, keepdims=True) | ||
stream_fft = np.fft.rfft(stream, fftshape) | ||
template_fft = np.fft.rfft(np.flip(norm, axis=-1), fftshape, axis=-1) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Numpy's flip was added in version 1.12.0 so we need to make sure and bump the version in the setup.py
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Obspy 1.0.3 does not play nice on Windows, but the current master does, I'm going to keep the appveyor running the osbpy master and travis running the current release until the next obspy release when appveyor should revert to the current obspy release. I think it comes down to their pinning of matplotlib...?
eqcorrscan/core/match_filter.py
Outdated
# cccsums = dask.delayed(np.sum)(xcorrs, axis=0) | ||
# if compute: | ||
# cccsums.compute() | ||
if cores is None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it make sense to abstract the multiprocessing further up? I know several functions use something similar so maybe we could make a generic pool interface on the module level, then we could have persistent processes/threads in the pool so we wouldn't need to spin them up every time.
warnings.warn('The expected result was not achieved, ' + | ||
'but it has the same shape') | ||
|
||
def test_perfect_template_loop(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is tested in the normxcorr2 tests as _template_loop has been removed.
@@ -368,6 +323,19 @@ def test_detection_extraction(self): | |||
self.assertEqual(len(detections), 4) | |||
self.assertEqual(len(detection_streams), len(detections)) | |||
|
|||
def test_normxcorr(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These data have large variations in amplitude late in the continuous data, which leads to floating point error accumulation when using float (32-Bit) in the C routine - overcome by using double (64-Bit float) internally within the C-routine.
@@ -788,14 +763,14 @@ def test_day_long_methods(self): | |||
# Aftershock sequence, with 1Hz data, lots of good correlations = high | |||
# MAD! | |||
day_party = daylong_tribe.detect( | |||
stream=st, threshold=4.5, threshold_type='MAD', trig_int=6.0, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This threshold was too low to produce meaningful results.
@@ -17,12 +18,19 @@ | |||
from eqcorrscan.core.match_filter import read_detections | |||
|
|||
|
|||
slow = pytest.mark.skipif( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is so that we don't always have to run the tutorial tests - they rely on long data downloads, which appveyor is not so great at handling.
A quick note: tests are passing (despite the apparent fail on appveyor, something is wrong with the interface there), I'm going to work on upping test coverage, but could do with a review. A lot of the changes in here are extra little patches, the main things that need looking at are:
A further note, I have deliberately not written the c functions with python bindings, because I would like them to be employed by other programs if wanted (e.g. matlab or other C funcs). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All looks good to me. Two minor docstring things.
This PR represents a TON of work and you deserve a hearty THANK YOU, @calum-chamberlain. Applause.
README.md
Outdated
source. The user should follow the instructions above for OpenCV install. | ||
|
||
We have also added subspace detection and correlation derived pick adjustment. | ||
and writing seismic data, as well as subspace detection, brightness source-scanning |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
comma at end of line
eqcorrscan/utils/correlate.py
Outdated
A single Stream object to be correlated with the templates. | ||
:type cores: int | ||
:param cores: | ||
Number of processed to use, if set to None, and dask==False, no |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
processes
Merged. Will work on openMP loop implementation (for SLURM applications) elsewhere. |
This PR is a work in progress update to the internals of match-filter. This PR will:
To be done:
- [ ] Implement multi-threaded fft? (pyfftw)EDIT:
This PR has evolved into a complete overhaul of the internals of EQcorrscan - Scipy was tested and found to be memory inefficient when run in parallel, also memory profiler issues with SLURM have forced a desire to move away from python multiprocessing.
All this has led to the development of C internals using FFTW and openMP.