Draft: #54 with fixed merge conflicts. #59
base: tdv-PR44-fixed-merge-conflict
Conversation
…ge-latest Accept all current changes from bmiszalski:bm_work as it appears it was already up to date.
Fixed twilight flats correction, wire solution in red arm and Python version to 3.10
Just posting what I have so far, as I'm off work for the rest of the day.
I'll pick up the review again on Monday. Let me know if you have any questions or comments.
"scipy==1.9.1", | ||
"scipy>=1.9.1", | ||
"numpy", | ||
"matplotlib", | ||
"photutils==1.8.0", | ||
"photutils>=1.8.0", |
We ran into an issue with a later version of scipy at one point: https://github.com/PyWiFeS/pipeline/pull/43/files#r1527907039
We should have opened an issue for it, and it needs steps to reproduce the error before we can investigate. So for now I think it's ok to proceed with this, but we should retest on @felipeji's setup before we merge.
-        lazy_results = pool.imap(_unwrap_and_run, tasks, chunksize=chunksize)
+        # chunksize is not strictly necessary; can break some pools
+        lazy_results = pool.imap(_unwrap_and_run, tasks)  # , chunksize=chunksize)
As a general rule, best to delete code rather than leave it commented out. Otherwise you wind up with a lot of outdated code sitting around 'just-in-case' someone wants to debug or reuse it later.
I'd lean towards leaving chunksize in, but defaulting it to 1 (the default value according to https://docs.python.org/release/2.6.6/library/multiprocessing.html#multiprocessing.pool.multiprocessing.Pool.imap), and moving the calculation to its own 'public' but unused function, with a docstring describing its purpose:
def get_chunksize_for_single_batch(tasks, max_processes):
"""
Get the `chunksize` to be used such that `tasks` are divided evenly
between processes, allowing them to be run in a single batch.
For large numbers of tasks, consider setting a smaller chunksize.
Note that chunksize is not strictly necessary, and it can break some pools
"""
num_processes = _get_num_processes(max_processes)
return math.ceil(len(tasks) / num_processes)
def map_tasks(tasks, max_processes=-1, chunksize=1):
"""
Run the `tasks`, divided between up to `max_processes` processes in chunks of `chunksize`.
Each task should follow the pattern in `get_task`, storing the function, args and kwargs to run.
The results will be returned in order.
"""
num_processes = _get_num_processes(max_processes)
results = []
with multiprocessing.Pool(num_processes) as pool:
# `lazy_results` must be enumerated within the pool's scope or the threads will not complete.
lazy_results = pool.imap(_unwrap_and_run, tasks, chunksize=chunksize)
results = list(lazy_results)
return results
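To make the suggestion above concrete, here is a minimal, self-contained sketch of the chunksize calculation. Note the real code computes the worker count via a private `_get_num_processes` helper; this sketch takes the counts directly, so the signature is an assumption for illustration only:

```python
import math


def get_chunksize_for_single_batch(num_tasks, num_processes):
    """
    Get the `chunksize` such that `num_tasks` are divided evenly between
    `num_processes` workers, allowing them to be run in a single batch.
    (Hypothetical standalone version of the helper suggested above.)
    """
    return math.ceil(num_tasks / num_processes)


print(get_chunksize_for_single_batch(10, 4))  # → 3 (each worker gets at most 3 tasks)
```

With 10 tasks and 4 processes, ceil(10 / 4) = 3 tasks per chunk, so every worker receives at most one chunk.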
def run_slice(packaged_args):
    """
    A function to be used by multiprocessing in derive_wifes_optical_wave_solution to derive the wifes optical wave solution for each slice `s`.
    """
If a function isn't intended to be used outside the file, best to mark it as private with a leading underscore:
def _run_slice(packaged_args):
task = get_task(_run_slice,
You could then simplify the docstring a little:
"""
Derive the wifes optical wave solution for a single slice `s` in a single thread.
"""
(s,inimg,arc_name,ref_arclines,ref_arcline_file,dlam_cut_start,flux_threshold_nsig,find_method,shift_method,exclude_from,exclude,epsilon,doalphapfit,automatic,verbose,decimate,sigma,alphapfile,doplot,savefigs,save_prefix) = packaged_args
These args should probably be split over multiple lines, but currently the file as a whole hasn't been formatted with Black.
So after this PR, we should follow up by running Black on this file.
For review, I've done a quick find replace to split the line:
(s,
inimg,
arc_name,
ref_arclines,
ref_arcline_file,
dlam_cut_start,
flux_threshold_nsig,
find_method,
shift_method,
exclude_from,
exclude,
epsilon,
doalphapfit,
automatic,
verbose,
decimate,
sigma,
alphapfile,
doplot,
savefigs,
save_prefix) = packaged_args
VSCode highlights a number of those args as unused, which means some unnecessary memory copying to the worker processes. Might be worth trimming them down.
Unused args:
exclude_from,
exclude,
epsilon,
doalphapfit,
automatic,
decimate,
sigma,
alphapfile,
savefigs,
save_prefix
# since we are using multiprocessing over each slice, no need to use multiprocessing for each call to find_lines_and_guess_refs
multithread = False
max_processes = 1
Generally best to declare new variables as late as possible, to keep them grouped with where they're used.
Also, max_processes shouldn't be necessary to set. It's unused if multithread=False.
flux_threshold_nsig=flux_threshold_nsig,
deriv_threshold_nsig=1.0,
multithread=multithread,
max_processes=max_processes,
As in the comment on lines 1522-1524 (can't link until the review is posted), I'd just set
multithread=False,
plot=step1plot
skipping max_processes and setting multithread directly.
You could add a comment above that parameter in case someone looks at it and forgets that run_slice is single-threaded.
# save some metadata to the return_dict. since this is always from the f[1] extension, a key without the slice `s` is sufficient
return_dict['grating'] = grating
return_dict['bin_x'] = bin_x
return_dict['bin_y'] = bin_y
return_dict['dateobs'] = dateobs
return_dict['tdk'] = tdk
return_dict['pmb'] = pmb
return_dict['rh'] = rh
return_dict['rma'] = rma
Wound up with a few points in the one comment:
1:
Could declare & initialize the dictionary in one go here:
return_dict = {
'grating': grating,
etc.
}
2:
But if these values aren't changing, I'd say it's best to add everything at the end of the function instead.
In part, I think it's easiest to use the function if its return value is all defined in one place, so you can see all the headers.
3:
Might even be better to return a tuple instead of a dictionary:
return (grating, bin_x, bin_y, dateobs, etc.
If any fields aren't clear from the value, you could make them variables first:
slice_rk = new_r
return (..... slice_rk, slice_bla, etc.)
It'd save derive_wifes_optical_wave_solution from having to parse the key names, but it'd need to be careful with the positioning of its args:
(grating, bin_x, ..., slice_rk, slice_bla, ...) = run_slice(packaged_args)
Neither option is really ideal, but it's a private function, so it doesn't need to be perfectly structured. But given we don't currently have unit tests, I think a tuple is the most likely to (correctly) crash out if the private function was changed and the caller wasn't.
Dear Tim,
See #57
This PR is #57, but I:
- used tdv-PR44-fixed-merge-conflict, which was actually one commit behind automation's tdv-PR44-fixed-merge-conflict
- merged tdv-PR44-fixed-merge-conflict, resolving conflicts by accepting current.
I'll use it to leave some review comments; the original branch bmiszalski:bm_work can be updated from automation (hopefully more cleanly), then this PR can be linked from the other PR and closed.