Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Combine prs and nat grib files. #126

Merged
merged 6 commits into from
Nov 8, 2021
Merged

Combine prs and nat grib files. #126

merged 6 commits into from
Nov 8, 2021

Conversation

christinaholtNOAA
Copy link
Collaborator

This PR adds options to concatenate grib files when multiple paths and file templates are provided. When combined, the resulting file will also remove all duplicated entries. The script self-cleans those combined files upon exit.

This work also addresses what should happen when a forecast hour is not found. If a grib input file is not found within the waiting period, all accumulated variables will be removed from the subsequent forecast hours, and all other images should be generated.

The feature that allows accumulated variables to be plotted a single forecast lead time (an -all_leads flag with -f 6, for example), should now work as expected. If combining files in this mode, all files must be present. A real-time feature to wait on the file has been disabled since this this is not how real-time cases are run.

Should address Issues #94, #111, #85, and #84.

Copy link
Collaborator Author

@christinaholtNOAA christinaholtNOAA left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It may be able to see the diffs in this one with the "split" view in GitHub.

if fcst_hour <= first_fcst:
filenames['01fcst'].append(filename)
else:
filenames['free_fcst'].append(filename)
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This code is now only handling one forecast hour at a time. The loop should take place in the main loop for handling missing files appropriately.

cla.images[1][variable].remove(level)
if not cla.images[1][variable]:
del cla.images[1][variable]

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel slightly icky about this implementation where I'm changing input arguments explicitly. I may want to clean that up more in a subsequent PR. I think it works okay for now. Happy to hear feedback.

print(f'Removing combined files: ')
for file_path in combined_files:
print(f' {file_path}')
os.remove(file_path)
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this needs to be done in a couple of places, using a function.

print(f"{tile_zip_file} found and removed")
os.makedirs(tile_zip_dir, exist_ok=True)
zipfiles[tile] = tile_zip_file
return zipfiles
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was just moved from the driver function after complaints from the linter.

args=(png_files, zipf),
)
zip_proc.start()
zip_proc.join()
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also moved due to linter complaints.

print(f'{LOG_BREAK}\n',
f'Graphics will be created for input file: {grib_path}\n',
f'Output graphics directory: {workdir} \n'
f'{LOG_BREAK}')
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using this syntax because linter was complaining about "too many statements"

@@ -32,6 +33,11 @@

AIRPORTS = 'static/Airports_locs.txt'

COMBINED_FN = 'combined_{fhr:03d}.grib2'
TMP_FN = 'combined_{fhr:03d}.tmp.grib2'
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using parameters here because multiple methods need to know how these temporary files are named.

Copy link
Collaborator

@cshartsough cshartsough left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These changes seem straightforward, I didn't see any problems. One minor check:
line 126 has first_fcst = 6 if cla.images[0] == 'global' else 1,
line 762 has first_fcst = 6 if cla.images[0] == 'global' else 0,

I assume that's correct but just making sure.

Great job, as always.

@christinaholtNOAA
Copy link
Collaborator Author

Great question!

The first instance (first_fcst = 6 if cla.images[0] == 'global' else 1) is using the following logic, and we want to catch 0, 1, and 6 for the relevant cases:

if fcst_hour <= first_fcst:
        filenames['01fcst'].append(filename)

And then the second instance (first_fcst = 6 if cla.images[0] == 'global' else 0) has this logic which should start the range from either 0 or 6 depending on the model:

for fhr in range(first_fcst, int(cla.fcst_hour[0]), fcst_inc):

I think that's right...

Copy link
Collaborator

@Brian-Jamison Brian-Jamison left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great job! A fairly heavy lift, thanks for taking it on. Everything looks good. One thing to note is the typical schedule of the data. For a HRRR run, the nat files start coming in about 1:03 hrs after the init time, and come in chunks every 10 minutes. The prs files start coming in about 53 minutes after the init time, and new forecasts come in about every 2-3 minutes, but some can straggle and come in several minutes later. So some adjustment may be needed for the wait times when combining prs and nat files.

@christinaholtNOAA
Copy link
Collaborator Author

@Brian-Jamison That's a great point! I think we will really need to consider how we want to wait and submit these jobs. The code will wait for a while to make sure the real-time files are available, but maybe we want to delay the start of the runs until nat files start showing up.

@christinaholtNOAA christinaholtNOAA merged commit d9925f6 into main Nov 8, 2021
@christinaholtNOAA christinaholtNOAA deleted the combine_grib branch November 8, 2021 18:31
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants