[5pt] WIP PR: Logging and Error handling upgrades #1731
RobHanna-NOAA wants to merge 36 commits into dev
…into dev-logging
…checked some files
|
This is what was written in calibrate_rating_curves:44 calibrate_rating_curves:149 fim_pipeline:117 fim_pipeline:132 fim_post_processing:42 fim_post_processing:137 fim_process_huc:140 fim_process_huc:203 fim_process_huc:253 fim_process_huc:257 process_branch:20 process_branch:81 run_by_branch:50 run_huc:309 run_huc:330 run_huc:361 |
|
Huh... I know about the whole "pipfile...." part and don't know how to fix that. I was going to ignore that part for now. I am surprised about some of your other problems. I do expect some duplication in errors, but something seems weird here. |
|
Maybe I checked in a last minute bug |
|
Dang... I see my bug. I will get back to you. It was a last-minute bug I added during my testing.
|
Update: I just talked to Ali.
Getting all of these changes in is critical for the BED. Well... clarification: the pipeline parts have to be working by this Saturday's BED. If we get cornered on time, we can merge in the WIP changes from process_rerun_calibration_huc.sh and rerun_calibration.py, then make a second branch and PR after the BED to finish getting those working correctly. |
|
For the rerun calibration code, I fixed three minor bugs. In addition, I implemented three design improvements, as shown in this commit: 1) ensure error scanning runs on failure, 2) wait for background grep jobs to complete, 3) stream subprocess output in real time. |
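As a rough illustration of improvements 1 and 3, here is a minimal Python sketch. It is not the code from the commit: the function names (`run_and_stream`, `scan_for_errors`, `run_with_error_scan`) and the "any line containing the word error" rule are assumptions made for this example only.

```python
import subprocess
import sys


def run_and_stream(cmd, log_path):
    """Run cmd, echoing each output line as it arrives and appending it to log_path."""
    with open(log_path, "a", encoding="utf-8") as log:
        proc = subprocess.Popen(
            cmd,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,  # merge stderr into stdout so one stream is logged
            text=True,
            bufsize=1,  # line-buffered on the reading side of the pipe
        )
        for line in proc.stdout:
            sys.stdout.write(line)  # show the line right away
            log.write(line)
            log.flush()
        return proc.wait()


def scan_for_errors(log_path):
    """Hypothetical scan rule: return every log line that contains the word 'error'."""
    with open(log_path, encoding="utf-8") as log:
        return [ln.rstrip("\n") for ln in log if "error" in ln.lower()]


def run_with_error_scan(cmd, log_path):
    """Run cmd, then scan the log whether the command succeeded, failed or never started."""
    exit_code = 1  # assume failure until the command reports otherwise
    try:
        exit_code = run_and_stream(cmd, log_path)
    except OSError as exc:
        # The command could not be started at all; record that in the same log.
        with open(log_path, "a", encoding="utf-8") as log:
            log.write(f"ERROR: could not start {cmd}: {exc}\n")
    return exit_code, scan_for_errors(log_path)
```

The child process may still buffer its own output, so "real time" here means lines are shown as soon as the child flushes them. The scan runs after every outcome, which is the point of improvement 1.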
…into dev-logging
|
Here's the result of re-running the previous test of adding No
Incorrect status error for These appear to be correct |
…into dev-logging
Items fixed or adjusted:
Expected results for logging:
fim_pipeline.sh: Depending on the line where the failure occurs, there may be zero HUCs, no post-processing attempts, a missing all_errors.log file, etc.
FIM_process_huc.sh
src/process_rerun_calibration_huc.sh is a new file that works in the same basic way as process_huc.sh and process_branch.sh, which have "tee" commands and error handling. However, process_rerun_calibration_huc.sh is not part of the fim_pipeline.sh chain and will never be called in pipeline mode. It is used only in re-run mode, which starts with rerun_calibration.py. That script calls only this new process_rerun_calibration_huc.sh file, which handles trapping, logging and errors for calibrate_rating_curves.sh, which is used in both pipeline and re-run mode.
fim_post_processing.sh has been upgraded to add a "trap" system for error handling, with the code directly inside the fim_post_processing.sh file. It will continue to aggregate all "error" files from each HUC folder, and will scan all branch log files in each HUC directory looking for invalid exit codes. All of that data is written to the /logs/branch_non_zero_exit_codes.log file. This is the same functionality as before and only helps show branch error codes, primarily aiming to show whether a large number of special branch exit codes, such as 62-64, occur. All info in branch_non_zero_exit_codes.log also continues to exist in the all_errors.log file and is expected to be duplicate information.
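The branch log scan described above can be sketched in Python as follows. This is only an illustration, not the logic in fim_post_processing.sh: the log line format matched by `EXIT_CODE_RE`, the recursive search for `*.log` files, and the function names are all assumptions.

```python
import re
from collections import Counter
from pathlib import Path

# Hypothetical log line format, e.g. "exit code: 62"; the real branch logs may differ.
EXIT_CODE_RE = re.compile(r"exit code:\s*(\d+)", re.IGNORECASE)


def scan_branch_logs(output_dir):
    """Return (log path, exit code) pairs for every non-zero exit code found."""
    hits = []
    for log_file in sorted(Path(output_dir).rglob("*.log")):
        text = log_file.read_text(encoding="utf-8", errors="replace")
        for line in text.splitlines():
            match = EXIT_CODE_RE.search(line)
            if match and int(match.group(1)) != 0:
                hits.append((str(log_file), int(match.group(1))))
    return hits


def write_summary(hits, summary_path):
    """Write one line per hit and return a count of how often each code occurred."""
    summary_path = Path(summary_path)
    summary_path.parent.mkdir(parents=True, exist_ok=True)
    with summary_path.open("w", encoding="utf-8") as out:
        for path, code in hits:
            out.write(f"{path}: exit code {code}\n")
    return Counter(code for _, code in hits)
```

The returned counter makes it easy to see at a glance whether one exit code dominates, which is the stated purpose of the summary file.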
A new change in this PR is the removal of a previous folder named /branch_errors/, which held the same list of branch non-zero exit codes; that list has now moved into the /logs/ folder. Prior to this release, the /branch_errors/ folder also included a copy of each branch log that contained a non-zero exit code. Those files are no longer copied, as they can easily be traced via the branch_non_zero_exit_codes.log and all_errors.log files.
The l_echo tool should really only be used by the three "process..sh" files, fim_pipeline or fim_post_processing. Using that command in any other child .sh file can create log info that is badly out of order, because the "tee" commands store up echos and prints and write them out at the end of that processing block.
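The ordering problem can be shown with a small Python simulation. It is a toy model, not the behaviour of l_echo or "tee" themselves: it assumes a direct logging call reaches the log immediately, while ordinary output from the same block is held back until the block ends. The function name and messages are hypothetical.

```python
import io


def run_block(log, child_uses_direct_log):
    """Simulate one processing block whose normal output is held until the block ends."""
    buffered = io.StringIO()  # stands in for output stored up by "tee"
    buffered.write("child: step 1\n")
    if child_uses_direct_log:
        # Assumed behaviour: a direct log call lands in the log right away.
        log.append("child: direct log call between step 1 and step 2")
    else:
        buffered.write("child: message between step 1 and step 2\n")
    buffered.write("child: step 2\n")
    # The stored-up output only reaches the log when the block finishes.
    log.extend(buffered.getvalue().splitlines())
    return log


for entry in run_block([], child_uses_direct_log=True):
    print(entry)
```

With `child_uses_direct_log=True` the middle message is written ahead of "step 1", even though it happened after it. With `False` all three lines stay in chronological order.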
Fixes to be done in future PRs:
More info to be added here.
Additions
Changes
Removals
Testing
Generally, you do not copy this part into the ChangeLog. These are some quick notes on what you did test and/or notes for the reviewer to help with their review testing.
Deployment Plan (For FIM developers use)
Does the change impact inputs, docker or python packages?
If you are not a FIM dev team member: Please let us know what you need and we can help with it.
If you are a FIM Dev team member:
Please work with the DevOps team and do not just go ahead and do it without some co-ordination.
Copy where you can, assign where you can not, and it is your responsibility to ensure it is done. Please ensure it is completed before the PR is merged.
Are there new or updated Python packages, or Pipfile, Pipfile.lock or Dockerfile changes? DevOps can help or take care of it if you want. We just need to know if it is required.
Require new or adjusted data inputs? Does it have a way to version (folder or file dates)?
Please use caution in removing older version unless it is at least two versions ago. Confirm with DevOps if cleanup might be involved.
If new or updated data sets, has the FIM code, including running fim_pipeline.sh, been updated and tested with the new/adjusted data? You can dev test against subsets if you like.
Notes to DevOps Team or others:
Please add any notes that are helpful for us to make sure it is all done correctly. Do not put actual server names or full true paths, just shortcut paths like 'efs..../inputs/', or 'dev1....inputs', etc.
Issuer Checklist (For developer use)
You may update this checklist before and/or after creating the PR. If you're unsure about any of them, please ask, we're here to help! These items are what we are going to look for before merging your code.
[_pt] PR: <description>
dev branch (the default branch), you have a descriptive Feature Branch name using the format: dev-<description-of-change> (e.g. dev-revise-levee-masking)
dev branch
pre-commit hooks were run locally
4.x.x.x
Merge Checklist (For Technical Lead use only)