
[1pt] Debugging changes from full-scale flow-based and stage-based CatFIM runs #1786

Open

EmilyDeardorff wants to merge 16 commits into dev from dev-catfim-thresh-bug-fixes

EmilyDeardorff commented Mar 17, 2026

This PR contains fixes to bugs that arose during the March 2026 full-scale CatFIM runs for the FIM 6.1 release.

The following bugs were addressed:

  • Incomplete HUC Dictionary Bug: The get_huc_dictionary() function was not creating a complete HUC list and was therefore not pulling all of the required site thresholds from WRDS.

    This was resolved by implementing the aggregate_wbd_hucs() function, which uses the WBD geopackage to identify the correct HUC for each site.

  • Incomplete NRLDB Data Bug: We recently switched from prioritizing USGS site data to prioritizing NRLDB site data when both are available on WRDS. Unfortunately, many sites have incomplete NRLDB data or incorrect datum information.

    Added logic to __adjust_datum_ft() that corrects mistyped horizontal and vertical datums when the intended datum is obvious. Implemented check_metadata_CRS_availability() to check whether each site has CRS information available from one or both sources, and updated get_thresholds() to use the optional source_crs_availability list to decide which data source (USGS or NRLDB) to use based on site metadata availability.

  • Status = None Bug: The entire CatFIM run was aborted in update_sites_mapping_status() when a mapped site had a status_val of None.

    Updated the logic so that a status_val of None is handled the same as an empty string ("").
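The datum correction in __adjust_datum_ft() amounts to mapping messy free-text datum names onto a known canonical name, and giving up when the intent is not obvious. A minimal sketch of that idea follows, assuming datum names arrive as strings; the alias tables and the normalize_datum() helper are hypothetical illustrations, not the corrections actually implemented in this PR.

```python
import re

# Hypothetical alias tables: keys are datum strings after normalization
# (uppercase, no spaces/hyphens/underscores/periods). Illustrative only.
VERTICAL_ALIASES = {
    "NAVD88": "NAVD88",
    "NAVD1988": "NAVD88",
    "NAD88": "NAVD88",  # hypothetical: "NAVD88" typed without the "V"
    "NGVD29": "NGVD29",
    "NGVD1929": "NGVD29",
}
HORIZONTAL_ALIASES = {
    "NAD83": "NAD83",
    "NAD1983": "NAD83",
    "NAD27": "NAD27",
    "NAD1927": "NAD27",
    "WGS84": "WGS84",
    "WGS1984": "WGS84",
}


def normalize_datum(raw, aliases):
    """Return the canonical datum name, or None if the intent is not obvious."""
    if raw is None:
        return None
    # "navd 88" -> "NAVD88", "N.A.D. 1983" -> "NAD1983"
    key = re.sub(r"[\s\-_.]", "", str(raw).upper())
    return aliases.get(key)


print(normalize_datum("navd 88", VERTICAL_ALIASES))        # NAVD88
print(normalize_datum("N.A.D. 1983", HORIZONTAL_ALIASES))  # NAD83
print(normalize_datum("MSL", VERTICAL_ALIASES))            # None
```

Returning None for unrecognized names (rather than guessing) mirrors the "only when the intended datum is obvious enough" rule described above.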
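The Status = None fix boils down to normalizing the value before any string handling, since calling string methods on None raises an exception. A minimal sketch, assuming statuses are merged as semicolon-separated text; normalize_status(), merge_status(), the separator, and the site IDs are hypothetical and not the actual code in update_sites_mapping_status().

```python
def normalize_status(status_val):
    """Return status_val as a string, mapping None to "" (hypothetical helper)."""
    # Calling .strip() directly on None would raise AttributeError,
    # which is the kind of failure that could abort a whole run.
    return "" if status_val is None else str(status_val)


def merge_status(status_val, message):
    """Append message to a site's status, treating None the same as ""."""
    current = normalize_status(status_val).strip()
    return message if current == "" else f"{current}; {message}"


# Hypothetical site IDs; only the values are updated, so iterating is safe.
sites = {"SITE1": None, "SITE2": "", "SITE3": "low flow"}
for lid, status in sites.items():
    sites[lid] = merge_status(status, "mapped")

print(sites)
# {'SITE1': 'mapped', 'SITE2': 'mapped', 'SITE3': 'low flow; mapped'}
```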

Changes

  • data/wrds/download_process_wrds.py:
    • Replaced get_huc_dictionary() with aggregate_wbd_hucs(), which assigns each site's HUC by geospatial overlay with the WBD instead of relying on the HUC information in WRDS, which is not always correct.
    • Added the site source dictionary as an input to download_all_thresholds() so it can download thresholds from whichever source has metadata available.
    • Added a new function, check_metadata_CRS_availability(), which checks whether each site has CRS information available from one or both sources and builds the lid_source_dict dictionary.
  • data/nws/preprocess_ahps_nws.py: Updated get_thresholds() to pass None as the placeholder for the new source_crs_availability input.
  • data/usgs/get_usgs_rating_curves.py: Updated get_thresholds() to pass None as the placeholder for the new source_crs_availability input.
  • data/usgs/preprocess_ahps_usgs.py: Updated get_thresholds() to pass None as the placeholder for the new source_crs_availability input.
  • tools/catfim/generate_categorical_fim.py:
    • Replaced get_huc_dictionary() with aggregate_wbd_hucs().
    • Added the site source dictionary as an input to download_all_thresholds().
    • Fixed a bug in update_sites_mapping_status() so it no longer aborts the entire CatFIM run when a mapped site's status_val is None.
    • Updated __adjust_datum_ft() to apply corrections for mistyped horizontal and vertical datums.
  • tools/catfim/generate_categorical_fim_flows.py: Removed get_thresholds() and get_metadata() from the inputs because they are not used in this script.
  • tools/tools_shared_functions.py: Updated get_thresholds() to use the optional source_crs_availability list to determine which data source (USGS or NRLDB) to use based on site metadata availability.
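The geospatial overlay behind aggregate_wbd_hucs() is, at its core, a point-in-polygon test: each site is assigned the HUC whose boundary contains it. The sketch below shows that idea in pure Python with ray casting, assuming sites and HUC boundaries are already loaded as coordinates in the same CRS and that polygons have no holes. The function names and toy data are hypothetical; a production version would more likely use a GIS library's spatial join (for example geopandas.sjoin) against the WBD geopackage layer.

```python
from typing import Dict, List, Optional, Tuple

Point = Tuple[float, float]  # (x, y), e.g. (lon, lat)
Polygon = List[Point]        # simple exterior ring, no holes


def point_in_polygon(pt: Point, ring: Polygon) -> bool:
    """Ray casting: toggle on each edge crossed by a ray from pt toward +x."""
    x, y = pt
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Only edges that straddle the horizontal line through pt can be crossed
        # (this also guarantees y2 != y1, so the division below is safe).
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def assign_hucs(sites: Dict[str, Point], hucs: Dict[str, Polygon]) -> Dict[str, Optional[str]]:
    """Map each site ID to the first HUC polygon containing it (None if none)."""
    return {
        lid: next((huc for huc, ring in hucs.items() if point_in_polygon(pt, ring)), None)
        for lid, pt in sites.items()
    }


# Toy data: two adjacent square "HUCs" and three sites.
hucs = {
    "HUC_A": [(0, 0), (10, 0), (10, 10), (0, 10)],
    "HUC_B": [(10, 0), (20, 0), (20, 10), (10, 10)],
}
sites = {"S1": (5, 5), "S2": (15, 2), "S3": (25, 5)}
print(assign_hucs(sites, hucs))  # {'S1': 'HUC_A', 'S2': 'HUC_B', 'S3': None}
```

Deriving the HUC from geometry means the result no longer depends on WRDS carrying the right HUC for every site, which is the motivation given above.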

Testing

Tested the new get_thresholds() functionality (in tools/tools_shared_functions.py) multiple times, both individually and in data/wrds/download_process_wrds.py and tools/catfim/generate_categorical_fim.py, and it performed as expected.

The updates to tools/catfim/generate_categorical_fim.py were also tested in several small-scale and full-scale flow-based and stage-based CatFIM runs, and they ran as expected.

Tested get_thresholds() with None as the placeholder for the source_crs_availability input, and it worked as expected (previous functionality unchanged). This means the change should be safe in the scripts that use the placeholder (data/nws/preprocess_ahps_nws.py, data/usgs/get_usgs_rating_curves.py, and data/usgs/preprocess_ahps_usgs.py).
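The reason a None placeholder leaves old behavior intact is that the new argument is optional and only consulted when provided. The sketch below illustrates this under stated assumptions: a per-site availability list of source names, NRLDB preferred (as described above), and a fallback to the preferred source when nothing is known. check_crs_availability(), choose_source(), the "nrldb"/"usgs" strings, and the metadata shape are all hypothetical and are not the real signatures of check_metadata_CRS_availability() or get_thresholds().

```python
from typing import Dict, List, Optional


def check_crs_availability(metadata: Dict[str, Dict[str, Optional[str]]]) -> Dict[str, List[str]]:
    """Build {lid: [sources that report a CRS]} (hypothetical lid_source_dict shape)."""
    return {
        lid: [src for src in ("nrldb", "usgs") if crs_by_source.get(src)]
        for lid, crs_by_source in metadata.items()
    }


def choose_source(source_crs_availability: Optional[List[str]] = None, preferred: str = "nrldb") -> str:
    """Pick the threshold source for one site."""
    if source_crs_availability is None:
        # Placeholder path: behave exactly as before the change.
        return preferred
    if preferred in source_crs_availability:
        return preferred
    if source_crs_availability:
        # Fall back to whichever source does have CRS metadata.
        return source_crs_availability[0]
    # Assumption: with no CRS info anywhere, keep the default preference.
    return preferred


# Hypothetical site metadata: CRS reported per source (None/missing = unavailable).
metadata = {
    "AAAA1": {"nrldb": "NAD83", "usgs": "NAD83"},
    "BBBB2": {"nrldb": None, "usgs": "NAD27"},
    "CCCC3": {},
}
lid_source_dict = check_crs_availability(metadata)
for lid, sources in lid_source_dict.items():
    print(lid, sources, choose_source(sources))
# AAAA1 ['nrldb', 'usgs'] nrldb
# BBBB2 ['usgs'] usgs
# CCCC3 [] nrldb
print(choose_source())  # nrldb (None placeholder: previous behavior)
```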

Also tested a standalone run of data/wrds/download_process_wrds.py with the new changes, and it performed as expected.


Deployment Plan (For FIM developers use)

  • Does the change impact inputs, docker or python packages?

    • Yes
    • No (if no, skip the rest of the Deployment Plan section)
  • If you are not a FIM dev team member: Please let us know what you need and we can help with it.

  • If you are a FIM Dev team member:

    • Please work with the DevOps team and do not just go ahead and do it without some coordination.

    • Copy where you can, assign where you cannot, and it is your responsibility to ensure it is done. Please ensure it is completed before the PR is merged.

    • Are there new or updated Python packages, or Pipfile, Pipfile.lock, or Dockerfile changes? DevOps can help or take care of it if you want. We just need to know if it is required.

      • Yes
      • No
    • Require new or adjusted data inputs? Does it have a way to version (folder or file dates)?

      • No
      • Yes
        • Require a new pre-clip set or any other data reloads, such as DEMs, OSM, etc., i.e., prerequisite data reloads upstream of your input changes?
          • Yes
          • No
        • Have the inputs been copied to (or do they exist in) all four environments:
          • FIM EFS
          • FIM S3
          • ESIP
          • Dev1
  • Please use caution when removing an older version unless it is at least two versions old. Confirm with DevOps if cleanup might be involved.

  • If new or updated data sets, has the FIM code, including running fim_pipeline.sh, been updated and tested with the new/adjusted data? You can dev test against subsets if you like.

    • Yes

Notes to DevOps Team or others:

Please add any notes that will help us make sure it is all done correctly. Do not put actual server names or full true paths, just shortcut paths like 'efs..../inputs/' or 'dev1....inputs', etc.


Issuer Checklist (For developer use)

You may update this checklist before and/or after creating the PR. If you're unsure about any of them, please ask; we're here to help! These items are what we are going to look for before merging your code.

  • Informative and human-readable title, using the format: [_pt] PR: <description>
  • Links are provided if this PR resolves an issue, or depends on another PR
  • If submitting a PR to the dev branch (the default branch), you have a descriptive Feature Branch name using the format: dev-<description-of-change> (e.g. dev-revise-levee-masking)
  • Changes are limited to a single goal (no scope creep)
  • The feature branch you're submitting as a PR is up to date (merged) with the latest dev branch
  • pre-commit hooks were run locally
  • Any change in functionality is tested
  • New functions are documented (with a description, list of inputs, and expected output)
  • Placeholder code is flagged / future todos are captured in comments
  • CHANGELOG updated with template version number, e.g. 4.x.x.x
  • Add yourself as an assignee in the PR as well as the FIM Technical Lead
  • Where applicable, has fim_pipeline been tested with multiple HUCs, including some other unaffected HUCs?

Reviewer / Approver Checklist

  • Where applicable, has fim_pipeline been tested with multiple HUCs, including some other unaffected HUCs?
  • If there are new inputs, have you confirmed that they have been copied to all environments?

Merge Checklist (For Technical Lead use only)

  • Update CHANGELOG with latest version number and merge date
  • Update the Citation.cff file to reflect the latest version number in the CHANGELOG
  • If applicable, update README with major alterations

EmilyDeardorff self-assigned this Mar 17, 2026
EmilyDeardorff added the bug (Something isn't working) and CatFIM (NWS Flood Categorical HAND FIM) labels Mar 17, 2026
EmilyDeardorff changed the title from "WIP [1pt] Debugging changes from full-scale flow-based and stage-based CatFIM runs" to "[1pt] Debugging changes from full-scale flow-based and stage-based CatFIM runs" Mar 18, 2026
EmilyDeardorff marked this pull request as ready for review March 19, 2026 23:46