
[15pt] Incorporate buildings into FIM #1777

Open
AliForghani-NOAA wants to merge 16 commits into dev from dev-buildings

Conversation


@AliForghani-NOAA AliForghani-NOAA commented Mar 3, 2026

This PR closes issue #1739 and includes the following enhancements to address building FIMpacts:

  • Ingests FEMA buildings as a new input dataset for FIM.

  • Derives the threshold discharge required for buildings inundation. To achieve this, the minimum non-zero HAND value within each building is extracted as the inundation threshold stage. The corresponding threshold discharge values are then interpolated from the HydroTables.

  • Enhances tools/fimpacts_inundation.py (formerly road_inundation.py) to identify inundated buildings and calculate corresponding flood depths for specific events.

In addition to introducing building pre-clipping in the data/wbd/generate_pre_clip_fim_huc8.py script, this PR refactors the interface from --copy_* arguments (e.g., --copy_osm_roads) to direct layer arguments for preclipping (e.g., --osm_roads). Listed layers are pre-clipped, while unlisted layers are copied, simplifying the interface and making layer selection more intuitive.
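
The preclip-first interface described above can be sketched roughly as follows. This is a hypothetical illustration, not the script's actual CLI: the layer names in `ALL_LAYERS` and the helper name are assumptions.

```python
import argparse

# Hypothetical sketch of the preclip-first CLI: layers named on the command
# line are pre-clipped; any layer not named falls back to a plain copy.
# The layer list below is illustrative, not the script's real set.
ALL_LAYERS = ["osm_roads", "buildings", "nld_levees"]

def parse_preclip_args(argv):
    parser = argparse.ArgumentParser(description="Pre-clip selected layers per HUC8")
    for layer in ALL_LAYERS:
        parser.add_argument(
            f"--{layer}", action="store_true",
            help=f"Pre-clip the {layer} layer (unlisted layers are copied)",
        )
    args = parser.parse_args(argv)
    preclip = [name for name in ALL_LAYERS if getattr(args, name)]
    copy = [name for name in ALL_LAYERS if name not in preclip]
    return preclip, copy
```

Under this sketch, `--osm_roads --buildings` would pre-clip those two layers and copy everything else, which is the simplification the refactor aims for.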

The updated pre-clipped dataset with new FEMA buildings data has been prepared here: inputs/pre_clip_huc8/20260306/.

In-Depth Workflow Explanation

  1. data/buildings/get_fema_buildings.py (new script)
    This script downloads FEMA’s latest per-state building structure geodatabases from the official USA Structures page. It then converts the gdb files to GeoParquet format using the appropriate CRS for each region (CONUS, Alaska, Guam, and American Samoa). The script supports preparing data for specific states only if desired.
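
As a rough illustration of the conversion step (not the script's actual code), the gdb-to-GeoParquet workflow with a per-region CRS might look like the sketch below; the CRS codes, function name, and layer handling are assumptions:

```python
# Hypothetical sketch of the gdb -> GeoParquet conversion with a per-region
# CRS. The CRS codes and function name are assumptions made for illustration.
REGION_CRS = {
    "CONUS": "EPSG:5070",            # Albers Equal Area (assumed)
    "Alaska": "EPSG:3338",           # Alaska Albers (assumed)
    "Guam": "EPSG:32655",            # UTM zone 55N (assumed)
    "American Samoa": "EPSG:32702",  # UTM zone 2S (assumed)
}

def convert_state_gdb(gdb_path, region, out_parquet, layer=None):
    # Deferred import: requires geopandas with pyarrow-backed GeoParquet support.
    import geopandas as gpd

    gdf = gpd.read_file(gdb_path, layer=layer)  # read the structures layer from the gdb
    gdf = gdf.to_crs(REGION_CRS[region])        # reproject to the regional CRS
    gdf.to_parquet(out_parquet)                 # write GeoParquet
```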

  2. data/buildings/make_buildings_parts_per_huc.py (new script)
    This script splits state-level building parquet datasets into HUC8-based parquet “parts”, keeping only the following building attributes ["UUID", "HEIGHT", "OCC_CLS", "SOURCE", "VAL_METHOD"] plus geometry. It processes a mixed sequence of parquet row groups in parallel (taking row groups from different states in turn, instead of finishing one state at a time), and uses a bounding-box prefilter to efficiently identify which HUCs intersect each row group before running the spatial join. Outputs are written as per-HUC8 folders (for example, huc8_XXXXXXXX/STATE_rg001.parquet), and it can optionally run for only selected states.
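
The bounding-box prefilter described above can be sketched with plain coordinate arithmetic; function and variable names here are illustrative:

```python
# Sketch of the bounding-box prefilter: before running the expensive spatial
# join, a row group's total bounds are tested against each HUC8's bounds so
# that only plausibly intersecting HUCs are joined. Names are illustrative.

def bbox_intersects(a, b):
    """a and b are (minx, miny, maxx, maxy) tuples."""
    return not (a[2] < b[0] or b[2] < a[0] or a[3] < b[1] or b[3] < a[1])

def candidate_hucs(rowgroup_bounds, huc_bounds_by_id):
    """Return the HUC8 ids whose bounding box overlaps the row group's bounds."""
    return [huc for huc, b in huc_bounds_by_id.items() if bbox_intersects(rowgroup_bounds, b)]
```

Only the surviving candidates would then go through the actual spatial join (e.g., a geopandas sjoin), which is where the savings come from.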

  3. src/process_buildings_fimpact.py (new script)
    This script is run for each branch of a HUC using three inputs:

    • Buildings polygons
    • HAND raster
    • HAND-generated HydroIDs gpkg

    A single building segment may intersect multiple HydroIDs. To account for this, the script splits building segments at HydroID boundaries and calculates the minimum HAND value (excluding zeros) within each segment to serve as the inundation threshold.

    Three new columns are added to the building dataset: threshold_hand, HydroID, and feature_id. The results are saved as buildings_fimpact_***.csv for each branch, where *** represents the branch number. Each CSV file contains one record per UUID (the unique identifier for each building segment) within each HydroID, providing the minimum HAND value for that combination.
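
The minimum non-zero HAND extraction can be sketched as follows, assuming the HAND pixels inside one split building segment have already been clipped out of the raster (e.g., with a rasterio mask); the function name and nodata value are assumptions:

```python
import numpy as np

# Sketch of the threshold-stage extraction: given the HAND pixel values that
# fall inside one split building segment (already clipped from the raster),
# the minimum non-zero value is the inundation threshold stage. The nodata
# value and function name are assumptions for illustration.

def threshold_hand(segment_pixels, nodata=-9999.0):
    vals = np.asarray(segment_pixels, dtype=float)
    vals = vals[(vals != nodata) & (vals > 0)]  # exclude nodata and zeros
    return float(vals.min()) if vals.size else None  # None: no usable HAND pixels
```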

  4. src/aggregate_by_huc.py (updated script)
    For each branch, the script retrieves the discharge value corresponding to each threshold_hand from the branch’s HydroTable (per HydroID) and assigns it as threshold_discharge. Any record with a threshold_hand value greater than 25 m (the maximum stage listed in the HydroTables) is removed entirely. The outputs from all branches are combined into a single file: buildings_fimpact.csv.
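
The stage-to-discharge lookup can be sketched with linear interpolation, assuming each HydroID's synthetic rating curve is available as parallel stage/discharge arrays taken from the branch HydroTable; the function name and where the 25 m cutoff lives are assumptions:

```python
import numpy as np

# Sketch of the stage -> discharge lookup along a HydroID's synthetic rating
# curve. Records whose threshold stage exceeds the 25 m HydroTable maximum
# are dropped, matching the behavior described above. Names are illustrative.

MAX_STAGE_M = 25.0

def threshold_discharge(threshold_hand, stages, discharges):
    if threshold_hand > MAX_STAGE_M:
        return None  # record is removed entirely
    # linear interpolation between the tabulated stage/discharge pairs
    return float(np.interp(threshold_hand, stages, discharges))
```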

  5. tools/fimpacts_inundation.py (formerly tools/road_inundation.py)
    This tool now takes three inputs:

    • A FIM run directory (which includes buildings_fimpact.csv in addition to the osm_roads_fimpact.csv file)
    • A flow file
    • A new flag indicating whether the script should process buildings or roads

    The script identifies building segments where the given flow (referred to as evaluated_discharge) exceeds the threshold discharge and flags them as inundated. It also looks up the stage corresponding to the evaluated_discharge (called evaluated_stage) and subtracts the threshold_hand value from the evaluated_stage to calculate the flood_depth.

    Records with negative flood depth are currently removed, as these may result from non-monotonic synthetic rating curves—most commonly observed in branch zero.

    Note that a single building segment may have multiple inundation records, originating from different branches or intersecting multiple HydroIDs. The code retains only the record with the maximum flood depth for each building segment.
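
The selection rule above (flag exceedance, compute depth, drop negatives, keep the deepest record per building) can be sketched in pandas. Column and function names follow the description but are not the tool's exact code; depth is computed as evaluated_stage minus threshold_hand so that deeper inundation yields a larger positive depth:

```python
import pandas as pd

# Sketch of the selection rule: keep records whose evaluated_discharge exceeds
# threshold_discharge, compute depth as evaluated_stage - threshold_hand, drop
# negative depths (non-monotonic SRC artifacts), and retain only the deepest
# record per building (UUID). Names are illustrative.

def select_inundated(records: pd.DataFrame) -> pd.DataFrame:
    df = records[records["evaluated_discharge"] > records["threshold_discharge"]].copy()
    df["flood_depth"] = df["evaluated_stage"] - df["threshold_hand"]
    df = df[df["flood_depth"] >= 0]                       # drop rating-curve artifacts
    deepest = df.groupby("UUID")["flood_depth"].idxmax()  # one record per building
    return df.loc[deepest].reset_index(drop=True)
```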

    The figure below displays the output of the fimpacts_inundation.py tool with inundated buildings (with their flood depth) and non-inundated buildings (gray) overlaid on a FIM raster. Both results were generated from a common 50-year recurrence interval flow file for HUC 11070103.


Additions

  • data/buildings/get_fema_buildings.py
  • data/buildings/make_buildings_parts_per_huc.py
  • src/process_buildings_fimpact.py

Changes

  • Renamed tools/road_inundation.py to tools/fimpacts_inundation.py and extended the script to support building inundation processing in addition to roads.
  • data/wbd/clip_vectors_to_wbd.py -> Updated to enable pre-clipping of buildings dataset
  • data/wbd/generate_pre_clip_fim_huc8.py -> Updated to enable pre-clipping of buildings dataset. Also refactored the CLI to switch from copy-first arguments to preclip-first arguments (as described above).
  • src/aggregate_branches_to_huc.py -> Aggregates branch-level building FIMpact results by HUC
  • src/delineate_hydros_and_produce_HAND.sh -> Calls the new src/process_buildings_fimpact.py script
  • src/bash_variables.env -> Updated the reference to the new pre-clipped dataset and added a reference to the building parts dataset required for pre-clipping
  • src/calibrate_rating_curves.sh -> Enables aggregating buildings FIMpact results by HUC

Testing

Generally, you do not copy this part into the CHANGELOG. These are some quick notes on what you tested and/or notes to help the reviewer with their review testing.


Deployment Plan (For FIM developers use)

  • Does the change impact inputs, docker or python packages?

    • Yes
    • No (if no, skip the rest of the Deployment Plan section)
  • If you are not a FIM dev team member: Please let us know what you need and we can help with it.

  • If you are a FIM Dev team member:

    • Please work with the DevOps team and do not just go ahead and do it without some coordination.

    • Copy where you can, assign where you can not, and it is your responsibility to ensure it is done. Please ensure it is completed before the PR is merged.

    • Has new or updated Python packages, Pipfile, Pipfile.lock, or Dockerfile changes? DevOps can help or take care of it if you want. Just need to know if it is required.

      • Yes
      • No
    • Require new or adjusted data inputs? Does it have a way to version (folder or file dates)?

      • No
      • Yes
        • Require a new pre-clip set or any other data reloads, such as DEMs, OSM, etc. (i.e., prerequisite data updates upstream of your input changes).
          • Yes
          • No
        • Have the inputs been copied to / do they exist in all four environments:
          • FIM EFS
          • FIM S3
          • ESIP
          • Dev1
  • Please use caution in removing older versions unless they are at least two versions old. Confirm with DevOps if cleanup might be involved.

  • If new or updated data sets, has the FIM code, including running fim_pipeline.sh, been updated and tested with the new/adjusted data? You can dev test against subsets if you like.

    • Yes

Notes to DevOps Team or others:

Please add any notes that are helpful for us to make sure it is all done correctly. Do not put actual server names or full true paths, just shortcut paths like 'efs..../inputs/, or 'dev1....inputs', etc.


Issuer Checklist (For developer use)

You may update this checklist before and/or after creating the PR. If you're unsure about any of them, please ask, we're here to help! These items are what we are going to look for before merging your code.

  • Informative and human-readable title, using the format: [_pt] PR: <description>
  • Links are provided if this PR resolves an issue, or depends on another PR
  • If submitting a PR to the dev branch (the default branch), you have a descriptive Feature Branch name using the format: dev-<description-of-change> (e.g. dev-revise-levee-masking)
  • Changes are limited to a single goal (no scope creep)
  • The feature branch you're submitting as a PR is up to date (merged) with the latest dev branch
  • pre-commit hooks were run locally
  • Any change in functionality is tested
  • New functions are documented (with a description, list of inputs, and expected output)
  • Placeholder code is flagged / future todos are captured in comments
  • CHANGELOG updated with template version number, e.g. 4.x.x.x
  • Add yourself as an assignee in the PR as well as the FIM Technical Lead
  • Where applicable, has fim_pipeline been tested with multiple HUCs, including some other unaffected HUCs?

Reviewer / Approver Checklist

  • Where applicable, has fim_pipeline been tested with multiple HUCs, including some other unaffected HUCs?
  • If there are new inputs, have you confirmed that they have been copied to all environments?

Merge Checklist (For Technical Lead use only)

  • Update CHANGELOG with latest version number and merge date
  • Update the Citation.cff file to reflect the latest version number in the CHANGELOG
  • If applicable, update README with major alterations

@AliForghani-NOAA AliForghani-NOAA self-assigned this Mar 5, 2026
@AliForghani-NOAA AliForghani-NOAA added the enhancement New feature or request label Mar 5, 2026
@AliForghani-NOAA AliForghani-NOAA marked this pull request as ready for review March 6, 2026 16:46
@mluck mluck self-requested a review March 12, 2026 16:03

@ZahraGhahremani ZahraGhahremani left a comment


I ran fim_pipeline for HUC 12090301 and tested the fimpacts_inundation tool for it.


I also tested data/buildings/get_fema_buildings.py and data/buildings/make_buildings_parts_per_huc.py for Idaho and it works as expected.

mluck previously approved these changes Mar 17, 2026

@mluck mluck left a comment


I tested get_fema_buildings.py on Vermont (VT) and make_buildings_parts_per_huc.py on these data. It worked perfectly, although it still loops through all 2155 HUCs even though VT covers only a small handful of those HUCs. Is there a way to preselect the HUCs that intersect the data to be more efficient?

The refactoring and CLI logging are nice updates to preclipping.

huc_root/<HUC8>/wbd_buffered.gpkg
"""
if not current_preclip_directory.exists():
    raise RuntimeError(f"Prclip directory does not exist: {current_preclip_directory}")
Prclip spelled incorrectly

Just pushed a commit to address this.

Comment thread: src/bash_variables.env
# NOTE: $inputsDir is defined in Dockerfile

- export pre_clip_huc_dir=${inputsDir}/pre_clip_huc8/20260205
+ export pre_clip_huc_dir=${inputsDir}/pre_clip_huc8/20260306
Update pre_clip_huc_dir to 20260312

Should we use '20260312' when we merge your PR? My PR is making '20260306'.

@AliForghani-NOAA

I tested get_fema_buildings.py on Vermont (VT) and make_buildings_parts_per_huc.py on these data. It worked perfectly, although it still loops through all 2155 HUCs even though VT covers only a small handful of those HUCs. Is there a way to preselect the HUCs that intersect the data to be more efficient?

The refactoring and CLI logging are nice updates to preclipping.

I looked into this and tested an extent-based preselection so we would only load HUCs intersecting the selected state data instead of all 2,155 HUCs. In my testing, that change actually increased runtime from about 16 minutes to about 21 minutes (for VT only), which suggests that the added preselection work outweighed any savings from reducing the HUC load step. An alternative would be to maintain a separate static input file mapping HUCs to each state, but that would add maintenance overhead and potential issues if HUC boundaries change. Therefore, I’d prefer to keep the original approach.

@RobHanna-NOAA

Ya.. I agree. I am not sure we should allow by HUC only. It seems like there is a much greater risk of things getting out of sync unless we do full HUCs. And, the time is already negligible.
