[15pt] Incorporate buildings into FIM #1777
mluck
left a comment
I tested get_fema_buildings.py on Vermont (VT) and make_buildings_parts_per_huc.py on these data. It worked perfectly, although it still loops through all 2155 HUCs even though VT covers only a small handful of those HUCs. Is there a way to preselect the HUCs that intersect the data to be more efficient?
The refactoring and CLI logging are nice updates to preclipping.
huc_root/<HUC8>/wbd_buffered.gpkg
"""
if not current_preclip_directory.exists():
    raise RuntimeError(f"Prclip directory does not exist: {current_preclip_directory}")
Just pushed a commit to address this.
# NOTE: $inputsDir is defined in Dockerfile

- export pre_clip_huc_dir=${inputsDir}/pre_clip_huc8/20260205
+ export pre_clip_huc_dir=${inputsDir}/pre_clip_huc8/20260306
Update pre_clip_huc_dir to 20260312
Should we use '20260312' when we merge your PR? My PR is making '20260306'
I looked into this and tested an extent-based preselection so that we would load only the HUCs intersecting the selected state's data instead of all 2,155 HUCs. In my testing (VT only), that change actually increased runtime from about 16 minutes to about 21 minutes, which suggests that the added preselection work outweighed any savings from the smaller HUC load step. An alternative would be to maintain a separate static input file mapping HUCs to each state, but that would add maintenance overhead and could cause issues if HUC boundaries change. Therefore, I'd prefer to keep the original approach.

Ya, I agree. I am not sure we should allow by HUC only. It seems like there is a much greater risk of things getting out of sync unless we do full HUCs. And the time is already negligible.

This PR closes issue #1739 and includes the following enhancements to address building Fimpacts:
Ingests FEMA buildings as a new input dataset for FIM.
Derives the threshold discharge required for building inundation. To achieve this, the minimum non-zero HAND value within each building is extracted as the inundation threshold stage. The corresponding threshold discharge values are then interpolated from the HydroTables.
Enhances tools/fimpacts_inundation.py (formerly road_inundation.py) to identify inundated buildings and calculate corresponding flood depths for specific events.
In addition to introducing building pre-clipping in the data/wbd/generate_pre_clip_fim_huc8.py script, this PR refactors the interface from --copy_* arguments (e.g., --copy_osm_roads) to direct layer arguments for pre-clipping (e.g., --osm_roads). Listed layers are pre-clipped, while unlisted layers are copied, which simplifies the interface and makes layer selection more intuitive.
The updated pre-clipped dataset with the new FEMA buildings data has been prepared here: inputs/pre_clip_huc8/20260306/.
In-Depth Workflow Explanation
data/buildings/get_fema_buildings.py (new script)
This script downloads FEMA's latest per-state building structure geodatabases from the official USA Structures page. It then converts the GDB files to GeoParquet format using the appropriate CRS for each region (CONUS, Alaska, Guam, and American Samoa). The script can optionally prepare data for selected states only.
data/buildings/make_buildings_parts_per_huc.py (new script)
This script splits state-level building parquet datasets into HUC8-based parquet "parts", keeping only the building attributes ["UUID", "HEIGHT", "OCC_CLS", "SOURCE", "VAL_METHOD"] plus geometry. It processes an interleaved sequence of parquet row groups in parallel (taking row groups from different states in turn instead of finishing one state at a time) and uses a bounding-box prefilter to identify which HUCs intersect each row group before running the spatial join. Outputs are written as per-HUC8 folders (for example, huc8_XXXXXXXX/STATE_rg001.parquet), and the script can optionally run for selected states only.
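The two ideas in this step, interleaving row groups across states and prefiltering HUCs by bounding box, can be sketched in plain Python. This is only an illustration under stated assumptions: the function names, the (minx, miny, maxx, maxy) box layout, and the sample identifiers are hypothetical and are not taken from make_buildings_parts_per_huc.py.

```python
from itertools import chain, zip_longest


def interleave_row_groups(row_groups_by_state):
    """Round-robin over states: take one row group from each state in turn.

    row_groups_by_state maps a state code to an ordered list of row-group
    labels (hypothetical structure, for illustration only).
    """
    missing = object()  # marks states that have run out of row groups
    rounds = zip_longest(*row_groups_by_state.values(), fillvalue=missing)
    return [rg for rg in chain.from_iterable(rounds) if rg is not missing]


def bboxes_intersect(a, b):
    """Return True if two (minx, miny, maxx, maxy) boxes overlap or touch."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def candidate_hucs(row_group_bbox, huc_bboxes):
    """Cheap prefilter: keep only HUCs whose box meets the row-group box.

    Only these candidates would need the more expensive spatial join.
    """
    return [
        huc for huc, box in huc_bboxes.items()
        if bboxes_intersect(row_group_bbox, box)
    ]


# Made-up inputs for demonstration.
order = interleave_row_groups(
    {"VT": ["VT_rg000", "VT_rg001", "VT_rg002"], "NH": ["NH_rg000"]}
)
hucs = candidate_hucs(
    (1.0, 1.0, 3.0, 1.5),
    {"HUC_A": (0, 0, 2, 2), "HUC_B": (2, 0, 4, 2), "HUC_C": (5, 5, 6, 6)},
)
```

A bounding-box test can return false positives (boxes overlap while the actual geometries do not), so it only narrows the candidates; the exact spatial join still decides membership.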
src/process_buildings_fimpact.py (new script)
This script is run for each branch of a HUC using three inputs:
A single building segment may intersect multiple HydroIDs. To account for this, the script splits building segments at HydroID boundaries and calculates the minimum HAND value (excluding zeros) within each segment to serve as the inundation threshold.
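As a rough illustration of the threshold rule, the sketch below groups HAND samples by (UUID, HydroID) and takes the smallest non-zero value in each group. It assumes the HAND cells under each segment have already been extracted into plain tuples; the function names and the -9999.0 nodata value are hypothetical and not taken from process_buildings_fimpact.py.

```python
import math
from collections import defaultdict


def min_nonzero_hand(hand_values, nodata=-9999.0):
    """Smallest HAND value, ignoring zeros, NaN, and the nodata marker.

    Returns None when no usable value remains. The nodata default is an
    assumption made for this sketch.
    """
    valid = [
        v for v in hand_values
        if not math.isnan(v) and v != nodata and v != 0
    ]
    return min(valid) if valid else None


def thresholds_by_segment(cells):
    """cells: iterable of (uuid, hydro_id, hand_value) tuples.

    Returns {(uuid, hydro_id): threshold_hand}, one entry per building
    segment within each HydroID.
    """
    grouped = defaultdict(list)
    for uuid, hydro_id, hand in cells:
        grouped[(uuid, hydro_id)].append(hand)
    return {key: min_nonzero_hand(vals) for key, vals in grouped.items()}


# Made-up samples: building b1 straddles HydroIDs 10 and 11.
sample_cells = [
    ("b1", 10, 0.0), ("b1", 10, 0.6),
    ("b1", 11, 1.5), ("b1", 11, 0.9),
    ("b2", 10, 0.0),
]
thresholds = thresholds_by_segment(sample_cells)
```

Segments with no usable HAND value come back as None here; how the real script treats that case is not stated in this description.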
Three new columns are added to the building dataset: threshold_hand, HydroID, and feature_id. The results are saved as buildings_fimpact_***.csv for each branch, where *** represents the branch number. Each CSV file contains one record per UUID (the unique identifier of a building segment) within each HydroID, giving the minimum HAND value for that combination.
src/aggregate_by_huc.py (updated script)
For each branch, the script retrieves the discharge value corresponding to each threshold_hand from the branch's HydroTable (per HydroID) and assigns it as threshold_discharge. Any record with a threshold_hand value greater than 25 m (the maximum stage listed in the HydroTables) is removed entirely. The outputs from all branches are combined into a single file: buildings_fimpact.csv.
tools/fimpacts_inundation.py (formerly tools/road_inundation.py)
This tool now takes three inputs:
buildings_fimpact.csv in addition to osm_roads_fimpact.csv file), and
The script identifies building segments where the given flow (referred to as evaluated_discharge) exceeds the threshold_discharge and flags them as inundated. It also looks up the stage corresponding to the evaluated_discharge (called evaluated_stage) and subtracts the threshold_hand value from the evaluated_stage to calculate the flood_depth.
Records with negative flood depth are currently removed, as these may result from non-monotonic synthetic rating curves, most commonly observed in branch zero.
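The threshold and depth calculations can be sketched as follows. This is a simplified illustration, not the project's code: it assumes a rating curve given as two parallel, strictly increasing lists (stage in metres, discharge), uses linear interpolation for both lookups, and takes flood depth as evaluated stage minus threshold_hand. The helper names are hypothetical; only the 25 m cutoff comes from the description above.

```python
from bisect import bisect_left

MAX_STAGE_M = 25.0  # maximum HydroTable stage, per the PR description


def interpolate(x, xs, ys):
    """Linear interpolation; xs must be strictly increasing.

    Returns None when x lies outside the tabulated range.
    """
    if x < xs[0] or x > xs[-1]:
        return None
    i = bisect_left(xs, x)
    if xs[i] == x:
        return ys[i]
    x0, x1, y0, y1 = xs[i - 1], xs[i], ys[i - 1], ys[i]
    return y0 + (y1 - y0) * (x - x0) / (x1 - x0)


def threshold_discharge(threshold_hand, stages, discharges):
    """Discharge at which the building starts to flood, or None if dropped."""
    if threshold_hand > MAX_STAGE_M:
        return None  # records above the maximum stage are removed
    return interpolate(threshold_hand, stages, discharges)


def flood_depth(evaluated_discharge, threshold_hand, stages, discharges):
    """Depth at the building for a given flow, or None if not inundated."""
    q_threshold = threshold_discharge(threshold_hand, stages, discharges)
    if q_threshold is None or evaluated_discharge <= q_threshold:
        return None
    evaluated_stage = interpolate(evaluated_discharge, discharges, stages)
    if evaluated_stage is None:
        return None
    depth = evaluated_stage - threshold_hand
    return depth if depth >= 0 else None  # negative depths are discarded


# Made-up rating curve for demonstration.
stages = [0.0, 1.0, 2.0, 3.0]
discharges = [0.0, 10.0, 40.0, 90.0]
q_thr = threshold_discharge(1.5, stages, discharges)
depth = flood_depth(65.0, 1.5, stages, discharges)
```

Because this sketch requires a monotonic curve, the negative-depth check never triggers here; it is kept to mirror the filter described above, which exists for non-monotonic synthetic rating curves.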
Note that a single building segment may have multiple inundation records, originating from different branches or intersecting multiple HydroIDs. The code retains only the record with the maximum flood depth for each building segment.
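A minimal sketch of that final reduction, assuming each inundation record is a plain dictionary with UUID and flood_depth keys (the record layout, the branch key, and the function name are hypothetical):

```python
def keep_deepest(records):
    """Keep one record per UUID: the one with the largest flood_depth."""
    best = {}
    for rec in records:
        current = best.get(rec["UUID"])
        if current is None or rec["flood_depth"] > current["flood_depth"]:
            best[rec["UUID"]] = rec
    return list(best.values())


# Made-up records: building b1 is reported by two branches.
records = [
    {"UUID": "b1", "branch": "0", "flood_depth": 0.3},
    {"UUID": "b1", "branch": "1", "flood_depth": 0.8},
    {"UUID": "b2", "branch": "0", "flood_depth": 0.1},
]
deepest = keep_deepest(records)
```

When two records for the same building tie on depth, this sketch keeps the first one it encountered.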
The figure below displays the output of the fimpacts_inundation.py tool, with inundated buildings (with their flood depth) and non-inundated buildings (gray) overlaid on a FIM raster. Both results were generated from a common 50-year recurrence interval flow file for HUC 11070103.
Additions
Changes
Renamed tools/road_inundation.py to tools/fimpacts_inundation.py and extended the script to support building inundation processing in addition to roads.
src/process_buildings_fimpact.py script
Testing
Generally, you do not copy this part into the ChangeLog. These are some quick notes on what you did test and/or notes for the reviewer to help with their review testing.
Deployment Plan (For FIM developers use)
Does the change impact inputs, docker or python packages?
If you are not a FIM dev team member: Please let us know what you need and we can help with it.
If you are a FIM Dev team member:
Please work with the DevOps team and do not just go ahead and do it without some co-ordination.
Copy where you can, assign where you can not, and it is your responsibility to ensure it is done. Please ensure it is completed before the PR is merged.
Does it have new or updated Python packages, or Pipfile, Pipfile.lock, or Dockerfile changes? DevOps can help or take care of it if you want. We just need to know if it is required.
Require new or adjusted data inputs? Does it have a way to version (folder or file dates)?
Please use caution in removing older versions unless they are at least two versions old. Confirm with DevOps if cleanup might be involved.
If new or updated data sets, has the FIM code, including running fim_pipeline.sh, been updated and tested with the new/adjusted data? You can dev test against subsets if you like.
Notes to DevOps Team or others:
Please add any notes that are helpful for us to make sure it is all done correctly. Do not put actual server names or full true paths, just shortcut paths like 'efs..../inputs/', or 'dev1....inputs', etc.
Issuer Checklist (For developer use)
You may update this checklist before and/or after creating the PR. If you're unsure about any of them, please ask, we're here to help! These items are what we are going to look for before merging your code.
[_pt] PR: <description>
dev branch (the default branch), you have a descriptive Feature Branch name using the format: dev-<description-of-change> (e.g. dev-revise-levee-masking)
dev branch
pre-commit hooks were run locally
4.x.x.x
Reviewer / Approver Checklist
Merge Checklist (For Technical Lead use only)