Conversation
…for spatial data collection
There was a problem hiding this comment.
Pull request overview
This PR extends the forward-pass output/collection pipeline so NetCDF chunk collection can work with arbitrary spatial chunk grids by propagating global row/column indices through chunk writing and into the NetCDF collection step.
Changes:
- Plumbs
row_inds/col_indsthroughForwardPassStrategy→ForwardPass.run_chunk()→ output writers. - Updates NetCDF writer and collector logic to support coordinate-based spatial recombination.
- Updates H5 collection defaults and adjusts tests to match the new
CollectorNC.collect()signature.
Reviewed changes
Copilot reviewed 8 out of 8 changed files in this pull request and generated 4 comments.
Show a summary per file
| File | Description |
|---|---|
sup3r/pipeline/strategy.py |
Adds global grid index generation and passes per-chunk row/col indices into ForwardPassChunk. |
sup3r/pipeline/forward_pass.py |
Passes row_inds / col_inds into output handlers when writing chunk outputs. |
sup3r/writers/base.py |
Extends writer APIs to accept row_inds / col_inds and forwards them to type-specific writers. |
sup3r/writers/nc.py |
Writes optional row/col indexing into NetCDF outputs and refactors coords/data_vars construction. |
sup3r/writers/h5.py |
Attempts to add row/col indices to H5 meta output. |
sup3r/postprocessing/collectors/nc.py |
Refactors NetCDF collection to combine spatial chunks via xr.combine_by_coords and loader-based grouping. |
sup3r/postprocessing/collectors/h5.py |
Makes features optional ('all') and derives default features from the first file. |
tests/output/test_output_handling.py |
Updates test call site to match the new CollectorNC.collect() signature. |
Comments suppressed due to low confidence (1)
sup3r/postprocessing/collectors/nc.py:97
reset_coords(Dimension.coords_2d())demoteslatitude/longitudefrom coords to data variables before combining, but the subsequentCacher._write_single(..., features=features)will drop those lat/lon variables wheneverfeaturesis a list (since the cacher subsetsdata[features]and only re-attaches coords, not data_vars). This will produce an output file missing required spatial coordinates, andLoaderNCwill fail because it requireslatitude/longitudeto exist.
After combine_by_coords, re-promote lat/lon back to coordinates (e.g. out = out.set_coords(Dimension.coords_2d()) if present) or include lat/lon explicitly in the written variable list regardless of the features argument.
dsets = list(spatial_chunks.values())
dsets = [ds.reset_coords(Dimension.coords_2d()) for ds in dsets]
out = xr.combine_by_coords(dsets, combine_attrs='override')
cacher_kwargs = cacher_kwargs or {}
Cacher._write_single(
out_file=out_file,
data=out,
features=features,
**cacher_kwargs,
)
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
You can also share your feedback on Copilot code review. Take the survey.
fix: write lat / lon to coords for each chunk to prevent time dimension expansion during collection
There was a problem hiding this comment.
Pull request overview
This PR updates the forward-pass output schema and NetCDF collection path so chunked outputs can be stitched back together for arbitrary spatial chunk grids by carrying global row/column indices through to file writing and using coordinate-based combination.
Changes:
- Plumb
row_inds/col_indsthroughForwardPassChunk→ forward pass writing → output handlers (H5/NC). - Write global grid indices into outputs (H5 meta fields; NC 1D dimension coords) to support spatially correct chunk stitching.
- Update
CollectorNCto combine chunks viaxarray.combine_by_coords, and adjust tests/helpers accordingly.
Reviewed changes
Copilot reviewed 9 out of 9 changed files in this pull request and generated 3 comments.
Show a summary per file
| File | Description |
|---|---|
| tests/output/test_output_handling.py | Updates NetCDF collection test expectations/inputs to reflect new collection behavior. |
| sup3r/writers/nc.py | Adds row_inds/col_inds support and writes them as dimension coords for coordinate-based stitching. |
| sup3r/writers/h5.py | Adds row_inds/col_inds to output meta for chunk identification/collection. |
| sup3r/writers/base.py | Threads row_inds/col_inds through the base output writer interface. |
| sup3r/utilities/pytest/helpers.py | Writes chunk outputs with row/col indices in test fixtures. |
| sup3r/postprocessing/collectors/nc.py | Switches NetCDF chunk stitching to combine_by_coords and updates Loader usage. |
| sup3r/postprocessing/collectors/h5.py | Defaults features to 'all' and resolves feature list from the first file. |
| sup3r/pipeline/strategy.py | Computes and stores global grid indices per chunk in ForwardPassChunk. |
| sup3r/pipeline/forward_pass.py | Passes row/col indices into output writing during chunk runs. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Updated docstring for group_spatial_chunks method to clarify its purpose and return value.
Updated docstring to clarify return type and behavior.
nc collection for arbitrary chunk shapes
CollectorNCto collect forward pass chunks with arbitrary grids