
nc collection for arbitrary chunk shapes#311

Merged
bnb32 merged 6 commits into main from bnb/collection on Mar 23, 2026

Conversation

@bnb32 (Collaborator) commented Mar 19, 2026

  • Enable CollectorNC to collect forward pass chunks with arbitrary grids
  • Flow global row and column indices through forward pass output to chunk writing so these indices can be used to collect chunks
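Flowing global indices per chunk could look something like the following minimal sketch (function and variable names here are illustrative, not the actual sup3r API):

```python
import numpy as np

# Hypothetical sketch, not sup3r code: each spatial chunk knows its
# slice of the full grid, so its global row/col indices can travel
# with the chunk output and later drive collection.
def global_chunk_indices(row_slice, col_slice):
    """Return the global row and column indices covered by one chunk."""
    row_inds = np.arange(row_slice.start, row_slice.stop)
    col_inds = np.arange(col_slice.start, col_slice.stop)
    return row_inds, col_inds

# A chunk covering rows 0-3 and columns 8-11 of the full grid.
row_inds, col_inds = global_chunk_indices(slice(0, 4), slice(8, 12))
```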

@bnb32 bnb32 requested a review from Copilot March 19, 2026 01:01

Copilot AI left a comment


Pull request overview

This PR extends the forward-pass output/collection pipeline so NetCDF chunk collection can work with arbitrary spatial chunk grids by propagating global row/column indices through chunk writing and into the NetCDF collection step.

Changes:

  • Plumbs row_inds / col_inds through ForwardPassStrategy → ForwardPass.run_chunk() → output writers.
  • Updates NetCDF writer and collector logic to support coordinate-based spatial recombination.
  • Updates H5 collection defaults and adjusts tests to match the new CollectorNC.collect() signature.

Reviewed changes

Copilot reviewed 8 out of 8 changed files in this pull request and generated 4 comments.

Summary per file:

  • sup3r/pipeline/strategy.py: Adds global grid index generation and passes per-chunk row/col indices into ForwardPassChunk.
  • sup3r/pipeline/forward_pass.py: Passes row_inds / col_inds into output handlers when writing chunk outputs.
  • sup3r/writers/base.py: Extends writer APIs to accept row_inds / col_inds and forwards them to type-specific writers.
  • sup3r/writers/nc.py: Writes optional row/col indexing into NetCDF outputs and refactors coords/data_vars construction.
  • sup3r/writers/h5.py: Attempts to add row/col indices to H5 meta output.
  • sup3r/postprocessing/collectors/nc.py: Refactors NetCDF collection to combine spatial chunks via xr.combine_by_coords and loader-based grouping.
  • sup3r/postprocessing/collectors/h5.py: Makes features optional ('all') and derives default features from the first file.
  • tests/output/test_output_handling.py: Updates test call site to match the new CollectorNC.collect() signature.
Comments suppressed due to low confidence (1)

sup3r/postprocessing/collectors/nc.py:97

  • reset_coords(Dimension.coords_2d()) demotes latitude/longitude from coords to data variables before combining, but the subsequent Cacher._write_single(..., features=features) will drop those lat/lon variables whenever features is a list (since the cacher subsets data[features] and only re-attaches coords, not data_vars). This will produce an output file missing required spatial coordinates, and LoaderNC will fail because it requires latitude/longitude to exist.

After combine_by_coords, re-promote lat/lon back to coordinates (e.g. out = out.set_coords(Dimension.coords_2d()) if present) or include lat/lon explicitly in the written variable list regardless of the features argument.

            dsets = list(spatial_chunks.values())
            dsets = [ds.reset_coords(Dimension.coords_2d()) for ds in dsets]
            out = xr.combine_by_coords(dsets, combine_attrs='override')

            cacher_kwargs = cacher_kwargs or {}
            Cacher._write_single(
                out_file=out_file,
                data=out,
                features=features,
                **cacher_kwargs,
            )
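A minimal, hedged sketch of the suggested fix on toy data (coordinate and variable names are illustrative; `COORDS_2D` stands in for `Dimension.coords_2d()`):

```python
import numpy as np
import xarray as xr

# Illustrative stand-in for Dimension.coords_2d(); not the sup3r API.
COORDS_2D = ('latitude', 'longitude')

def make_chunk(row0):
    """Toy spatial chunk with 2D lat/lon coords and 1D dimension coords."""
    lat = np.array([[row0, row0], [row0 + 1, row0 + 1]], dtype=float)
    lon = np.array([[0.0, 1.0], [0.0, 1.0]])
    return xr.Dataset(
        data_vars={'u': (('south_north', 'west_east'), np.ones((2, 2)))},
        coords={
            'latitude': (('south_north', 'west_east'), lat),
            'longitude': (('south_north', 'west_east'), lon),
            'south_north': [row0, row0 + 1],
            'west_east': [0, 1],
        },
    )

dsets = [make_chunk(0), make_chunk(2)]
# Demote lat/lon to data variables before combining, as in the PR...
dsets = [ds.reset_coords(COORDS_2D) for ds in dsets]
out = xr.combine_by_coords(dsets, combine_attrs='override')
# ...then re-promote them so a feature subset cannot silently drop them.
out = out.set_coords([c for c in COORDS_2D if c in out])
```

With lat/lon back in `out.coords`, a later subset like `out[features]` keeps them attached regardless of the features list.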


fix: write lat / lon to coords for each chunk to prevent time dimension expansion during collection

Copilot AI left a comment


Pull request overview

This PR updates the forward-pass output schema and NetCDF collection path so chunked outputs can be stitched back together for arbitrary spatial chunk grids by carrying global row/column indices through to file writing and using coordinate-based combination.

Changes:

  • Plumb row_inds / col_inds through ForwardPassChunk → forward pass writing → output handlers (H5/NC).
  • Write global grid indices into outputs (H5 meta fields; NC 1D dimension coords) to support spatially correct chunk stitching.
  • Update CollectorNC to combine chunks via xarray.combine_by_coords, and adjust tests/helpers accordingly.
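As a hedged illustration of the coordinate-based stitching described above (toy dimension names `row`/`col`, not necessarily sup3r's), writing global indices as 1D dimension coords lets xarray reassemble an arbitrary chunk grid:

```python
import numpy as np
import xarray as xr

# Toy chunks carrying global row/col indices as 1D dimension coords.
def chunk(rows, cols):
    # data[i, j] = rows[i] + cols[j], so values encode global position.
    data = np.add.outer(rows, cols).astype(float)
    return xr.Dataset(
        {'ws': (('row', 'col'), data)},
        coords={'row': rows, 'col': cols},
    )

# An uneven 2x2 chunk grid (3-wide and 2-wide column chunks).
chunks = [
    chunk(np.arange(0, 2), np.arange(0, 3)),
    chunk(np.arange(0, 2), np.arange(3, 5)),
    chunk(np.arange(2, 4), np.arange(0, 3)),
    chunk(np.arange(2, 4), np.arange(3, 5)),
]
# combine_by_coords infers each chunk's place from its dimension coords,
# so no particular chunk ordering or shape is required.
full = xr.combine_by_coords(chunks)
```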

Reviewed changes

Copilot reviewed 9 out of 9 changed files in this pull request and generated 3 comments.

Summary per file:

  • tests/output/test_output_handling.py: Updates NetCDF collection test expectations/inputs to reflect new collection behavior.
  • sup3r/writers/nc.py: Adds row_inds/col_inds support and writes them as dimension coords for coordinate-based stitching.
  • sup3r/writers/h5.py: Adds row_inds/col_inds to output meta for chunk identification/collection.
  • sup3r/writers/base.py: Threads row_inds/col_inds through the base output writer interface.
  • sup3r/utilities/pytest/helpers.py: Writes chunk outputs with row/col indices in test fixtures.
  • sup3r/postprocessing/collectors/nc.py: Switches NetCDF chunk stitching to combine_by_coords and updates Loader usage.
  • sup3r/postprocessing/collectors/h5.py: Defaults features to 'all' and resolves the feature list from the first file.
  • sup3r/pipeline/strategy.py: Computes and stores global grid indices per chunk in ForwardPassChunk.
  • sup3r/pipeline/forward_pass.py: Passes row/col indices into output writing during chunk runs.


@bnb32 bnb32 merged commit 2046f8b into main Mar 23, 2026
6 checks passed
@bnb32 bnb32 deleted the bnb/collection branch March 23, 2026 21:42
github-actions bot pushed a commit that referenced this pull request Mar 23, 2026
nc collection for arbitrary chunk shapes