Skip to content

Use cell bounding box metadata#230

Merged
kyleaoman merged 60 commits intomasterfrom
cell_bbox
May 14, 2025
Merged

Use cell bounding box metadata#230
kyleaoman merged 60 commits intomasterfrom
cell_bbox

Conversation

@kyleaoman
Copy link
Member

@kyleaoman kyleaoman commented Apr 18, 2025

  • Implemented using the cell bounding box metadata (min particle position and max particle position), if present. If not behaviour changes: we pad the selection by 0.1 cell length to make sure that we get any particles that have drifted out of their cells. If they've drifted by >0.1 cell length... too bad.
  • Added some tests as suggested by Test and document wrapped masking #88. The old test data doesn't include the new metadata (min/max positions). Which leads to...
  • Made some new test data, addressing Test data is very outdated #177. It's located at /cosma5/data/Swift/web-storage/IOExamples/ssio_ci_04_2025. I've left the old test data in place in case that's used by anyone else. There's a readme in the new test data directory with details. Next time we decide we need new test data we can just make a new subdirectory. I've also included some of the old test data files (symlinked in) as "legacy" examples to make sure we don't break backwards compatibility. I've also made sure that we have both a "single" and "distributed" file
  • Set up fixtures to either test with both the old and the new single and distributed test files (so we can ensure backwards-compatibility and compatibility with both types), or just the new single-snapshot file where this is not relevant.
  • Abandoned the cosmo_volume_example.hdf5 test data file and use the new single-file snapshot instead.
  • Documented the masking behaviour with respect to periodic boxes as suggested in Test and document wrapped masking #88.
  • Made the cell padding length configurable (including able to be disabled) in cases where the cell bounding box metadata is absent.

Closes #121
Closes #88
Closes #177

@kyleaoman kyleaoman added bug Something isn't working enhancement New feature or request labels Apr 18, 2025
@kyleaoman kyleaoman self-assigned this Apr 18, 2025
@kyleaoman kyleaoman marked this pull request as draft April 18, 2025 17:14
@kyleaoman
Copy link
Member Author

There is now a mechanism for users to adjust the padding (ignored for newer snapshots where the cell bounding box is known), and tests for that. Once we agree on a desired default behaviour it's trivial to implement.

@MatthieuSchaller
Copy link
Member

What simulations are affected? Is it mainly a legacy problem or are colibre runs also in a state where they need 27x more data?

@kyleaoman
Copy link
Member Author

I checked a few Colibre runs, they seem to have the metadata present. I also checked a few Flamingo runs, there the metadata is absent.

@kyleaoman kyleaoman mentioned this pull request Apr 30, 2025
3 tasks
@kyleaoman
Copy link
Member Author

@MatthieuSchaller @JBorrow @robjmcgibbon to make this concrete, I propose that:

  1. If the cell bounding box metadata is present then that is used to determine what cells to read. Any bbox that touches the region gets the cell selected, period.
  2. If the cell bbox metadata is absent then a warning is emitted and a default pad of 0.1 cell side length is applied. The warning mentions how to configure the pad to be very safe (pad by 1 cell side length) or minimize i/o (no pad).

In case it wasn't clear, padding by <1 cell helps because the region is generically not aligned with the cell grid, and the pad is not adding adjacent cells to be read but just making each cell slightly bigger when deciding whether its bbox touches the region.

I feel like this strikes a reasonable balance between accuracy and minimizing unnecessary overheads. Does anyone want to voice an objection to the above? If not I'll wrap up the modifications and call it (actually) ready for review.

@JBorrow
Copy link
Member

JBorrow commented May 8, 2025

The only tricky thing here I think would be to make sure that the extra padding doesn't lead to extra padding on things like the particle-level mask you can ask for. As long as the padding is only used for cell selection I see no issue with adding a small pad to the region to account for cases like this.

@kyleaoman
Copy link
Member Author

kyleaoman commented May 8, 2025

It's implemented directly in the spatial masking function, so it can't "leak out" into other modes. Reading catalogue files (soap, fof) where particles drifting does not happen is also guarded (never padded).

To keep things analogous with the case where bbox metadata is present, the pad is actually applied to the cells, not the region. This is part of a small conceptual change from the original implementation that actually padded the region by half a cell and looked for enclosed cell centres, but the result is the same (and yes, it wraps the periodic boundary properly).

@kyleaoman
Copy link
Member Author

@MatthieuSchaller @JBorrow @robjmcgibbon to make this concrete, I propose that:

  1. If the cell bounding box metadata is present then that is used to determine what cells to read. Any bbox that touches the region gets the cell selected, period.
  2. If the cell bbox metadata is absent then a warning is emitted and a default pad of 0.1 cell side length is applied. The warning mentions how to configure the pad to be very safe (pad by 1 cell side length) or minimize i/o (no pad).

This is now implemented, so ready for review, I think!

@MatthieuSchaller
Copy link
Member

Ok. I agree with this last proposal. Better be safe for the old data. The new sims with more metadata will only read what is required.

What about the buffer size in the old case? Are we not always reading one full extra cell? I must be missing something about the 0.1

@kyleaoman
Copy link
Member Author

@MatthieuSchaller Hmm I tried in words before, let's go with the diagram:

17472384890146612420408370118921

The pad is applied cell by cell (so that when the metadata is available it can be variable). If the user asks for a region exactly aligns with the cell grid then both padding options will read the same set of cells. In this diagram the region is not along grid vertices so we save on I/O by padding less.

@MatthieuSchaller
Copy link
Member

Thanks. I am with you now. That sounds good to me.

@kyleaoman kyleaoman merged commit 3d9070c into master May 14, 2025
10 checks passed
@kyleaoman kyleaoman deleted the cell_bbox branch May 14, 2025 19:21
stankortmann pushed a commit to stankortmann/swiftsimio that referenced this pull request Mar 4, 2026
- Implemented using the cell bounding box metadata (min particle position and max particle position), if present. If not behaviour changes: we pad the selection by 0.1 cell length to make sure that we get any particles that have drifted out of their cells. If they've drifted by >0.1 cell length... too bad.
- Added some tests as suggested by Test and document wrapped masking SWIFTSIM#88. The old test data doesn't include the new metadata (min/max positions). Which leads to...
- Made some new test data, addressing Test data is very outdated SWIFTSIM#177. It's located at /cosma5/data/Swift/web-storage/IOExamples/ssio_ci_04_2025. I've left the old test data in place in case that's used by anyone else. There's a readme in the new test data directory with details. Next time we decide we need new test data we can just make a new subdirectory. I've also included some of the old test data files (symlinked in) as "legacy" examples to make sure we don't break backwards compatibility. I've also made sure that we have both a "single" and "distributed" file
- Set up fixtures to either test with both the old and the new single and distributed test files (so we can ensure backwards-compatibility and compatibility with both types), or just the new single-snapshot file where this is not relevant.
- Abandoned the cosmo_volume_example.hdf5 test data file and use the new single-file snapshot instead.
- Documented the masking behaviour with respect to periodic boxes as suggested in Test and document wrapped masking SWIFTSIM#88.
- Made the cell padding length configurable (including able to be disabled) in cases where the cell bounding box metadata is absent.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Something isn't working enhancement New feature or request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Test data is very outdated Add support for partilcle bounding boxes in the snapshot meta-data Test and document wrapped masking

3 participants