Summary
The HDF5 schema is currently defined in multiple places across the repo:
- simulation row structs in sim/include/structures.hh
- simulation HDF5 field registration in sim/src/SimIO.cc
- transport output dtype in src/optics/OpticalTransport.py
- human-readable docs in README.md
This makes schema evolution harder than it should be and creates a real risk of drift between sim output, Python consumers, and documentation.
Problem
Right now there is no single canonical schema definition that both C++ and Python consume.
That means:
- field additions/renames have to be updated manually in multiple places
- analysis/transport code can end up carrying local copies of schema knowledge
- docs can become stale
- backward-compatibility decisions get scattered instead of being explicit
Proposed direction
Introduce a language-neutral schema spec as the single source of truth, for example schema/hdf5_schema.yaml.
Use that spec to generate or derive:
- C++ row/field definitions for simulation output
- Python constants / dtype helpers for analysis and transport
- schema documentation snippets if useful
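As a rough illustration of the Python side, the sketch below derives a dtype description from a spec. Everything in it is an assumption for illustration: the spec layout, the field names (event_id, x_mm, time_ns), and the helper name dtype_descr are hypothetical, and the spec is written inline as a dict rather than parsed from the shared spec file.

```python
# Hypothetical spec content; in practice this would be parsed from the
# shared spec file. Dataset layout and field names are illustrative only.
SPEC = {
    "/photons": [
        {"name": "event_id", "type": "int32"},
        {"name": "x_mm", "type": "float64"},
        {"name": "time_ns", "type": "float64"},
    ],
}

# Map spec type names to NumPy type strings (little-endian).
NUMPY_TYPES = {
    "int32": "<i4",
    "int64": "<i8",
    "float32": "<f4",
    "float64": "<f8",
}


def dtype_descr(spec, dataset):
    """Return [(field_name, type_string), ...] for one dataset in the spec."""
    descr = []
    for field in spec[dataset]:
        type_name = field["type"]
        if type_name not in NUMPY_TYPES:
            raise ValueError(f"unsupported type in spec: {type_name!r}")
        descr.append((field["name"], NUMPY_TYPES[type_name]))
    return descr


PHOTONS_DESCR = dtype_descr(SPEC, "/photons")
```

The resulting list of (name, type string) pairs is in the form that numpy.dtype accepts for a structured dtype, so transport and analysis code could call numpy.dtype(PHOTONS_DESCR) instead of carrying its own field list.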
Goals
- define each dataset and field once
- make current writer schema explicit
- reduce schema drift between C++ and Python
- make future schema changes easier to review
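One way to make drift visible is a small check that compares the fields a writer actually produced against the fields the spec declares. The sketch below is only an assumption of how such a check could look: schema_diff is a hypothetical helper, the field lists are invented, and reading the actual field list out of an HDF5 file is left out.

```python
def schema_diff(expected, actual):
    """Compare two ordered [(name, type), ...] lists and describe mismatches."""
    problems = []
    expected_map = dict(expected)
    actual_map = dict(actual)
    # Fields the spec declares but the file lacks or stores with another type.
    for name, type_name in expected:
        if name not in actual_map:
            problems.append(f"missing field: {name}")
        elif actual_map[name] != type_name:
            problems.append(
                f"type mismatch for {name}: "
                f"expected {type_name}, got {actual_map[name]}"
            )
    # Fields present in the file that the spec does not know about.
    for name, _ in actual:
        if name not in expected_map:
            problems.append(f"unexpected field: {name}")
    return problems


# Hypothetical field lists, for illustration only.
spec_fields = [("event_id", "int32"), ("time_ns", "float64")]
file_fields = [("event_id", "int32"), ("time_ns", "float32"), ("weight", "float64")]
drift = schema_diff(spec_fields, file_fields)
```

An empty result means the two sides agree. A test in CI could fail on a non-empty result, which turns schema changes into explicit, reviewable diffs of the spec.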
Non-goals
- redesigning the analysis module itself
- adding legacy compatibility logic as part of this issue
- changing dataset semantics unless needed for consistency
Likely implementation steps
- Define a repo-level schema spec for /primaries, /secondaries, /photons, and /transported_photons.
- Add Python-side loading/generation from that spec.
- Update transport and analysis code to use the shared schema definitions.
- Add C++ generation or a build-time export path for sim writers.
- Update docs to reference the shared schema source.
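The C++ generation step could be a small script run at build time. The sketch below is a minimal version under stated assumptions: the spec layout, the struct name PrimaryRow, the field names, and the helpers render_struct and render_header are all hypothetical, and the output covers only plain row structs, not the HDF5 field registration in sim/src/SimIO.cc.

```python
# Hypothetical spec content; struct and field names are illustrative only.
SPEC = {
    "/primaries": {
        "struct": "PrimaryRow",
        "fields": [
            {"name": "event_id", "type": "int32"},
            {"name": "energy_MeV", "type": "float64"},
        ],
    },
}

# Map spec type names to C++ types.
CPP_TYPES = {
    "int32": "std::int32_t",
    "int64": "std::int64_t",
    "float32": "float",
    "float64": "double",
}


def render_struct(entry):
    """Render one dataset entry as a C++ struct definition."""
    lines = [f"struct {entry['struct']} {{"]
    for field in entry["fields"]:
        lines.append(f"    {CPP_TYPES[field['type']]} {field['name']};")
    lines.append("};")
    return "\n".join(lines)


def render_header(spec):
    """Render a header with one struct per dataset, in sorted dataset order."""
    parts = [
        "// Generated from the schema spec; do not edit by hand.",
        "#pragma once",
        "#include <cstdint>",
        "",
    ]
    for dataset in sorted(spec):
        parts.append(f"// Dataset: {dataset}")
        parts.append(render_struct(spec[dataset]))
        parts.append("")
    return "\n".join(parts)


HEADER = render_header(SPEC)
```

Writing HEADER to a file in the build directory would let the sim writers include generated structs instead of hand-maintained ones. Sorting the datasets keeps the generated output stable between runs, which keeps diffs small.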