Create a single source of truth for g4emi HDF5 schema definitions #1

@along4

Summary

The HDF5 schema is currently defined in multiple places across the repo:

  • simulation row structs in sim/include/structures.hh
  • simulation HDF5 field registration in sim/src/SimIO.cc
  • transport output dtype in src/optics/OpticalTransport.py
  • human-readable docs in README.md

This makes schema evolution harder than it should be and creates a real risk of drift between sim output, Python consumers, and documentation.

Problem

Right now there is no single canonical schema definition that both C++ and Python consume.

That means:

  • field additions/renames have to be updated manually in multiple places
  • analysis/transport code can end up carrying local copies of schema knowledge
  • docs can become stale
  • backward-compatibility decisions get scattered instead of being explicit
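To make the drift risk concrete, here is a minimal sketch of a check that compares two hand-maintained field lists. The field names and types are hypothetical placeholders, not the actual g4emi schema, and `diff_fields` is an illustrative helper rather than existing repo code.

```python
# Hypothetical field lists standing in for two hand-maintained copies of the
# same dataset layout (for example the C++ row struct and a Python dtype).
cpp_fields = [("event_id", "int64"), ("x_mm", "float64"), ("energy_eV", "float64")]
py_fields = [("event_id", "int64"), ("x_mm", "float32"), ("wavelength_nm", "float64")]


def diff_fields(reference, other):
    """Return (missing, extra, retyped) field names of `other` relative to `reference`."""
    ref, oth = dict(reference), dict(other)
    missing = sorted(set(ref) - set(oth))   # in reference, absent from other
    extra = sorted(set(oth) - set(ref))     # in other, absent from reference
    retyped = sorted(
        name for name in set(ref) & set(oth) if ref[name] != oth[name]
    )
    return missing, extra, retyped


missing, extra, retyped = diff_fields(cpp_fields, py_fields)
print("missing:", missing)
print("extra:", extra)
print("retyped:", retyped)
```

With a single canonical spec, a check like this would not be needed between C++ and Python, because both sides would be derived from the same definition.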

Proposed direction

Introduce a language-neutral schema spec as the single source of truth, for example:

  • schema/hdf5_schema.yaml

Use that spec to generate or derive:

  • C++ row/field definitions for simulation output
  • Python constants / dtype helpers for analysis and transport
  • schema documentation snippets if useful
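As a rough illustration of the Python side, the sketch below derives a dtype description from a spec. Everything in it is an assumption: `SPEC` is a hypothetical stand-in for what `schema/hdf5_schema.yaml` might parse to, the field names are invented, and `dtype_descr` is an illustrative helper. The returned list of `(name, format)` pairs is the form that `numpy.dtype()` accepts for structured dtypes.

```python
# Hypothetical parsed spec; the real file layout and field names are undecided.
SPEC = {
    "datasets": {
        "/photons": {
            "fields": [
                {"name": "event_id", "type": "int64"},
                {"name": "x_mm", "type": "float64"},
                {"name": "label", "type": "string", "length": 16},
            ]
        }
    }
}

# Map spec type names to NumPy array-protocol type strings (little-endian).
_NUMPY_CODES = {"int32": "<i4", "int64": "<i8", "float32": "<f4", "float64": "<f8"}


def dtype_descr(spec, dataset):
    """Build a list of (name, format) pairs suitable for numpy.dtype()."""
    descr = []
    for field in spec["datasets"][dataset]["fields"]:
        if field["type"] == "string":
            fmt = f"S{field['length']}"  # fixed-length byte string
        else:
            fmt = _NUMPY_CODES[field["type"]]  # KeyError on unknown types
        descr.append((field["name"], fmt))
    return descr


print(dtype_descr(SPEC, "/photons"))
```

Failing loudly on an unknown type name is deliberate in this sketch: a typo in the spec should break generation rather than produce a silently wrong dtype.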

Goals

  • define each dataset and field once
  • make the current writer schema explicit
  • reduce schema drift between C++ and Python
  • make future schema changes easier to review

Non-goals

  • redesigning the analysis module itself
  • adding legacy compatibility logic as part of this issue
  • changing dataset semantics unless needed for consistency

Likely implementation steps

  1. Define a repo-level schema spec for /primaries, /secondaries, /photons, and /transported_photons.
  2. Add Python-side loading/generation from that spec.
  3. Update transport and analysis code to use the shared schema definitions.
  4. Add C++ generation or a build-time export path for sim writers.
  5. Update docs to reference the shared schema source.
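For step 4, one possible shape of the C++ generation path is a small Python script that renders a header from the spec at build time. This is only a sketch under assumptions: `PhotonRow`, the field names, and `render_struct` are hypothetical, and the real generator would also need to cover the HDF5 field registration currently done in sim/src/SimIO.cc.

```python
# Map spec type names to C++ types (fixed-width integers come from <cstdint>).
CPP_TYPES = {
    "int32": "std::int32_t",
    "int64": "std::int64_t",
    "float32": "float",
    "float64": "double",
}


def render_struct(struct_name, fields):
    """Render a C++ header defining one row struct from (name, type) pairs."""
    lines = ["#pragma once", "#include <cstdint>", "", f"struct {struct_name} {{"]
    for name, type_name in fields:
        lines.append(f"    {CPP_TYPES[type_name]} {name};")
    lines.append("};")
    return "\n".join(lines) + "\n"


# Hypothetical struct and field names, for illustration only.
header = render_struct("PhotonRow", [("event_id", "int64"), ("x_mm", "float64")])
print(header)
```

Generating the header as plain text keeps the C++ build free of a YAML dependency; the trade-off is that the generator has to run before the sim sources compile.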
