This repository was archived by the owner on Apr 25, 2023. It is now read-only.

Issue
Before we can think about enhanced parallelization and PyTorch dataloaders, we need to rethink the data formats for dynamorph.
For each stage of the pipeline, we should define the inputs and outputs more precisely: file format, dimensionality, and file name.
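One way to make this concrete is to write each stage's inputs and outputs down as a small spec and check that adjacent stages agree. The sketch below is only an illustration: the stage names, axis labels, and file-name patterns are hypothetical placeholders, not the actual dynamorph stages or formats.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class DataSpec:
    """Describes one dataset: file format, axis order, and file-name template."""
    file_format: str        # e.g. "zarr" or "npy" (illustrative values)
    dims: Tuple[str, ...]   # axis labels, in storage order
    name_pattern: str       # file-name template


@dataclass(frozen=True)
class StageSpec:
    """Declares what a pipeline stage reads and what it writes."""
    name: str
    inputs: DataSpec
    outputs: DataSpec


def find_mismatches(stages: List[StageSpec]) -> List[str]:
    """Return "upstream -> downstream" for each adjacent pair whose specs disagree."""
    problems = []
    for upstream, downstream in zip(stages, stages[1:]):
        if upstream.outputs != downstream.inputs:
            problems.append(f"{upstream.name} -> {downstream.name}")
    return problems


# Hypothetical specs, for illustration only.
raw = DataSpec("zarr", ("T", "C", "Z", "Y", "X"), "{position}.zarr")
patches = DataSpec("zarr", ("N", "C", "Y", "X"), "{position}_patches.zarr")
latents = DataSpec("npy", ("N", "D"), "{position}_latents.npy")

pipeline = [
    StageSpec("extract_patches", raw, patches),
    StageSpec("encode", patches, latents),
]

print(find_mismatches(pipeline))
```

With specs like these, a stage that changes its output format without the next stage being updated shows up as a mismatch before any data is processed.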
Considerations
We primarily need:
- data consistency between each stage
- parallelization
- efficiency (both compute and data loading; would zarr caching help?)
Questions
Can we avoid data duplication? Are there intermediate stages that can avoid data duplication?