This repository was archived by the owner on Jan 15, 2024. It is now read-only.

Checkout of HDF5 datasets #21

Open
khinsen opened this issue Feb 5, 2023 · 2 comments

@khinsen

khinsen commented Feb 5, 2023

Currently, aptool checkout works only on text files, not on array-like HDF5 datasets.

Question: what would be a useful format for such datasets? It needs to be easier to use (by some meaningful definition of "easy") than HDF5, because accessing the datasets via HDF5 is already possible (and straightforward at least from high-level languages).

@khinsen

khinsen commented Feb 5, 2023

In #20, @sjdv1982 proposes .npy. It is easy to convert to and from most (but not all) HDF5 datasets, and it is easier to use because it does not require HDF5, a rather heavy dependency. It has also been proposed elsewhere, in an alternative implementation of the HDF5 data model (see "Experimental Directory Structure (Exdir): An Alternative to HDF5 Without Introducing a New File Format").
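
To illustrate the "no HDF5 dependency" point, here is a minimal sketch of a .npy round trip that needs only NumPy. The function name `roundtrip_npy` and the file name `dataset.npy` are made up for this example and are not part of aptool.

```python
import os
import tempfile

import numpy as np


def roundtrip_npy(array, directory):
    """Save an array as .npy and read it back using NumPy alone."""
    path = os.path.join(directory, "dataset.npy")  # hypothetical file name
    # allow_pickle=False keeps the file a plain array dump; object arrays
    # are rejected instead of being pickled.
    np.save(path, array, allow_pickle=False)
    return np.load(path, allow_pickle=False)


with tempfile.TemporaryDirectory() as tmp:
    original = np.arange(12, dtype=np.float64).reshape(3, 4)
    restored = roundtrip_npy(original, tmp)
    # dtype, shape, and values should all survive the round trip.
    same = (
        restored.dtype == original.dtype
        and restored.shape == original.shape
        and np.array_equal(restored, original)
    )
```

With `allow_pickle=False`, NumPy refuses to save object arrays. That may be one reason why some HDF5 datasets have no direct .npy equivalent, though this is an assumption and not something established in this thread.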

@sjdv1982

sjdv1982 commented Feb 5, 2023

Here is a quick-and-dirty tool that dumps all datasets in "data/" in .npy format.

It seems that h5py has a bug affecting structured arrays, but I was able to work around it.
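
For readers without access to the tool, the following is a minimal sketch of what such a dump could look like. It is not the tool mentioned above and it does not reproduce the structured-array workaround, which is not described in this thread. The function name `dump_datasets`, its parameters, and the output layout are assumptions made for illustration.

```python
import os

import h5py
import numpy as np


def dump_datasets(h5_path, out_dir, group="data"):
    """Write every dataset under `group` to out_dir as a .npy file.

    Returns the list of files written. The directory layout mirrors
    the HDF5 group hierarchy (an assumption of this sketch).
    """
    written = []

    with h5py.File(h5_path, "r") as f:

        def visit(name, obj):
            # `name` is the path relative to `group`; skip subgroups.
            if isinstance(obj, h5py.Dataset):
                parts = name.split("/")
                target = os.path.join(out_dir, group, *parts) + ".npy"
                os.makedirs(os.path.dirname(target), exist_ok=True)
                # obj[()] reads the whole dataset into memory.
                np.save(target, obj[()], allow_pickle=False)
                written.append(target)

        f[group].visititems(visit)

    return written
```

A call such as `dump_datasets("archive.h5", "checkout")` (both names hypothetical) would then produce files like `checkout/data/<dataset>.npy`. Datasets that h5py reads as object arrays would raise an error here because pickling is disabled, so a real tool would need to handle them separately.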
