Background
The DGIdb downloads page currently displays a row labeled latest, which maps to a directory of downloadable TSV files hosted externally (I believe these are stored in an AWS environment owned and managed by the Griffith lab, so we may need their help with this).
At present, there is no programmatic way to determine when the “latest” files were generated. As a temporary solution, the UI hardcodes a display label (e.g. latest (2024-Dec)), while continuing to use latest for the download paths.
This approach is not sustainable long-term: the label has to be updated by hand each time the files are regenerated, and it becomes inaccurate whenever that step is missed.
Assumptions
- The downloadable TSVs are stored in an external AWS environment (S3?) managed by the Griffith lab
- File access is based on stable endpoints/directory names (e.g. data/latest, data/2024-Dec)
- There is no exposed metadata endpoint indicating when the files were generated
Goal
Introduce a mechanism for determining and displaying the generation date for the "latest" DGIdb download files
Proposed Approaches
File-level metadata
- We could add a generated-on date to the TSV files themselves
- This requires knowing how the files are generated; if we decide this is the best approach, the change may need to be made by the Griffith lab
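A minimal sketch of the reading side, assuming a hypothetical convention (not something the current files use) where the first line of each TSV is a comment such as # generated_on: 2024-12-15. The function name readGeneratedOn is also hypothetical. One trade-off to weigh: a leading comment line may break tools that expect the column header on line 1, and the UI would have to fetch at least the start of a file (for example with an HTTP Range request) just to read the date.

```typescript
// Hypothetical convention: the first line of a TSV is "# generated_on: YYYY-MM-DD".
// The DGIdb TSVs do not currently include such a line; this only illustrates parsing it.
const GENERATED_ON_PATTERN = /^#\s*generated_on:\s*(\d{4}-\d{2}-\d{2})\s*$/;

/**
 * Returns the ISO date from a "# generated_on:" comment on the first line,
 * or null if the text does not start with that comment.
 */
function readGeneratedOn(tsvText: string): string | null {
  // Only the first line is examined; a trailing "\r" (Windows line endings) is dropped.
  const newline = tsvText.indexOf("\n");
  const firstLine = (newline === -1 ? tsvText : tsvText.slice(0, newline)).replace(/\r$/, "");
  const match = GENERATED_ON_PATTERN.exec(firstLine);
  return match?.[1] ?? null;
}
```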
Manifest/metadata file
- Provide a small metadata file alongside the downloads (e.g. latest/metadata.json)
- Example:
  {
    "generated_at": "2024-12-15",
    "version_label": "2024-Dec"
  }
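A minimal sketch of how Files.tsx might load and validate this manifest, assuming it is published at latest/metadata.json under the same base URL as the download files and uses the two fields from the example above. The names DownloadMetadata, parseDownloadMetadata and fetchLatestMetadata are hypothetical. The fetch function is passed in so the sketch has no dependency on DOM or Node typings; in the browser, the global fetch can be passed directly.

```typescript
// Shape of the proposed latest/metadata.json (field names from the example manifest).
interface DownloadMetadata {
  generated_at: string; // ISO date, e.g. "2024-12-15"
  version_label: string; // e.g. "2024-Dec"
}

// Minimal response shape, so the sketch does not depend on DOM or Node fetch typings.
interface JsonResponse {
  ok: boolean;
  status: number;
  json(): Promise<unknown>;
}
type FetchLike = (url: string) => Promise<JsonResponse>;

// Validate the untrusted JSON before the UI relies on it.
function parseDownloadMetadata(data: unknown): DownloadMetadata {
  if (typeof data !== "object" || data === null) {
    throw new Error("metadata.json is not an object");
  }
  const record = data as Record<string, unknown>;
  const generatedAt = record["generated_at"];
  const versionLabel = record["version_label"];
  if (typeof generatedAt !== "string" || !/^\d{4}-\d{2}-\d{2}$/.test(generatedAt)) {
    throw new Error("generated_at must be a YYYY-MM-DD string");
  }
  if (typeof versionLabel !== "string") {
    throw new Error("version_label must be a string");
  }
  return { generated_at: generatedAt, version_label: versionLabel };
}

// baseUrl: the prefix the download links already use (hypothetical parameter).
async function fetchLatestMetadata(baseUrl: string, fetchFn: FetchLike): Promise<DownloadMetadata> {
  const response = await fetchFn(`${baseUrl}/latest/metadata.json`);
  if (!response.ok) {
    throw new Error(`metadata.json request failed with HTTP ${response.status}`);
  }
  return parseDownloadMetadata(await response.json());
}
```

In the browser this would be called as fetchLatestMetadata(baseUrl, fetch). Validating the JSON keeps a malformed or partially uploaded manifest from being rendered as a bogus date.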
Storage Metadata (S3 object metadata)
- Expose the objects' last-modified timestamps or custom (user-defined) object metadata
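If the files are in S3, every object already carries a Last-Modified timestamp, and user-defined metadata set at upload time is returned as x-amz-meta-* response headers on HEAD and GET requests. The sketch below assumes a hypothetical x-amz-meta-generated-at key and falls back to Last-Modified; the function and key names are illustrative only. Because S3 has no real directories, the request would target a specific object inside latest/ rather than the directory itself.

```typescript
// Header names are lower-case here; HTTP header names are case-insensitive.
type HeaderMap = Record<string, string | undefined>;

// Hypothetical custom metadata key; S3 returns user-defined metadata as x-amz-meta-* headers.
const GENERATED_AT_HEADER = "x-amz-meta-generated-at";

/**
 * Picks a YYYY-MM-DD date for the "latest" files from HEAD response headers.
 * Prefers the explicit generated-at metadata and falls back to Last-Modified,
 * which records when the object was uploaded rather than when it was generated.
 */
function resolveGeneratedDate(headers: HeaderMap): string | null {
  const explicit = headers[GENERATED_AT_HEADER];
  if (explicit && /^\d{4}-\d{2}-\d{2}$/.test(explicit)) {
    return explicit;
  }
  const lastModified = headers["last-modified"];
  if (lastModified) {
    const parsed = new Date(lastModified); // HTTP-date, e.g. "Sun, 15 Dec 2024 10:00:00 GMT"
    if (!Number.isNaN(parsed.getTime())) {
      return parsed.toISOString().slice(0, 10); // calendar date in UTC
    }
  }
  return null;
}
```

Two caveats: Last-Modified reflects upload time, which may not match when the data was generated; and while Last-Modified is readable cross-origin by default, a browser can only see x-amz-meta-* headers if the bucket's CORS configuration lists them under ExposeHeaders. In the browser, the header map could be built from response.headers.get(...) after fetch(url, { method: "HEAD" }).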
API solution
- An endpoint that returns the available download versions and their generation dates?
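No such endpoint exists today, so the response shape below is purely an assumption: a list of versions, each with its directory label, generation date and download path. Given that list, the UI could pick the most recently generated entry to label the latest row. DownloadVersion and newestVersion are hypothetical names.

```typescript
// Hypothetical response item for an endpoint listing download versions.
interface DownloadVersion {
  label: string; // directory name, e.g. "2024-Dec"
  generatedAt: string; // ISO date, e.g. "2024-12-15"
  path: string; // download path, e.g. "data/2024-Dec"
}

// Returns the most recently generated version, or undefined for an empty list.
// ISO YYYY-MM-DD strings sort chronologically when compared as plain strings.
function newestVersion(versions: readonly DownloadVersion[]): DownloadVersion | undefined {
  let newest: DownloadVersion | undefined;
  for (const version of versions) {
    if (newest === undefined || version.generatedAt > newest.generatedAt) {
      newest = version;
    }
  }
  return newest;
}
```

Keeping dates as ISO strings means no date parsing is needed to find the newest entry.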
Acceptance Criteria
- Investigate and document where DGIdb download files are stored and how they are updated
- Identify a reliable source of truth for the generation date of the "latest" files
- Implement a programmatic mechanism to retrieve this date (file header, metadata file, API, etc.)
- Update the downloads UI (Files.tsx) to display the date dynamically
- Remove the hardcoded display label (latest (2024-Dec)) from Files.tsx
- Ensure existing download URLs and behavior remain unchanged
- Document chosen approach for future maintainers
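Whichever source is chosen, the change in Files.tsx reduces to deriving the display label from a date while leaving the download path untouched. A minimal sketch, assuming the date arrives as a YYYY-MM-DD string (or null if it could not be retrieved); formatLatestLabel is a hypothetical helper that reproduces the current latest (2024-Dec) format.

```typescript
const MONTH_ABBREVIATIONS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
  "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"] as const;

/**
 * Builds the display label for the "latest" row from an ISO date,
 * e.g. "2024-12-15" -> "latest (2024-Dec)". Download paths keep using "latest".
 */
function formatLatestLabel(generatedAt: string | null): string {
  // Read year and month straight from the string instead of via Date,
  // so the local time zone cannot shift the month.
  const match = /^(\d{4})-(\d{2})-\d{2}$/.exec(generatedAt ?? "");
  const month = match ? MONTH_ABBREVIATIONS[Number(match[2]) - 1] : undefined;
  if (!match || month === undefined) {
    return "latest"; // plain label if the date is missing or malformed
  }
  return `latest (${match[1]}-${month})`;
}
```

Falling back to the plain latest label keeps the row usable if the date lookup fails, and because only the text changes, existing download URLs stay as they are.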