
Programmatically determine DGIdb “latest” download date and remove hardcoded UI label #618

@katie-perry

Background

The DGIdb downloads page currently displays a row labeled latest, which maps to a directory of downloadable TSV files hosted externally. I believe these are stored in an AWS environment owned and managed by the Griffith lab, so we may need their help with this.

At present, there is no programmatic way to determine when the “latest” files were generated. As a temporary solution, the UI hardcodes a display label (e.g. latest (2024-Dec)), while continuing to use latest for the download paths.

This approach is not sustainable long-term: the hardcoded label becomes inaccurate whenever the latest files are regenerated and the UI is not updated to match.

Assumptions

  • The downloadable TSVs are stored in an external AWS environment (S3?) managed by the Griffith lab
  • File access is based on stable endpoints/directory names (e.g. data/latest, data/2024-Dec)
  • There is no exposed metadata endpoint indicating when files were generated

Goal

Introduce a mechanism for determining and displaying the generation date of the "latest" DGIdb download files.

Proposed Approaches

File-level metadata

  • We could add a generated-on date to the TSV files themselves
  • For this, we would need to know how these files are generated - this may need to be done by the Griffith lab if we determine this is the best approach
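
As a rough sketch of the file-level option, the UI could read a date from a leading comment line in a TSV. The `# generated_on: YYYY-MM-DD` header format and the `generatedOnFromTsv` helper are assumptions made for illustration; the current files are not known to contain such a line, and adding one could affect downstream parsers that expect the column header on the first line.

```typescript
// Hypothetical: extract a generation date from a leading "# generated_on:" comment.
// The header format is an assumption, not something the current TSVs are known to have.
function generatedOnFromTsv(text: string): string | null {
  for (const line of text.split(/\r?\n/)) {
    // Only leading comment lines are inspected; stop at the first non-comment line.
    if (!line.startsWith("#")) break;
    const match = /^#\s*generated_on:\s*(\d{4}-\d{2}-\d{2})\s*$/.exec(line);
    if (match) return match[1] ?? null;
  }
  return null;
}
```

Note that this requires downloading at least the start of a TSV just to render the label, which is one argument for a separate metadata file instead.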

Manifest/metadata file

  • Provide a small metadata file alongside the downloads (e.g. latest/metadata.json)
  • Ex:
{
  "generated_at": "2024-12-15",
  "version_label": "2024-Dec"
}
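
A minimal sketch of how the UI might consume such a manifest, assuming it is served at `<baseUrl>/latest/metadata.json` with the two fields shown above. The names `DownloadManifest`, `parseManifest`, `latestLabel`, `fetchManifest`, and the `Fetcher` type are hypothetical and not part of the existing codebase.

```typescript
// Shape of the proposed (not yet existing) latest/metadata.json manifest.
interface DownloadManifest {
  generated_at: string; // ISO date, e.g. "2024-12-15"
  version_label: string; // e.g. "2024-Dec"
}

// Minimal fetch-like signature so the sketch does not depend on a global fetch.
type Fetcher = (url: string) => Promise<{ ok: boolean; json(): Promise<unknown> }>;

// Validate untrusted JSON before the UI relies on it.
function parseManifest(raw: unknown): DownloadManifest | null {
  if (typeof raw !== "object" || raw === null) return null;
  const { generated_at, version_label } = raw as Record<string, unknown>;
  if (typeof generated_at !== "string" || typeof version_label !== "string") return null;
  if (!/^\d{4}-\d{2}-\d{2}$/.test(generated_at)) return null;
  return { generated_at, version_label };
}

// Build the row label; fall back to plain "latest" when no manifest is available.
function latestLabel(manifest: DownloadManifest | null): string {
  return manifest ? `latest (${manifest.version_label})` : "latest";
}

// Fetch and validate the manifest; any network or parse failure yields null.
async function fetchManifest(baseUrl: string, fetcher: Fetcher): Promise<DownloadManifest | null> {
  try {
    const res = await fetcher(`${baseUrl}/latest/metadata.json`);
    return res.ok ? parseManifest(await res.json()) : null;
  } catch {
    return null;
  }
}
```

In the browser, `fetcher` could simply be `(url) => fetch(url)`. Falling back to the bare `latest` label keeps the downloads row usable if the manifest is missing or malformed.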

Storage Metadata (S3 object metadata)

  • Expose the objects' last-modified timestamps or other existing object metadata
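
One way this might look, assuming the storage endpoint returns a standard HTTP `Last-Modified` header on a HEAD request and permits cross-origin requests from the DGIdb site. The `labelFromLastModified` and `latestDateFromStorage` helpers and the `HeadFetcher` type are hypothetical. Note that a last-modified time reflects when the object was uploaded, which may differ from when the data was generated.

```typescript
const MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
                "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"];

// Convert an HTTP Last-Modified header value into a "YYYY-Mon" label (UTC).
function labelFromLastModified(header: string | null): string | null {
  if (!header) return null;
  const ts = Date.parse(header);
  if (Number.isNaN(ts)) return null;
  const d = new Date(ts);
  return `${d.getUTCFullYear()}-${MONTHS[d.getUTCMonth()]}`;
}

// Minimal HEAD-request signature so the sketch does not depend on a global fetch.
type HeadFetcher = (url: string) => Promise<{
  ok: boolean;
  headers: { get(name: string): string | null };
}>;

// Ask storage for one file's Last-Modified header; null on any failure.
async function latestDateFromStorage(fileUrl: string, head: HeadFetcher): Promise<string | null> {
  try {
    const res = await head(fileUrl);
    return res.ok ? labelFromLastModified(res.headers.get("Last-Modified")) : null;
  } catch {
    return null;
  }
}
```

In the browser, `head` could be `(url) => fetch(url, { method: "HEAD" })`.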

API solution

  • Add an endpoint that returns the available download versions and their dates?

Acceptance Criteria

  • Investigate and document where DGIdb download files are stored and how they are updated
  • Identify a reliable source of truth for the generation date of the "latest" files
  • Implement a programmatic mechanism to retrieve this date (file header, metadata file, API, etc.)
  • Update the downloads UI (Files.tsx) to display the date dynamically
  • Remove hardcoded display label (latest (2024-Dec)) from Files.tsx
  • Ensure existing download URLs and behavior remain unchanged
  • Document chosen approach for future maintainers
