[V2 Migration] Move metadata modals in tile charts to V2 #6078

Merged
nick-nlb merged 19 commits into datacommonsorg:master from nick-nlb:metadata_v2_tiles
Mar 16, 2026

Conversation

@nick-nlb
Contributor

Issue

b/491842059

Description

This is part one of the conversion of the website metadata handling from V1 to V2.

Previously, the frontend fetched metadata via metadata_fetcher.ts by making a sequence of discrete endpoint calls and, once they resolved, collating the results into a final metadata object. Most importantly for this task, that approach leaned heavily on the V1 endpoints /v1/bulk/info/variable-group and /v1/bulk/info/variable.

Solution

We have moved what used to be multiple frontend calls into a single Flask endpoint, /metadata, which uses only V2 calls to fetch the data. The endpoint consolidates that data and returns it to the frontend.

Because the V2 methods used to pull metadata require entities, the entities had to be plumbed up from each tile into the metadata modal.

The changes comprise the following distinct areas:

  • The metadata.py endpoint. This is a new endpoint that takes as parameters: entities, statVars, a statVar-to-facet map and an optional list of actively selected facets. Aside from the addition of the entities, these are the same inputs that the old, frontend-based V1 facet interface took.
  • A new, much lighter "V2" version of the frontend facet fetching function.
  • Plumbing of the entities from the scatter, highlight, bar and timeline charts into the metadata modal.
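
The inputs listed above can be pictured as a single request payload. The following is a minimal sketch and not the code from this PR: the `MetadataRequest` class, its field names and the JSON key spellings are assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class MetadataRequest:
  """Hypothetical container for the inputs the description lists."""
  entities: list[str]
  stat_vars: list[str]
  # Maps each stat var DCID to the facet IDs used for it in the chart.
  stat_var_to_facets: dict[str, list[str]]
  # Optional: facets the user has actively selected.
  facets: list[str] = field(default_factory=list)

  def to_payload(self) -> dict:
    """Builds a JSON-serializable body; the key names are assumed."""
    if not self.entities:
      # V2 lookups need at least one entity, unlike the old V1 path.
      raise ValueError("entities must not be empty")
    payload = {
        "entities": self.entities,
        "statVars": self.stat_vars,
        "statVarToFacets": self.stat_var_to_facets,
    }
    if self.facets:
      payload["facets"] = self.facets
    return payload
```

Rejecting an empty entity list up front mirrors the point made in the description: the V2 path cannot resolve metadata without entities, so a caller that has none would need a different code path.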

Notes

An important aspect of the conversion from V1 to V2 is that V1 did not require entity information to function, whereas V2 does. With the bulk/info/variable-group endpoint, entities did not need to be passed up from the charts to the metadata modal. However, this also meant that the old metadata fetch could not scope dates to the entities involved (dates were scoped to the entire facet), so the reported date ranges were often far broader than they are now.

With the V2 version of the metadata fetch, we now use the entities and can get location-scoped dates. Note that these do not necessarily correspond exactly to what is shown in the chart (although they usually do); they reflect the dates available for the entity/facet/stat var combinations.
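
To make the difference concrete, here is a small sketch with made-up observation dates. It is illustrative only: `date_range`, the sample data and the entity DCIDs are assumptions, not code or data from this PR.

```python
# Made-up observation dates for one stat var and one facet, keyed by entity.
OBSERVATION_DATES = {
    "country/USA": ["1900", "1950", "2020"],
    "geoId/06": ["2010", "2015", "2020"],
    "geoId/48": ["2012", "2020"],
}


def date_range(dates_by_entity, entities=None):
  """Returns (earliest, latest) over the given entities, or over every
  entity in the facet when `entities` is None (the facet-wide behavior)."""
  selected = dates_by_entity if entities is None else {
      e: dates_by_entity[e] for e in entities if e in dates_by_entity
  }
  all_dates = [d for dates in selected.values() for d in dates]
  if not all_dates:
    return None
  # Date strings in the same format sort chronologically as plain strings.
  return min(all_dates), max(all_dates)


facet_scoped = date_range(OBSERVATION_DATES)
entity_scoped = date_range(OBSERVATION_DATES, ["geoId/06", "geoId/48"])
```

In this toy data the facet-wide range starts in 1900 because one unrelated entity has old observations, while the range scoped to the two charted entities starts in 2010.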

Testing

For the most part, the metadata modal should show the same information as it does in production. There are two areas where this can be expected to diverge:

  • Dates: Dates will mostly be the same. However, because we now include entities when determining dates, they sometimes differ between the V2 and V1 versions of the metadata. In those cases, V2 should provide narrower (and more accurate) dates.
  • Topics: The topic section of the metadata will sometimes list more than one vertical (e.g., Demographics, Economy) in the new version. As we walk up the graph to each vertical, a stat var can have multiple memberOf or specializationOf entries that resolve to different verticals. The original metadata modal usually showed only a single vertical.

These changes only apply to the tiles (and so the explore page). They can be tested by running the same query (e.g., "Demographics in the United States") side by side locally and in production, and verifying that, across the various chart configurations and facet selections, the data is consistent between the two versions (aside from the exceptions listed above).

Upcoming

This is the first change in a series. Pending changes coming in later PRs are:

  • The extension of this functionality to the standardized vis tools.
  • The moving of facet selection dialog metadata to V2.

@gemini-code-assist
Contributor

Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request initiates the migration of the website's metadata handling from a V1-based approach to a more efficient V2 system, specifically targeting tile charts. The core change involves centralizing metadata retrieval through a new backend endpoint, which processes requests using V2 APIs. This transition enhances the accuracy of date ranges and the granularity of topic information presented in metadata modals by incorporating entity-specific data, which was not a requirement in the previous V1 implementation.

Highlights

  • New V2 Metadata Endpoint: Introduced a new Flask endpoint /api/shared/metadata that consolidates multiple frontend calls into a single backend request, utilizing only V2 APIs for metadata fetching.
  • Frontend V2 Facet Fetching: Implemented a new, lighter frontend function (fetchMetadataV2) for fetching facets, which interacts with the new V2 backend endpoint.
  • Entity Plumbing for Metadata Modals: Plumbed entity information from various tile charts (scatter, highlight, bar, timeline, map, ranking) up to the metadata modal, enabling more accurate, location-scoped date resolution and potentially richer topic categorization in V2.

Changelog

  • server/__init__.py
    • Registered the new shared_metadata blueprint to expose the /api/shared/metadata endpoint.
  • server/routes/shared_api/metadata.py
    • Added a new Python file defining the /api/shared/metadata endpoint, which handles V2 metadata fetching, category traversal, facet processing, and provenance lookups.
  • server/services/datacommons.py
    • Modified the v2observation function to accept an optional filter parameter, allowing for filtering by facetIds in V2 observation requests.
  • static/js/components/tiles/bar_tile.tsx
    • Passed the entities prop to the ChartTileContainer component.
  • static/js/components/tiles/chart_tile.tsx
    • Added an entities prop to the ChartTileContainerProp interface and propagated it to TileSources and ChartEmbed components.
  • static/js/components/tiles/highlight_tile.tsx
    • Passed the entities prop, containing the place DCID, to the ChartTileContainer component.
  • static/js/components/tiles/line_tile.tsx
    • Passed the entities prop, derived from place DCIDs, to the ChartTileContainer component.
  • static/js/components/tiles/map_tile.tsx
    • Added a useMemo hook to derive entities from mapChartData and passed it to the ChartTileContainer component.
  • static/js/components/tiles/ranking_tile.tsx
    • Added a useMemo hook to derive entities from rankingData and passed it to the ChartTileContainer component.
  • static/js/components/tiles/scatter_tile.tsx
    • Added a useMemo hook to derive entities from scatterChartData and passed it to the ChartTileContainer component.
  • static/js/components/tiles/sv_ranking_units.tsx
    • Derived entities from ranking points and passed it to the TileSources component.
  • static/js/place/chart_embed.tsx
    • Imported fetchMetadataV2, added an entities prop to ChartEmbedPropsType, and conditionally called fetchMetadataV2 if entities are present, otherwise fetchMetadata.
  • static/js/tools/shared/metadata/metadata_fetcher.ts
    • Added fetchMetadataV2 function to fetch metadata using the new V2 backend endpoint, consolidating V1 calls.
  • static/js/tools/shared/metadata/tile_metadata_modal.tsx
    • Imported fetchMetadataV2, added an entities prop to TileMetadataModalPropType, and conditionally called fetchMetadataV2 if entities are present, otherwise fetchMetadata.
  • static/js/tools/shared/metadata/tile_sources.tsx
    • Added an entities prop to the TileSources component and passed it to the TileMetadataModal.

Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

The pull request introduces a new backend API endpoint (/api/shared/metadata) to fetch comprehensive metadata for statistical variables and entities, consolidating existing metadata fetching logic and enhancing the v2observation function to support facet filtering. Frontend chart tile components are updated to collect and pass entity DCIDs to this new, more efficient metadata API. Review comments suggest addressing a potential IndexError when accessing licenseType if the list is empty and removing an unused node_data variable for improved code clarity in the new metadata API implementation.
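
The `licenseType` concern is the usual empty-list indexing problem. A minimal sketch of one defensive pattern follows; `first_value` and the shape of the sample node list are assumptions for illustration, not the code under review.

```python
def first_value(nodes, default=None):
  """Returns the 'value' of the first node, or `default` if there is none.

  Indexing `nodes[0]` directly raises IndexError on an empty list, so the
  emptiness check comes first.
  """
  if not nodes:
    return default
  return nodes[0].get("value", default)


# Hypothetical response fragment with one license node.
license_nodes = [{"value": "CC BY 4.0"}]
```

The same helper also tolerates `None`, which is handy when the property is missing from the response altogether.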

@nick-nlb nick-nlb requested a review from juliawu March 13, 2026 20:27
@nick-nlb nick-nlb marked this pull request as ready for review March 13, 2026 20:27
Contributor

@juliawu juliawu left a comment

Great changes! Left some questions inline. Generally would just like some more inline comments in metadata.py for the new functions, to help future maintainers.


from server.services import datacommons as dc

bp = Blueprint("metadata", __name__, url_prefix='/api/shared/metadata')
Contributor

Do we want /shared/ in the API path? I see that this file is in the shared_api/ folder, but none of the other files in it use this convention. E.g. shared_api/stats.py serves /api/stats API routes.

What do you think of making the url prefix /api/metadata instead?

Contributor Author

Very good call - it definitely should not have the "shared" in the route!


bp = Blueprint("metadata", __name__, url_prefix='/api/shared/metadata')

MAX_CATEGORY_DEPTH = 50
Contributor

Please add a comment on what MAX_CATEGORY_DEPTH does.


MAX_CATEGORY_DEPTH = 50

MEASUREMENT_METHODS_SUPPRESSION_PROVENANCES: set[str] = {"WikipediaStatsData"}
Contributor

Please add a comment on what MEASUREMENT_METHODS_SUPPRESSION_PROVENANCES does. E.g., what happens if I add a provenance to this set?

Comment on lines +536 to +537
* This version utilizes a consolidated backend API endpoint that contains no
* V1 calls.
Contributor

When you go back to remove the old endpoint, could you remove or update this line as well? In the future once we're completely off of V1, this note might be confusing about what it's referring to.

Contributor Author

I removed this right away (as the old endpoint will be removed relatively shortly anyway).

Comment on lines +73 to +74
async def fetch_categories_async(stat_vars: list[str]) -> dict[str, list[str]]:
  """Traverses the category hierarchy tree up to top-level topics."""
Contributor

This function is quite long, could you add a little more to the docstring about what this function is doing and how? E.g. what the str -> list[str] dictionary represents (I think statVar -> top-level topics) and the type of traversal implemented (I see both BFS and DFS here)

Contributor Author

Both BFS and DFS! (one for fetching, one for traversal). I've updated the documentation.
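
For readers following the thread, the "fetch breadth-first, then traverse depth-first" split can be sketched as below. This is a simplified illustration under stated assumptions, not the PR's implementation: the toy `PARENTS` graph, `fetch_parents`, `fetch_graph_bfs` and `top_levels_dfs` are all hypothetical, and the assignment of BFS to fetching and DFS to traversal is inferred from the comment above.

```python
MAX_DEPTH = 50  # Safety cap on fetch rounds, in the spirit of MAX_CATEGORY_DEPTH.

# Toy parent graph standing in for memberOf/specializationOf links.
PARENTS = {
    "sv/A": ["group/Age", "group/Income"],
    "group/Age": ["topic/Demographics"],
    "group/Income": ["topic/Economy"],
    "topic/Demographics": [],
    "topic/Economy": [],
}


def fetch_parents(nodes):
  """Stand-in for one batched backend call that returns parents per node."""
  return {n: PARENTS.get(n, []) for n in nodes}


def fetch_graph_bfs(stat_vars):
  """Fetches the ancestor graph level by level (BFS), one batch per level."""
  graph = {}
  frontier = set(stat_vars)
  for _ in range(MAX_DEPTH):
    if not frontier:
      break
    graph.update(fetch_parents(frontier))
    # Next level: parents of this level that have not been fetched yet.
    frontier = {p for n in frontier for p in graph[n] if p not in graph}
  return graph


def top_levels_dfs(node, graph, visited=None):
  """Walks the fetched graph depth-first; nodes with no parents are tops."""
  visited = set() if visited is None else visited
  if node in visited:
    return set()  # Already explored via another path, or a cycle.
  visited.add(node)
  parents = graph.get(node, [])
  if not parents:
    return {node}
  tops = set()
  for parent in parents:
    tops |= top_levels_dfs(parent, graph, visited)
  return tops


graph = fetch_graph_bfs(["sv/A"])
categories = {sv: sorted(top_levels_dfs(sv, graph)) for sv in ["sv/A"]}
```

Batching by level keeps the number of backend calls proportional to the depth of the hierarchy rather than the number of nodes, and the toy stat var ends up under two top-level topics because its two parents lead to different roots.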

for sv in stat_vars:
  tops = set()

  def traverse(n: str, curr_visited: set[str]) -> None:
Contributor

Can you add a short docstring on what traverse() does?

sv_top_levels = collections.defaultdict(list)
all_top_level_dcids = set()

for sv in stat_vars:
Contributor

Similarly, can you add a note on what this for loop does?

Comment on lines +138 to +155
category_map: dict[str, list[str]] = {}
if all_top_level_dcids:
  parent_name_resp = await asyncio.to_thread(dc.v2node,
                                             list(all_top_level_dcids),
                                             '->name')
  parent_name_map = {}
  for pid in all_top_level_dcids:
    nodes = _get_arc_nodes(parent_name_resp, pid, 'name')
    if nodes:
      parent_name_map[pid] = nodes[0].get('value')

  for sv in stat_vars:
    category_map[sv] = [
        parent_name_map.get(p) or p.split('/')[-1]
        for p in sv_top_levels.get(sv, [])
    ]
else:
  category_map = {sv: [] for sv in stat_vars}
Contributor

Similarly, could you add a comment on what this code block does? I see fetching names for top level dcids and building the final map with them.


for sv in stat_vars:
  category_map[sv] = [
      parent_name_map.get(p) or p.split('/')[-1]
Contributor

Took me a bit to get why you used p.split('/')[-1] here, could you add a quick explanation comment?
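
For context, the expression falls back to the last path segment of the DCID when no name was returned for it. A small sketch of that behavior, with made-up DCIDs and names (`display_name` is a hypothetical helper, not part of the PR):

```python
def display_name(dcid, name_map):
  """Prefers the fetched name; otherwise uses the DCID's last segment."""
  # `or` also covers a name that is present but empty or None.
  return name_map.get(dcid) or dcid.split('/')[-1]


names = {"dc/topic/Economy": "Economy Overview"}  # Made-up lookup result.
with_name = display_name("dc/topic/Economy", names)
without_name = display_name("dc/topic/Health", names)
```

A DCID with no slash at all is returned unchanged, since `split('/')` then yields a single-element list.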

Comment on lines +423 to +440
let metadataResp;
if (this.props.entities && this.props.entities.length > 0) {
  metadataResp = await fetchMetadataV2(
    this.props.entities,
    statVarSet,
    statVarToFacets,
    apiRoot,
    facets
  );
} else {
  metadataResp = await fetchMetadata(
    statVarSet,
    facets,
    dataCommonsClient,
    statVarToFacets,
    apiRoot
  );
}
Contributor

For both here and in tile_metadata_modal, what is the migration plan for the "else" case here? Would we ever reach that code block going forward?

Contributor Author

The migration plan (which will happen as soon as we've converted all calls to V2 by supplying entities) will be to:

  • Remove the else
  • Remove the old fetchMetadata function completely
  • Rename fetchMetadataV2 to fetchMetadata

…tion out from a closure to a separate top-level function, and more thoroughly document the more complex areas of the metadata endpoint.
@nick-nlb nick-nlb requested a review from juliawu March 16, 2026 20:56
Contributor

@juliawu juliawu left a comment

Thank you for the updates!

@nick-nlb nick-nlb merged commit 2214b89 into datacommonsorg:master Mar 16, 2026
12 checks passed
nick-nlb added a commit that referenced this pull request Mar 17, 2026
## Issue
[b/491842059](https://buganizer.corp.google.com/issues/491842059)

## Description

This is part two of the conversion of the website metadata handling from
V1 to V2.

See [6078](#6078) for part
1, which describes the overall purpose.

This part converts metadata calls related to the three tools (timeline,
scatter and map) to use the V2 metadata endpoint.

## Solution

The primary change was to send entities up through into the metadata
modal and the chart embed for each of the tools.

However, this update (and the general move to V2) highlighted some
changes that needed to be made to all three of the tools in order to
supply the exact numerator and denominator facets used in each of
the charts. This makes up the majority of the changes in this
PR.

## Testing

For the most part, the metadata modal should show the same information
as it does in production. However, there are certain areas where we can
expect divergences. The production metadata modals do not recognize
denominator values (and so will not display the chosen denominator).
Additionally, the production metadata modals will sometimes show only a
single facet when more facets were actually used (ultimately for the
same reason).

The following chart demonstrates both of these issues:

[Literacy in
India](https://datacommons.org/tools/scatter#svx%3DCount_Person_BelowPovertyLevelInThePast12Months_AsFractionOf_Count_Person%26dx%3DCount_Person%26svy%3DCount_Person_Literate%26pcy%3D1%26dy%3DCount_Person%26epd%3Dcountry%2FIND%26ept%3DAdministrativeArea1)

## Upcoming

This is the second change in a series. Pending changes coming in later
PRs are:
* The moving of facet selection dialog metadata to V2.
* Final cleanup and removal of the non V2 endpoints.