[V2 Migration] Move metadata modals in tile charts to V2#6078
[V2 Migration] Move metadata modals in tile charts to V2#6078nick-nlb merged 19 commits intodatacommonsorg:masterfrom
Conversation
…ls made directly via the frontend to a single collated Flask endpoint that returns the same data. Entities, now required, and now plumbed up from the various tile sources (except for the highlight).
…he backend are used in a filter applied directly to the V2 call.
…n the full facet to attribute mapping (as the endpoint is just discarding that information).
…o entities, rather than just the parent
…a little too small).
Summary of ChangesHello, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request initiates the migration of the website's metadata handling from a V1-based approach to a more efficient V2 system, specifically targeting tile charts. The core change involves centralizing metadata retrieval through a new backend endpoint, which processes requests using V2 APIs. This transition enhances the accuracy of date ranges and the granularity of topic information presented in metadata modals by incorporating entity-specific data, which was not a requirement in the previous V1 implementation. Highlights
Changelog
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
There was a problem hiding this comment.
Code Review
The pull request introduces a new backend API endpoint (/api/shared/metadata) to fetch comprehensive metadata for statistical variables and entities, consolidating existing metadata fetching logic and enhancing the v2observation function to support facet filtering. Frontend chart tile components are updated to collect and pass entity DCIDs to this new, more efficient metadata API. Review comments suggest addressing a potential IndexError when accessing licenseType if the list is empty and removing an unused node_data variable for improved code clarity in the new metadata API implementation.
juliawu
left a comment
There was a problem hiding this comment.
Great changes! Left some questions inline. Generally would just like some more inline comments in metadata.py for the new functions, to help future maintainers.
server/routes/shared_api/metadata.py
Outdated
|
|
||
| from server.services import datacommons as dc | ||
|
|
||
| bp = Blueprint("metadata", __name__, url_prefix='/api/shared/metadata') |
There was a problem hiding this comment.
Do we want /shared/ in the API path? I see that this file is in the shared_api/ folder, but none of the other files in it use this convention. E.g. shared_api/stats.py serves /api/stats API routes.
What do you think of making the url prefix /api/metadata instead?
There was a problem hiding this comment.
Very good call - it definitely should not have the "shared" in the route!
|
|
||
| bp = Blueprint("metadata", __name__, url_prefix='/api/shared/metadata') | ||
|
|
||
| MAX_CATEGORY_DEPTH = 50 |
There was a problem hiding this comment.
Please add a comment on what MAX_CATEGORY_DEPTH does.
|
|
||
| MAX_CATEGORY_DEPTH = 50 | ||
|
|
||
| MEASUREMENT_METHODS_SUPPRESSION_PROVENANCES: set[str] = {"WikipediaStatsData"} |
There was a problem hiding this comment.
Please add a comment on what MEASUREMENT_METHODS_SUPPRESSION_PROVENANCES does. E.g., what happens if I add a provenance to this set?
| * This version utilizes a consolidated backend API endpoint that contains no | ||
| * V1 calls. |
There was a problem hiding this comment.
When you go back to remove the old endpoint, could you remove or update this line as well? In the future once we're completely off of V1, this note might be confusing about what it's referring to.
There was a problem hiding this comment.
I removed this right away (as the old endpoint will be removed relatively shortly anyway).
server/routes/shared_api/metadata.py
Outdated
| async def fetch_categories_async(stat_vars: list[str]) -> dict[str, list[str]]: | ||
| """Traverses the category hierarchy tree up to top-level topics.""" |
There was a problem hiding this comment.
This function is quite long, could you add a little more to the docstring about what this function is doing and how? E.g. what the str -> list[str] dictionary represents (I think statVar -> top-level topics) and the type of traversal implemented (I see both BFS and DFS here)
There was a problem hiding this comment.
Both BFS and DFS! (one for fetching, one for traversal). I've updated the documentation.
server/routes/shared_api/metadata.py
Outdated
| for sv in stat_vars: | ||
| tops = set() | ||
|
|
||
| def traverse(n: str, curr_visited: set[str]) -> None: |
There was a problem hiding this comment.
Can you add a short docstring on what traverse() does?
| sv_top_levels = collections.defaultdict(list) | ||
| all_top_level_dcids = set() | ||
|
|
||
| for sv in stat_vars: |
There was a problem hiding this comment.
Similarly, can you add a note on what this for loop does?
| category_map: dict[str, list[str]] = {} | ||
| if all_top_level_dcids: | ||
| parent_name_resp = await asyncio.to_thread(dc.v2node, | ||
| list(all_top_level_dcids), | ||
| '->name') | ||
| parent_name_map = {} | ||
| for pid in all_top_level_dcids: | ||
| nodes = _get_arc_nodes(parent_name_resp, pid, 'name') | ||
| if nodes: | ||
| parent_name_map[pid] = nodes[0].get('value') | ||
|
|
||
| for sv in stat_vars: | ||
| category_map[sv] = [ | ||
| parent_name_map.get(p) or p.split('/')[-1] | ||
| for p in sv_top_levels.get(sv, []) | ||
| ] | ||
| else: | ||
| category_map = {sv: [] for sv in stat_vars} |
There was a problem hiding this comment.
Similarly, could you add a comment on what this code block does? I see fetching names for top level dcids and building the final map with them.
|
|
||
| for sv in stat_vars: | ||
| category_map[sv] = [ | ||
| parent_name_map.get(p) or p.split('/')[-1] |
There was a problem hiding this comment.
Took me a bit to get why you used p.split('/')[-1] here, could you add a quick explanation comment?
| let metadataResp; | ||
| if (this.props.entities && this.props.entities.length > 0) { | ||
| metadataResp = await fetchMetadataV2( | ||
| this.props.entities, | ||
| statVarSet, | ||
| statVarToFacets, | ||
| apiRoot, | ||
| facets | ||
| ); | ||
| } else { | ||
| metadataResp = await fetchMetadata( | ||
| statVarSet, | ||
| facets, | ||
| dataCommonsClient, | ||
| statVarToFacets, | ||
| apiRoot | ||
| ); | ||
| } |
There was a problem hiding this comment.
For both here and in tile_metadata_modal, what is the migration plan for the "else" case here? Would we ever reach that code block going forward?
There was a problem hiding this comment.
The migration plan (which will happen as soon as we've converted all calls to V2 by supplying entities) will be to:
- Remove the else
- Remove the old
fetchMetadatafunction completely - Rename
fetchMetadataV2tofetchMetadata
…tion out from a closure to a separate top-level function, and more thoroughly document the more complex areas of the metadata endpoint.
juliawu
left a comment
There was a problem hiding this comment.
Thank you for the updates!
## Issue [b/491842059](https://buganizer.corp.google.com/issues/491842059) ## Description This is part two of the conversion of the website metadata handling from V1 to V2. See [6078](#6078) for part 1, which describes the overall purpose. This part converts metadata calls related to the three tools (timeline, scatter and map) to use the V2 metadata endpoint. ## Solution The primary change was to send entities up through into the metadata modal and the chart embed for each of the tools. However, this update (and the general move to V2) highlighted some changes that needed to be made to all three of the tools in order to supply the precisely used numerator and demoninator facets in each of the charts. This is what makes up the majority of the changes in this PR. ## Testing For the most part, the metadata modal should show the same information as it does in production. However, there are certain areas where we can expect divergences. The production metadata modals do not recognize denominator values (and so will not display the chosen denomator. Additionally, the production metadata modals will sometimes show only a single facet when more facets were actually used (ultimately for the same reason). The following chart demonstrates both of these issues: [Literacy in India](https://datacommons.org/tools/scatter#svx%3DCount_Person_BelowPovertyLevelInThePast12Months_AsFractionOf_Count_Person%26dx%3DCount_Person%26svy%3DCount_Person_Literate%26pcy%3D1%26dy%3DCount_Person%26epd%3Dcountry%2FIND%26ept%3DAdministrativeArea1) ## Upcoming This is the second change in a series. Pending changes coming in later PRs are: * The moving of facet selection dialog metadata to V2. * Final cleanup and removal of the non V2 endpoints.
Issue
b/491842059
Description
This is part one of the conversion of the website metadata handling from V1 to V2.
Previously, metadata fetching from the front end via
metadata_fetcher.tswas performed by making a sequence of discrete endpoint calls, which, once resolved, were collated in the frontend into a final metadata object. Most importantly for this task, this previous method leaned heavily onv1endpointsv1/bulk/info/variable-groupand/v1/bulk/info/variable.Solution
We have moved what used to be multiple frontend calls into a single Flask endpoint
/metadata, that then uses onlyv2to fetch the data. This endpoint consolidates that data and returns it to the frontend.Because the
v2methods used to pull metadata require entities, these entities needed to be plumbed up through from each tile into the metadata modal.The changes comprise the following distinct areas:
metadata.pyendpoint. This is a new endpoint that takes as parameters: entities, statVars, a statVar to facet map and an optional list of actively selected facets. Aside from the addition of the entities, these are the same inputs that the old, frontend-based V1 facet interface was takingNotes
An important aspect of the conversion from
v1tov2is thatv1did not require entity information to function, whereasv2does. With thebulk/info/variable-groupendpoint, entities did not need to be passed up through from the charts to the metadata modal. However, this also meant that the old metadata fetch could not correctly resolve dates to the entities involved (they were scoped to the entire facet), meaning that the dates were often far broader than they are now.With the
v2version of the metadata fetch, we now use the entities and are able to get dates that are location-scoped. A note here that these do not necessarily correspond to exactly what is shown in the chart (although they usually do), but rather what dates are available for the entity/facet/stat var combinations.Testing
For the most part, the metadata modal should show the same information as it does in production. There are two areas where this can be expected to diverge:
These changes only apply to the tiles (and so the explore page). The changes can be tested by running the same query (i.e., "Demographics in the United States") side-by-side locally and in production, to verify that, in the various chart configurations (and facet selections), the data is consistent between the two versions (aside from the exceptions listed above).
Upcoming
This is the first change in a series. Pending changes coming in later PRs are: