[v0/v1 migration] Move facet metadata enrichment to V2#6104
[v0/v1 migration] Move facet metadata enrichment to V2#6104nick-nlb merged 10 commits intodatacommonsorg:masterfrom
Conversation
… various aspects of the application that use it to pass up the entities.
…et endpoint rather than constructing the string in the frontend.
Summary of ChangesHello, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request marks the third phase in migrating the website's metadata handling from V1 to V2. It establishes a dedicated backend endpoint for enriching facet metadata, which now exclusively uses V2 API calls and includes robust batching to manage extensive data requests efficiently. The corresponding frontend updates ensure that various tools consume this new V2-powered enrichment, resulting in more precise date ranges and enhanced performance for facet-related functionalities. Highlights
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here. Footnotes
|
There was a problem hiding this comment.
Code Review
This pull request introduces a new backend endpoint /api/metadata/facet to handle the enrichment of facet metadata. This moves complex data fetching and processing logic from the client-side to the server, which is a great architectural improvement for performance and maintainability. The new endpoint replicates the batching logic from series.py to handle large queries gracefully. The existing /api/metadata endpoint is also refactored to use new shared helper functions. On the client-side, fetchFacetsWithMetadata is simplified to call this new endpoint, and all its call sites across different tools (Download, Map, Scatter, Timeline) are updated accordingly.
The changes are well-implemented. I have one suggestion to refactor a small piece of duplicated code in the new enrich_facets endpoint for better clarity and maintainability. Otherwise, the PR looks good.
| if earliest and (not facet_date_ranges[fid].get('earliestDate') or | ||
| earliest < facet_date_ranges[fid]['earliestDate']): | ||
| facet_date_ranges[fid]['earliestDate'] = earliest | ||
|
|
||
| if latest and (not facet_date_ranges[fid].get('latestDate') or | ||
| latest > facet_date_ranges[fid]['latestDate']): | ||
| facet_date_ranges[fid]['latestDate'] = latest |
There was a problem hiding this comment.
It's hard for me to tell at a glance what these if statements are filtering for. Please add an inline comment.
| for sv, sv_facets in facets.items(): | ||
| for fid, finfo in sv_facets.items(): | ||
| if finfo.get('importName'): | ||
| provenance_endpoints.add(f"dc/base/{finfo['importName']}") | ||
| if finfo.get('measurementMethod'): | ||
| measurement_methods.add(finfo['measurementMethod']) | ||
| if finfo.get('unit'): | ||
| units.add(finfo['unit']) | ||
|
|
||
| prov_map, linked_names_map, mm_map, unit_map = await _fetch_secondary_metadata( | ||
| provenance_endpoints, measurement_methods, units) | ||
|
|
||
| for sv, sv_facets in facets.items(): | ||
| for fid, finfo in sv_facets.items(): | ||
| dr = facet_date_ranges.get(fid, {}) | ||
| if dr.get('earliestDate'): | ||
| finfo['dateRangeStart'] = dr.get('earliestDate') | ||
| if dr.get('latestDate'): | ||
| finfo['dateRangeEnd'] = dr.get('latestDate') | ||
|
|
||
| import_name = finfo.get('importName') | ||
| if import_name: | ||
| prov_id = f"dc/base/{import_name}" | ||
| pdata = prov_map.get(prov_id) | ||
| if pdata: | ||
| finfo['sourceName'] = _get_node_name(pdata.get('source', []), | ||
| linked_names_map) | ||
| finfo['provenanceName'] = _get_node_name(pdata.get('isPartOf', []), linked_names_map) or \ | ||
| _get_node_name(pdata.get('name', []), linked_names_map) or import_name |
There was a problem hiding this comment.
Nit: Could you add some more inline comments to make these for loops more readable on a scan through?
There was a problem hiding this comment.
Overall, the implementation looks good to me! This file is getting quite long though. What do you think of pulling out the helper functions into their own file, and leaving just the bp.route() functions here?
There was a problem hiding this comment.
Overall, the implementation looks good to me! This file is getting quite long though. What do you think of pulling out the helper functions into their own file, and leaving just the bp.route() functions here? Totally fine to leave this for a followup PR.
There was a problem hiding this comment.
It's a good call (and something I was pondering as well). I've put it in as a TODO (along with the other item I flagged) for a separate PR.
Issue
b/491842059
Description
This is part three of the conversion of the website metadata handling from V1 to V2.
See 6078 for part 1, which describes the overall purpose.
See 6090 for part 2.
This third part:
api/metadata/facetendpoint that enriches the facets using only V2 calls (replicating what was previously done using V1 in the frontend). This is parallel to the recently addedapi/metadatacall.Notes
The use of the observation endpoint to get earliest and latest dates threw a challenge here, because in certain cases (such as with US counties and multiple stat vars), there were too many series and the mixer refused to handle the request.
This is parallel to what happens in
series.py, where this is compensated for with very highly specific triggers that result in batching.We have replicated the same functionality (adjusted to fit our situation) here.
A note that I left a TODO to merge the constants between the two file. That can be a small follow-up PR to this later (I didn't want to touch series.py in this PR).
Another note is that the changes to metadata.py may appear more significant than they really are (there was some refactoring to use reuse code between the two endpoints.
Testing
For the most part, the facet modal should show the same information as it does in production. The primary point of divergence will be dates. The old facet modal used facet-wide dates, whereas the new V2 endpoints are more accurate, as they incorporate entities.
Pages that demonstrate the new facet modal are:
Upcoming
This is the second change in a series. Pending changes coming are: