[v0/v1 migration] Move facet metadata enrichment to V2 #6104

Merged
nick-nlb merged 10 commits into datacommonsorg:master from nick-nlb:metadata_facet_selector
Mar 19, 2026
Conversation

@nick-nlb nick-nlb commented Mar 18, 2026

Issue

b/491842059

Description

This is part three of the conversion of the website metadata handling from V1 to V2.

See 6078 for part 1, which describes the overall purpose.
See 6090 for part 2.

This third part:

  • Adds a new api/metadata/facet endpoint that enriches the facets using only V2 calls (replicating what the frontend previously did with V1). This parallels the recently added api/metadata call.
  • Updates the frontend so that entities (or enclosed places and place types) are passed through to the facet selector.

Notes

Using the observation endpoint to get the earliest and latest dates posed a challenge here: in certain cases (such as US counties combined with multiple stat vars), there were too many series and the mixer refused to handle the request.

The same thing happens in series.py, which compensates with highly specific triggers that result in batching.

We have replicated the same functionality (adjusted to fit our situation) here.
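As a rough illustration of the batching idea (not the code in this PR), the sketch below splits a list of stat vars into groups so that the number of series per observation request stays under a cap. The function name `batch_variables`, the constant `MAX_SERIES_PER_REQUEST`, and its value are all hypothetical; the actual triggers and thresholds are the ones defined in series.py and metadata.py.

```python
from typing import Iterator, List

# Hypothetical cap on series per request; the real limits live in the PR's code.
MAX_SERIES_PER_REQUEST = 5000


def batch_variables(
    variables: List[str],
    num_entities: int,
    max_series: int = MAX_SERIES_PER_REQUEST) -> Iterator[List[str]]:
  """Yields groups of variables sized so that group size * num_entities
  stays at or below max_series where possible.

  If num_entities alone exceeds max_series, each batch holds one variable,
  which is the smallest batch this sketch can produce.
  """
  per_batch = max(1, max_series // max(1, num_entities))
  for start in range(0, len(variables), per_batch):
    yield variables[start:start + per_batch]


# Example: 5 variables across 2000 entities with a cap of 5000 series
# allows 2 variables per request.
for batch in batch_variables(['a', 'b', 'c', 'd', 'e'], num_entities=2000):
  print(batch)
```

Each batch would then be sent as its own request and the per-facet results merged afterwards.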

Note that I left a TODO to merge the constants between the two files. That can be a small follow-up PR (I didn't want to touch series.py in this PR).

Another note: the changes to metadata.py may appear more significant than they really are (there was some refactoring to reuse code between the two endpoints).

Testing

For the most part, the facet modal should show the same information as it does in production. The primary point of divergence will be dates. The old facet modal used facet-wide dates, whereas the new V2 endpoints are more accurate, as they incorporate entities.

Pages that demonstrate the new facet modal are:

Upcoming

This is the third change in a series. Pending changes are:

  • Final cleanup and removal of the non-V2 endpoints.

@gemini-code-assist
Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request marks the third phase in migrating the website's metadata handling from V1 to V2. It establishes a dedicated backend endpoint for enriching facet metadata, which now exclusively uses V2 API calls and includes robust batching to manage extensive data requests efficiently. The corresponding frontend updates ensure that various tools consume this new V2-powered enrichment, resulting in more precise date ranges and enhanced performance for facet-related functionalities.

Highlights

  • New API Endpoint for Facet Enrichment: Introduced a new /api/metadata/facets endpoint on the backend to handle facet metadata enrichment, leveraging V2 API calls for more accurate and efficient data processing.
  • Backend Refactoring and Batching: Refactored backend logic in metadata.py with new helper functions for fetching node data, extracting date ranges, and resolving secondary metadata. Implemented batching for observation endpoint calls to prevent timeouts when dealing with a large number of statistical variables or entities.
  • Frontend Migration to V2 Facet API: Updated multiple frontend components, including the Download Tool, Map, Scatter, Timeline, and shared facet choice fetchers, to utilize the new /api/metadata/facets endpoint and pass relevant entity context for improved facet selection.
  • Improved Date Range Accuracy: The new V2 endpoints incorporate entities for date range calculations, leading to more accurate earliest and latest dates for facets compared to the previous facet-wide date approach.

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request introduces a new backend endpoint /api/metadata/facet to handle the enrichment of facet metadata. This moves complex data fetching and processing logic from the client-side to the server, which is a great architectural improvement for performance and maintainability. The new endpoint replicates the batching logic from series.py to handle large queries gracefully. The existing /api/metadata endpoint is also refactored to use new shared helper functions. On the client-side, fetchFacetsWithMetadata is simplified to call this new endpoint, and all its call sites across different tools (Download, Map, Scatter, Timeline) are updated accordingly.

The changes are well-implemented. I have one suggestion to refactor a small piece of duplicated code in the new enrich_facets endpoint for better clarity and maintainability. Otherwise, the PR looks good.

@nick-nlb nick-nlb marked this pull request as ready for review March 18, 2026 21:57
@nick-nlb nick-nlb requested a review from juliawu March 18, 2026 21:57

@juliawu juliawu left a comment

Nice!

Comment on lines +123 to +129
if earliest and (not facet_date_ranges[fid].get('earliestDate') or
                 earliest < facet_date_ranges[fid]['earliestDate']):
  facet_date_ranges[fid]['earliestDate'] = earliest

if latest and (not facet_date_ranges[fid].get('latestDate') or
               latest > facet_date_ranges[fid]['latestDate']):
  facet_date_ranges[fid]['latestDate'] = latest

It's hard for me to tell at a glance what these if statements are filtering for. Please add an inline comment.
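For readers following along, here is a commented sketch of what the two conditions appear to do: each one widens a facet's stored range only when the incoming date is present and extends the range (or when no date is stored yet). The helper name `widen_date_range` is hypothetical, and the sketch assumes dates are ISO 8601 strings, which compare correctly as plain strings.

```python
from typing import Dict, Optional


def widen_date_range(ranges: Dict[str, Dict[str, str]], fid: str,
                     earliest: Optional[str], latest: Optional[str]) -> None:
  """Expands the stored date range for facet `fid` to cover the new dates."""
  entry = ranges.setdefault(fid, {})

  # Record `earliest` if we have one and either nothing is stored yet
  # or it is earlier than the stored value.
  if earliest and (not entry.get('earliestDate') or
                   earliest < entry['earliestDate']):
    entry['earliestDate'] = earliest

  # Record `latest` if we have one and either nothing is stored yet
  # or it is later than the stored value.
  if latest and (not entry.get('latestDate') or
                 latest > entry['latestDate']):
    entry['latestDate'] = latest


ranges: Dict[str, Dict[str, str]] = {}
widen_date_range(ranges, 'facet1', '2015', '2020')
widen_date_range(ranges, 'facet1', '2010-05', '2018')
print(ranges['facet1'])
```

After both calls the stored range for `facet1` runs from the earliest start seen to the latest end seen, regardless of the order in which series arrive.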

Comment on lines +596 to +624
for sv, sv_facets in facets.items():
  for fid, finfo in sv_facets.items():
    if finfo.get('importName'):
      provenance_endpoints.add(f"dc/base/{finfo['importName']}")
    if finfo.get('measurementMethod'):
      measurement_methods.add(finfo['measurementMethod'])
    if finfo.get('unit'):
      units.add(finfo['unit'])

prov_map, linked_names_map, mm_map, unit_map = await _fetch_secondary_metadata(
    provenance_endpoints, measurement_methods, units)

for sv, sv_facets in facets.items():
  for fid, finfo in sv_facets.items():
    dr = facet_date_ranges.get(fid, {})
    if dr.get('earliestDate'):
      finfo['dateRangeStart'] = dr.get('earliestDate')
    if dr.get('latestDate'):
      finfo['dateRangeEnd'] = dr.get('latestDate')

    import_name = finfo.get('importName')
    if import_name:
      prov_id = f"dc/base/{import_name}"
      pdata = prov_map.get(prov_id)
      if pdata:
        finfo['sourceName'] = _get_node_name(pdata.get('source', []),
                                             linked_names_map)
        finfo['provenanceName'] = _get_node_name(pdata.get('isPartOf', []), linked_names_map) or \
          _get_node_name(pdata.get('name', []), linked_names_map) or import_name

Nit: Could you add some more inline comments to make these for loops more readable on a scan through?
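The two loops in the quoted snippet follow a collect-then-enrich pattern: the first pass gathers the unique IDs that need a lookup, a single fetch resolves them all, and the second pass writes the resolved values back onto each facet. A simplified, synchronous sketch of that pattern for units only is shown below. The function `enrich_units`, the `fetch_names` callable, and the `unitDisplayName` field are hypothetical stand-ins, not names taken from the PR.

```python
from typing import Callable, Dict, Set


def enrich_units(facets: Dict[str, Dict[str, dict]],
                 fetch_names: Callable[[Set[str]], Dict[str, str]]) -> None:
  """Adds a display name for each facet's unit using one batched lookup."""
  # Pass 1: collect the unique unit IDs across every stat var and facet.
  units = {
      finfo['unit']
      for sv_facets in facets.values()
      for finfo in sv_facets.values()
      if finfo.get('unit')
  }

  # Single lookup for all units, rather than one call per facet.
  unit_names = fetch_names(units)

  # Pass 2: write the resolved name back, falling back to the raw ID.
  for sv_facets in facets.values():
    for finfo in sv_facets.values():
      unit = finfo.get('unit')
      if unit:
        finfo['unitDisplayName'] = unit_names.get(unit, unit)


facets = {
    'sv1': {'f1': {'unit': 'USD'}, 'f2': {}},
    'sv2': {'f1': {'unit': 'USD'}, 'f3': {'unit': 'Percent'}},
}
enrich_units(facets, lambda ids: {'USD': 'US Dollar'})
print(facets['sv2'])
```

Collecting IDs into a set first means a unit shared by many facets is resolved once, which is the main reason to keep the two loops separate.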

Overall, the implementation looks good to me! This file is getting quite long though. What do you think of pulling out the helper functions into their own file, and leaving just the bp.route() functions here? Totally fine to leave this for a followup PR.

Contributor Author

It's a good call (and something I was pondering as well). I've put it in as a TODO (along with the other item I flagged) for a separate PR.

@nick-nlb nick-nlb merged commit 6ccf397 into datacommonsorg:master Mar 19, 2026
12 checks passed