Migrate Place Info and Series Dates to V2 API #6047
SandeepTuniki wants to merge 25 commits into `master`
Conversation
Summary of Changes
Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed. This pull request re-implements key Data Commons service functions, `get_place_info` and `get_series_dates`, on top of the V2 API.
Code Review
This pull request refactors get_place_info and get_series_dates to use V2 APIs instead of V1, which is a significant change involving the reimplementation of V1 endpoint logic by composing multiple V2 API calls. The changes also include new unit tests and a live verification script.
My review identified several issues that need attention:
- A correctness bug in `get_series_dates` related to filtering child places by type.
- An inefficiency in `get_place_info` that can be improved for better performance.
- The unit tests for `get_series_dates` use incorrect mock data that doesn't match the real API response, which unfortunately hides the implementation bug.
- The new live verification script contains a bug in its assertion logic, rendering it ineffective.
I have provided specific comments and code suggestions to address these points. While the overall direction of the changes is good, the implementation requires these fixes to ensure correctness and robustness.
/gemini review
Code Review
This pull request successfully migrates get_place_info and get_series_dates from v1 to v2 APIs. The new implementations in datacommons.py are more complex to accommodate the v2 API structure but appear correct and are well-tested with new verification tests. The related changes, such as removing old v1 endpoints and updating numerous golden files, are consistent with this migration. I have one minor suggestion to improve efficiency in the get_series_dates implementation. Overall, this is a solid refactoring effort.
juliawu left a comment
Hooray! Nice work. Just left some suggestions for extra comments/cleanup.
| "CensusZipCodeTabulationArea": 1, | ||
| "AdministrativeArea5": 2, | ||
| "AdministrativeArea4": 2, | ||
| "Village": 5, | ||
| "City": 5, | ||
| "Town": 5, | ||
| "Borough": 5, | ||
| "AdministrativeArea3": 5, | ||
| "County": 10, | ||
| "AdministrativeArea2": 10, | ||
| "EurostatNUTS3": 10, | ||
| "CensusDivision": 15, | ||
| "State": 20, | ||
| "AdministrativeArea1": 20, | ||
| "EurostatNUTS2": 20, | ||
| "EurostatNUTS1": 20, | ||
| "Country": 30, | ||
| "CensusRegion": 35, | ||
| "GeoRegion": 38, | ||
| "Continent": 40, | ||
| "Place": 50, |
Should this list also contain the other StatisticalArea place types? See https://datacommons.org/browser/StatisticalArea
```python
resp = v2node(fetch_dcids, '->containedInPlace')
data = resp.get('data', {})

current_frontier = set()
for dcid in fetch_dcids:
  visited.add(dcid)
  node_data = data.get(dcid, {})

  arcs_obj = node_data.get('arcs', {}).get('containedInPlace', {})
  nodes_list = arcs_obj.get('nodes', []) if isinstance(arcs_obj, dict) else []

  parents = [x['dcid'] for x in nodes_list if 'dcid' in x]
  if parents:
    parent_graph[dcid] = parents
    current_frontier.update(parents)
```
nit: Could you add a few more inline comments here to explain what the loop is doing? I just want it to be easier for future maintainers to parse if we ever decide to update the implementation down the road.
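As a sketch of what those comments might document, here is a self-contained version of the frontier loop: a breadth-first traversal of `containedInPlace` arcs with one batched V2 call per level. The `v2node` stub and its sample graph are illustrative only, not the real V2 API client:

```python
def v2node(dcids, prop):
  """Stub of the V2 node API call; the sample graph is illustrative only."""
  graph = {
      "geoId/06085": ["geoId/06"],  # Santa Clara County -> California
      "geoId/06": ["country/USA"],  # California -> United States
      "country/USA": [],
  }
  return {
      "data": {
          d: {
              "arcs": {
                  "containedInPlace": {
                      "nodes": [{"dcid": p} for p in graph.get(d, [])]
                  }
              }
          } for d in dcids
      }
  }


def build_parent_graph(start_dcids):
  """BFS over containedInPlace arcs, one batched v2node call per level."""
  visited = set()
  parent_graph = {}
  frontier = set(start_dcids)
  while frontier:
    # Only fetch nodes we have not already resolved.
    fetch_dcids = sorted(frontier - visited)
    if not fetch_dcids:
      break
    data = v2node(fetch_dcids, "->containedInPlace").get("data", {})
    frontier = set()
    for dcid in fetch_dcids:
      visited.add(dcid)  # never re-fetch this node
      arcs = data.get(dcid, {}).get("arcs", {}).get("containedInPlace", {})
      nodes = arcs.get("nodes", []) if isinstance(arcs, dict) else []
      parents = [n["dcid"] for n in nodes if "dcid" in n]
      if parents:
        parent_graph[dcid] = parents
        frontier.update(parents)  # parents form the next BFS level
  return parent_graph


print(build_parent_graph(["geoId/06085"]))
```

The loop terminates because every fetched dcid enters `visited`, so the frontier shrinks to empty once all ancestors have been seen.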
```python
def get_all_values(resp, dcid, prop, key='dcid'):
  node_data = resp.get('data', {}).get(dcid, {})
  arcs_obj = node_data.get('arcs', {}).get(prop, {})
  if not arcs_obj:
    # Try checking without arrow if key mismatch
    arcs_obj = node_data.get('arcs', {}).get(prop.replace('->', ''), {})

  nodes_list = arcs_obj.get('nodes', []) if isinstance(arcs_obj, dict) else []
  return [n.get(key, '') for n in nodes_list if key in n]


def get_best_type(types_list):
  if not types_list:
    return ''

  # Sort types by rank (highest rank first)
  # If ranks are tied, prefer types that don't start with 'AdministrativeArea'
  def sort_key(t):
    rank = PLACE_TYPE_RANK.get(t, 0)
    is_admin = 1 if t.startswith('AdministrativeArea') else 0
    return (rank, -is_admin)

  return sorted(types_list, key=sort_key, reverse=True)[0]
```
These nested functions don't look like closure functions to me. What do you think of extracting these out to _get_all_values() and _get_best_type() at the top of this file instead? Because get_place_info() is already a very long function, I think this would help improve readability.
```python
for dcid in dcids:
  self_types = get_all_values(types_resp, dcid, 'typeOf')
  self_names = get_all_values(names_resp, dcid, 'name', 'value')

  # Skip DCIDs that don't exist in the graph (bogus places)
  if not self_types and not self_names:
    continue

  self_type = get_best_type(self_types)
  self_name = self_names[0] if self_names else ''

  parents = []
  for anc_dcid in ancestors_map.get(dcid, []):
    if anc_dcid == dcid:
      continue

    anc_types = get_all_values(types_resp, anc_dcid, 'typeOf')
    anc_type = get_best_type(anc_types)
    anc_names = get_all_values(names_resp, anc_dcid, 'name', 'value')
    anc_name = anc_names[0] if anc_names else ''

    if anc_type in PLACE_TYPE_RANK:
      parents.append({
          'dcid': anc_dcid,
          'type': anc_type,
          'name': anc_name,
          'rank': PLACE_TYPE_RANK[anc_type]
      })

  parents.sort(key=lambda x: x['rank'])
  for p in parents:
    del p['rank']

  result_data.append({
      'node': dcid,
      'info': {
          'self': {
              'dcid': dcid,
              'type': self_type,
              'name': self_name
          },
          'parents': parents
      }
  })
```
Similarly, could you add some more inline comments explaining these code blocks?
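As one shape those comments could take, here is a compact, commented walk-through of the assembly step with stub inputs (the helper name `assemble_place_info` and the sample data are illustrative, not from the PR):

```python
# Trimmed stand-in for the module-level rank table.
PLACE_TYPE_RANK = {"County": 10, "State": 20, "Country": 30}


def assemble_place_info(dcid, self_type, self_name, ancestors):
  """Build one entry of the get_place_info result.

  `ancestors` maps ancestor dcid -> (type, name). Ancestors with types
  outside PLACE_TYPE_RANK are dropped, and parents are ordered
  most-specific-first before the helper 'rank' key is stripped.
  """
  parents = [
      {"dcid": a, "type": t, "name": n, "rank": PLACE_TYPE_RANK[t]}
      for a, (t, n) in ancestors.items()
      if t in PLACE_TYPE_RANK
  ]
  parents.sort(key=lambda p: p["rank"])  # lower rank = more specific
  for p in parents:
    del p["rank"]  # rank was only needed for ordering
  return {
      "node": dcid,
      "info": {
          "self": {"dcid": dcid, "type": self_type, "name": self_name},
          "parents": parents,
      },
  }


info = assemble_place_info(
    "geoId/06085", "County", "Santa Clara County",
    {"country/USA": ("Country", "United States"),
     "geoId/06": ("State", "California")})
# State (20) should precede Country (30) in the parents list.
print([p["dcid"] for p in info["info"]["parents"]])
```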
This PR migrates two backend service functions, `get_place_info` and `get_series_dates`, from the v1 API to the v2 API.

Notes:
- Some place names differ between responses: `United States` vs `United States of America`, `New York` vs `New York City`, etc.

Files changed:
- `server/services/datacommons.py` (main logic changes are here)
- `server/services/discovery.py` (removed the migrated v1 endpoints)
- `shared/lib/constants.py`
- `server/tests/migration_verification_test.py`

Also, I was having trouble updating golden files locally (the script was erroring out due to timeout issues in staging), so as a workaround I triggered cloud builds through `build/ci/cloudbuild.update_nl_goldens.yaml` and brought the updated goldens into this PR.