Data release US CPI pvmap fix #1906
@HarishC727: Please add a description of the work done in this PR. Please ensure the changes you are making are also captured in the README file.
Summary of Changes
Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!
This pull request addresses a critical data ingestion problem for US CPI data, specifically concerning the accurate mapping of geographical place names. By enhancing the …
Code Review
The pull request effectively addresses the issue of place resolution in the US CPI data by adding comprehensive place name mappings to us_cpi_pvmap.csv. This approach of including both quoted and unquoted versions of place names is a robust solution for handling variations in source data. The change in us_cpi_metadata.csv to comment out the places_resolved_csv reference indicates a shift in the place resolution strategy, which aligns with the changes in the pvmap. However, there are a couple of issues in the us_cpi_pvmap.csv that need attention: an incomplete entry for a place and an unnecessary empty line.
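Both issues flagged here (a row that stops after the place name, and an empty row) could be caught before review with a small check. The following is a minimal sketch, not part of the PR: the function name `find_incomplete_rows` is hypothetical, and it assumes each pvmap row needs at least three non-empty columns (key, property, value), which matches the rows shown in this review but may not cover every pvmap layout.

```python
import csv
import io


def find_incomplete_rows(pvmap_text):
    """Return (line_number, row) pairs for rows missing a key, property, or value.

    Assumption: a valid row has at least three non-empty leading columns.
    """
    problems = []
    reader = csv.reader(io.StringIO(pvmap_text))
    for row in reader:
        fields = [field.strip() for field in row]
        if len(fields) < 3 or not all(fields[:3]):
            # reader.line_num is the number of the source line just read.
            problems.append((reader.line_num, row))
    return problems


# Sample rows copied from the diff under review, including the two bad ones.
sample = '''Detroit-Warren-Dearborn,observationAbout,geoId/C19820
"Detroit-Warren-Dearborn, MI",observationAbout,geoId/C19820
Chicago-Naperville-Elgin,observationAbout,geoId/C16980
"Chicago-Naperville-Elgin, IL-IN-WI",
Pacific(4),observationAbout,usc/PacificDivision
,
'''

for line_num, row in find_incomplete_rows(sample):
    print(f"line {line_num}: incomplete row {row}")
```

Using `csv.reader` rather than a plain string split keeps the quoted place names intact, so a comma inside `"Chicago-Naperville-Elgin, IL-IN-WI"` is not mistaken for a column separator.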
Detroit-Warren-Dearborn,observationAbout,geoId/C19820
"Detroit-Warren-Dearborn, MI",observationAbout,geoId/C19820
Chicago-Naperville-Elgin,observationAbout,geoId/C16980
"Chicago-Naperville-Elgin, IL-IN-WI",
The entry for "Chicago-Naperville-Elgin, IL-IN-WI" appears to be incomplete. It is missing the observationAbout property and its corresponding value (e.g., a geoId). This will likely lead to parsing errors or incorrect data mapping during processing.
"Chicago-Naperville-Elgin, IL-IN-WI",observationAbout,geoId/C16980
West South Central(4),observationAbout,usc/WestSouthCentralDivision
Mountain(4),observationAbout,usc/MountainDivision
Pacific(4),observationAbout,usc/PacificDivision
,
LGTM
The deletions happened because place names in the source are wrapped in double quotes and contain a comma, while the pvmap held only the part of the name before the comma, so these places were not picked up during processing.
Example:
Source: "Detroit-Warren-Dearborn, MI"
Pvmap: Detroit-Warren-Dearborn
While debugging, I found that the source value was being split at the comma, so the place was not resolved.
I added the full place name to the pvmap, and the issue is resolved.
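The failure mode described above can be reproduced in a few lines. This is a sketch under stated assumptions and not the import tool's actual code: the source row layout (a quoted place name followed by a value), the value `100.0`, and the one-entry `pvmap` dictionary are illustrative only. The place name and geoId are the ones from this PR.

```python
import csv
import io

# Illustrative source row: quoted place name containing a comma, then a value.
line = '"Detroit-Warren-Dearborn, MI",100.0'

# Naive splitting breaks the place name at the embedded comma.
naive_fields = line.split(",")

# A CSV-aware parser keeps the quoted field whole and strips the quotes.
parsed_fields = next(csv.reader(io.StringIO(line)))

# Hypothetical pvmap lookup keyed on the full place name.
pvmap = {"Detroit-Warren-Dearborn, MI": "geoId/C19820"}

print(naive_fields[0], "->", pvmap.get(naive_fields[0]))
print(parsed_fields[0], "->", pvmap.get(parsed_fields[0]))
```

With the naive split, the first field is `"Detroit-Warren-Dearborn` (leading quote included), which matches no pvmap key, so the lookup returns `None`. With the CSV-aware parse, the key is the full name and the lookup resolves to `geoId/C19820`.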