Releases: datacommonsorg/import
v0.4.0
What's Changed
- Import tool fixes by @vish-cs in #470
- Ingestion pipeline fixes by @vish-cs in #471
- Refactor ingestion pipeline by @vish-cs in #473
- Add initial Data Commons Platform database simple import support by @dwnoble in #469
- Updated cdc import script to write nodes to a specified DCP instance by @dwnoble in #476
- Add unit/integration test for ingestion pipeline by @vish-cs in #474
- Use partitioned DML for efficient delete by @vish-cs in #478
- Fix blue-green import (undefined variables) by @jm-rivera in #479
- Skip flattening for mutations across imports by @vish-cs in #480
- Add constraint property graph mutation by @vish-cs in #488
- feat: dc-import-tool mcf resolution was slow for large number of mcfs by @rohitkumarbhagat in #486
- Add a readme for graph ingestion pipeline by @vish-cs in #489
- Update cloud build config to use parameters by @vish-cs in #490
Full Changelog: v0.3.0...v0.4.0
v0.3.0
What's Changed
- Create spanner tables in ingestion pipeline by @n-h-diaz in #426
- Add basic ingestion logic for individual imports by @vish-cs in #428
- Fix mcf parser error when dealing with lists of quoted strings by @jm-rivera in #430
- Support custom ID namespace, SVG prefix, root SVG name by @jm-rivera in #431
- Don't write placeholder nodes to spanner by @n-h-diaz in #435
- Add test for nodes with type Thing by @n-h-diaz in #432
- Add human-readable timestamp fields to RuntimeMetadata by @rohitkumarbhagat in #440
- Update spanner ddl creation by @n-h-diaz in #433
- Add retry logic using Failsafe library for API connection failures by @rohitkumarbhagat in #442
- Ingestion dataflow pipeline integration with workflow by @vish-cs in #443
- feat: Ignore bin directories by @rohitkumarbhagat in #446
- Fix Report Generated At in Summary report by @rohitkumarbhagat in #445
- fix: align cloudbuild jar path with maven output by @rohitkumarbhagat in #447
- Handle deletions in ingestion pipeline by @vish-cs in #444
- Add import workflow tables to spanner schema by @vish-cs in #450
- fix: avoid duration parse failure by @rohitkumarbhagat in #451
- Add version logging to Processor by @rohitkumarbhagat in #452
- Refactor ExternalIdResolver to use v2 resolve apis by @keyurva in #455
- Migrate ApiHelper to /v2/node api by @rohitkumarbhagat in #453
- Data loading improvements (1/3): vectorising data processing by @jm-rivera in #448
- Data loading improvements (2/3): blue-green strategy by @jm-rivera in #449
- Add graph ingestion metrics counters by @vish-cs in #458
- Add cloudbuild config for dataflow template by @vish-cs in #459
- Updates to ingestion for dc_graph_2025_11_07 by @n-h-diaz in #457
- Fix cloud build by @vish-cs in #460
- Add spanner params to flex template metadata by @vish-cs in #462
- Add DC API key to cloudbuild config by @vish-cs in #463
- Add Cloud SQL private IP connection support by @echo-chamber0 in #464
- Use prod V2Resolve API for CoordinatesResolver. by @clincoln8 in #467
New Contributors
- @jm-rivera made their first contribution in #430
- @echo-chamber0 made their first contribution in #464
- @clincoln8 made their first contribution in #467
Full Changelog: 0.1...v0.3.0
0.1
0.1-alpha.1k
Includes the following fixes:
- Support for CSVs from MS Excel (with BOM characters)
- Allows empty column names in CSV
- Propagate exceptions better so we don't fail silently when bad things happen
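The first two fixes above can be illustrated with a short sketch. This is not the import tool's own code: `read_header` is a hypothetical helper, and it assumes UTF-8 input. It shows that decoding with Python's `utf-8-sig` codec drops the byte order mark Excel writes at the start of a CSV, and that an empty column name survives as an empty string instead of causing a failure.

```python
import csv
import io


def read_header(raw):
    """Return the header row of a CSV given as bytes (hypothetical helper)."""
    # "utf-8-sig" strips a leading BOM if present and otherwise behaves like UTF-8.
    text = raw.decode("utf-8-sig")
    reader = csv.reader(io.StringIO(text, newline=""))
    return next(reader)


# An Excel-style export: BOM first, and an empty name for the second column.
excel_csv = b"\xef\xbb\xbfplace,,value\r\ngeoId/06,x,1\r\n"
print(read_header(excel_csv))
```

Without the BOM-aware decode, the first column name would carry an invisible `\ufeff` prefix and would not match the expected name.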
0.1-alpha.1j
Bug Fixes:
- Fixes exceptions thrown when a series has a mix of numeric and non-numeric values
0.1-alpha.1h
Highlights
- Added support for categorical variables (SVs with statType: measurementResult)
- Performance optimizations with ~25% expected speed gains
Changelog
New checks and verifications
- Added support for categorical variables
  - Any non-numeric StatVarObservation values must now be explicitly allowed with the --allow-non-numeric-obs-values=true flag.
  - Categorical variables (SVs with statType: measurementResult) can be checked for existence by specifying --check-measurement-result=true.
  - Some common checks (inconsistent values, date gaps) apply to all time series, including categorical variables.
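As a rough illustration of the opt-in behaviour described above, the sketch below flags non-numeric observation values unless they are explicitly allowed. The function names are hypothetical, and treating "numeric" as "parses as a float" is an assumption; the release notes do not say how the tool decides.

```python
def is_numeric(value):
    """Assumption: a value counts as numeric if float() can parse it."""
    try:
        float(value)
        return True
    except ValueError:
        return False


def flag_non_numeric(values, allow_non_numeric=False):
    """Return the values that would be reported as errors.

    allow_non_numeric stands in for --allow-non-numeric-obs-values=true.
    """
    if allow_non_numeric:
        return []
    return [v for v in values if not is_numeric(v)]


observations = ["1.5", "High", "3"]
print(flag_non_numeric(observations))
print(flag_non_numeric(observations, allow_non_numeric=True))
```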
Speed Optimizations
- Added a heuristic to date checking, vastly improving speed for the “correct-path”. Expected improvement is ~25%
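The notes do not describe the heuristic itself. One plausible shape, shown here purely as an assumption, is a cheap structural check that accepts the most common well-formed dates immediately and sends everything else to a slower, authoritative parser. The pattern list and function names below are hypothetical.

```python
from datetime import datetime

# Hypothetical pattern list for the slow path; the tool's real list is not given here.
SLOW_PATTERNS = ("%Y", "%Y-%m", "%Y-%m-%d")


def fast_path_ok(value):
    """Cheaply accept only dates that are certainly valid: YYYY, YYYY-MM, YYYY-MM-DD."""
    if len(value) not in (4, 7, 10) or not value.isascii():
        return False
    parts = value.split("-")
    if [len(p) for p in parts] != [4, 2, 2][: len(parts)]:
        return False
    if not all(p.isdigit() for p in parts):
        return False
    nums = [int(p) for p in parts]
    if nums[0] < 1:
        return False
    if len(nums) >= 2 and not 1 <= nums[1] <= 12:
        return False
    # Days 1..28 exist in every month, so no calendar lookup is needed.
    if len(nums) == 3 and not 1 <= nums[2] <= 28:
        return False
    return True


def is_valid_date(value):
    """Try the cheap check first, then fall back to full parsing."""
    if fast_path_ok(value):
        return True
    for pattern in SLOW_PATTERNS:
        try:
            datetime.strptime(value, pattern)
            return True
        except ValueError:
            continue
    return False
```

The fast path is deliberately conservative: it never accepts a date the slow path would reject, so a date such as 2020-02-29 simply takes the slower route.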
Summary Report changes
- Added “expand/collapse all” buttons that work for all collapsible tags on the report
- Changed chart style to highlight data points and support datasets with many data points
Bug fixes
- Fixed issue where the line number of the last CSV row was incorrect
- Fixed issue where logs were duplicated when more than two values had the same date
- Fixed flaky ordering of output in some test goldens
0.1-alpha.1g
What’s new:
- Improvements to speed when using the tool:
  - Allow external IDs to be resolved using local side MCF. This removes the need to first get new external IDs updated in the reconciliation API, which could take days before the tool could verify those IDs.
  - Optimized performance for an estimated ~10% raw speed boost.
- Expanded checks to catch more issues and support additional data types:
  - Existence checks for “observationAbout” references (behind a new flag -ep)
  - Expanded validation to recently introduced statTypes (confidence interval {upper, lower} limit, kurtosis, skewness, growth rate).
  - Support schemaless SVs with init-cap mprop
- Added documentation for:
  - Tool usage (docs/usage.md)
  - Error counters (docs/counters.md)
  - Complex Values (docs/complex_values.md)
- Summary Report improvements:
  - Added missing observationPeriods field
  - Added table of contents
  - Made tables sortable on-click
  - Separated the display of time series facets
  - Displayed human-readable names for places, taking priority over dcid
  - Improved sample place heuristics
- Bug fixes
  - Fix issue where a time series with a single datapoint smaller than -1 would cause a fatal crash
  - Fix order of census area code for resolution
0.1-alpha.1f
What's new:
- Fix HTTP exception in DC calls in Java 11.x version
- Fix runtime errors in chart generation
- Remove the requirement for StatVars to have a populationType
- Fix bug in percentile* statType validation
0.1-alpha.1e
This release includes:
- Support for generating an HTML Summary Report
  - Enabled by default. To disable, pass -sr=false
- Upgrade log4j version to 2.16.0
- Minor bug fixes and updates
0.1-alpha.1d
This release includes:
- Support for Stat Checker
  - Enabled by default. To disable, pass -s=false
- Support for Resolution (aka resolving local-refs and generating dcids for nodes)
  - Defaults to local mode (-r=LOCAL), for use when you already reference place DCIDs.
  - To resolve external IDs to DCIDs for places, pass -r=FULL. This will make Recon API calls.
  - To disable resolution, pass -r=NONE
- Support for parallel processing of CSV files
  - Parallel processing happens when there are multiple CSV files
  - Defaults to no parallelism. Set -n=<number-of-threads> to increase parallelism
- More batching for existence checks
  - This is enabled by default. To disable, pass -e=false
- Changes in default output directory
  - Default is now dc_generated/ in the current directory. To change, set -o=<your-directory>
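The defaults listed above can be summarized in a small sketch. This is not the tool's argument parser. `effective_flags` is a hypothetical helper that models only the flags named in this release, and representing "no parallelism" as one thread is an assumption.

```python
# Defaults as described in the notes above.
DEFAULTS = {
    "-s": "true",           # Stat Checker enabled
    "-r": "LOCAL",          # resolution mode
    "-n": "1",              # assumption: "no parallelism" means one thread
    "-e": "true",           # batched existence checks enabled
    "-o": "dc_generated/",  # output directory
}
RESOLUTION_MODES = {"LOCAL", "FULL", "NONE"}


def effective_flags(args):
    """Overlay -flag=value arguments on the documented defaults."""
    flags = dict(DEFAULTS)
    for arg in args:
        name, sep, value = arg.partition("=")
        if not sep or name not in flags:
            raise ValueError(f"unrecognized argument: {arg}")
        flags[name] = value
    if flags["-r"] not in RESOLUTION_MODES:
        raise ValueError(f"unknown resolution mode: {flags['-r']}")
    return flags


# Resolve external IDs with four threads, leaving everything else at its default.
print(effective_flags(["-r=FULL", "-n=4"]))
```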