feat: move data normalization from Python consumers to Rust extractor… #1141
Annotations
4 warnings and 2 notices
Complete job
Node.js 20 actions are deprecated. The following actions are running on Node.js 20 and may not work as expected: actions/setup-python@v5, astral-sh/setup-uv@e92bafb6253dcd438e0484186d7669ea7a8ca1cc. Actions will be forced to run with Node.js 24 by default starting June 2nd, 2026. Node.js 20 will be removed from the runner on September 16th, 2026. Please check if updated versions of these actions are available that support Node.js 24. To opt into Node.js 24 now, set the FORCE_JAVASCRIPT_ACTIONS_TO_NODE24=true environment variable on the runner or in your workflow file. Once Node.js 24 becomes the default, you can temporarily opt out by setting ACTIONS_ALLOW_USE_UNSECURE_NODE_VERSION=true. For more information see: https://github.blog/changelog/2025-09-19-deprecation-of-node-20-on-github-actions-runners/
📢 Send notification to Discord
embed field value must be shorter than 1024, got 3426
[`accb054`](https://github.com/SimplicityGuy/discogsography/commit/accb054bd4aba1dd2300a05026a0aca2f881e869) feat: move data normalization from Python consumers to Rust extractor (#290) (#294)
* docs: add design spec for moving normalizer logic to Rust extractor (#290)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* docs: add implementation plan for normalizer-to-extractor migration (#290)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* feat(extractor): add normalize.rs with generic helpers
Add strip_at_prefixes, unwrap_container, and ensure_list functions
for transforming XML-style JSON conventions into a flat format.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* feat(extractor): add artist normalization to normalize.rs
Add a normalize_item_list helper and a normalize_artist function that flatten
members, groups, and aliases from the XML container format into flat arrays.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* feat(extractor): add label normalization to normalize.rs
Add a normalize_label function that strips @-prefixes from parentLabel
(via strip_at_prefixes) and flattens the sublabels container.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* feat(extractor): add master normalization to normalize.rs
Add normalize_string_list helper and normalize_master function to handle
@id stripping, artists container, genres, and styles flattening.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* feat(extractor): add release normalization to normalize.rs
Add normalize_release function handling artists, labels, master_id
extraction, genres, styles, extraartists, and formats with @-prefix
stripping. Includes full pipeline integration test.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* refactor(extractor): address code review — clarify doc, add edge case tests
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* feat(extractor): wire normalize_record into validator pipeline
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* refactor: simplify Python normalizer — structural normalization moved to Rust extractor
- Gut data_normalizer.py so it retains only year parsing and normalize_record()
- Replace extract_format_names with an inline list comprehension in graphinator
- Rewrite normalizer tests for simplified module
- Update all test fixtures to flat extractor output format
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix(extractor): reorder normalize after evaluate_rules to preserve rule path resolution
Rules use dot-notation paths like "genres.genre" that match the XML
structure. Moving normalize_record after evaluate_rules ensures rules
operate on the pre-normalized shape while the content hash still
reflects the normalized output consumers see.
Also update a stale comment in tableinator.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* test(extractor): add coverage tests for defensive branches in normalize.rs
Covers: non-object inputs to normalizers, string/number items in
normalize_item_list, bare string in unwrap_container, non-object
format items. Raises line coverage from 93.4% to 97.7%.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
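The generic helpers named in the normalize.rs commits (strip_at_prefixes, unwrap_container, ensure_list, normalize_string_list) might look roughly like the sketch below. It is a hypothetical, dependency-free illustration and not the extractor's code. A tiny `Value` enum stands in for a real JSON value type. Each helper's behavior is inferred from its name and the commit descriptions: recursive `@` stripping, `unwrap_container` taking the wrapper's inner key, bare values becoming one-item lists, and `normalize_string_list` dropping non-string items.

```rust
use std::collections::BTreeMap;

/// Hypothetical stand-in for a JSON value so the sketch needs no crates.
#[derive(Debug, Clone, PartialEq)]
pub enum Value {
    Null,
    Str(String),
    List(Vec<Value>),
    Obj(BTreeMap<String, Value>),
}

/// Sketch: rename "@attr" keys to "attr" (the XML-attribute convention),
/// recursing into nested objects and lists.
pub fn strip_at_prefixes(v: Value) -> Value {
    match v {
        Value::Obj(map) => Value::Obj(
            map.into_iter()
                .map(|(k, val)| {
                    // '@' is one byte, so slicing at 1 is a valid char boundary.
                    let key = if k.starts_with('@') { k[1..].to_string() } else { k };
                    (key, strip_at_prefixes(val))
                })
                .collect(),
        ),
        Value::List(items) => Value::List(items.into_iter().map(strip_at_prefixes).collect()),
        other => other,
    }
}

/// Sketch: null -> [], list -> itself, any single value -> [value].
pub fn ensure_list(v: Value) -> Vec<Value> {
    match v {
        Value::Null => Vec::new(),
        Value::List(items) => items,
        other => vec![other],
    }
}

/// Sketch: {"genre": [...]} -> [...]; a bare value becomes a one-item list.
pub fn unwrap_container(v: Value, inner_key: &str) -> Vec<Value> {
    match v {
        Value::Obj(mut map) => map.remove(inner_key).map(ensure_list).unwrap_or_default(),
        other => ensure_list(other),
    }
}

/// Sketch: unwrap a container and keep only its string items.
pub fn normalize_string_list(container: Value, inner_key: &str) -> Vec<String> {
    unwrap_container(container, inner_key)
        .into_iter()
        .filter_map(|item| match item {
            Value::Str(s) => Some(s),
            _ => None,
        })
        .collect()
}
```

The `genres.genre` path mentioned in the rule-ordering fix is the kind of wrapper this flattens. In this sketch, `{"genre": ["Rock", "Pop"]}` becomes `["Rock", "Pop"]`, and a single `{"style": "Punk"}` becomes `["Punk"]`.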
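The reordering in the fix(extractor) commit can be illustrated with a small sketch. Rules address the XML-shaped record with dot paths such as "genres.genre", so they must run before flattening. The content hash is then computed over the flattened record that consumers receive. Everything here is a hypothetical stand-in, not the extractor's code: the names `Value`, `resolve`, `normalize`, and `process`, treating a rule as "this path must exist", and `DefaultHasher` in place of whatever content hash the extractor actually uses.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::BTreeMap;
use std::hash::{Hash, Hasher};

/// Hypothetical, dependency-free stand-in for a JSON value.
#[derive(Debug, Clone, PartialEq, Hash)]
pub enum Value {
    Str(String),
    List(Vec<Value>),
    Obj(BTreeMap<String, Value>),
}

/// Follow a dot-notation path like "genres.genre" through nested objects.
pub fn resolve<'a>(mut cur: &'a Value, path: &str) -> Option<&'a Value> {
    for key in path.split('.') {
        let Value::Obj(map) = cur else { return None };
        cur = map.get(key)?;
    }
    Some(cur)
}

/// Sketch of flattening {"genres": {"genre": [...]}} into {"genres": [...]}.
pub fn normalize(raw: &Value) -> Value {
    let mut out = raw.clone();
    if let Value::Obj(map) = &mut out {
        for (field, inner_key) in [("genres", "genre"), ("styles", "style")] {
            let flat = match map.get(field) {
                Some(Value::Obj(container)) => match container.get(inner_key) {
                    Some(Value::List(items)) => items.clone(),
                    Some(single) => vec![single.clone()],
                    None => Vec::new(),
                },
                _ => continue, // absent or already flat: leave untouched
            };
            map.insert(field.to_string(), Value::List(flat));
        }
    }
    out
}

/// Stand-in content hash (the real extractor's hash function may differ).
pub fn content_hash(v: &Value) -> u64 {
    let mut h = DefaultHasher::new();
    v.hash(&mut h);
    h.finish()
}

/// Evaluate rules on the raw shape, then normalize, then hash the normalized output.
pub fn process(raw: &Value, rules: &[&str]) -> Option<(Value, u64)> {
    // 1. Rules first: their paths match the pre-normalized XML structure.
    if !rules.iter().all(|p| resolve(raw, p).is_some()) {
        return None;
    }
    // 2. Normalize afterwards so consumers receive the flat shape.
    let normalized = normalize(raw);
    // 3. Hash what consumers actually see.
    let hash = content_hash(&normalized);
    Some((normalized, hash))
}
```

With the old order, "genres.genre" would be resolved against the already-flattened record. At that point `genres` is a list, so the rule would fail even though the data is present.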
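The Discord step failed because the full commit message (3426 characters) was placed in a single embed field, and Discord documents a 1024-character limit for embed field values. Below is a minimal sketch of clamping a field value before sending. It assumes the limit is counted in Unicode characters (Rust `char`s). The function name and the ellipsis marker are hypothetical and not part of the workflow or the notification action.

```rust
/// Discord's documented maximum length for an embed field value.
pub const FIELD_VALUE_LIMIT: usize = 1024;

/// Clamp `value` to at most `limit` characters, marking the cut with an ellipsis.
/// Assumption: the limit is measured in Unicode scalar values, not bytes.
pub fn truncate_field(value: &str, limit: usize) -> String {
    if value.chars().count() <= limit {
        return value.to_string();
    }
    if limit == 0 {
        return String::new();
    }
    // Keep limit - 1 chars and spend the last slot on the ellipsis,
    // so the result is exactly `limit` chars long.
    let mut kept: String = value.chars().take(limit - 1).collect();
    kept.push('…');
    kept
}
```

Truncating by `char` rather than by byte index avoids splitting a multi-byte character. If the action treats the bound as strict ("shorter than 1024"), pass `FIELD_VALUE_LIMIT - 1` instead.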
🚀 Build and push Docker image to GitHub Container Registry - discogsography/schema-init:
schema-init/Dockerfile#L92
SecretsUsedInArgOrEnv: Do not use ARG or ENV instructions for sensitive data (ENV "POSTGRES_PASSWORD")
More info: https://docs.docker.com/go/dockerfile/rule/secrets-used-in-arg-or-env/
🚀 Build and push Docker image to GitHub Container Registry - discogsography/schema-init:
schema-init/Dockerfile#L92
SecretsUsedInArgOrEnv: Do not use ARG or ENV instructions for sensitive data (ENV "NEO4J_PASSWORD")
More info: https://docs.docker.com/go/dockerfile/rule/secrets-used-in-arg-or-env/
📊 Collect metrics
Docker cache hit for schema-init
📊 Collect metrics
Service: schema-init, Duration: 184s, Cache Used: true