feature: Layer files are deduplicated on creation based on the science data#268
Merged
alastairtree merged 14 commits intomainfrom May 1, 2026
Merged
feature: Layer files are deduplicated on creation based on the science data#268alastairtree merged 14 commits intomainfrom
alastairtree merged 14 commits intomainfrom
Conversation
…librate with database When running SetQualityAndNaN calibration twice with SaveMode.LocalAndDatabase, the second run produced identical data CSV content. DBIndexedDatastoreFileManager found a hash match at v001 in the DB and redirected the handler back to v001, while the layer JSON had already been written referencing data_filename=v002.csv. Fixed by allowing layer files version to be locked. Bumps version to 5.1.1.
Previously, calibration layers were pre-versioned before MATLAB was called, which was fragile and required callers to predict the next available slot upfront. Now layers are always generated at v001 and deduplication/versioning happens in the datastore managers. Two layers are considered identical when their companion CSV hash and content date match. Key changes: - Add content-identity protocol to IFilePathHandler (get_content_identity, get_stored_content_identity, prepare_for_version, get_storage_meta, is_version_blocked_by_sibling) - CalibrationLayerPathHandler overrides these: JSON identity is the companion CSV hash; prepare_for_version rewrites data_filename in JSON when the assigned version differs from v001; is_version_blocked_by_sibling ensures JSON and CSV always land on the same version slot - DatastoreFileManager and DBIndexedDatastoreFileManager updated to use the new protocol for dedup and sibling-aware versioning - Remove upfront set_layer_to_next_viable_version from CalibrationJob and the raise_if_resequenced guard from SetQualityAndNaNCalibration - Remove version-locking from VersionedPathHandler - Tests updated throughout to reflect that MATLAB always receives v001 paths and version bumping is verified via datastore output
alastairtree
commented
Apr 28, 2026
e51c648 to
ac9b987
Compare
Contributor
|
Coverage Report (3.14)
Minimum allowed coverage is Generated by 🐒 cobertura-action against ee0cc09 |
For calibration layer JSON files the content identity for deduplication is now the companion CSV data hash, not the JSON file hash. This means re-versioned JSONs (which only differ in their data_filename reference) correctly deduplicate against an existing record. Changes: - CalibrationLayerPathHandler.get_content_identity: use raw JSON parsing to read data_hash from metadata; fallback to companion CSV hash; final fallback to JSON file hash when neither is available (e.g. empty stub files in the datastore) - CalibrationLayerPathHandler.prepare_for_version: switch to raw JSON parsing so it works on minimal test JSONs and files without a co-located companion CSV - DBIndexedDatastoreFileManager: dedup comparison now also checks file_meta["data_file_hash"] so that records written with the CSV hash stored in metadata are found; stores data_file_hash in file_meta whenever content identity differs from the raw file hash - Tests: fix CSV content and hash computation to survive the pandas round-trip; consolidate duplicate imports
… identity The companion CSV hash for calibration layers is now stored directly in file.hash (via get_content_identity), making the separate data_file_hash field in file_meta redundant. Pre-existing DB records have hash=csv_hash so the dedup comparison reduces to the standard f.hash == identity_hash check.
…r pairs All tests now create genuine CalibrationLayer JSON+CSV pairs (with computed data_hash) via a shared write_calibration_layer_pair helper in tests/util/miscellaneous.py. Removed _generate_layer_json, empty touch() stubs, and raw CSV strings. Tests that need a half-written datastore state (only-JSON, only-CSV) create a real pair then delete the unwanted file.
…handler _companion_csv_path and get_content_identity now use CalibrationLayer.from_file(load_contents=False) to read data_filename and data_hash rather than raw json.loads, keeping the handler strongly typed and consistent with the rest of the calibration layer API.
Collaborator
Author
|
@mhairifin this is a second attempt to replace #265 |
# Conflicts: # tests/test_DBIndexedDatastoreFileManager.py
Collaborator
Author
|
I have now tested this and it migrates all the layer files to the new format and they are now deduped on save as expected. @mhairifin I am merging but please do a review and i will pick up and fixes later |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
…librate with database
When running SetQualityAndNaN calibration twice with SaveMode.LocalAndDatabase,
the second run produced identical data CSV content. DBIndexedDatastoreFileManager
found a hash match at v001 in the DB and redirected the handler back to v001,
while the layer JSON had already been written referencing data_filename=v002.csv.
Fixed by allowing layer files version to be locked.
Bumps version to 5.1.1.