Fix unaccessed RI key data lost after second checkpoint purge #1756
Draft
tiagonapoli wants to merge 3 commits into dev
After recovery, if an RI key is never accessed before a second checkpoint:
- SnapshotAllTreesForCheckpoint skips it (not in liveIndexes)
- PurgeOldCheckpointSnapshots deletes the old checkpoint snapshot
- Next access fails with 'range index not found'; the data is lost

This test is expected to FAIL (no fix included).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Verify disk file state at each stage: snapshot files after first checkpoint, LiveIndexCount after recovery, snapshot purge after second checkpoint, and data.bftree survival.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Add file-level assertions to RIUnaccessedKeyAfterRecoveryAndSecondCheckpointTest verifying snapshot state at each stage.

Add RIUnaccessedKeyLostAfterSecondRecoveryTest demonstrating that an unaccessed RI key is lost after: checkpoint -> recover -> access only other keys -> second checkpoint (purges old snapshot) -> second recover (no snapshot to restore from).

Both tests currently fail, documenting the bug.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Summary
After checkpoint recovery, if an RI key is never accessed before a second checkpoint is taken, its data is permanently lost.
Root Cause
BfTree restore is lazy: trees are only restored when accessed. But `PurgeOldCheckpointSnapshots` (called at `CheckpointCompleted`) deletes all `snapshot.*.bftree` files that don't match the current checkpoint token. Since the unrestored tree was never registered in `liveIndexes`, `SnapshotAllTreesForCheckpoint` skips it, so no new snapshot is created. The old snapshot is then purged, and no `flush.bftree` exists either. The next access fails with `ERR range index not found`.

Scenario:
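1. First checkpoint: writes `snapshot.{token1}.bftree`.
2. Recovery: flag `Recovered=true`, `TreeHandle=0`.
3. Some keys are accessed (restored and registered in `liveIndexes`); others are skipped.
4. `SnapshotAllTreesForCheckpoint`: only snapshots trees in `liveIndexes`, so unaccessed trees are skipped.
5. `PurgeOldCheckpointSnapshots`: deletes `snapshot.{token1}.bftree` (doesn't match token2).
6. No `flush.bftree` exists either, so the data is lost.

Testing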
`RIUnaccessedKeyAfterRecoveryAndSecondCheckpointTest`: currently fails with `ERR range index not found`, confirming the bug.
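
The scenario described under Root Cause can be sketched as a toy in-memory model. This is not the project's code and not the test in this PR: `ToyStore`, `run_scenario`, and the key names are hypothetical, the "disk" is a plain dictionary, and the assumption that a lazy restore looks for any surviving snapshot of the key is a simplification. It only illustrates how snapshotting live trees and then purging by token leaves an unaccessed key with nothing to restore from.

```python
class ToyStore:
    """Toy model of the scenario; names mirror the PR text, the logic is assumed."""

    def __init__(self):
        self.disk = {}          # (key, token) -> data, stands in for snapshot.{token}.bftree
        self.live_indexes = {}  # key -> data, trees currently materialized in memory

    def put(self, key, data):
        self.live_indexes[key] = data

    def checkpoint(self, token):
        # Like SnapshotAllTreesForCheckpoint: only trees in live_indexes get a new snapshot.
        for key, data in self.live_indexes.items():
            self.disk[(key, token)] = data
        # Like PurgeOldCheckpointSnapshots: drop every snapshot whose token is not current.
        for name in [n for n in self.disk if n[1] != token]:
            del self.disk[name]

    def recover(self):
        # Lazy restore: nothing is loaded until a key is accessed.
        self.live_indexes = {}

    def get(self, key):
        if key not in self.live_indexes:
            # Assumed restore path: use any surviving snapshot of this key.
            snapshots = [d for (k, _), d in self.disk.items() if k == key]
            if not snapshots:
                raise KeyError("range index not found")
            self.live_indexes[key] = snapshots[0]
        return self.live_indexes[key]


def run_scenario():
    store = ToyStore()
    store.put("accessed", "A")
    store.put("unaccessed", "B")
    store.checkpoint("token1")
    store.recover()
    store.get("accessed")       # restored and registered in live_indexes
    store.checkpoint("token2")  # snapshots "accessed" only, then purges token1 files
    try:
        store.get("unaccessed")
        return "ok"
    except KeyError as exc:
        return exc.args[0]
```

In this model, `run_scenario()` ends in the "range index not found" branch because the only snapshot of the unaccessed key carried the old token and was purged before the key was ever restored.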