@kev-cao
Contributor

@kev-cao kev-cao commented Dec 22, 2025

The OR recovery roachtest was failing because the schemachange workload could truncate a table while the roachtest deleted an SST only from the full backup. In certain scenarios, the SST deleted from the full backup would be skipped during the restore, because the span it covered was no longer covered after the truncation.

This commit teaches the OR recovery roachtest to instead delete an SST from every layer of the backup collection, which should make the test resilient to truncation.

Fixes: #159503

Release note: None

@cockroach-teamcity
Member

This change is Reviewable

@kev-cao kev-cao force-pushed the roachtest/or-recovery-fix branch from 11d4885 to f05d846 Compare December 22, 2025 20:40
@github-actions

Potential Bug(s) Detected

The three-stage Claude Code analysis has identified potential bug(s) in this PR that may warrant investigation.

Next Steps:
Please review the detailed findings in the workflow run.

Note: When viewing the workflow output, scroll to the bottom to find the Final Analysis Summary.

After you review the findings, please tag the issue as follows:

  • If the detected issue is real or was helpful in any way, please tag the issue with O-AI-Review-Real-Issue-Found
  • If the detected issue was not helpful in any way, please tag the issue with O-AI-Review-Not-Helpful

@github-actions github-actions bot added the o-AI-Review-Potential-Issue-Detected AI reviewer found potential issue. Never assign manually—auto-applied by GH action only. label Dec 22, 2025
@kev-cao kev-cao force-pushed the roachtest/or-recovery-fix branch from f05d846 to e0b8dd9 Compare December 22, 2025 20:45
@kev-cao kev-cao added O-AI-Review-Real-Issue-Found AI reviewer found real issue and removed o-AI-Review-Potential-Issue-Detected AI reviewer found potential issue. Never assign manually—auto-applied by GH action only. labels Dec 22, 2025
@kev-cao kev-cao changed the title from "roachtest: deflake online-restory recovery" to "roachtest: deflake online-restore recovery" Dec 22, 2025
@kev-cao kev-cao requested review from Copilot and msbutler December 23, 2025 20:18

Copilot AI left a comment

Pull request overview

This PR addresses a flaky roachtest in the online-restore recovery functionality. The test was failing when the schemachange workload truncated tables, because the test only deleted SSTs from the full backup layer. After truncation, the deleted SST could be skipped during restore if its span was no longer covered.

Key Changes:

  • Renamed deleteUserTableSST to deleteSSTFromBackupLayers to reflect the new behavior
  • Modified the function to delete one SST from each layer (directory) of the backup collection instead of just one SST from the full backup
  • Updated SQL query to use a CTE with regex-based directory extraction and window functions to select one SST per layer
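The "one SST per layer" selection described above can be illustrated outside SQL. The following is a minimal Go sketch, not the code from this PR: the function name `pickOneSSTPerLayer`, the example paths, and the choice to treat each SST's parent directory as a layer are all assumptions made for illustration. It mirrors the idea of partitioning files by directory and keeping the first row of each partition.

```go
package main

import (
	"fmt"
	"path"
	"sort"
	"strings"
)

// pickOneSSTPerLayer is a hypothetical helper (not the PR's implementation).
// It groups SST paths by parent directory, which this sketch treats as a
// backup layer, and returns the lexicographically first SST of each layer.
// Results are ordered by layer directory so the output is deterministic.
func pickOneSSTPerLayer(files []string) []string {
	firstByLayer := make(map[string]string)
	for _, f := range files {
		if !strings.HasSuffix(f, ".sst") {
			continue // skip manifests and other non-SST files
		}
		layer := path.Dir(f)
		if cur, ok := firstByLayer[layer]; !ok || f < cur {
			firstByLayer[layer] = f
		}
	}

	layers := make([]string, 0, len(firstByLayer))
	for layer := range firstByLayer {
		layers = append(layers, layer)
	}
	sort.Strings(layers)

	picked := make([]string, 0, len(layers))
	for _, layer := range layers {
		picked = append(picked, firstByLayer[layer])
	}
	return picked
}

func main() {
	// Illustrative paths only; a real backup collection layout may differ.
	files := []string{
		"/full/data/2.sst",
		"/full/data/1.sst",
		"/full/BACKUP_MANIFEST",
		"/incrementals/layer-1/data/9.sst",
		"/incrementals/layer-2/data/4.sst",
	}
	for _, f := range pickOneSSTPerLayer(files) {
		fmt.Println(f)
	}
}
```

Deleting one file from every group, rather than only from the full backup's group, is what lets the corruption be noticed even when the full backup's span is skipped after a truncation.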

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 4 comments.

| File | Description |
| --- | --- |
| pkg/cmd/roachtest/tests/mixed_version_backup.go | Refactored SST deletion logic to delete one SST from each backup layer, making the test resilient to table truncation |
| pkg/cmd/roachtest/tests/backup_restore_roundtrip.go | Updated function call to use the renamed deleteSSTFromBackupLayers method |

@kev-cao kev-cao force-pushed the roachtest/or-recovery-fix branch from e0b8dd9 to 1051aac Compare December 23, 2025 20:33
Collaborator

@msbutler msbutler left a comment

Looks good! If you haven't already done so, maybe run the test 5 times to make sure there are no further related flakes?

@kev-cao
Contributor Author

kev-cao commented Dec 23, 2025

@msbutler Yup, already ran it a couple of times! Although, to be honest, the flakes for these tests are so infrequent that I think it would have to run for quite a while to hit one. We usually go a couple of months before a flake on this test.

@kev-cao kev-cao added backport-25.3.x Flags PRs that need to be backported to 25.3 backport-25.4.x Flags PRs that need to be backported to 25.4 backport-26.1.x Flags PRs that need to be backported to 26.1 labels Dec 23, 2025
@kev-cao
Contributor Author

kev-cao commented Dec 23, 2025

TFTR!

bors r=msbutler

@craig
Contributor

craig bot commented Dec 23, 2025

@craig craig bot merged commit 82a0890 into cockroachdb:master Dec 23, 2025
25 checks passed
@blathers-crl

blathers-crl bot commented Dec 23, 2025

Based on the specified backports for this PR, I applied new labels to the following linked issue(s). Please adjust the labels as needed to match the branches actually affected by the issue(s), including adding any known older branches.


Issue #159503: branch-release-25.3, branch-release-25.4.


🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.

Labels

  • backport-25.3.x: Flags PRs that need to be backported to 25.3
  • backport-25.4.x: Flags PRs that need to be backported to 25.4
  • backport-26.1.x: Flags PRs that need to be backported to 26.1
  • O-AI-Review-Real-Issue-Found: AI reviewer found real issue
  • target-release-26.2.0

Development

Successfully merging this pull request may close these issues.

roachtest: backup-restore/online-restore-recovery failed

3 participants