anishshri-db
Contributor

What changes were proposed in this pull request?

Add an option to limit the number of deletions per maintenance operation in the RocksDB state store provider.

Why are the changes needed?

In some cases, changelog deletion can take a very long time. While it is in progress, full snapshots cannot be uploaded for that partition, which affects recovery/replay scenarios. The problem is much more apparent on resource-constrained clusters. This change adds an option that caps deletions per maintenance invocation, so cleanup can proceed incrementally.

Does this PR introduce any user-facing change?

No

How was this patch tested?

Added unit tests

[info] Run completed in 17 seconds, 591 milliseconds.
[info] Total number of tests run: 8
[info] Suites: completed 1, aborted 0
[info] Tests: succeeded 8, failed 0, canceled 0, ignored 0, pending 0
[info] All tests passed.

Was this patch authored or co-authored using generative AI tooling?

No

@anishshri-db changed the title from "[SPARK-53794] Add option to limit deletions per maintenance operation associated with rocksdb state provider" to "[SPARK-53794][SS] Add option to limit deletions per maintenance operation associated with rocksdb state provider" on Oct 3, 2025
@anishshri-db
Contributor Author

cc - @HeartSaVioR - PTAL, thx !

@anishshri-db anishshri-db requested a review from ericm-db October 3, 2025 18:57
  getStringConf(COMPRESSION_CONF),
- storeConf.reportSnapshotUploadLag)
+ storeConf.reportSnapshotUploadLag,
+ storeConf.maxVersionsToDelete)
Contributor

Can we check that minVersionToDelete <= maxVersionsToDelete and add a test?
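
A minimal sketch of what such a check could look like, assuming `-1` is the sentinel for "no limit" (as the `!= -1` test in the diff suggests). The object name, method name, and parameter types are hypothetical and not taken from the PR:

```scala
object DeletionLimitCheckSketch {
  // Assumption: -1 means "no per-run deletion limit", so there is nothing to compare.
  val NoLimit: Long = -1L

  /** Returns true when the configured limit is compatible with the minimum. */
  def isValid(minVersionToDelete: Long, maxVersionsToDelete: Long): Boolean =
    maxVersionsToDelete == NoLimit || minVersionToDelete <= maxVersionsToDelete
}
```

A unit test would then cover three cases: the unlimited sentinel, a limit at or above the minimum, and a limit below the minimum.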

.map(_._1)
.filter(_ <= maxSnapshotVersionPresent - numVersionsToRetain + 1)
.filter( v =>
if (maxVersionsToDelete != -1) {
Contributor

nit: you can probably do (maxVersionsToDelete != -1) && (v <= minSnapshotVersionPresent + maxVersionsToDelete)
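
For reference, a self-contained sketch of the two filters as a single boolean predicate. It assumes `-1` means "no limit", in which case every eligible version should pass the second filter, so the sentinel is tested with `== -1 ||` rather than `!= -1 &&`. The object and method names are hypothetical, and the inputs are plain parameters rather than the file manager's real state:

```scala
object ChangelogCleanupSketch {
  // Assumption: -1 means "no per-run deletion limit".
  val NoLimit: Int = -1

  /** Selects the versions that a single maintenance run would delete. */
  def versionsToDelete(
      versions: Seq[Long],
      maxSnapshotVersionPresent: Long,
      minSnapshotVersionPresent: Long,
      numVersionsToRetain: Int,
      maxVersionsToDelete: Int): Seq[Long] = {
    versions
      // Retention bound, as in the diff above.
      .filter(_ <= maxSnapshotVersionPresent - numVersionsToRetain + 1)
      // Per-run bound from the review comment; skipped when no limit is set.
      .filter { v =>
        maxVersionsToDelete == NoLimit ||
          v <= minSnapshotVersionPresent + maxVersionsToDelete
      }
  }
}
```

With versions 1 to 10, `numVersionsToRetain = 2`, `minSnapshotVersionPresent = 1`, and `maxVersionsToDelete = 3`, this keeps versions 1 through 4 for deletion in the current run and leaves the rest for later runs.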
