[MongoDB Storage] Pre-calculate checksums when compacting #341
Currently, checksums are calculated by summing over all data in each bucket. We then cache the result in memory and update it incrementally as new data arrives.
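For illustration, a minimal sketch of this incremental scheme, assuming an additive 32-bit checksum per bucket (all names here are hypothetical, not the actual storage API):

```typescript
// Illustrative only: an additive 32-bit bucket checksum, computed once by
// scanning all operations, then folded forward incrementally.
interface BucketChecksumState {
  lastOpId: bigint; // highest op id included in the checksum
  checksum: number; // running 32-bit additive checksum
  opCount: number;  // number of operations summed so far
}

const cache = new Map<string, BucketChecksumState>();

function addOp(state: BucketChecksumState, opId: bigint, opChecksum: number) {
  // Additive checksums wrap at 2^32, so new operations can be folded in
  // without re-reading the data that was already summed.
  state.checksum = (state.checksum + opChecksum) >>> 0;
  state.lastOpId = opId;
  state.opCount += 1;
}
```

The expensive part is populating `checksum` the first time: that initial scan is what times out on large buckets.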
The issue is that for large buckets, the initial summing can be very slow and time out. #338 mitigates this by increasing the timeout, but it can still cause large delays when users connect after the process has been restarted.
We could theoretically keep a checksum per bucket up-to-date while replicating, but in practice we may need older checksums for the last minute or two, which this won't provide.
So the workaround here is to pre-compute checksums for each bucket as part of the `compact` process, requiring very little additional overhead. If we assume a daily compact job, this would give a cached checksum that covers most cases, unless the majority of the bucket was created in the last day.

Additionally, this starts calculating some stats per bucket: the total number and size of operations at the last compact, and since then. This is not 100% accurate/consistent in all cases, but it would be a starting point for scheduling more incremental/on-demand compact jobs based on the number of new operations in each bucket (future PR).
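As a rough sketch of what compact could persist per bucket, assuming the Node.js MongoDB driver; all field and function names here are illustrative, not the actual schema:

```typescript
import { Collection, Long } from 'mongodb';

// Hypothetical shape of the per-bucket state written during compact.
interface BucketStateDocument {
  _id: string;                // bucket name
  compacted_op_id: Long;      // last op covered by the pre-computed checksum
  compacted_checksum: number; // additive checksum up to compacted_op_id
  compacted_op_count: number; // total operations at last compact
  compacted_size: number;     // total size of operations at last compact
  ops_since_compact: number;  // maintained incrementally by replication
  size_since_compact: number;
}

// After compacting a bucket, persist the checksum so that a later checksum
// request only needs to sum operations newer than compacted_op_id.
async function storeCompactedState(
  buckets: Collection<BucketStateDocument>,
  bucket: string,
  opId: Long,
  checksum: number,
  opCount: number,
  size: number
): Promise<void> {
  await buckets.updateOne(
    { _id: bucket },
    {
      $set: {
        compacted_op_id: opId,
        compacted_checksum: checksum,
        compacted_op_count: opCount,
        compacted_size: size,
        ops_since_compact: 0,
        size_since_compact: 0,
      },
    },
    { upsert: true }
  );
}
```

With something like this in place, a checksum request at an op id at or beyond `compacted_op_id` only needs to sum the operations added since the last compact, rather than re-scanning the whole bucket.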
TODO:
Alternatives
We could apply a caching technique similar to the current in-memory caching: Cache a series of past checksums, and expire them occasionally.
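A minimal sketch of that alternative, assuming an in-memory map keyed by bucket and op id with time-based expiry (names and the retention policy are assumptions):

```typescript
// Illustrative only: a rolling history of (bucket, opId) -> checksum
// entries, expired periodically.
interface HistoricalChecksum {
  checksum: number; // additive checksum up to this op id
  cachedAt: number; // Date.now() at insertion time
}

// key: `${bucket}/${opId}`
const checksumHistory = new Map<string, HistoricalChecksum>();

function recordChecksum(bucket: string, opId: bigint, checksum: number) {
  checksumHistory.set(`${bucket}/${opId}`, { checksum, cachedAt: Date.now() });
}

function expireOlderThan(maxAgeMs: number) {
  const cutoff = Date.now() - maxAgeMs;
  for (const [key, entry] of checksumHistory) {
    if (entry.cachedAt < cutoff) {
      checksumHistory.delete(key);
    }
  }
}
```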
Caveats with this approach: