Summary
Add a background repair mechanism so full-replica regions periodically compare state and heal missed writes, delayed deletes, or partial bootstrap gaps.
Why
Outbox retries help with transient delivery failures, but they do not replace periodic reconciliation. A repair pass is the safety net that makes eventual consistency converge.
Scope
- choose a repair unit (manifest range, time window, project prefix, or hash partition)
- compare local and remote manifests efficiently
- copy missing blobs and reconcile stale metadata
- make repair incremental and rate-limited so it does not impact hot build traffic
- expose repair metrics and add e2e coverage for healing a deliberately diverged node
Notes
This should complement, not replace, normal asynchronous replication.
Summary
Add a background repair mechanism so full-replica regions periodically compare state and heal missed writes, delayed deletes, or partial bootstrap gaps.
Why
Outbox retries help with transient delivery failures, but they do not replace periodic reconciliation. A repair pass is the safety net that makes eventual consistency converge.
Scope
Notes
This should complement, not replace, normal asynchronous replication.