Add anti-entropy repair for full-replica cache replication #9

Description

@pepicrft

Summary

Add a background repair mechanism so full-replica regions periodically compare state and heal missed writes, delayed deletes, or partial bootstrap gaps.

Why

Outbox retries help with transient delivery failures, but they do not replace periodic reconciliation. A repair pass is the safety net that ensures replicas actually converge when a write or delete was never delivered.

Scope

  • choose a repair unit (manifest range, time window, project prefix, or hash partition)
  • compare local and remote manifests efficiently
  • copy missing blobs and reconcile stale metadata
  • make repair incremental and rate-limited so it does not impact hot build traffic
  • expose repair metrics and add e2e coverage for healing a deliberately diverged node
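
As a rough illustration of the compare-and-plan steps above, here is a minimal Python sketch that uses hash partitions as the repair unit. Everything in it is an assumption for discussion rather than the project's actual data model: the `Entry` fields, the per-key `version` counter, the tombstone flag, and the names `partition_digests` and `plan_repair` are all hypothetical. Each side computes one digest per partition, only partitions whose digests differ are scanned key by key, and a `budget` caps the work done per pass so that repair stays incremental.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    """Hypothetical manifest entry; field names are assumptions."""
    content_hash: str
    version: int           # assumed to increase monotonically per key
    deleted: bool = False  # tombstone marker for deletes


Manifest = dict[str, Entry]


def partition_of(key: str, partitions: int) -> int:
    """Map a key to a stable partition using a hash of the key."""
    digest = hashlib.sha256(key.encode()).digest()
    return int.from_bytes(digest[:8], "big") % partitions


def partition_digests(manifest: Manifest, partitions: int) -> list[str]:
    """Compute one digest per partition over its sorted entries."""
    hashers = [hashlib.sha256() for _ in range(partitions)]
    for key in sorted(manifest):
        e = manifest[key]
        line = f"{key}\0{e.content_hash}\0{e.version}\0{int(e.deleted)}\n"
        hashers[partition_of(key, partitions)].update(line.encode())
    return [h.hexdigest() for h in hashers]


def plan_repair(local: Manifest, remote: Manifest,
                partitions: int = 16, budget: int = 100) -> list[tuple[str, str]]:
    """Return up to `budget` (action, key) pairs that move local toward remote."""
    local_d = partition_digests(local, partitions)
    remote_d = partition_digests(remote, partitions)
    # Only partitions whose digests differ need a key-by-key comparison.
    dirty = {i for i in range(partitions) if local_d[i] != remote_d[i]}

    actions: list[tuple[str, str]] = []
    for key in sorted(remote):
        if partition_of(key, partitions) not in dirty:
            continue
        theirs = remote[key]
        ours = local.get(key)
        if ours is not None and ours.version >= theirs.version:
            continue  # local is already as new as remote for this key
        if theirs.deleted:
            actions.append(("delete", key))           # apply a missed tombstone
        elif ours is None or ours.deleted or ours.content_hash != theirs.content_hash:
            actions.append(("copy_blob", key))        # blob is missing or stale
        else:
            actions.append(("update_metadata", key))  # same blob, newer metadata
        if len(actions) >= budget:
            break  # stop here; the next pass picks up the remainder
    return actions


local = {"a": Entry("h1", 1), "b": Entry("h2", 1)}
remote = {"a": Entry("h1", 1), "b": Entry("h2", 2, deleted=True), "d": Entry("h4", 1)}
print(plan_repair(local, remote))
```

The plan is one-directional (it only pulls remote state into local), so a real repair loop would presumably run it in both directions or from each region in turn. In practice the digests would be exchanged over the network instead of computed from a full remote manifest, and the budget would likely be paired with a time-based rate limit.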

Notes

This should complement, not replace, normal asynchronous replication.
