
Bootstrap new full-replica regions with snapshot + live catch-up #10

Description

@pepicrft

Summary

When a new region joins a full-replica cache mesh, it should be able to copy the existing dataset from one or more healthy peers and then switch to live replication without serving stale or partial data.

Why

Today the outbox only guarantees delivery to peers that were already configured as replication targets when the write happened, so a newly added region never receives anything written before it joined. It needs an explicit bootstrap flow.
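A minimal model of that gap, assuming the outbox records one delivery obligation per configured target at write time. The `Outbox` class and its methods are hypothetical and stand in for the real outbox only to show why late joiners miss earlier writes.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Outbox:
    """Hypothetical outbox model: each write is queued for the targets known at write time."""

    targets: set[str] = field(default_factory=set)
    pending: dict[str, list[str]] = field(default_factory=dict)  # region -> queued object keys

    def add_target(self, region: str) -> None:
        # A new target starts with an empty queue; earlier writes are not backfilled.
        self.targets.add(region)
        self.pending.setdefault(region, [])

    def record_write(self, key: str) -> None:
        # Delivery obligations are fixed when the write happens.
        for region in self.targets:
            self.pending[region].append(key)


# Usage: "us" joins after "a" was written, so it only ever sees "b".
outbox = Outbox()
outbox.add_target("eu")
outbox.record_write("a")
outbox.add_target("us")
outbox.record_write("b")
```

Here `outbox.pending["eu"]` ends up as `["a", "b"]` while `outbox.pending["us"]` is only `["b"]`; object `a` has to reach `us` through some other path, which is what the bootstrap flow provides.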

Scope

  • define a bootstrap state machine for joining nodes
  • stream existing manifests and blobs from healthy peers
  • add a replication watermark/checkpoint so the node can switch from snapshot copy to live catch-up safely, without missing or double-applying writes that land during the copy
  • avoid serving incomplete data until bootstrap is complete, or explicitly serve in a degraded/read-through mode
  • add e2e coverage for node join after existing data is already present
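One possible shape for the state machine and the watermark handoff, sketched in Python under several assumptions not stated in the issue: the source peer exposes a monotonically increasing replication sequence number, the joining node subscribes to live replication *before* requesting the snapshot (so no write falls between the two), and the snapshot reflects the peer's data exactly as of the returned watermark. `Phase`, `Event`, `BootstrappingNode` and all their members are hypothetical names, and only a single source peer is modeled; copying from several peers would need one watermark per peer.

```python
from __future__ import annotations

from collections.abc import Callable
from dataclasses import dataclass, field
from enum import Enum, auto


class Phase(Enum):
    JOINING = auto()       # registered as a replication target, no data yet
    SNAPSHOTTING = auto()  # copying manifests/blobs as of the peer's watermark
    CATCHING_UP = auto()   # replaying buffered live events newer than the watermark
    READY = auto()         # caught up; serves reads from local data


@dataclass
class Event:
    seq: int    # assumed per-peer, monotonically increasing replication sequence
    key: str
    value: bytes


@dataclass
class BootstrappingNode:
    degraded_read_through: bool = False
    phase: Phase = Phase.JOINING
    watermark: int = -1
    store: dict[str, bytes] = field(default_factory=dict)
    buffered: list[Event] = field(default_factory=list)

    def on_live_event(self, event: Event) -> None:
        # Live events arrive from the moment we subscribe; hold them until catch-up.
        if self.phase is Phase.READY:
            self.store[event.key] = event.value
        else:
            self.buffered.append(event)

    def start_snapshot(self, peer_watermark: int) -> None:
        assert self.phase is Phase.JOINING
        self.watermark = peer_watermark
        self.phase = Phase.SNAPSHOTTING

    def apply_snapshot(self, objects: dict[str, bytes]) -> None:
        # `objects` is the peer's dataset as of `self.watermark`.
        assert self.phase is Phase.SNAPSHOTTING
        self.store.update(objects)
        self.phase = Phase.CATCHING_UP

    def catch_up(self) -> None:
        assert self.phase is Phase.CATCHING_UP
        # Events at or below the watermark are already in the snapshot; skip them.
        for event in sorted(self.buffered, key=lambda e: e.seq):
            if event.seq > self.watermark:
                self.store[event.key] = event.value
        self.buffered.clear()
        self.phase = Phase.READY

    def read(self, key: str, fetch_from_peer: Callable[[str], bytes | None]) -> bytes | None:
        if self.phase is Phase.READY:
            return self.store.get(key)
        if self.degraded_read_through:
            return fetch_from_peer(key)  # explicit degraded mode: proxy to a healthy peer
        raise RuntimeError("bootstrap incomplete; refusing to serve partial data")
```

The key design choice is subscribing before snapshotting: any write is then either inside the snapshot (sequence at or below the watermark) or in the live buffer (above it), so the cutover neither drops nor double-applies writes. Deletes and evictions are not modeled here.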

Notes

This is specifically for the full-replica design where every region should eventually hold every object.
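Because every region should eventually hold every object, an e2e test (or a final check before a node marks itself ready) could compare the joining region's object set against its healthy peers. A sketch with two hypothetical helpers, assuming each region can list the keys it holds; comparisons are only meaningful at a quiescent point or relative to a shared watermark, since live writes keep changing the sets.

```python
from __future__ import annotations

import hashlib


def manifest_digest(keys: set[str]) -> str:
    """Order-independent digest of a region's object keys, for cheap equality checks."""
    h = hashlib.sha256()
    for key in sorted(keys):
        h.update(key.encode("utf-8"))
        h.update(b"\0")  # separator so {"ab", "c"} and {"a", "bc"} hash differently
    return h.hexdigest()


def missing_objects(local: set[str], peers: dict[str, set[str]]) -> set[str]:
    """Objects held by at least one healthy peer but not yet by the joining region."""
    union: set[str] = set()
    for keys in peers.values():
        union |= keys
    return union - local
```

For example, `missing_objects({"a", "b"}, {"eu": {"a", "b", "c"}, "us": {"a", "d"}})` yields `{"c", "d"}`; once it is empty and the digests match across regions, the full-replica invariant holds for that point in time.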
