DavidGzzMilan/pg-sync-guard

pg-sync-guard

pg-sync-guard is now an extension-first project for efficient logical-replication verification.

The repository currently centers on:

  • pg_sync_guard/: a PostgreSQL extension that computes and persists stable per-bucket hashes inside each database
  • cmd/syncguard-cli/: the first Go CLI foundation for comparing publisher and subscriber bucket catalogs

The old Python prototype and dashboard have been removed so the repository matches the current product direction.

Current architecture

Install pg_sync_guard on both publisher and subscriber:

  • each side computes local bucket hashes
  • the extension tracks dirty buckets and recomputes only the buckets affected by changes in the normal case
  • the Go CLI compares syncguard.bucket_catalog across both sides
  • an optional control plane database can store verification-job history and divergence records
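The comparison step can be pictured as a plain diff over two in-memory catalogs. The following is a minimal sketch, not the CLI's actual code: the `BucketKey`, `Divergence`, and `CompareCatalogs` names are hypothetical, and it assumes each side's syncguard.bucket_catalog rows have already been loaded into a map from (schema, table, bucket id) to a hash string.

```go
package main

import "sort"

// BucketKey identifies one bucket of one monitored table (hypothetical type).
type BucketKey struct {
	Schema   string
	Table    string
	BucketID int64
}

// Divergence describes one bucket that differs between the two sides.
type Divergence struct {
	Key    BucketKey
	Reason string // "hash_mismatch", "missing_on_subscriber", or "missing_on_publisher"
}

// CompareCatalogs diffs two catalogs (bucket key -> hash) and returns the
// divergent buckets sorted by schema, table, and bucket id.
func CompareCatalogs(publisher, subscriber map[BucketKey]string) []Divergence {
	var out []Divergence
	for k, pubHash := range publisher {
		subHash, ok := subscriber[k]
		switch {
		case !ok:
			out = append(out, Divergence{Key: k, Reason: "missing_on_subscriber"})
		case subHash != pubHash:
			out = append(out, Divergence{Key: k, Reason: "hash_mismatch"})
		}
	}
	for k := range subscriber {
		if _, ok := publisher[k]; !ok {
			out = append(out, Divergence{Key: k, Reason: "missing_on_publisher"})
		}
	}
	// Map iteration order is random, so sort for deterministic reports.
	sort.Slice(out, func(i, j int) bool {
		a, b := out[i].Key, out[j].Key
		if a.Schema != b.Schema {
			return a.Schema < b.Schema
		}
		if a.Table != b.Table {
			return a.Table < b.Table
		}
		return a.BucketID < b.BucketID
	})
	return out
}
```

Because only hashes cross the wire in this model, the cost of a comparison scales with the number of buckets rather than the number of rows.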

Repository layout

  • pg_sync_guard/
    • pgrx extension crate
    • dynamic per-database background worker
    • bucket catalog, dirty bucket queue, worker state, and helper SQL functions
  • cmd/syncguard-cli/
    • Go CLI entrypoint
    • currently includes MVP verify command scaffolding
  • internal/
    • shared Go packages for config, DB access, extension reads, comparison, reporting, and control-plane writes
  • docs/PG_EXTENSION.md
    • extension usage, worker model, and example queries
  • docs/CONTROL_PLANE.md
    • optional control-plane schema for verification history
  • docs/REQUIRED_GRANTS.md
    • grants for extension access, table registration, and control-plane access

Extension features

  • stable per-bucket hashes stored in syncguard.bucket_catalog
  • dirty bucket queue in syncguard.dirty_buckets
  • dynamic worker started per database
  • initial full sweep plus incremental recomputation
  • trigger-based dirty marking, including subscriber-side logical replication changes via ENABLE ALWAYS TRIGGER
  • reconstructable runtime state stored in UNLOGGED tables to reduce WAL pressure

Quick start

1. Build / run the extension during development

From pg_sync_guard/:

cargo pgrx run

2. Create the extension

CREATE EXTENSION pg_sync_guard;

3. Register a monitored table

SELECT syncguard_register_table('public', 'my_table', 'id', 5000);

4. Confirm the worker is running

SELECT syncguard_worker_running();

5. Inspect bucket hashes

SELECT *
FROM syncguard.bucket_catalog
ORDER BY schema_name, table_name, bucket_id;

6. Run the CLI verify command

With Go available locally:

go run ./cmd/syncguard-cli verify \
  --publisher-dsn "$SYNCGUARD_PUBLISHER_DSN" \
  --subscriber-dsn "$SYNCGUARD_SUBSCRIBER_DSN"

Optional flags:

  • --schema public
  • --table my_table
  • --consistency-mode stable-watermark
  • --stability-buffer-ms 250
  • --stability-retries 1
  • --min-coverage-pct 80
  • --live-fallback-for-dirty
  • --live-fallback-dirty-age-ms 2000
  • --json
  • --control-dsn "$SYNCGUARD_CONTROL_DSN" --write-control-plane

verify now defaults to stable-watermark mode. In that mode, the CLI:

  • reads each side's database time and syncguard.naptime_ms
  • computes a conservative shared cutoff in the past
  • compares only buckets that are dirty = false and whose last_computed_at is older than that cutoff on both sides
  • optionally re-reads the eligible bucket set to see whether the snapshot stabilizes

This is a best-effort consistency window that reduces false positives from in-flight rehashing. It is not a strict cross-node snapshot guarantee.
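One way to read the stable-watermark steps above is as a small piece of arithmetic. The sketch below is an illustration under stated assumptions, not the CLI's implementation: it assumes the cutoff is the earlier of the two database clocks, minus the larger naptime, minus the stability buffer, with all times expressed as Unix milliseconds. The type and function names are hypothetical.

```go
package main

// SideClock holds what the CLI reads from one side (hypothetical type).
type SideClock struct {
	NowMs     int64 // database time on that side, as Unix milliseconds
	NaptimeMs int64 // value of syncguard.naptime_ms on that side
}

// BucketState is the per-side state relevant to eligibility.
type BucketState struct {
	Dirty            bool
	LastComputedAtMs int64 // last_computed_at as Unix milliseconds
}

// SharedCutoff returns a conservative cutoff in the past. Assumption: take
// the earlier clock and the longer naptime, then subtract the extra buffer.
func SharedCutoff(pub, sub SideClock, bufferMs int64) int64 {
	earliest := pub.NowMs
	if sub.NowMs < earliest {
		earliest = sub.NowMs
	}
	nap := pub.NaptimeMs
	if sub.NaptimeMs > nap {
		nap = sub.NaptimeMs
	}
	return earliest - nap - bufferMs
}

// Eligible reports whether a bucket is clean on both sides and was last
// computed strictly before the cutoff on both sides.
func Eligible(pub, sub BucketState, cutoffMs int64) bool {
	return !pub.Dirty && !sub.Dirty &&
		pub.LastComputedAtMs < cutoffMs &&
		sub.LastComputedAtMs < cutoffMs
}
```

Taking the minimum clock and the maximum naptime errs toward excluding buckets, which fits the best-effort goal of avoiding false positives at the cost of coverage.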

If too many buckets are still unstable, verify reports an explicit coverage warning rather than presenting the run as clean, so a low-coverage result is not mistaken for a fully verified one.
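The coverage check can be sketched as a percentage compared against --min-coverage-pct. This is a hypothetical illustration: the function names and the message wording are invented here, and treating an empty catalog as 0% coverage is an assumption chosen to err on the side of warning.

```go
package main

import "fmt"

// CoveragePct returns the share of buckets that were stable enough to
// compare. Assumption: an empty catalog counts as 0% so it triggers a warning.
func CoveragePct(eligible, total int) float64 {
	if total == 0 {
		return 0
	}
	return 100 * float64(eligible) / float64(total)
}

// CoverageWarning returns a message and true when coverage falls below the
// configured minimum, and an empty string and false otherwise.
func CoverageWarning(eligible, total int, minPct float64) (string, bool) {
	pct := CoveragePct(eligible, total)
	if pct >= minPct {
		return "", false
	}
	msg := fmt.Sprintf(
		"coverage %.1f%% is below the %.1f%% minimum: %d of %d buckets were compared",
		pct, minPct, eligible, total)
	return msg, true
}
```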

For hotter tables, you can enable a more aggressive mode:

go run ./cmd/syncguard-cli verify \
  --publisher-dsn "$SYNCGUARD_PUBLISHER_DSN" \
  --subscriber-dsn "$SYNCGUARD_SUBSCRIBER_DSN" \
  --consistency-mode stable-watermark \
  --min-coverage-pct 80 \
  --live-fallback-for-dirty \
  --live-fallback-dirty-age-ms 2000

In that mode, buckets that remain dirty beyond the configured age threshold are foreground-hashed directly from the live publisher and subscriber tables, so long-dirty ranges do not stay invisible forever.
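Selecting the buckets that qualify for live fallback might look like the following. This is a sketch only: the `DirtyBucket` type and its `DirtySinceMs` field are hypothetical (the README does not name the column that records when a bucket became dirty), and treating an age equal to the threshold as qualifying is an arbitrary choice.

```go
package main

// DirtyBucket is a hypothetical view of one row in the dirty bucket queue.
type DirtyBucket struct {
	BucketID     int64
	DirtySinceMs int64 // when the bucket was first marked dirty, Unix milliseconds
}

// LongDirty returns the ids of buckets that have been dirty for at least
// minAgeMs, preserving input order. These would be hashed directly from the
// live tables instead of being skipped.
func LongDirty(buckets []DirtyBucket, nowMs, minAgeMs int64) []int64 {
	var ids []int64
	for _, b := range buckets {
		if nowMs-b.DirtySinceMs >= minAgeMs {
			ids = append(ids, b.BucketID)
		}
	}
	return ids
}
```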

7. Inspect one mismatched bucket

After verify reports a mismatched bucket, drill into the affected PK window:

go run ./cmd/syncguard-cli inspect \
  --publisher-dsn "$SYNCGUARD_PUBLISHER_DSN" \
  --subscriber-dsn "$SYNCGUARD_SUBSCRIBER_DSN" \
  --schema public \
  --table my_table \
  --bucket-id 42

This reads syncguard.monitored_tables to discover the PK column and bucket size, then compares the full rows inside that bucket range on publisher and subscriber.
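To make the "PK window" concrete, here is a sketch of how a bucket id could map to a primary-key range. The mapping is an assumption, not something the README states: it supposes non-negative integer keys and `bucket_id = pk / bucket_size`, so that each bucket covers the half-open range from `bucket_id * bucket_size` up to, but not including, the next multiple. The extension's real bucketing rule may differ; see docs/PG_EXTENSION.md.

```go
package main

// BucketFor returns the bucket a key falls into.
// Assumption: non-negative integer keys and fixed-width buckets.
func BucketFor(pk, bucketSize int64) int64 {
	return pk / bucketSize
}

// BucketRange returns the half-open key range [lo, hi) covered by a bucket
// under the same fixed-width assumption.
func BucketRange(bucketID, bucketSize int64) (lo, hi int64) {
	lo = bucketID * bucketSize
	return lo, lo + bucketSize
}
```

Under that assumption, with the bucket size of 5000 from the registration example, bucket 42 would cover keys 210000 through 214999.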

Note: inspect requires the CLI role to have direct SELECT on the target application table, not just on the syncguard schema objects.

For each row-level divergence, inspect now also generates a suggested repair SQL plan for the subscriber:

  • INSERT ... ON CONFLICT DO UPDATE when the publisher row should be copied to the subscriber
  • DELETE when the subscriber has an extra row that is missing on the publisher
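The two statement shapes can be sketched as parameterized SQL builders. This is an illustration, not the output format of inspect: the function names are hypothetical, values are left as `$n` placeholders rather than inlined, and the fallback to `DO NOTHING` for a table that has only its key column is an assumption.

```go
package main

import (
	"fmt"
	"strings"
)

// quoteIdent double-quotes an SQL identifier, doubling embedded quotes.
func quoteIdent(s string) string {
	return `"` + strings.ReplaceAll(s, `"`, `""`) + `"`
}

// UpsertSQL builds an INSERT ... ON CONFLICT statement that copies a
// publisher row to the subscriber. cols must include the pk column.
func UpsertSQL(schema, table, pk string, cols []string) string {
	names := make([]string, len(cols))
	params := make([]string, len(cols))
	var sets []string
	for i, c := range cols {
		names[i] = quoteIdent(c)
		params[i] = fmt.Sprintf("$%d", i+1)
		if c != pk {
			sets = append(sets, fmt.Sprintf("%s = EXCLUDED.%s", quoteIdent(c), quoteIdent(c)))
		}
	}
	// With no non-key columns there is nothing to update.
	action := "DO NOTHING"
	if len(sets) > 0 {
		action = "DO UPDATE SET " + strings.Join(sets, ", ")
	}
	return fmt.Sprintf("INSERT INTO %s.%s (%s) VALUES (%s) ON CONFLICT (%s) %s",
		quoteIdent(schema), quoteIdent(table),
		strings.Join(names, ", "), strings.Join(params, ", "),
		quoteIdent(pk), action)
}

// DeleteSQL builds the statement that removes a subscriber-only row.
func DeleteSQL(schema, table, pk string) string {
	return fmt.Sprintf("DELETE FROM %s.%s WHERE %s = $1",
		quoteIdent(schema), quoteIdent(table), quoteIdent(pk))
}
```

Keeping values as bind parameters avoids quoting row data by hand; a tool that prints copy-pasteable SQL would need to render literals instead.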

8. Explicitly apply planned repairs

When you want to execute the suggested repair SQL against the subscriber, use repair:

go run ./cmd/syncguard-cli repair \
  --publisher-dsn "$SYNCGUARD_PUBLISHER_DSN" \
  --subscriber-dsn "$SYNCGUARD_SUBSCRIBER_DSN" \
  --schema public \
  --table my_table \
  --bucket-id 42

repair re-runs the bucket inspection, rebuilds the repair plan, and applies the statements to the subscriber in a single transaction. It never runs implicitly as part of verify or inspect.
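The all-or-nothing behaviour of a single-transaction apply can be sketched against a small interface. Everything here is hypothetical: `Tx` stands in for whatever transaction handle the CLI uses, and `memTx` is an in-memory stand-in so the sketch runs without a database.

```go
package main

import "fmt"

// Tx is the minimal transaction surface this sketch needs (hypothetical).
type Tx interface {
	Exec(sql string) error
	Commit() error
	Rollback() error
}

// ApplyPlan runs every statement in order. The first failure rolls the
// whole plan back; only a fully successful plan is committed.
func ApplyPlan(tx Tx, statements []string) error {
	for i, stmt := range statements {
		if err := tx.Exec(stmt); err != nil {
			_ = tx.Rollback() // best effort; the original error is reported
			return fmt.Errorf("statement %d failed, plan rolled back: %w", i, err)
		}
	}
	return tx.Commit()
}

// memTx is an in-memory stand-in that records what happened to it.
type memTx struct {
	failAt     int // index of the statement that should fail, -1 for none
	executed   []string
	committed  bool
	rolledBack bool
}

func (m *memTx) Exec(sql string) error {
	if len(m.executed) == m.failAt {
		return fmt.Errorf("simulated failure")
	}
	m.executed = append(m.executed, sql)
	return nil
}

func (m *memTx) Commit() error   { m.committed = true; return nil }
func (m *memTx) Rollback() error { m.rolledBack = true; return nil }
```

With a real driver, the same shape applies: begin a transaction on the subscriber, execute each planned statement, and commit only if none failed.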

Note: subscriber-side repairs fire SyncGuard's dirty-bucket trigger. In the current codebase, the helper functions that trigger calls are meant to run with extension-owner privileges, so the CLI repair role should only need DML rights on the target subscriber table plus the read access documented in docs/REQUIRED_GRANTS.md.

If --control-dsn is provided to repair, the CLI also marks matching open syncguard.divergence_log rows for that bucket as resolved and sets reviewed_at / resolved_at.

Documentation

  • extension guide: docs/PG_EXTENSION.md
  • control plane: docs/CONTROL_PLANE.md
  • grants: docs/REQUIRED_GRANTS.md

Current CLI scope

The current Go CLI foundation includes:

  • config loading from flags and environment variables
  • PostgreSQL connectivity using pgx
  • reads from syncguard.bucket_catalog
  • reads from syncguard.monitored_tables
  • publisher/subscriber bucket comparison
  • stable-watermark verification mode with skipped unstable buckets
  • coverage warnings when stable verification sees too little of the catalog
  • optional live fallback hashing for long-dirty buckets
  • bucket-level row drill-down with inspect
  • suggested subscriber repair SQL from row-level diffs
  • explicit subscriber-side apply flow with repair
  • text or JSON output
  • optional control-plane inserts for validation_runs and divergence_log

The next CLI steps are controlled execution and packaging.

Near-term roadmap

  • execute approved remediation workflows from the CLI
  • package the CLI for .rpm / .deb delivery

Backlog

  • add a real extension upgrade path so SQL/object changes are applied through ALTER EXTENSION UPDATE
  • add a stronger extension-assisted epoch/barrier verification model for true coordinated cross-node comparisons

About

AI-powered tool to verify data sync between a PostgreSQL logical replication publisher and subscriber.
