Skip to content

Scale the DAG #833

@Kukks

Description

@Kukks

We want a schema and flow that:

  • Scales to very deep DAGs (20k+ depth).
  • Keeps sweeping as a cheap bulk operation.
  • Lets us query GetVtxoChain and ListVtxos (is_swept) efficiently.

Core ideas:

  1. Track a depth on every vtxo (longest-path depth).
  2. Every 100 depths, create a marker in a markers table.
  3. Each marker records all parent markers (multi‑parent DAG).
  4. A swept_markers table isolates sweep state as bulk inserts only.
  5. vtxo gets a markers text[] column (the markers that “cover” it).
  6. Drop vtxo.swept; compute is_swept by joining against swept_markers.
  7. GetVtxoChain uses markers to chunk recursion / traversal into healthy pieces.

1. Schema Changes

1.1 vtxo table

ALTER TABLE vtxo
    ADD COLUMN depth INT NOT NULL DEFAULT 0;

ALTER TABLE vtxo
    ADD COLUMN markers TEXT[]; -- e.g. ARRAY['txid:vout', 'txid2:vout2'];

ALTER TABLE vtxo
    DROP COLUMN swept;

CREATE INDEX CONCURRENTLY idx_vtxo_markers_gin
    ON vtxo USING GIN (markers);

1.2 markers table

CREATE TABLE markers (
    -- A marker is a concrete vtxo outpoint: "txid:vout"
    marker_id      TEXT PRIMARY KEY,
    depth          INT NOT NULL,
    parent_markers TEXT[] NOT NULL  -- array of marker_ids (multi-parent DAG)
);

CREATE INDEX idx_markers_parent_gin
    ON markers USING GIN (parent_markers);

1.3 swept_markers table

CREATE TABLE swept_markers (
    marker_id TEXT PRIMARY KEY,
    swept_at  BIGINT NOT NULL   -- unix timestamp
);

2. Ingestion Logic

Goal: For every new vtxo:

  • Compute its depth.
  • Assign its markers text[].
  • Optionally insert a new markers row if it sits on a 100‑depth boundary.

2.1 Depth

For a new tx with inputs parents []Vtxo:

func computeDepth(parents []Vtxo) int {
    maxDepth := 0
    for _, p := range parents {
        if p.Depth > maxDepth {
            maxDepth = p.Depth
        }
    }
    return maxDepth + 1
}

Store depth on the new vtxo.

2.2 Markers and marker rows

  • Interval: MarkerInterval = 100.
  • markerId(v) = v.Txid + ":" + strconv.Itoa(v.Vout).
const MarkerInterval = 100

func assignMarkersAndMaybeCreateMarker(ctx context.Context, db *sql.DB, newVtxo *Vtxo, parents []Vtxo) error {
    // depth already computed
    depth := newVtxo.Depth

    // union parent markers
    parentMarkerSet := make(map[string]struct{})
    for _, p := range parents {
        for _, m := range p.Markers {
            parentMarkerSet[m] = struct{}{}
        }
    }
    parentMarkers := make([]string, 0, len(parentMarkerSet))
    for m := range parentMarkerSet {
        parentMarkers = append(parentMarkers, m)
    }

    if depth%MarkerInterval == 0 {
        // This vtxo becomes a marker
        id := newVtxo.Txid + ":" + strconv.Itoa(newVtxo.Vout)

        _, err := db.ExecContext(ctx, `
            INSERT INTO markers (marker_id, depth, parent_markers)
            VALUES ($1, $2, $3)
            ON CONFLICT (marker_id) DO NOTHING
        `, id, depth, pq.Array(parentMarkers))
        if err != nil {
            return err
        }

        newVtxo.Markers = []string{id}
        return nil
    }

    // Not a marker: inherit parent markers (multi‑parent)
    newVtxo.Markers = parentMarkers
    return nil
}

Insert vtxo and the markers row in the same transaction.


3. Sweeper Logic

Goal: Sweeping should only be bulk inserts into swept_markers. No per‑vtxo updates.

3.1 Sweep from a root marker down

Given a starting marker_id (e.g. root or any “sweep point”), mark it and all descendant markers as swept:

WITH RECURSIVE target_markers AS (
    SELECT marker_id
    FROM markers
    WHERE marker_id = $1

    UNION ALL

    SELECT m.marker_id
    FROM markers m
    JOIN target_markers t
      ON t.marker_id = ANY(m.parent_markers)
)
INSERT INTO swept_markers (marker_id, swept_at)
SELECT marker_id, EXTRACT(EPOCH FROM NOW())::BIGINT
FROM target_markers
ON CONFLICT (marker_id) DO NOTHING;

For multiple starting markers:

WITH RECURSIVE target_markers AS (
    SELECT marker_id
    FROM markers
    WHERE marker_id = ANY($1::text[])

    UNION ALL

    SELECT m.marker_id
    FROM markers m
    JOIN target_markers t
      ON t.marker_id = ANY(m.parent_markers)
)
INSERT INTO swept_markers (marker_id, swept_at)
SELECT DISTINCT marker_id, EXTRACT(EPOCH FROM NOW())::BIGINT
FROM target_markers
ON CONFLICT (marker_id) DO NOTHING;

Sweeper service just calls this with the appropriate root marker(s).


4. ListVtxos / fetchVtxo: is_swept via view

Goal: is_swept must be cheap ($O(1)$) and derived from markers.

CREATE OR REPLACE VIEW vtxo_vw AS
SELECT
    v.*,
    EXISTS (
        SELECT 1
        FROM swept_markers sm
        WHERE sm.marker_id = ANY(v.markers)
    ) AS is_swept
FROM vtxo v;

ListVtxos can now select from vtxo_vw and expose is_swept without touching the sweeper logic.

Example:

SELECT txid, vout, depth, markers, is_swept, amount, expires_at
FROM vtxo_vw
WHERE pubkey = $1
ORDER BY created_at DESC
LIMIT $2;

5. GetVtxoChain(outpoints): chunked DAG via markers

Goal: Given one or more outpoints, return a slim, paginated DAG of child → parents without recursing 20k+ steps in one go.

Two key pieces:

  1. Marker skeleton traversal (very light, every 100 depth).
  2. Gap fill per skeleton segment using depth ranges.

5.1 Marker skeleton (upward)

Start from the marker(s) that cover the initial outpoint(s) and walk parents up, limited per page.

WITH RECURSIVE marker_chain AS (
    -- start marker for this page (derived from vtxo.markers of the starting outpoint)
    SELECT marker_id, parent_markers, depth, 1 AS level
    FROM markers
    WHERE marker_id = $1

    UNION ALL

    SELECT m.marker_id, m.parent_markers, m.depth, mc.level + 1
    FROM markers m
    JOIN marker_chain mc
      ON m.marker_id = ANY(mc.parent_markers)
    WHERE mc.level < $2  -- e.g. limit to 50 markers per page
)
SELECT marker_id, parent_markers, depth
FROM marker_chain;

Client encodes the last marker_id / level into page_token to request the next chunk.

5.2 Gap fill between markers (0–100 depth window)

For each [depth_lo, depth_hi] window (e.g. [1000, 1100]) defined by two markers, fetch the actual tx graph:

SELECT
    t.txid,
    COALESCE(v.expires_at, 0) AS expires_at,
    t.type,
    array_remove(array_agg(DISTINCT p.txid), NULL) AS spends
FROM tx t
JOIN vtxo v
  ON v.txid = t.txid
LEFT JOIN vtxo p
  ON p.spent_by = t.txid
WHERE v.depth BETWEEN $1 AND $2   -- e.g. 1000 and 1100
GROUP BY t.txid, v.expires_at, t.type
ORDER BY v.depth DESC;            -- or ASC, depending on API semantics

Each row maps cleanly to IndexerChain:

message IndexerChain {
  string txid = 1;
  int64  expires_at = 2;
  IndexerChainedTxType type = 3;
  repeated string spends = 4;
}

The service:

  • Uses markers to find depth windows per page.
  • Runs the gap‑fill query for that window.
  • Builds IndexerChain results.
  • Returns a page_token encoding the next marker / window.

6. Why this works

  • Sweeper: Only runs bulk inserts into a small swept_markers table.
  • VTXO rows: Immutable w.r.t sweep; no 20k‑row UPDATEs.
  • is_swept: Simple EXISTS over markers[] × swept_markers.
  • GetVtxoChain: Works by:
    • traversing a 100× compressed marker graph (recursive, but light),
    • then fetching bounded depth ranges per page (no deep recursion on raw vtxos).

This keeps writes cheap, reads predictable, and makes the DAG depth effectively unbounded while still API‑friendly.

Metadata

Metadata

Labels

No labels
No labels

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions