
feat(scraper): add two-stage tip/finalized scraper indexing#8193

Draft
paulbalaji wants to merge 7 commits into main from pbio/scraper-two-stage-process-7fa5


paulbalaji (Collaborator) commented Feb 24, 2026

Summary

  • add two-stage scraper architecture for message dispatches: finalized + tip
  • keep finalized enriched pipeline unchanged
  • add tip-stage message scrape task (message_dispatch_tip) with reorg_period = None
  • write tip-stage rows only to raw_message_dispatch via reorg-tolerant upsert
  • add cursor stage isolation via cursor_type (finalized | tip) to avoid cursor collisions
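To make the cursor_type isolation concrete, here is a minimal Rust sketch of the two cursor streams. CursorKind and the "finalized" / "tip" column values come from this PR; the as_str and from_db helpers are hypothetical and may not match the actual implementation.

```rust
/// Hypothetical sketch of the discriminator stored in `cursor.cursor_type`.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub enum CursorKind {
    /// Authoritative stream, indexed `reorg_period` blocks behind head.
    Finalized,
    /// Near-head stream, indexed with `reorg_period = None`.
    Tip,
}

impl CursorKind {
    /// Value written to the `cursor_type` column.
    pub fn as_str(self) -> &'static str {
        match self {
            CursorKind::Finalized => "finalized",
            CursorKind::Tip => "tip",
        }
    }

    /// Parses a stored `cursor_type` value; unknown values yield `None`.
    pub fn from_db(value: &str) -> Option<Self> {
        match value {
            "finalized" => Some(CursorKind::Finalized),
            "tip" => Some(CursorKind::Tip),
            _ => None,
        }
    }
}
```

Keeping the string mapping in one place is one way to make sure finalized and tip cursor rows are always written and queried under the same label.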

Why

  • improve near-head responsiveness without breaking finalized authoritative state
  • avoid sequence-aware finalized store side effects by isolating tip writes

DB Changes

  • migration: m20260224_000007_add_cursor_type
  • add cursor.cursor_type (default finalized)
  • add index: (domain, cursor_type, height)
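A minimal in-memory model of how the new column and index are meant to be used, assuming (as noted in review) that the cursor table is append-only and that the current cursor for a stream is its highest recorded height. CursorTable and latest are hypothetical names; the real lookup is a SQL query against the cursor table.

```rust
/// Hypothetical model of the append-only `cursor` table: every update appends a row.
#[derive(Default)]
pub struct CursorTable {
    /// (domain, cursor_type, height)
    rows: Vec<(u32, &'static str, u64)>,
}

impl CursorTable {
    pub fn append(&mut self, domain: u32, cursor_type: &'static str, height: u64) {
        self.rows.push((domain, cursor_type, height));
    }

    /// Highest height for one (domain, cursor_type) stream. A lookup of this shape
    /// (filter on domain and cursor_type, take the max height) is what an index on
    /// (domain, cursor_type, height) can serve; rows of the other stream are ignored.
    pub fn latest(&self, domain: u32, cursor_type: &str) -> Option<u64> {
        self.rows
            .iter()
            .filter(|(d, t, _)| *d == domain && *t == cursor_type)
            .map(|(_, _, h)| *h)
            .max()
    }
}
```

Under this model, tip rows sitting ahead of finalized rows for the same domain never move the finalized cursor, which is the collision the cursor_type column is meant to prevent.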

Migration Apply Notes

  • scraper startup does not auto-run migrations (it only opens DB connections)
  • the migration must be applied by an ops/migration job (or the migration CLI) before rolling out the scraper binary

Simple SQL equivalent:

ALTER TABLE cursor
  ADD COLUMN IF NOT EXISTS cursor_type TEXT NOT NULL DEFAULT 'finalized';

CREATE INDEX IF NOT EXISTS cursor_domain_type_height_idx
  ON cursor (domain, cursor_type, height);

Rollback:

DROP INDEX IF EXISTS cursor_domain_type_height_idx;
ALTER TABLE cursor DROP COLUMN IF EXISTS cursor_type;

Testing Plan

  • cargo fmt -p scraper
  • CXX=g++ CC=gcc CARGO_TARGET_X86_64_UNKNOWN_LINUX_GNU_LINKER=gcc cargo test -p scraper test_failed_build_chain_scrapers -- --nocapture
  • cd rust/main && cargo clippy --features aleo,integration_test -- -D warnings
  • deploy scraper + run migration in staging
  • verify metrics for message_dispatch_tip and raw_message_dispatch_tip
  • validate tip rows appear quickly in raw_message_dispatch while finalized message_view remains authoritative

Linked PRs


Note

Medium Risk
Adds a DB migration and changes cursor persistence/query behavior, plus introduces an additional long-running indexing task; rollout requires migration ordering and could affect indexing progress/metrics if misconfigured.

Overview
Adds a second, near-tip dispatch indexing stage to the scraper: when a chain has a reorg period configured, it now spawns an additional message_dispatch_tip syncer with reorg_period = None to pick up the latest dispatches quickly.
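The spawn rule described above can be sketched as a small decision function. This ReorgPeriod is a simplified, hypothetical stand-in for the agent's reorg configuration (the real type may have more variants), and tip_stage_reorg_period is an illustrative helper, not a function from the PR.

```rust
/// Hypothetical, simplified reorg setting; the real type may have other variants.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ReorgPeriod {
    None,
    Blocks(u32),
}

/// Decides whether a chain gets an extra `message_dispatch_tip` syncer.
/// Returns the reorg period to use for that syncer, or `None` to skip it.
pub fn tip_stage_reorg_period(finalized_stage: ReorgPeriod) -> Option<ReorgPeriod> {
    match finalized_stage {
        // The finalized stage already indexes at head, so a tip stage would be redundant.
        ReorgPeriod::None => None,
        // The finalized stage lags by some blocks; the tip stage runs with no lag.
        ReorgPeriod::Blocks(_) => Some(ReorgPeriod::None),
    }
}
```

Returning None corresponds to the case where, following the review feedback, the agent logs why no message_dispatch_tip task was spawned.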

Introduces cursor stream isolation by adding cursor.cursor_type (finalized vs tip) via a new migration and updating BlockCursor to read/write per-kind cursors (with legacy fallback when the column is missing). Tip-stage writes are routed through a new TipMessageStore that only upserts into raw_message_dispatch and emits a separate raw_message_dispatch_tip metric.
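The sketch below models the shared raw_message_dispatch table as a map keyed by msg_id, assuming the upsert behaves like ON CONFLICT (msg_id) DO UPDATE and that each writer's metric counts only newly inserted rows. RawDispatch, RawDispatchTable, and upsert are hypothetical; the real TipMessageStore writes through the scraper's database layer.

```rust
use std::collections::HashMap;

/// Hypothetical, simplified raw dispatch row.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct RawDispatch {
    pub msg_id: [u8; 32],
    pub block_number: u64,
}

/// Hypothetical model of `raw_message_dispatch` shared by both stages, plus
/// per-label counters standing in for the stored-row metrics.
#[derive(Default)]
pub struct RawDispatchTable {
    pub rows: HashMap<[u8; 32], RawDispatch>,
    pub metrics: HashMap<&'static str, u64>,
}

impl RawDispatchTable {
    /// Mimics `INSERT ... ON CONFLICT (msg_id) DO UPDATE`: an existing row is
    /// overwritten, and only rows that did not exist before are counted.
    pub fn upsert(&mut self, label: &'static str, dispatches: &[RawDispatch]) {
        let mut newly_inserted: u64 = 0;
        for d in dispatches {
            if self.rows.insert(d.msg_id, d.clone()).is_none() {
                newly_inserted += 1;
            }
        }
        *self.metrics.entry(label).or_insert(0) += newly_inserted;
    }
}
```

Under these assumptions, a row first written by the tip stage is counted under raw_message_dispatch_tip, and a later finalized upsert of the same msg_id does not increase raw_message_dispatch. A msg_id seen only in a reorged-out block stays in the table, which is the stale-row edge case raised in review.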

Updates run-locally E2E invariant checks to use exact label matching (fetch_metric_exact) and to account for raw dispatch counts coming from both finalized and tip metrics.

Written by Cursor Bugbot for commit 216ee60.

Co-authored-by: Paul Balaji <paulbalaji@users.noreply.github.com>
paulbalaji (Collaborator, Author):

@claude review

claude bot (Contributor) commented Feb 24, 2026

Claude finished @paulbalaji's task in 2m 5s.


Review: Two-Stage Tip/Finalized Scraper Indexing

  • Read all changed files
  • Run /claude-review skill
  • Post inline comments (7 posted)
  • Post review summary

Overall Assessment

Well-structured change. The two-stage architecture cleanly separates concerns: finalized pipeline remains unchanged and authoritative, tip stage writes only to raw_message_dispatch via the existing reorg-tolerant upsert. Cursor isolation via cursor_type is the right approach.

What Looks Good

  • Clean separation: TipMessageStore only implements raw dispatch storage — no enriched tables, no sequence awareness. This avoids reorg side effects on finalized state.
  • Backward compatible migration: cursor_type defaults to "finalized", so existing cursor rows resume correctly without data migration.
  • Reuse of store_raw_message_dispatches: The ON CONFLICT(msg_id) UPDATE upsert handles tip-stage reorgs gracefully.
  • Conditional tip indexer: Only runs when reorg_period is configured (non-None), avoiding redundant work on chains already indexing at tip.

Items to Address

  1. Add logging when tip indexer is skipped (agent.rs:270-276) — operators should see why no message_dispatch_tip task was spawned for a chain.

  2. Document stale row edge case (tip_dispatches.rs:53-67) — messages that appear in an uncle block but are never re-included will persist as stale rows in raw_message_dispatch. Since finalized pipeline is authoritative this is acceptable, but worth documenting.

  3. Cursor table growth (migration:23-34) — the append-only cursor pattern now produces 2x rows per chain. Consider tracking whether a pruning strategy is needed.

Questions for the Author

  1. Tip cursor default height (agent.rs:417): The tip cursor defaults to the current block height at startup (via get_finalized_block_number on the tip indexer). This means on first run, messages dispatched between index.from and startup are only covered by the finalized pipeline. Is that the intended behavior?

  2. get_finalized_block_number semantics (agent.rs:402): With reorg_period = None, this returns the latest block (not finalized). The variable name tip_height is correct, but a brief comment would help readers.
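For the second question, a sketch of what the "finalized" height means under each reorg setting. It assumes the finalized height is the latest block minus the configured number of blocks; the real get_finalized_block_number queries the chain and may handle other reorg-period kinds, so treat finalized_block_number and this ReorgPeriod as hypothetical.

```rust
/// Hypothetical, simplified reorg setting; the real type may have other variants.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ReorgPeriod {
    None,
    Blocks(u64),
}

/// Height treated as "finalized" for a given latest block and reorg setting.
/// With `ReorgPeriod::None` this is simply the latest block, which is why the
/// tip stage can call the result `tip_height`.
pub fn finalized_block_number(latest: u64, reorg_period: ReorgPeriod) -> u64 {
    match reorg_period {
        ReorgPeriod::None => latest,
        // saturating_sub keeps young chains (latest < n) at height 0.
        ReorgPeriod::Blocks(n) => latest.saturating_sub(n),
    }
}
```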

No Issues Found

  • Migration up/down is correct and symmetric
  • CursorKind enum is well-scoped
  • Generated entity model matches the migration
  • db/mod.rs re-export of CursorKind is correct
  • Test in agent.rs still passes (uses ReorgPeriod::None so tip_chain_setup will be None)

paulbalaji (Collaborator, Author):

Follow-up (clean formatting): addressed latest review feedback in b43f9c7501.

Changes:

  • log when tip indexer is skipped if reorg_period is none
  • clarify get_finalized_block_number semantics under tip config
  • clarify intentional near-head seed for tip cursor
  • document stale raw-row edge case in tip store docs
  • fix clippy on migration iden naming while keeping DB column name cursor_type

Validation:

  • cd rust/main && cargo clippy --features aleo,integration_test -- -D warnings (pass)

paulbalaji (Collaborator, Author):

Addressed latest review comment in e4dc8a34df.

Change:

  • tip cursor seed now uses tip_height.saturating_sub(1) so first sync pass includes current tip block while staying near-head.
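A sketch of why the seed is tip_height - 1, assuming the syncer's next pass covers the range (cursor, tip], i.e. it starts one block after the cursor. Both functions are hypothetical illustrations of that assumption, not the scraper's code.

```rust
use std::ops::RangeInclusive;

/// Hypothetical: seed the tip cursor one block below the observed tip height.
pub fn tip_cursor_seed(tip_height: u64) -> u64 {
    // saturating_sub avoids underflow if the observed tip is block 0.
    tip_height.saturating_sub(1)
}

/// Hypothetical: blocks scanned by the first pass, ASSUMING the syncer starts
/// one block after the cursor and stops at the tip, i.e. the range (cursor, tip].
pub fn first_pass_range(cursor: u64, tip_height: u64) -> RangeInclusive<u64> {
    cursor.saturating_add(1)..=tip_height
}
```

Under this assumption, seeding at tip_height itself would make the first range empty, while seeding at tip_height - 1 makes it contain exactly the tip block.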

Validation:

  • cd rust/main && cargo clippy -p scraper -- -D warnings (pass)
  • cd rust/main && cargo test -p scraper test_failed_build_chain_scrapers -- --nocapture (pass)

paulbalaji (Collaborator, Author):

Addressed latest monorepo PR comments in 02cf9cc03b.

Changes:

  • Pre-migration startup fallback in BlockCursor::new:
    • if cursor_type column is missing, finalized cursor falls back to legacy domain-only query
    • tip cursor falls back to default tip height with warning instead of hard-failing startup
  • Migration made idempotent with IF NOT EXISTS/IF EXISTS SQL for column/index add/drop
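The read-side fallback can be modeled as a pure decision function. The per-kind query result, the legacy domain-only result, and the per-kind default height are passed in as plain values; MissingCursorTypeColumn, CursorStart, and resolve_start are hypothetical names, and the warning goes to stderr here rather than through the scraper's logger.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum CursorKind {
    Finalized,
    Tip,
}

/// Hypothetical marker: the query failed because `cursor_type` does not exist yet.
#[derive(Debug)]
pub struct MissingCursorTypeColumn;

/// Hypothetical record of where a cursor's starting height came from.
#[derive(Debug, PartialEq, Eq)]
pub enum CursorStart {
    /// Found by the `cursor_type`-aware query.
    Stored(u64),
    /// Pre-migration DB: finalized height read via the legacy domain-only query.
    LegacyStored(u64),
    /// No usable stored height for this stream; use the per-kind default.
    FromDefault(u64),
}

/// `typed` is the per-kind query result; `legacy` is the domain-only query result.
pub fn resolve_start(
    kind: CursorKind,
    typed: Result<Option<u64>, MissingCursorTypeColumn>,
    legacy: Option<u64>,
    default_height: u64,
) -> CursorStart {
    match (kind, typed) {
        (_, Ok(Some(height))) => CursorStart::Stored(height),
        (_, Ok(None)) => CursorStart::FromDefault(default_height),
        // Pre-PR rows come from the single (finalized) stream, so finalized may reuse them.
        (CursorKind::Finalized, Err(_)) => {
            legacy.map_or(CursorStart::FromDefault(default_height), CursorStart::LegacyStored)
        }
        // Tip stage: warn and start from its default height instead of failing startup.
        (CursorKind::Tip, Err(_)) => {
            eprintln!("warning: cursor.cursor_type missing; tip cursor starts at {default_height}");
            CursorStart::FromDefault(default_height)
        }
    }
}
```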

Validation:

  • cd rust/main && cargo fmt
  • cd rust/main && cargo clippy -p scraper -- -D warnings (pass)
  • cd rust/main && cargo test -p scraper test_failed_build_chain_scrapers -- --nocapture (pass)

cursor bot left a comment


Cursor Bugbot has reviewed your changes and found 2 potential issues.


paulbalaji (Collaborator, Author):

Addressed latest cursor comments:

  • Added write fallback in BlockCursor::update for pre-migration DBs.
  • CursorKind::Finalized: falls back to legacy insert (domain,height) so finalized cursor still persists.
  • CursorKind::Tip: intentionally skips legacy write to avoid polluting legacy single-stream cursor state.
  • Tightened missing-column matcher to only trigger on missing cursor_type signatures (removed broad generic no such column match).
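The write-side fallback and the narrowed error matcher, sketched the same way. The error strings are assumptions: the matcher accepts PostgreSQL-style "... does not exist" wording and a SQLite-style "no such column" message, but only when the message names cursor_type. WriteOutcome, is_missing_cursor_type, and write_outcome are hypothetical.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum CursorKind {
    Finalized,
    Tip,
}

/// Hypothetical result of persisting a cursor update.
#[derive(Debug, PartialEq, Eq)]
pub enum WriteOutcome {
    /// Normal path: row written with its `cursor_type`.
    Typed,
    /// Pre-migration DB: finalized cursor written as a legacy (domain, height) row.
    LegacyInsert,
    /// Pre-migration DB: tip cursor deliberately not persisted.
    Skipped,
}

/// Narrow matcher: only errors that name the `cursor_type` column qualify, so an
/// unrelated missing-column error is never mistaken for a pre-migration schema.
pub fn is_missing_cursor_type(err: &str) -> bool {
    err.contains("cursor_type") && (err.contains("does not exist") || err.contains("no such column"))
}

/// `typed_insert` is the outcome of the `cursor_type`-aware insert.
pub fn write_outcome(kind: CursorKind, typed_insert: Result<(), String>) -> Result<WriteOutcome, String> {
    match typed_insert {
        Ok(()) => Ok(WriteOutcome::Typed),
        Err(e) if is_missing_cursor_type(&e) => Ok(match kind {
            CursorKind::Finalized => WriteOutcome::LegacyInsert,
            // Writing a tip height as a legacy row would pollute the single-stream cursor.
            CursorKind::Tip => WriteOutcome::Skipped,
        }),
        // Anything else is a real failure and is propagated unchanged.
        Err(e) => Err(e),
    }
}
```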

Checks run:

  • cargo fmt
  • cargo clippy -p scraper -- -D warnings
  • CXX=g++ CC=gcc CARGO_TARGET_X86_64_UNKNOWN_LINUX_GNU_LINKER=gcc cargo test -p scraper test_failed_build_chain_scrapers -- --nocapture

paulbalaji (Collaborator, Author):

Addressed rust e2e failures from the linked logs in 8745868376.

Root cause:

  • run-locally metric helper used prefix label matching (the pattern data_type="message_dispatch, with no closing quote), so invariants for message_dispatch also matched message_dispatch_tip.
  • After tip-stage indexing was added, scraper invariant checks over-counted dispatches (e.g. 20 observed vs. 10 expected, 33 vs. 31) and timed out.

Fix:

  • Added fetch_metric_exact in utils/run-locally/src/metrics.rs (exact label-value match).
  • Switched scraper termination invariant checks to exact matching in:
    • utils/run-locally/src/invariants/termination_invariants.rs
    • utils/run-locally/src/starknet/mod.rs
    • utils/run-locally/src/cosmos/termination_invariants.rs
    • utils/run-locally/src/cosmosnative/mod.rs
  • Re-exported helper in utils/run-locally/src/main.rs.
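A self-contained sketch of the difference between the old prefix matching and exact label matching over Prometheus text-format output. The helpers sum_metric_prefix and sum_metric_exact are hypothetical; the real fetch_metric_exact in utils/run-locally/src/metrics.rs fetches metrics from the agent and may have a different signature. Label values are assumed not to contain commas or escaped quotes.

```rust
/// Parses a sample such as `name{k1="v1",k2="v2"} 12` into label pairs and value.
/// Assumes label values contain no commas or escaped quotes.
fn parse_sample(line: &str) -> Option<(Vec<(&str, &str)>, f64)> {
    let (series, value) = line.rsplit_once(' ')?;
    let value = value.parse::<f64>().ok()?;
    let open = series.find('{')?;
    let inner = series[open + 1..].strip_suffix('}')?;
    let labels = inner
        .split(',')
        .filter_map(|pair| pair.split_once('='))
        .map(|(k, v)| (k, v.trim_matches('"')))
        .collect();
    Some((labels, value))
}

/// Sums every sample of `metric` that has at least one label satisfying `pred`.
fn sum_where(text: &str, metric: &str, pred: impl Fn(&str, &str) -> bool) -> u64 {
    text.lines()
        .filter(|line| line.strip_prefix(metric).map_or(false, |rest| rest.starts_with('{')))
        .filter_map(parse_sample)
        .filter(|(labels, _)| labels.iter().any(|(k, v)| pred(*k, *v)))
        .map(|(_, value)| value as u64)
        .sum()
}

/// Old behavior: the label value only has to start with `value`,
/// so "message_dispatch" also matches "message_dispatch_tip".
pub fn sum_metric_prefix(text: &str, metric: &str, key: &str, value: &str) -> u64 {
    sum_where(text, metric, |k, v| k == key && v.starts_with(value))
}

/// Exact behavior: the label value must equal `value`.
pub fn sum_metric_exact(text: &str, metric: &str, key: &str, value: &str) -> u64 {
    sum_where(text, metric, |k, v| k == key && v == value)
}
```

Given one series labeled data_type="message_dispatch" and another labeled data_type="message_dispatch_tip", the prefix variant sums both while the exact variant returns only the first.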

Validation:

  • cd rust/main && cargo fmt
  • cd rust/main && cargo check -p run-locally --features starknet,cosmos,cosmosnative

paulbalaji (Collaborator, Author):

Follow-up fix for the next CI failures in 216ee60346.

From the new logs:

  1. Scraper has scraped 29 raw message dispatches, expected 31
  2. Relayer has indexed 0 gas payments, expected 4/8 (cosmos + cosmosnative)

Root causes + fixes:

  • Raw dispatch invariant only counted raw_message_dispatch (finalized label). With two-stage indexing, some rows are first inserted by the tip stage (counted under raw_message_dispatch_tip), and the finalized stage then only upserts them, so a finalized-only counter can undercount.
    • Updated scraper invariant to compare raw_message_dispatch + raw_message_dispatch_tip against expected total.
  • Cosmos/cosmosnative relayer gas payment check used exact matching with the wrong label value (gas_payment), but the relayer emits data_type="gas_payments".
    • Updated relayer-side check labels to gas_payments in both files.
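The updated raw-dispatch invariant reduces to summing the two labels before comparing. This is a hypothetical sketch: it assumes the per-label counts have already been collected into a map and that the check requires exact equality, which may not be exactly how the run-locally code compares them.

```rust
use std::collections::HashMap;

/// Labels whose counts are summed by the updated invariant (names taken from this PR).
const RAW_DISPATCH_LABELS: [&str; 2] = ["raw_message_dispatch", "raw_message_dispatch_tip"];

/// Total raw dispatches across both stages; a missing label counts as zero.
pub fn raw_dispatch_total(counts: &HashMap<&str, u64>) -> u64 {
    RAW_DISPATCH_LABELS
        .iter()
        .map(|label| counts.get(*label).copied().unwrap_or(0))
        .sum()
}

/// Hypothetical invariant: finalized + tip must equal the expected dispatch count.
pub fn raw_dispatch_invariant_met(counts: &HashMap<&str, u64>, expected: u64) -> bool {
    raw_dispatch_total(counts) == expected
}
```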

Validation:

  • cd rust/main && cargo fmt
  • cd rust/main && cargo check -p run-locally --features starknet,cosmos,cosmosnative,sealevel,radix

hyper-gonk bot (Contributor) commented Feb 25, 2026

🦀 Rust Agent Docker Image Built Successfully

Service: agent
Tag: 216ee60-20260225-003438
Full image path: gcr.io/abacus-labs-dev/hyperlane-agent:216ee60-20260225-003438


Project status: In Review
