Skip to content

sc-client-db: add prefetched indexed transactions support#12086

Draft
skunert wants to merge 21 commits into
masterfrom
skunert/storage-chain-sync-low-level
Draft

sc-client-db: add prefetched indexed transactions support#12086
skunert wants to merge 21 commits into
masterfrom
skunert/storage-chain-sync-low-level

Conversation

@skunert
Copy link
Copy Markdown
Contributor

@skunert skunert commented May 15, 2026

We need to pass prefetched indexed transaction data from the block import pipeline to the backend so it can be inserted into the TRANSACTION column. The flow will include a custom block import that downloads referenced data over the network, the attaches it to the BlockImportParameters. Backend then inserts them into the DB.

skunert added 15 commits May 15, 2026 16:28
Substrate plumbing required by the cumulus storage-chain-sync block-import wrapper:

- sp-database: add multibump (atomic refcount delta) for the indexed-transaction column.
- sc-client-db: classify_indexed_extrinsics + IndexedTransactionMeta + apply_index_ops Renew/Insert refcount accounting; surface PREFETCHED_INDEXED_TRANSACTIONS_INTERMEDIATE_KEY and forward prefetched bytes through apply_block.
- sc-client-api: re-export the prefetched-transactions intermediate key alongside backend traits.
- sc-network/bitswap: gate Prefix + schema re-exports under a new 'test-helpers' feature for downstream test crates; widen Prefix::to_bytes to pub under the feature.
- sc-service: wire the prefetched-transactions intermediate through the client's apply_block pipeline.
- sp-transaction-storage-proof: add IndexedTransactionInfo + TransactionStorageApi v2 returning per-entry (content_hash, hashing, codec, extrinsic_index).
litep2p's bitswap response decoder reconstructs the response CID by re-hashing block.data with the algorithm declared in the wire prefix (see /home/sebastian/.cargo/registry/src/.../litep2p-0.13.3/src/protocol/libp2p/bitswap/mod.rs:248). For consumers of `request_bitswap_blocks_unverified` that don't know the hash function up front, this re-hashing produces a CID that does not equal the wantlist CID and PendingBatch.record_responses (CID-keyed HashMap lookup) silently dropped every such response.

The storage-chain sync fetcher uses placeholder Blake CIDs on the unverified path because the hashing algorithm is only known after the bytes are fetched and locally hashed against each supported algorithm. With CID-keyed matching, every Sha2-256 or Keccak-256 response was filtered out at the litep2p shim layer, causing `request_bitswap_blocks_unverified` to time out and the wrapper to retry forever.

Switch PendingBatch's response store from `HashMap<Cid, ResponseType>` to `Vec<ResponseType>` accumulated in arrival order. is_complete() resolves when count == wantlist size; the substrate-side classify_response and classify_response_unverified already handle CID matching (verified path) or positional + prefix matching (unverified path) once the raw responses reach them.

Drop the now-unused select_best_response_per_cid helper and its 3 unit tests. Add 4 regression tests (pending_batch_accepts_sha2_response_for_blake_wantlist, pending_batch_accepts_keccak_response_for_blake_wantlist, pending_batch_preserves_response_order_across_algorithms, pending_batch_caps_responses_at_wantlist_size) that fail on the old CID-keyed implementation and pass on the new positional one.
…ions

Adds 42 tests in sc-client-db (mod indexed_transaction_backend_tests)
covering ref-counting lifecycles and the storage-chain prefetched-renew
path against three backends:

- kvdb-memorydb (existing test backend)
- kvdb-rocksdb (real on-disk via tempdir)
- parity-db (real on-disk via tempdir)

Layer A (21 tests): adapter-level scenarios calling sp_database::Database
directly. Drop+reopen between phases for paritydb/rocksdb to bypass
commit_overlay caching and observe true on-disk state.

Layer B (21 tests): Backend-level scenarios exercising apply_index_ops
and prune lifecycles. Mix of existing prefetched_* scenarios reproduced
parametrically plus new scenarios for redundant-prefetch, same-block
insert+renew, and sequential prefetched renews.

Result: 36 pass, 6 deterministically/flakily fail — all 6 failures on
parity-db, documenting:
- adapter-level data loss for same-commit store+reference on a fresh hash
- refcount leaks through full block lifecycle in redundant-prefetch and
  same-block insert+renew scenarios

New public test helper Backend::new_test_with_tx_storage_source accepts
a caller-supplied DatabaseSource so tests can target real backends.
Collapses the 42-test backend matrix from three explicit #[test] wrappers
per scenario into a single #[rstest] fn with three #[case::<name>(...)]
attributes per scenario. Net ~150 LOC reduction.

- New dev-dep: rstest = { workspace = true } (already pinned at workspace
  root, used widely across substrate/cumulus/polkadot).
- BackendKind enum + DbFactory and BackendFactory consolidate the three
  per-backend run_with_* helpers into one parametric constructor.
- Test names move from suffix style (a1_store_then_get_paritydb) to
  rstest namespace (a1_store_then_get::case_2_paritydb). Filterable via
  cargo test ::case_2_paritydb for parity-db variants only.

Pass/fail signature unchanged: same 4 deterministic + 2 flaky parity-db
failures, all other backends green.
@skunert skunert added A5-run-CI Run CI on draft PR T0-node This PR/Issue is related to the topic “node”. labels May 15, 2026
@skunert
Copy link
Copy Markdown
Contributor Author

skunert commented May 15, 2026

/cmd fmt

github-actions Bot and others added 2 commits May 15, 2026 21:13
Replaces the bare `Vec<([u8; 32], Vec<u8>)>` shape of
`BlockImportParams::prefetched_indexed_transactions` and
`BlockImportOperation::set_prefetched_indexed_transactions` with a dedicated
`PrefetchedIndexedTransactions { ops, renew_payloads }` struct.

- `renew_payloads` retains the existing semantics: `(content_hash, bytes)`
  pairs the backend consumes inside `apply_index_ops` when servicing a
  runtime-produced `IndexOperation::Renew` whose hash is not yet in the local
  TRANSACTION column. No behavioural change for current tip-sync consumers.

- `ops` is new. It carries synthetic `IndexOperation`s that the backend uses
  ONLY when the runtime did not call `update_transaction_index` (i.e.
  `operation.index_ops` is empty). This is the substrate-side prerequisite for
  cumulus storage-chain gap-sync backfill: the wrapper will derive synthetic
  Insert ops for store extrinsics (tail bytes already in the body) and
  synthetic Renew ops for entries it bitswap-fetches; the backend's
  `try_commit_operation` then picks `effective_ops` with a strict precedence
  rule -- runtime-produced ops always win when present, synthetic ops are a
  pure fallback. No cumulus consumer enabled yet; gating remains in the
  upstream cumulus wrapper's `should_intercept` filter.

Includes 8 new sc-client-db unit tests covering the precedence rule:
- `synthetic_ops_take_effect_when_runtime_index_ops_empty`
- `runtime_index_ops_win_over_synthetic`
- `empty_both_falls_back_to_plain_body`
- `synthetic_renew_uses_prefetched_payload`
- `synthetic_renew_without_prefetched_references_existing`
- `synthetic_insert_extracts_tail_from_body`
- `synthetic_insert_oversized_size_falls_back_to_full_extrinsic`
- `multiple_synthetic_ops_per_block_apply_in_order`

The existing 39-test indexed-transaction backend matrix continues to pass.
The test helper `insert_block_with_prefetched` keeps its existing signature
(thin wrapper around `set_prefetched_indexed_transactions` with empty ops).
A new helper `insert_block_with_synthetic_ops` exercises both paths.
@paritytech-workflow-stopper
Copy link
Copy Markdown

All GitHub workflows were cancelled due to failure one of the required jobs.
Failed workflow url: https://github.com/paritytech/polkadot-sdk/actions/runs/25999919185
Failed job name: fmt

skunert added 3 commits May 17, 2026 21:35
- prdoc: collapse verbose prose, keep functional descriptions
- backend.rs: tighten struct/field/method comments
- lib.rs: reduce comment noise in production code and tests
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

A5-run-CI Run CI on draft PR T0-node This PR/Issue is related to the topic “node”.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant