
Conversation

@tac0turtle tac0turtle commented Nov 19, 2025

Overview

This PR implements an idea from @renaynay: we create an exchange wrapper so that the exchange can check our own store before making public requests for data. This addresses the edge case I ran into today, where the p2p store was randomly missing 40 blocks.

@tac0turtle tac0turtle changed the title add exchange wrapper to first check da store for height before reques… feat: p2p exchange wrapper Nov 19, 2025

github-actions bot commented Nov 19, 2025

The latest Buf updates on your PR. Results from workflow CI / buf-check (pull_request).

| Build | Format | Lint | Breaking | Updated (UTC) |
|---|---|---|---|---|
| ✅ passed | ⏩ skipped | ✅ passed | ✅ passed | Dec 2, 2025, 9:56 AM |


claude bot commented Nov 19, 2025

Claude finished @tac0turtle's task.


Comprehensive Code Review

I've completed a thorough review of the P2P exchange wrapper implementation. This is a good optimization that addresses the edge case of missing blocks in the P2P store by checking the DA store first. Here's my detailed analysis:


🟢 Strengths

Architecture & Design

  • Clean wrapper pattern: The exchangeWrapper elegantly wraps the existing P2P exchange without modifying the underlying infrastructure
  • Dependency injection: The getter functions (storeGetter, storeGetterByHeight) provide good flexibility and testability (their likely signatures are sketched after this list)
  • Zero-copy optimization: The wrapper avoids unnecessary data copying by directly returning headers from the DA store when available
  • Generic implementation: Proper use of Go generics (H header.Header[H]) maintains type safety across different header types
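
The getter type declarations aren't visible in this excerpt; inferred from their call sites in sync_service.go, their signatures presumably look like this (an assumption on my part, not code quoted from the PR):

// Assumed shape of the getter types, inferred from how they are invoked in
// sync_service.go; the actual declarations in the PR may differ.
type storeGetter[H header.Header[H]] func(ctx context.Context, s store.Store, hash header.Hash) (H, error)

type storeGetterByHeight[H header.Header[H]] func(ctx context.Context, s store.Store, height uint64) (H, error)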

Integration

  • Minimal invasiveness: Changes are well-contained to the sync service initialization (lines 322-328 in sync_service.go)
  • Backward compatibility: The wrapper gracefully handles cases where DA store is nil, maintaining existing behavior

🟡 Issues & Recommendations

1. CRITICAL: Performance Issue - Unnecessary Data Fetching

Location: pkg/sync/sync_service.go:102-104, 82-84

The getter functions retrieve full block data when only headers are needed:

// HeaderSyncService getter - fetches FULL BLOCK
getter := func(ctx context.Context, s store.Store, hash header.Hash) (*types.SignedHeader, error) {
    h, _, err := s.GetBlockByHash(ctx, hash)  // ❌ Fetches block data unnecessarily
    return h, err
}

As @julienrbrt correctly noted in the review comment, GetHeader is less expensive.

Impact:

  • Fetches entire block data (potentially large) when only the header (a few KB) is needed
  • Increases memory pressure and I/O overhead
  • Could significantly degrade performance under heavy load

Recommendation:

// For hash-based lookup
getter := func(ctx context.Context, s store.Store, hash header.Hash) (*types.SignedHeader, error) {
    // Need to map hash to height first, then use GetHeader
    // OR add a new Store method: GetHeaderByHash(ctx, hash)
    h, _, err := s.GetBlockByHash(ctx, hash)
    return h, err
}

// For height-based lookup - ALREADY CORRECT ✓
getterByHeight := func(ctx context.Context, s store.Store, height uint64) (*types.SignedHeader, error) {
    return s.GetHeader(ctx, height)  // ✓ Efficient
}

Action Required:

  • Either add a GetHeaderByHash method to the store interface (a possible shape is sketched after this list)
  • OR accept the current implementation as a pragmatic trade-off (document this in code)
  • The GetByHeight path is already optimal using GetHeader
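
If the interface route is chosen, a hypothetical GetHeaderByHash could look like the sketch below. The method, the DefaultStore receiver, and the heightByHash index lookup are all assumptions for illustration, not code from this PR:

// Hypothetical store method, not part of this PR. It assumes the store keeps
// (or can cheaply derive) a hash -> height index, which GetBlockByHash
// presumably already uses internally.
func (s *DefaultStore) GetHeaderByHash(ctx context.Context, hash header.Hash) (*types.SignedHeader, error) {
    height, err := s.heightByHash(ctx, hash) // assumed internal index lookup
    if err != nil {
        return nil, err
    }
    return s.GetHeader(ctx, height) // reuse the cheap header-only path
}

The hash-based getter in sync_service.go would then shrink to a single delegation:

getter := func(ctx context.Context, s store.Store, hash header.Hash) (*types.SignedHeader, error) {
    return s.GetHeaderByHash(ctx, hash)
}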

2. Error Handling: Silent Failure on Store Errors

Location: pkg/sync/exchange_wrapper.go:23, 35

if h, err := ew.getter(ctx, ew.daStore, hash); err == nil && !h.IsZero() {
    return h, nil
}
// Falls through silently on error

Issue: All DA store errors (network failures, disk errors, corruption) are silently ignored and fall through to P2P.

Recommendation:

if h, err := ew.getter(ctx, ew.daStore, hash); err != nil {
    // Log the error but continue to P2P fallback
    // Consider metrics for monitoring store health
    if !errors.Is(err, store.ErrNotFound) {
        // log.Debug().Err(err).Msg("DA store lookup failed, falling back to P2P")
    }
} else if !h.IsZero() {
    return h, nil
}

Severity: Medium - Could mask serious issues like disk corruption or database failures.


3. Concurrency Safety: Data Race Potential

Location: pkg/sync/sync_service.go:322-328

The exchangeWrapper is created during setupP2PInfrastructure (called from Start), but there's no synchronization to ensure the daStore and getter functions are not modified concurrently.

Current State:

  • The implementation appears safe because these fields are only set during initialization
  • However, there's no explicit documentation or enforcement

Recommendation:

// Add comment to document initialization safety contract
type SyncService[H header.Header[H]] struct {
    // ... existing fields ...
    
    // daStore, getter, getterByHeight are set once during initialization
    // and must not be modified after Start() is called
    daStore           store.Store
    getter            storeGetter[H]
    getterByHeight    storeGetterByHeight[H]
}

Severity: Low - Current implementation is safe, but better documentation would help.


4. Test Coverage: Missing Edge Cases

Location: pkg/sync/exchange_wrapper_test.go

Missing Test Cases:

  1. ✗ Zero header returned from store (currently only tests error case)
  2. ✗ Context cancellation during store lookup
  3. ✗ Concurrent calls to Get/GetByHeight
  4. ✗ Nil header with nil error from getter
  5. ✗ Performance comparison (store hit vs P2P fallback)

Recommendation:

t.Run("Zero Header Returned", func(t *testing.T) {
    mockEx := extmocks.NewMockExchange[*types.SignedHeader](t)
    mockEx.On("Get", ctx, hash).Return(expectedHeader, nil)
    
    getter := func(ctx context.Context, s store.Store, h header.Hash) (*types.SignedHeader, error) {
        return &types.SignedHeader{}, nil // Zero header
    }
    
    ew := &exchangeWrapper[*types.SignedHeader]{
        Exchange: mockEx,
        daStore:  mocks.NewMockStore(t),
        getter:   getter,
    }
    
    h, err := ew.Get(ctx, hash)
    assert.NoError(t, err)
    assert.Equal(t, expectedHeader, h) // Should fallback to P2P
})

t.Run("Context Cancellation", func(t *testing.T) {
    ctx, cancel := context.WithCancel(context.Background())
    cancel() // Cancel immediately
    
    getter := func(ctx context.Context, s store.Store, h header.Hash) (*types.SignedHeader, error) {
        return nil, ctx.Err() // Should respect context
    }
    
    mockEx := extmocks.NewMockExchange[*types.SignedHeader](t)
    mockEx.On("Get", ctx, hash).Return(nil, context.Canceled)
    
    ew := &exchangeWrapper[*types.SignedHeader]{
        Exchange: mockEx,
        daStore:  mocks.NewMockStore(t),
        getter:   getter,
    }
    
    _, err := ew.Get(ctx, hash)
    assert.Error(t, err)
})

Severity: Medium - Current tests validate happy path but miss important edge cases.


5. Documentation: Missing Rationale and Behavior

Location: pkg/sync/exchange_wrapper.go:13-18

The exchangeWrapper struct lacks documentation explaining:

  • Why this wrapper exists (the edge case of missing 40 blocks)
  • The fallback behavior
  • The trade-offs (extra store lookup latency)

Recommendation:

// exchangeWrapper wraps a header.Exchange to check the local DA store before
// making network requests. This addresses the edge case where the P2P store
// may be missing blocks that exist in the DA store.
//
// Behavior:
// - On Get/GetByHeight, first checks DA store using provided getter functions
// - If found and non-zero, returns immediately (fast path)
// - Otherwise, falls back to network exchange (slow path)
//
// Trade-offs:
// - Adds store lookup latency on cache miss (~1-5ms typically)
// - Reduces unnecessary P2P traffic and handles store inconsistencies
// - Silently ignores store errors (falls back to P2P)
type exchangeWrapper[H header.Header[H]] struct {
    header.Exchange[H]
    daStore        store.Store
    getter         storeGetter[H]
    getterByHeight storeGetterByHeight[H]
}

Severity: Low - Code is readable but context would help future maintainers.


6. Metrics & Observability: No Instrumentation

Location: pkg/sync/exchange_wrapper.go:20-42

There are no metrics to track:

  • DA store hit rate
  • Fallback to P2P rate
  • Store lookup latency
  • Error rates

Recommendation:

// Add prometheus metrics (and register them, or they are never exported)
var (
    storeHitCounter = prometheus.NewCounterVec(
        prometheus.CounterOpts{
            Name: "sync_exchange_store_hits_total",
            Help: "Number of successful DA store lookups",
        },
        []string{"type"}, // "hash" or "height"
    )

    storeMissCounter = prometheus.NewCounterVec(
        prometheus.CounterOpts{
            Name: "sync_exchange_store_misses_total",
            Help: "Number of DA store misses requiring P2P fallback",
        },
        []string{"type"},
    )
)

func init() {
    // Register with the default registry so the counters appear in scrapes.
    prometheus.MustRegister(storeHitCounter, storeMissCounter)
}

func (ew *exchangeWrapper[H]) Get(ctx context.Context, hash header.Hash) (H, error) {
    if ew.daStore != nil && ew.getter != nil {
        if h, err := ew.getter(ctx, ew.daStore, hash); err == nil && !h.IsZero() {
            storeHitCounter.WithLabelValues("hash").Inc()
            return h, nil
        }
        storeMissCounter.WithLabelValues("hash").Inc()
    }
    return ew.Exchange.Get(ctx, hash)
}

Severity: Medium - Critical for production debugging and performance tuning.


🔒 Security Analysis

No Critical Security Issues Identified

  • Input validation: Hash and height are validated by underlying store
  • No credential exposure risk
  • No injection vulnerabilities
  • Context cancellation properly propagated
  • No panic risks (proper nil checks)

Minor Consideration:

  • If DA store is compromised, it could serve malicious headers before P2P validation
  • However, this is mitigated by header validation in the sync layer (not shown in this PR)

⚡ Performance Analysis

Positive Impacts:

  • ✅ Reduces P2P traffic when blocks exist in DA store
  • ✅ Faster lookups for local data (no network roundtrip)
  • ✅ Handles edge case of inconsistent P2P store

Negative Impacts:

  • ⚠️ Adds store lookup latency on every miss (~1-5ms)
  • ⚠️ GetBlockByHash fetches unnecessary data (see Issue 1 above)
  • ⚠️ No caching of negative lookups (repeated misses hit store every time)

Optimization Opportunities:

  1. Add a small negative lookup cache (e.g., last 100 missing hashes); a sketch follows after this list
  2. Use GetHeader instead of GetBlockByHash for HeaderSyncService
  3. Consider async prefetching for sequential height queries
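
To illustrate opportunity 1, a minimal negative lookup cache might look like the sketch below. The bounded ring-plus-set approach and all names here are assumptions for illustration, not code from the PR (uses only the standard library sync package):

// notFoundCache remembers recently missed hashes so repeated lookups can
// skip the DA store entirely. A fixed-size ring of keys bounds memory use.
type notFoundCache struct {
    mu   sync.Mutex
    set  map[string]struct{}
    ring []string // insertion order; oldest entry is evicted first
    next int
}

func newNotFoundCache(size int) *notFoundCache {
    return &notFoundCache{
        set:  make(map[string]struct{}, size),
        ring: make([]string, size),
    }
}

// Add records a miss, evicting the oldest entry once the ring is full.
func (c *notFoundCache) Add(key string) {
    c.mu.Lock()
    defer c.mu.Unlock()
    if _, ok := c.set[key]; ok {
        return // already tracked
    }
    if old := c.ring[c.next]; old != "" {
        delete(c.set, old)
    }
    c.ring[c.next] = key
    c.set[key] = struct{}{}
    c.next = (c.next + 1) % len(c.ring)
}

// Contains reports whether the key missed recently.
func (c *notFoundCache) Contains(key string) bool {
    c.mu.Lock()
    defer c.mu.Unlock()
    _, ok := c.set[key]
    return ok
}

The wrapper would consult Contains before the store lookup and call Add on a miss. Staleness is tolerable here: a stale negative entry merely sends the request to P2P, which is the pre-wrapper behavior, and bounded entries age out naturally.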

📊 Test Coverage Analysis

Current Coverage: 70.83% (14 lines missing coverage)

Lines Missing Coverage (per Codecov):

  • pkg/sync/sync_service.go: 10 missing, 4 partials

Test Quality Assessment:

  • ✅ Tests cover basic hit/miss scenarios
  • ✅ Proper use of mocks
  • ✅ A table-driven test structure could be beneficial, though it's not critical
  • ⚠️ Missing edge cases (see Issue 4 above)
  • ⚠️ No integration tests with real store

🎯 Priority Recommendations

Must Fix Before Merge:

  1. Address @julienrbrt's comment: Optimize header fetching in sync_service.go:102-104

Should Fix:

  1. Add logging for DA store errors (don't fail silently)
  2. Add metrics for observability
  3. Improve test coverage with edge cases

Nice to Have:

  1. Add comprehensive documentation
  2. Consider negative lookup caching
  3. Add benchmark tests (a starting-point sketch follows below)
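
For the benchmark idea, something like the sketch below could measure the store-hit fast path. This is a sketch under assumptions: that mocks.NewMockStore accepts *testing.B (most mockery-generated constructors take any testing.TB-like value, but verify), and that the header is populated so the fast path is taken:

// Sketch of a fast-path benchmark; stubbed values are assumptions, not
// fixtures from the PR's test suite.
func BenchmarkExchangeWrapperStoreHit(b *testing.B) {
    ctx := context.Background()
    hdr := &types.SignedHeader{ /* populate fields so hdr.IsZero() is false */ }
    ew := &exchangeWrapper[*types.SignedHeader]{
        Exchange: nil, // never reached when the store getter hits
        daStore:  mocks.NewMockStore(b),
        getter: func(ctx context.Context, s store.Store, h header.Hash) (*types.SignedHeader, error) {
            return hdr, nil // simulated DA store hit
        },
    }
    b.ResetTimer()
    for i := 0; i < b.N; i++ {
        if _, err := ew.Get(ctx, header.Hash("deadbeef")); err != nil {
            b.Fatal(err)
        }
    }
}

A companion benchmark with a getter that always misses would quantify the fallback overhead the review estimates at ~1-5ms.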

✅ Summary

This PR implements a pragmatic solution to the missing blocks edge case. The wrapper pattern is clean and the integration is minimal. The main concerns are:

  1. Performance: Fetching full blocks when only headers are needed (per @julienrbrt's review)
  2. Observability: No metrics to track hit/miss rates and debug issues
  3. Error handling: Silent failures on store errors could mask problems
  4. Testing: Missing edge cases in test coverage

The implementation is fundamentally sound and safe, but would benefit from the optimizations noted above before merge.

Overall Assessment: ✅ Approve with Changes



codecov bot commented Nov 19, 2025

Codecov Report

❌ Patch coverage is 70.83333% with 14 lines in your changes missing coverage. Please review.
✅ Project coverage is 64.58%. Comparing base (2b5a1df) to head (a256a1c).
⚠️ Report is 1 commit behind head on main.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| pkg/sync/sync_service.go | 56.25% | 10 Missing and 4 partials ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2855      +/-   ##
==========================================
+ Coverage   64.49%   64.58%   +0.08%     
==========================================
  Files          81       82       +1     
  Lines        7374     7403      +29     
==========================================
+ Hits         4756     4781      +25     
- Misses       2076     2078       +2     
- Partials      542      544       +2     
| Flag | Coverage Δ |
|---|---|
| combined | 64.58% <70.83%> (+0.08%) ⬆️ |

Flags with carried forward coverage won't be shown.

