
Conversation

@alpe
Contributor

@alpe alpe commented Nov 12, 2025

Implement failover via RAFT

  • Improve Cache startup/shutdown with parallelization
  • Publish to RAFT cluster in executor
  • Sync DB after each block created in executor
  • Add new RaftReceiver to sync when in aggregator follower mode
  • Introduce failoverState to switch between follower/leader mode
  • Provide RAFT node details via http endpoint

@github-actions
Contributor

github-actions bot commented Nov 12, 2025

The latest Buf updates on your PR. Results from workflow CI / buf-check (pull_request).

Build | Format | Lint | Breaking | Updated (UTC)
--- | --- | --- | --- | ---
✅ passed | ⏩ skipped | ✅ passed | ✅ passed | Dec 3, 2025, 4:47 PM

@codecov

codecov bot commented Nov 12, 2025

Codecov Report

❌ Patch coverage is 41.48816% with 519 lines in your changes missing coverage. Please review.
✅ Project coverage is 62.05%. Comparing base (ded4f34) to head (695324e).
⚠️ Report is 1 commit behind head on main.

Files with missing lines | Patch % | Lines
--- | --- | ---
pkg/raft/node.go | 12.50% | 168 Missing ⚠️
pkg/raft/node_mock.go | 45.08% | 74 Missing and 21 partials ⚠️
block/internal/syncing/raft_retriever.go | 0.00% | 60 Missing ⚠️
node/full.go | 32.81% | 36 Missing and 7 partials ⚠️
node/failover.go | 74.45% | 22 Missing and 13 partials ⚠️
block/internal/syncing/syncer.go | 28.88% | 30 Missing and 2 partials ⚠️
block/internal/executing/executor.go | 6.89% | 23 Missing and 4 partials ⚠️
pkg/raft/election.go | 79.26% | 12 Missing and 5 partials ⚠️
pkg/rpc/server/http.go | 6.66% | 13 Missing and 1 partial ⚠️
block/internal/syncing/assert.go | 57.89% | 4 Missing and 4 partials ⚠️
... and 9 more
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2836      +/-   ##
==========================================
- Coverage   64.53%   62.05%   -2.48%     
==========================================
  Files          81       88       +7     
  Lines        7382     8083     +701     
==========================================
+ Hits         4764     5016     +252     
- Misses       2076     2490     +414     
- Partials      542      577      +35     
Flag | Coverage Δ
--- | ---
combined | 62.05% <41.48%> (-2.48%) ⬇️

Flags with carried forward coverage won't be shown.


alpe added 4 commits November 17, 2025 10:55
* main:
  fix: remove duplicate error logging in light node shutdown (#2841)
  chore: fix incorrect function name in comment (#2840)
  chore: remove sequencer go.mod (#2837)
* main:
  build(deps): Bump the go_modules group across 2 directories with 3 updates (#2846)
  build(deps): Bump github.com/dvsekhvalnov/jose2go from 1.7.0 to 1.8.0 in /test/e2e (#2851)
  build(deps): Bump github.com/consensys/gnark-crypto from 0.18.0 to 0.18.1 in /test/e2e (#2844)
  build(deps): Bump github.com/cometbft/cometbft from 0.38.17 to 0.38.19 in /test/e2e (#2843)
  build(deps): Bump github.com/dvsekhvalnov/jose2go from 1.6.0 to 1.7.0 in /test/e2e (#2845)
@alpe alpe changed the title from "[WIP] feat: High availability via RAFT" to "feat: High availability via RAFT" Nov 17, 2025
alpe added 4 commits November 18, 2025 11:00
(cherry picked from commit c44cd77e665f6d5d463295c6ed61c59a56d88db3)
@alpe alpe marked this pull request as ready for review November 19, 2025 09:54
@alpe
Contributor Author

alpe commented Nov 19, 2025

@claude review again

@claude
Contributor

claude bot commented Nov 19, 2025

Claude finished @alpe's task


Comprehensive Code Review

I've analyzed this high-availability RAFT implementation PR. Here's my detailed review across multiple dimensions:


1. Architecture & Design Patterns

✅ Strengths

Modular RAFT Integration: The separation of concerns is well-executed:

  • pkg/raft/node.go: Core RAFT node abstraction
  • pkg/raft/election.go: Leader election orchestration with dynamic role switching
  • node/failover.go: Failover state management decoupled from business logic
  • Integration points are well-defined through interfaces (e.g., Runnable, RaftNode)

Clean State Machine: The DynamicLeaderElection.Run() in pkg/raft/election.go:54-152 implements a robust state machine with clear transitions between leader/follower modes.

Factory Pattern: The use of factory functions for leader/follower initialization (leaderFactory, followerFactory in node/full.go:97-107) enables clean separation and testability.

⚠️ Issues & Recommendations

1. JSON Serialization Performance Concern (pkg/raft/node.go:211)

data, err := json.Marshal(state) // todo:use protobuf

Critical Issue: At 100ms block times, JSON serialization overhead can impact consensus performance significantly. This TODO should be prioritized.

Recommendation:

  • Implement protobuf serialization immediately
  • Add benchmarks comparing JSON vs protobuf performance
  • Estimate impact: JSON typically 2-5x slower than protobuf for structured data
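
A small benchmark sketch can quantify the gap before committing to the migration. Note this is only a sketch: pb.RaftBlockState and its import path stand in for hypothetical generated protobuf code that does not exist in this PR, and the payload is illustrative.

package raft

import (
	"encoding/json"
	"testing"

	"google.golang.org/protobuf/proto"

	pb "example.com/evnode/gen/raft/v1" // hypothetical generated package
)

// BenchmarkEncodeJSON measures the encoding currently used in Broadcast.
func BenchmarkEncodeJSON(b *testing.B) {
	state := &RaftBlockState{Height: 123} // fill in realistic header/data payloads
	b.ReportAllocs()
	for i := 0; i < b.N; i++ {
		if _, err := json.Marshal(state); err != nil {
			b.Fatal(err)
		}
	}
}

// BenchmarkEncodeProto measures the proposed protobuf path.
func BenchmarkEncodeProto(b *testing.B) {
	state := &pb.RaftBlockState{Height: 123} // mirror the JSON payload
	b.ReportAllocs()
	for i := 0; i < b.N; i++ {
		if _, err := proto.Marshal(state); err != nil {
			b.Fatal(err)
		}
	}
}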

2. Silent Message Drops (pkg/raft/node.go:291-297)

select {
case f.applyCh <- RaftApplyMsg{Index: log.Index, State: &state}:
default:
    f.logger.Warn().Msg("apply channel full, dropping message")
}

Issue: Dropped messages could lead to nodes falling behind without recovery mechanism.

Recommendation:

  • Add metrics to track dropped messages
  • Implement backpressure mechanism
  • Consider increasing buffer size or making it configurable
  • Document expected behavior when messages are dropped
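
One possible shape, sketched against the snippet above using github.com/prometheus/client_golang/prometheus; the counter and the applyTimeout field are assumptions for illustration, not code that exists in this PR.

var raftApplyDroppedTotal = prometheus.NewCounter(prometheus.CounterOpts{
	Namespace: "evnode",
	Subsystem: "raft",
	Name:      "apply_messages_dropped_total",
	Help:      "RAFT apply messages dropped because the apply channel was full.",
})

func init() { prometheus.MustRegister(raftApplyDroppedTotal) }

// In the FSM apply path: apply bounded backpressure instead of dropping
// immediately, and count every drop so operators can alert on it.
select {
case f.applyCh <- RaftApplyMsg{Index: log.Index, State: &state}:
case <-time.After(f.applyTimeout): // hypothetical configurable bound, e.g. a few milliseconds
	raftApplyDroppedTotal.Inc()
	f.logger.Warn().Uint64("index", log.Index).Msg("apply channel full, dropping message")
}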

3. Bootstrap Configuration Validation (node/full.go:90-123)

case !nodeConfig.Node.Aggregator && !nodeConfig.Raft.Enable:
    // sync mode without raft
default:
    return nil, fmt.Errorf("raft config must be used in sequencer setup only")

Issue: The error message is unclear; it does not explain which configurations are valid.

Recommendation: Improve error message to explain valid configurations.


2. Error Handling & Resilience

⚠️ Critical Issues

1. Leadership Transfer Failure Handling (pkg/raft/election.go:99-102)

if err := d.node.leadershipTransfer(); err != nil && !errors.Is(err, raft.ErrNotLeader) {
    return err // Node stops entirely
}

Issue: If leadership transfer fails (e.g., no suitable leader), the node stops. This could cause cluster instability.

Recommendation:

  • Implement retry logic with exponential backoff
  • Add configurable timeout for leadership transfer
  • Consider degraded mode instead of full stop
  • Log detailed error context for debugging
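
A retry wrapper along these lines would keep the node alive through transient transfer failures. This is a sketch: only leadershipTransfer and raft.ErrNotLeader come from the existing code, the rest is illustrative.

func (d *DynamicLeaderElection) transferLeadershipWithRetry(ctx context.Context, maxAttempts int) error {
	backoff := 100 * time.Millisecond
	var lastErr error
	for attempt := 1; attempt <= maxAttempts; attempt++ {
		err := d.node.leadershipTransfer()
		if err == nil || errors.Is(err, raft.ErrNotLeader) {
			return nil
		}
		lastErr = err
		d.logger.Warn().Err(err).Int("attempt", attempt).Msg("leadership transfer failed, retrying")
		select {
		case <-ctx.Done():
			return ctx.Err()
		case <-time.After(backoff):
			backoff *= 2 // exponential backoff between attempts
		}
	}
	return fmt.Errorf("leadership transfer failed after %d attempts: %w", maxAttempts, lastErr)
}

Instead of returning the error from Run(), the caller could stay in follower mode and retry on the next leadership notification.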

2. Missing Context Propagation (block/internal/syncing/raft_retriever.go:103-134)

func (r *raftRetriever) consumeRaftBlock(ctx context.Context, state *raft.RaftBlockState) error {
    // ctx is passed but not checked in tight loops
}

Issue: Long-running operations don't check context cancellation, potentially blocking shutdown.

Recommendation: Add context checks in processing loops.
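
For example, a cheap cancellation check at the top of the function and inside any per-item loop (a sketch, not the actual function body):

// at the start of consumeRaftBlock and at each iteration of a processing loop
if err := ctx.Err(); err != nil {
	return err // shutdown requested: stop processing and unblock callers
}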

3. Database Flush Without Error Handling (block/internal/executing/executor.go:434-439)

if e.raftNode != nil {
    headerBytes, err := header.MarshalBinary()
    if err != nil {
        return fmt.Errorf("failed to marshal header: %w", err)
    }

Missing: The code that follows this block should verify that the database flush completed before the RAFT broadcast. If the RAFT broadcast succeeds but the DB flush fails, state becomes inconsistent.

Recommendation:

  • Ensure DB flush completes before RAFT broadcast
  • Add explicit error handling for flush operations
  • Consider transaction boundaries

4. Validation Gap in RAFT State (pkg/raft/types.go:17-24)

func (s RaftBlockState) assertValid(next RaftBlockState) error {
    if s.Height > next.Height {
        return fmt.Errorf("invalid height: %d > %d", s.Height, next.Height)
    }
    // ...
}

Issue: The check only enforces a monotonic height increase; it does not validate:

  • Gap detection (height jumps)
  • Hash chain continuity
  • Timestamp reasonableness (could be far future)

Recommendation: Add comprehensive validation including gap detection and chain continuity checks.
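
A fuller check might look like the sketch below. Height comes from the existing struct; HeaderHash, LastHeaderHash, Timestamp, and maxClockDrift are illustrative names, not fields that necessarily exist in this PR.

const maxClockDrift = 5 * time.Second // illustrative bound

func (s RaftBlockState) assertValid(next RaftBlockState) error {
	switch {
	case next.Height < s.Height:
		return fmt.Errorf("height decreased: %d -> %d", s.Height, next.Height)
	case s.Height > 0 && next.Height > s.Height+1:
		return fmt.Errorf("height gap: %d -> %d", s.Height, next.Height)
	}
	// chain continuity: the incoming state must reference the previously applied header
	if s.Height > 0 && !bytes.Equal(next.LastHeaderHash, s.HeaderHash) {
		return fmt.Errorf("hash chain broken at height %d", next.Height)
	}
	// reject timestamps unreasonably far in the future
	if next.Timestamp.After(time.Now().Add(maxClockDrift)) {
		return fmt.Errorf("timestamp %s too far in the future", next.Timestamp)
	}
	return nil
}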


3. Security Considerations

⚠️ Issues

1. No Mutual TLS for RAFT Communication (pkg/raft/node.go:87-89)

transport, err := raft.NewTCPTransport(cfg.RaftAddr, addr, 3, 10*time.Second, os.Stderr)

Issue: RAFT cluster communication is unencrypted. In production, this exposes:

  • Block data in transit
  • Potential MITM attacks
  • No node authentication beyond configuration

Recommendation:

  • Implement TLS for RAFT transport
  • Add certificate-based authentication
  • Document security requirements in production
  • Consider network isolation requirements
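
hashicorp/raft accepts any StreamLayer, so TLS can be added without forking the transport. The sketch below is illustrative (the names are made up and certificate loading/rotation is out of scope); NewNetworkTransport keeps the same pool/timeout knobs as NewTCPTransport, so the call site would mainly swap constructors.

import (
	"crypto/tls"
	"io"
	"net"
	"time"

	"github.com/hashicorp/raft"
)

// tlsStreamLayer wraps a TLS listener and dials peers over TLS.
type tlsStreamLayer struct {
	net.Listener
	config *tls.Config
}

func (t *tlsStreamLayer) Dial(address raft.ServerAddress, timeout time.Duration) (net.Conn, error) {
	dialer := &net.Dialer{Timeout: timeout}
	return tls.DialWithDialer(dialer, "tcp", string(address), t.config)
}

// newTLSTransport is a drop-in alternative to raft.NewTCPTransport.
func newTLSTransport(bindAddr string, tlsCfg *tls.Config, logOutput io.Writer) (*raft.NetworkTransport, error) {
	ln, err := tls.Listen("tcp", bindAddr, tlsCfg)
	if err != nil {
		return nil, err
	}
	return raft.NewNetworkTransport(&tlsStreamLayer{Listener: ln, config: tlsCfg}, 3, 10*time.Second, logOutput), nil
}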

2. No Rate Limiting on RAFT Apply (pkg/raft/node.go:279-300)
Issue: No protection against DoS through excessive RAFT proposals.

Recommendation:

  • Add rate limiting per node
  • Implement backpressure mechanisms
  • Monitor proposal rates

3. Peer Address Validation (pkg/raft/node.go:344-364)

// we can skip address validation as they come from a local configuration

Issue: The comment assumes the configuration can be trusted, but a misconfigured peer address will still surface as hard-to-diagnose runtime failures.

Recommendation: Add basic sanity checks (valid port ranges, non-empty addresses, reachability probes).


4. Concurrency & Resource Management

⚠️ Issues

1. Potential Goroutine Leak (pkg/raft/election.go:66-82)

startWorker := func(name string, workerFunc func(ctx context.Context) error) {
    workerCancel()
    workerCtx, cancel := context.WithCancel(ctx)
    workerCancel = cancel
    wg.Add(1)
    go func(childCtx context.Context) {
        defer wg.Done()
        if err := workerFunc(childCtx); err != nil && !errors.Is(err, context.Canceled) {
            select {
            case errCh <- fmt.Errorf(name+" worker exited unexpectedly: %s", err):
            default: // do not block
            }
        }
    }(workerCtx)
}

Issue: If errCh is full and worker errors occur, errors are silently dropped. This could mask critical failures.

Recommendation:

  • Increase error channel buffer size
  • Log when errors can't be sent
  • Consider using a more robust error aggregation pattern

2. Race Condition Risk (block/internal/syncing/syncer.go:177-189)

func (s *Syncer) isCatchingUpState() bool {
    return len(s.heightInCh) != 0 || func() bool {
        currentHeight, err := s.store.Height(s.ctx)
        // ...
        return s.raftRetriever != nil && s.raftRetriever.Height() > currentHeight
    }()
}

Issue: Multiple store height checks without synchronization could race with concurrent updates.

Recommendation: Add mutex or use atomic operations for height comparisons.
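
One mitigation is to keep the retriever's view of the raft height in a monotonic atomic, mirroring the compare-and-swap pattern already used in the cache refactor. A sketch, not the existing implementation:

// raftHeight is a forward-only height that is safe to read from isCatchingUpState.
type raftHeight struct {
	h atomic.Uint64
}

func (r *raftHeight) Load() uint64 { return r.h.Load() }

// Set only advances the height, so concurrent readers never observe a regression.
func (r *raftHeight) Set(newHeight uint64) {
	for {
		cur := r.h.Load()
		if newHeight <= cur || r.h.CompareAndSwap(cur, newHeight) {
			return
		}
	}
}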

3. Shutdown Ordering (node/failover.go:177-232)

defer func() {
    shutdownCtx, done := context.WithTimeout(context.Background(), 3*time.Second)
    defer done()
    _ = f.rpcServer.Shutdown(shutdownCtx)
}()

Issue: Hard-coded 3-second timeout may not be sufficient for graceful shutdown under load.

Recommendation:

  • Make shutdown timeout configurable
  • Implement phased shutdown (stop accepting new requests → drain existing → force close)
  • Log shutdown progress
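
A sketch of the phased variant, assuming rpcServer is an *http.Server (Shutdown already stops accepting new requests and drains in-flight ones, so the missing pieces are the configurable deadline and the forced close):

func (f *failoverState) shutdownRPC(timeout time.Duration) {
	ctx, cancel := context.WithTimeout(context.Background(), timeout)
	defer cancel()
	f.logger.Info().Dur("timeout", timeout).Msg("shutting down RPC server")
	if err := f.rpcServer.Shutdown(ctx); err != nil {
		// draining exceeded the deadline: force-close remaining connections
		f.logger.Warn().Err(err).Msg("graceful shutdown timed out, forcing close")
		_ = f.rpcServer.Close()
	}
}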

4. BoltDB Concurrent Access (pkg/raft/node.go:67-74)

logStore, err := raftboltdb.NewBoltStore(filepath.Join(cfg.RaftDir, "raft-log.db"))
stableStore, err := raftboltdb.NewBoltStore(filepath.Join(cfg.RaftDir, "raft-stable.db"))

Note: BoltDB doesn't support concurrent writers. Ensure RAFT library handles this correctly (it likely does, but worth documenting).


5. Performance & Resource Efficiency

⚠️ Issues

1. Memory Allocation in Hot Path (pkg/raft/node.go:206-222)

func (n *Node) Broadcast(_ context.Context, state *RaftBlockState) error {
    data, err := json.Marshal(state) // Allocates every call
    // ...
}

Issue: JSON marshaling allocates on every block production (every 100ms).

Recommendation:

  • Use sync.Pool for buffer reuse
  • Switch to protobuf (lower allocation)
  • Profile allocation hotspots
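
Until the protobuf migration lands, a pooled buffer is a low-risk stopgap. The helper below is a sketch (encodeState is a hypothetical name); the final copy is still one allocation because the bytes handed to raft must outlive the pooled buffer.

var encodeBufPool = sync.Pool{
	New: func() any { return new(bytes.Buffer) },
}

func encodeState(state *RaftBlockState) ([]byte, error) {
	buf := encodeBufPool.Get().(*bytes.Buffer)
	defer func() {
		buf.Reset()
		encodeBufPool.Put(buf)
	}()
	// Encode appends a trailing newline, which json.Unmarshal tolerates.
	if err := json.NewEncoder(buf).Encode(state); err != nil {
		return nil, err
	}
	out := make([]byte, buf.Len())
	copy(out, buf.Bytes())
	return out, nil
}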

2. Unbounded Channel Operations (block/internal/syncing/raft_retriever.go:62)

applyCh := make(chan raft.RaftApplyMsg, 1)

Issue: A buffer size of 1 is very small and could cause blocking under high throughput.

Recommendation:

  • Make buffer size configurable based on expected block rate
  • Monitor channel saturation
  • Document expected throughput

3. Blocking Operations in Election Loop (pkg/raft/election.go:96)

raftSynced := d.node.waitForMsgsLanded(d.node.Config().SendTimeout) == nil

Issue: This blocks the election loop, delaying leadership decisions.

Recommendation: Consider async status checks or timeout optimization.


6. Testing Coverage & Quality

✅ Strengths

  • Good unit test coverage for helper functions (splitPeerAddr, deduplicateServers in pkg/raft/node_test.go)
  • Comprehensive election state machine tests (pkg/raft/election_test.go)
  • E2E failover test (test/e2e/failover_e2e_test.go)

⚠️ Gaps

1. Missing Integration Tests:

  • No tests for RAFT network partitions (split-brain scenarios)
  • No tests for concurrent leadership changes
  • Missing crash recovery tests
  • No tests for state validation edge cases

2. Low Coverage on Critical Paths (from codecov report):

  • pkg/raft/node.go: 12.50% coverage (168 lines missing)
  • block/internal/syncing/raft_retriever.go: 0.00% coverage (60 lines missing)
  • node/failover.go: 74.45% coverage (35 lines missing)

Recommendation:

  • Add chaos engineering tests (network delays, partitions, crashes)
  • Test leadership transfer under various loads
  • Test recovery from various failure modes
  • Increase coverage for error paths

3. Mock Coverage (pkg/raft/node_mock.go):

  • 45% coverage for mocks suggests tests aren't exercising all code paths

4. Missing Benchmarks:

  • No performance benchmarks for RAFT broadcast latency
  • No benchmarks for failover time
  • No load testing for sustained throughput

7. Documentation & Code Quality

⚠️ Issues

1. Insufficient Documentation:

  • SetApplyCallback (pkg/raft/node.go:274) has good docs, but most RAFT methods lack detailed documentation
  • No architecture documentation for RAFT integration
  • Missing operational runbook (how to recover from failures, add/remove nodes, monitor health)

2. Magic Numbers:

ticker := time.NewTicker(300 * time.Millisecond) // election.go:83
ReadTimeout:  10 * time.Second, // failover.go:158

Recommendation: Extract to named constants with explanatory comments.
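
For example (the names are illustrative; the values mirror the ones currently hard-coded):

const (
	// leaderCheckInterval is how often the election loop re-evaluates leadership state.
	leaderCheckInterval = 300 * time.Millisecond
	// rpcReadTimeout bounds how long the failover RPC server waits to read a request.
	rpcReadTimeout = 10 * time.Second
)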

3. Logging Inconsistency:

  • Mix of Debug/Info/Warn/Error levels
  • Some critical operations lack logging (e.g., database flushes)
  • Consider structured logging for correlation across cluster

8. Specific Code Issues

🔴 Critical

1. Double Sign Risk (pkg/raft/election.go:95-104)

raftSynced := d.node.waitForMsgsLanded(d.node.Config().SendTimeout) == nil
if !raftSynced || !runnable.IsSynced(d.node.GetState()) {
    d.logger.Info().Msg("became leader, but not synced. Pass on leadership")
    if err := d.node.leadershipTransfer(); err != nil && !errors.Is(err, raft.ErrNotLeader) {
        return err
    }
    continue
}

Good: Checks sync status before becoming leader to prevent double signing.

Issue: If leadership transfer fails, the node might stop entirely, reducing cluster capacity.

Recommendation: Implement more sophisticated handling - perhaps stay as follower until properly synced.

2. Store Flush Timing (per review comment thread)
The discussion around Commit() and flush behavior is important. The current implementation flushes before the RAFT broadcast, which is correct, but:

Recommendation:

  • Document the critical ordering: DB flush → RAFT broadcast → P2P broadcast
  • Add assertions/tests to verify this ordering
  • Consider making SyncWrites configurable with clear performance trade-off documentation

⚠️ High Priority

3. Peer Duplication Check (pkg/raft/node.go:137-139)

if svrs := deduplicateServers(cfg.Servers); len(svrs) != len(cfg.Servers) {
    return fmt.Errorf("duplicate peers found in config: %v", cfg.Servers)
}

Issue: Deduplication is keyed on ID only, while the error message suggests Address is checked too; two entries could share an ID but point at different addresses.

Recommendation: Check both ID and Address for duplicates, or clarify which is the key.
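
If both fields are meant to be unique, a validation helper makes that explicit. validateUniquePeers below is a hypothetical replacement for the length comparison:

func validateUniquePeers(servers []raft.Server) error {
	ids := make(map[raft.ServerID]struct{}, len(servers))
	addrs := make(map[raft.ServerAddress]struct{}, len(servers))
	for _, s := range servers {
		if _, ok := ids[s.ID]; ok {
			return fmt.Errorf("duplicate peer ID %q in config", s.ID)
		}
		if _, ok := addrs[s.Address]; ok {
			return fmt.Errorf("duplicate peer address %q in config", s.Address)
		}
		ids[s.ID] = struct{}{}
		addrs[s.Address] = struct{}{}
	}
	return nil
}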

4. Error Propagation (block/internal/executing/executor.go:330-332)

if e.raftNode != nil && !e.raftNode.IsLeader() {
    return errors.New("not raft leader")
}

Issue: Generic error message doesn't help with debugging.

Recommendation: Include current leader ID and node state in error.


9. Configuration & Operational Concerns

⚠️ Issues

1. No Default Validation (pkg/config/config.go:322-325)

if err := c.Raft.Validate(); err != nil {
    return err
}

Question: What does Raft.Validate() check? Need to review config validation for:

  • Heartbeat timeout vs election timeout ratios
  • Send timeout vs block time relationships
  • Minimum cluster size
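
If any of these are missing, Validate() could enforce them along these lines. This is a sketch: apart from Enable, Servers, and SendTimeout, the field names are illustrative and may not match the actual config struct.

func (c RaftConfig) Validate() error {
	if !c.Enable {
		return nil
	}
	if c.HeartbeatTimeout >= c.ElectionTimeout {
		return fmt.Errorf("heartbeat timeout (%s) must be shorter than election timeout (%s)", c.HeartbeatTimeout, c.ElectionTimeout)
	}
	if c.SendTimeout >= c.BlockTime {
		return fmt.Errorf("send timeout (%s) should stay below the block time (%s)", c.SendTimeout, c.BlockTime)
	}
	if n := len(c.Servers); n > 0 && n < 3 {
		return fmt.Errorf("raft cluster needs at least 3 servers to tolerate a failure, got %d", n)
	}
	return nil
}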

2. Missing Health Checks:

  • No RAFT-specific health endpoint beyond basic RPC
  • Should expose: current role, cluster size, sync status, last applied index

Recommendation: Add /raft/status endpoint with detailed cluster health.
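
The underlying *raft.Raft already exposes most of this, so the endpoint can be a thin handler. A sketch, assuming the node wrapper can hand out the raw raft instance:

import (
	"encoding/json"
	"net/http"

	"github.com/hashicorp/raft"
)

// raftStatusHandler reports role, leader, and applied index for monitoring.
func raftStatusHandler(r *raft.Raft) http.HandlerFunc {
	return func(w http.ResponseWriter, req *http.Request) {
		status := map[string]any{
			"state":         r.State().String(),
			"leader":        string(r.Leader()),
			"last_index":    r.LastIndex(),
			"applied_index": r.AppliedIndex(),
		}
		w.Header().Set("Content-Type", "application/json")
		_ = json.NewEncoder(w).Encode(status)
	}
}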

3. Metrics Gaps:

  • No RAFT-specific metrics (proposals/sec, apply latency, leadership changes)
  • Should track: failover count, leadership tenure, sync lag

10. Comparison with Review Comments

Addressing existing review comments:

  1. Cache parallelization (per julienrbrt comment): Agreed this is a good improvement
  2. P2P height tracking (syncer.go:310): Your explanation makes sense - RAFT provides the source of truth
  3. ⚠️ Protobuf TODO (node.go:211): As mentioned, this is critical for performance - should not wait
  4. isCatchingUpState naming (per julienrbrt): Consider isRaftSynced() or isClusterSynced() to be more specific
  5. ⚠️ Commit/Flush behavior: The discussion about SyncWrites is important - current approach is safe but needs documentation

Summary & Recommendations

Must Fix Before Merge

  1. Implement Protobuf serialization for RAFT state (pkg/raft/node.go:211)
  2. Add error handling for dropped RAFT messages with metrics (pkg/raft/node.go:296)
  3. Improve leadership transfer failure handling (pkg/raft/election.go:99-102)
  4. Add TLS support for RAFT transport or document security requirements
  5. Fix validation gaps in RaftBlockState.assertValid() (pkg/raft/types.go)

High Priority

  1. Increase test coverage for critical paths (currently 41.48% patch coverage)
  2. Add integration tests for failure scenarios (network partition, crash recovery)
  3. Document operational procedures (recovery, scaling, monitoring)
  4. Add comprehensive health checks and metrics
  5. Review and test shutdown ordering under load

Nice to Have

  1. Make timeouts configurable (currently hard-coded)
  2. Add performance benchmarks for RAFT operations
  3. Improve error messages with contextual information
  4. Add structured logging with correlation IDs
  5. Consider implementing backpressure for apply channel

Architecture Validation

The overall architecture is sound:

  • ✅ Clean separation between RAFT consensus and business logic
  • ✅ Well-designed state machine for leadership election
  • ✅ Good use of interfaces for testability
  • ✅ Proper handling of sync-before-leadership to prevent double signing

However, the implementation needs refinement in error handling, testing, and performance optimization before production readiness.


Overall Assessment: This is a solid foundation for HA via RAFT, but needs attention to the critical issues above, particularly around error resilience, security, and test coverage before production deployment. The architectural decisions are sound, but implementation details need hardening.

@evstack evstack deleted a comment from claude bot Nov 19, 2025
@alpe alpe requested a review from julienrbrt November 19, 2025 11:35
Member

@julienrbrt julienrbrt left a comment


First walkthrough; I'll go read up on github.com/hashicorp/raft and come back to review election.go and node.go.

return nil
}

// Height returns the current height stored
Member


Why do we need to know the height of the p2p (go-header) store? (I am still reviewing; this may become clearer.) We can get the app height from the evolve store.

Contributor Author


When the node switches from sync to aggregator mode, the internal state is key to preventing double signing.
The Syncer now has an isCatchingUpState method that checks the stores for any height > current.
It is called within the leader election loop to transfer leadership away in case the node is not fully synced yet.

}

// SetApplyCallback sets a callback to be called when log entries are applied
func (n *Node) SetApplyCallback(ch chan<- RaftApplyMsg) {
Member


nit: what is this for? The go doc is very light.

Contributor Author


The channel is passed by the syncer to receive first-level state updates from within the raft cluster. This should be the fastest communication channel available.

}()

// Check raft leadership if raft is enabled
if e.raftNode != nil && !e.raftNode.IsLeader() {
Member

@julienrbrt julienrbrt Nov 19, 2025


unrelated: I wonder how this will play with different sequencers.
In #2797 you can get to that path without a node key (to sign). I suppose we'll need to add a condition for based sequencing.

Contributor Author


Yes, I was only preparing for the single sequencer. Based sequencing would not work with raft, as there are no aggregators.

leaderFactory := func() (raftpkg.Runnable, error) {
logger.Info().Msg("Starting aggregator-MODE")
nodeConfig.Node.Aggregator = true
nodeConfig.P2P.Peers = "" // peers are not supported in aggregator mode
Member


Not sure I understand this. Is the aggregator broadcasting to no one?

Contributor


The aggregator is required to broadcast to at least one node that is part of a larger mesh, otherwise p2p will not work.

Contributor Author


This is more about who calls whom: the aggregator gets called, not the other way around. Starting all nodes with a p2p-peer setup makes sense, though. When an HA cluster is set up, the raft leader takes the aggregator role, and I clear the peers when the p2p stack is restarted.
There is an error thrown somewhere when peers are not empty.

node/full.go Outdated
func initRaftNode(nodeConfig config.Config, logger zerolog.Logger) (*raftpkg.Node, error) {
raftDir := nodeConfig.Raft.RaftDir
if raftDir == "" {
raftDir = filepath.Join(nodeConfig.RootDir, "raft")
Member


nit: we should be using DefaultConfig() value if empty.

return fmt.Errorf("not leader")
}

data, err := json.Marshal(state) // todo:use protobuf
Member


why the todo? size?

Contributor


We should migrate to protobuf here. JSON will cause overhead; at 100ms block times we need to minimise it as much as possible.

* main:
  chore: reduce log noise (#2864)
  fix: sync service for non zero height starts with empty store (#2834)
  build(deps): Bump golang.org/x/crypto from 0.43.0 to 0.45.0 in /execution/evm (#2861)
  chore: minor improvement for docs (#2862)
alpe added 3 commits November 20, 2025 17:24
* main:
  chore: bump da (#2866)
  chore: bump  core (#2865)
* main:
  chore: fix some comments (#2874)
  chore: bump node in evm-single (#2875)
  refactor(syncer,cache): use compare and swap loop and add comments (#2873)
  refactor: use state da height as well (#2872)
  refactor: retrieve highest da height in cache (#2870)
  chore: change from event count to start and end height (#2871)
github-merge-queue bot pushed a commit that referenced this pull request Nov 21, 2025
## Overview

Speed up cache write/loads via parallel execution.  

Pulled from  #2836
github-merge-queue bot pushed a commit that referenced this pull request Nov 21, 2025
## Overview

Minor updates to make it easier to trace errors

Extracted from #2836
alpe added 5 commits November 24, 2025 16:21
* main:
  chore: remove extra github action yml file (#2882)
  fix(execution/evm): verify payload status (#2863)
  feat: fetch included da height from store (#2880)
  chore: better output on errors (#2879)
  refactor!: create da client and split cache interface (#2878)
  chore!: rename `evm-single` and `grpc-single` (#2839)
  build(deps): Bump golang.org/x/crypto from 0.42.0 to 0.45.0 in /tools/da-debug in the go_modules group across 1 directory (#2876)
  chore: parallel cache de/serialization (#2868)
  chore: bump blob size (#2877)

// Propose block to raft to share state in the cluster
if e.raftNode != nil {
headerBytes, err := header.MarshalBinary()
Contributor

@tac0turtle tac0turtle Nov 27, 2025


nit: in the flow of this function we decode this data once and encode it twice. I wonder if we can get away with only decoding. This can be a follow-up so as not to inflate this PR.

database ds.Batching,
logger zerolog.Logger,
) (ln *LightNode, err error) {
p2pClient, err := p2p.NewClient(conf.P2P, nodeKey.PrivKey, database, genesis.ChainID, logger, nil)
Contributor


What is the reasoning behind moving this from the composing part to the constructor?

Contributor Author

@alpe alpe Dec 3, 2025


There is no strong reason for this other than making it consistent with the full node constructor.
With full nodes, the p2p client is set up in failover.go so that it can be reset when the sync node becomes leader and the peers must be empty.

tac0turtle previously approved these changes Nov 28, 2025
Contributor

@tac0turtle tac0turtle left a comment


Overall looks good to me. What sort of latency does the node have for the switch?

alpe added 2 commits December 3, 2025 13:07
* main:
  build(deps): Bump mdast-util-to-hast from 13.2.0 to 13.2.1 in /docs in the npm_and_yarn group across 1 directory (#2900)
  refactor(block): centralize timeout in client (#2903)
  build(deps): Bump the all-go group across 2 directories with 3 updates (#2898)
  chore: bump default timeout (#2902)
  fix: revert default db (#2897)
  refactor: remove obsolete // +build tag (#2899)
  fix:da visualiser namespace  (#2895)
  refactor: omit unnecessary reassignment (#2892)
  build(deps): Bump the all-go group across 5 directories with 6 updates (#2881)
  chore: fix inconsistent method name in retryWithBackoffOnPayloadStatus comment (#2889)
  fix: ensure consistent network ID usage in P2P subscriber (#2884)
  build(deps): Bump golangci/golangci-lint-action from 9.0.0 to 9.1.0 (#2885)
  build(deps): Bump actions/checkout from 5 to 6 (#2886)