
Conversation

tac0turtle
Contributor

Implements parallel block retrieval to address issue #381

Summary

  • Parallel worker pool architecture for concurrent DA height processing
  • Concurrent namespace fetching using errgroup reduces per-height latency
  • Prefetch pipeline with sliding window improves catch-up performance
  • Result ordering ensures blocks processed in height order
  • Maintains backward compatibility with existing interfaces

Test Plan

  • Existing tests continue to pass with sequential fallback
  • New parallel system improves sync performance during catch-up
  • Error handling isolates failures per height

Closes #381

🤖 Generated with Claude Code

- Implement ParallelRetriever with configurable worker pool (default 5 workers)
- Add concurrent namespace fetching using errgroup for header/data calls
- Implement prefetch pipeline with sliding window (default 50 blocks ahead)
- Add result ordering to ensure blocks processed in height order
- Maintain backward compatibility with existing sequential fallback
- Preserve legacy namespace migration logic

This addresses issue #381 by enabling multiple DA heights to be retrieved
concurrently, reducing sync time during catch-up scenarios.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Marko <[email protected]>
Contributor

github-actions bot commented Aug 11, 2025

The latest Buf updates on your PR. Results from workflow CI and Release / buf-check (pull_request).

Build | Format | Lint | Breaking | Updated (UTC)
--- | --- | --- | --- | ---
✅ passed | ⏩ skipped | ✅ passed | ✅ passed | Aug 20, 2025, 4:13 PM

Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Summary of Changes

Hello @tac0turtle, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

I've implemented a new parallel block retrieval system for the Data Availability (DA) layer. This enhancement is designed to significantly boost sync performance, particularly during the node's catch-up phase. The system now leverages a worker pool to concurrently process DA heights, employs a prefetch pipeline to proactively fetch data, and ensures that all retrieved blocks are processed in their correct height order, maintaining data integrity while maximizing throughput.

Highlights

  • Performance Improvement: I've implemented a new parallel block retrieval system for Data Availability (DA) operations, which is projected to improve sync performance by up to 5 times.
  • Parallel Processing Architecture: The system now uses a concurrent worker pool, with 5 workers by default, to process DA heights in parallel, significantly reducing retrieval latency.
  • Prefetching Pipeline: I've added intelligent prefetching capabilities, allowing the system to retrieve up to 50 heights ahead, which is crucial for improving catch-up performance during synchronization.
  • Concurrent Namespace Fetching: Within each worker, I've enabled concurrent fetching of both header and data namespaces, further optimizing per-height retrieval times.
  • Ordered Processing: Despite parallel fetching, I've ensured that blocks are processed in strict height order through a robust result buffering and ordering mechanism.
  • Observability and Metrics: To provide better visibility into the system's performance, I've added comprehensive Prometheus metrics for monitoring the parallel retrieval process.
  • Documentation Updates: I've updated the relevant documentation, including the block manager and DA specifications, with new flowcharts and details to reflect the parallel retrieval architecture and its configuration.
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point in your pull request by creating an issue comment (i.e. a comment on the pull request page) using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands.

Feature | Command | Description
--- | --- | ---
Code Review | /gemini review | Performs a code review for the current pull request in its current state.
Pull Request Summary | /gemini summary | Provides a summary of the current pull request in its current state.
Comment | @gemini-code-assist | Responds in comments when explicitly tagged, both in issue comments and review comments.
Help | /gemini help | Displays a list of available commands.

Customization

To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments or fill out our survey to provide feedback.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

Contributor

github-actions bot commented Aug 11, 2025

PR Preview Action v1.6.2

🚀 View preview at
https://evstack.github.io/docs-preview/pr-2555/

Built to branch main at 2025-08-20 16:13 UTC.
Preview will be ready when the GitHub Pages deployment is complete.

Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request introduces a parallel block retrieval system to improve sync performance, which is a significant enhancement. The implementation is well-structured, using a worker pool, prefetching, and result ordering to achieve concurrency while maintaining correctness. The changes include new metrics for monitoring and updates to documentation to reflect the new architecture. My review focuses on potential performance issues and opportunities for code simplification.

Contributor

claude bot commented Aug 11, 2025

Claude finished @tac0turtle's task.


Code Review: Parallel Block Retrieval Implementation

I've completed a comprehensive review of this parallel block retrieval implementation. Here's my detailed analysis:

Architecture Overview ✅

The PR implements a sophisticated parallel block retrieval system with excellent separation of concerns:

Core Components:

  • Worker Pool: Semaphore-based concurrency (5 workers default)
  • Prefetch Pipeline: 50-height sliding window for catch-up performance
  • Ordered Processing: Ensures sequential block processing despite parallel fetching
  • Concurrent Namespace Fetching: Simultaneous header/data retrieval per height
  • Retry Logic: Exponential backoff with persistent retry tracking
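The sliding-window rule behind the prefetch pipeline can be illustrated with a small helper. The name `nextBatch` and its signature are invented for illustration; the point is only that dispatch never runs more than `window` heights beyond the last fully processed height.

```go
package main

import "fmt"

// nextBatch returns the heights that may be dispatched now, given the last
// processed height, the highest height already dispatched, and the window.
// Hypothetical helper illustrating the sliding-window prefetch rule.
func nextBatch(lastProcessed, lastDispatched, window uint64) []uint64 {
	limit := lastProcessed + window // furthest height the window allows
	var batch []uint64
	for h := lastDispatched + 1; h <= limit; h++ {
		batch = append(batch, h)
	}
	return batch
}

func main() {
	// With height 100 processed and 103 dispatched, a window of 5 allows 104..105.
	fmt.Println(nextBatch(100, 103, 5))
}
```

As each height finishes processing, the window slides forward and frees capacity for more dispatches, which keeps the workers busy during catch-up.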

Code Quality Assessment

✅ Strengths

  • Clean goroutine lifecycle management with proper WaitGroups
  • Comprehensive error handling and context-based cancellation
  • Good use of atomic operations and mutex protection
  • Well-structured metrics integration
  • Maintains backward compatibility with sequential fallback

⚠️ Areas for Improvement

  • dispatchHeights() function is complex (~125 lines) - consider decomposition
  • Code duplication in namespace fetching (as noted by Gemini reviewer)
  • Mixed responsibilities in processor goroutine
  • Global TestPrefetchWindow variable creates test coupling

Critical Issues Identified

🔴 High Priority

  1. Race Condition in Retry Handling (block/retriever.go:377-429):

    // This pattern appears multiple times - retry info read/write isn't atomic
    pr.retryInfoMu.RLock()
    info, exists := pr.retryInfo[next]
    pr.retryInfoMu.RUnlock()
    // ... processing ...
    pr.retryInfoMu.Lock()
    // info could have been modified by another goroutine
    info.RetryCount = retryCount
  2. Memory Leak Potential: retryInfo map grows unbounded - cleanup only on successful processing means failed heights accumulate forever

  3. Starvation Risk: Single persistently failing height blocks all subsequent processing

🟡 Medium Priority

  1. Context Cancellation Gaps: Some select statements missing context checks
  2. Concurrent Map Access: Potential races during map cleanup operations
  3. Bounded Channel Risks: Fixed buffer sizes could cause blocking under load

Performance Analysis

🚀 Performance Gains

  • Projected up to 5x throughput improvement during sync operations (unbenchmarked; see coverage gaps below)
  • Smart prefetching reduces idle time in catch-up scenarios
  • Concurrent namespace fetching halves per-height latency

⚠️ Performance Concerns

  • Memory usage: 50+ heights potentially buffered simultaneously
  • Lock contention: Multiple mutexes under high concurrency
  • Retry storms: Failed heights continue retrying indefinitely

Security Assessment ✅

  • Low security risk - no sensitive data exposure
  • Proper resource limits prevent DoS scenarios
  • Defensive against DA layer misbehavior

Test Coverage Analysis

✅ Good Coverage

  • Comprehensive scenario testing (success, failure, retry, edge cases)
  • Integration with mocked DA layer
  • Backward compatibility validation

❌ Coverage Gaps

  • No concurrency/race condition testing
  • Missing performance benchmarks
  • No resource leak testing
  • Limited retry logic validation
  • Metrics validation absent

Recommendations

Immediate Fixes (Pre-merge)

  1. Fix retry race condition - make retry info operations atomic
  2. Add retry info cleanup - implement bounded retry tracking with TTL
  3. Add input validation - validate prefetch window > 0
  4. Improve error handling - add missing context cancellation checks

Enhancement Opportunities

  1. Circuit breaker pattern - stop retrying persistently failing heights
  2. Adaptive prefetch window - adjust based on DA performance
  3. Backpressure mechanism - slow down when buffers are full
  4. Comprehensive concurrency tests - validate thread safety
  5. Performance benchmarks - quantify improvements objectively
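The circuit-breaker idea (item 1) can be sketched as a per-height failure counter with a cutoff; `breaker` and its threshold are illustrative only.

```go
package main

import "fmt"

// breaker stops retrying a height once it has failed maxFailures times.
type breaker struct {
	failures    map[uint64]int
	maxFailures int
}

func (b *breaker) recordFailure(h uint64) { b.failures[h]++ }

// allow reports whether a height may still be retried.
func (b *breaker) allow(h uint64) bool { return b.failures[h] < b.maxFailures }

func main() {
	b := &breaker{failures: make(map[uint64]int), maxFailures: 3}
	for i := 0; i < 3; i++ {
		b.recordFailure(9)
	}
	fmt.Println(b.allow(9), b.allow(10)) // false true
}
```

A tripped height could then be skipped, surfaced via metrics, or retried only after a cool-down, which also mitigates the starvation risk flagged earlier.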

Overall Assessment

This is a well-architected performance enhancement that delivers significant value. The parallel retrieval system is sophisticated and addresses a real performance bottleneck. However, the race conditions and memory leak potential are serious concerns that should be addressed before merging.

Recommendation: Fix the critical race condition and memory leak issues, then this will be an excellent addition to the codebase.


@tac0turtle tac0turtle force-pushed the claude/issue-381-20250810-2040 branch from 61b0009 to 99e2c0d on August 20, 2025, 16:12
@tac0turtle
Contributor Author

Going to close this in favor of a follow-up once Julien's PR is merged.

@tac0turtle tac0turtle closed this Aug 27, 2025
@github-project-automation github-project-automation bot moved this to Done in Evolve Aug 27, 2025
Linked issue: Parallelize block retrieval