Skip to content

Conversation

@carlhoerberg
Copy link
Member

Summary

  • Introduce a buffer pool for IO::Buffered read/write buffers to save memory when handling many concurrent connections
  • Most connections are idle most of the time, so releasing buffers back to a shared pool when not actively in use allows memory to be reused across connections
  • Write buffers are released after flush, read buffers released on close

Test plan

  • All existing tests pass (1218 examples, 0 failures)
  • Manual testing with many concurrent idle connections to verify memory reduction

🤖 Generated with Claude Code

@claude
Copy link

claude bot commented Dec 12, 2025

Claude finished @carlhoerberg's task —— View job


PR Code Review - Buffer Pool Implementation

Todo List:

  • Read and analyze changed files for potential issues
  • Check for bugs, security vulnerabilities, performance issues
  • Verify error handling and Crystal anti-patterns
  • Report findings

Issues Found

1. Race Condition in Buffer Pool (src/stdlib/io_buffered.cr:88-108)

The fill_buffer method has a race condition where it releases the buffer before waiting for readability, then acquires a new one. Between the release and acquire, another fiber could take the released buffer, forcing allocation of a new one:

# Release existing buffer before waiting
if use_pool && self.is_a?(Socket)
  if (p = pool) && (in_buf = @in_buffer) && !in_buf.null?
    p.release(in_buf)
    @in_buffer = Pointer(UInt8).null  # ← Buffer released here
  end
  Crystal::EventLoop.current.wait_readable(self)  # ← Yields to other fibers
end

in_buf = if use_pool && (p = pool)
           @in_buffer ||= p.acquire  # ← May allocate new buffer

2. Missing Error Handling (src/stdlib/io_buffered.cr:107)

No error handling for unbuffered_read which could raise exceptions, potentially leaving the IO in an inconsistent state with a partially initialized buffer.

3. Potential Memory Leak (src/stdlib/io_buffered.cr:190-192)

In the close method, if flush raises an exception, the read buffer won't be released back to the pool:

flush if @out_count > 0  # ← Could raise exception
ensure
  unbuffered_close

The buffer release happens before the ensure block.

4. Performance Issue - Socket Type Check (src/stdlib/io_buffered.cr:93)

The self.is_a?(Socket) check happens on every fill_buffer call for socket connections. This runtime type check could be expensive in high-throughput scenarios.

5. Crystal Anti-pattern - Buffer Size Mismatch (src/stdlib/io_buffered.cr:89,137)

The condition @buffer_size == pool.buffer_size is checked multiple times without caching. If buffer sizes don't match, the pool isn't used but there's no logging/warning about this misconfiguration.


@carlhoerberg carlhoerberg force-pushed the buffer-pool-for-socket-buffers branch 2 times, most recently from 2b2a721 to ac9c6c8 Compare December 12, 2025 01:09
Introduce a buffer pool for IO::Buffered read/write buffers to save memory
when handling many concurrent connections. Most connections are idle most
of the time, so releasing buffers back to a shared pool when not actively
in use allows memory to be reused across connections.

- Add BufferPool class: thread-safe pool with acquire/release operations
- Override IO::Buffered methods to use pooled buffers:
  - Read buffers: wait for socket readability before acquiring buffer,
    release immediately when all buffered data is consumed
  - Write buffers: released after flush
- Initialize pools at startup using configured socket_buffer_size

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <[email protected]>
@carlhoerberg carlhoerberg force-pushed the buffer-pool-for-socket-buffers branch from ac9c6c8 to 25893c8 Compare December 14, 2025 17:44
carlhoerberg and others added 2 commits December 14, 2025 19:23
Reports per-fiber stack usage and summary statistics for read_loop
fibers to help diagnose memory usage.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <[email protected]>
Refactors BoolChannel to use atomic state and activate/deactivate mechanisms
instead of a dedicated fiber per instance, reducing memory overhead for idle
connections.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <[email protected]>
@carlhoerberg
Copy link
Member Author

Dropping memory usage for 10.000 connections, each with one channel, from 637MiB to 413MiB, a 35% reduction.

viktorerlingsson added a commit that referenced this pull request Jan 12, 2026
Replace the spawned send_loop fiber with atomic operations and direct
channel state manipulation. This eliminates a fiber allocation that
could fail mid-constructor, which would leave orphaned fibers accessing
uninitialized instance variables and cause segfaults.

This change mitigates the issue in #1595 where BoolChannel.new could fail
after spawning deliver_loop but before initializing all BoolChannel fields.

Changes extracted from PR #1549.

Co-Authored-By: Claude Sonnet 4.5 <[email protected]>
viktorerlingsson added a commit that referenced this pull request Jan 13, 2026
## Summary

This PR extracts the BoolChannel changes from #1549 

- Eliminate the spawned fiber in `BoolChannel` constructor by replacing
it with atomic operations and direct channel state manipulation
- This mitigates the segfault issue in #1595 where `BoolChannel.new`
could fail after spawning a fiber but before initializing all instance
variables, leaving orphaned fibers that access uninitialized memory
- Also mitigates #1595 just by creating less fibers, resulting in fewer
memory maps

## Changes

- Replace `spawn(name: "BoolChannel#send_loop")` with atomic state
management using `Atomic(Bool)`
- Introduce `StateChannel` class that extends `Channel(Nil)` with
`activate`/`deactivate` methods
- Use `EndlessQueue` to implement channels that signal state without
blocking on send
- Remove the `send_loop` fiber entirely, reducing fiber allocations by
one per BoolChannel instance

Co-authored-by: Carl Hörberg <[email protected]>
viktorerlingsson added a commit that referenced this pull request Jan 13, 2026
This PR extracts the BoolChannel changes from #1549

- Eliminate the spawned fiber in `BoolChannel` constructor by replacing
it with atomic operations and direct channel state manipulation
- This mitigates the segfault issue in #1595 where `BoolChannel.new`
could fail after spawning a fiber but before initializing all instance
variables, leaving orphaned fibers that access uninitialized memory
- Also mitigates #1595 just by creating less fibers, resulting in fewer
memory maps

- Replace `spawn(name: "BoolChannel#send_loop")` with atomic state
management using `Atomic(Bool)`
- Introduce `StateChannel` class that extends `Channel(Nil)` with
`activate`/`deactivate` methods
- Use `EndlessQueue` to implement channels that signal state without
blocking on send
- Remove the `send_loop` fiber entirely, reducing fiber allocations by
one per BoolChannel instance

Co-authored-by: Carl Hörberg <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants