Llamafiler HTTP server stability/reliability fixes #767

vlasky · 2025-06-03T03:20:06Z

Remove aggressive "cancel oldest client" logic in Worker::begin()

Previously, when all worker threads were busy, the code would forcibly cancel the oldest active connection to make room for a new one. This approach:

* Prematurely terminates in-flight requests, leading to broken or truncated responses.
* Forces cleanup of file descriptors mid-stream, causing spurious "Illegal seek"/EBADF errors.
* Circumvents the TCP backlog queuing provided by the OS, instead dropping live clients and degrading user experience.

By deleting this block, we let the kernel's listen backlog naturally queue incoming connections until a worker becomes available. As a result:

* Active requests are no longer killed arbitrarily under load.
* File descriptors aren't closed unexpectedly, eliminating related "static asset pread" failures.
* The server benefits from standard TCP connection handling without manual interference.

This change improves reliability under high concurrency and avoids unintended side effects from thread cancellation.
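
To illustrate the queuing behavior this commit relies on, here is a minimal sketch (not code from the PR) showing that the kernel completes and holds connections in the listen backlog before the server ever calls accept(). The function name `accept_after_connect` is hypothetical, and the sketch assumes a POSIX system with a working loopback interface and a client count no larger than the backlog.

```cpp
#include <arpa/inet.h>
#include <cassert>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>
#include <vector>

// Hypothetical demo: connect `n` clients to a loopback listener first,
// then accept them afterwards. Returns the number of connections
// accepted, or -1 if the listener could not be set up.
int accept_after_connect(int n) {
    int srv = socket(AF_INET, SOCK_STREAM, 0);
    if (srv < 0) return -1;

    sockaddr_in addr{};
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = 0;  // let the kernel pick a free port
    socklen_t len = sizeof(addr);
    sockaddr* sa = reinterpret_cast<sockaddr*>(&addr);
    if (bind(srv, sa, sizeof(addr)) < 0 || listen(srv, n) < 0 ||
        getsockname(srv, sa, &len) < 0) {
        close(srv);
        return -1;
    }

    // No accept() has been called yet: each connect() succeeds because
    // the kernel finishes the handshake and parks the connection in the
    // listen backlog.
    std::vector<int> clients;
    for (int i = 0; i < n; ++i) {
        int c = socket(AF_INET, SOCK_STREAM, 0);
        if (c < 0) break;
        if (connect(c, sa, sizeof(addr)) < 0) {
            close(c);
            break;
        }
        clients.push_back(c);
    }

    // A "worker" becomes available only now and drains the queue.
    int accepted = 0;
    for (size_t i = 0; i < clients.size(); ++i) {
        int fd = accept(srv, nullptr, nullptr);
        if (fd < 0) break;
        ++accepted;
        close(fd);
    }
    for (int c : clients) close(c);
    close(srv);
    return accepted;
}
```

What happens once the backlog is full is platform dependent, so this only demonstrates the case where the pending connections fit in the queue.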

Fix partial write handling in send_binary()

The current send_binary() implementation treats any write() that returns
less than the requested byte count as a failure, immediately setting
close_connection_ = true and returning false.

This is incorrect behavior. Per POSIX, write() may legitimately return
fewer bytes than requested when:

- The socket send buffer is nearly full
- Network congestion causes backpressure
- Large write sizes exceed kernel buffers
- System is under memory pressure

These partial writes are normal, not error conditions. The current code
incorrectly drops active connections during normal operation, especially
under load when partial writes become more common.

This commit replaces the single write() call with a proper write loop
that:

- Accumulates partial writes until all data is sent
- Retries on EAGAIN/EWOULDBLOCK without closing the connection
- Only treats actual errors (not partial writes) as failures
- Maintains the existing error logging behavior

This fix prevents spurious connection drops during large responses or
when the network is congested, significantly improving reliability under
production load.

Fixes: Connection drops during static file serving
Fixes: Increased failure rate under high load
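
A write loop of the kind described above might look roughly like the following. This is a sketch under assumptions, not the PR's actual send_binary(): the helper name `write_all` is hypothetical, error logging is omitted, and EINTR is retried here even though this commit message does not mention it.

```cpp
#include <cassert>
#include <cerrno>
#include <cstddef>
#include <cstring>
#include <unistd.h>

// Hypothetical helper: keep calling write() until `size` bytes have
// been sent or a real error occurs. Returns true on full success.
bool write_all(int fd, const void* data, size_t size) {
    const char* p = static_cast<const char*>(data);
    size_t sent = 0;
    while (sent < size) {
        ssize_t rc = write(fd, p + sent, size - sent);
        if (rc > 0) {
            sent += static_cast<size_t>(rc);  // partial write: keep going
            continue;
        }
        if (rc == -1 &&
            (errno == EAGAIN || errno == EWOULDBLOCK || errno == EINTR)) {
            continue;  // transient condition: retry (this spins on a
                       // non-blocking socket whose buffer is full)
        }
        return false;  // actual error, or an unexpected zero-byte write
    }
    return true;
}
```

Retrying immediately on EAGAIN/EWOULDBLOCK keeps the sketch short but burns CPU while the socket buffer stays full; the December follow-up in this thread replaces that retry with a poll() wait.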

Increase file transfer buffer from 512 to 16384 bytes

The 512-byte buffer size results in excessive system calls when serving
files. For a 1MB file, this requires 2048 read/write operations.

Using 16KB reduces system call overhead by 32x and better matches typical
OS page sizes and network buffer defaults. This should improve throughput
for static file serving while maintaining reasonable stack usage.
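
The arithmetic behind these figures can be checked with a small sketch. The function name `calls_per_pass` is hypothetical, and "1MB" is taken to mean 1 MiB (1,048,576 bytes), which is what yields the 2048 figure.

```cpp
#include <cassert>
#include <cstddef>

// Number of read() (or write()) calls needed to move `file_size` bytes
// through a buffer of `buf_size` bytes, using ceiling division.
constexpr std::size_t calls_per_pass(std::size_t file_size,
                                     std::size_t buf_size) {
    return (file_size + buf_size - 1) / buf_size;
}

constexpr std::size_t kOneMiB = 1u << 20;

// 512-byte buffer: 2048 calls. 16384-byte buffer: 64 calls.
static_assert(calls_per_pass(kOneMiB, 512) == 2048, "512 B buffer");
static_assert(calls_per_pass(kOneMiB, 16384) == 64, "16 KiB buffer");
// The ratio is the 32x reduction quoted above.
static_assert(calls_per_pass(kOneMiB, 512) /
              calls_per_pass(kOneMiB, 16384) == 32, "32x fewer calls");
```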

llamafile/server/client.cpp

llamafile/server/worker.cpp

- Fix URL prefix normalization to handle double slashes (fixes mozilla-ai#787)
  - Consolidates consecutive slashes (// -> /)
  - Ensures prefix starts with single slash
  - Removes trailing slash (except for root)
  - Matches old server behavior exactly
- Fix .args file loading order (fixes mozilla-ai#783)
  - Load .args before determining program type
  - Allows --server --v2 flags in .args to work correctly
- Fix connection stability issues (addresses mozilla-ai#767)
  - Remove aggressive client cancellation when workers are busy
  - Let TCP backlog handle connection queuing naturally
  - Fix partial write handling with simple retry logic
  - Increase file transfer buffer from 512B to 16KB

These minimal changes make Server v2 production-ready while maintaining backward compatibility. All fixes follow existing patterns from the old server implementation.
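
The three URL prefix rules listed in that commit message can be sketched as a single pass over the string. This is an illustration only: `normalize_prefix` is a hypothetical name, and mapping an empty prefix to "/" is an assumption, since the commit message does not say how the real code treats that case.

```cpp
#include <cassert>
#include <string>

// Hypothetical sketch of the stated rules: collapse runs of '/',
// force a single leading '/', and drop a trailing '/' unless the
// result is the root prefix "/".
std::string normalize_prefix(const std::string& in) {
    std::string out = "/";  // guarantees exactly one leading slash
    for (char c : in) {
        if (c == '/' && out.back() == '/')
            continue;  // skip a slash that would create "//"
        out.push_back(c);
    }
    if (out.size() > 1 && out.back() == '/')
        out.pop_back();  // remove trailing slash, but keep root as "/"
    return out;
}
```

For example, under these rules `//api//v1/` becomes `/api/v1`, and `api` becomes `/api`.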

- Add FLAG_http_write_timeout (default 60s) for configurable write timeouts
- Replace busy-loop with poll() when socket writes return EAGAIN/EWOULDBLOCK
- Add EINTR handling for signal-interrupted system calls
- Fix partial write handling in safe_writev() across multiple iovecs
- Document new flag in man page

vlasky · 2025-12-03T23:54:28Z

Commit 521f7b7 addresses remaining issues with socket write handling that weren't fully solved by the previous partial write fix (item 2).

Problems addressed:

- Busy-loop on EAGAIN/EWOULDBLOCK: When write() failed with EAGAIN/EWOULDBLOCK (socket buffer full), the code retried immediately in a tight loop, wasting CPU cycles. It now calls poll() on the socket to wait efficiently until it becomes writable, as recommended. A new --http-write-timeout flag (default: 60 seconds) sets the timeout for socket write operations.
- Missing EINTR handling: Signal-interrupted system calls weren't being retried, which could cause spurious connection failures.
- Incomplete partial write handling in safe_writev(): The vectored I/O function called writev() only once and didn't handle partial writes across multiple iovecs. It now tracks progress across the iovec array and retries until all data is sent or an error occurs.
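
The three fixes above can be combined in one vectored write loop, sketched below. This is not the code from commit 521f7b7: `advance_iov` and `writev_all` are hypothetical names, the timeout is passed as a plain millisecond argument rather than read from the new flag, and the timeout applies to each poll() wait rather than to the whole write.

```cpp
#include <cassert>
#include <cerrno>
#include <cstddef>
#include <cstring>
#include <poll.h>
#include <sys/uio.h>
#include <unistd.h>

// Advance the iovec cursor past `n` bytes that were already written.
// Fully consumed (and empty) entries are skipped; a partially consumed
// entry has its base pointer and length adjusted in place.
static void advance_iov(struct iovec*& iov, int& iovcnt, size_t n) {
    while (iovcnt > 0 && n >= iov->iov_len) {
        n -= iov->iov_len;
        ++iov;
        --iovcnt;
    }
    if (iovcnt > 0) {
        iov->iov_base = static_cast<char*>(iov->iov_base) + n;
        iov->iov_len -= n;
    }
}

// Write every byte described by iov[0..iovcnt). Note that the caller's
// iovec array is modified. Returns false on error or poll() timeout.
bool writev_all(int fd, struct iovec* iov, int iovcnt, int timeout_ms) {
    advance_iov(iov, iovcnt, 0);  // skip leading empty entries
    while (iovcnt > 0) {
        ssize_t rc = writev(fd, iov, iovcnt);
        if (rc >= 0) {
            advance_iov(iov, iovcnt, static_cast<size_t>(rc));
            continue;  // partial write: resume where the kernel stopped
        }
        if (errno == EINTR)
            continue;  // interrupted by a signal: retry the call
        if (errno != EAGAIN && errno != EWOULDBLOCK)
            return false;  // real error
        // Socket buffer is full: sleep until writable instead of spinning.
        struct pollfd pfd = {fd, POLLOUT, 0};
        int ready;
        do {
            ready = poll(&pfd, 1, timeout_ms);
        } while (ready == -1 && errno == EINTR);
        if (ready <= 0)
            return false;  // 0 means timeout, -1 means poll() failed
    }
    return true;
}
```

One simplification to be aware of: restarting poll() with the full timeout after EINTR can extend the effective deadline, so code that needs a strict bound would track the remaining time itself.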

vlasky added 3 commits June 3, 2025 12:51

github-actions bot added the llamafile label Jun 3, 2025

jart requested changes Jun 27, 2025

anivar mentioned this pull request Aug 16, 2025

Fix Server v2 production issues (#767, #783, #787) #788

Open

vlasky requested a review from jart December 4, 2025 00:09
