-
Notifications
You must be signed in to change notification settings - Fork 47
Enhance memory pressure handling with collection compaction #1517
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Add comprehensive collection compaction during systemd memory pressure events to reduce memory usage and improve GC performance. Changes: - Add Array#trim_to_size monkey patch to stdlib for in-place compaction with configurable threshold (default 2x capacity vs size) - Implement compact_collections cascade pattern across entire hierarchy: * Server → UserStore, ParameterStore, MQTT Brokers, VHostStore * VHostStore → internal hash + cascade to all VHosts * VHost → queues, exchanges, consumers, connections, shovels, upstreams * Queue → deliveries, consumers, unacked + MessageStore * Exchange → bindings in all types (direct, topic, fanout, headers, consistent_hash) * Client → channels hash + cascade to all Channels * Channel → unacked, tx arrays, consumers, IO::Memory objects * MQTT → Broker clients, Session unacked messages * Stores → UserStore, ParameterStore, ShovelStore, UpstreamStore - Call dontneed on all MFiles (memory-mapped segments) to release physical memory pages while keeping virtual mapping intact - Update launcher memory pressure handler to: * Log when pressure is detected * Run full collection compaction before GC * Log elapsed time for monitoring Collections are compacted when capacity > size * 2 using: - Array: trim_to_size for in-place buffer reduction - Hash/Deque: .dup reassignment to create right-sized copy - IO::Memory: new instance when empty Expected benefits: - 10-50 MB memory savings during typical pressure events - Faster GC cycles due to fewer large objects - Better recovery from memory spikes - Reduced swap usage on memory-constrained systems 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <[email protected]>
|
Claude finished @carlhoerberg's task —— View job Code Review Complete ✅I've analyzed the entire collection compaction implementation across 21 files. Here are the issues that need fixing: Issues Found1. Thread Safety Issue in Array#trim_to_size (src/stdlib/array.cr:19-28)The def trim_to_size(*, extra : Int32 = 0, threshold : Int32 = 2) : Bool
# ...
rewind
resize_to_capacity(@size + extra)
true
endProblem: The Fix: Add synchronization or make this method thread-safe if used concurrently. 2. Missing Error Handling for Memory Allocation (src/stdlib/array.cr:26)The resize_to_capacity(@size + extra)Problem: If memory allocation fails during compaction (which is likely during memory pressure), this will raise an exception and crash the compaction process, leaving collections in an inconsistent state. Fix: Wrap in proper error handling to gracefully handle allocation failures. 3. Race Condition in MessageStore Compaction (src/lavinmq/message_store.cr:588-599)Collection compaction occurs without proper synchronization with concurrent file operations: def compact_collections
@deleted = @deleted.dup if @deleted.capacity > @deleted.size * 2
@segments = @segments.dup if @segments.capacity > @segments.size * 2
# ...
@segments.each_value(&.dontneed)
@acks.try &.each_value(&.dontneed)
endProblem: The hash reassignment with Fix: Use 4. Excessive MFile dontneed Calls During Memory Pressure (src/lavinmq/message_store.cr:597-598)@segments.each_value(&.dontneed)
@acks.try &.each_value(&.dontneed)Problem: Calling Fix: Consider throttling or limiting dontneed calls, or only calling it on segments that haven't been accessed recently. Performance Concerns5. Aggressive 2x Threshold May Cause Overhead (Multiple files)The 2x capacity threshold is used consistently but may be too aggressive for some high-frequency collections: @deliveries = @deliveries.dup if @deliveries.capacity > @deliveries.size * 2Problem: During normal operation with frequent allocation/deallocation patterns, this could cause excessive compaction overhead. Suggestion: Consider making the threshold configurable or using different thresholds for different collection types. Minor Issues6. Potential Memory Leak in Channel (src/lavinmq/amqp/channel.cr:834)@next_msg_body_tmp = IO::Memory.new if @next_msg_body_tmp.size == 0 && @next_msg_body_tmp.capacity > 0Issue: This creates a new Fix: Explicitly call Recommendation: Focus on fixing issues #1 and #3 as they pose the highest risk of memory corruption and race conditions during memory pressure events. |
|
How will this play with multi-threading? |
Summary
Enhances systemd memory pressure monitoring to proactively compact collections and reduce memory usage beyond just calling GC. This addresses the issue where Crystal collections (Array, Hash, Deque) never shrink their internal buffers, leading to wasted memory after temporary spikes.
Problem
Currently, LavinMQ only calls
GC.collectwhen systemd detects memory pressure. However, Crystal's collections retain their maximum capacity even after elements are removed. A queue that grew to 10,000 messages keeps that buffer size even when drained to 10 messages, wasting memory.Solution
Implements a cascading
compact_collectionspattern throughout the entire object hierarchy:Core Components
trim_to_sizemethod for in-place buffer compaction.dupreassignment to create right-sized copiesdontneedon memory-mapped files to release physical pagesCompaction Cascade
Compaction Threshold
Collections are compacted when
capacity > size * 2, providing aggressive memory reclamation during pressure events while avoiding excessive compaction overhead during normal operation.Changes
Array#trim_to_sizewithrewindandensure_capacityhelperscompact_collectionsin 21 files across the codebaseExpected Benefits
Test Plan
make bin/lavinmq CRYSTAL_FLAGS=make testNotes
dontneedis safe - kernel can reclaim physical pages without breaking mmap🤖 Generated with Claude Code