Replies: 4 comments
Most likely, it's the application doing something differently over time. Perhaps it uses basic.get instead of consume? Garbage collection is a side effect of doing some other work. To learn what the CPU is really doing, you can use
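(The tool referenced above is cut off in the original comment. A minimal sketch, assuming it refers to the runtime inspection commands shipped with the RabbitMQ CLI, including the one already used in the report:)

```bash
# Show how runtime scheduler threads spend their time (emulator work, GC, I/O, sleep);
# this is the same command the report below uses.
rabbitmq-diagnostics runtime_thread_stats

# Interactive, top-like view of the runtime (processes, reductions, memory),
# useful for spotting which Erlang processes are actually busy.
rabbitmq-diagnostics observer
```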
@Mistra we cannot know what your applications are doing. Chances are, they are opening more and more connections or declaring more and more queues, including leaking such resources. RabbitMQ has metrics for each object category for this and other reasons. There are dozens of metrics available; instead of looking at just one of them, use others to correlate, then use your applications' metrics to correlate further. There is a separate metric for
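A minimal sketch of such correlation using rabbitmqctl object listings; the vhost and log file name are placeholders, and running this periodically (e.g. via cron) lets you compare object counts against the CPU trend:

```bash
# Hypothetical sampling of per-object counts to spot connection/channel/queue leaks.
VHOST="/"   # placeholder: adjust to the vhost(s) your applications use
{
  date
  echo "connections: $(rabbitmqctl list_connections --quiet | wc -l)"
  echo "channels:    $(rabbitmqctl list_channels --quiet | wc -l)"
  echo "queues:      $(rabbitmqctl list_queues --quiet -p "$VHOST" | wc -l)"
} >> object_counts.log
```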
On an unrelated note, two-node clusters are explicitly recommended against.
I don't know if this instance of Kazoo uses these config files, but they both use a recommended way of reducing CPU footprint in mostly idle environments and allow for an unlimited number of channels at the same time. You can set
Besides a resource leak that hasn't been identified (it could be in a different virtual host, for example), my only other hypothesis was the use of periodic GC for all processes, which has been disabled by default since June 2017 (8d52a09).
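For reference, a sketch of what such settings might look like, assuming the comment refers to the documented scheduler busy-wait flags and the channel_max setting; treat the exact values as placeholders:

```bash
# rabbitmq-env.conf (or the corresponding environment variable):
# disables scheduler busy waiting, the documented way to reduce CPU usage
# on mostly idle nodes.
RABBITMQ_SERVER_ADDITIONAL_ERL_ARGS="+sbwt none +sbwtdcpu none +sbwtdio none"

# rabbitmq.conf:
# channel_max = 0 means no per-connection channel limit is negotiated.
channel_max = 0
```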
Description
We are observing a progressive increase in CPU usage over time on a RabbitMQ cluster with Kazoo application connected.
The increase appears to be almost linear, with no sudden jumps, and it continues even under low message traffic and very low queue churn.
Once the Kazoo consumers disconnect, CPU usage drops immediately, suggesting a correlation between Kazoo connections and the observed CPU load.
From rabbitmq-diagnostics runtime_thread_stats, we can see that the dirty_cpu schedulers show a growing percentage of time spent in gc and gc_full over time.
It looks like RabbitMQ is increasingly spending CPU cycles performing garbage collection rather than message processing.
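A simple way to capture this trend over time (the interval and log file name are arbitrary):

```bash
# Capture runtime thread stats every 5 minutes so the growth of gc/gc_full
# time on the dirty schedulers can be compared across samples.
while true; do
  date >> thread_stats.log
  rabbitmq-diagnostics runtime_thread_stats >> thread_stats.log
  sleep 300
done
```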
We are only able to replicate this behavior with Kazoo connected; with simple consumers and producers we cannot reproduce the issue.
Environment
Observations
Reproduction steps
Expected behavior
It appears that RabbitMQ’s dirty schedulers are increasingly busy performing garbage collection while we expect them to remain stable.
The load persists even when message volume is low, and resets when those consumers are disconnected.
We would appreciate any guidance or suggestions on how to further diagnose or mitigate this behavior.
Additional context
We are attaching some screenshots and an anonymized report.
anonymized_report.log