-
Notifications
You must be signed in to change notification settings - Fork 1.9k
fix(nats source): prevent slow consumer log explosion #24230
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
fix(nats source): prevent slow consumer log explosion #24230
Conversation
pront
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder why these traces weren't rate limited by default. It's worth understanding this before going ahead with this PR.
| message = "NATS slow consumer for subscription.", | ||
| subscription_id = %self.subscription_id, | ||
| component_id = %self.component_id, | ||
| internal_log_rate_secs = 10, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is redundant because the default value is already 10.
| counter!( | ||
| "nats_slow_consumer_events_total", | ||
| "component_id" => self.component_id, | ||
| "subscription_id" => self.subscription_id.to_string(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this high cardinality?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it'll most likely only ever be equal to 1, since we spin up a single subscription per source -- personally I've never seen anything besides that for Vector.
| "Expected nats_slow_consumer_events_total to be > 0, got {}", | ||
| value | ||
| ); | ||
| println!("Slow consumer events recorded: {}", value); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: remove println!s statements from tests.
@pront that's fair, I'll sit on this one a bit till I have more time to look into this my hypothesis so far is that the NATS server disconnects automatically slow consumers, causing the subscription to be re-created which then skips the rate limit. |
Summary
This PR fixes log spam due to Vector becoming a slow consumer on NATS.
Before:
After:
Vector configuration
How did you test this PR?
via unit tests, running locally
Change Type
Is this a breaking change?
If a user were using a metric to track the number of times the "slow consumer" message appeared in the logs, this would be a breaking change.
Does this PR include user facing changes?
no-changeloglabel to this PR.References
Closes #24229
Notes
@vectordotdev/vectorto reach out to us regarding this PR.pre-pushhook, please see this template.make fmtmake check-clippy(if there are failures it's possible some of them can be fixed withmake clippy-fix)make testgit merge origin masterandgit push.Cargo.lock), pleaserun
make build-licensesto regenerate the license inventory and commit the changes (if any). More details here.