Data loss on stop #1547

allenluce · 2018-03-30T23:26:06Z

Up to the last 15 seconds of aggregated data is lost when shutting down the statsd server (with statsd.Stop()). Even with the aggregator's flush interval set quite low, a few seconds of data never get pushed to the backend.

Is there a recommended way to flush data to prevent this from happening?

truthbk · 2018-05-15T19:32:14Z

@allenluce you're 100% right. It looks like statsd.Stop() doesn't flush whatever the aggregator contains at that point, so the process shuts down without emptying those packets.

I don't believe we have a way around this at the moment. The flushes happen periodically, as you already know, so depending on when during the flush interval you request the stop() you might lose anywhere from 1s to 15s of data. We'd have to add some flush logic to the shutdown code. That would make the process teardown a little slower, but it does seem like the right thing to do. There are still things that can go wrong at the forwarder level, so we'd have to make this a best-effort thing.

We'll look into it. Thank you for bringing this up.

visciang · 2018-05-18T07:53:32Z

This issue is very annoying if you run the agent in a "sidecar" container alongside an AWS Fargate Task (a short-lived "docker run"). When the main task ends, the agent container is stopped and it doesn't flush metrics / events / APM / etc.

The "sidecar" pattern only works for AWS Fargate Services (long-lived tasks).

As a workaround, we currently deploy the agents as an AWS Fargate Service, which Tasks use to report Datadog metrics.

baxang · 2021-01-28T06:08:54Z

Seems like #4129 addressed this issue.

sgnn7 · 2021-01-28T20:48:25Z

@allenluce / @visciang Can you try out the new version of the agent to see if this issue is resolved now?

miketheman · 2021-09-30T14:14:19Z

Seems similar to #3940

truthbk added the [deprecated] team/agent-core label May 15, 2018

jaredpetersen mentioned this issue Apr 23, 2024

Datadog Agent won't flush when receiving SIGTERM #3940

Open
