Telegraf to telegraf communication failures #16401

Open
Hipska opened this issue Jan 15, 2025 · 5 comments

Contributor

Hipska commented Jan 15, 2025

Relevant telegraf.conf

Sender:

[[outputs.influxdb_v2]]
  alias = "backbone"
  urls = ["http://127.0.0.1:8086"]
  bucket_tag = "role"

Receiver:

# Accept metrics over InfluxDB 2.x HTTP API
[[inputs.influxdb_v2_listener]]
  ## Address and port to host InfluxDB listener on
  ## (Double check the port. Could be 9999 if using OSS Beta)
  service_address = ":8086"

  ## Maximum undelivered metrics before rate limit kicks in.
  ## When the rate limit kicks in, HTTP status 429 will be returned.
  ## 0 disables rate limiting
  # max_undelivered_metrics = 0

  ## Maximum duration before timing out read of the request
  # read_timeout = "10s"
  ## Maximum duration before timing out write of the response
  # write_timeout = "10s"

  ## Maximum allowed HTTP request body size in bytes.
  ## 0 means to use the default of 32MiB.
  # max_body_size = "32MiB"

  ## Optional tag to determine the bucket.
  ## If the write has a bucket in the query string then it will be kept in this tag name.
  ## This tag can be used in downstream outputs.
  ## The default value of nothing means it will be off and the database will not be recorded.
  bucket_tag = "role"

  ## Set one or more allowed client CA certificate file names to
  ## enable mutually authenticated TLS connections
  # tls_allowed_cacerts = ["/etc/telegraf/clientca.pem"]

  ## Add service certificate and key
  # tls_cert = "/etc/telegraf/cert.pem"
  # tls_key = "/etc/telegraf/key.pem"

  ## Optional token to accept for HTTP authentication.
  ## You probably want to make sure you have TLS configured above for this.
  # token = "some-long-shared-secret-token"

  ## Influx line protocol parser
  ## 'internal' is the default. 'upstream' is a newer parser that is faster
  ## and more memory efficient.
  parser_type = "upstream"

Logs from Telegraf

Jan 15 07:00:47 grw-pol-u1 telegraf[664564]: 2025-01-15T06:00:47Z E! [outputs.influxdb_v2::backbone] When writing to [http://127.0.0.1:8086/api/v2/write]: Post "http://127.0.0.1:8086/api/v2/write?bucket=CARRIER+ETHERNET&org=": EOF
Jan 15 07:11:47 grw-pol-u1 telegraf[664564]: 2025-01-15T06:11:47Z E! [outputs.influxdb_v2::backbone] When writing to [http://127.0.0.1:8086/api/v2/write]: Post "http://127.0.0.1:8086/api/v2/write?bucket=CARRIER+ETHERNET&org=": read tcp 127.0.0.1:45974->127.0.0.1:8086: read: connection reset by peer
Jan 15 07:16:37 grw-pol-u1 telegraf[664564]: 2025-01-15T06:16:37Z E! [outputs.influxdb_v2::backbone] When writing to [http://127.0.0.1:8086/api/v2/write]: Post "http://127.0.0.1:8086/api/v2/write?bucket=CORE&org=": net/http: HTTP/1.x transport connection broken: write tcp 127.0.0.1:38982->127.0.0.1:8086: write: broken pipe
Jan 15 07:25:47 grw-pol-u1 telegraf[664564]: 2025-01-15T06:25:47Z E! [outputs.influxdb_v2::backbone] When writing to [http://127.0.0.1:8086/api/v2/write]: Post "http://127.0.0.1:8086/api/v2/write?bucket=PE&org=": read tcp 127.0.0.1:33632->127.0.0.1:8086: read: connection reset by peer
Jan 15 07:32:57 grw-pol-u1 telegraf[664564]: 2025-01-15T06:32:57Z E! [outputs.influxdb_v2::backbone] When writing to [http://127.0.0.1:8086/api/v2/write]: Post "http://127.0.0.1:8086/api/v2/write?bucket=PE&org=": EOF
Jan 15 07:39:37 grw-pol-u1 telegraf[664564]: 2025-01-15T06:39:37Z E! [outputs.influxdb_v2::backbone] When writing to [http://127.0.0.1:8086/api/v2/write]: Post "http://127.0.0.1:8086/api/v2/write?bucket=PE&org=": read tcp 127.0.0.1:38394->127.0.0.1:8086: read: connection reset by peer
Jan 15 07:43:47 grw-pol-u1 telegraf[664564]: 2025-01-15T06:43:47Z E! [outputs.influxdb_v2::backbone] When writing to [http://127.0.0.1:8086/api/v2/write]: Post "http://127.0.0.1:8086/api/v2/write?bucket=CARRIER+ETHERNET&org=": EOF
Jan 15 07:47:57 grw-pol-u1 telegraf[664564]: 2025-01-15T06:47:57Z E! [outputs.influxdb_v2::backbone] When writing to [http://127.0.0.1:8086/api/v2/write]: Post "http://127.0.0.1:8086/api/v2/write?bucket=PE&org=": net/http: HTTP/1.x transport connection broken: write tcp 127.0.0.1:41082->127.0.0.1:8086: write: broken pipe
Jan 15 07:53:47 grw-pol-u1 telegraf[664564]: 2025-01-15T06:53:47Z E! [outputs.influxdb_v2::backbone] When writing to [http://127.0.0.1:8086/api/v2/write]: Post "http://127.0.0.1:8086/api/v2/write?bucket=CARRIER+ETHERNET&org=": write tcp 127.0.0.1:43808->127.0.0.1:8086: use of closed network connection
Jan 15 07:55:37 grw-pol-u1 telegraf[664564]: 2025-01-15T06:55:37Z E! [outputs.influxdb_v2::backbone] When writing to [http://127.0.0.1:8086/api/v2/write]: Post "http://127.0.0.1:8086/api/v2/write?bucket=CARRIER+ETHERNET&org=": EOF
Jan 15 08:01:57 grw-pol-u1 telegraf[664564]: 2025-01-15T07:01:57Z E! [outputs.influxdb_v2::backbone] When writing to [http://127.0.0.1:8086/api/v2/write]: Post "http://127.0.0.1:8086/api/v2/write?bucket=CORE&org=": read tcp 127.0.0.1:60788->127.0.0.1:8086: read: connection reset by peer
Jan 15 08:02:47 grw-pol-u1 telegraf[664564]: 2025-01-15T07:02:47Z E! [outputs.influxdb_v2::backbone] When writing to [http://127.0.0.1:8086/api/v2/write]: Post "http://127.0.0.1:8086/api/v2/write?bucket=PE&org=": read tcp 127.0.0.1:60534->127.0.0.1:8086: read: connection reset by peer
Jan 15 08:45:47 grw-pol-u1 telegraf[664564]: 2025-01-15T07:45:47Z E! [outputs.influxdb_v2::backbone] When writing to [http://127.0.0.1:8086/api/v2/write]: Post "http://127.0.0.1:8086/api/v2/write?bucket=CARRIER+ETHERNET&org=": net/http: HTTP/1.x transport connection broken: write tcp 127.0.0.1:53202->127.0.0.1:8086: write: broken pipe
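The thirteen errors above use four distinct messages. A minimal Python sketch for tallying them by kind follows; the function names and the matched substrings are illustrative choices rather than anything provided by Telegraf, and the matching assumes the exact wording seen in this log.

```python
from collections import Counter

# Substrings taken from the error messages in the log above.
# Order matters only in that ": EOF" is checked last, as a suffix.
ERROR_SUBSTRINGS = [
    ("connection reset by peer", "connection reset"),
    ("broken pipe", "broken pipe"),
    ("use of closed network connection", "closed connection"),
]


def classify(line):
    """Return an error label for a Telegraf error line, or None for other lines."""
    if " E! " not in line:
        return None  # not an error-level line (e.g. debug output)
    for needle, label in ERROR_SUBSTRINGS:
        if needle in line:
            return label
    if line.rstrip().endswith(": EOF"):
        return "EOF"
    return "other"


def tally(lines):
    """Count error lines per label, ignoring non-error lines."""
    return Counter(label for label in map(classify, lines) if label)
```

Applied to the thirteen log lines above, this gives 5 connection resets, 4 EOFs, 3 broken pipes and 1 use of a closed connection.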

System info

Red Hat Enterprise Linux release 9.4 (Plow)
Telegraf 1.33.1 (git: HEAD@44f3a504)

Docker

No response

Steps to reproduce

  1. Run 2 telegraf instances
  2. Generate metrics
  3. Observe logs indicating connection resets, broken pipes, ...

Expected behavior

Telegraf to be able to communicate with another telegraf reliably without issues.

Actual behavior

Connection issues at random times.

Additional info

No response

@Hipska Hipska added the bug unexpected problem or unintended behavior label Jan 15, 2025
Member

srebhan commented Jan 28, 2025

@Hipska do you see any error messages on the receiver (influxdb_v2_listener) side? Please also enable debugging and check if there is something obvious...

@srebhan srebhan self-assigned this Jan 28, 2025
Contributor Author

Hipska commented Jan 28, 2025

This is a combined log of the sender (PID 2168801) and the receiver (PID 2321693), both in debug mode:

Jan 28 17:11:39 grw-pol-u1 telegraf[2168801]: 2025-01-28T16:11:39Z D! [outputs.influxdb_v2::backbone] Wrote batch of 1 metrics in 525.782µs
Jan 28 17:11:39 grw-pol-u1 telegraf[2168801]: 2025-01-28T16:11:39Z D! [outputs.influxdb_v2::backbone] Buffer fullness: 0 / 3147366 metrics
Jan 28 17:11:43 grw-pol-u1 telegraf[2321693]: 2025-01-28T16:11:43Z D! [outputs.http::grw-pol-u1] Wrote batch of 5 metrics in 861.009µs
Jan 28 17:11:43 grw-pol-u1 telegraf[2321693]: 2025-01-28T16:11:43Z D! [outputs.http::grw-pol-u1] Buffer fullness: 0 / 3147366 metrics
Jan 28 17:11:49 grw-pol-u1 telegraf[2168801]: 2025-01-28T16:11:49Z D! [outputs.influxdb_v2::backbone] Buffer fullness: 4 / 3147366 metrics
Jan 28 17:11:49 grw-pol-u1 telegraf[2168801]: 2025-01-28T16:11:49Z E! [agent] Error writing to outputs.influxdb_v2::backbone: Post "http://127.0.0.1:8086/api/v2/write?bucket=CARRIER+ETHERNET&org=": net/http: HTTP/1.x transport connection broken: write tcp 127.0.0.1:50230->127.0.0.1:8086: write: broken pipe
Jan 28 17:11:53 grw-pol-u1 telegraf[2321693]: 2025-01-28T16:11:53Z D! [outputs.http::grw-pol-u1] Buffer fullness: 0 / 3147366 metrics
Jan 28 17:11:59 grw-pol-u1 telegraf[2168801]: 2025-01-28T16:11:59Z D! [outputs.influxdb_v2::backbone] Wrote batch of 4 metrics in 1.13294ms
Jan 28 17:11:59 grw-pol-u1 telegraf[2168801]: 2025-01-28T16:11:59Z D! [outputs.influxdb_v2::backbone] Buffer fullness: 0 / 3147366 metrics

You can see a successful write at 17:11:39. At 17:11:49 the buffer holds 4 metrics, and the write of those metrics fails with an error. Nothing relevant shows up in the receiver's logs. The 4 metrics then appear to be written successfully only at the next flush interval (17:11:59).

Is there anything else relevant I could share? (kernel settings, ...)

Member

srebhan commented Jan 28, 2025

The error

net/http: HTTP/1.x transport connection broken: write tcp 127.0.0.1:50230->127.0.0.1:8086: write: broken pipe

hints at a closed connection or a network issue. I checked the code, and if the receiver closed the connection for any reason, it would write some output to the log file. So this seems to be a connectivity issue outside of Telegraf's control...

Is there a possibility that you run into too many open connections or some other system limitation?
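
One general way for an HTTP client to see EOF, connection reset and broken pipe errors on an otherwise healthy loopback link is reusing a keep-alive connection that the server has already closed after an idle timeout. Whether that is what happens in this issue has not been established. The Python sketch below only reproduces that mechanism with the standard library; the handler name, the 0.3 s idle timeout and the `post_twice` helper are made up for illustration and do not model Telegraf's actual listener.

```python
import http.client
import http.server
import threading
import time


class ShortIdleHandler(http.server.BaseHTTPRequestHandler):
    protocol_version = "HTTP/1.1"  # keep connections open between requests
    timeout = 0.3                  # assumed idle limit: socket closes after 0.3 s

    def do_POST(self):
        # Consume the request body and answer with an empty 204 response.
        length = int(self.headers.get("Content-Length", 0))
        self.rfile.read(length)
        self.send_response(204)
        self.send_header("Content-Length", "0")
        self.end_headers()

    def log_message(self, *args):
        pass  # keep the demo quiet


def post_twice(idle_seconds):
    """POST twice over one connection, idling in between.

    Returns "ok" if the second request succeeds, otherwise the name of the
    connection error raised when the stale connection is reused.
    """
    server = http.server.ThreadingHTTPServer(("127.0.0.1", 0), ShortIdleHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    conn = http.client.HTTPConnection(
        "127.0.0.1", server.server_address[1], timeout=5
    )
    try:
        conn.request("POST", "/api/v2/write", body=b"m v=1\n")
        conn.getresponse().read()
        time.sleep(idle_seconds)  # the server may close the idle socket here
        try:
            conn.request("POST", "/api/v2/write", body=b"m v=2\n")
            conn.getresponse().read()
            return "ok"
        except ConnectionError as exc:
            return type(exc).__name__
    finally:
        conn.close()
        server.shutdown()
        server.server_close()
```

With `post_twice(0.0)` the second request reuses a live connection and succeeds. With `post_twice(1.0)` the server has already dropped the socket, and the client fails with a broken pipe, a connection reset or a remote disconnect, depending on timing. In Go's net/http server, an unset `IdleTimeout` falls back to `ReadTimeout`, so it may be worth checking how the listener's `read_timeout` (shown as 10s in the sample config above) relates to the sender's flush interval. That is a hypothesis to verify, not a confirmed cause.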

Contributor Author

Hipska commented Jan 28, 2025

That could indeed be possible. Where and how can I check this?

Member

srebhan commented Jan 29, 2025

I don't know to be honest. Would need to google it...
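
On Linux, one place to start looking is the `/proc` filesystem: `/proc/<pid>/limits` lists the per-process "Max open files" limit, `/proc/<pid>/fd` holds one entry per open descriptor, and `/proc/net/tcp` lists IPv4 sockets with their state. The Python sketch below reads those files; the helper names are hypothetical, and it is a rough first check under the assumption of a standard Linux `/proc` layout, not a complete diagnosis.

```python
import os
from collections import Counter

# TCP state codes as they appear in the "st" column of /proc/net/tcp.
TCP_STATES = {
    "01": "ESTABLISHED", "02": "SYN_SENT", "03": "SYN_RECV",
    "04": "FIN_WAIT1", "05": "FIN_WAIT2", "06": "TIME_WAIT",
    "07": "CLOSE", "08": "CLOSE_WAIT", "09": "LAST_ACK",
    "0A": "LISTEN", "0B": "CLOSING",
}


def parse_max_open_files(limits_text):
    """Return (soft, hard) from /proc/<pid>/limits text; 'unlimited' becomes None."""
    for line in limits_text.splitlines():
        if line.startswith("Max open files"):
            soft, hard = line[len("Max open files"):].split()[:2]
            return tuple(None if v == "unlimited" else int(v) for v in (soft, hard))
    raise ValueError("no 'Max open files' line found")


def count_tcp_states(proc_net_tcp_text, port):
    """Count sockets per TCP state where the local or remote port equals `port`."""
    counts = Counter()
    for line in proc_net_tcp_text.splitlines():
        fields = line.split()
        # Skip the header row and anything that is not a socket entry.
        if len(fields) < 4 or ":" not in fields[1]:
            continue
        local_port = int(fields[1].rsplit(":", 1)[1], 16)
        remote_port = int(fields[2].rsplit(":", 1)[1], 16)
        if port in (local_port, remote_port):
            counts[TCP_STATES.get(fields[3].upper(), fields[3])] += 1
    return counts


def fd_usage(pid):
    """Return (open_fds, soft_limit, hard_limit) for a process (Linux only)."""
    with open(f"/proc/{pid}/limits") as fh:
        soft, hard = parse_max_open_files(fh.read())
    return len(os.listdir(f"/proc/{pid}/fd")), soft, hard
```

For example, `fd_usage(pid)` with the PID of the receiving Telegraf process shows how close it is to its descriptor limit, and `count_tcp_states(open("/proc/net/tcp").read(), 8086)` shows whether sockets on the listener port pile up in states such as `CLOSE_WAIT` or `TIME_WAIT`. Reading another process's `/proc/<pid>/fd` needs the same user or root, and IPv6 sockets are listed separately in `/proc/net/tcp6`.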
