Kafka error when sending a large number of transactions #3636

Open
1 task done
GoldUser32 opened this issue Mar 27, 2025 · 9 comments

Comments

@GoldUser32

Self-Hosted Version

25.2.0

CPU Architecture

x86_64

Docker Version

28.0.1

Docker Compose Version

2.33.1

Machine Specification

  • My system meets the minimum system requirements of Sentry

Steps to Reproduce

When the application sends about 600k transactions per hour to Sentry, the transactions-consumer container starts erroring.

I've tried cleaning out the consumer offsets with
kafka-consumer-groups --bootstrap-server kafka:9092 --all-groups --all-topics --reset-offsets --to-latest --execute
I've also tried stopping the containers, deleting the kafka volume, and re-running ./install.sh.
Either works for a while; then transactions stop appearing in Sentry again.
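A narrower variant of that reset, as a sketch: target only the group that throws OffsetOutOfRange instead of every group. The group and topic names below are assumptions (list the groups first to find the real ones), and the group must have no active members, so stop the consumer before resetting.

# list consumer groups to find the one backing transactions-consumer
docker compose exec kafka kafka-consumer-groups --bootstrap-server kafka:9092 --list

# stop the consumer, reset only its group/topic, then start it again
docker compose stop transactions-consumer
docker compose exec kafka kafka-consumer-groups --bootstrap-server kafka:9092 \
  --group transactions_group --topic transactions \
  --reset-offsets --to-latest --execute
docker compose start transactions-consumer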

My Sentry .env file includes:
SENTRY_EVENT_RETENTION_DAYS=30

From docker-compose.yml:
kafka:
  <<: *restart_policy
  image: "confluentinc/cp-kafka:7.6.1"
  # ports:
  #   - 9092
  environment:
    # https://docs.confluent.io/platform/current/installation/docker/config-reference.html#cp-kakfa-example
    KAFKA_PROCESS_ROLES: "broker,controller"
    KAFKA_CONTROLLER_QUORUM_VOTERS: "1001@kafka:29093"
    KAFKA_CONTROLLER_LISTENER_NAMES: "CONTROLLER"
    KAFKA_NODE_ID: "1001"
    CLUSTER_ID: "MkU3OEVBNTcwNTJENDM2Qk"
    KAFKA_LISTENERS: "PLAINTEXT://0.0.0.0:29092,INTERNAL://0.0.0.0:9093,EXTERNAL://0.0.0.0:9092,CONTROLLER://0.0.0.0:29093"
    KAFKA_ADVERTISED_LISTENERS: "PLAINTEXT://127.0.0.1:29092,INTERNAL://kafka:9093,EXTERNAL://kafka:9092"
    KAFKA_LISTENER_SECURITY_PROTOCOL_MAP: "PLAINTEXT:PLAINTEXT,INTERNAL:PLAINTEXT,EXTERNAL:PLAINTEXT,CONTROLLER:PLAINTEXT"
    KAFKA_INTER_BROKER_LISTENER_NAME: "PLAINTEXT"
    KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR: "1"
    KAFKA_OFFSETS_TOPIC_NUM_PARTITIONS: "1"
    #KAFKA_LOG_CLEANUP_POLICY: delete
    KAFKA_LOG_CLEANER_ENABLE: "true"
    KAFKA_LOG_CLEANUP_POLICY: "delete"
    KAFKA_LOG_RETENTION_HOURS: "12"
    KAFKA_MESSAGE_MAX_BYTES: "700000000" # 700 MB or bust
    KAFKA_MAX_REQUEST_SIZE: "600000000"  # 600 MB on requests apparently too
    # KAFKA_MAX_RECORDS_PER_USER_OP:
    CONFLUENT_SUPPORT_METRICS_ENABLE: "false"
    KAFKA_LOG4J_LOGGERS: "kafka.cluster=WARN,kafka.controller=WARN,kafka.coordinator=WARN,kafka.log=WARN,kafka.server=WARN,state.change.logger=WARN"
    KAFKA_LOG4J_ROOT_LOGLEVEL: "DEBUG"
    KAFKA_TOOLS_LOG4J_LOGLEVEL: "DEBUG"
  ulimits:
    nofile:
      soft: 8192
      hard: 8192
  volumes:
    - "sentry-kafka:/var/lib/kafka/data"
    - "sentry-kafka-log:/var/lib/kafka/log"
    - "sentry-secrets:/etc/kafka/secrets"
  healthcheck:
    <<: *healthcheck_defaults
    test: ["CMD-SHELL", "nc -z localhost 9092"]
    interval: 10s
    timeout: 10s
    retries: 30
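Worth noting in the config above: KAFKA_LOG_RETENTION_HOURS: "12" tells the broker to delete log segments after 12 hours. If a consumer falls more than 12 hours behind, its committed offset points at data that no longer exists, which is exactly the OffsetOutOfRange condition shown below. A minimal sketch of giving lagging consumers more headroom (values are illustrative, not tuned recommendations):

environment:
  # keep segments longer so a lagging consumer's committed offset stays valid
  KAFKA_LOG_RETENTION_HOURS: "48"
  # optionally also bound disk usage per partition (maps to log.retention.bytes)
  KAFKA_LOG_RETENTION_BYTES: "10737418240"  # 10 GiB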

From relay/config.yml:
limits:
  max_concurrent_requests: 100000
  max_concurrent_queries: 1000
  max_thread_count: 800
Expected Result

A working Sentry that keeps ingesting transactions.

Actual Result

transactions-consumer-1 |   File "/.venv/lib/python3.13/site-packages/arroyo/backends/kafka/consumer.py", line 422, in poll
transactions-consumer-1 |     raise OffsetOutOfRange(str(error))
transactions-consumer-1 | arroyo.errors.OffsetOutOfRange: KafkaError{code=_AUTO_OFFSET_RESET,val=-140,str="fetch failed due to requested offset not available on the broker: Broker: Offset out of range (broker 1001)"}
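
One way to confirm this against the broker is to compare the earliest offset still available with what the group has committed; if the committed offset is below the earliest one, retention has already deleted that data. A sketch, assuming the topic is named transactions (verify with --list; if kafka-get-offsets is not available in this image, the --describe output's LOG-END-OFFSET and LAG columns tell a similar story):

# earliest (-2) and latest (-1) offsets the broker still holds
docker compose exec kafka kafka-get-offsets --bootstrap-server kafka:9092 --topic transactions --time -2
docker compose exec kafka kafka-get-offsets --bootstrap-server kafka:9092 --topic transactions --time -1

# committed offsets and lag for every group
docker compose exec kafka kafka-consumer-groups --bootstrap-server kafka:9092 --describe --all-groups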

Event ID

No response

@GoldUser32
Author

(screenshots attached)

@GoldUser32
Author

I also tried the workaround described in issue #1894 (comment); it only works for a short time.

(screenshot attached)

@aldy505
Collaborator

aldy505 commented Mar 29, 2025

What are your server specifications? Usually this means the consumers can't keep up with the throughput.
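
A sketch of how one might check that, and scale out if so; the service name is taken from the logs above, while the topic name and partition count are assumptions:

# a LAG column that keeps growing means the consumers can't keep up
docker compose exec kafka kafka-consumer-groups --bootstrap-server kafka:9092 --describe --all-groups

# consumers parallelize at most up to the partition count, so add partitions first...
docker compose exec kafka kafka-topics --bootstrap-server kafka:9092 --alter --topic transactions --partitions 3

# ...then run extra replicas (works only if the service doesn't pin container_name)
docker compose up -d --scale transactions-consumer=3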

@GoldUser32
Author

Self-hosted Sentry runs on a bare-metal Dell PowerEdge R630: 2x Intel(R) Xeon(R) E5-2640 v4 @ 2.40GHz CPUs, 4x 32GB DDR4, 2x 4TB SSD. What throughput do you mean?

@GoldUser32
Author

I updated Sentry to the latest version, 25.3.0, and now see errors in the clickhouse container:

clickhouse-1 | 2025.04.02 09:53:59.938628 [ 535 ] {} ServerErrorHandler: Poco::Exception. Code: 1000, e.code() = 107, Net Exception: Socket is not connected, Stack trace (when copying this message, always include the lines below):
clickhouse-1 |
clickhouse-1 | 0. Poco::Net::SocketImpl::error(int, String const&) @ 0x0000000015b3dbf2 in /usr/bin/clickhouse
clickhouse-1 | 1. Poco::Net::SocketImpl::peerAddress() @ 0x0000000015b40376 in /usr/bin/clickhouse
clickhouse-1 | 2. DB::ReadBufferFromPocoSocket::ReadBufferFromPocoSocket(Poco::Net::Socket&, unsigned long) @ 0x000000000c896cc6 in /usr/bin/clickhouse
clickhouse-1 | 3. DB::HTTPServerRequest::HTTPServerRequest(std::shared_ptr<DB::IHTTPContext>, DB::HTTPServerResponse&, Poco::Net::HTTPServerSession&) @ 0x000000001315451b in /usr/bin/clickhouse
clickhouse-1 | 4. DB::HTTPServerConnection::run() @ 0x0000000013152ba4 in /usr/bin/clickhouse
clickhouse-1 | 5. Poco::Net::TCPServerConnection::start() @ 0x0000000015b42834 in /usr/bin/clickhouse
clickhouse-1 | 6. Poco::Net::TCPServerDispatcher::run() @ 0x0000000015b43a31 in /usr/bin/clickhouse
clickhouse-1 | 7. Poco::PooledThread::run() @ 0x0000000015c7a667 in /usr/bin/clickhouse
clickhouse-1 | 8. Poco::ThreadImpl::runnableEntry(void*) @ 0x0000000015c7893c in /usr/bin/clickhouse
clickhouse-1 | 9. ? @ 0x00007f9e36cac609 in ?
clickhouse-1 | 10. ? @ 0x00007f9e36bd1353 in ?
clickhouse-1 | (version 23.8.11.29.altinitystable (altinity build))

@GoldUser32
Author

@aldy505 any updates? I tried to fix it by changing rust-consumer to consumer in my docker-compose.yml and rebuilding the containers. I also added this to clickhouse/config.xml:
<listen_host>0.0.0.0</listen_host>
<listen_host>9000</listen_host>
<tcp_port>9000</tcp_port>

@aldy505
Collaborator

aldy505 commented Apr 4, 2025

Hello. Sorry, this fell off my radar. As for those ClickHouse errors: can you still ingest events/data just fine? Staying on rust-consumer is fine as long as you can ingest data.

It's a known issue anyway; you can safely ignore it. See getsentry/snuba#5707.

@GoldUser32
Author

I see errors only, and only in Discover; developers send just this data.
After adding the listen host and ports to config.xml, I got a new error in ClickHouse: <Warning> Application: Listen [9000]:9004 failed: Poco::Exception. Code: 1000, e.code() = 99, Net Exception: Cannot assign requested address.
How do I configure ClickHouse correctly?
Thank you. I've reverted to rust-consumer in my docker compose config for now.

(screenshot attached)

@aldy505
Collaborator

aldy505 commented Apr 7, 2025

After added listen host, ports in config.xml. I have new error in click: Application: Listen [9000]:9004 failed: Poco::Exception. Code: 1000, e.code() = 99, Net Exception: Cannot assign requested address.

This is an error in your ClickHouse config. You should remove these lines (if you still have them):

<listen_host>0.0.0.0</listen_host>
<listen_host>9000</listen_host>
<tcp_port>9000</tcp_port>

A listen_host of 9000 is wrong, since that value should be a port, not a host. By default, ClickHouse listens on all interfaces (0.0.0.0 and [::], the IPv6 equivalent) on port 9000 for TCP and 8123 for HTTP.
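
For reference, a sketch of what a valid override would look like, if one were needed at all; these simply restate the ClickHouse defaults:

<!-- listen_host takes an address; tcp_port and http_port take port numbers -->
<listen_host>0.0.0.0</listen_host>
<tcp_port>9000</tcp_port>
<http_port>8123</http_port>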
