
Low Throughput (~4000 TPS) with Envoy Configuration in Kubernetes Setup #38513

Open
Tharsanan1 opened this issue Feb 21, 2025 · 3 comments

@Tharsanan1

Hi, I recently conducted a performance test using the following Envoy configuration:

admin:
  access_log_path: /tmp/admin_access.log
  address:
    socket_address:
      address: 0.0.0.0
      port_value: 9901

static_resources:
  listeners:
  - name: listener_0
    address:
      socket_address:
        address: 0.0.0.0
        port_value: 10000
    filter_chains:
    - filters:
      - name: envoy.http_connection_manager
        typed_config:
          "@type": "type.googleapis.com/envoy.extensions.filters.network.http_connection_manager.v3.HttpConnectionManager"
          stat_prefix: ingress_http
          codec_type: AUTO
          route_config:
            name: local_route
            virtual_hosts:
            - name: backend_service
              domains: ["*"]
              routes:
              - match:
                  prefix: "/"
                route:
                  cluster: k8s_backend
          http_filters:
          - name: envoy.filters.http.router
            typed_config:
              "@type": "type.googleapis.com/envoy.extensions.filters.http.router.v3.Router"
              dynamic_stats: false
              suppress_envoy_headers: true



  clusters:
  - name: k8s_backend
    connect_timeout: 5s
    type: LOGICAL_DNS
    dns_lookup_family: V4_ONLY
    lb_policy: ROUND_ROBIN
    load_assignment:
      cluster_name: k8s_backend
      endpoints:
      - lb_endpoints:
        - endpoint:
            address:
              socket_address:
                address: backend.apk-perf-test.svc.cluster.local
                port_value: 80  # Adjust this port to match your service's target port
    health_checks:
    - timeout: 5s
      interval: 10s
      unhealthy_threshold: 2
      healthy_threshold: 2
      http_health_check:
        path: "/health"  # Adjust this path if your service has a health endpoint

This setup is deployed in a Kubernetes environment. You can find the related deployment YAML in this file.

I also downloaded the Envoy config dump from the test setup, available here.

Test Summary
Setup: Two JMeter servers (AKS Standard_F8s_v2) send requests (POST with a 50-byte payload; the Netty backend echoes the body back) to a Kubernetes cluster within the same VNet.
Backend: A Netty-based backend with a CPU limit of 2000m and a memory limit of 4GB.
Envoy Pod: Configured with a CPU limit of 1000m and a memory limit of 512MB.

Observations
With Envoy in the pipeline, I’m only achieving ~4000 TPS (transactions per second), which seems unusually low.
Resource utilization shows Envoy fully consuming its allocated 1000m CPU, while the Netty backend only uses ~350m CPU. This suggests the bottleneck isn’t the backend.

JMeter logs: here.

To confirm, I bypassed Envoy and sent requests directly from JMeter to the Netty backend, achieving ~30,000 TPS. This indicates the Netty backend is capable of handling much higher throughput.

Question
Is there anything in the Envoy configuration above that could explain the low TPS (~4000)? Could this be due to a misconfiguration or resource limitation specific to Envoy? Any suggestions for optimization would be appreciated!

@Tharsanan1 Tharsanan1 added the triage Issue requires triage label Feb 21, 2025
@Tharsanan1
Author

Update on this: when I set the concurrency to 2, the TPS jumped to ~8200.
I tried increasing the concurrency to 3 and 4, but got ~7000 and ~6000 TPS respectively, so the best result I got was with concurrency = 2.
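
For reference, here is a minimal sketch of how the worker-thread count is typically pinned in a Kubernetes deployment, assuming Envoy runs as a plain container; the Deployment name, namespace layout, image tag, and ConfigMap name below are illustrative assumptions, not taken from the linked deployment YAML. By default Envoy starts one worker per hardware thread on the node, so with a 1000m CPU limit many workers can contend for a single core; --concurrency lets you match the worker count to the CPU actually available:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: envoy-gateway                  # hypothetical name
  namespace: apk-perf-test
spec:
  replicas: 1
  selector:
    matchLabels:
      app: envoy-gateway
  template:
    metadata:
      labels:
        app: envoy-gateway
    spec:
      containers:
      - name: envoy
        image: envoyproxy/envoy:v1.33.0    # assumed version
        args:
        - --config-path
        - /etc/envoy/envoy.yaml
        - --concurrency
        - "2"                              # match this to the CPU actually available to the pod
        resources:
          limits:
            cpu: "1000m"
            memory: 512Mi
        volumeMounts:
        - name: envoy-config
          mountPath: /etc/envoy
      volumes:
      - name: envoy-config
        configMap:
          name: envoy-config               # hypothetical ConfigMap holding the listener/cluster config above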

@adisuissa adisuissa added area/perf and removed triage Issue requires triage labels Feb 21, 2025
@adisuissa
Contributor

Please look at https://www.envoyproxy.io/docs/envoy/latest/faq/performance/how_to_benchmark_envoy, as there may be some config knobs that can be set to increase throughput.
cc @KBaichoo who may have some suggestions.
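
One generic knob worth double-checking in load tests (it may or may not apply here) is the cluster circuit breakers: by default max_connections, max_pending_requests, and max_requests are each capped at 1024, which a high-throughput benchmark can hit. Below is a minimal sketch of raising them on the k8s_backend cluster, with purely illustrative values:

  clusters:
  - name: k8s_backend
    # ... existing settings from the config above ...
    circuit_breakers:
      thresholds:
      - priority: DEFAULT
        max_connections: 10000         # default is 1024
        max_pending_requests: 10000    # default is 1024
        max_requests: 10000            # default is 1024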

@wbpcode
Member

wbpcode commented Feb 22, 2025

First, please keep the concurrency equal to the number of CPU cores available to Envoy. Second, ensure your backend has good enough performance. Third, ensure your benchmark client has good enough performance (I would suggest using wrk, nighthawk, fortio, hey, etc.).

Also check #19103. We did observe some performance degradation there, but it shouldn't be this bad.
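
If it helps to reproduce with one of the clients suggested above, here is a minimal sketch of a Kubernetes Job that runs fortio against the listener from inside the cluster, which takes JMeter and cross-node client limits out of the picture; the Job name and the Envoy service hostname are placeholders, not taken from the actual setup:

apiVersion: batch/v1
kind: Job
metadata:
  name: fortio-loadtest                # hypothetical name
  namespace: apk-perf-test
spec:
  backoffLimit: 0
  template:
    spec:
      restartPolicy: Never
      containers:
      - name: fortio
        image: fortio/fortio:latest
        args:
        - load
        - -qps
        - "0"              # 0 = no rate limit, go as fast as possible
        - -c
        - "64"             # concurrent connections
        - -t
        - "60s"            # test duration
        - -payload-size
        - "50"             # mirrors the 50-byte POST payload used in the JMeter test
        - http://<envoy-service>.apk-perf-test.svc.cluster.local:10000/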
