
Low Throughput (~4000 TPS) with Envoy Configuration in Kubernetes Setup #38513

Open
Tharsanan1 opened this issue Feb 21, 2025 · 3 comments

@Tharsanan1

Hi, I recently conducted a performance test using the following Envoy configuration:

admin:
  access_log_path: /tmp/admin_access.log
  address:
    socket_address:
      address: 0.0.0.0
      port_value: 9901

static_resources:
  listeners:
  - name: listener_0
    address:
      socket_address:
        address: 0.0.0.0
        port_value: 10000
    filter_chains:
    - filters:
      - name: envoy.http_connection_manager
        typed_config:
          "@type": "type.googleapis.com/envoy.extensions.filters.network.http_connection_manager.v3.HttpConnectionManager"
          stat_prefix: ingress_http
          codec_type: AUTO
          route_config:
            name: local_route
            virtual_hosts:
            - name: backend_service
              domains: ["*"]
              routes:
              - match:
                  prefix: "/"
                route:
                  cluster: k8s_backend
          http_filters:
          - name: envoy.filters.http.router
            typed_config:
              "@type": "type.googleapis.com/envoy.extensions.filters.http.router.v3.Router"
              dynamic_stats: false
              suppress_envoy_headers: true



  clusters:
  - name: k8s_backend
    connect_timeout: 5s
    type: LOGICAL_DNS
    dns_lookup_family: V4_ONLY
    lb_policy: ROUND_ROBIN
    load_assignment:
      cluster_name: k8s_backend
      endpoints:
      - lb_endpoints:
        - endpoint:
            address:
              socket_address:
                address: backend.apk-perf-test.svc.cluster.local
                port_value: 80  # Adjust this port to match your service's target port
    health_checks:
    - timeout: 5s
      interval: 10s
      unhealthy_threshold: 2
      healthy_threshold: 2
      http_health_check:
        path: "/health"  # Adjust this path if your service has a health endpoint

This setup is deployed in a Kubernetes environment. You can find the related deployment YAML in this file.

I also downloaded the Envoy config dump from the test setup, available here.

Test Summary
Setup: Two JMeter servers (AKS Standard_F8s_v2) send requests (POST with a 50-byte payload; the Netty backend echoes the body back) to a Kubernetes cluster within the same VNet.
Backend: A Netty-based backend with a CPU limit of 2000m and a memory limit of 4GB.
Envoy Pod: Configured with a CPU limit of 1000m and a memory limit of 512MB.

Observations
With Envoy in the pipeline, I’m only achieving ~4000 TPS (transactions per second), which seems unusually low.
Resource utilization shows Envoy fully consuming its allocated 1000m CPU, while the Netty backend only uses ~350m CPU. This suggests the bottleneck isn’t the backend.

JMeter logs: here.

To confirm, I bypassed Envoy and sent requests directly from JMeter to the Netty backend, achieving ~30,000 TPS. This indicates the Netty backend is capable of handling much higher throughput.

Question
Is there anything in the Envoy configuration above that could explain the low TPS (~4000)? Could this be due to a misconfiguration or resource limitation specific to Envoy? Any suggestions for optimization would be appreciated!

@Tharsanan1 Tharsanan1 added the triage Issue requires triage label Feb 21, 2025
@Tharsanan1
Author

Update on this: when I set the concurrency to 2, the TPS jumped to ~8200.
I tried increasing the concurrency to 3 and 4, but got ~7000 and ~6000 TPS respectively, so the best result I got was with concurrency = 2.
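
For reference, here is a minimal sketch of how the worker-thread count is typically pinned in a Kubernetes deployment, assuming Envoy runs as a plain container; the Deployment name, namespace layout, image tag, and ConfigMap name below are illustrative assumptions, not taken from the linked deployment YAML. By default Envoy starts one worker per hardware thread on the node, so with a 1000m CPU limit many workers can contend for a single core; --concurrency lets you match the worker count to the CPU actually available:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: envoy-gateway                  # hypothetical name
  namespace: apk-perf-test
spec:
  replicas: 1
  selector:
    matchLabels:
      app: envoy-gateway
  template:
    metadata:
      labels:
        app: envoy-gateway
    spec:
      containers:
      - name: envoy
        image: envoyproxy/envoy:v1.33.0    # assumed version
        args:
        - --config-path
        - /etc/envoy/envoy.yaml
        - --concurrency
        - "2"                              # match this to the CPU actually available to the pod
        resources:
          limits:
            cpu: "1000m"
            memory: 512Mi
        volumeMounts:
        - name: envoy-config
          mountPath: /etc/envoy
      volumes:
      - name: envoy-config
        configMap:
          name: envoy-config               # hypothetical ConfigMap holding the listener/cluster config above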

@adisuissa adisuissa added area/perf and removed triage Issue requires triage labels Feb 21, 2025
@adisuissa
Contributor

Please look at https://www.envoyproxy.io/docs/envoy/latest/faq/performance/how_to_benchmark_envoy, as there may be some config knobs that can be set to increase throughput.
cc @KBaichoo who may have some suggestions.
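
One generic knob worth double-checking in load tests (it may or may not apply here) is the cluster circuit breakers: by default max_connections, max_pending_requests, and max_requests are each capped at 1024, which a high-throughput benchmark can hit. Below is a minimal sketch of raising them on the k8s_backend cluster, with purely illustrative values:

  clusters:
  - name: k8s_backend
    # ... existing settings from the config above ...
    circuit_breakers:
      thresholds:
      - priority: DEFAULT
        max_connections: 10000         # default is 1024
        max_pending_requests: 10000    # default is 1024
        max_requests: 10000            # default is 1024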

@wbpcode
Member

wbpcode commented Feb 22, 2025

First, please keep the concurrency equal to the number of CPU cores available to Envoy. Second, ensure your backend has good enough performance. Third, ensure your benchmark client has good enough performance (I would suggest using wrk, nighthawk, fortio, hey, etc.).

Also check #19103. We did observe some performance degradation there, but it shouldn't be this bad.
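
If it helps to reproduce with one of the clients suggested above, here is a minimal sketch of a Kubernetes Job that runs fortio against the listener from inside the cluster, which takes JMeter and cross-node client limits out of the picture; the Job name and the Envoy service hostname are placeholders, not taken from the actual setup:

apiVersion: batch/v1
kind: Job
metadata:
  name: fortio-loadtest                # hypothetical name
  namespace: apk-perf-test
spec:
  backoffLimit: 0
  template:
    spec:
      restartPolicy: Never
      containers:
      - name: fortio
        image: fortio/fortio:latest
        args:
        - load
        - -qps
        - "0"              # 0 = no rate limit, go as fast as possible
        - -c
        - "64"             # concurrent connections
        - -t
        - "60s"            # test duration
        - -payload-size
        - "50"             # mirrors the 50-byte POST payload used in the JMeter test
        - http://<envoy-service>.apk-perf-test.svc.cluster.local:10000/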
