Linkerd Benchmark Setup

We make use of Kinvolk's service-mesh-benchmark framework to perform our benchmarks, which compare Linkerd's latency and resource consumption against Istio and against a baseline (no service mesh) case. An earlier version of that framework was used in 2019 to generate the results described in this article. The automation has changed since then, but the article is still worth reading, in particular to learn how coordinated omission is taken into account, which is critical to how the results are calculated.

Lokomotive and Terraform

The service-mesh-benchmark project's README contains most of the information required to set up a testing cluster.

Note that we used a fork of the lokomotive project, updated with the latest versions of Linkerd (2.10.2) and Istio (1.10.0).

This also required using a Terraform version from the 0.13 series; you can find it here.
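
One way to pin and verify the Terraform version is sketched below. tfenv is just one option and is not something the benchmark repo requires; any 0.13.x install works.

```bash
# Install and select a 0.13-series Terraform release (0.13.7 was the last one),
# then confirm the binary on the PATH is the expected version.
tfenv install 0.13.7
tfenv use 0.13.7
terraform version
```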

Orchestrator Configuration

The benchmark runs on servers provisioned by Equinix Metal, and also uses S3, DynamoDB and Route 53 to store Terraform state and identify the cluster. The details for those accounts need to be entered into configs/lokocfg.vars, while their tokens/credentials have to be provided as environment variables, as described in the README.
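
As a rough sketch, this ends up looking like the block below. The environment variable names are the standard ones for the Equinix Metal (Packet) and AWS Terraform providers; the exact keys expected in configs/lokocfg.vars are defined by the README, so the trailing comments are illustrative only.

```bash
# Secrets go into the environment, not into lokocfg.vars.
export PACKET_AUTH_TOKEN="<equinix-metal-api-token>"   # Equinix Metal (formerly Packet) API
export AWS_ACCESS_KEY_ID="<aws-access-key-id>"         # AWS account used for S3, DynamoDB and Route 53
export AWS_SECRET_ACCESS_KEY="<aws-secret-access-key>"

# configs/lokocfg.vars then holds the non-secret account details, e.g. the
# Equinix Metal project ID, the S3 bucket / DynamoDB table used for Terraform
# state, and the Route 53 zone -- see the README for the exact variable names.
```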

Once that's done, the datacenter and server types have to be specified in configs/equinix-metal-cluster.lokocfg. After many experiments, the most consistent results came from the dfw2 datacenter, using c2.medium.x86 for the controller and s3.xlarge.x86 for the load generator and the workers. These values are set in the facility, controller_type and node_type entries, respectively.
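
A minimal sketch of the relevant entries follows. The cluster and worker-pool block layout mirrors Lokomotive's Equinix Metal configuration; only the three values themselves come from our runs, everything else (including the pool name) is illustrative.

```hcl
# configs/equinix-metal-cluster.lokocfg (excerpt; other settings omitted)
cluster "packet" {
  facility        = "dfw2"            # datacenter that gave the most consistent results
  controller_type = "c2.medium.x86"   # controller node

  worker_pool "benchmark" {
    node_type = "s3.xlarge.x86"       # load generator and workers
  }
}
```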

Running the benchmarks

The scripts/run_benchmarks.sh script takes care of the benchmark runs. Just make sure to set the series of RPS values and the number of repetitions you want in the run_benchmarks() main loop.
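
That loop looks roughly like the sketch below; the RPS series, repetition count and helper name are placeholders, so check the actual script for the real structure.

```bash
run_benchmarks() {
    # Placeholder values: one benchmark run per (RPS, repetition) pair.
    for rps in 500 1000 1500 2000; do
        for run in $(seq 1 5); do
            run_single_benchmark "$rps" "$run"   # hypothetical helper name
        done
    done
}
```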

We also made a small change to that function: right after Linkerd's installation, we added a linkerd check call to make sure the control plane was ready before installing emojivoto.
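
In effect, the installation step becomes something like the following. The emojivoto deployment line is only an example; the benchmark script installs emojivoto through its own manifests.

```bash
# Install Linkerd, then run linkerd check so the benchmark only proceeds once
# the control plane reports healthy (the checks retry until they pass or time out).
linkerd install | kubectl apply -f -
linkerd check

# Only then deploy the application under test.
kubectl apply -f https://run.linkerd.io/emojivoto.yml
```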

Results

Read Upload Grafana dashboard for instructions on how to set up the Grafana charts.

Result data is taken from the wrk2 benchmark cockpit chart for each run as follows:

Latency distribution

Taken from the chart "Latency percentile histogram (milliseconds)".

Application proxy memory/CPU usage

Taken from the charts "Sidecar Memory usage - applications (max. across all sidecars)" and "Sidecar CPU usage - applications (max. across all sidecars)", which show the maximum memory/CPU used across all the sidecar proxy containers in the emojivoto namespaces for the duration of the run. We report on the maximum values attained for this duration.

Control plane memory/CPU usage

Taken from the charts "Memory usage - Service mesh control plane" and "CPU utilisation - Service mesh control plane", which show the sum of the memory/CPU used across all the non-sidecar containers in the control plane namespace for the duration of the run. We report on the maximum values attained for this duration.