LiteLLM Load Tests

This repository provides FastAPI endpoints and Locust scenarios for automated load testing of LiteLLM. These tests help track latency regressions across key endpoints:

chat/completions
v1/responses
embeddings

Use these tests to detect performance regressions or validate performance improvements. See docs/CONFIG.md for all available overrides and default settings.

Requirements

Python 3.11+
Poetry 1.6+

poetry install
poetry shell  # optional

Required Environment

Set these variables in the shell before launching the server or scripts:

export LOCUST_API_KEY=...
export LOAD_TEST_BEARER_TOKEN=...
export LOCUST_HOST=...                  # optional global host override
# Or set per-test hosts when LOCUST_HOST is not provided:
export LOCUST_CHAT_HOST=...
export LOCUST_RESPONSES_HOST=...
export LOCUST_EMBEDDINGS_HOST=...

You can also override the host per request by including a host value inside the payload for chat, responses, or embeddings. Request-level overrides take precedence over LOCUST_HOST and the scenario-specific environment variables.

Other parameters (test duration, spawn rate, host, etc.) are optional. Refer to docs/CONFIG.md for details.

API Usage

Start the orchestrator:

poetry run uvicorn server:app --reload

Run all tests:

curl -X POST http://localhost:8000/run-load-tests \
  -H "Authorization: Bearer ${LOAD_TEST_BEARER_TOKEN}" \
  -H "Content-Type: application/json" \
  -d '{"chat": {"duration_seconds": 120}}'

Run a single test with custom overrides:

curl -X POST http://localhost:8000/run-load-tests/chat \
  -H "Authorization: Bearer ${LOAD_TEST_BEARER_TOKEN}" \
  -H "Content-Type: application/json" \
  -d '{"user_count": 5, "spawn_rate": 2}'

Responses include request totals, failure counts, latency percentiles, and LiteLLM-specific overhead statistics, making it easy to analyze performance trends.

Direct Locust Runs

Each script in load_tests/ can also be invoked directly without the API:

poetry run python load_tests/chat-completions_load-test.py

Output metrics match those returned by the FastAPI orchestrator, allowing flexible testing workflows.

Name		Name	Last commit message	Last commit date
Latest commit History 5 Commits
docs		docs
helpers		helpers
load_tests		load_tests
README.md		README.md
pyproject.toml		pyproject.toml
requirements.txt		requirements.txt
server.py		server.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

LiteLLM Load Tests

Requirements

Required Environment

API Usage

Direct Locust Runs

About

Uh oh!

Releases

Packages

Languages

BerriAI/Automated_Perf_Tests

Folders and files

Latest commit

History

Repository files navigation

LiteLLM Load Tests

Requirements

Required Environment

API Usage

Direct Locust Runs

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages