Litmus
A terminal-based LLM benchmarking and evaluation tool built with OpenTUI. Compare multiple language models side-by-side, and analyze results with evals.
- Run identical prompts across multiple LLMs simultaneously
- Streams responses in real time; open any response to view details such as token counts
- Supports most models available through OpenRouter, though not every model is guaranteed to work
- Attach images to prompts (see Image Attachments below)
  - Press Ctrl+V in the prompt to attach an image from the clipboard
- Run automated evaluations using dedicated judge models
- Multi-criteria scoring (accuracy, relevance, reasoning, tool use)
- Pairwise comparisons and ranking
- Detailed reasoning and score breakdowns
- SQLite database for all benchmark runs and results
- Searchable history of past runs
- Track performance over time
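
As an illustration of the pairwise comparison and ranking feature listed above, here is a minimal sketch of one way to turn pairwise judge verdicts into a ranking. The `PairwiseResult` shape, the `rankByWins` helper, and the scoring rule (1 point per win, 0.5 per tie) are assumptions made for this example and are not taken from Litmus's source.

```typescript
// Hypothetical shape of one judge verdict; Litmus's real schema may differ.
type PairwiseResult = { a: string; b: string; winner: "a" | "b" | "tie" };

// Rank models by points: 1 per win, 0.5 per tie.
// Models with equal points are ordered alphabetically by name.
function rankByWins(results: PairwiseResult[]): { model: string; points: number }[] {
  const points = new Map<string, number>();
  const add = (model: string, delta: number): void => {
    points.set(model, (points.get(model) ?? 0) + delta);
  };

  for (const { a, b, winner } of results) {
    add(a, winner === "a" ? 1 : winner === "tie" ? 0.5 : 0);
    add(b, winner === "b" ? 1 : winner === "tie" ? 0.5 : 0);
  }

  return Array.from(points.entries())
    .map(([model, pts]) => ({ model, points: pts }))
    .sort((x, y) => y.points - x.points || x.model.localeCompare(y.model));
}

// Placeholder model names, used only for this example.
const ranking = rankByWins([
  { a: "model-x", b: "model-y", winner: "a" },
  { a: "model-x", b: "model-z", winner: "tie" },
  { a: "model-y", b: "model-z", winner: "b" },
]);
console.log(ranking);
```

A simple win count is easy to read in a terminal table. A rating system such as Elo would be another option when many runs accumulate in the history.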
Requires Bun
bun add -g litmus-ai
litmus
Create a .env file in your working directory or export the variables:
export OPENROUTER_API_KEY=your_key_here
litmus
- Select Models - Choose from available models in the dropdown
- Enter Prompt - Type your test prompt or select from templates
- Enable Tools - Toggle tools to test function calling (optional)
- Generate - Press Enter or g to run the benchmark
- Evaluate - Press e in the Evaluation view to run LLM-as-judge scoring
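
Conceptually, the Generate step sends the same prompt to every selected model at once. The sketch below shows one way to do that against OpenRouter's OpenAI-compatible chat completions endpoint. The helper names (`buildRequests`, `runAll`) and the result shape are hypothetical. The sketch waits for complete responses rather than streaming them, so it is not how Litmus itself is implemented.

```typescript
// Request and result shapes assumed for this sketch.
type ChatRequest = { model: string; messages: { role: "user"; content: string }[] };
type ModelResult = { model: string; ok: boolean; text: string };

const OPENROUTER_URL = "https://openrouter.ai/api/v1/chat/completions";

// Build one request body per model, all sharing the same prompt.
function buildRequests(models: string[], prompt: string): ChatRequest[] {
  return models.map((model) => ({ model, messages: [{ role: "user", content: prompt }] }));
}

// Send all requests concurrently. A failure for one model is recorded
// in its own result and does not abort the other requests.
async function runAll(models: string[], prompt: string, apiKey: string): Promise<ModelResult[]> {
  const calls = buildRequests(models, prompt).map(async (body): Promise<ModelResult> => {
    try {
      const res = await fetch(OPENROUTER_URL, {
        method: "POST",
        headers: { Authorization: `Bearer ${apiKey}`, "Content-Type": "application/json" },
        body: JSON.stringify(body),
      });
      if (!res.ok) return { model: body.model, ok: false, text: `HTTP ${res.status}` };
      const json = (await res.json()) as any;
      return { model: body.model, ok: true, text: json.choices?.[0]?.message?.content ?? "" };
    } catch (err) {
      return { model: body.model, ok: false, text: String(err) };
    }
  });
  return Promise.all(calls);
}

// Placeholder model identifiers, used only for this example.
const requests = buildRequests(["vendor/model-a", "vendor/model-b"], "Explain recursion briefly.");
console.log(requests.map((r) => r.model));
```

Calling `runAll(models, prompt, process.env.OPENROUTER_API_KEY ?? "")` would return one result per model. Catching errors per request keeps a single failing model from hiding the others, which matters when comparing models side by side.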
OPENROUTER_API_KEY=your_key_here # Required - get from https://openrouter.ai
EXA_API_KEY=your_key_here # Optional - for web search tool (https://exa.ai)
Litmus evaluates models on:
- Accuracy - Correctness of information
- Completeness - Thoroughness of response
- Relevance - How well it addresses the prompt
- Clarity - Communication quality
- Tool Use - Proper function calling (when applicable)
- Overall Score - Weighted combination
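
To make "weighted combination" concrete, here is a minimal sketch of how an overall score could be derived from the per-criterion scores. The weights, the `overallScore` helper, and the decision to skip missing criteria (for example Tool Use when tools are disabled) are all assumptions for this example. The weights Litmus actually uses are not stated in this README.

```typescript
type Criterion = "accuracy" | "completeness" | "relevance" | "clarity" | "toolUse";
type Scores = Partial<Record<Criterion, number>>;

// Hypothetical weights chosen for illustration only.
const WEIGHTS: Record<Criterion, number> = {
  accuracy: 0.3,
  completeness: 0.2,
  relevance: 0.2,
  clarity: 0.15,
  toolUse: 0.15,
};

// Weighted average over the criteria that were actually scored.
// Dividing by the sum of the weights used keeps the result on the
// same scale as the inputs when a criterion is missing.
function overallScore(scores: Scores): number {
  let weighted = 0;
  let totalWeight = 0;
  for (const key of Object.keys(WEIGHTS) as Criterion[]) {
    const value = scores[key];
    if (value === undefined) continue; // criterion not applicable to this run
    weighted += value * WEIGHTS[key];
    totalWeight += WEIGHTS[key];
  }
  return totalWeight === 0 ? 0 : weighted / totalWeight;
}

// Example scores on an assumed 0-10 scale, with Tool Use not applicable.
const overall = overallScore({ accuracy: 9, completeness: 7, relevance: 8, clarity: 8 });
console.log(overall.toFixed(2));
```

Renormalizing by the weights in use means a response is not penalized for criteria that were never judged.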
- Ctrl+K - Toggle console
- Tab - Cycle focus
- Escape - Back/Focus nav
- g - Generate responses
- Enter - Add model (when focused)
- Space - Toggle tool
- d - Remove last model
- Ctrl+V - Paste image from clipboard
- i - Open image input dialog
- x - Remove last attached image
- c - Clear all attached images
- / - Search/add models
- e - Run evaluation
- Left/Right - Select judge model
- q - Back to history
- / - Focus search
- Enter - Select run
- Delete - Remove run
# Install dependencies
bun install
# Run development mode
bun dev
# Build for production
bun run build
# Run tests
bun test
MIT License - see LICENSE file for details.


