GitHub - PClmnt/litmus: Application built with OpenTUI / AI SDK / Openrouter allowing you to quickly compare and eval LLMs.

Litmus

A terminal-based LLM benchmarking and evaluation tool built with OpenTUI. Compare multiple language models side-by-side, and analyze results with evals.

Features

Model Comparison

Run identical prompts across multiple LLMs simultaneously
Streams responses in real time. You can then open any individual response to see more detail, including token counts.
Supports most models available through OpenRouter, though not every model is guaranteed to work.
Attach images to prompts
- Press Ctrl+V in the prompt to paste an image from the clipboard (see the Benchmark View shortcuts below for the other image keys)
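The side-by-side comparison described above can be pictured as a concurrent fan-out of one prompt to several models. The following is a minimal sketch, not Litmus's actual implementation: `ModelCall`, `ModelResult`, and `runComparison` are hypothetical names, and the real request logic (OpenRouter via the AI SDK) is abstracted behind a function parameter so the example stays self-contained.

```typescript
// Hypothetical shape of a single model call; not Litmus's real API.
type ModelCall = (modelId: string, prompt: string) => Promise<string>;

interface ModelResult {
  modelId: string;
  ok: boolean;
  text?: string;
  error?: string;
  latencyMs: number;
}

// Send the same prompt to every model at once and collect per-model outcomes.
// Each task catches its own error, so one failing model does not abort the rest.
async function runComparison(
  modelIds: string[],
  prompt: string,
  callModel: ModelCall,
): Promise<ModelResult[]> {
  const tasks = modelIds.map(async (modelId): Promise<ModelResult> => {
    const started = Date.now();
    try {
      const text = await callModel(modelId, prompt);
      return { modelId, ok: true, text, latencyMs: Date.now() - started };
    } catch (err) {
      const error = err instanceof Error ? err.message : String(err);
      return { modelId, ok: false, error, latencyMs: Date.now() - started };
    }
  });
  return Promise.all(tasks);
}
```

Recording failures as results rather than throwing fits the note above that not every model is guaranteed to work: an unsupported model shows up as an error entry next to the successful responses.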

Evals using LLM-as-Judge

Run automated evaluations using dedicated judge models
Multi-criteria scoring (accuracy, relevance, reasoning, tool use)
Pairwise comparisons and ranking
Detailed reasoning and score breakdowns
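As a rough illustration of multi-criteria LLM-as-judge scoring, the sketch below builds a judge prompt and parses the judge's reply into per-criterion scores. Everything here is an assumption made for illustration: the function names, the prompt wording, the JSON reply format, and the 1 to 10 scale are not taken from Litmus itself. Only the criterion names come from the list above.

```typescript
// Criterion names from the feature list; the 1-10 scale is an assumption.
const CRITERIA = ["accuracy", "relevance", "reasoning", "toolUse"] as const;
type Criterion = (typeof CRITERIA)[number];
type Scores = Record<Criterion, number>;

// Build the text sent to the judge model (hypothetical wording).
function buildJudgePrompt(prompt: string, response: string): string {
  return [
    "You are an impartial judge. Score the response from 1 to 10 on each criterion.",
    "Criteria: " + CRITERIA.join(", "),
    'Reply with JSON only, e.g. {"accuracy": 7, "relevance": 8, "reasoning": 6, "toolUse": 5}',
    "Prompt: " + prompt,
    "Response: " + response,
  ].join("\n");
}

// Extract the JSON object from the judge's reply and validate every criterion.
// Returns null when the reply cannot be trusted, instead of guessing scores.
function parseJudgeScores(raw: string): Scores | null {
  const match = raw.match(/\{[\s\S]*\}/);
  if (!match) return null;
  let parsed: unknown;
  try {
    parsed = JSON.parse(match[0]);
  } catch {
    return null;
  }
  if (typeof parsed !== "object" || parsed === null) return null;
  const record = parsed as Record<string, unknown>;
  const scores = {} as Scores;
  for (const criterion of CRITERIA) {
    const value = record[criterion];
    if (typeof value !== "number" || !Number.isFinite(value)) return null;
    scores[criterion] = Math.min(10, Math.max(1, value)); // clamp to the scale
  }
  return scores;
}
```

Judge models often wrap their JSON in extra prose, which is why the parser searches for the object rather than parsing the whole reply.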

Persistent Storage

SQLite database for all benchmark runs and results
Searchable history of past runs
Track performance over time

Installation

Requires Bun

bun add -g litmus-ai

litmus

Environment Setup

Create a .env file in your working directory or export the variables:

export OPENROUTER_API_KEY=your_key_here

Quick Start

litmus

Basic Workflow

Select Models - Choose from available models in the dropdown
Enter Prompt - Type your test prompt or select from templates
Enable Tools - Toggle tools to test function calling (optional)
Generate - Press Enter or g to run the benchmark
Evaluate - Press e in the Evaluation view to run LLM-as-judge scoring

Configuration

Environment Variables

OPENROUTER_API_KEY=your_key_here  # Required - get from https://openrouter.ai
EXA_API_KEY=your_key_here         # Optional - for web search tool (https://exa.ai)
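To make the required/optional distinction concrete, here is a minimal sketch of how the two variables could be validated at startup. `LitmusConfig` and `loadConfig` are hypothetical names used for illustration and may not match how Litmus reads its environment.

```typescript
type Env = Record<string, string | undefined>;

interface LitmusConfig {
  openRouterApiKey: string;
  exaApiKey: string | undefined; // web search tool is unavailable when undefined
}

// Fail fast on the required key; treat a blank optional key as "not set".
function loadConfig(env: Env): LitmusConfig {
  const openRouterApiKey = (env.OPENROUTER_API_KEY ?? "").trim();
  if (openRouterApiKey === "") {
    throw new Error("OPENROUTER_API_KEY is required (get one from https://openrouter.ai)");
  }
  const exaApiKey = (env.EXA_API_KEY ?? "").trim() || undefined;
  return { openRouterApiKey, exaApiKey };
}
```

In a Bun or Node process this would be called as `loadConfig(process.env)`. Bun loads a `.env` file from the working directory automatically, so either setup method described above populates the same variables.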

Evaluation Criteria

Litmus evaluates models on:

Accuracy - Correctness of information
Completeness - Thoroughness of response
Relevance - How well it addresses the prompt
Clarity - Communication quality
Tool Use - Proper function calling (when applicable)
Overall Score - Weighted combination
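One way to read "weighted combination" is a weighted average over whichever criteria apply to a run. The sketch below assumes that reading. The weights are invented for illustration and are not Litmus's actual values, and `overallScore` is a hypothetical name.

```typescript
type CriterionScores = {
  accuracy: number;
  completeness: number;
  relevance: number;
  clarity: number;
  toolUse?: number; // only present when tools were enabled
};

// Hypothetical weights; the real weighting used by Litmus is not documented here.
const WEIGHTS = {
  accuracy: 0.3,
  completeness: 0.2,
  relevance: 0.2,
  clarity: 0.15,
  toolUse: 0.15,
};

// Weighted average that renormalizes when a criterion (e.g. tool use) is absent,
// so runs without tools are not penalized for a missing score.
function overallScore(scores: CriterionScores): number {
  let weighted = 0;
  let totalWeight = 0;
  for (const key of Object.keys(WEIGHTS) as (keyof typeof WEIGHTS)[]) {
    const value = scores[key];
    if (value === undefined) continue; // skip criteria that do not apply
    weighted += value * WEIGHTS[key];
    totalWeight += WEIGHTS[key];
  }
  return totalWeight === 0 ? 0 : weighted / totalWeight;
}
```

Dividing by the sum of the weights actually used keeps the overall score on the same scale as the individual criteria, whether or not tool use was scored.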

Keyboard Shortcuts

Global

Ctrl+K - Toggle console
Tab - Cycle focus
Escape - Go back / return focus to navigation

Benchmark View

g - Generate responses
Enter - Add model (when focused)
Space - Toggle tool
d - Remove last model
Ctrl+V - Paste image from clipboard
i - Open image input dialog
x - Remove last attached image
c - Clear all attached images
/ - Search/add models

Evaluation View

e - Run evaluation
Left/Right - Select judge model
q - Back to history

History View

/ - Focus search
Enter - Select run
Delete - Remove run

Development

# Install dependencies
bun install

# Run development mode
bun dev

# Build for production
bun run build

# Run tests
bun test

License

MIT License - see LICENSE file for details.

🐛 Issue Tracker
