Litmus

A terminal-based LLM benchmarking and evaluation tool built with OpenTUI, the AI SDK, and OpenRouter. Compare multiple language models side by side and analyze the results with evals.

Features

Model Comparison

  • Run identical prompts across multiple LLMs simultaneously (see the sketch after this list)
  • Streams responses in real time; you can then open any response to view details such as token usage
  • Supports essentially any model available on OpenRouter (behavior and quality vary by model)
  • Attach images to prompts (see Image Attachments below)
    • Press Ctrl+V in the prompt to attach
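
A minimal sketch of the fan-out pattern, assuming the AI SDK's streamText and the @openrouter/ai-sdk-provider package; the model IDs and prompt are illustrative, and Litmus renders the streams in its TUI rather than to stdout:

import { streamText } from "ai";
import { createOpenRouter } from "@openrouter/ai-sdk-provider";

// Assumes OPENROUTER_API_KEY is set in the environment
const openrouter = createOpenRouter({ apiKey: process.env.OPENROUTER_API_KEY });

// Illustrative model IDs; in Litmus you pick these from the model dropdown
const models = ["openai/gpt-4o-mini", "anthropic/claude-3.5-sonnet"];
const prompt = "Summarize the CAP theorem in two sentences.";

// Stream the same prompt to every model concurrently
await Promise.all(
  models.map(async (id) => {
    const result = streamText({ model: openrouter(id), prompt });
    for await (const chunk of result.textStream) {
      process.stdout.write(`[${id}] ${chunk}`);
    }
  }),
);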

Evals using LLM-as-Judge

  • Run automated evaluations using dedicated judge models (see the sketch after this list)
  • Multi-criteria scoring (accuracy, relevance, reasoning, tool use)
  • Pairwise comparisons and ranking
  • Detailed reasoning and score breakdowns
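
Litmus's actual judge prompts and schema are not shown in this README; the following is a minimal sketch of the LLM-as-judge pattern using the AI SDK's generateObject with a Zod schema (judge model, criteria subset, and 0-10 scale are illustrative):

import { generateObject } from "ai";
import { createOpenRouter } from "@openrouter/ai-sdk-provider";
import { z } from "zod";

const openrouter = createOpenRouter({ apiKey: process.env.OPENROUTER_API_KEY });

// Illustrative scoring schema; the full criteria are listed under Evaluation Criteria
const scoreSchema = z.object({
  accuracy: z.number().min(0).max(10),
  relevance: z.number().min(0).max(10),
  clarity: z.number().min(0).max(10),
  reasoning: z.string().describe("Short justification for the scores"),
});

const candidate = { prompt: "What is a B-tree?", response: "A self-balancing tree..." };

// Ask a dedicated judge model to score one candidate response
const { object: scores } = await generateObject({
  model: openrouter("openai/gpt-4o"), // illustrative judge model
  schema: scoreSchema,
  prompt: `Score this response to the prompt.\nPrompt: ${candidate.prompt}\nResponse: ${candidate.response}`,
});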

Persistent Storage

  • SQLite database stores all benchmark runs and results (see the sketch after this list)
  • Searchable history of past runs
  • Track performance over time
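
The storage layer is internal to Litmus; a minimal sketch of the same idea using Bun's built-in bun:sqlite (the filename and table layout are assumptions, not Litmus's actual schema):

import { Database } from "bun:sqlite";

// Open (or create) the results database; the filename is illustrative
const db = new Database("litmus.sqlite");

// Assumed table layout, not Litmus's actual schema
db.run(`
  CREATE TABLE IF NOT EXISTS runs (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    model TEXT NOT NULL,
    prompt TEXT NOT NULL,
    response TEXT NOT NULL,
    created_at TEXT DEFAULT CURRENT_TIMESTAMP
  )
`);

// Persist one benchmark result
db.query("INSERT INTO runs (model, prompt, response) VALUES (?, ?, ?)")
  .run("openai/gpt-4o-mini", "Hello", "Hi there!");

// Searchable history: find past runs by prompt text
const matches = db.query("SELECT * FROM runs WHERE prompt LIKE ?").all("%Hello%");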

Installation

Requires Bun

bun add -g litmus-ai
litmus

Environment Setup

Create a .env file in your working directory or export the variables:

export OPENROUTER_API_KEY=your_key_here

Quick Start

litmus

Basic Workflow

  1. Select Models - Choose from available models in the dropdown
  2. Enter Prompt - Type your test prompt or select from templates
  3. Enable Tools - Toggle tools to test function calling (optional)
  4. Generate - Press Enter or g to run the benchmark
  5. Evaluate - Press e in the Evaluation view to run LLM-as-judge scoring

Configuration

Environment Variables

OPENROUTER_API_KEY=your_key_here  # Required - get from https://openrouter.ai
EXA_API_KEY=your_key_here         # Optional - for web search tool (https://exa.ai)

Evaluation Criteria

Litmus evaluates models on:

  • Accuracy - Correctness of information
  • Completeness - Thoroughness of response
  • Relevance - How well it addresses the prompt
  • Clarity - Communication quality
  • Tool Use - Proper function calling (when applicable)
  • Overall Score - Weighted combination of the above (see the sketch after this list)
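
The exact weights are not documented here; the overall score is a weighted combination of the per-criterion scores, roughly along these lines (weights are illustrative):

// Illustrative weights; Litmus's actual weighting may differ
const weights = {
  accuracy: 0.3,
  completeness: 0.2,
  relevance: 0.2,
  clarity: 0.15,
  toolUse: 0.15,
};

type Scores = Record<keyof typeof weights, number>;

// Weighted average of per-criterion scores (each on the same 0-10 scale)
function overallScore(scores: Scores): number {
  return (Object.keys(weights) as (keyof typeof weights)[])
    .reduce((sum, criterion) => sum + weights[criterion] * scores[criterion], 0);
}

// e.g. overallScore({ accuracy: 9, completeness: 8, relevance: 9, clarity: 7, toolUse: 10 }) ≈ 8.65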

Keyboard Shortcuts

Global

  • Ctrl+K - Toggle console
  • Tab - Cycle focus
  • Escape - Back/Focus nav

Benchmark View

  • g - Generate responses
  • Enter - Add model (when focused)
  • Space - Toggle tool
  • d - Remove last model
  • Ctrl+V - Paste image from clipboard
  • i - Open image input dialog
  • x - Remove last attached image
  • c - Clear all attached images
  • / - Search/add models

Evaluation View

  • e - Run evaluation
  • Left/Right - Select judge model
  • q - Back to history

History View

  • / - Focus search
  • Enter - Select run
  • Delete - Remove run

Development

# Install dependencies
bun install

# Run development mode
bun dev

# Build for production
bun build

# Run tests
bun test

License

MIT License - see LICENSE file for details.
