# lance-bench

Automated benchmark tracking and performance monitoring for Lance.
This repository provides continuous performance monitoring for Lance by:
- Running benchmarks automatically on new commits
- Tracking performance metrics over time in LanceDB
- Supporting both Rust (Criterion) and Python (pytest-benchmark) benchmarks
- Providing historical backfill capabilities
## Workflows

1. **Schedule Benchmarks** (`schedule-benchmarks.yml`) - Runs every 6 hours
   - Fetches the latest commit from lance-format/lance
   - Checks if results already exist in the database
   - Triggers benchmark runs for new commits

2. **Run Rust Benchmarks** (`run-rust-benchmarks.yml`) - Reusable workflow
   - Runs Criterion benchmarks for a specific Rust crate
   - Currently benchmarks: lance-io, lance-linalg, lance-encoding
   - Publishes results using `publish_criterion.py`

3. **Run Python Benchmarks** (`run-python-benchmarks.yml`) - Reusable workflow
   - Builds the Lance Python package with maturin
   - Generates test datasets
   - Runs pytest benchmarks
   - Publishes results using `publish_pytest.py`
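The scheduler's decision loop is small: look up the newest commit, skip it if the database already has results, otherwise dispatch a run. Below is a minimal Python sketch of that loop, not the actual `schedule_benchmarks.py`; the `owner/lance-bench` slug, the `results` table name, the filter syntax, and the workflow input name are all assumptions for illustration.

```python
import os

import lancedb
import requests

GITHUB_API = "https://api.github.com"
LANCE_REPO = "lance-format/lance"   # repository being benchmarked
BENCH_REPO = "owner/lance-bench"    # hypothetical slug for this repository


def latest_commit_sha() -> str:
    """Return the SHA of the newest commit on the default branch."""
    resp = requests.get(f"{GITHUB_API}/repos/{LANCE_REPO}/commits", params={"per_page": 1})
    resp.raise_for_status()
    return resp.json()[0]["sha"]


def results_exist(sha: str) -> bool:
    """Hypothetical LanceDB lookup: any Result whose DUT version embeds this commit."""
    uri = os.environ.get("LANCE_BENCH_URI", os.path.expanduser("~/.lance-bench"))
    table = lancedb.connect(uri).open_table("results")  # table name is an assumption
    # Filter syntax is an assumption; DutBuild versions like "0.15.0+abc1234"
    # embed the short commit hash.
    return table.count_rows(f"dut.version LIKE '%{sha[:7]}%'") > 0


def trigger_benchmarks(sha: str) -> None:
    """Dispatch the benchmark workflow for one commit via workflow_dispatch."""
    resp = requests.post(
        f"{GITHUB_API}/repos/{BENCH_REPO}/actions/workflows/run-benchmarks.yml/dispatches",
        headers={"Authorization": f"Bearer {os.environ['SCHEDULER_GITHUB_TOKEN']}"},
        json={"ref": "main", "inputs": {"commit": sha}},  # input name is an assumption
    )
    resp.raise_for_status()


if __name__ == "__main__":
    sha = latest_commit_sha()
    if not results_exist(sha):
        trigger_benchmarks(sha)
```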
## Database Schema

Results are stored in a LanceDB database with the following schema:

- `TestBed` - System information (CPU, memory, OS)
- `DutBuild` - Device Under Test (name, version, commit timestamp)
- `Result` - Benchmark results with statistics and raw values
  - `SummaryValues` - Min, max, mean, median, quartiles, std dev
  - `Throughput` - Optional throughput metrics

Database location: `s3://lance-bench-results` (or `~/.lance-bench` locally)
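For a quick look at stored results, the database can be opened directly with the `lancedb` Python client. A minimal sketch, assuming a table named `results` (the actual table names and layout live in `packages/lance_bench_db`):

```python
import os

import lancedb

# Connect to the local results database (or s3://lance-bench-results).
db = lancedb.connect(os.path.expanduser("~/.lance-bench"))
table = db.open_table("results")  # table name is an assumption

# Full scan into pandas; fine for interactive inspection of small tables.
df = table.to_pandas()
print(df[["benchmark_name", "units", "timestamp"]].head())
```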
## Scripts

- `publish_criterion.py` - Parse and publish Rust Criterion benchmark results
- `publish_pytest.py` - Parse and publish Python pytest-benchmark results
- `publish_util.py` - Shared utilities (TestBed creation)
- `schedule_benchmarks.py` - Check for new commits and trigger benchmarks
- `backfill_benchmarks.py` - Backfill results for historical commits
## Prerequisites

- Python 3.12+
- uv (Python package manager)
- AWS credentials (for S3 access to the results database)
## Setup

```bash
# Install dependencies
uv sync

# Set environment variables
export LANCE_BENCH_URI="s3://lance-bench-results"
export AWS_ACCESS_KEY_ID="your-access-key"
export AWS_SECRET_ACCESS_KEY="your-secret-key"
```
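Once the environment variables are set, a quick Python check confirms the database is reachable (read-only; it only lists table names):

```python
import os

import lancedb

# Open the configured database and list its tables as a connectivity check.
db = lancedb.connect(os.environ["LANCE_BENCH_URI"])
print(db.table_names())
```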
Required secrets for CI workflows:

- `LANCE_BENCH_DB_URI` - Database URI (S3 or local path)
- `BENCH_S3_USER_ACCESS_KEY` - AWS access key
- `BENCH_S3_USER_SECRET_KEY` - AWS secret key
- `SCHEDULER_GITHUB_TOKEN` - GitHub PAT with `actions:write` and `contents:read` permissions
Note: The default `GITHUB_TOKEN` cannot trigger other workflows, so `SCHEDULER_GITHUB_TOKEN` is required for the scheduler.
## Running Benchmarks Manually

### Rust (Criterion)

```bash
# Run Criterion benchmarks with JSON output
cd lance/rust/lance-io
cargo criterion --benches --message-format=json > criterion-output.json

# Publish results
uv run python scripts/publish_criterion.py \
  criterion-output.json \
  --testbed-name "my-machine" \
  --dut-name "lance" \
  --dut-version "0.15.0+abc1234" \
  --dut-timestamp 1702345678
```
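`cargo criterion --message-format=json` writes one JSON object per line, and per-benchmark measurements arrive in `benchmark-complete` messages. A hedged sketch of reading raw samples from that stream; the field names follow the cargo-criterion message format as commonly documented, so verify them against the output of your cargo-criterion version:

```python
import json

# Each line of criterion-output.json is a standalone JSON message; only the
# "benchmark-complete" messages carry measurements.
with open("criterion-output.json") as f:
    for line in f:
        msg = json.loads(line)
        if msg.get("reason") != "benchmark-complete":
            continue
        # measured_values are per-sample totals; dividing by iteration_count
        # approximates per-iteration times in msg["unit"].
        per_iter = [
            total / iters
            for total, iters in zip(msg["measured_values"], msg["iteration_count"])
        ]
        print(msg["id"], f"{len(per_iter)} samples ({msg['unit']})")
```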
### Python (pytest-benchmark)

```bash
# Run pytest benchmarks with JSON output
pytest benchmarks/ --benchmark-json=pytest-output.json --benchmark-only

# Publish results
uv run python scripts/publish_pytest.py \
  pytest-output.json \
  --testbed-name "my-machine" \
  --dut-name "lance" \
  --dut-version "0.15.0+abc1234" \
  --dut-timestamp 1702345678
```

Note: Both `--dut-version` and `--dut-timestamp` are required. For pytest, these can be auto-extracted from `commit_info` in the JSON if available.
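When the benchmarks ran inside a git checkout, pytest-benchmark records commit metadata under `commit_info` in the report JSON. A sketch of deriving the two flags from it (the `0.15.0+` version prefix just mirrors the CLI example above and is illustrative):

```python
import json
from datetime import datetime

# Load the pytest-benchmark report and read the commit metadata it recorded;
# this assumes the run happened inside a git checkout.
with open("pytest-output.json") as f:
    report = json.load(f)

info = report["commit_info"]
sha = info["id"]                                            # full commit hash
ts = int(datetime.fromisoformat(info["time"]).timestamp())  # commit timestamp
print(f'--dut-version "0.15.0+{sha[:7]}" --dut-timestamp {ts}')
```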
## Backfilling Historical Results

```bash
# Set required environment variables
export GITHUB_TOKEN="your-github-token"
export LANCE_BENCH_URI="s3://lance-bench-results"
export AWS_ACCESS_KEY_ID="your-access-key"
export AWS_SECRET_ACCESS_KEY="your-secret-key"

# Run backfill script
uv run python scripts/backfill_benchmarks.py
```

Configuration in `backfill_benchmarks.py`:

- `MAX_COMMITS` - Number of commits to process (default: 10)
- `COMMIT_INTERVAL` - Process every Nth commit (default: 1)
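The commit-selection logic those two constants imply can be sketched as below; the real `backfill_benchmarks.py` may page through history differently, and dispatching each commit would reuse the scheduler's `workflow_dispatch` call:

```python
import requests

MAX_COMMITS = 10      # number of commits to process
COMMIT_INTERVAL = 1   # process every Nth commit

# Fetch enough recent commits to cover the stride, newest first
# (the GitHub API caps per_page at 100).
resp = requests.get(
    "https://api.github.com/repos/lance-format/lance/commits",
    params={"per_page": min(MAX_COMMITS * COMMIT_INTERVAL, 100)},
)
resp.raise_for_status()

# Take every Nth commit, up to MAX_COMMITS of them.
shas = [c["sha"] for c in resp.json()][::COMMIT_INTERVAL][:MAX_COMMITS]
for sha in shas:
    print("would trigger benchmarks for", sha)  # dispatch as in the scheduler
```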
Workflows can also be triggered manually via the GitHub Actions UI:

1. Go to the Actions tab
2. Select a workflow (e.g., "Schedule Benchmarks for New Lance Commits")
3. Click "Run workflow"
## Project Structure

```
lance-bench/
├── .github/workflows/            # GitHub Actions workflows
│   ├── schedule-benchmarks.yml   # Automated scheduler (runs 4x daily)
│   ├── run-benchmarks.yml        # Orchestrator for Rust benchmarks
│   ├── run-rust-benchmarks.yml   # Reusable Rust benchmark workflow
│   └── run-python-benchmarks.yml # Reusable Python benchmark workflow
├── scripts/                      # Python scripts
│   ├── publish_criterion.py      # Publish Rust benchmark results
│   ├── publish_pytest.py         # Publish Python benchmark results
│   ├── publish_util.py           # Shared publishing utilities
│   ├── schedule_benchmarks.py    # Scheduler script
│   └── backfill_benchmarks.py    # Backfill historical results
├── packages/
│   └── lance_bench_db/           # Database package
│       ├── models.py             # Data models (Result, TestBed, etc.)
│       └── dataset.py            # Database connection utilities
├── pyproject.toml                # Python dependencies
└── uv.lock                       # Locked dependencies
```
## Adding New Benchmarks

**Rust:**

1. Add Criterion benchmarks to the Lance repository
2. Update `run-benchmarks.yml` to include the new crate path

**Python:**

1. Add pytest benchmarks to `lance/python/python/ci_benchmarks/benchmarks/`
2. Python benchmarks are automatically discovered by pytest (a minimal example follows)
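A minimal pytest-benchmark test, as a shape reference for new files (the file name `test_example.py` is hypothetical; the `benchmark` fixture comes from pytest-benchmark and times the callable it wraps):

```python
# Hypothetical lance/python/python/ci_benchmarks/benchmarks/test_example.py
def test_sort_reversed_list(benchmark):
    data = list(range(1_000, 0, -1))
    # benchmark() runs the callable repeatedly and records timing statistics.
    result = benchmark(sorted, data)
    assert result[0] == 1
```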
## Data Model

Results are stored with this structure:

```python
Result(
    id: str,                 # UUID
    dut: DutBuild,           # Device info (name, version, timestamp)
    test_bed: TestBed,       # System info (CPU, memory, OS)
    benchmark_name: str,     # Full benchmark name
    values: list[float],     # Raw measurements (nanoseconds)
    summary: SummaryValues,  # Statistical summary
    units: str,              # "nanoseconds"
    throughput: Throughput?, # Optional throughput info
    metadata: str,           # JSON string of full benchmark data
    timestamp: int,          # Unix timestamp when result was created
)
```
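The `SummaryValues` fields map onto standard descriptive statistics, so they can be reproduced from `values` with the standard library alone. A sketch (not necessarily how the publish scripts compute them):

```python
import statistics

values = [120.0, 118.5, 121.2, 119.8, 240.1]  # example raw samples, nanoseconds

q1, median, q3 = statistics.quantiles(values, n=4)  # quartile cut points
summary = {
    "min": min(values),
    "max": max(values),
    "mean": statistics.mean(values),
    "median": median,
    "q1": q1,
    "q3": q3,
    "std_dev": statistics.stdev(values),
}
print(summary)
```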
)"DUT version could not be determined"
- Ensure
--dut-versionis provided orcommit_info.idexists in pytest JSON
"Database connection failed"
- Check AWS credentials are set correctly
- Verify
LANCE_BENCH_URIis accessible - For S3: Ensure IAM permissions include
s3:GetObject,s3:PutObject
Scheduler workflow not triggering benchmarks
- Verify
SCHEDULER_GITHUB_TOKENsecret is set with correct permissions - Check workflow logs for API rate limits or authentication errors
Benchmarks failing to build
- Rust benchmarks: Ensure protobuf-compiler is installed
- Python benchmarks: Verify maturin build succeeds with Rust toolchain
## Contributing

1. Fork the repository
2. Create a feature branch
3. Make your changes
4. Test with a local database (`~/.lance-bench`)
5. Submit a pull request
## License

Apache License 2.0 - See LICENSE file for details