Analyze Lean 4 projects: extract dependency graphs with verification status and spec relationships.
probe-lean walks the Lean environment of a built project and produces structured JSON describing every declaration, its dependencies (type and term), source locations, sorry-based verification status, and spec relationships. Output follows the Schema 2.0 envelope format; see docs/SCHEMA.md for the full specification.
- Lean 4 toolchain (
elan,lake) -- install via elan - The target project must build with
lake build - Toolchain match: probe-lean must be installed for the same Lean version as the target project (
.oleanfiles are version-specific). Check withcat <target-project>/lean-toolchain. - For large projects using Mathlib, run
lake exe cache getin the target project first to download pre-built.oleanfiles
No git clone required — the installer downloads a pre-built binary directly from GitHub releases.
Auto-detect the Lean version from a target project:
curl -sSfL https://raw.githubusercontent.com/Beneficial-AI-Foundation/probe-lean/main/tools/bash/install.sh \
| bash -s -- --from-project ./my-lean-projectOr specify a Lean version explicitly:
curl -sSfL https://raw.githubusercontent.com/Beneficial-AI-Foundation/probe-lean/main/tools/bash/install.sh \
| bash -s -- --lean-version v4.28.0-rc1Ensure ~/.local/bin is in your PATH:
export PATH="$PATH:$HOME/.local/bin"The installer:
- Detects the required Lean version (from
--from-project,--lean-version, or interactive menu) - Checks if
probe-lean-<version>is already installed (skips if so, unless--force) - Downloads a pre-built binary from GitHub releases
- Falls back to building from source if no pre-built binary is available
The binary is installed to ~/.local/bin/probe-lean-<version> with a symlink at ~/.local/bin/probe-lean. Interpreter .olean files are installed to ~/.local/lib/probe-lean-<version>/ (per-version, so multiple versions can coexist).
| Option | Description |
|---|---|
--from-project <path> |
Auto-detect Lean version from target project's lean-toolchain |
--lean-version <ver> |
Explicit Lean version (e.g., v4.28.0-rc1) |
--force |
Rebuild/reinstall even if already installed |
[VERSION] |
Positional version argument (same as --lean-version) |
If you have the repository cloned (e.g., for development), you can run the scripts directly:
git clone https://github.com/Beneficial-AI-Foundation/probe-lean
cd probe-lean
./tools/bash/install.sh --from-project ../my-lean-project
# or with Python:
uv run tools/python/install.py --from-project ../my-lean-projectdocker build --build-arg LEAN_VERSION=v4.28.0-rc1 -t probe-lean .
docker run --rm -v /path/to/project:/project probe-lean extract /projectThe Docker image downloads the pre-built binary during build — no repository clone needed.
For CI integration in downstream repos:
- uses: Beneficial-AI-Foundation/probe-lean/action@main
with:
project-path: .The action auto-detects the Lean version, builds probe-lean, and runs extraction. See action/action.yml for all options.
# Analyze a Lean project (builds with lake, extracts atoms, detects sorries)
probe-lean extract ./my-lean-project
# Skip sorry detection (faster, graph structure only)
probe-lean extract ./my-lean-project --skip-verify
# Multi-library project: build only specific libraries
probe-lean extract ./my-lean-project --library "Extraction,Spqr"Output lands in <target-project>/.verilib/probes/lean_<pkg>_<ver>.json by default.
Most Lean verification projects depend on Mathlib. To check whether a target project uses Mathlib, look in its lakefile:
lakefile.toml: a[[require]]block withname = "mathlib"lakefile.lean: a line likerequire mathlib from git ...
You can also check lake-manifest.json for a "mathlib" entry (this file is generated and lists all resolved dependencies).
If Mathlib is present, always download the pre-built cache first -- compiling Mathlib from source is extremely slow:
cd ../my-lean-project
lake exe cache get # download pre-built Mathlib .olean files (~5 min)
cd ../probe-lean
probe-lean extract ../my-lean-projectprobe-lean will warn you if it detects a missing Mathlib cache.
For detailed walkthroughs with real projects (curve25519-dalek-lean-verify, ArkLib, VCV-io), see docs/USAGE.md.
| Command | Description |
|---|---|
extract |
Analyze a Lean 4 project: extract atoms, detect sorries, compute specs |
probe-lean extract <PROJECT_PATH> [OPTIONS]| Option | Description |
|---|---|
-o, --output <PATH> |
Output file path (default: .verilib/probes/lean_<pkg>_<ver>.json) |
-m, --module <PREFIX> |
Filter to specific module prefix |
-l, --library <LIBS> |
Comma-separated list of library names to build (default: defaultTargets from lakefile.toml, falling back to all [[lean_lib]] entries) |
--skip-verify |
Skip sorry detection (graph structure only) |
--from-file <FILE> |
Use existing build output for sorry detection |
For the full command reference with examples, see docs/USAGE.md. For the complete JSON schema specification, see docs/SCHEMA.md.
Running probe-lean extract produces a JSON envelope. Each entry in data describes a declaration and its dependencies:
{
"schema": "probe-lean/extract",
"schema-version": "2.0",
"tool": { "name": "probe-lean", "version": "0.2.0", "command": "extract" },
"source": {
"repo": "https://github.com/org/project",
"commit": "abc123d",
"language": "lean",
"package": "MyProject",
"package-version": "0.1.0"
},
"timestamp": "2026-03-17T12:00:00Z",
"data": {
"probe:MyModule.helper": {
"display-name": "helper",
"kind": "def",
"language": "lean",
"dependencies": ["probe:MyModule.MyType"],
"type-dependencies": ["probe:MyModule.MyType"],
"term-dependencies": [],
"code-module": "MyModule",
"code-path": "MyModule.lean",
"code-text": { "lines-start": 5, "lines-end": 8 },
"is-hidden": false,
"is-extraction-artifact": false,
"is-ignored": false,
"is-relevant": true,
"rust-source": null,
"specs": ["probe:MyModule.helper_spec"],
"primary-spec": "probe:MyModule.helper_spec",
"verification-status": "verified"
}
}
}- Build -- reads
defaultTargetsfromlakefile.toml(falling back to all[[lean_lib]]entries) and runslake build <lib1> ...to produce.oleanfiles (automatically skipped when build cache is up-to-date; overridable via--library) - Atomize -- walks the Lean environment, extracts declarations with type and term dependencies
- Filter -- applies config-based flags (
is-hidden,is-extraction-artifact,is-ignored,is-relevant) - Verify -- parses sorry warnings from build output to determine verification status; axioms,
*External.leandeclarations, and auto-generated declarations without source location are marked"trusted"with atrusted-reason("axiom","external", or"auto-generated") for trust-base classification (skippable via--skip-verify) - Specs -- computes reverse theorem edges (
specs,primary-spec) for each atom - Schema 2.0 output -- wraps atoms in a metadata envelope with git commit, package info, and timestamps
lake build tests
.lake/build/bin/testsThe version is defined once in lakefile.toml and propagated everywhere via ProbeLean/Version.lean:
lakefile.toml → tools/gen-version.sh → ProbeLean/Version.lean
↓
Constants.toolVersion (JSON output)
CLI --version (Main.lean)
After bumping the version in lakefile.toml, run:
./tools/gen-version.shCI verifies the generated file stays in sync.
Apache-2.0