Embedded fill-in-the-middle code completion — local, offline, in-process.
A self-contained Rust crate that downloads a small quantized qwen2.5-coder model once, caches it, and runs completion inference in your process via candle. No daemon, no API key, no network after the first run.
fim-engine gives an editor Copilot-style inline completion without a cloud round-trip. "Fill in the middle" means it completes the gap at the cursor given the code before and the code after — exactly the shape an inline suggestion needs.
It is the completion backend shared by
mnml (ghost-text suggestions) and
tmnl (⌘I command completion), kept
as its own crate so candle's large dependency tree compiles once and a consuming
app's incremental rebuilds stay fast.
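To make the "fill in the middle" shape concrete, here is a minimal sketch of how a buffer and cursor position could be turned into a FIM prompt. The sentinel tokens are the ones published for Qwen2.5-Coder. Whether fim-engine assembles its prompt exactly this way is an assumption, and `split_at_cursor` and `build_fim_prompt` are hypothetical helper names, not part of the crate's API.

```rust
/// Split a buffer at a byte offset into the (prefix, suffix) pair.
/// Returns None if `cursor` is past the end or not on a char boundary.
fn split_at_cursor(buffer: &str, cursor: usize) -> Option<(&str, &str)> {
    if buffer.is_char_boundary(cursor) {
        Some(buffer.split_at(cursor))
    } else {
        None
    }
}

/// Assemble a fill-in-the-middle prompt from the text around the cursor.
/// The model is expected to generate the missing middle after the final
/// sentinel. (Assumed prompt layout; the crate may differ internally.)
fn build_fim_prompt(prefix: &str, suffix: &str) -> String {
    format!("<|fim_prefix|>{prefix}<|fim_suffix|>{suffix}<|fim_middle|>")
}
```

The model only generates the part after `<|fim_middle|>`, which is why a completion call can return just the text to insert rather than the whole buffer.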
- Offline & private — inference runs in-process; nothing leaves the machine after the one-time model download.
- Pure Rust — no external daemon, no C/C++ build dependencies, no OpenSSL (rustls for the download).
- Managed model — downloads a quantized GGUF + tokenizer to a shared cache on first use, with a progress callback; instant on every run after.
- Metal acceleration — the default `metal` feature runs on the Apple GPU (~10× faster than CPU for the 1.5B model); build with `--no-default-features` for pure CPU elsewhere.
- Two model sizes — `Qwen1_5B` (fast, the inline default) or a larger `Qwen3B` (smarter multi-line completion) via `ModelChoice`.
cargo add fim-engine

Loading is a ~1 GB download on the first call, so do it on a worker thread:
use fim_engine::{FimEngine, ModelChoice};
// Blocking — run on a worker thread, never the UI thread.
let cache = fim_engine::default_cache_dir();
let mut engine = FimEngine::load(&cache, ModelChoice::Qwen1_5B, &|p| {
    eprintln!("{}: {}/{:?}", p.label, p.received, p.total);
})?;
// Complete the gap between `prefix` and `suffix`.
let insert = engine.complete(
    "fn add(a: i32, b: i32) -> i32 {\n ", // prefix — code before the cursor
    "\n}", // suffix — code after the cursor
    64, // max tokens
)?;
println!("suggestion: {insert}");
# Ok::<(), String>(())

`complete` returns only the text to insert, never the surrounding code. It is
CPU/GPU-bound (~100–400 ms for the 1.5B model); call it off the UI thread.
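One way to keep a blocking call like this off the UI thread is to run it on a worker thread and pass the result back over a channel. The sketch below uses only the standard library. `spawn_completion` is a hypothetical helper, and the `complete` closure is a stand-in for the real engine call; the `String` error type is assumed from the usage example above.

```rust
use std::sync::mpsc::{self, Receiver};
use std::thread;

/// Run a blocking completion call on a worker thread and return a receiver
/// that a UI loop can poll with `try_recv` without stalling.
///
/// `complete` stands in for the real call, for example a closure that owns
/// the engine and forwards to its `complete` method (hypothetical shape).
fn spawn_completion<F>(
    prefix: String,
    suffix: String,
    complete: F,
) -> Receiver<Result<String, String>>
where
    F: FnOnce(&str, &str) -> Result<String, String> + Send + 'static,
{
    let (tx, rx) = mpsc::channel();
    thread::spawn(move || {
        let result = complete(&prefix, &suffix);
        // The receiver may already be dropped (e.g. the user kept typing),
        // so a failed send is deliberately ignored.
        let _ = tx.send(result);
    });
    rx
}
```

Dropping the receiver is a cheap way to discard a stale suggestion: the worker still finishes its inference, but the result is never shown.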
`default_cache_dir` resolves a host-agnostic location
(`$XDG_CACHE_HOME/fim-engine`, else `~/.cache/fim-engine`) so every consumer
shares one download instead of duplicating ~1 GB per app. `is_model_cached`
reports whether a given `ModelChoice` is already on disk.
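The lookup order described above can be sketched as a small pure function. This is not the crate's implementation: `resolve_cache_dir` and `cache_dir_from_env` are hypothetical names, and treating an empty `XDG_CACHE_HOME` as unset follows the XDG Base Directory convention rather than anything the crate documents.

```rust
use std::path::PathBuf;

/// Resolve the cache directory from the two environment values.
/// Taking them as parameters keeps the function pure and easy to test.
fn resolve_cache_dir(xdg_cache_home: Option<&str>, home: Option<&str>) -> Option<PathBuf> {
    match xdg_cache_home {
        // Assumption: an empty value counts as unset, per the XDG convention.
        Some(xdg) if !xdg.is_empty() => Some(PathBuf::from(xdg).join("fim-engine")),
        // Fall back to ~/.cache/fim-engine when a home directory is known.
        _ => home.map(|h| PathBuf::from(h).join(".cache").join("fim-engine")),
    }
}

/// Convenience wrapper that reads the real environment.
fn cache_dir_from_env() -> Option<PathBuf> {
    let xdg = std::env::var("XDG_CACHE_HOME").ok();
    let home = std::env::var("HOME").ok();
    resolve_cache_dir(xdg.as_deref(), home.as_deref())
}
```

Because the path depends only on the user's environment and not on the consuming app, two different programs resolve to the same directory and reuse the same downloaded model.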
| Feature | Default | Effect |
|---|---|---|
| `metal` | ✅ | GPU inference via Apple Metal (macOS). Build with `--no-default-features` on Linux or for CPU-only. |
fim-engine is one of a small family of terminal-native Rust tools:
| Project | What it is | Notes |
|---|---|---|
| tmnl | A GPU-accelerated terminal | uses fim-engine for ⌘I completion |
| mnml | A terminal IDE | uses fim-engine for ghost-text |
| mixr | A terminal DJ app | — |
| tmnl-protocol | The binary wire protocol | — |
| fim-engine | Embedded code completion | ← you are here |
Contributions are welcome — see CONTRIBUTING.md. The roadmap
lives in .local/PLAN.md and the release history in
CHANGELOG.md.
Licensed under either of
- Apache License, Version 2.0 (LICENSE-APACHE)
- MIT license (LICENSE-MIT)
at your option.
The model weights are downloaded at runtime from the Hugging Face CDN and are not part of this crate; the qwen2.5-coder model is licensed separately by its authors.
Unless you explicitly state otherwise, any contribution intentionally submitted for inclusion in this crate by you, as defined in the Apache-2.0 license, shall be dual licensed as above, without any additional terms or conditions.