Add LFM2.5 (Liquid Foundation Model 2.5) support #3400
Add model implementation and example for LiquidAI's LFM2.5 hybrid architecture that combines attention and short convolution layers. Supports LFM2.5-1.2B and LFM2.5-1.2B-Thinking variants.
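The short-convolution half of the hybrid is the less familiar part. The following plain-Rust sketch shows a causal depthwise short convolution that keeps a rolling state. It is an illustration only, not the code in lfm2.rs. ShortConv, its fields, and the kernel layout are hypothetical, and any projections or gating the real layer applies are left out.

```rust
/// Hypothetical causal depthwise "short" convolution with a rolling state,
/// so a sequence can be processed in chunks (or one token at a time).
struct ShortConv {
    /// weights[c][k]: per-channel kernel. weights[c][k] multiplies the input
    /// (k_size - 1 - k) steps back, i.e. a left-padded 1-D convolution.
    weights: Vec<Vec<f32>>,
    k_size: usize,
    /// The last k_size - 1 inputs seen, oldest first (the conv cache).
    state: Vec<Vec<f32>>,
}

impl ShortConv {
    /// Assumes at least one channel and a kernel length of at least 1.
    fn new(weights: Vec<Vec<f32>>) -> Self {
        let k_size = weights[0].len();
        let channels = weights.len();
        // Zero history stands in for the left padding before the first token.
        Self { weights, k_size, state: vec![vec![0.0; channels]; k_size - 1] }
    }

    /// Processes xs (seq_len rows of `channels` values) causally: output t
    /// depends only on inputs t, t-1, ..., t-(k_size-1), including inputs
    /// passed in earlier calls.
    fn forward(&mut self, xs: &[Vec<f32>]) -> Vec<Vec<f32>> {
        let channels = self.weights.len();
        let mut out = Vec::with_capacity(xs.len());
        for x in xs {
            // The window is the cached history followed by the current token.
            self.state.push(x.clone());
            let mut y = vec![0.0f32; channels];
            for c in 0..channels {
                for k in 0..self.k_size {
                    y[c] += self.weights[c][k] * self.state[k][c];
                }
            }
            out.push(y);
            // Drop the oldest input so the cache holds k_size - 1 entries again.
            self.state.remove(0);
        }
        out
    }
}
```

The layer carries its last k_size - 1 inputs between calls. As a result, feeding a sequence in two chunks gives the same outputs as feeding it all at once. During generation a conv layer therefore needs its own small cache, alongside the KV cache used by the attention layers.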
Any progress on this?
EricLBuehler left a comment
Hey @Jacqkues! Thanks for the PR.
This looks great, and I added a few small comments.
})
}

fn mask(&mut self, t: usize) -> Result<Tensor> {
This caches masks only by t and builds a square (seq_len, seq_len) mask. But once a prefix is cached, the attention scores have shape (batch, heads, seq_len, index_pos + seq_len).
This will either fail to broadcast or apply the wrong mask for chunked continuation or prefix-cache use. It should instead mirror the existing Llama/quantized LFM2 behavior: key the cache by (seq_len, kv_len), build the mask with crate::utils::build_causal_mask(seq_len, index_pos, device), and call cache.mask(seq_len, index_pos) at the call site.
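The following minimal std-only sketch makes the suggested keying concrete. It is not candle's code: MaskCache and its mask method are hypothetical names, and masks are row-major Vec<f32> rather than Tensors. What matters is the shape. Query row i sits at absolute position index_pos + i, so it may attend to keys 0..=index_pos + i out of kv_len = index_pos + seq_len keys.

```rust
use std::collections::HashMap;

/// Hypothetical mask cache keyed by (seq_len, kv_len).
struct MaskCache {
    masks: HashMap<(usize, usize), Vec<f32>>,
}

impl MaskCache {
    fn new() -> Self {
        Self { masks: HashMap::new() }
    }

    /// Returns a row-major (seq_len, index_pos + seq_len) causal mask:
    /// 0.0 where attention is allowed, -inf for keys after the query.
    fn mask(&mut self, seq_len: usize, index_pos: usize) -> &[f32] {
        let kv_len = index_pos + seq_len;
        // Built once per (seq_len, kv_len); index_pos = kv_len - seq_len.
        self.masks.entry((seq_len, kv_len)).or_insert_with(|| {
            let mut m = vec![0.0f32; seq_len * kv_len];
            for i in 0..seq_len {
                // Query i is at absolute position index_pos + i.
                for j in (index_pos + i + 1)..kv_len {
                    m[i * kv_len + j] = f32::NEG_INFINITY;
                }
            }
            m
        })
    }
}
```

For example, mask(2, 3) yields a 2 x 5 mask. Row 0 masks only key 4, and row 1 masks nothing. A square (t, t) mask would be 2 x 2 and could not broadcast against those scores.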
impl Which {
    fn model_id(&self) -> &'static str {
        match self {
            Which::Lfm2_5_1_2B => "LiquidAI/LFM2.5-1.2B",
I don't see this model on the HF Hub, but I did see LiquidAI/LFM2.5-1.2B-Instruct. Perhaps this is a typo?
Ah yes, it was a typo!
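If the fix simply points the base variant at the Instruct checkpoint, the model_id mapping could look like the sketch below. Only LiquidAI/LFM2.5-1.2B-Instruct comes from the review. The Lfm2_5_1_2BThinking variant and its LiquidAI/LFM2.5-1.2B-Thinking Hub ID are assumptions based on the variants named in the PR description.

```rust
/// Sketch of the example's model selector (hypothetical; the real enum may
/// differ). Maps each variant to a Hugging Face Hub repository ID.
#[allow(non_camel_case_types)]
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Which {
    Lfm2_5_1_2B,
    Lfm2_5_1_2BThinking,
}

impl Which {
    fn model_id(&self) -> &'static str {
        match self {
            // Corrected per review: the bare "LFM2.5-1.2B" ID is not on the Hub.
            Which::Lfm2_5_1_2B => "LiquidAI/LFM2.5-1.2B-Instruct",
            // Assumed Hub ID for the Thinking variant.
            Which::Lfm2_5_1_2BThinking => "LiquidAI/LFM2.5-1.2B-Thinking",
        }
    }
}
```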
…, device) as quantized_lfm2.rs
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Summary
- candle-transformers/src/models/lfm2.rs: model implementation for LiquidAI's LFM2.5 hybrid architecture (attention + short convolution layers)
- candle-examples/examples/lfm2/: example for text generation with LFM2.5-1.2B and LFM2.5-1.2B-Thinking variants
- lfm2 module in candle-transformers/src/models/mod.rs
Model details
LFM2.5 is a hybrid architecture from LiquidAI that interleaves full attention layers with short convolution layers. Key features:
Test plan
- cargo build --example lfm2 --release
- LFM2.5-1.2B-Thinking on CPU (Apple Silicon): generates coherent text at ~5.9 tokens/s
Supported models