Hacking things on llama.cpp:
- The brand new llama-cli: ggml-org/llama.cpp#17824
- Serving multiple models in parallel via llama-server: ggml-org/llama.cpp#17470
- Shipping Ministral 3, Devstral 2 (partnership with Mistral): ggml-org/llama.cpp#17644
- Shipping GPT-OSS (partnership with OpenAI and GGML team): ggml-org/llama.cpp#15091
- Bringing vision support to llama-server: ggml-org/llama.cpp#12898 (discussed on Hacker News), and more...
- (Big) refactoring of vision support in llama.cpp, introducing libmtmd: ggml-org/llama.cpp#12849
- Support for various vision models: Pixtral, SmolVLM, etc. (check out this viral demo)
- Gemma 3 Vision support (partnership with Google): ggml-org/llama.cpp#12343
- WASM speed improvements: ggml-org/llama.cpp#11453 (also check out wllama and the Hacker News post)
- Refactoring the argument parser for a better CLI UX: ggml-org/llama.cpp#9308
- Revamping the llama.cpp Web UI (since deprecated, but the precursor to the modern version): ggml-org/llama.cpp#10175
- llama.cpp <> Hugging Face inference endpoints integration: read the docs
- Hot-swapping LoRA adapters: ggml-org/llama.cpp#8332 (see the first sketch after this list)
- Control vector generator: ggml-org/llama.cpp#7514 (second sketch below)
- Initial chat template support: ggml-org/llama.cpp#5538 (third sketch below)
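
A few of these are easier to appreciate in code. First, the LoRA hot-swapping work: the PR keeps adapter weights in memory separately from the base model, so adapters can be attached to and detached from a context at runtime without reloading anything. A minimal sketch, using the C API names as introduced in that PR (later refactors renamed them to `llama_adapter_lora_*`); the file paths are placeholders and error handling is omitted:

```cpp
#include "llama.h"

int main() {
    llama_backend_init();

    llama_model * model = llama_load_model_from_file("model.gguf",
                                                     llama_model_default_params());
    llama_context * ctx = llama_new_context_with_model(model,
                                                       llama_context_default_params());

    // load the adapter once; its weights live next to (not merged into) the model
    llama_lora_adapter * adapter = llama_lora_adapter_init(model, "adapter.gguf");

    // attach it to this context at full strength, generate with it...
    llama_lora_adapter_set(ctx, adapter, 1.0f);

    // ...then detach it again, with no model reload needed
    llama_lora_adapter_remove(ctx, adapter);

    llama_free(ctx);
    llama_free_model(model); // loaded adapters are freed with their model
    llama_backend_free();
    return 0;
}
```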
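Second, the control vector generator builds on libllama's steering support: it derives a per-layer direction from the hidden states of positive versus negative prompts and writes it to a GGUF file. The sketch below only shows how such a vector is consumed, via `llama_control_vector_apply` from the C API of that era; a zeroed (no-op) buffer stands in for a real generated vector, and paths are placeholders:

```cpp
#include "llama.h"

#include <vector>

int main() {
    llama_backend_init();

    llama_model * model = llama_load_model_from_file("model.gguf",
                                                     llama_model_default_params());
    llama_context * ctx = llama_new_context_with_model(model,
                                                       llama_context_default_params());

    const int32_t n_embd  = llama_n_embd(model);
    const int32_t n_layer = llama_n_layer(model);

    // one n_embd-sized direction per layer, starting from layer 1
    std::vector<float> direction((size_t) n_embd * n_layer, 0.0f);

    // apply across all layers; passing data == NULL instead clears the vector
    llama_control_vector_apply(ctx, direction.data(), direction.size(),
                               n_embd, /*il_start=*/1, /*il_end=*/n_layer);

    llama_free(ctx);
    llama_free_model(model);
    llama_backend_free();
    return 0;
}
```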
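Finally, the initial chat template support added `llama_chat_apply_template`, which formats a list of messages according to the chat template embedded in the model's GGUF metadata (passing `tmpl == NULL` selects it; at the time, templates were matched against a set of known formats rather than rendered as Jinja). A minimal sketch with the signature as introduced in the PR (the `model` parameter was dropped in later releases), using the usual call-twice pattern to size the output buffer:

```cpp
#include "llama.h"

#include <cstdio>
#include <vector>

int main() {
    llama_backend_init();

    llama_model * model = llama_load_model_from_file("model.gguf",
                                                     llama_model_default_params());

    llama_chat_message msgs[] = {
        { "system", "You are a helpful assistant." },
        { "user",   "Hello!" },
    };

    // first pass may report that a bigger buffer is needed
    std::vector<char> buf(1024);
    int32_t n = llama_chat_apply_template(model, /*tmpl=*/NULL, msgs, 2,
                                          /*add_ass=*/true, // open an assistant turn
                                          buf.data(), (int32_t) buf.size());
    if (n > (int32_t) buf.size()) {
        buf.resize(n);
        n = llama_chat_apply_template(model, NULL, msgs, 2, true,
                                      buf.data(), (int32_t) buf.size());
    }
    if (n >= 0) {
        printf("%.*s\n", n, buf.data());
    }

    llama_free_model(model);
    llama_backend_free();
    return 0;
}
```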