A hands-on benchmark comparing ctransformers vs llama.cpp for local inference of quantized GGUF models (mistral, zephyr) on an M1 MacBook Pro (8GB RAM).
| Library | Speed | Simplicity | Best Use Case |
|---|---|---|---|
| ctransformers | ~15 seconds | ✅ Easy | Rapid prototyping |
| llama.cpp | ~10 seconds | ⚠️ More setup | RAG pipelines, speed-sensitive apps |
```bash
git clone https://github.com/santhoshnumberone/llm-benchmarks-mac.git
cd llm-benchmarks-mac
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
```
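The requirements file presumably pins the two inference backends. A plausible minimal version (the exact package list and any version pins are an assumption; defer to the repo's own `requirements.txt`):

```
ctransformers
llama-cpp-python
```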
Download the `.gguf` model files from Hugging Face, and update the model paths inside the scripts to point at your local copies before running:
**ctransformers**

```bash
python benchmark_ctransformers.py
```
**llama.cpp (via llama-cpp-python)**

```bash
python benchmark_llamacpp.py
```
| Model | Library | Time Taken | Output (Shortened) |
|---|---|---|---|
| Mistral | ctransformers | 15.14s | "You may sublicense if terms are met..." |
| Zephyr | llama-cpp-python | 12.63s | "It depends on the license..." |
This repo is ideal for:
- AI engineers testing local LLM inference
- Prototyping RAG apps with speed constraints
- Comparing backend performance tradeoffs
👤 Santhosh — Builder & AI Engineer