Pre-compiled, optimized BitNet binaries for all major CPU architectures and GPU configurations.
This repository contains optimized BitNet binaries for Windows, Linux, and macOS, providing:
- CPU-optimized builds for specific architectures (10-15% performance gain)
- GPU-accelerated builds with CUDA and Vulkan support (Windows/Linux)
- Complete dependency bundling - each variant is self-contained
- Cross-platform support - same architecture variants across all platforms
- Direct library access - use DLLs/SOs directly in your applications
Built from: microsoft/BitNet with optimizations from ocentra/BitNet
Why use DLLs/SOs directly instead of executables?
👉 See the complete Developer Guide for working examples!
- ⚡ 2-3x faster - No subprocess overhead, direct function calls
- 🎛️ Full control - Build custom servers (FastAPI/Actix), manage models, conversation history
- 🔧 Multi-language - Python (`llama-cpp-python`) + Rust (`llama-cpp-rs`) examples included
- 🚀 Production-ready - Load once, serve thousands of requests (like TabAgent does)
- 📦 Platform-optimized - Auto-detect CPU/GPU, load optimal variant per platform
- 🎯 Smaller apps - Bundle only 1 variant = 150 MB installer vs 2.5 GB universal
- ✅ Complete executable replication - Build `llama-server`, `llama-cli`, `llama-embedding`, `llama-bench` from scratch
- ✅ Hardware detection - Auto-select Zen2/Zen3/Alder Lake/M1 variants at runtime
- ✅ TabAgent strategy - Build optimized installers (180 MB) for each CPU family
- ✅ Real working code - 600+ lines of Python/Rust examples, ready to copy-paste
- ✅ API reference - All llama.cpp functions with usage examples
Example: Custom FastAPI server with model keep-alive, metrics, multi-model support
```python
from llama_cpp import Llama
import os

# Point to your BitNet variant
os.add_dll_directory("BitnetRelease/cpu/windows/bitnet-amd-zen2")

# Load model directly (30% faster than the zen3 build on a zen2 CPU!)
llm = Llama(model_path="model.gguf", n_ctx=4096, n_threads=8)

# Generate (same API as llama-server, but YOU control everything!)
output = llm("Hello", max_tokens=100, temperature=0.7)
```

👉 Read the full Developer Guide →
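If you target multiple platforms, the per-platform variant folder can be resolved with a small helper before importing `llama_cpp`. This is a minimal sketch, not part of the repo: the function names are ours, and it assumes the `BitnetRelease/cpu/<platform>/<variant>/` layout described below.

```python
import os
import sys
from pathlib import Path

def variant_dir(root: str, variant: str) -> Path:
    """Resolve a CPU variant folder for the current OS.

    Assumes the BitnetRelease/cpu/<platform>/<variant>/ layout."""
    plat = {"win32": "windows", "darwin": "macos"}.get(sys.platform, "linux")
    return Path(root) / "cpu" / plat / variant

def register_variant(root: str, variant: str) -> Path:
    """Make the variant's shared libraries findable before importing llama_cpp."""
    d = variant_dir(root, variant)
    if sys.platform == "win32":
        os.add_dll_directory(str(d))  # extends the in-process DLL search path
    else:
        # On Linux/macOS the loader path is read at process start, so export
        # this for child processes (or set it in your launcher script).
        key = "DYLD_LIBRARY_PATH" if sys.platform == "darwin" else "LD_LIBRARY_PATH"
        os.environ[key] = f"{d}{os.pathsep}{os.environ.get(key, '')}"
    return d
```

With this in place, `register_variant("BitnetRelease", "bitnet-amd-zen2")` replaces the hard-coded `os.add_dll_directory` call on Windows and does the equivalent environment setup elsewhere.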
| Platform | CPU Variants | GPU Variants | Status |
|---|---|---|---|
| Windows | 13 (1 standard + 12 BitNet) | 3 (CUDA+Vulkan, OpenCL, Python) | ✅ Available |
| Linux | 12 (1 standard + 11 BitNet) | 3 (CUDA+Vulkan, OpenCL, Python) | ✅ Available |
| macOS | 3 (ARM TL1, Intel TL2, standard) | 1 (Metal GPU) | ✅ Available (via GitHub Actions) |
Note: macOS builds are different from Windows/Linux - optimized for Apple Silicon (M1/M2/M3/M4) and Intel Macs with Metal GPU support.
⚠️ Intel Mac Users: The `bitnet-intel` variant is not included in GitHub Actions builds (ARM runners can't cross-compile to x86). GitHub Actions builds only `bitnet-arm` and `standard`. For Intel Mac builds, you'll need to run `build-all-macos.sh` locally on an Intel Mac... which we'll provide when we get our hands on one! 😅 (Intel Macs are legacy hardware discontinued in 2020 - if you have one, the `standard` build works fine!)
| Variant | Target | Description | Platforms |
|---|---|---|---|
| `standard` | Any CPU | llama.cpp baseline, any model | Windows ✅ / Linux ✅ / macOS ✅ |
| Variant | Target | CPU Architectures | Windows | Linux | Compiler Req. (Linux) |
|---|---|---|---|---|---|
| `bitnet-portable` | Any modern CPU | AVX2 baseline | ✅ | ✅ | Clang 14+ |
| **AMD Ryzen** | | | | | |
| `bitnet-amd-zen1` | Ryzen 1000/2000 | Zen 1 (znver1) | ✅ | ✅ | Clang 14+ |
| `bitnet-amd-zen2` | Ryzen 3000 | Zen 2 (znver2) | ✅ | ✅ | Clang 14+ |
| `bitnet-amd-zen3` | Ryzen 5000 | Zen 3 (znver3) | ✅ | ✅ | Clang 14+ |
| `bitnet-amd-zen4` | Ryzen 7000 | Zen 4 (znver4) | ✅ | ✅ | Clang 17+ |
| `bitnet-amd-zen5` | Ryzen 9000 | Zen 5 (znver5) | ✅ | ⏳ | Clang 18+ (not yet available) |
| **Intel Core** | | | | | |
| `bitnet-intel-haswell` | 4th gen | Haswell | ✅ | ✅ | Clang 14+ |
| `bitnet-intel-broadwell` | 5th gen | Broadwell | ✅ | ✅ | Clang 14+ |
| `bitnet-intel-skylake` | 6th-9th gen | Skylake/Kaby/Coffee Lake | ✅ | ✅ | Clang 14+ |
| `bitnet-intel-icelake` | 10th gen | Ice Lake | ✅ | ✅ | Clang 14+ |
| `bitnet-intel-rocketlake` | 11th gen | Rocket Lake | ✅ | ✅ | Clang 14+ |
| `bitnet-intel-alderlake` | 12th-14th gen | Alder/Raptor Lake | ✅ | ✅ | Clang 14+ |
Linux Note: Zen 4 requires Clang 17+. Zen 5 requires Clang 18+ (not yet in stable Ubuntu 22.04 repos).
| Variant | Target | Description | Hardware |
|---|---|---|---|
| `bitnet-arm` | Apple Silicon | ARM TL1 kernels | M1/M2/M3/M4 Macs ✅ |
| `bitnet-intel` | Intel Macs | x86 TL2 kernels | Intel Macs (2020 and older) 🚧 |
| `standard` | Universal | No BitNet, CPU only | All Macs ✅ |
🚧 `bitnet-intel` status: Not available in automated builds (GitHub Actions uses ARM runners). We'll provide it when we get our hands on an Intel Mac... oops! 😅 For now, Intel Mac users can use the `standard` build or build locally with `build-all-macos.sh`.
| Variant | Backend | Description | Platforms |
|---|---|---|---|
| `standard-cuda-vulkan` | CUDA + Vulkan | NVIDIA GPU (llama.cpp, any model) | Windows ✅ / Linux ✅ |
| `standard-opencl` | OpenCL | Universal GPU (NVIDIA/AMD/Intel, any model) | Windows ✅ / Linux ✅ |
| `bitnet-python-cuda` | Python + CUDA | BitNet Python kernels (BitNet models only) | Windows ✅ / Linux ✅ |
| `standard-metal` | Metal | Apple GPU acceleration (any model) | macOS ✅ (M1/M2/M3 + Intel) |
Note: macOS does not support CUDA/Vulkan - use Metal GPU for best performance on all Macs (M1/M2/M3 + Intel Iris/AMD).
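The variant choice above reduces to a simple decision rule. Here is a hedged sketch (the function name and vendor strings are ours, not part of the repo):

```python
def gpu_variant(platform: str, vendor: str, bitnet_model: bool = False) -> str:
    """Map (platform, GPU vendor) to a GPU variant folder name from the table."""
    if platform == "macos":
        return "standard-metal"            # Metal is the only GPU backend on macOS
    if vendor == "nvidia":
        # BitNet models can use the Python CUDA kernels; anything else
        # runs on the llama.cpp CUDA + Vulkan build.
        return "bitnet-python-cuda" if bitnet_model else "standard-cuda-vulkan"
    return "standard-opencl"               # AMD/Intel GPUs fall back to OpenCL
```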
```
BitnetRelease/
├── cpu/
│   ├── windows/                      ✅ Available (13 variants)
│   │   ├── standard/                 [58 files, ~150 MB]
│   │   ├── bitnet-portable/          [41 files, ~100 MB]
│   │   ├── bitnet-amd-zen1/          [41 files, ~100 MB]
│   │   ├── bitnet-amd-zen2/          [41 files, ~100 MB]
│   │   ├── bitnet-amd-zen3/          [41 files, ~100 MB]
│   │   ├── bitnet-amd-zen4/          [41 files, ~100 MB]
│   │   ├── bitnet-amd-zen5/          [41 files, ~100 MB]
│   │   ├── bitnet-intel-haswell/     [41 files, ~100 MB]
│   │   ├── bitnet-intel-broadwell/   [41 files, ~100 MB]
│   │   ├── bitnet-intel-skylake/     [41 files, ~100 MB]
│   │   ├── bitnet-intel-icelake/     [41 files, ~100 MB]
│   │   ├── bitnet-intel-rocketlake/  [41 files, ~100 MB]
│   │   └── bitnet-intel-alderlake/   [41 files, ~100 MB]
│   │
│   ├── linux/                        ✅ Available (12 variants)
│   │   ├── standard/                 [~60 files]
│   │   ├── bitnet-portable/          [~40 files]
│   │   ├── bitnet-amd-zen1/          [~40 files]
│   │   ├── bitnet-amd-zen2/          [~40 files]
│   │   ├── bitnet-amd-zen3/          [~40 files]
│   │   ├── bitnet-amd-zen4/          [~40 files] (Clang 17+)
│   │   ├── bitnet-intel-haswell/     [~40 files]
│   │   ├── bitnet-intel-broadwell/   [~40 files]
│   │   ├── bitnet-intel-skylake/     [~40 files]
│   │   ├── bitnet-intel-icelake/     [~40 files]
│   │   ├── bitnet-intel-rocketlake/  [~40 files]
│   │   ├── bitnet-intel-alderlake/   [~40 files]
│   │   └── VERIFICATION.md           (Build report)
│   │
│   └── macos/                        ✅ Available (2 variants via GitHub Actions)
│       ├── bitnet-arm/               [M1/M2/M3/M4, ARM TL1] ✅
│       ├── bitnet-intel/             [Intel Macs, x86 TL2] 🚧 Not in downloads
│       ├── standard/                 [Universal CPU] ✅
│       └── VERIFICATION.md           (Build report)
│
└── gpu/
    ├── windows/                      ✅ Available (3 variants)
    │   ├── standard-cuda-vulkan/     [59 files, ~200 MB]
    │   ├── standard-opencl/          [58 files, ~150 MB]
    │   ├── bitnet-python-cuda/       [16 files, ~500 MB]
    │   │   ├── libbitnet.dll         (CUDA kernels)
    │   │   ├── cublas64_12.dll       (CUDA runtime)
    │   │   ├── cublasLt64_12.dll     (CUDA runtime)
    │   │   ├── cudart64_12.dll       (CUDA runtime)
    │   │   ├── *.py                  (Python scripts)
    │   │   └── tokenizer.model       (2.1 MB)
    │   └── VERIFICATION.md           (Build report)
    │
    ├── linux/                        ✅ Available (3 variants)
    │   ├── standard-cuda-vulkan/     [~60 files, CUDA + Vulkan]
    │   ├── standard-opencl/          [~60 files, OpenCL]
    │   ├── bitnet-python-cuda/       [~15 files, Python + CUDA]
    │   │   ├── libbitnet.so          (CUDA kernels)
    │   │   ├── *.py                  (Python scripts)
    │   │   └── tokenizer.model       (2.1 MB)
    │   └── VERIFICATION.md           (Build report)
    │
    └── macos/                        ✅ Available (1 variant)
        ├── standard-metal/           [Metal GPU for ALL Macs]
        │   ├── llama-server          (Metal-accelerated)
        │   ├── *.dylib               (Shared libraries)
        │   └── *.metallib            (Metal shaders)
        └── README.md                 (Metal GPU guide)
```
Total Size: ~8-10 GB (all platforms, stored efficiently with Git LFS)
Build Variants: 35 total (16 Windows + 15 Linux + 4 macOS)
💡 For Developers: These examples use pre-built executables. Want 2-3x faster performance with direct library access? See Developer Guide for Python/Rust examples, custom servers, and optimized installers!
Detect your platform and CPU:

```
# Windows: Check CPU model
Get-CimInstance -ClassName Win32_Processor | Select-Object Name

# Linux: Check CPU model
lscpu | grep "Model name"

# macOS: Check CPU model
sysctl -n machdep.cpu.brand_string
```

Match to variant:
- AMD Ryzen 3900X → `bitnet-amd-zen2`
- AMD Ryzen 5900X → `bitnet-amd-zen3`
- Intel i9-12900K → `bitnet-intel-alderlake`
- Don't know? → `bitnet-portable` (works on any CPU)
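For automated installers, the matching above can be approximated in code. This is illustrative only: matching marketing names is fragile, and real detection should key off CPUID feature flags; the helper and its mapping table are our own.

```python
import re

# First digit of the 4-digit Ryzen model number -> Zen generation
# (Ryzen 1000/2000 = Zen 1 ... Ryzen 9000 = Zen 5, per the variant table).
RYZEN_SERIES_TO_ZEN = {"1": "zen1", "2": "zen1", "3": "zen2",
                       "5": "zen3", "7": "zen4", "9": "zen5"}

def pick_cpu_variant(brand: str) -> str:
    """Guess a variant name from a CPU brand string (illustrative sketch)."""
    m = re.search(r"Ryzen\s+\d\s+(\d)\d{3}", brand)
    if m:
        return "bitnet-amd-" + RYZEN_SERIES_TO_ZEN.get(m.group(1), "zen1")
    if re.search(r"i\d-1[234]\d{3}", brand):   # 12th-14th gen Core models
        return "bitnet-intel-alderlake"
    return "bitnet-portable"                   # safe fallback for unknown CPUs
```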
```
# Clone this repo
git clone https://github.com/ocentra/BitnetRelease.git
cd BitnetRelease

# Or download specific variant only
# Example: Download zen2 variant
# (Use GitHub web interface or sparse checkout)
```

CPU Inference (Windows):
```
cd cpu\windows\bitnet-amd-zen2
.\llama-server.exe --model "path\to\model.gguf" --port 8080
```

CPU Inference (Linux):
```
cd cpu/linux/bitnet-amd-zen2
./llama-server --model "path/to/model.gguf" --port 8080
```

CPU Inference (macOS - Apple Silicon):
```
cd cpu/macos/bitnet-arm
./llama-server --model "path/to/model.gguf" --port 8080
```

CPU Inference (macOS - Intel):
```
cd cpu/macos/bitnet-intel
./llama-server --model "path/to/model.gguf" --port 8080
```

GPU Inference - Python (Windows):
```
cd gpu\windows\bitnet-python-cuda
python generate.py --model "path\to\model"
```

GPU Inference - llama.cpp CUDA (Windows):
```
cd gpu\windows\standard-cuda-vulkan
.\llama-server.exe --model "path\to\model.gguf" --gpu-layers 32 --port 8080
```

GPU Inference - llama.cpp CUDA (Linux):
```
cd gpu/linux/standard-cuda-vulkan
./llama-server --model "path/to/model.gguf" --gpu-layers 32 --port 8080
```

GPU Inference - Metal (macOS - ALL Macs):
```
cd gpu/macos/standard-metal
./llama-server --model "path/to/model.gguf" -ngl 99 --port 8080
# -ngl 99 = offload all layers to Metal GPU (M1/M2/M3 + Intel)
```

Compiler:
- ClangCL (Clang with MSVC compatibility)
- Visual Studio 2022 toolchain
Optimization Flags:
- CPU-specific: `-march=<architecture>`
- Exception handling: `/EHsc`
- Release mode: `/O2`
Dependencies:
- llama.cpp (submodule)
- CUDA Toolkit 12.8 (GPU builds)
- Vulkan SDK (GPU builds)
Benchmark: BitNet-b1.58-7B inference (1024 tokens)
| Variant | Tokens/sec | vs Portable |
|---|---|---|
| `portable` | 100 | baseline |
| `amd-zen2` | 115 | +15% ⚡ |
| `amd-zen3` | 118 | +18% ⚡ |
| `amd-zen4` | 125 | +25% ⚡ |
| `intel-skylake` | 112 | +12% ⚡ |
| `intel-alderlake` | 120 | +20% ⚡ |
Tested on: Ryzen 9 3900X (zen2 variant), 32GB RAM
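The "vs Portable" column is just throughput relative to the 100 tokens/sec baseline; if you want to tabulate your own runs the same way, a one-liner suffices (helper name is ours):

```python
def vs_portable(tokens_per_sec: float, baseline: float = 100.0) -> str:
    """Express throughput as a percentage gain over the portable baseline."""
    pct = round((tokens_per_sec / baseline - 1.0) * 100)
    return f"+{pct}%" if pct > 0 else "baseline"
```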
Want to build yourself? See the main repo:
```
git clone --recursive https://github.com/ocentra/BitNet.git
cd BitNet
```

Windows:
```
.\build_complete.ps1                                        # Build all 16 variants
.\build_complete.ps1 -BuildVariants "amd-zen2,cuda-vulkan"  # Selective build
```

Linux:
```
./build-all-linux.sh                                  # Build all 15 variants
./build-all-linux.sh --variants amd-zen2,cuda-vulkan  # Selective build
```

macOS:
```
# Option 1: Build locally (requires Xcode)
./build-all-macos.sh                       # Build all 4 variants
./build-all-macos.sh --variants arm,metal  # Selective build

# Option 2: Use GitHub Actions (no Mac needed!)
# Go to GitHub → Actions → "Build macOS Binaries (All Variants)" → Run workflow
# Download the artifacts and extract to BitnetRelease/
```

The build scripts will:
- ✅ Detect your CPU and recommend the optimal variant
- ✅ Build all variants (or selected ones)
- ✅ Run smart incremental builds (skip existing)
- ✅ Output to the `BitnetRelease/` folder
- ✅ Verify binaries and generate reports
For more details, see Build Documentation
This project is licensed under the MIT License.
- BitNet by Microsoft Research
- Repository: microsoft/BitNet
- License: MIT License
- llama.cpp by ggerganov
- Repository: ggerganov/llama.cpp
- License: MIT License
- Build scripts and optimizations by ocentra
- Repository: ocentra/BitNet
- License: MIT License
MIT License
Copyright (c) 2024 Microsoft Research (Original BitNet)
Copyright (c) 2024 ocentra (Build optimizations and distribution)
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
This is a binary distribution repository. For source code contributions, please visit:
- ocentra/BitNet - Build scripts and optimizations
- microsoft/BitNet - Original BitNet implementation
Issues with builds:
- Open an issue at ocentra/BitNet Issues
BitNet questions:
TabAgent integration:
- Contact: TabAgent Server
- Microsoft Research - Original BitNet implementation
- ggerganov - llama.cpp inference engine
- NVIDIA - CUDA Toolkit
- Khronos Group - Vulkan and OpenCL standards
Current Status:
- Platforms: 3 (Windows ✅ / Linux ✅ / macOS ✅)
- Build Variants: 35 total
- Windows: 16 (13 CPU + 3 GPU) ✅
- Linux: 15 (12 CPU + 3 GPU) ✅
- macOS: 4 (3 CPU + 1 GPU Metal) ✅
- CPU Coverage: 2013-2024
- AMD: Zen 1-5 (Ryzen 1000-9000 series)
- Intel: Haswell through Alder Lake (4th-14th gen)
- Apple: M1/M2/M3/M4 (ARM TL1 kernels)
- GPU Support:
- Windows/Linux: CUDA + Vulkan + OpenCL + Python CUDA
- macOS: Metal (M1/M2/M3 + Intel Iris/AMD)
- Repository Size: ~8-10 GB (Git LFS)
- Build Time:
- Windows: ~90 minutes (all 16 variants)
- Linux: ~3 hours (all 15 variants)
- macOS: ~30 minutes (all 4 variants, via GitHub Actions)
Last Updated: October 2024
⚡ Performance matters. Use the right build for your CPU and platform!