# llama.cpp Backend Options

Lemonade uses [llama.cpp](https://github.com/ggerganov/llama.cpp) as its primary LLM inference backend, supporting multiple hardware acceleration options. This document explains the available backends and how to choose between them.

## Available Backends

### CPU
- **Platform**: Windows, Linux, macOS
- **Hardware**: Any x86_64 or ARM64 processor
- **Use Case**: Universal fallback, no GPU required
- **Performance**: Slowest option, suitable for small models or testing
- **Installation**: Automatically available via upstream llama.cpp releases

### Vulkan
- **Platform**: Windows, Linux
- **Hardware**: AMD GPUs (iGPU and dGPU), NVIDIA GPUs, Intel GPUs
- **Use Case**: Cross-vendor GPU acceleration
- **Performance**: Good performance across all GPU vendors
- **Installation**: Automatically available via upstream llama.cpp releases
- **Notes**: Recommended for most GPU users

### ROCm
- **Platform**: Windows, Linux
- **Hardware**: AMD Radeon RX 6000 series and newer (RDNA2/RDNA3/RDNA4), AMD Ryzen AI iGPUs (Strix Point/Halo)
- **Use Case**: AMD GPU-optimized inference
- **Performance**: Optimized for AMD hardware, may outperform Vulkan on supported GPUs
- **Channel Options**:
  - **Preview** (default): Custom builds with the latest optimizations from lemonade-sdk
  - **Stable**: Upstream llama.cpp releases with AMD ROCm support
- **Installation**: Varies by channel (see below)

### Metal
- **Platform**: macOS only
- **Hardware**: Apple Silicon (M1/M2/M3/M4) and Intel Macs with Metal support
- **Use Case**: macOS GPU acceleration
- **Performance**: Optimized for Apple Silicon
- **Installation**: Automatically available via upstream llama.cpp releases

### System
- **Platform**: Linux only
- **Hardware**: Depends on the system-installed llama-server binary
- **Use Case**: Advanced users with custom llama.cpp builds
- **Performance**: Depends on build configuration
- **Installation**: Requires manual installation of `llama-server` in the system PATH
- **Notes**: Not enabled by default; set `LEMONADE_LLAMACPP_PREFER_SYSTEM=true` in config
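Before enabling the system backend, it is worth confirming that a `llama-server` binary actually resolves on PATH. A minimal sketch, assuming `LEMONADE_LLAMACPP_PREFER_SYSTEM` is also honored as an environment variable (verify against your install):

```shell
# Assumption: the setting is read from the environment as well as the config file.
export LEMONADE_LLAMACPP_PREFER_SYSTEM=true

# Confirm a llama-server binary is on PATH before starting Lemonade.
if command -v llama-server >/dev/null 2>&1; then
    echo "system llama-server: $(command -v llama-server)"
else
    echo "llama-server not found on PATH" >&2
fi
```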

## ROCm Channel Configuration

The ROCm backend supports two channels to balance stability and performance:

### Preview Channel (Default)
```json
{
  "llamacpp": {
    "rocm_channel": "preview"
  }
}
```
- **Source**: Custom builds from [lemonade-sdk/llamacpp-rocm](https://github.com/lemonade-sdk/llamacpp-rocm)
- **Binaries**: Architecture-specific builds (gfx1150, gfx1151, gfx103X, gfx110X, gfx120X)
- **Updates**: Frequent updates with the latest optimizations and fixes
- **Platform**: Windows and Linux
- **Runtime**: Windows bundles the ROCm runtime; Linux uses the bundled runtime or system `/opt/rocm`
- **Best For**: Users who want the latest performance optimizations
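Because the preview channel ships architecture-specific builds, it helps to know your GPU's gfx target. A quick check, assuming the ROCm tools (which provide `rocminfo`) are installed:

```shell
# Print the first gfx target reported by rocminfo (e.g. gfx1151 for Strix Halo).
rocminfo | grep -m1 -o 'gfx[0-9a-f]*'
```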

### Stable Channel
```json
{
  "llamacpp": {
    "rocm_channel": "stable"
  }
}
```
- **Source**: Upstream [llama.cpp](https://github.com/ggerganov/llama.cpp) releases
- **Binaries**:
  - **Windows**: Self-contained HIP binaries (no separate runtime needed)
  - **Linux**: Binaries built against the ROCm 7.2 runtime
- **Updates**: Follows the upstream llama.cpp release cycle
- **Platform**: Windows and Linux
- **Runtime**:
  - Windows: Self-contained, no runtime installation required
  - Linux: Downloads the AMD ROCm 7.2.1 runtime if not present at `/opt/rocm`
- **Best For**: Users who prefer stable, tested releases aligned with upstream

### Changing Channels

To switch between channels, update your `config.json`:

```json
{
  "llamacpp": {
    "rocm_channel": "stable"
  }
}
```

Or use the Lemonade CLI:
```bash
# Switch to the stable channel
lemonade config set llamacpp.rocm_channel stable

# Switch back to the preview channel
lemonade config set llamacpp.rocm_channel preview
```

After changing channels, you'll need to reinstall the ROCm backend:
```bash
lemonade backend install llamacpp rocm
```

## Choosing the Right Backend

### Decision Tree

1. **Do you have an NVIDIA or Intel GPU?**
   - Use **Vulkan**

2. **Do you have an AMD GPU?**
   - **For Radeon RX 6000/7000 or a Ryzen AI iGPU**:
     - Try **ROCm** first for best performance
     - Fall back to **Vulkan** if you encounter issues
   - **For older AMD GPUs (RX 5000 and earlier)**:
     - Use **Vulkan** (ROCm is not supported)

3. **Do you have Apple Silicon?**
   - Use **Metal**

4. **No GPU, or an unsupported GPU?**
   - Use **CPU**
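On Linux, applying the decision tree above starts with knowing which GPU you have. One way to check, assuming `lspci` from pciutils is available:

```shell
# List GPU devices; the vendor name (AMD, NVIDIA, Intel) picks the branch of the tree.
lspci -nn | grep -Ei 'vga|3d|display'
```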

### ROCm Channel Selection

- **Use Preview** if you:
  - Want the best performance on AMD hardware
  - Are comfortable with frequent updates
  - Are testing new models or features

- **Use Stable** if you:
  - Prefer stability over the latest features
  - Want upstream llama.cpp compatibility
  - Are deploying in production

## Platform Specifics

### Linux
- All backends supported (CPU, Vulkan, ROCm, System)
- ROCm requires a compatible AMD GPU (see above)
- System backend requires manual llama-server installation

### Windows
- Supported: CPU, Vulkan, ROCm
- ROCm requires a compatible AMD GPU
- No system backend support

### macOS
- Supported: CPU, Metal
- Metal recommended for all Macs with Metal support

## Troubleshooting

### ROCm Backend Not Available
- Verify your AMD GPU is RDNA2 or newer (RX 6000+ or a Ryzen AI iGPU)
- On Linux with Strix Halo (gfx1151), ensure kernel 6.13+ with CWSR support
- Check the `/api/v1/system-info` endpoint for backend availability
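For the gfx1151 kernel requirement, the running kernel can be compared against the 6.13 minimum with `sort -V`. This is a sketch of the version check only; CWSR support must be confirmed separately:

```shell
# Compare the running kernel against the 6.13 minimum for Strix Halo (gfx1151).
kernel="$(uname -r)"
required="6.13"
# sort -V orders versions; if the required version sorts first, the kernel is new enough.
if [ "$(printf '%s\n%s\n' "$required" "${kernel%%-*}" | sort -V | head -n1)" = "$required" ]; then
    echo "kernel $kernel meets the ${required}+ requirement"
else
    echo "kernel $kernel is older than ${required}; gfx1151 may not work" >&2
fi
```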

### Performance Issues
- Try switching between Vulkan and ROCm to compare
- For ROCm, try both the preview and stable channels
- Check VRAM usage; some models may be too large for your GPU

### Installation Failures
- Ensure a stable internet connection for downloads
- Check disk space in the Lemonade cache directory
- For ROCm on Linux, verify `/opt/rocm` permissions if using the system runtime
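The disk-space and permission checks can be run directly. The cache location below is an assumption (substitute the directory your Lemonade install actually uses):

```shell
# Assumption: the cache lives under ~/.cache/lemonade; adjust to your real cache path.
CACHE_DIR="$HOME/.cache/lemonade"
df -h "$CACHE_DIR" 2>/dev/null || df -h "$HOME"

# On Linux with the system ROCm runtime, confirm /opt/rocm is readable.
[ -r /opt/rocm ] && echo "/opt/rocm readable" || echo "/opt/rocm missing or unreadable"
```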

## Additional Resources

- [Lemonade CLI Documentation](lemonade-cli.md)
- [llama.cpp GitHub](https://github.com/ggerganov/llama.cpp)
- [AMD ROCm Documentation](https://rocm.docs.amd.com/)
- [Vulkan Documentation](https://www.vulkan.org/)