Commit f5422f8

Add support for getting a ROCm stable runtime backend
The ROCm stable runtime uses the AMD-released stable ROCm runtime together with the upstream llama.cpp artifacts.
1 parent 158c046 commit f5422f8

14 files changed

Lines changed: 1478 additions & 46 deletions

.github/workflows/cpp_server_build_test_release.yml

Lines changed: 157 additions & 0 deletions
@@ -1324,6 +1324,163 @@ jobs:
         with:
           artifact-name: server-logs-apikey-windows-latest
 
+  # ========================================================================
+  # TEST ROCM STABLE CHANNEL - Verify stable channel on Windows hosted runner
+  # ========================================================================
+
+  test-rocm-stable-channel:
+    name: Test ROCm Stable Channel (Windows)
+    runs-on: windows-latest
+    needs:
+      - build-lemonade-server-installer
+    env:
+      LEMONADE_CI_MODE: "True"
+      PYTHONIOENCODING: utf-8
+      GITHUB_TOKEN: ${{ secrets.GH_TOKEN }}
+    steps:
+      - uses: actions/checkout@v4
+
+      - name: Setup (Windows)
+        shell: powershell
+        run: |
+          $cwd = (Get-Item .).FullName
+          echo "HF_HOME=$cwd\hf-cache" >> $Env:GITHUB_ENV
+          echo "LEMONADE_INSTALL_PATH=$cwd\lemonade_server_install" >> $Env:GITHUB_ENV
+
+      - name: Install Lemonade Server (Windows)
+        uses: ./.github/actions/install-lemonade-server-msi
+        with:
+          install-path: ${{ env.LEMONADE_INSTALL_PATH }}
+
+      - name: Set paths (Windows)
+        shell: powershell
+        run: |
+          echo "VENV_PYTHON=.venv/Scripts/python.exe" >> $Env:GITHUB_ENV
+          echo "SERVER_BINARY=$Env:LEMONADE_INSTALL_PATH\bin\lemonade-server.exe" >> $Env:GITHUB_ENV
+
+      - name: Setup Python and virtual environment
+        uses: ./.github/actions/setup-venv
+        with:
+          venv-name: '.venv'
+          python-version: '3.10'
+          requirements-file: 'test/requirements.txt'
+
+      - name: Set ROCm channel to stable
+        shell: bash
+        run: |
+          set -e
+          echo "Setting rocm_channel to stable in config.json"
+          "$VENV_PYTHON" -c "
+          import json
+          import os
+          from pathlib import Path
+
+          cache_dir = Path.home() / '.cache' / 'lemonade'
+          config_file = cache_dir / 'config.json'
+
+          # Create cache dir if needed
+          cache_dir.mkdir(parents=True, exist_ok=True)
+
+          # Load or create config
+          if config_file.exists():
+              with open(config_file, 'r') as f:
+                  config = json.load(f)
+          else:
+              config = {}
+
+          # Set rocm_channel to stable
+          config['rocm_channel'] = 'stable'
+
+          # Save config
+          with open(config_file, 'w') as f:
+              json.dump(config, f, indent=2)
+
+          print(f'Config updated: rocm_channel = stable')
+          "
+
+      - name: Verify ROCm stable channel configuration
+        shell: bash
+        run: |
+          set -e
+          echo "Verifying rocm_channel setting..."
+          "$VENV_PYTHON" -c "
+          import json
+          from pathlib import Path
+
+          config_file = Path.home() / '.cache' / 'lemonade' / 'config.json'
+
+          if not config_file.exists():
+              print('ERROR: config.json not found')
+              exit(1)
+
+          with open(config_file, 'r') as f:
+              config = json.load(f)
+
+          channel = config.get('rocm_channel', 'NOT_SET')
+          print(f'rocm_channel = {channel}')
+
+          if channel != 'stable':
+              print(f'ERROR: Expected rocm_channel=stable, got {channel}')
+              exit(1)
+
+          print('SUCCESS: rocm_channel is set to stable')
+          "
+
+      - name: Test channel switching with lemonade CLI
+        shell: bash
+        run: |
+          set -e
+          echo "Setting rocm_channel to stable via lemonade CLI..."
+
+          # Use lemonade config set to change the channel
+          "$SERVER_BINARY" config set rocm_channel=stable
+
+          echo "Verifying config was updated..."
+          "$VENV_PYTHON" -c "
+          import json
+          from pathlib import Path
+
+          config_file = Path.home() / '.cache' / 'lemonade' / 'config.json'
+
+          if not config_file.exists():
+              print('ERROR: config.json not found')
+              exit(1)
+
+          with open(config_file, 'r') as f:
+              config = json.load(f)
+
+          channel = config.get('rocm_channel', 'NOT_SET')
+          print(f'rocm_channel = {channel}')
+
+          if channel != 'stable':
+              print(f'ERROR: Expected rocm_channel=stable, got {channel}')
+              exit(1)
+
+          print('SUCCESS: rocm_channel set to stable via CLI')
+          "
+
+      - name: Test recipes command shows rocm backend
+        shell: bash
+        run: |
+          set -e
+          echo "Testing recipes command..."
+          "$SERVER_BINARY" recipes | tee recipes_output.txt
+
+          # Check that rocm backend is listed (not rocm-stable or rocm-preview)
+          if grep -q "rocm-stable" recipes_output.txt || grep -q "rocm-preview" recipes_output.txt; then
+            echo "ERROR: Found rocm-stable or rocm-preview in recipes output (should only show 'rocm')"
+            cat recipes_output.txt
+            exit 1
+          fi
+
+          echo "SUCCESS: Recipes output looks correct"
+
+      - name: Capture and upload server logs
+        if: always()
+        uses: ./.github/actions/capture-server-logs
+        with:
+          artifact-name: server-logs-rocm-stable-channel
+
   # ========================================================================
   # RELEASE JOB - Add artifacts to GitHub release
   # ========================================================================

docs/llamacpp.md

Lines changed: 181 additions & 0 deletions
@@ -0,0 +1,181 @@
+# llama.cpp Backend Options
+
+Lemonade uses [llama.cpp](https://github.com/ggerganov/llama.cpp) as its primary LLM inference backend, supporting multiple hardware acceleration options. This document explains the available backends and how to choose between them.
+
+## Available Backends
+
+### CPU
+- **Platform**: Windows, Linux, macOS
+- **Hardware**: All x86_64 processors
+- **Use Case**: Universal fallback, no GPU required
+- **Performance**: Slowest option, suitable for small models or testing
+- **Installation**: Automatically available via upstream llama.cpp releases
+
+### Vulkan
+- **Platform**: Windows, Linux
+- **Hardware**: AMD GPUs (iGPU and dGPU), NVIDIA GPUs, Intel GPUs
+- **Use Case**: Cross-vendor GPU acceleration
+- **Performance**: Good performance across all GPU vendors
+- **Installation**: Automatically available via upstream llama.cpp releases
+- **Notes**: Recommended for most GPU users
+
+### ROCm
+- **Platform**: Windows, Linux
+- **Hardware**: AMD Radeon RX 6000/7000/9000 series (RDNA2/RDNA3/RDNA4), AMD Ryzen AI iGPUs (Strix Point/Halo)
+- **Use Case**: AMD GPU-optimized inference
+- **Performance**: Optimized for AMD hardware; may outperform Vulkan on supported GPUs
+- **Channel Options**:
+  - **Preview** (default): Custom builds with the latest optimizations from lemonade-sdk
+  - **Stable**: Upstream llama.cpp releases with AMD ROCm support
+- **Installation**: Varies by channel (see below)
+
+### Metal
+- **Platform**: macOS only
+- **Hardware**: Apple Silicon (M1/M2/M3/M4) and Intel Macs with Metal support
+- **Use Case**: macOS GPU acceleration
+- **Performance**: Optimized for Apple Silicon
+- **Installation**: Automatically available via upstream llama.cpp releases
+
+### System
+- **Platform**: Linux only
+- **Hardware**: Depends on the system-installed llama-server binary
+- **Use Case**: Advanced users with custom llama.cpp builds
+- **Performance**: Depends on build configuration
+- **Installation**: Requires manual installation of `llama-server` on the system PATH
+- **Notes**: Not enabled by default; set `LEMONADE_LLAMACPP_PREFER_SYSTEM=true` in config
+
+## ROCm Channel Configuration
+
+The ROCm backend supports two channels to balance stability and performance:
+
+### Preview Channel (Default)
+```json
+{
+  "llamacpp": {
+    "rocm_channel": "preview"
+  }
+}
+```
+- **Source**: Custom builds from [lemonade-sdk/llamacpp-rocm](https://github.com/lemonade-sdk/llamacpp-rocm)
+- **Binaries**: Architecture-specific builds (gfx1150, gfx1151, gfx103X, gfx110X, gfx120X)
+- **Updates**: Frequent updates with the latest optimizations and fixes
+- **Platform**: Windows and Linux
+- **Runtime**: Windows bundles the ROCm runtime; Linux uses the bundled runtime or system `/opt/rocm`
+- **Best For**: Users who want the latest performance optimizations
+
+### Stable Channel
+```json
+{
+  "llamacpp": {
+    "rocm_channel": "stable"
+  }
+}
+```
+- **Source**: Upstream [llama.cpp](https://github.com/ggerganov/llama.cpp) releases
+- **Binaries**:
+  - **Windows**: Self-contained HIP binaries (no separate runtime needed)
+  - **Linux**: Binaries built against the ROCm 7.2 runtime
+- **Updates**: Follows the upstream llama.cpp release cycle
+- **Platform**: Windows and Linux
+- **Runtime**:
+  - Windows: Self-contained, no runtime installation required
+  - Linux: Downloads the AMD ROCm 7.2.1 runtime if not present at `/opt/rocm`
+- **Best For**: Users who prefer stable, tested releases aligned with upstream
+
+### Changing Channels
+
+To switch between channels, update your `config.json`:
+
+```json
+{
+  "llamacpp": {
+    "rocm_channel": "stable"
+  }
+}
+```
+
+Or use the Lemonade CLI:
+```bash
+# Switch to stable channel
+lemonade config set llamacpp.rocm_channel stable
+
+# Switch back to preview channel
+lemonade config set llamacpp.rocm_channel preview
+```
+
+After changing channels, you'll need to reinstall the ROCm backend:
+```bash
+lemonade backend install llamacpp rocm
+```
+
+## Choosing the Right Backend
+
+### Decision Tree
+
+1. **Do you have an NVIDIA or Intel GPU?**
+   - Use **Vulkan**
+
+2. **Do you have an AMD GPU?**
+   - **For Radeon RX 6000/7000/9000 or Ryzen AI iGPU**:
+     - Try **ROCm** first for best performance
+     - Fall back to **Vulkan** if you encounter issues
+   - **For older AMD GPUs (RX 5000 and earlier)**:
+     - Use **Vulkan** (ROCm not supported)
+
+3. **Do you have Apple Silicon?**
+   - Use **Metal**
+
+4. **No GPU or unsupported GPU?**
+   - Use **CPU**
+
+### ROCm Channel Selection
+
+- **Use Preview** if you:
+  - Want the best performance on AMD hardware
+  - Are comfortable with frequent updates
+  - Are testing new models or features
+
+- **Use Stable** if you:
+  - Prefer stability over the latest features
+  - Want upstream llama.cpp compatibility
+  - Are deploying in production
+
+## Platform Specifics
+
+### Linux
+- All backends supported (CPU, Vulkan, ROCm, System)
+- ROCm requires a compatible AMD GPU (see above)
+- System backend requires manual llama-server installation
+
+### Windows
+- Supported: CPU, Vulkan, ROCm
+- ROCm requires a compatible AMD GPU
+- No system backend support
+
+### macOS
+- Supported: CPU, Metal
+- Metal recommended for all Macs with Metal support
+
+## Troubleshooting
+
+### ROCm Backend Not Available
+- Verify your AMD GPU is RDNA2 or newer (RX 6000+ or Ryzen AI iGPU)
+- On Linux with Strix Halo (gfx1151), ensure kernel 6.13+ with CWSR support
+- Check the `/api/v1/system-info` endpoint for backend availability
+
+### Performance Issues
+- Try switching between Vulkan and ROCm to compare
+- For ROCm, try both the preview and stable channels
+- Check VRAM usage; some models may be too large for your GPU
+
+### Installation Failures
+- Ensure a stable internet connection for downloads
+- Check disk space in the Lemonade cache directory
+- For ROCm on Linux, verify `/opt/rocm` permissions if using the system runtime
+
+## Additional Resources
+
+- [Lemonade CLI Documentation](lemonade-cli.md)
+- [llama.cpp GitHub](https://github.com/ggerganov/llama.cpp)
+- [AMD ROCm Documentation](https://rocm.docs.amd.com/)
+- [Vulkan Documentation](https://www.vulkan.org/)

src/cpp/include/lemon/runtime_config.h

Lines changed: 1 addition & 0 deletions
@@ -35,6 +35,7 @@ class RuntimeConfig {
   bool offline() const;
   bool disable_model_filtering() const;
   bool enable_dgpu_gtt() const;
+  std::string rocm_channel() const;
 
   // Backend settings (nested)
   json backend_config(const std::string& backend_name) const;
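
For orientation, here is a minimal sketch of what the new accessor likely does, written as a free function because the commit page does not show `runtime_config.cpp`. The function name and the `nlohmann::json` parameter are illustrative assumptions; only the declaration above is in the commit. The `"preview"` fallback mirrors the default added to `defaults.json` below.

```cpp
#include <nlohmann/json.hpp>
#include <string>

// Hypothetical sketch -- the real member definition lives in
// runtime_config.cpp, which this commit page does not show. Reads the
// top-level "rocm_channel" key, defaulting to "preview" per defaults.json.
std::string rocm_channel_from(const nlohmann::json& config) {
  return config.value("rocm_channel", std::string("preview"));
}
```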

src/cpp/resources/backend_versions.json

Lines changed: 3 additions & 1 deletion
@@ -1,8 +1,10 @@
 {
   "comment": "This configuration file controls which llama.cpp, whisper.cpp, sd.cpp, ryzenai-llm, and FLM versions are downloaded for each backend. You can modify these values to pin specific versions without rebuilding the application.",
+  "rocm-stable-runtime": "v7.2.1",
   "llamacpp": {
     "vulkan": "b8668",
-    "rocm": "b1231",
+    "rocm-stable": "b8653",
+    "rocm-preview": "b1231",
     "metal": "b8460",
     "cpu": "b8668"
   },
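
Splitting the old `rocm` pin into `rocm-stable` and `rocm-preview` implies a lookup keyed off the configured channel. A hedged sketch of that mapping follows; the function and parameter names are invented, since the C++ that consumes this file is not shown in the diff.

```cpp
#include <nlohmann/json.hpp>
#include <string>

// Hypothetical resolver: pick the pinned llama.cpp build tag for the ROCm
// backend from backend_versions.json based on the configured channel.
std::string rocm_llamacpp_version(const nlohmann::json& versions,
                                  const std::string& channel) {
  const char* key = (channel == "stable") ? "rocm-stable" : "rocm-preview";
  return versions.at("llamacpp").at(key).get<std::string>();
}
```

With the values above, the stable channel would resolve to the upstream build `b8653` and any other value to the preview build `b1231`; the new `rocm-stable-runtime` pin (`v7.2.1`) matches the ROCm runtime version that docs/llamacpp.md says the stable channel downloads on Linux.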

src/cpp/resources/defaults.json

Lines changed: 1 addition & 0 deletions
@@ -13,6 +13,7 @@
   "offline": false,
   "disable_model_filtering": false,
   "enable_dgpu_gtt": false,
+  "rocm_channel": "preview",
   "llamacpp": {
     "backend": "auto",
     "args": "",
