Summary
AirLLM currently hardcodes CUDA (NVIDIA) throughout its codebase, making it unusable on machines with only integrated GPUs (Intel UHD / Iris Xe / Arc, AMD Radeon integrated). This issue proposes adding support for Intel and AMD integrated GPUs on Windows via torch-directml.
Problem
Every CUDA-specific call in the codebase prevents non-NVIDIA users from running AirLLM:
- `torch.cuda.empty_cache()` in `utils.py`
- `v.cuda()` in `compress_layer_state_dict` / `uncompress_layer_state_dict`
- `device.startswith("cuda")` gating the prefetch stream in `airllm_base.py`
- `torch.cuda.is_available()` gating `pin_memory` in `airllm_base.py`
- `torch.cuda.mem_get_info()` in `profiler.py`
- Default `device="cuda:0"` with no fallback
This affects a large number of users — laptops and budget desktops with no discrete GPU are extremely common.
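For illustration, the first hardcoded call a non-NVIDIA user hits fails like this (assumed repro with a CPU-only PyTorch wheel; the exact traceback depends on the installed build):

```python
# Assumed repro on a machine without CUDA (CPU-only PyTorch wheel).
import torch

t = torch.randn(4)
t.cuda()  # AssertionError: Torch not compiled with CUDA enabled
```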
Proposed Solution
- New `device_utils.py` module with device-agnostic helpers (see the sketch after this list):
  - `get_device_type(device)` → `"cuda"` | `"directml"` | `"mps"` | `"cpu"`
  - `empty_cache(device)`: safe on any device
  - `can_pin_memory(device)`: true only for CUDA
  - `supports_bitsandbytes(device)`: true only for CUDA (guards compression)
  - `is_directml_available()`: detects a torch-directml install
- Replace all CUDA-specific calls in `utils.py`, `airllm_base.py`, and `profiler.py` with device-agnostic equivalents.
- Raise a clear error when the user passes `compression=` on a non-CUDA device (bitsandbytes is NVIDIA-only), rather than a confusing crash.
- Optional dependency: `torch-directml` is added as `pip install airllm[directml]`.
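A minimal sketch of what `device_utils.py` could look like. The helper names come from this proposal; the bodies below are illustrative assumptions, not the final implementation:

```python
# Sketch of the proposed device_utils.py; implementation details are assumptions.
import torch


def is_directml_available() -> bool:
    # torch-directml is optional; a missing package just means "not available".
    try:
        import torch_directml  # noqa: F401
        return True
    except ImportError:
        return False


def get_device_type(device) -> str:
    # torch-directml exposes its devices under PyTorch's "privateuseone" backend;
    # "dml" is accepted here as a convenience alias (an assumption of this sketch).
    device = str(device)
    if device.startswith("cuda"):
        return "cuda"
    if device.startswith("privateuseone") or device.startswith("dml"):
        return "directml"
    if device.startswith("mps"):
        return "mps"
    return "cpu"


def empty_cache(device) -> None:
    # Only CUDA and MPS expose an allocator cache to flush; elsewhere this is a
    # no-op, which is what makes clean_memory() safe on DirectML.
    device_type = get_device_type(device)
    if device_type == "cuda" and torch.cuda.is_available():
        torch.cuda.empty_cache()
    elif device_type == "mps":
        torch.mps.empty_cache()


def can_pin_memory(device) -> bool:
    # Pinned (page-locked) host memory only benefits CUDA host-to-device copies.
    return get_device_type(device) == "cuda" and torch.cuda.is_available()


def supports_bitsandbytes(device) -> bool:
    # bitsandbytes 4-bit/8-bit kernels are NVIDIA-only.
    return get_device_type(device) == "cuda"
```

Call sites then swap `torch.cuda.empty_cache()` for `empty_cache(device)` and gate `pin_memory` on `can_pin_memory(device)` instead of `torch.cuda.is_available()`.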
Hardware Tested
- GPU: Intel Iris Xe Graphics (integrated)
- OS: Windows 11
- Python: 3.12.0, PyTorch 2.4.1, torch-directml 0.2.5
Verified:
- ✅ Device detection works correctly
- ✅ Tensor operations (1024×1024 matmul) run on the iGPU via DirectML (see the smoke test below)
- ✅ Layer load → move to iGPU → unload cycle works
- ✅ `clean_memory()` / `empty_cache()` safe on DirectML
- ✅ Compression correctly blocked with a clear error on non-CUDA devices
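For reference, the matmul check above amounts to something like this (assumes torch-directml is installed; `torch_directml.device()` returns the DirectML handle, which PyTorch reports as `privateuseone:0`):

```python
# Smoke test: 1024×1024 matmul on the integrated GPU via DirectML.
import torch
import torch_directml

dml = torch_directml.device()  # first DirectML adapter, e.g. the Intel iGPU
a = torch.randn(1024, 1024, device=dml)
b = torch.randn(1024, 1024, device=dml)
c = a @ b  # executes on the iGPU
print(c.device)  # privateuseone:0
```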
What Still Requires CUDA
`bitsandbytes` (used for 4-bit/8-bit compression) only supports NVIDIA GPUs. Compression is disabled for DirectML/MPS devices with a clear `ValueError`. This is an acceptable limitation and is documented.
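A sketch of that guard (hypothetical function name and error wording; `supports_bitsandbytes()` is from the `device_utils.py` sketch above):

```python
# Hypothetical guard run during model init; exact wording/placement is up to the PR.
from device_utils import supports_bitsandbytes  # see the sketch above


def check_compression_supported(compression, device: str) -> None:
    if compression is not None and not supports_bitsandbytes(device):
        raise ValueError(
            f"compression={compression!r} requires an NVIDIA GPU "
            f"(bitsandbytes is CUDA-only), but device={device!r}. "
            "Remove compression= to run on DirectML/MPS/CPU."
        )
```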
Usage After This Change
```bash
# Install
pip install torch-directml
pip install airllm[directml]
```

```python
# Run on an Intel/AMD integrated GPU
from airllm import AutoModel

model = AutoModel.from_pretrained(
    "your-model-id",
    device="privateuseone:0",  # Intel/AMD iGPU via DirectML
    # compression= must not be used on an iGPU
)
```
Are you open to a PR?
I have a working implementation ready. Happy to open a PR if this direction is acceptable to the maintainers. I can also extend this to support ROCm (AMD discrete) and Intel IPEX if useful.