Feature: Add integrated GPU support via torch-directml (Intel/AMD iGPU on Windows) #270

@RajeshKumar11

Description

Summary

AirLLM currently hardcodes CUDA (NVIDIA) throughout its codebase, making it unusable on machines with only integrated GPUs (Intel UHD / Iris Xe / Arc, AMD Radeon integrated). This issue proposes adding support for Intel and AMD integrated GPUs on Windows via torch-directml.

Problem

CUDA-specific calls throughout the codebase prevent non-NVIDIA users from running AirLLM (an illustrative before/after follows the list):

  • torch.cuda.empty_cache() in utils.py
  • v.cuda() in compress_layer_state_dict / uncompress_layer_state_dict
  • device.startswith("cuda") gating prefetch stream in airllm_base.py
  • torch.cuda.is_available() gating pin_memory in airllm_base.py
  • torch.cuda.mem_get_info() in profiler.py
  • Default device="cuda:0" with no fallback
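
As a concrete illustration, here is a hypothetical before/after for one of these call sites; the empty_cache helper and the airllm.device_utils module path are the ones proposed below, not existing code:

# Before (utils.py): hardcoded, breaks on builds without CUDA support
import torch
torch.cuda.empty_cache()

# After: a device-agnostic helper dispatches on the actual target device
from airllm.device_utils import empty_cache  # hypothetical module path
empty_cache(device)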

This affects a large number of users: laptops and budget desktops with no discrete GPU are extremely common.

Proposed Solution

  1. New device_utils.py module with device-agnostic helpers (sketched after this list):

    • get_device_type(device) → "cuda" | "directml" | "mps" | "cpu"
    • empty_cache(device) — safe for any device
    • can_pin_memory(device) — only true for CUDA
    • supports_bitsandbytes(device) — only true for CUDA (guards compression)
    • is_directml_available() — detects torch-directml install
  2. Replace all CUDA-specific calls in utils.py, airllm_base.py, profiler.py with device-agnostic equivalents.

  3. Clear error message when a user passes compression= on a non-CUDA device (bitsandbytes is NVIDIA-only), instead of a confusing crash.

  4. Optional dependency: torch-directml added as an extra, installable via pip install airllm[directml].
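
A minimal sketch of what device_utils.py could look like. The helper names follow the list above; torch_directml.device_count() and torch.mps.empty_cache() are real APIs, but everything else is an assumption about how AirLLM would wire this in:

import gc
import torch

try:
    import torch_directml  # optional; only present when the extra is installed
    _HAS_DIRECTML = True
except ImportError:
    _HAS_DIRECTML = False


def is_directml_available():
    """True when torch-directml is installed and exposes at least one device."""
    return _HAS_DIRECTML and torch_directml.device_count() > 0


def get_device_type(device):
    """Map a device string to "cuda" | "directml" | "mps" | "cpu"."""
    device = str(device)
    if device.startswith("cuda"):
        return "cuda"
    if device.startswith("privateuseone"):  # DirectML registers as privateuseone
        return "directml"
    if device.startswith("mps"):
        return "mps"
    return "cpu"


def empty_cache(device):
    """Release cached memory; safe to call for any device type."""
    gc.collect()
    device_type = get_device_type(device)
    if device_type == "cuda":
        torch.cuda.empty_cache()
    elif device_type == "mps":
        torch.mps.empty_cache()
    # DirectML exposes no public cache-flush API; gc.collect() is the fallback.


def can_pin_memory(device):
    """Pinned host memory only accelerates CUDA host-to-device copies."""
    return get_device_type(device) == "cuda" and torch.cuda.is_available()


def supports_bitsandbytes(device):
    """bitsandbytes ships CUDA-only kernels, so compression needs NVIDIA."""
    return get_device_type(device) == "cuda"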

Hardware Tested

  • GPU: Intel Iris Xe Graphics (integrated)
  • OS: Windows 11
  • Python: 3.12.0, PyTorch 2.4.1, torch-directml 0.2.5

Verified:

  • ✅ Device detection works correctly
  • ✅ Tensor operations (matmul 1024×1024) run on iGPU via DirectML (snippet below)
  • ✅ Layer load → move to iGPU → unload cycle works
  • ✅ clean_memory() / empty_cache() safe on DirectML
  • ✅ Compression correctly blocked with clear error on non-CUDA
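
For reference, the matmul check was along these lines (a sketch; torch_directml.device() is the library's real entry point, the rest is plain PyTorch):

import torch
import torch_directml

dml = torch_directml.device()  # default DirectML device, i.e. privateuseone:0

a = torch.randn(1024, 1024, device=dml)
b = torch.randn(1024, 1024, device=dml)
c = a @ b                      # executes on the iGPU via DirectML
print(c.device)                # privateuseone:0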

What Still Requires CUDA

bitsandbytes (used for 4-bit/8-bit compression) only supports NVIDIA GPUs. Compression is disabled for DirectML/MPS devices with a clear ValueError. This is an acceptable limitation and is documented.
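
The guard could look roughly like this (a sketch; the call site inside AirLLM and the exact message are assumptions, and supports_bitsandbytes is the helper proposed above):

# Hypothetical placement in model initialization
if compression is not None and not supports_bitsandbytes(device):
    raise ValueError(
        f"compression='{compression}' requires an NVIDIA GPU: bitsandbytes "
        "only ships CUDA kernels. Remove compression= to run on "
        "DirectML/MPS/CPU."
    )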

Usage After This Change

# Install
pip install torch-directml
pip install airllm[directml]

# Run on Intel/AMD integrated GPU
from airllm import AutoModel

model = AutoModel.from_pretrained(
    "your-model-id",
    device="privateuseone:0",  # Intel/AMD iGPU via DirectML
    # compression= must not be used on iGPU
)
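
Instead of hardcoding the device string, torch-directml can also supply it (assuming from_pretrained accepts the stringified device, as in the example above):

from airllm import AutoModel
import torch_directml

model = AutoModel.from_pretrained(
    "your-model-id",
    device=str(torch_directml.device()),  # resolves to "privateuseone:0"
)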

Are you open to a PR?

I have a working implementation ready. Happy to open a PR if this direction is acceptable to the maintainers. I can also extend this to support ROCm (AMD discrete) and Intel IPEX if useful.
