cnaples79

Summary

  • Add optional GPU support for minikube in metaflow-dev with intelligent auto-detection
  • Provide manual control via MINIKUBE_ENABLE_GPU environment variable
  • Enable GPU workloads like @resources(gpu=1) in local development environments

Changes Made

  • Auto-detection logic: Detects NVIDIA (nvidia-smi) and AMD (rocm-smi) GPUs automatically
  • Environment variable control: MINIKUBE_ENABLE_GPU=auto|true|false (default: auto)
  • User feedback: Informative messages about GPU detection status during startup
  • Help documentation: Updated help text with environment variable usage
  • Conditional flag addition: Adds --gpus all to minikube start only when appropriate
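
The detection step described above could look roughly like the following shell sketch. It is not the code from this PR: `detect_gpu_vendor` and the message wording are hypothetical, and the check only confirms that the vendor tool is on `PATH`, not that a GPU is actually usable.

```shell
#!/bin/sh
# Hypothetical helper: report which GPU vendor tool is found on PATH.
# It only checks that the tool exists, not that a working GPU is present.
detect_gpu_vendor() {
    if command -v nvidia-smi >/dev/null 2>&1; then
        echo "nvidia"
    elif command -v rocm-smi >/dev/null 2>&1; then
        echo "amd"
    else
        echo "none"
    fi
}

# Illustrative startup feedback (wording is an assumption, not the PR's text).
vendor=$(detect_gpu_vendor)
case "$vendor" in
    none) echo "No GPU tooling detected; starting minikube without GPU support" ;;
    *)    echo "Detected $vendor GPU tooling; adding --gpus all" ;;
esac
```

Checking for the tool rather than querying the device keeps the check cheap, at the cost of false positives on machines where the driver tools are installed but no GPU is attached.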

Modes of Operation

  1. auto (default): Automatically detects GPU availability and enables if found
  2. true: Force enables GPU support regardless of detection
  3. false: Explicitly disables GPU support
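
As a rough illustration of how the three modes map to the flag, here is a shell sketch. The function name `resolve_gpu_flags` and its arguments are hypothetical, and rejecting unknown values is an assumption; the PR does not say how other values are handled.

```shell
#!/bin/sh
# Hypothetical helper mirroring the three modes listed above.
# $1: mode (auto|true|false); $2: "yes" if GPU tooling was detected, else "no".
resolve_gpu_flags() {
    mode=${1:-auto}
    detected=${2:-no}
    case "$mode" in
        true)  echo "--gpus all" ;;   # force-enable regardless of detection
        false) echo "" ;;             # explicitly disabled
        auto)
            if [ "$detected" = "yes" ]; then
                echo "--gpus all"
            else
                echo ""
            fi ;;
        *)
            # Assumption: unknown values are rejected rather than ignored.
            echo "unknown MINIKUBE_ENABLE_GPU value: $mode" >&2
            return 1 ;;
    esac
}

# Show what would be appended to "minikube start" in forced mode.
echo "minikube start $(resolve_gpu_flags true no)"
```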

Test Plan

  • ✅ Verified Makefile syntax with make help
  • ✅ Tested dry-run with make -n setup-minikube (shows the "no GPU detected" message)
  • ✅ Tested forced enable with MINIKUBE_ENABLE_GPU=true (correctly adds --gpus all flag)
  • ✅ Confirmed help text displays new environment variable documentation

Example Usage

# Auto-detect GPU (default behavior)
make setup-minikube

# Force enable GPU support
MINIKUBE_ENABLE_GPU=true make setup-minikube

# Explicitly disable GPU support
MINIKUBE_ENABLE_GPU=false make setup-minikube

Fixes #2606

Add optional GPU support for minikube with auto-detection and manual
control via MINIKUBE_ENABLE_GPU environment variable.

Features:
- Auto-detect NVIDIA (nvidia-smi) and AMD (rocm-smi) GPUs
- Three modes: auto (default), true (force enable), false (disable)
- Informative messages about GPU detection status
- Updated help text with environment variable documentation

When enabled, adds --gpus all flag to minikube start command, enabling
GPU workloads like @resources(gpu=1) in local development.

Fixes Netflix#2606

ifeq ($(MINIKUBE_ENABLE_GPU), auto)
# Auto-detect GPU availability
ifeq ($(shell command -v nvidia-smi >/dev/null 2>&1 && echo "nvidia"), nvidia)
gpu_flags = --gpus all

@feltech feltech Sep 17, 2025

Depending on how Docker is configured, --gpus all might not work, and instead you need to use --devices nvidia.com/gpu=all or similar.

(I had this problem with Docker on NixOS)

Author

Thanks for the feedback, I'll update.

Apologies, I may have misled you here, the problem I referred to was using the docker command-line, rather than the minikube command-line. They both accept a --gpus argument, but I think minikube is a bit more clever about it. Indeed, --devices doesn't seem to be valid for minikube:

$ minikube start --devices nvidia.com/gpu=all
Error: unknown flag: --devices

My bad, got my wires totally crossed.
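
For anyone wanting to check this locally before wiring a flag into the Makefile, a minimal sketch follows. `supports_flag` is a hypothetical helper, and it assumes the tool lists its flags in `--help` output; it is a plain substring match, so treat the result as a hint rather than proof.

```shell
#!/bin/sh
# Hypothetical helper: succeed if "<command> <subcommand> --help" mentions the flag.
# Assumes flags are listed in the help text; uses a fixed-string substring match.
supports_flag() {
    cmd=$1
    sub=$2
    flag=$3
    "$cmd" "$sub" --help 2>&1 | grep -qF -e "$flag"
}

# Only probe minikube if it is installed.
if command -v minikube >/dev/null 2>&1; then
    for flag in --gpus --devices; do
        if supports_flag minikube start "$flag"; then
            echo "minikube start lists $flag"
        else
            echo "minikube start does not list $flag"
        fi
    done
fi
```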

Author

Ahhh I see. No worries, I'll update the PR with the correct fix.

Based on maintainer feedback, improve GPU flag handling to address
Docker configuration differences:

- Default to --devices nvidia.com/gpu=all for NVIDIA GPUs (more compatible)
- Keep --gpus all for AMD/other GPUs
- Add MINIKUBE_GPU_FLAG environment variable for explicit control:
  * auto: Smart selection based on GPU type (default)
  * gpus: Force --gpus all format
  * devices: Force --devices nvidia.com/gpu=all format
  * custom: User-provided custom flag

This addresses compatibility issues where --gpus all might not work
in certain Docker configurations (e.g., Docker on NixOS).

Addresses feedback in Netflix#2606
@cnaples79
Author

Thanks for the feedback @feltech! I've updated the implementation to address the Docker compatibility concerns:

Changes Made

🔧 Improved Docker Compatibility:

  • Default to --devices nvidia.com/gpu=all for NVIDIA GPUs (more compatible with different Docker configurations)
  • Keep --gpus all for AMD/other GPUs
  • This addresses the NixOS Docker issue you mentioned

⚙️ Enhanced Control Options:
Added MINIKUBE_GPU_FLAG environment variable for explicit control:

  • auto (default): Smart selection based on GPU type
  • gpus: Force --gpus all format
  • devices: Force --devices nvidia.com/gpu=all format
  • Custom value: User-provided flag (e.g., --devices nvidia.com/gpu=2)

Example Usage

# Auto-detect best GPU flag (default)
make setup-minikube

# Force devices format (good for Docker compatibility issues)
MINIKUBE_GPU_FLAG=devices make setup-minikube

# Force legacy gpus format
MINIKUBE_GPU_FLAG=gpus make setup-minikube

# Custom GPU specification
MINIKUBE_GPU_FLAG="--devices nvidia.com/gpu=2" make setup-minikube

This should resolve the Docker configuration compatibility issues while maintaining flexibility for different setups. Let me know if this addresses your concerns!

…l

- Revert previous change introducing MINIKUBE_GPU_FLAG and --devices
- minikube does not accept --devices; keep simple --gpus all when GPU detected or forced

Acknowledges review: the docker CLI concern does not apply to minikube.
@cnaples79
Author

Thanks for the clarification, and you're absolutely right: minikube doesn't support --devices. I've updated the PR to remove the --devices path and always pass --gpus all to minikube start when a GPU is detected or forced via MINIKUBE_ENABLE_GPU=true.

Summary of changes:

  • Remove MINIKUBE_GPU_FLAG and the --devices nvidia.com/gpu=all path
  • Keep simple/valid --gpus all for minikube
  • Preserve auto-detection and MINIKUBE_ENABLE_GPU env var controls

If you want me to also document the separate Docker CLI considerations (for folks not using minikube), I can add a short note in the devtools help.

Development

Successfully merging this pull request may close these issues.

metaflow-dev: consider adding --gpus all to minikube start