feat: add GPU support to metaflow-dev minikube setup #2609
Add optional GPU support for minikube with auto-detection and manual control via MINIKUBE_ENABLE_GPU environment variable.

Features:
- Auto-detect NVIDIA (nvidia-smi) and AMD (rocm-smi) GPUs
- Three modes: auto (default), true (force enable), false (disable)
- Informative messages about GPU detection status
- Updated help text with environment variable documentation

When enabled, adds --gpus all flag to minikube start command, enabling GPU workloads like @resources(gpu=1) in local development.

Fixes Netflix#2606
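The three modes described above can be sketched in plain shell. This is only an illustration of the described behaviour, not the code in devtools/Makefile: the function name resolve_gpu_flags and the error message are hypothetical, and the sketch prints the resulting command instead of running minikube.

```shell
#!/bin/sh
# Hypothetical helper (not from devtools/Makefile): print the extra flags
# for `minikube start` given a MINIKUBE_ENABLE_GPU mode.
resolve_gpu_flags() {
    mode="${1:-auto}"
    case "$mode" in
        true)
            # Force-enable, regardless of what is detected.
            echo "--gpus all"
            ;;
        false)
            # Explicitly disabled: no extra flags.
            echo ""
            ;;
        auto)
            # Enable only if a vendor tool is found on PATH.
            if command -v nvidia-smi >/dev/null 2>&1 \
                || command -v rocm-smi >/dev/null 2>&1; then
                echo "--gpus all"
            else
                echo ""
            fi
            ;;
        *)
            echo "unknown MINIKUBE_ENABLE_GPU value: $mode" >&2
            return 1
            ;;
    esac
}

gpu_flags="$(resolve_gpu_flags "${MINIKUBE_ENABLE_GPU:-auto}")"
# Print the command rather than run it, so the sketch has no side effects.
echo "minikube start $gpu_flags"
```

Note that finding nvidia-smi or rocm-smi on PATH only shows that the tool is installed; it does not prove that a usable GPU or container runtime integration is present.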
devtools/Makefile
Outdated
ifeq ($(MINIKUBE_ENABLE_GPU), auto)
# Auto-detect GPU availability
ifeq ($(shell command -v nvidia-smi >/dev/null 2>&1 && echo "nvidia"), nvidia)
gpu_flags = --gpus all
Depending on how Docker is configured, --gpus all might not work, and instead you need to use --devices nvidia.com/gpu=all or similar.
(I had this problem with Docker on NixOS)
Thanks for the feedback, I'll update.
Apologies, I may have misled you here, the problem I referred to was using the docker command-line, rather than the minikube command-line. They both accept a --gpus argument, but I think minikube is a bit more clever about it. Indeed, --devices doesn't seem to be valid for minikube:
$ minikube start --devices nvidia.com/gpu=all
Error: unknown flag: --devices
My bad, got my wires totally crossed.
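One way to catch this kind of mix-up early is to check a CLI's help output before relying on a flag. The sketch below assumes that minikube start --help lists its supported flags; the helper name has_flag is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the help text on stdin mentions the flag.
has_flag() {
    # -q: no output, exit status only
    # -w: whole-word match, so "--device" does not match "--devices"
    # -F: treat the flag as a fixed string, not a regex
    # -e: needed because the pattern itself starts with "-"
    grep -q -w -F -e "$1"
}

# Only meaningful when minikube is installed; otherwise nothing is printed.
if command -v minikube >/dev/null 2>&1; then
    for flag in --gpus --devices; do
        if minikube start --help 2>/dev/null | has_flag "$flag"; then
            echo "minikube start accepts $flag"
        else
            echo "minikube start does not list $flag"
        fi
    done
fi
```

The same helper can be pointed at the docker CLI's help output, which is where the two tools diverge.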
Ahhh I see. No worries, I'll update the PR with the correct fix.
Based on maintainer feedback, improve GPU flag handling to address Docker configuration differences:
- Default to --devices nvidia.com/gpu=all for NVIDIA GPUs (more compatible)
- Keep --gpus all for AMD/other GPUs
- Add MINIKUBE_GPU_FLAG environment variable for explicit control:
  * auto: Smart selection based on GPU type (default)
  * gpus: Force --gpus all format
  * devices: Force --devices nvidia.com/gpu=all format
  * custom: User-provided custom flag

This addresses compatibility issues where --gpus all might not work in certain Docker configurations (e.g., Docker on NixOS).

Addresses feedback in Netflix#2606
Thanks for the feedback @feltech! I've updated the implementation to address the Docker compatibility concerns:

Changes Made

🔧 Improved Docker Compatibility:
⚙️ Enhanced Control Options:

Example Usage

# Auto-detect best GPU flag (default)
make setup-minikube
# Force devices format (good for Docker compatibility issues)
MINIKUBE_GPU_FLAG=devices make setup-minikube
# Force legacy gpus format
MINIKUBE_GPU_FLAG=gpus make setup-minikube
# Custom GPU specification
MINIKUBE_GPU_FLAG="--devices nvidia.com/gpu=2" make setup-minikube

This should resolve the Docker configuration compatibility issues while maintaining flexibility for different setups. Let me know if this addresses your concerns!
…l

- Revert previous change introducing MINIKUBE_GPU_FLAG and --devices
- minikube does not accept --devices; keep simple --gpus all when GPU detected or forced

Acknowledges review: the docker CLI concern does not apply to minikube.
Thanks for the clarification, and you're absolutely right — Summary of changes:
If you want me to also document the separate Docker CLI considerations (for folks not using
Summary
- Adds optional GPU support to the minikube setup, controlled via the MINIKUBE_ENABLE_GPU environment variable
- Enables GPU workloads like @resources(gpu=1) in local development environments

Changes Made
- Detects NVIDIA (nvidia-smi) and AMD (rocm-smi) GPUs automatically
- Adds MINIKUBE_ENABLE_GPU=auto|true|false (default: auto)
- Adds --gpus all to minikube start only when appropriate

Modes of Operation
- auto (default): Automatically detects GPU availability and enables GPU support if found
- true: Force-enables GPU support regardless of detection
- false: Explicitly disables GPU support

Test Plan
- make help
- make -n setup-minikube (shows no GPU detected message)
- MINIKUBE_ENABLE_GPU=true (correctly adds --gpus all flag)

Example Usage
Fixes #2606