feat: add GPU support to metaflow-dev minikube setup #2609

cnaples79 · 2025-09-17T01:57:11Z

Summary

Add optional GPU support for minikube in metaflow-dev with intelligent auto-detection
Provide manual control via MINIKUBE_ENABLE_GPU environment variable
Enable GPU workloads like @resources(gpu=1) in local development environments

Changes Made

Auto-detection logic: Detects NVIDIA (nvidia-smi) and AMD (rocm-smi) GPUs automatically
Environment variable control: MINIKUBE_ENABLE_GPU=auto|true|false (default: auto)
User feedback: Informative messages about GPU detection status during startup
Help documentation: Updated help text with environment variable usage
Conditional flag addition: Adds --gpus all to minikube start only when appropriate

Modes of Operation

auto (default): Automatically detects GPU availability and enables if found
true: Force enables GPU support regardless of detection
false: Explicitly disables GPU support
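
For readers who want to see the three modes as control flow, here is a minimal shell sketch. It is not the code in devtools/Makefile: the function name resolve_gpu_flags is hypothetical, and the sketch assumes that the presence of nvidia-smi or rocm-smi on PATH is the whole detection test, as the description above suggests.

```shell
# Sketch only: resolve_gpu_flags is a hypothetical name, not part of the PR.
# Prints the extra flags for `minikube start` given a MINIKUBE_ENABLE_GPU value.
resolve_gpu_flags() {
  mode="${1:-auto}"   # an unset or empty value falls back to auto
  case "$mode" in
    true)
      # Force enable, regardless of detection
      echo "--gpus all"
      ;;
    false)
      # Explicitly disabled: no extra flags
      echo ""
      ;;
    auto)
      # Enable only if an NVIDIA or AMD management tool is on PATH
      if command -v nvidia-smi >/dev/null 2>&1 || command -v rocm-smi >/dev/null 2>&1; then
        echo "--gpus all"
      else
        echo ""
      fi
      ;;
    *)
      echo "unknown MINIKUBE_ENABLE_GPU value: $mode" >&2
      return 1
      ;;
  esac
}
```

A caller could then run something like `minikube start $(resolve_gpu_flags "$MINIKUBE_ENABLE_GPU")`. Rejecting unknown values is a choice made in this sketch; the PR does not say how the Makefile treats them.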

Test Plan

✅ Verified Makefile syntax with make help
✅ Tested dry-run with make -n setup-minikube (shows no GPU detected message)
✅ Tested forced enable with MINIKUBE_ENABLE_GPU=true (correctly adds --gpus all flag)
✅ Confirmed help text displays new environment variable documentation
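
The dry-run checks above can be scripted rather than read by eye. The helper below is a hypothetical sketch (dry_run_has_gpu_flag is not part of the PR). It assumes the `make -n` output contains the literal text `--gpus all` only when the flag would be passed to minikube start; if the informational messages also mention the flag, the pattern would need tightening.

```shell
# Hypothetical helper: reads dry-run output on stdin and succeeds
# if it contains the --gpus all flag.
dry_run_has_gpu_flag() {
  # `--` stops grep from parsing the pattern as an option
  grep -q -- '--gpus all'
}

# Example invocations (run where the devtools Makefile lives):
#   MINIKUBE_ENABLE_GPU=true  make -n setup-minikube | dry_run_has_gpu_flag && echo "GPU flag present"
#   MINIKUBE_ENABLE_GPU=false make -n setup-minikube | dry_run_has_gpu_flag || echo "GPU flag absent"
```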

Example Usage

# Auto-detect GPU (default behavior)
make setup-minikube

# Force enable GPU support
MINIKUBE_ENABLE_GPU=true make setup-minikube

# Explicitly disable GPU support  
MINIKUBE_ENABLE_GPU=false make setup-minikube

Fixes #2606

Add optional GPU support for minikube with auto-detection and manual control via MINIKUBE_ENABLE_GPU environment variable.

Features:
- Auto-detect NVIDIA (nvidia-smi) and AMD (rocm-smi) GPUs
- Three modes: auto (default), true (force enable), false (disable)
- Informative messages about GPU detection status
- Updated help text with environment variable documentation

When enabled, adds --gpus all flag to minikube start command, enabling GPU workloads like @resources(gpu=1) in local development.

Fixes Netflix#2606

feltech · 2025-09-17T10:44:43Z

devtools/Makefile

+ifeq ($(MINIKUBE_ENABLE_GPU), auto)
+	# Auto-detect GPU availability
+	ifeq ($(shell command -v nvidia-smi >/dev/null 2>&1 && echo "nvidia"), nvidia)
+		gpu_flags = --gpus all


Depending on how Docker is configured, --gpus all might not work, and instead you need to use --devices nvidia.com/gpu=all or similar.

(I had this problem with Docker on NixOS)

Thanks for the feedback, I'll update.

Apologies, I may have misled you here, the problem I referred to was using the docker command-line, rather than the minikube command-line. They both accept a --gpus argument, but I think minikube is a bit more clever about it. Indeed, --devices doesn't seem to be valid for minikube:

$ minikube start --devices nvidia.com/gpu=all
Error: unknown flag: --devices

My bad, got my wires totally crossed.
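
One way to avoid this kind of mix-up is to check a flag against the tool's own help output before wiring it into the Makefile. The sketch below is an illustration, not something from the PR: help_lists_flag is a hypothetical helper, and it assumes the help text lists each flag as a whitespace-delimited token such as `--gpus=...`.

```shell
# Hypothetical helper: succeeds if the help text on stdin documents the given flag.
help_lists_flag() {
  flag="$1"
  # Match the flag as a whole token: preceded by start-of-line or whitespace,
  # followed by end-of-line or a character that cannot continue a flag name.
  grep -Eq -e "(^|[[:space:]])${flag}([^[:alnum:]-]|\$)"
}

# Example invocations:
#   minikube start --help | help_lists_flag --gpus    && echo "supported"
#   minikube start --help | help_lists_flag --devices || echo "not supported"
```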

Ahhh I see. No worries, I'll update the PR with the correct fix.

Based on maintainer feedback, improve GPU flag handling to address Docker configuration differences:
- Default to --devices nvidia.com/gpu=all for NVIDIA GPUs (more compatible)
- Keep --gpus all for AMD/other GPUs
- Add MINIKUBE_GPU_FLAG environment variable for explicit control:
  * auto: Smart selection based on GPU type (default)
  * gpus: Force --gpus all format
  * devices: Force --devices nvidia.com/gpu=all format
  * custom: User-provided custom flag

This addresses compatibility issues where --gpus all might not work in certain Docker configurations (e.g., Docker on NixOS).

Addresses feedback in Netflix#2606

cnaples79 · 2025-09-17T12:25:41Z

Thanks for the feedback @feltech! I've updated the implementation to address the Docker compatibility concerns:

Changes Made

🔧 Improved Docker Compatibility:

Default to --devices nvidia.com/gpu=all for NVIDIA GPUs (more compatible with different Docker configurations)
Keep --gpus all for AMD/other GPUs
This addresses the NixOS Docker issue you mentioned

⚙️ Enhanced Control Options:
Added MINIKUBE_GPU_FLAG environment variable for explicit control:

auto (default): Smart selection based on GPU type
gpus: Force --gpus all format
devices: Force --devices nvidia.com/gpu=all format
Custom value: User-provided flag (e.g., --devices nvidia.com/gpu=2)

Example Usage

# Auto-detect best GPU flag (default)
make setup-minikube

# Force devices format (good for Docker compatibility issues)
MINIKUBE_GPU_FLAG=devices make setup-minikube

# Force legacy gpus format
MINIKUBE_GPU_FLAG=gpus make setup-minikube

# Custom GPU specification
MINIKUBE_GPU_FLAG="--devices nvidia.com/gpu=2" make setup-minikube

This should resolve the Docker configuration compatibility issues while maintaining flexibility for different setups. Let me know if this addresses your concerns!

…l

- Revert previous change introducing MINIKUBE_GPU_FLAG and --devices
- minikube does not accept --devices; keep simple --gpus all when GPU detected or forced

Acknowledges review: the docker CLI concern does not apply to minikube.

cnaples79 · 2025-09-17T21:35:24Z

Thanks for the clarification, and you're absolutely right: minikube doesn't support --devices. I've updated the PR to remove the --devices path and always pass --gpus all to minikube start when a GPU is detected or forced via MINIKUBE_ENABLE_GPU=true.

Summary of changes:

Remove MINIKUBE_GPU_FLAG and the --devices nvidia.com/gpu=all path
Keep simple/valid --gpus all for minikube
Preserve auto-detection and MINIKUBE_ENABLE_GPU env var controls

If you want me to also document the separate Docker CLI considerations (for folks not using minikube), I can add a short note in the devtools help.

feltech reviewed Sep 17, 2025
