feat: add GPU support to metaflow-dev minikube setup #2609
Add optional GPU support for minikube with auto-detection and manual control via MINIKUBE_ENABLE_GPU environment variable.

Features:
- Auto-detect NVIDIA (nvidia-smi) and AMD (rocm-smi) GPUs
- Three modes: auto (default), true (force enable), false (disable)
- Informative messages about GPU detection status
- Updated help text with environment variable documentation

When enabled, adds --gpus all flag to minikube start command, enabling GPU workloads like @resources(gpu=1) in local development.

Fixes Netflix#2606
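The three modes described in this commit message can be sketched in plain shell. This is only an illustration of the described behavior, not the Makefile logic from the PR: `resolve_gpu_flags` is a hypothetical helper name, and treating the presence of `nvidia-smi` or `rocm-smi` on `PATH` as "GPU available" is an assumption drawn from the feature list above.

```shell
# Sketch: map a MINIKUBE_ENABLE_GPU value (auto|true|false) to minikube flags.
# resolve_gpu_flags is a hypothetical name, not something the PR defines.
resolve_gpu_flags() {
  mode="${1:-auto}"
  case "$mode" in
    true)
      # Force-enable regardless of detection.
      echo "--gpus all"
      ;;
    false)
      # Explicitly disabled: emit no flags.
      :
      ;;
    auto)
      # Assumption: a vendor CLI on PATH means a usable GPU is present.
      if command -v nvidia-smi >/dev/null 2>&1 || command -v rocm-smi >/dev/null 2>&1; then
        echo "--gpus all"
      fi
      ;;
    *)
      echo "unknown MINIKUBE_ENABLE_GPU value: $mode" >&2
      return 1
      ;;
  esac
}
```

A caller could then run something like `minikube start $(resolve_gpu_flags "${MINIKUBE_ENABLE_GPU:-auto}")`, leaving the substitution unquoted so that an empty result adds no argument.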
devtools/Makefile
Outdated
ifeq ($(MINIKUBE_ENABLE_GPU), auto)
# Auto-detect GPU availability
ifeq ($(shell command -v nvidia-smi >/dev/null 2>&1 && echo "nvidia"), nvidia)
gpu_flags = --gpus all
Depending on how Docker is configured, --gpus all might not work, and instead you need to use --devices nvidia.com/gpu=all or similar.
(I had this problem with Docker on NixOS)
Thanks for the feedback, I'll update.
Apologies, I may have misled you here. The problem I referred to was with the docker command line, not the minikube command line. They both accept a --gpus argument, but I think minikube is a bit more clever about it. Indeed, --devices doesn't seem to be valid for minikube:
$ minikube start --devices nvidia.com/gpu=all
Error: unknown flag: --devices
My bad, got my wires totally crossed.
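One way to avoid this kind of mix-up is to check a CLI's own help output before wiring a flag into a Makefile. The sketch below is an assumption-laden illustration: `help_mentions_flag` is a hypothetical helper that only does a text match on whatever help text it is given, so it says nothing about whether the flag works on a particular host.

```shell
# Hypothetical helper: succeeds if the help text on stdin mentions the flag
# as a standalone token (followed by '=', ',', whitespace, or end of line).
help_mentions_flag() {
  flag="$1"
  grep -qE -- "(^|[[:space:],])${flag}([=[:space:],]|\$)"
}

# Usage (requires minikube to be installed):
#   minikube start --help | help_mentions_flag --gpus    && echo "--gpus is listed"
#   minikube start --help | help_mentions_flag --devices || echo "--devices is not listed"
```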
Ahhh I see. No worries, I'll update the PR with the correct fix.
Based on maintainer feedback, improve GPU flag handling to address Docker configuration differences:
- Default to --devices nvidia.com/gpu=all for NVIDIA GPUs (more compatible)
- Keep --gpus all for AMD/other GPUs
- Add MINIKUBE_GPU_FLAG environment variable for explicit control:
  * auto: Smart selection based on GPU type (default)
  * gpus: Force --gpus all format
  * devices: Force --devices nvidia.com/gpu=all format
  * custom: User-provided custom flag

This addresses compatibility issues where --gpus all might not work in certain Docker configurations (e.g., Docker on NixOS).

Addresses feedback in Netflix#2606
Thanks for the feedback @feltech! I've updated the implementation to address the Docker compatibility concerns:

Changes Made

🔧 Improved Docker Compatibility:

⚙️ Enhanced Control Options:

Example Usage

# Auto-detect best GPU flag (default)
make setup-minikube
# Force devices format (good for Docker compatibility issues)
MINIKUBE_GPU_FLAG=devices make setup-minikube
# Force legacy gpus format
MINIKUBE_GPU_FLAG=gpus make setup-minikube
# Custom GPU specification
MINIKUBE_GPU_FLAG="--devices nvidia.com/gpu=2" make setup-minikube

This should resolve the Docker configuration compatibility issues while maintaining flexibility for different setups. Let me know if this addresses your concerns!
…l

- Revert previous change introducing MINIKUBE_GPU_FLAG and --devices
- minikube does not accept --devices; keep simple --gpus all when GPU detected or forced

Acknowledges review: the docker CLI concern does not apply to minikube.
Thanks for the clarification, and you're absolutely right.

Summary of changes:

If you want me to also document the separate Docker CLI considerations (for folks not using
Summary
- … MINIKUBE_ENABLE_GPU environment variable
- … @resources(gpu=1) in local development environments

Changes Made
- … NVIDIA (nvidia-smi) and AMD (rocm-smi) GPUs automatically
- … MINIKUBE_ENABLE_GPU=auto|true|false (default: auto)
- … --gpus all to minikube start only when appropriate

Modes of Operation
- auto (default): Automatically detects GPU availability and enables if found
- true: Force enables GPU support regardless of detection
- false: Explicitly disables GPU support

Test Plan
- … make help
- … make -n setup-minikube (shows no GPU detected message)
- … MINIKUBE_ENABLE_GPU=true (correctly adds --gpus all flag)

Example Usage
Fixes #2606