Add vLLM UBI Dockerfile for Gaudi with RHEL 9.6 #686
base: main
Conversation
GAUDISW-242243

Multi-stage build: gaudi-base → gaudi-pytorch → vllm-final

Build arguments:
- SYNAPSE_VERSION: Habana Synapse AI version (default: 1.22.1)
- PT_VERSION: PyTorch version (default: 2.7.1)
- VLLM_GAUDI_COMMIT: vllm-gaudi git commit/tag (default: main)
- VLLM_PROJECT_COMMIT: vllm upstream commit (auto-detected if empty)
- TORCH_TYPE: PyTorch type, 'upstream' or 'fork' (default: upstream)

Usage:
docker build --build-arg SYNAPSE_VERSION=1.23.0 -t vllm-gaudi:1.23.0 .
Pull request overview
This PR adds a new multi-stage Dockerfile for building vLLM with Habana Gaudi support on RHEL 9.6 UBI. The Dockerfile enables building vLLM container images with configurable Synapse AI and PyTorch versions for Habana's Gaudi accelerators.
Key Changes:
- Implements a three-stage build process: gaudi-base (system setup + Habana drivers), gaudi-pytorch (PyTorch installation), and vllm-final (vLLM installation and configuration)
- Adds flexible build arguments for version management (SYNAPSE_VERSION, PT_VERSION, VLLM_GAUDI_COMMIT, VLLM_PROJECT_COMMIT, TORCH_TYPE)
- Configures OpenShift-compatible non-root user setup for enterprise deployments
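The three-stage layout described above can be sketched as follows. This is a hedged illustration of the structure only: the stage bodies, base image tag, and comments are placeholders, not the actual Dockerfile contents from this PR.

```dockerfile
# Illustrative sketch of the three-stage build; stage contents are placeholders.
ARG BASE_NAME=rhel9.6
ARG SYNAPSE_VERSION=1.22.1
ARG PT_VERSION=2.7.1

FROM registry.access.redhat.com/ubi9/ubi:9.6 AS gaudi-base
# stage 1: system setup + Habana drivers for ${SYNAPSE_VERSION}

FROM gaudi-base AS gaudi-pytorch
ARG PT_VERSION
ARG TORCH_TYPE=upstream
# stage 2: install PyTorch ${PT_VERSION} (upstream or Habana fork, per TORCH_TYPE)

FROM gaudi-pytorch AS vllm-final
ARG VLLM_GAUDI_COMMIT=main
ARG VLLM_PROJECT_COMMIT=
# stage 3: clone vllm / vllm-gaudi at the configured commits, install,
# and set up a non-root user compatible with OpenShift's arbitrary-UID model
```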
ARG BASE_NAME=rhel9.6
ARG PT_VERSION=2.7.1
# can be upstream or fork
ARG TORCH_TYPE='upstream'
Copilot AI, Dec 4, 2025:
Remove quotes around the default value. In Dockerfile ARG declarations, quotes become part of the value itself. Change to ARG TORCH_TYPE=upstream to avoid potential issues when this value is compared or used in conditions.
Suggested change:
- ARG TORCH_TYPE='upstream'
+ ARG TORCH_TYPE=upstream
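If the quotes survived into the value (as the review comment warns), a later comparison against the bare word would fail. This can be simulated in plain shell; the variable assignment below is the hypothetical quoted value, not taken from the Dockerfile:

```shell
# Simulate a value that carries its quotes along, as the review comment warns.
TORCH_TYPE="'upstream'"
if [ "$TORCH_TYPE" = "upstream" ]; then
  echo "match"
else
  echo "no match: $TORCH_TYPE"   # the comparison fails because of the quotes
fi
```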
| git clone --depth 1 https://github.com/vllm-project/vllm-gaudi.git $VLLM_PATH2 && \ | ||
| cd $VLLM_PATH2 && \ | ||
| if [ -z "${VLLM_PROJECT_COMMIT}" ]; then \ | ||
| VLLM_PROJECT_COMMIT=$(git show "origin/vllm/last-good-commit-for-vllm-gaudi:VLLM_STABLE_COMMIT" 2>/dev/null || echo "main") && \ |
Copilot AI, Dec 4, 2025:
The silent fallback to 'main' is unhelpful for debugging when the git show command fails. Consider adding a warning such as echo >&2 "Warning: Could not fetch last-good-commit, using main branch" before the fallback.
Suggested change:
- VLLM_PROJECT_COMMIT=$(git show "origin/vllm/last-good-commit-for-vllm-gaudi:VLLM_STABLE_COMMIT" 2>/dev/null || echo "main") && \
+ VLLM_PROJECT_COMMIT=$(git show "origin/vllm/last-good-commit-for-vllm-gaudi:VLLM_STABLE_COMMIT" 2>/dev/null || { echo >&2 "Warning: Could not fetch last-good-commit, using main branch"; echo "main"; }) && \
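The key mechanism in that suggestion is the `{ ...; }` group on the right of `||`: it lets the fallback both warn on stderr and still emit a usable value on stdout, which the command substitution captures. A minimal standalone sketch (using `false` to stand in for the failing `git show`):

```shell
# `false` stands in for a failing `git show`; the group warns on stderr
# and emits the fallback value on stdout for the substitution to capture.
value=$(false || { echo >&2 "Warning: could not fetch pinned commit, using main"; echo "main"; })
echo "$value"
```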
# Clone and install vLLM
RUN set -e && \
    mkdir -p $VLLM_PATH2 && \
    git clone --depth 1 https://github.com/vllm-project/vllm-gaudi.git $VLLM_PATH2 && \
Copilot AI, Dec 4, 2025:
The shallow clone with --depth 1 may prevent checking out specific commits if VLLM_GAUDI_COMMIT references an older commit. Consider removing --depth 1 or adding a full fetch before the checkout at line 179 so that arbitrary commits can be checked out.
Suggested change:
- git clone --depth 1 https://github.com/vllm-project/vllm-gaudi.git $VLLM_PATH2 && \
+ git clone https://github.com/vllm-project/vllm-gaudi.git $VLLM_PATH2 && \
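The pitfall can be reproduced locally without any network access: a `--depth 1` clone contains only the tip commit, so checking out an older commit fails until the history is deepened with `git fetch --unshallow`. The throwaway repository and commit messages below are purely illustrative:

```shell
# Local demo of the shallow-clone pitfall; all repos are throwaway temp dirs.
set -e
tmp=$(mktemp -d) && cd "$tmp"
git init -q origin.git && cd origin.git
git -c user.email=ci@example.com -c user.name=ci commit -q --allow-empty -m "first"
old=$(git rev-parse HEAD)                      # an older, non-tip commit
git -c user.email=ci@example.com -c user.name=ci commit -q --allow-empty -m "second"
cd .. && git clone -q --depth 1 "file://$tmp/origin.git" shallow && cd shallow
git checkout -q "$old" 2>/dev/null && result="ok" || result="failed"
echo "shallow checkout: $result"               # the old commit is not in a depth-1 clone
git fetch -q --unshallow                       # deepen to the full history
git checkout -q "$old" && echo "checkout ok after unshallow"
```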
echo "Clone vllm-project/vllm and use configured or last good commit hash" && \
git clone https://github.com/vllm-project/vllm.git $VLLM_PATH && \
cd $VLLM_PATH && \
git remote add upstream https://github.com/vllm-project/vllm.git && \
Copilot AI, Dec 4, 2025:
The HTTP configuration values (524288000 bytes buffer, 1000 bytes/sec speed limit, 600 second timeout) lack explanation for why these specific values are needed. Add a comment explaining these are set to handle large repository fetches over slow connections.
Suggested change:
  git remote add upstream https://github.com/vllm-project/vllm.git && \
+ # The following git HTTP configuration values are set to handle large repository fetches over slow connections.
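For reference, the values the comment cites map onto git's standard `http.*` keys: a large post buffer for big transfers, and a low-speed limit/time pair so that slow-but-alive fetches are not killed prematurely. A hedged sketch, shown in a throwaway repository so the commands are runnable anywhere (the actual Dockerfile may set them differently):

```shell
# Throwaway repo so the `git config` calls have somewhere to write.
cd "$(mktemp -d)" && git init -q
git config http.postBuffer 524288000    # ~500 MB buffer for large fetches
git config http.lowSpeedLimit 1000      # abort only if speed drops below 1000 bytes/sec...
git config http.lowSpeedTime 600        # ...sustained for 600 seconds
git config --get http.postBuffer        # read one value back
```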
Tested using Synapse 1.22.1 and both upstream and fork PyTorch.
🚧 CI Blocked: The main CI workflow was not started for the following reason: