
Conversation

@ghandoura

GAUDISW-242243

  • Multi-stage build: gaudi-base → gaudi-pytorch → vllm-final

Build arguments:

  • SYNAPSE_VERSION: Habana Synapse AI version (default: 1.22.1)
  • PT_VERSION: PyTorch version (default: 2.7.1)
  • VLLM_GAUDI_COMMIT: vllm-gaudi git commit/tag (default: main)
  • VLLM_PROJECT_COMMIT: vllm upstream commit (auto-detected if empty)
  • TORCH_TYPE: PyTorch type - 'upstream' or 'fork' (default: upstream)

Usage:
docker build --build-arg SYNAPSE_VERSION=1.23.0 -t vllm-gaudi:1.23.0 .
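Put together, the stages and arguments above suggest a layout roughly like this (a minimal sketch; the base image tag and stage contents are assumptions, not the actual Dockerfile):

```dockerfile
# Sketch of the three-stage layout; package steps are illustrative only.
ARG SYNAPSE_VERSION=1.22.1
ARG PT_VERSION=2.7.1
ARG VLLM_GAUDI_COMMIT=main
ARG VLLM_PROJECT_COMMIT=""
ARG TORCH_TYPE=upstream

FROM registry.access.redhat.com/ubi9/ubi:9.6 AS gaudi-base
ARG SYNAPSE_VERSION
# ... system packages and Habana Synapse drivers for ${SYNAPSE_VERSION} ...

FROM gaudi-base AS gaudi-pytorch
ARG PT_VERSION
ARG TORCH_TYPE
# ... install PyTorch ${PT_VERSION} (upstream or Habana fork per ${TORCH_TYPE}) ...

FROM gaudi-pytorch AS vllm-final
ARG VLLM_GAUDI_COMMIT
ARG VLLM_PROJECT_COMMIT
# ... clone and install vllm and vllm-gaudi at the configured commits ...
```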

Copilot AI left a comment

Pull request overview

This PR adds a new multi-stage Dockerfile for building vLLM with Habana Gaudi support on RHEL 9.6 UBI. The Dockerfile enables building vLLM container images with configurable Synapse AI and PyTorch versions for Habana's Gaudi accelerators.

Key Changes:

  • Implements a three-stage build process: gaudi-base (system setup + Habana drivers), gaudi-pytorch (PyTorch installation), and vllm-final (vLLM installation and configuration)
  • Adds flexible build arguments for version management (SYNAPSE_VERSION, PT_VERSION, VLLM_GAUDI_COMMIT, VLLM_PROJECT_COMMIT, TORCH_TYPE)
  • Configures OpenShift-compatible non-root user setup for enterprise deployments
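OpenShift typically runs containers with an arbitrary UID that belongs to group 0, so a compatible image creates a non-root user and makes writable paths group-owned by root. A hypothetical fragment illustrating the pattern (the user name, UID, and paths are assumptions):

```dockerfile
# Illustrative OpenShift-friendly user setup; names, UID, and paths are assumptions.
RUN useradd -u 1001 -g 0 -m vllm && \
    mkdir -p /workspace && \
    chown -R 1001:0 /workspace && \
    # give group 0 the same permissions as the owner, for arbitrary-UID runtimes
    chmod -R g=u /workspace
USER 1001
WORKDIR /workspace
```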


ARG BASE_NAME=rhel9.6
ARG PT_VERSION=2.7.1
# can be upstream or fork
ARG TORCH_TYPE='upstream'

Copilot AI Dec 4, 2025


Remove quotes around the default value. In Dockerfile ARG declarations, quotes become part of the value itself. Change to ARG TORCH_TYPE=upstream to avoid potential issues when this value is compared or used in conditions.

Suggested change
ARG TORCH_TYPE='upstream'
ARG TORCH_TYPE=upstream

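To illustrate the concern behind this suggestion: if the single quotes did survive into the value (parser behavior varies), a plain string comparison against the unquoted word would no longer match. A small shell demonstration (the assignment simulates quotes ending up inside the value; it is not the Dockerfile's actual behavior):

```shell
# Illustrative only: a value literally containing quote characters
# fails an exact string comparison against the bare word.
TORCH_TYPE="'upstream'"   # simulates quotes surviving into the value
if [ "$TORCH_TYPE" = "upstream" ]; then
  echo "matched"
else
  echo "no match: value is $TORCH_TYPE"
fi
```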
git clone --depth 1 https://github.com/vllm-project/vllm-gaudi.git $VLLM_PATH2 && \
cd $VLLM_PATH2 && \
if [ -z "${VLLM_PROJECT_COMMIT}" ]; then \
VLLM_PROJECT_COMMIT=$(git show "origin/vllm/last-good-commit-for-vllm-gaudi:VLLM_STABLE_COMMIT" 2>/dev/null || echo "main") && \

Copilot AI Dec 4, 2025


The silent fallback to 'main' is unhelpful when the git show command fails. Consider emitting a warning with echo >&2 "Warning: Could not fetch last-good-commit, using main branch" before the fallback to help with debugging.

Suggested change
VLLM_PROJECT_COMMIT=$(git show "origin/vllm/last-good-commit-for-vllm-gaudi:VLLM_STABLE_COMMIT" 2>/dev/null || echo "main") && \
VLLM_PROJECT_COMMIT=$(git show "origin/vllm/last-good-commit-for-vllm-gaudi:VLLM_STABLE_COMMIT" 2>/dev/null || { echo >&2 "Warning: Could not fetch last-good-commit, using main branch"; echo "main"; }) && \

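The suggested pattern can be exercised in isolation; here `false` stands in for the failing `git show` (a sketch, not the Dockerfile's actual command). The warning goes to stderr while only the default value reaches stdout and the variable:

```shell
# Demo of the fallback: warn on stderr, emit the default on stdout.
VLLM_PROJECT_COMMIT=$(false 2>/dev/null || { echo >&2 "Warning: Could not fetch last-good-commit, using main branch"; echo "main"; })
echo "$VLLM_PROJECT_COMMIT"
```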
# Clone and install vLLM
RUN set -e && \
mkdir -p $VLLM_PATH2 && \
git clone --depth 1 https://github.com/vllm-project/vllm-gaudi.git $VLLM_PATH2 && \

Copilot AI Dec 4, 2025


The shallow clone with --depth 1 may prevent checking out specific commits if VLLM_GAUDI_COMMIT references an older commit. Consider removing --depth 1 or adding a full fetch before the checkout at line 179 so that arbitrary commits can be checked out.

Suggested change
git clone --depth 1 https://github.com/vllm-project/vllm-gaudi.git $VLLM_PATH2 && \
git clone https://github.com/vllm-project/vllm-gaudi.git $VLLM_PATH2 && \

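A middle ground, assuming the server allows fetching a commit by its full SHA (GitHub generally does), is to keep the shallow clone but fetch the requested commit explicitly before checkout. A hypothetical fragment reusing the Dockerfile's variable names:

```dockerfile
# Sketch: keep --depth 1 but fetch the pinned commit explicitly so it
# can be checked out even if it is not the branch tip.
# Assumes VLLM_GAUDI_COMMIT is a full SHA or a ref the server will serve.
RUN git clone --depth 1 https://github.com/vllm-project/vllm-gaudi.git $VLLM_PATH2 && \
    cd $VLLM_PATH2 && \
    git fetch --depth 1 origin "$VLLM_GAUDI_COMMIT" && \
    git checkout FETCH_HEAD
```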
echo "Clone vllm-project/vllm and use configured or last good commit hash" && \
git clone https://github.com/vllm-project/vllm.git $VLLM_PATH && \
cd $VLLM_PATH && \
git remote add upstream https://github.com/vllm-project/vllm.git && \

Copilot AI Dec 4, 2025


The HTTP configuration values (524288000 bytes buffer, 1000 bytes/sec speed limit, 600 second timeout) lack explanation for why these specific values are needed. Add a comment explaining these are set to handle large repository fetches over slow connections.

Suggested change
git remote add upstream https://github.com/vllm-project/vllm.git && \
git remote add upstream https://github.com/vllm-project/vllm.git && \
# The following git HTTP configuration values are set to handle large repository fetches over slow connections.

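For context, the three values called out above correspond to standard git HTTP settings. A sketch of how they would be documented inline (assuming these are the config keys the Dockerfile sets):

```dockerfile
# Illustrative mapping of the values mentioned above to git config keys:
#   http.postBuffer    524288000 -> 500 MB buffer for large HTTP POSTs
#   http.lowSpeedLimit 1000      -> abort if transfer drops below 1000 bytes/s
#   http.lowSpeedTime  600       -> ...for longer than 600 seconds
RUN git config --global http.postBuffer 524288000 && \
    git config --global http.lowSpeedLimit 1000 && \
    git config --global http.lowSpeedTime 600
```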
@ghandoura
Author

Tested with Synapse 1.22.1 and both upstream and fork PyTorch.

@github-actions

github-actions bot commented Dec 4, 2025

🚧 CI Blocked

The main CI workflow was not started for the following reason:

Your branch is behind the base branch. Please merge or rebase to get the latest changes.
