Conversation

@rajanintel24 rajanintel24 commented Dec 16, 2025

The dry-run feature allows users to copy the vllm-server or vllm-benchmark command-line file to the host machine (i.e., a local directory) without launching the server or the client.

To copy the command-line files, pass the environment variable below:
DRY_RUN=1

In server mode, the server command-line file is saved at /.cd/vllm_server.sh.
In benchmark mode, both command-line files are saved, at /.cd/vllm_server.sh and /.cd/vllm_benchmark.sh.
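The behavior described above can be sketched as a small gate in the entrypoint. This is a minimal illustration, not the PR's exact implementation; the function name and the injectable `run` parameter are assumptions for testability.

```python
import os


def maybe_launch(script_path, run=os.execvp):
    """Dry-run gate: when DRY_RUN=1, keep the generated command-line
    script for inspection on the host and skip launching the process."""
    if os.environ.get("DRY_RUN") == "1":
        print(f"[INFO] Dry run: command line file saved at {script_path}; not launching.")
        return False
    # Normal run: exec the generated script.
    run("bash", ["bash", script_path])
    return True
```

In a normal run the process image is replaced via `execvp`, so nothing after the call executes; in a dry run the function simply returns.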

os.makedirs(self.log_dir, exist_ok=True)
os.execvp("bash", ["bash", self.output_script_path])
if (os.environ.get("DRYRUN_SERVER")=='1' and self.mode=='server') or \
(os.environ.get("DRYRUN_BENCHMARK")=='1' and self.mode=='benchmark'):
Contributor

We could simplify this to:
Just have one DRYRUN env var, and rename it to DRY-RUN.
Since script names are unique, there is no need for subdirectories.
No need for the mode arg then.
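The suggested simplification could be sketched as follows. This is a hedged sketch, spelled DRY_RUN rather than DRY-RUN because hyphens are not valid in POSIX environment variable names (the thread later settles on DRY_RUN):

```python
import os


def is_dry_run():
    # One env var, no dependency on self.mode: the server and
    # benchmark entrypoints would consult the same flag.
    return os.environ.get("DRY_RUN") == "1"
```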

Author

I went for separate dry-run variables for server and benchmark because, in a benchmark dry run, I need to allow the server launch. Docker Compose enforces that the server is launched as a precondition for the benchmark run. With a single dry-run variable, I would need to find another way to allow the server launch and then stop the client launch.

Contributor

@nngokhale nngokhale Dec 16, 2025

There are 2 levels of interaction: 1. What is right for the entrypoint. 2. What is right for Docker Compose.
The entrypoint dry-run option does not need all this complexity.
For Docker Compose, a single dry-run option producing both scripts should be the primary functionality.
As for the Docker Compose server healthcheck, maybe we could alter that for dry-run?

Author

The dry-run implementation was updated to remove the dependency on the run mode.
A single DRYRUN env var is used, renamed to DRY_RUN.

@github-actions
🚧 CI Blocked

The main CI workflow was not started for the following reason:

Your branch is behind the base branch. Please merge or rebase to get the latest changes.

print(f"[INFO] This is a dry run to save the command line file {self.output_script_path}.")
shutil.copy(self.output_script_path, f"/local/{self.mode}/")
print(f"[INFO] The command line file {self.output_script_path} saved at .cd/{self.mode}/{self.output_script_path}")
try:
Contributor

Do we need this?

Author

@rajanintel24 rajanintel24 Dec 16, 2025

If you think it is right, I can move the lines below outside the if condition:

shutil.copy(self.output_script_path, f"/local/{self.mode}/")
print(f"[INFO] The command line file {self.output_script_path} saved at .cd/{self.mode}/{self.output_script_path}")

This would allow the command-line file to be copied in a normal run as well.

Regarding the print statement - I think it is useful information to be logged.

Contributor

I was asking about the Ctrl+C. Why not just exit?

Author

As discussed over the call, Ctrl+C is needed because docker compose restarts the vllm_service after every exit.

The lines below were moved outside the if condition:
shutil.copy(self.output_script_path, f"/local/{self.mode}/")
print(f"[INFO] The command line file {self.output_script_path} saved at .cd/{self.mode}/{self.output_script_path}")

This allows the command-line file to be copied in a normal run as well.
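Put together, the agreed flow could be sketched like this (a minimal sketch; the function name, `dest_dir`, and the injectable `launch` callable are illustrative, not the PR's actual code): the copy happens unconditionally, and only the launch is gated by DRY_RUN.

```python
import os
import shutil


def finalize(output_script_path, dest_dir, launch):
    # Always copy the generated command-line file to the host-mounted
    # directory, in normal runs and dry runs alike.
    os.makedirs(dest_dir, exist_ok=True)
    shutil.copy(output_script_path, dest_dir)
    print(f"[INFO] The command line file {output_script_path} saved at {dest_dir}")
    if os.environ.get("DRY_RUN") == "1":
        print("[INFO] Dry run: skipping launch.")
        return False
    launch()
    return True
```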

Contributor

Looks good


Signed-off-by: Rajan Kumar <[email protected]>
@github-actions

github-actions bot commented Jan 8, 2026: 🚧 CI Blocked (branch is behind the base branch; merge or rebase needed).


docker run -it --rm \
-e MODEL=$MODEL \
-e HF_TOKEN=$HF_TOKEN \
-e http_proxy=$http_proxy \
Contributor

DRY_RUN env is missing

Author

Updated the cmd line

- SYS_NICE
ipc: host
runtime: habana
restart: unless-stopped
Contributor

If we change this to "on-failure", we may not need the dry-run Ctrl+C code.

Author

The restart condition "on-failure" is working; tested with a bad-model-name failure. The dry run does not need Ctrl+C anymore.
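For reference, the Compose change under discussion is a one-line restart policy, replacing the `restart: unless-stopped` shown in the diff above (service name assumed here):

```yaml
services:
  vllm-server:
    restart: on-failure   # was: unless-stopped; a clean dry-run exit no longer triggers a restart
```

With `on-failure`, Compose restarts the container only on a non-zero exit code, so a dry run that exits cleanly stops for good.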

@github-actions

github-actions bot commented Jan 9, 2026: 🚧 CI Blocked (branch is behind the base branch; merge or rebase needed).

Contributor

@nngokhale nngokhale left a comment

LGTM

@PatrykWo PatrykWo self-assigned this Jan 9, 2026
@PatrykWo
Collaborator

PatrykWo commented Jan 13, 2026

@rajanintel24 Execution ends with this error:

vllm-server-1  | Starting script, logging to logs/vllm_server.log
vllm-server-1  | Error: Permission denied. Cannot access 'vllm_server.sh' or write to '/local/'.
vllm-server-1  | [INFO] This is a dry run to save the command line file vllm_server.sh.

@rajanintel24
Author

rajanintel24 commented Jan 13, 2026

@PatrykWo I am unable to reproduce the error you reported with the latest commit.

I am using the commands below to test the branch:

BUILD_ARGS="--build-arg http_proxy --build-arg https_proxy --build-arg no_proxy"
docker build -f Dockerfile.ubuntu.pytorch.vllm -t cmd-dev-rk-1p23 $BUILD_ARGS .

MODEL="meta-llama/Llama-3.1-8B-Instruct" \
HF_TOKEN=$HF_TOKEN \
HABANA_VISIBLE_DEVICES=7 \
DOCKER_IMAGE=cmd-dev-rk-1p23 \
TENSOR_PARALLEL_SIZE=1 \
DRY_RUN=1 \
docker compose up

docker run -it --rm \
  -e MODEL="meta-llama/Llama-3.1-8B-Instruct" \
  -e HF_TOKEN=$HF_TOKEN \
  -e HF_HOME=/mnt/hf_cache \
  -e http_proxy=$http_proxy \
  -e https_proxy=$https_proxy \
  -e no_proxy=$no_proxy \
  --cap-add=sys_nice \
  --ipc=host \
  --runtime=habana \
  -e HABANA_VISIBLE_DEVICES=7 \
  -e VLLM_SKIP_WARMUP=TRUE \
  -e DRY_RUN=1 \
  -p 9001:8000 \
  -v /mnt/hf_cache:/mnt/hf_cache \
  -v ${PWD}:/local \
  --name card-7_vllm-server-rk \
  cmd-dev-rk-1p23

Please share the command lines that produce the reported error.
