-
Notifications
You must be signed in to change notification settings - Fork 99
Dryrun implementation for generating command line file #723
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Dryrun implementation for generating command line file #723
Conversation
.cd/entrypoints/script_generator.py
Outdated
| os.makedirs(self.log_dir, exist_ok=True) | ||
| os.execvp("bash", ["bash", self.output_script_path]) | ||
| if (os.environ.get("DRYRUN_SERVER")=='1' and self.mode=='server') or \ | ||
| (os.environ.get("DRYRUN_BENCHMARK")=='1' and self.mode=='benchmark'): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could simplify this to :
Just have one DRYRUN env var, and rename to DRY-RUN
Since script names are unique no need to have sub directories.
No need for mode arg then.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I went for separate dry run variables for server and benchmark because, in case of benchmark dry run I need to allow the server launch. The docker compose enforce server is launched as a precondition for benchmark run. Having a single dry-run variable, I need think another way to allow the server launch and then stop the client launch.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There 2 levels of interactions. 1. What is right for entrypoint 2. What is right for docker compose.
Entrypoint dry-run option does not need all this complexity.
For docker compose single dry-run option producing both scripts should be primary functionality.
As for docker compose server healthcheck, may be we alter that for dry-run?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Dry-Run implementation updated to remove dependency on the run mode.
DRYRUN env var, and renamed to DRY_RUN
🚧 CI BlockedThe main CI workflow was not started for the following reason:
|
.cd/entrypoints/script_generator.py
Outdated
| print(f"[INFO] This is a dry run to save the command line file {self.output_script_path}.") | ||
| shutil.copy(self.output_script_path, f"/local/{self.mode}/") | ||
| print(f"[INFO] The command line file {self.output_script_path} saved at .cd/{self.mode}/{self.output_script_path}") | ||
| try: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you feel right I can bring the below lines outside the if condition:
shutil.copy(self.output_script_path, f"/local/{self.mode}/")
print(f"[INFO] The command line file {self.output_script_path} saved at .cd/{self.mode}/{self.output_script_path}")
This will allow the command line file copy in normal run as well.
Regarding, print statement - I think is useful information to be logged.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was asking about the Ctrl+C. Why not just exit?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As discussed over the call Ctrl+C is needed because the docker_compose is restarting the vllm_service after every exit.
The below lines moved outside the if condition
shutil.copy(self.output_script_path, f"/local/{self.mode}/")
print(f"[INFO] The command line file {self.output_script_path} saved at .cd/{self.mode}/{self.output_script_path}")
This will allow the command line file copy in normal run as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good
Signed-off-by: <>
🚧 CI BlockedThe main CI workflow was not started for the following reason:
|
Signed-off-by: Rajan Kumar <[email protected]>
🚧 CI BlockedThe main CI workflow was not started for the following reason:
|
…e vLLM services Signed-off-by: Rajan Kumar <[email protected]>
🚧 CI BlockedThe main CI workflow was not started for the following reason:
|
| docker run -it --rm \ | ||
| -e MODEL=$MODEL \ | ||
| -e HF_TOKEN=$HF_TOKEN \ | ||
| -e http_proxy=$http_proxy \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
DRY_RUN env is missing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated the cmd line
.cd/docker-compose.yml
Outdated
| - SYS_NICE | ||
| ipc: host | ||
| runtime: habana | ||
| restart: unless-stopped |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we change this to "on-failure",. we may not need dry-run CTRL+C code
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Restart condition "on-failure is working. Tested for bad model name failure. The Dry_Run do not need CTRL+C anymore.
Signed-off-by: Rajan Kumar <[email protected]>
🚧 CI BlockedThe main CI workflow was not started for the following reason:
|
nngokhale
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
|
@rajanintel24 Executing ends with the error vllm-server-1 | Starting script, logging to logs/vllm_server.log
vllm-server-1 | Error: Permission denied. Cannot access 'vllm_server.sh' or write to '/local/'.
vllm-server-1 | [INFO] This is a dry run to save the command line file vllm_server.sh. |
|
@PatrykWo I am unable to reproduce the error you reported on the latest commit from your end. I am using below commands to test the branch:
|
The dryrun feature allow users to copy the vllm-server or vllm-benchmark command line file on the host machine i.e. local area, without launching the the server or the client.
To copy the command line files, pass the below environment variable:
DRY_RUN=1
On server mode, the server command line file will be saved at /.cd/vllm_server.sh.
On benchmark mode, both the command line files will be saved at /.cd/vllm_server.sh and /.cd/vllm_benchmark.sh