[Bugfix] Fix broken CI #2415


Open · wants to merge 12 commits into main

Conversation

Potabk (Contributor) commented Aug 18, 2025

What this PR does / why we need it?

Fix CI against the latest vLLM commit.
Note that npu_input_batch has been removed; the model runner now uses vLLM's original input-batch construction class directly, which makes keeping in sync with upstream easier.
Note: this patch fixes the eager-mode scenario; graph mode still needs more work.
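
For context, a rough sketch of the import swap this describes (the removed module path is inferred from the vllm-ascend repo layout; the upstream path is vLLM's v1 worker module and can move between commits):

# Hedged sketch, not the exact diff from this PR.
# Before: a local copy that had to be kept in sync by hand (path assumed):
#     from vllm_ascend.worker.npu_input_batch import CachedRequestState, InputBatch
# After: reuse vLLM's upstream class directly:
from vllm.v1.worker.gpu_input_batch import CachedRequestState, InputBatch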

Does this PR introduce any user-facing change?

How was this patch tested?

Tested locally (a rough sketch of the test script follows the log):

Loading safetensors checkpoint shards:   0% Completed | 0/1 [00:00<?, ?it/s]
Loading safetensors checkpoint shards: 100% Completed | 1/1 [00:03<00:00,  3.57s/it]
(EngineCore_0 pid=185182) 
(EngineCore_0 pid=185182) INFO 08-18 07:37:11 [default_loader.py:262] Loading weights took 3.66 seconds
(EngineCore_0 pid=185182) INFO 08-18 07:37:12 [model_runner_v1.py:2107] Loading model weights took 0.9278 GB
(EngineCore_0 pid=185182) INFO 08-18 07:37:16 [worker_v1.py:184] Available memory: 26289759334, total memory: 31662800896
(EngineCore_0 pid=185182) INFO 08-18 07:37:16 [kv_cache_utils.py:849] GPU KV cache size: 2,139,392 tokens
(EngineCore_0 pid=185182) INFO 08-18 07:37:16 [kv_cache_utils.py:853] Maximum concurrency for 32,768 tokens per request: 65.29x
(EngineCore_0 pid=185182) INFO 08-18 07:37:16 [core.py:214] init engine (profile, create kv cache, warmup model) took 3.51 seconds
(EngineCore_0 pid=185182) Downloading Model from https://www.modelscope.cn to directory: /shared/cache/modelscope/hub/models/Qwen/Qwen2.5-0.5B-Instruct
(EngineCore_0 pid=185182) 2025-08-18 07:37:17,838 - modelscope - INFO - Creating symbolic link [/shared/cache/modelscope/hub/models/Qwen/Qwen2.5-0.5B-Instruct].
(EngineCore_0 pid=185182) 2025-08-18 07:37:17,840 - modelscope - WARNING - Failed to create symbolic link /shared/cache/modelscope/hub/models/Qwen/Qwen2.5-0.5B-Instruct for /shared/cache/modelscope/hub/models/Qwen/Qwen2___5-0___5B-Instruct.
(EngineCore_0 pid=185182) INFO 08-18 07:37:18 [platform.py:144] Compilation disabled, using eager mode by default
INFO 08-18 07:37:18 [llm.py:298] Supported_tasks: ['generate']
Adding requests: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 4/4 [00:00<00:00, 413.41it/s]
Processed prompts: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████| 4/4 [00:04<00:00,  1.16s/it, est. speed input: 4.74 toks/s, output: 86.18 toks/s]
Prompt: 'Hello, my name is', Generated text: ' Alex and I am a 17 year old male. I have been diagnosed with a rare genetic disorder called X-linked recessive. I have been told that I will not be able to have children. I have been told that I will not be able to have children because of the X-linked recessive disorder. I have been told that I will not be able to have children because of the X-linked recessive disorder. I have been told that I will not be able to have children because of'
Prompt: 'The president of the United States is', Generated text: ' a very important person. He is the leader of the country. He is the president of the United States. He is the leader of the country. He is the leader of the country. He is the leader of the country. He is the leader of the country. He is the leader of the country. He is the leader of the country. He is the leader of the country. He is the leader of the country. He is the leader of the country. He is the leader of the'
Prompt: 'The capital of France is', Generated text: ' Paris. It is the largest city in Europe and the second largest city in the world. It is located in the south of France, on the banks of the Seine River. It is situated on the Île de la Cité, which is a small island in the center of the city. The city is surrounded by the Seine River and the Mediterranean Sea. It is also surrounded by the Pyrenees mountains. The city is home to many famous landmarks, including the Eiffel Tower'
Prompt: 'The future of AI is', Generated text: ' in the hands of the people. The future of AI is in the hands of the people. The future of AI is in the hands of the people. The future of AI is in the hands of the people. The future of AI is in the hands of the people. The future of AI is in the hands of the people. The future of AI is in the hands of the people. The future of AI is in the hands of the people. The future of AI is in the hands of'
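
For reference, a minimal sketch of the kind of offline script that produces the log above, using vLLM's standard offline-inference API. The model name and prompts are taken from the log; the sampling parameters are assumptions, and eager mode is already the default here per the log, so enforce_eager is set only for clarity.

from vllm import LLM, SamplingParams

prompts = [
    "Hello, my name is",
    "The president of the United States is",
    "The capital of France is",
    "The future of AI is",
]
# Assumed sampling settings; the log only shows ~100 generated tokens per prompt.
sampling_params = SamplingParams(temperature=0.0, max_tokens=100)

llm = LLM(model="Qwen/Qwen2.5-0.5B-Instruct", enforce_eager=True)
outputs = llm.generate(prompts, sampling_params)
for output in outputs:
    print(f"Prompt: {output.prompt!r}, Generated text: {output.outputs[0].text!r}")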

Potabk added 10 commits August 18, 2025 10:06
Signed-off-by: wangli <[email protected]> (on each of the 10 commits)
@@ -235,7 +235,9 @@ def load_model(self) -> None:
         self.model_runner.load_model()

     def compile_or_warm_up_model(self) -> None:
         warmup_sizes = self.vllm_config.compilation_config.compile_sizes.copy()
         # Note: need to adapt the graph mode
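
For comparison, roughly what the full warm-up path looks like in upstream vLLM's worker, which is what the graph-mode note above refers to. The names (_dummy_run, cudagraph_capture_sizes) are upstream vLLM names; the Ascend adaptation may differ.

# Rough sketch following upstream vLLM's compile_or_warm_up_model, for context only.
warmup_sizes = self.vllm_config.compilation_config.compile_sizes.copy()
if not self.model_config.enforce_eager:
    # Graph mode: skip sizes that will already be captured as graphs.
    warmup_sizes = [
        x for x in warmup_sizes
        if x not in self.vllm_config.compilation_config.cudagraph_capture_sizes
    ]
for size in sorted(warmup_sizes, reverse=True):
    self.model_runner._dummy_run(size)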
Potabk (Contributor, Author) commented on this hunk:

@MengqingCao graph mode still needs work here, hence the note.

gemini-code-assist (bot) left a comment:

Code Review

This pull request refactors the codebase to use the upstream InputBatch from vLLM, removing the custom npu_input_batch.py. This is a good step towards better maintainability and alignment with the main project. The changes mostly involve adapting to the upstream API. I've found one critical issue where an attribute might not be initialized, which could lead to a runtime crash. Please see my detailed comment.
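
To illustrate the kind of problem being flagged, the names below are hypothetical, not the actual attribute in this PR:

# Hypothetical illustration of an attribute that is only set on one code path.
class Runner:
    def __init__(self, compile_sizes=None):
        # Always initialize the attribute so later code cannot hit AttributeError.
        self.warmup_sizes = list(compile_sizes) if compile_sizes else []

    def warm_up(self):
        for size in self.warmup_sizes:  # crashes at runtime if never initialized
            print(f"warming up size {size}")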


👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:

  • A PR should do only one thing; smaller PRs enable faster reviews.
  • Every PR should include unit tests and end-to-end tests to ensure it works and is not broken by future PRs.
  • Write the commit message by fulfilling the PR description to help reviewers and future developers understand.

If CI fails, you can run linting and testing checks locally according to Contributing and Testing.

Signed-off-by: wangli <[email protected]>
Signed-off-by: wangli <[email protected]>

This pull request has conflicts, please resolve those before we can evaluate the pull request.
