fix empty buckets issue for enforce eager mode #761
Pull request overview
This PR fixes an issue where eager execution mode (enforce_eager=True) was causing empty bucket lists in the bucket manager. The fix ensures that warmup_model() is always called to generate necessary buckets, while the actual warmup execution is still skipped for eager mode as intended.
Key changes:
- Removed the `enforce_eager` check from the warmup condition, allowing bucket generation to occur even in eager mode
- The actual warmup compilation is still skipped for eager mode internally within `warmup_model()`
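The control flow described above can be sketched as follows. This is a minimal illustration, not the code from `hpu_model_runner.py`: the class `SketchModelRunner`, the functions `worker_warmup_before` and `worker_warmup_after`, and the bucket values are all hypothetical stand-ins, and the real warmup condition may check more than `enforce_eager`.

```python
class SketchModelRunner:
    """Hypothetical stand-in for the model runner (not the real class)."""

    def __init__(self, enforce_eager: bool) -> None:
        self.enforce_eager = enforce_eager
        self.buckets: list[int] = []    # filled by warmup_model()
        self.warmed_up: list[int] = []  # buckets that were actually warmed up

    def warmup_model(self) -> None:
        # Bucket generation always runs (values here are made up).
        self.buckets = [1, 2, 4, 8]
        # The actual warmup is skipped internally in eager mode.
        if self.enforce_eager:
            return
        for bucket in self.buckets:
            self.warmed_up.append(bucket)


def worker_warmup_before(runner: SketchModelRunner) -> None:
    # Assumed shape of the old guard: eager mode never reached warmup_model(),
    # so no buckets were generated.
    if not runner.enforce_eager:
        runner.warmup_model()


def worker_warmup_after(runner: SketchModelRunner) -> None:
    # Assumed shape after this PR: always call warmup_model() and let it
    # decide internally whether to run the actual warmup.
    runner.warmup_model()
```

With `enforce_eager=True`, the old guard leaves `buckets` empty, while the new call path fills `buckets` and leaves `warmed_up` empty.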
✅ CI Passed
All checks passed successfully against the following vllm commit:
af9f5a1 to 0d42f15
Signed-off-by: Youlei Yang <[email protected]>
69ca41f to 058627c
The current implementation skips calling `self.model_runner.warmup_model()` when `enforce_eager=True`. This leaves the bucket lists in the bucket manager empty, so subsequent `find_bucket` calls get fallback buckets.
The buckets are generated by the following calls in `self.model_runner.warmup_model()`.
vllm-gaudi/vllm_gaudi/v1/worker/hpu_model_runner.py
Lines 4581 to 4596 in cc37f1f
The actual warmup is skipped for `enforce_eager=True` according to
vllm-gaudi/vllm_gaudi/v1/worker/hpu_model_runner.py
Lines 4670 to 4695 in cc37f1f
So `self.model_runner.warmup_model()` cannot be skipped when `enforce_eager=True`. As expected, no actual warmup is performed in this case.
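To illustrate why an empty bucket list matters, here is a simplified sketch of a lookup that falls back when no generated bucket fits. It assumes a one-dimensional list of sorted bucket sizes and a hypothetical power-of-two fallback; `find_bucket_sketch` is an illustrative name, and the real `find_bucket` in the bucket manager may use a different signature and fallback rule.

```python
import bisect


def find_bucket_sketch(buckets: list[int], value: int) -> tuple[int, bool]:
    """Return (bucket, is_fallback) for a sorted list of bucket sizes."""
    # Pick the smallest generated bucket that can hold the value.
    idx = bisect.bisect_left(buckets, value)
    if idx < len(buckets):
        return buckets[idx], False
    # Hypothetical fallback: round up to the next power of two.
    fallback = 1 << max(value - 1, 0).bit_length()
    return fallback, True
```

In this sketch, an empty list makes every lookup a fallback, which corresponds to the situation described above when `warmup_model()` is never called.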