Conversation

@yangulei yangulei commented Dec 25, 2025

The current implementation skips calling self.model_runner.warmup_model() when enforce_eager=True, which leaves the bucket lists in the bucket manager empty, so all subsequent find_bucket calls return fallback buckets.

The buckets are generated by the following calls in self.model_runner.warmup_model():

def warmup_model(self) -> None:
    if not self.enable_bucketing:
        return
    if self.unified_attn:
        self.bucketing_manager.generate_unified_buckets()
        if self.supports_mm_inputs:
            # Delayed multimodal buckets during warmup until model is loaded.
            from vllm_gaudi.extension.bucketing.vision import HPUVisionBucketManager
            self.get_model().vision_bucket_manager = HPUVisionBucketManager(get_config().model_type)
            msg = (f"Multimodal bucket : {self.get_model().vision_bucket_manager.multimodal_buckets}")
            logger.info(msg)
    else:
        self.bucketing_manager.generate_prompt_buckets()
        if not self.is_pooling_model:
            self.bucketing_manager.generate_decode_buckets()
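The consequence of skipping this bucket generation can be illustrated with a deliberately simplified, hypothetical bucket manager (the names and fallback policy below are illustrative only, not the actual vllm-gaudi HPUBucketingManager API): with empty bucket lists, every find_bucket call falls through to a per-request fallback bucket instead of a pre-generated one.

```python
class ToyBucketManager:
    """Simplified stand-in for a bucketing manager (illustrative only)."""

    def __init__(self) -> None:
        # Stays empty if bucket generation (warmup_model) is skipped.
        self.prompt_buckets: list[int] = []

    def generate_prompt_buckets(self) -> None:
        # In the real code these come from the bucketing configuration.
        self.prompt_buckets = [128, 256, 512, 1024]

    def find_bucket(self, seq_len: int) -> int:
        # Return the smallest generated bucket that fits the request,
        # or fall back to the raw length when no bucket matches.
        for bucket in self.prompt_buckets:
            if seq_len <= bucket:
                return bucket
        return seq_len  # fallback bucket


mgr = ToyBucketManager()
print(mgr.find_bucket(200))  # 200 -> fallback: buckets were never generated
mgr.generate_prompt_buckets()
print(mgr.find_bucket(200))  # 256 -> proper padded bucket
```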

And the actual warmup is skipped for enforce_eager=True, according to:

with compile_only_mode_context() if can_use_compile_only_mode else contextlib.nullcontext():
    if not self.model_config.enforce_eager:
        assert self.mem_margin is not None, \
            ("HabanaWorker.determine_num_available_blocks needs "
             "to be called before warming up the model.")
        if self.is_pooling_model:
            self.warmup_pooler()
        else:
            self.warmup_sampler()
        self.warmup_defragmenter()
        # TODO(kzawora): align_workers
        if self.unified_attn:
            self.warmup_unified_graphs(self.bucketing_manager.unified_buckets, kv_caches)
        else:
            mem_post_prompt, prompt_batch_seq, prompt_captured_all = \
                self.warmup_graphs(
                    self.bucketing_manager.prompt_buckets, True, kv_caches)
            self.log_graph_warmup_summary(self.bucketing_manager.prompt_buckets, True, mem_post_prompt)
            if not self.is_pooling_model:
                mem_post_decode, decode_batch_seq, decode_captured_all = \
                    self.warmup_graphs(
                        self.bucketing_manager.decode_buckets, False, kv_caches)
                self.log_graph_warmup_summary(self.bucketing_manager.decode_buckets, False, mem_post_decode)

So self.model_runner.warmup_model() must not be skipped when enforce_eager=True; the actual warmup is still skipped internally in that case, as expected.
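The shape of the fix can be sketched with a minimal, hypothetical runner (illustrative names, not the actual vllm-gaudi worker/runner classes): the caller invokes warmup_model() unconditionally, and only the expensive graph warmup inside it remains gated on enforce_eager.

```python
class ToyModelRunner:
    """Illustrative stand-in for the model runner (not the real API)."""

    def __init__(self, enforce_eager: bool) -> None:
        self.enforce_eager = enforce_eager
        self.prompt_buckets: list[int] = []
        self.graphs_warmed_up = False

    def warmup_model(self) -> None:
        # Bucket generation is cheap and always needed by find_bucket().
        self.prompt_buckets = [128, 256, 512, 1024]
        if not self.enforce_eager:
            # Only the actual graph capture/warmup is skipped in eager mode.
            self.graphs_warmed_up = True


# After the fix, the worker calls warmup_model() even for enforce_eager=True:
runner = ToyModelRunner(enforce_eager=True)
runner.warmup_model()
print(runner.prompt_buckets)    # populated, so bucket lookups work
print(runner.graphs_warmed_up)  # False: no graph warmup in eager mode
```

The design point is that bucket generation and graph warmup are separate concerns: only the latter should depend on the execution mode.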

Copilot AI left a comment


Pull request overview

This PR fixes an issue where eager execution mode (enforce_eager=True) was causing empty bucket lists in the bucket manager. The fix ensures that warmup_model() is always called to generate necessary buckets, while the actual warmup execution is still skipped for eager mode as intended.

Key changes:

  • Removed the enforce_eager check from the warmup condition, allowing bucket generation to occur even in eager mode
  • The actual warmup compilation is still skipped for eager mode internally within warmup_model()


@github-actions

✅ CI Passed

All checks passed successfully against the following vllm commit:
5d9308968649c81ee5903fc2a77377d738ed2f6d

@github-actions

github-actions bot commented Jan 8, 2026

✅ CI Passed

All checks passed successfully against the following vllm commit:
cddbc2b4b2547c681d1bdb876fdd6a7b8e0ec58d

@github-actions

github-actions bot commented Jan 9, 2026

✅ CI Passed

All checks passed successfully against the following vllm commit:
aa125ecf0edb9cd67656553d11d643aeb444ff9e

@github-actions

✅ CI Passed

All checks passed successfully against the following vllm commit:
66652e8082b69ba7d1e6aca7c234433de55f1b9b

@yangulei yangulei force-pushed the eager_bucket branch 2 times, most recently from 69ca41f to 058627c on January 15, 2026 at 17:44
@github-actions

✅ CI Passed

All checks passed successfully against the following vllm commit:
4c1c501a7ee1d5efbad945ea62a702ce5cefb799
