fix empty buckets issue for enforce eager mode #761

yangulei · 2025-12-25T02:47:52Z

The current implementation skips calling self.model_runner.warmup_model() when enforce_eager=True, which leaves the bucket lists in the bucket manager empty, so subsequent find_bucket calls can only return fallback buckets.
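To illustrate why an empty bucket list matters, here is a minimal sketch of a bucket lookup. It is not the vllm-gaudi implementation: the function signature, the `fallback_step` parameter, and the round-up fallback policy are all assumptions made for illustration. It only shows that with no generated buckets, every lookup ends up on the fallback path.

```python
from bisect import bisect_left


def find_bucket(buckets, value, fallback_step=128):
    """Return (bucket, used_fallback) for a sorted list of bucket sizes.

    Hypothetical stand-in for a bucket lookup; not the real API.
    """
    idx = bisect_left(buckets, value)
    if idx < len(buckets):
        # A generated bucket is large enough for this value.
        return buckets[idx], False
    # No generated bucket fits, which is always the case for an empty list.
    # Assumed fallback policy: round up to a multiple of fallback_step.
    rounded = -(-value // fallback_step) * fallback_step
    return rounded, True


generated = [128, 256, 512]
print(find_bucket(generated, 200))  # (256, False)
print(find_bucket([], 200))         # (256, True)
```

In this toy version the fallback happens to return the same size, but the second call never consults a generated bucket, which is the situation described above for enforce_eager=True.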

The buckets are generated by the following calls in self.model_runner.warmup_model().

vllm-gaudi/vllm_gaudi/v1/worker/hpu_model_runner.py

Lines 4581 to 4596 in cc37f1f

    
    def warmup_model(self) -> None:
        if not self.enable_bucketing:
            return
        if self.unified_attn:
            self.bucketing_manager.generate_unified_buckets()
            if self.supports_mm_inputs:
                # Delayed multimodal buckets during warmup until model is loaded.
                from vllm_gaudi.extension.bucketing.vision import HPUVisionBucketManager
                self.get_model().vision_bucket_manager = HPUVisionBucketManager(get_config().model_type)
                msg = (f"Multimodal bucket : {self.get_model().vision_bucket_manager.multimodal_buckets}")
                logger.info(msg)
        else:
            self.bucketing_manager.generate_prompt_buckets()
            if not self.is_pooling_model:
                self.bucketing_manager.generate_decode_buckets()

The actual warmup is already skipped inside warmup_model() when enforce_eager=True, according to:

vllm-gaudi/vllm_gaudi/v1/worker/hpu_model_runner.py

Lines 4670 to 4695 in cc37f1f

    
    with compile_only_mode_context() if can_use_compile_only_mode else contextlib.nullcontext():
        if not self.model_config.enforce_eager:
            assert self.mem_margin is not None, \
                ("HabanaWorker.determine_num_available_blocks needs "
                 "to be called before warming up the model.")
            if self.is_pooling_model:
                self.warmup_pooler()
            else:
                self.warmup_sampler()
                self.warmup_defragmenter()
            # TODO(kzawora): align_workers
            if self.unified_attn:
                self.warmup_unified_graphs(self.bucketing_manager.unified_buckets, kv_caches)
            else:
                mem_post_prompt, prompt_batch_seq, prompt_captured_all = \
                    self.warmup_graphs(
                        self.bucketing_manager.prompt_buckets, True, kv_caches)
                self.log_graph_warmup_summary(self.bucketing_manager.prompt_buckets, True, mem_post_prompt)
                if not self.is_pooling_model:
                    mem_post_decode, decode_batch_seq, decode_captured_all = \
                      self.warmup_graphs(
                          self.bucketing_manager.decode_buckets, False, kv_caches)
                    self.log_graph_warmup_summary(self.bucketing_manager.decode_buckets, False, mem_post_decode)

So self.model_runner.warmup_model() must not be skipped when enforce_eager=True: the call is still needed to generate the buckets, and no actual warmup runs in that case, as expected.
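The control flow described above can be sketched with a toy runner. This is a simplified model under stated assumptions, not the PR diff: `ToyRunner`, `call_warmup_before`, `call_warmup_after`, and the bucket values are hypothetical, and the caller-side guard in the "before" function is inferred from the description rather than copied from the repository.

```python
class ToyRunner:
    """Hypothetical stand-in for the model runner; not the real class."""

    def __init__(self, enforce_eager):
        self.enforce_eager = enforce_eager
        self.prompt_buckets = []
        self.decode_buckets = []
        self.warmed_up = []

    def warmup_model(self):
        # Phase 1: bucket generation, independent of enforce_eager.
        self.prompt_buckets = [(1, 128), (1, 256)]
        self.decode_buckets = [(1, 1), (2, 1)]
        # Phase 2: the actual warmup, skipped internally in eager mode.
        if not self.enforce_eager:
            self.warmed_up = self.prompt_buckets + self.decode_buckets


def call_warmup_before(runner):
    # Assumed old behaviour: the caller skips warmup_model() in eager mode,
    # so phase 1 never runs and the bucket lists stay empty.
    if not runner.enforce_eager:
        runner.warmup_model()


def call_warmup_after(runner):
    # Assumed new behaviour: always call warmup_model(); the eager check
    # inside it still prevents any real warmup.
    runner.warmup_model()


old = ToyRunner(enforce_eager=True)
call_warmup_before(old)
new = ToyRunner(enforce_eager=True)
call_warmup_after(new)
print(old.prompt_buckets, new.prompt_buckets, new.warmed_up)
```

With enforce_eager=True, the "before" path leaves both bucket lists empty, while the "after" path fills them and still records no warmed-up buckets.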

Copilot

Pull request overview

This PR fixes an issue where eager execution mode (enforce_eager=True) was causing empty bucket lists in the bucket manager. The fix ensures that warmup_model() is always called to generate necessary buckets, while the actual warmup execution is still skipped for eager mode as intended.

Key changes:

Removed the enforce_eager check from the warmup condition, allowing bucket generation to occur even in eager mode
The actual warmup compilation is still skipped for eager mode internally within warmup_model()

github-actions · 2025-12-25T04:47:32Z

✅ CI Passed

All checks passed successfully against the following vllm commit:
5d9308968649c81ee5903fc2a77377d738ed2f6d

github-actions · 2026-01-08T13:35:13Z

✅ CI Passed

All checks passed successfully against the following vllm commit:
cddbc2b4b2547c681d1bdb876fdd6a7b8e0ec58d

github-actions · 2026-01-09T12:14:16Z

✅ CI Passed

All checks passed successfully against the following vllm commit:
aa125ecf0edb9cd67656553d11d643aeb444ff9e

github-actions · 2026-01-15T16:02:43Z

✅ CI Passed

All checks passed successfully against the following vllm commit:
66652e8082b69ba7d1e6aca7c234433de55f1b9b

Signed-off-by: Youlei Yang <[email protected]>

github-actions · 2026-01-16T07:07:02Z

✅ CI Passed

All checks passed successfully against the following vllm commit:
4c1c501a7ee1d5efbad945ea62a702ce5cefb799

Copilot AI review requested due to automatic review settings December 25, 2025 02:47

yangulei requested review from adobrzyn, afierka-intel, iboiko-habana, kamil-kaczor, ksmusz, kzawora-intel, mgawarkiewicz-intel, michalkuligowski and xuechendi as code owners December 25, 2025 02:47

Copilot AI reviewed Dec 25, 2025

github-actions bot mentioned this pull request Dec 25, 2025

🚦 Team Review Dashboard #701

Open

yangulei force-pushed the eager_bucket branch from af9f5a1 to 0d42f15 Compare January 15, 2026 01:39

iboiko-habana approved these changes Jan 15, 2026

fix empty buckets issue for enforce eager mode

058627c

Signed-off-by: Youlei Yang <[email protected]>

yangulei force-pushed the eager_bucket branch 2 times, most recently from 69ca41f to 058627c Compare January 15, 2026 17:44
