This repository was archived by the owner on Sep 18, 2025. It is now read-only.

Issue: Error when running calibration for the FP8 Quantization using INC notebook #145

@wsfowler

Description

I'm trying to follow the example here: https://github.com/HabanaAI/Gaudi-tutorials/blob/main/PyTorch/vLLM_Tutorials/FP8_Quantization_using_INC/FP8_Quantization_using_INC.ipynb

However, the calibration step fails with the error below when I run:

./calibrate_model.sh -m $MODEL_NAME -d /root/open_orca/open_orca_gpt4_tokenized_llama.sampled_24576.pkl -o g3 -b 128 -t 8 -l 1024

Processed prompts:   0%|          | 0/65 [00:00<?, ?it/s, est. speed input: 0.00 toks/s, output: 0.00 toks/s]
[rank0]: Traceback (most recent call last):
[rank0]:   File "/root/vllm-hpu-extension/calibration/step-2-measure-scales.py", line 81, in <module>
[rank0]:     generate_responses(llm, input_batch, args)
[rank0]:   File "/root/vllm-hpu-extension/calibration/step-2-measure-scales.py", line 25, in generate_responses
[rank0]:     responses = llm.generate(input_batch, sampling_params, use_tqdm=True)
[rank0]:   File "/root/vllm-fork/vllm/utils.py", line 1158, in inner
[rank0]:     return fn(*args, **kwargs)
[rank0]:   File "/root/vllm-fork/vllm/entrypoints/llm.py", line 469, in generate
[rank0]:     outputs = self._run_engine(use_tqdm=use_tqdm)
[rank0]:   File "/root/vllm-fork/vllm/entrypoints/llm.py", line 1397, in _run_engine
[rank0]:     step_outputs = self.llm_engine.step()
[rank0]:   File "/root/vllm-fork/vllm/engine/llm_engine.py", line 1330, in step
[rank0]:     ) = self.scheduler[virtual_engine].schedule()
[rank0]:   File "/root/vllm-fork/vllm/core/scheduler.py", line 1392, in schedule
[rank0]:     scheduler_outputs: SchedulerOutputs = self._schedule()
[rank0]:   File "/root/vllm-fork/vllm/core/scheduler.py", line 1351, in _schedule
[rank0]:     return self._schedule_default()
[rank0]:   File "/root/vllm-fork/vllm/core/scheduler.py", line 1174, in _schedule_default
[rank0]:     prefills = self._schedule_prefills(budget,
[rank0]:   File "/root/vllm-fork/vllm/core/scheduler.py", line 1087, in _schedule_prefills
[rank0]:     or not budget.can_schedule(**can_schedule_kwargs)):
[rank0]:   File "/root/vllm-fork/vllm/core/scheduler.py", line 182, in can_schedule
[rank0]:     num_new_padded_tokens = padding_fn(new_batch_size, new_max_seq_len)
[rank0]:   File "/root/vllm-fork/vllm/core/scheduler.py", line 141, in _hpu_padding_fn
[rank0]:     return padded_bs * padded_seq
[rank0]: TypeError: unsupported operand type(s) for *: 'NoneType' and 'int'
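
For anyone reading the traceback: the final frame multiplies `padded_bs * padded_seq`, and the message says the left operand is `None`, so whatever computes `padded_bs` returned nothing for this batch. The sketch below is only an illustration of how that can happen. It is not the vllm-fork code. `find_bucket`, `padded_tokens`, `padded_tokens_checked`, and the bucket lists are hypothetical names and values, and the assumption that the lookup fails because no configured bucket is large enough for the request is a guess, not something the log confirms.

```python
from typing import Optional, Sequence


def find_bucket(value: int, buckets: Sequence[int]) -> Optional[int]:
    """Hypothetical helper: smallest bucket >= value, or None if none fits."""
    candidates = [b for b in buckets if b >= value]
    return min(candidates) if candidates else None


def padded_tokens(batch_size: int, seq_len: int,
                  bs_buckets: Sequence[int], seq_buckets: Sequence[int]) -> int:
    """Mirrors the shape of the failing line: no check for a missing bucket."""
    padded_bs = find_bucket(batch_size, bs_buckets)
    padded_seq = find_bucket(seq_len, seq_buckets)
    # If padded_bs is None, this raises the TypeError seen in the traceback.
    return padded_bs * padded_seq


def padded_tokens_checked(batch_size: int, seq_len: int,
                          bs_buckets: Sequence[int],
                          seq_buckets: Sequence[int]) -> int:
    """Same computation, but fails with a message that names the bad input."""
    padded_bs = find_bucket(batch_size, bs_buckets)
    padded_seq = find_bucket(seq_len, seq_buckets)
    if padded_bs is None:
        raise ValueError(f"no batch-size bucket fits batch_size={batch_size}")
    if padded_seq is None:
        raise ValueError(f"no sequence bucket fits seq_len={seq_len}")
    return padded_bs * padded_seq


if __name__ == "__main__":
    # Made-up bucket lists, chosen so that a batch size of 128 has no bucket.
    bs_buckets = [1, 2, 4, 8, 16, 32, 64]
    seq_buckets = [128, 256, 512, 1024]
    try:
        padded_tokens(128, 1024, bs_buckets, seq_buckets)
    except TypeError as exc:
        print("unchecked:", exc)
    try:
        padded_tokens_checked(128, 1024, bs_buckets, seq_buckets)
    except ValueError as exc:
        print("checked:", exc)
```

If the real cause is similar, the useful thing to check is which values the padding function receives for the `-b 128` and `-l 1024` settings and why no padded batch size comes back for them.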
