pd: support fp8 kvcache in insert_blocks_to_device #693
Conversation
Pull request overview
This PR adds support for the FP8 key-value cache types (float8_e4m3fn and float8_e5m2) in the insert_blocks_to_device method by working around a PyTorch indexing limitation: FP8 tensors are temporarily viewed as uint8 for the indexing operations and then viewed back to their original dtype (a short illustrative sketch follows the key-changes list).
Key changes:
- Added FP8 dtype detection and a uint8 view workaround
- Updated both tuple and non-tuple cache handling paths to support FP8 types
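A minimal sketch of the principle behind the workaround (illustrative only; `k_cache` is a made-up name, not from the PR): the FP8 dtypes are one byte wide, so `Tensor.view(torch.uint8)` reinterprets the existing storage rather than casting the values.

```python
import torch

# Illustrative: float8_e4m3fn / float8_e5m2 are 1-byte dtypes, so viewing as
# uint8 is a zero-copy reinterpretation of the same storage, not a numeric cast.
k_cache = torch.randn(4, 8).to(torch.float8_e4m3fn)  # hypothetical cache slice
as_bytes = k_cache.view(torch.uint8)
assert as_bytes.data_ptr() == k_cache.data_ptr()      # same storage, no copy
assert k_cache.element_size() == as_bytes.element_size() == 1
```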
# WA: https://github.com/pytorch/pytorch/issues/169656
view_as_uint = src_cache.dtype in [torch.float8_e4m3fn, torch.float8_e5m2]
if view_as_uint:
    src_cache = src_cache.view(torch.uint8)
Copilot AI · Dec 5, 2025
The original dtype information is lost after converting src_cache to uint8. Later references to src_cache.dtype on lines 240 and 245 will return torch.uint8 instead of the original FP8 dtype. Store the original dtype in a variable before the conversion: original_dtype = src_cache.dtype and use original_dtype in the view conversions.
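A hedged sketch of the suggested fix, written as a hypothetical standalone helper (names mirror the diff; the surrounding method is not reproduced here):

```python
import torch

def _gather_blocks(src_cache: torch.Tensor, src_block_indices: torch.Tensor) -> torch.Tensor:
    # Hypothetical helper illustrating the suggested fix, not the actual PR code.
    original_dtype = src_cache.dtype  # capture BEFORE the uint8 reinterpretation
    # WA: https://github.com/pytorch/pytorch/issues/169656
    view_as_uint = original_dtype in (torch.float8_e4m3fn, torch.float8_e5m2)
    if view_as_uint:
        src_cache = src_cache.view(torch.uint8)
    indexed_cache = src_cache[src_block_indices]
    if view_as_uint:
        # Use the saved dtype; src_cache.dtype is torch.uint8 at this point.
        indexed_cache = indexed_cache.view(original_dtype)
    return indexed_cache
```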
dst_cache[i].index_copy_(0, dst_block_indices, _src_cache[i].to(dst_cache[i].device))
indexed_cache = _src_cache[i]
if view_as_uint:
    indexed_cache = indexed_cache.view(src_cache.dtype)
Copilot AI · Dec 5, 2025
This attempts to view as src_cache.dtype, but src_cache was already converted to uint8 on line 234, so this will view as uint8 again instead of the original FP8 dtype. Use the original dtype stored before the conversion.
dst_cache.index_copy_(0, dst_block_indices, src_cache[src_block_indices].to(dst_cache.device))
indexed_cache = src_cache[src_block_indices]
if view_as_uint:
    indexed_cache = indexed_cache.view(src_cache.dtype)
Copilot AI · Dec 5, 2025
Same issue as the tuple path: this views as src_cache.dtype which is now uint8, not the original FP8 dtype. Use the stored original dtype instead.
Signed-off-by: Xinyu Chen <[email protected]>
✅ CI Passed. All checks passed successfully against the following vllm commit:
# WA: https://github.com/pytorch/pytorch/issues/169656
view_as_uint = src_cache.dtype in [torch.float8_e4m3fn, torch.float8_e5m2]
if view_as_uint:
    src_cache = src_cache.view(torch.uint8)
View as uint8? Can you explain more about how it helps here?
index_cpu doesn't support the fp8 data type; the view as uint8 here is only for data movement.
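A small demonstration of that point (assumed behavior; whether direct fp8 indexing fails depends on the PyTorch build and device): routing the gather through a uint8 view moves the exact bytes, so viewing the result back yields bit-identical fp8 values.

```python
import torch

cache = torch.randn(6, 4).to(torch.float8_e5m2)   # toy fp8 cache
idx = torch.tensor([0, 3, 5])

# Index through the uint8 view (the transport trick), then reinterpret back.
moved = cache.view(torch.uint8)[idx].view(torch.float8_e5m2)

# The selected rows are bit-identical to the corresponding source rows.
assert torch.equal(moved.view(torch.uint8), cache.view(torch.uint8)[idx])
```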