@vbaddi vbaddi commented Jul 1, 2025

Summary

  • This PR makes the KV offload generation pipeline more robust by adding pixel values padding for Vision-Language Models (VLMs). The implementation is designed to work across different VLM architectures, but is currently enabled only for Llama4.

Problem Statement

  • During KV offload generation in modeling_auto.py, some VLMs expect a specific pixel values tensor shape that may not match the input data. Previously, when the input pixel values did not match the patch count the model was compiled for, this could cause runtime errors or suboptimal performance.

Solution

  • Model-specific patch count method: added a get_expected_patch_count() method to the VLM model classes that returns the expected number of patches as an integer (17 in the case of Llama4).
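A minimal sketch of the padding idea described above. This is illustrative only, not the PR's actual code: it uses NumPy in place of the real framework tensors, and the function name `pad_pixel_values`, the shape layout `(num_patches, channels, height, width)`, and zero-padding as the fill strategy are all assumptions. The value 17 matches the Llama4 patch count mentioned in the Solution.

```python
import numpy as np

# Hypothetical value that get_expected_patch_count() would return for Llama4.
EXPECTED_PATCH_COUNT = 17


def pad_pixel_values(pixel_values: np.ndarray, expected_patches: int) -> np.ndarray:
    """Pad the patch dimension so it matches the patch count the
    compiled model was built for; leave the tensor unchanged if it
    already has at least that many patches.

    Assumes pixel_values has shape (num_patches, channels, height, width).
    """
    num_patches = pixel_values.shape[0]
    if num_patches >= expected_patches:
        return pixel_values
    # Zero-pad along the patch axis only; all other axes are untouched.
    pad_width = [(0, expected_patches - num_patches)]
    pad_width += [(0, 0)] * (pixel_values.ndim - 1)
    return np.pad(pixel_values, pad_width, mode="constant")


# Example: an input with 5 patches is padded up to the expected 17.
px = np.ones((5, 3, 14, 14), dtype=np.float32)
padded = pad_pixel_values(px, EXPECTED_PATCH_COUNT)
print(padded.shape)  # (17, 3, 14, 14)
```

Zero-padding is one plausible fill strategy; the actual implementation may pad differently (e.g. replicating patches) depending on what the compiled model tolerates.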

@vbaddi vbaddi self-assigned this Jul 1, 2025
@vbaddi vbaddi added enhancement New feature or request 1.20.0 labels Jul 1, 2025