@nikolaystanishev commented Jul 25, 2025

Description

  • When patching key or value model heads, n_key_value_heads is now used instead of n_heads, since grouped-query attention models have fewer key/value heads than query heads.
  • When stacking the per-head results in the attention head patching methods, the key/value results are padded so that tensors with different head dimensions can be stacked (see the sketch below).

The problem is described in the corresponding issue.
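For context, here is a minimal sketch of the two changes above, assuming a grouped-query attention model where n_key_value_heads is smaller than n_heads; the config values, the n_heads_for helper, and the NaN pad value are illustrative, not the PR's actual code.

```python
import torch
import torch.nn.functional as F

# Illustrative GQA config: 8 query heads share 2 key/value heads.
n_layers, n_heads, n_key_value_heads = 4, 8, 2

def n_heads_for(component: str) -> int:
    # Key and value activations only have n_key_value_heads heads under
    # grouped-query attention, so iterate over that count, not n_heads.
    return n_key_value_heads if component in ("k", "v") else n_heads

# One patching result per (layer, head) for each attention component.
results = {c: torch.randn(n_layers, n_heads_for(c)) for c in ("q", "k", "v", "z")}

# torch.stack requires equal shapes, so pad the narrower key/value results
# along the head dimension up to n_heads before stacking.
padded = [
    F.pad(r, (0, n_heads - r.shape[-1]), value=float("nan"))
    for r in results.values()
]
stacked = torch.stack(padded)  # shape: (4 components, n_layers, n_heads)
```

Padding with NaN rather than zero keeps the nonexistent key/value head slots distinguishable from genuine zero patching effects.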

Fixes #980: [Bug Report] Error when patching key or value heads

Type of change

  • Bug fix (non-breaking change which fixes an issue)

Checklist:

  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes
  • I have not rewritten tests relating to key interfaces which would affect backward compatibility
