Granite 3.3 Model Support for TransformerLens #965

emharsha1812 · 2025-07-13T12:51:56Z

Description
This PR adds support for the IBM Granite 3.3 family of models to TransformerLens. These models use a specialized architecture with Grouped Query Attention (GQA) and SwiGLU activation with separate gate and up-projection weights.
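To illustrate what Grouped Query Attention means for the tensors involved, here is a minimal NumPy sketch, not the code from this PR. It assumes activations are laid out as [pos, n_heads, d_head] and that each key/value head is shared by a contiguous group of query heads; the function names `expand_kv_heads` and `gqa_scores` are hypothetical and exist only for this example.

```python
import numpy as np


def expand_kv_heads(kv, n_heads):
    """Repeat each key/value head so every query head has a matching K/V head.

    kv has shape [pos, n_kv_heads, d_head]; the result has shape
    [pos, n_heads, d_head]. n_heads must be a multiple of n_kv_heads.
    """
    n_kv_heads = kv.shape[1]
    if n_heads % n_kv_heads != 0:
        raise ValueError("n_heads must be divisible by n_kv_heads")
    # np.repeat duplicates each head in place: [k0, k0, k1, k1, ...]
    return np.repeat(kv, n_heads // n_kv_heads, axis=1)


def gqa_scores(q, k):
    """Scaled attention scores (no mask, no softmax) for illustration only.

    q: [pos, n_heads, d_head], k: [pos, n_kv_heads, d_head]
    returns: [n_heads, query_pos, key_pos]
    """
    k_full = expand_kv_heads(k, q.shape[1])
    return np.einsum("qhd,khd->hqk", q, k_full) / np.sqrt(q.shape[-1])
```

With 4 query heads and 2 key/value heads, query heads 0 and 1 read the same keys, and heads 2 and 3 read the other set. This is why the key and value projections are smaller than the query projection.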

Key implementation details:

Added weight conversion logic in granite.py to properly handle Granite's architecture
Correctly handled the transposition and reshaping of attention weights
Implemented support for GQA (Grouped Query Attention)
Configured the model to use the GatedMLP component with silu activation
Ensured all tensors are moved to the correct device during conversion
The implementation enables researchers to use IBM's Granite models with TransformerLens's interpretability tools, expanding the range of architectures available for study.
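The transposition and reshaping of attention weights mentioned above can be sketched as follows. This is an illustration under stated assumptions rather than the contents of granite.py: it assumes the source checkpoint stores each projection as a Hugging Face-style linear weight of shape [out_features, in_features], and that the target layout is per-head, [n_heads, d_model, d_head] for Q/K/V and [n_heads, d_head, d_model] for the output projection. The helper names are hypothetical.

```python
import numpy as np


def split_heads(weight, n_heads):
    """Reshape a [n_heads * d_head, d_model] projection into [n_heads, d_model, d_head]."""
    out_dim, d_model = weight.shape
    if out_dim % n_heads != 0:
        raise ValueError("projection rows must be divisible by n_heads")
    d_head = out_dim // n_heads
    # Rows are grouped head by head, so split them first, then swap the last two axes.
    return weight.reshape(n_heads, d_head, d_model).transpose(0, 2, 1)


def split_output_heads(weight, n_heads):
    """Reshape a [d_model, n_heads * d_head] output projection into [n_heads, d_head, d_model]."""
    d_model, in_dim = weight.shape
    if in_dim % n_heads != 0:
        raise ValueError("projection columns must be divisible by n_heads")
    d_head = in_dim // n_heads
    return weight.T.reshape(n_heads, d_head, d_model)
```

Under GQA the same `split_heads` call would be used for the key and value projections, but with the number of key/value heads in place of `n_heads`, since those matrices have fewer rows than the query projection.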

Fixes # (issue)

Adds support for the Granite 3.3 architecture, specifically the models in the Granite 3.3 collection.

Screenshots

Please attach before and after screenshots of the change if applicable.

Checklist:

I have commented my code, particularly in hard-to-understand areas
I have made corresponding changes to the documentation
My changes generate no new warnings
I have added tests that prove my fix is effective or that my feature works
New and existing unit tests pass locally with my changes
I have not rewritten tests relating to key interfaces which would affect backward compatibility

Additional Notes
The implementation specifically handles the unique aspects of Granite 3.3's architecture:

It properly separates gate and up-projection weights for the GatedMLP component
It correctly handles the Grouped Query Attention mechanism
It ensures all tensors are moved to the model's configured device
This PR builds on TransformerLens's existing infrastructure for supporting various model architectures, making minimal changes to accommodate Granite 3.3's specific requirements.
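As a rough picture of why the gate and up-projection weights are kept separate, the sketch below computes a SwiGLU-style gated MLP in NumPy. It is not the PR's implementation: the weight shapes ([d_model, d_mlp] for the gate and up projections, [d_mlp, d_model] for the down projection) and the names `w_gate`, `w_in`, and `w_out` are assumptions chosen for the example, and biases are omitted.

```python
import numpy as np


def silu(x):
    """SiLU (swish) activation: x * sigmoid(x)."""
    return x / (1.0 + np.exp(-x))


def gated_mlp(x, w_gate, w_in, w_out):
    """SwiGLU-style MLP without biases.

    x: [pos, d_model]
    w_gate, w_in: [d_model, d_mlp]
    w_out: [d_mlp, d_model]
    """
    gate = silu(x @ w_gate)  # activation is applied to the gate branch only
    up = x @ w_in            # the up projection stays linear
    return (gate * up) @ w_out
```

If a checkpoint stores these matrices as [out_features, in_features] linear weights, each one would need to be transposed before being passed to a function like this.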


bryce13950 and others added 9 commits June 12, 2025 11:19

Merge pull request TransformerLensOrg#945 from TransformerLensOrg/dev

e1c7506

Release 2.16

Merge pull request TransformerLensOrg#952 from TransformerLensOrg/dev

a634e57

Release 2.16.1

Support for Granite3.3

082a45e

Merge branch 'TransformerLensOrg:main' into granite3.3

b93dffd

Added Granite 3.3. Support

0d31a7b

Merge branch 'granite3.3' of https://github.com/emharsha1812/Transfor…

95e34be

…merLens into granite3.3

Fixed imports

bb53d1d

ran format

6fa6bea

Merge branch 'dev' into granite3.3

839a1c7
