Block interface: full changes #363
Closed
Conversation
First impression is very good!

tscholak approved these changes on Sep 18, 2025
LGTM!!!
✨ Description
Major feature: Revisit model configuration.
- `mixer`: Dynamic configuration for a mixer. Available: `attention` (default), `mamba`, `mamba_2`, `discrete_mamba_2`.
- `mlp`: Dynamic configuration for an MLP. Available: `mlp` (default), `moe`.
- `block`: Configuration for a transformer (or other) block. Available: `decoder` (default, a standard transformer-decoder-style block with a mixer and an MLP).
- `block_sequence`: Configuration for a sequence of blocks, e.g. a transformer decoder. Available: `fixed` (default), `pattern`.
- `embeddings_layer` / `output_layer`: New modular configuration for the language model embeddings and head, extracted from the language model config into modular (but not yet dynamic) components.
- `base_model`: For the `gpt` model, this consists of `embeddings_layer`, a dynamic `block_sequence` and a `model_head` (see the sketch after this list).
- `ParameterConfig`: Parameter (meta) creation now goes through `ParameterConfig.get_parameter`. (Block interface: parameter and linear config, separate SSM config. #358)
- Initialization is now configured through `ParameterConfig`, with defaults set by the parent layer. Remove most pre-existing initialization config fields as they are no longer needed. (Block interface: rework LM config, fine-grained initialization, lr_scale, peft #360)
- New `lr_scale` field in `ParameterConfig` and most layer configs. The resulting lr scale for a given parameter is the product of its own and that of all its parent layers. (Block interface: rework LM config, fine-grained initialization, lr_scale, peft #360)
- PEFT is enabled through the `apply_peft` field of linear configs, with the default set by the parent layer. (Block interface: rework LM config, fine-grained initialization, lr_scale, peft #360)
- `add_linear_biases`: Mixers and MLPs now use a separate field, as in HF `transformers`.
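To make the nesting concrete, here is a hypothetical sketch of how such a configuration could be laid out. It uses only the field names mentioned above; the exact keys, type tags and defaults in Fast-LLM are assumptions for illustration, not the definitive schema.

```python
# Hypothetical sketch of the nested, dynamic model configuration described above.
# Field names follow the PR description; exact keys, type tags and defaults are
# assumptions for illustration only.
base_model = {
    "embeddings_layer": {},  # modular (not yet dynamic) embeddings config
    "block_sequence": {
        "type": "fixed",     # or "pattern" to alternate between block types
        "block": {           # "decoder" block: one mixer + one MLP
            "mixer": {"type": "attention", "lr_scale": 0.5},  # or mamba / mamba_2 / discrete_mamba_2
            "mlp": {"type": "moe"},                           # or plain "mlp" (default)
            "lr_scale": 0.1,
        },
    },
    "output_layer": {},      # modular LM head ("model_head") config
}

# lr_scale composes multiplicatively through the hierarchy: a parameter whose own
# scale is 0.5, inside a block with lr_scale 0.1, trains with 0.5 * 0.1 = 0.05.
```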

Side feature: Modularize model components.
Since many components are now dynamic, the GPT base model is no longer able to keep track of all the internal details, so several hard-coded parts had to be replaced by modular methods that delegate to the appropriate components.
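As a rough illustration of this delegation pattern (the class and method names below are invented for the sketch and are not the actual Fast-LLM API):

```python
# Hypothetical illustration of delegating hard-coded logic to configured components;
# class and method names are invented for this sketch, not the Fast-LLM API.
class GPTBaseModel:
    def __init__(self, config):
        # Each component is built from its own (possibly dynamic) sub-config.
        self.embeddings = config.embeddings_layer.build()
        self.decoder = config.block_sequence.build()
        self.head = config.output_layer.build()

    def preprocess(self, batch):
        # The base model no longer hard-codes mixer- or MLP-specific details;
        # it delegates to whichever components were configured.
        for component in (self.embeddings, self.decoder, self.head):
            batch = component.preprocess(batch)
        return batch
```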
Side feature: Revisit model conversion. (#362)
- Rework the conversion for `Llama` and use it as a basis for other models, which override/extend the relevant components (sketched below).
- `apriel_hybrid_ssm` (previously apriel 15b): supports both mamba 2 and discrete mamba 2.
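A rough sketch of this layering, with hypothetical class names (the actual Fast-LLM converter classes may be organized differently):

```python
# Hypothetical sketch of the converter layering described above; the class names
# are invented for illustration and are not the actual Fast-LLM converters.
class LlamaConverter:
    """Base conversion logic: embeddings, attention mixer, MLP, LM head."""

    def convert_block(self, fast_llm_block, hf_block):
        ...  # map attention + MLP weights


class AprielHybridSSMConverter(LlamaConverter):
    """Reuses the Llama logic and overrides only what differs."""

    def convert_block(self, fast_llm_block, hf_block):
        # e.g. map mamba_2 / discrete_mamba_2 mixers instead of attention
        ...
```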

Known issues / todo / notes
Critical:
- Bump the version to `0.3.0` so we have a clear distinction between formats.
- `test_huggingface_model` fails for SSMs because of bugs in the external model (lack of support for DynamicCache?). This is a huge problem because this test is our main correctness test for SSMs.

Important but could postpone:
- `test_huggingface_model` fails for Mixtral, probably a conversion issue.

Minor: