Add StableLM-3B 4E1T to Keras Hub #2151

Bond099 · 2025-03-18T18:43:49Z

This PR adds the StableLM-3B 4E1T model to Keras Hub. However, numerical matching with the Hugging Face implementation is still in progress.

Bond099 · 2025-03-22T18:54:33Z

@divyashreepathihalli Here is a comparison of numerics with Hugging Face in Colab. The results match with an absolute tolerance of 1e-3, but they do not match when using 1e-5. Could you please take a look and suggest some improvements or explanations for this discrepancy?
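For readers unfamiliar with absolute-tolerance checks, here is a minimal, framework-free sketch of the kind of comparison described above. The two lists are synthetic stand-ins, not real model outputs, and the helper names `max_abs_diff` and `all_close` are hypothetical.

```python
import math


def max_abs_diff(a, b):
    """Largest element-wise absolute difference between two sequences."""
    return max(abs(x - y) for x, y in zip(a, b))


def all_close(a, b, atol):
    """True when every pair of elements differs by at most `atol`."""
    return all(
        math.isclose(x, y, rel_tol=0.0, abs_tol=atol) for x, y in zip(a, b)
    )


# Synthetic values chosen for illustration only (assumption, not PR data).
keras_out = [0.12345, -1.50012, 2.00031]
hf_out = [0.12350, -1.50000, 2.00000]

print("max abs diff:", max_abs_diff(keras_out, hf_out))
print("match at 1e-3:", all_close(keras_out, hf_out, atol=1e-3))
print("match at 1e-5:", all_close(keras_out, hf_out, atol=1e-5))
```

Reporting the maximum absolute difference alongside the pass/fail result makes it easier to judge whether a gap is plausibly floating-point noise or a real implementation difference.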

keras_hub/src/models/stablelm/stablelm_attention.py

keras_hub/src/models/stablelm/stablelm_backbone.py

divyashreepathihalli · 2025-04-16T05:52:32Z

The numerics are good enough!

abheesht17

Looks good overall, but let's check the numerics and the generate output.

keras_hub/src/models/stablelm/stablelm_attention.py

keras_hub/src/models/stablelm/stablelm_backbone.py

mattdangerw · 2025-05-29T15:38:56Z

@Bond099 let's sync this with the latest changes and make sure to run our format script. I'm not exactly sure why none of our CI is running, but I don't think it ran.

abheesht17 · 2025-06-16T17:43:25Z

Let's clean up the PR. Can we fix the following minor things?

Pull in master.
I still don't see tie_weights = False in the PR.
Run formatting, etc.
Let's wait for all the tests to run.

abheesht17 · 2025-06-17T04:20:04Z

Looks like there are conflicts. Please pull in master and resolve the conflicts.

abheesht17 · 2025-07-03T00:22:23Z

keras_hub/src/samplers/sampler.py

@@ -3,7 +3,6 @@
 from keras import random


I'd actually want sampler.py to be untouched, because all models use this. We'd want to make this change only for Stable LM. The changes will be less intrusive that way.

Is there a way you can pass the correct mask right here: keras-hub/keras_hub/src/models/stablelm/stablelm_causal_lm.py, line 130 in e836a78 (mask=padding_mask,), https://github.com/keras-team/keras-hub/blob/master/keras_hub/src/samplers/sampler.py#L79?

If you do pass the correct mask, this should evaluate to the correct bool: https://github.com/keras-team/keras-hub/blob/master/keras_hub/src/samplers/sampler.py#L100.

Take a look if this works.

divyashreepathihalli · 2025-07-11T00:05:08Z

/gemini review

gemini-code-assist

Code Review

This pull request adds support for the StableLM-3B-4E1T model. I've identified a critical bug in the StableLMTransformerDecoder's caching logic that would cause failures during training, and a few medium-to-high severity issues in the StableLMBackbone's documentation and the checkpoint conversion script.

gemini-code-assist · 2025-07-11T00:06:51Z

keras_hub/src/models/stablelm/stablelm_decoder.py

+        x, self_attention_cache = self.self_attention_layer(
+            hidden_states=x,
+            attention_mask=self_attention_mask,
+            cache=self_attention_cache,
+            cache_update_index=self_attention_cache_update_index,
+        )
+        x = self.self_attention_dropout(x, training=training)
+        x = x + residual
+
+        residual = x
+
+        # Feedforward block
+        x = self.feedforward_layernorm(x)
+        gate_output = self.feedforward_gate_dense(x)
+        gate_output = self.activation(gate_output)
+        intermediate_output = self.feedforward_intermediate_dense(x)
+        x = self.feedforward_output_dense(
+            ops.multiply(intermediate_output, gate_output)
+        )
+        decoder_output = x + residual
+        if self_attention_cache is not None:
+            return decoder_output, self_attention_cache
+        return decoder_output


The call method's handling of the attention cache has a bug. When call is invoked without a cache (i.e., self_attention_cache is initially None), self.self_attention_layer returns a tuple of (attention_output, attention_output). This makes the local self_attention_cache a tensor, causing the if condition to be true and the method to incorrectly return a tuple, breaking the StableLMBackbone functional model which expects a single tensor output. Use a different variable name for the cache returned by the attention layer to fix this.

        attention_output, next_attention_cache = self.self_attention_layer(
            hidden_states=x,
            attention_mask=self_attention_mask,
            cache=self_attention_cache,
            cache_update_index=self_attention_cache_update_index,
        )
        x = self.self_attention_dropout(attention_output, training=training)
        x = x + residual

        residual = x

        # Feedforward block
        x = self.feedforward_layernorm(x)
        gate_output = self.feedforward_gate_dense(x)
        gate_output = self.activation(gate_output)
        intermediate_output = self.feedforward_intermediate_dense(x)
        x = self.feedforward_output_dense(
            ops.multiply(intermediate_output, gate_output)
        )
        decoder_output = x + residual
        if self_attention_cache is not None:
            return decoder_output, next_attention_cache
        return decoder_output
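The shadowing problem described in this comment can be reproduced without Keras. The sketch below uses a hypothetical `attention_stub` that, as the review describes, returns a non-None second element even when no cache is passed; it is not the actual attention layer.

```python
def attention_stub(hidden_states, cache=None):
    """Hypothetical stand-in: always returns an (output, cache) pair."""
    new_cache = cache if cache is not None else hidden_states
    return hidden_states, new_cache


def decoder_buggy(x, self_attention_cache=None):
    # Reusing the argument name overwrites the caller's `None`,
    # so the check below can no longer tell whether a cache was passed.
    x, self_attention_cache = attention_stub(x, cache=self_attention_cache)
    if self_attention_cache is not None:
        return x, self_attention_cache
    return x


def decoder_fixed(x, self_attention_cache=None):
    # A separate name keeps the original argument intact for the check.
    x, next_cache = attention_stub(x, cache=self_attention_cache)
    if self_attention_cache is not None:
        return x, next_cache
    return x


print(type(decoder_buggy([1.0])).__name__)
print(type(decoder_fixed([1.0])).__name__)
```

Under these assumptions the buggy variant returns a tuple even when called without a cache, while the fixed variant returns a single value.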

gemini-code-assist · 2025-07-11T00:06:51Z

tools/checkpoint_conversion/convert_stablelm_checkpoints.py

+cfg["max_sequence_length"] = pt_cfg["max_position_embeddings"]
+cfg["layer_norm_epsilon"] = pt_cfg["layer_norm_eps"]
+cfg["rope_max_wavelength"] = pt_cfg["rope_theta"]
+cfg["partial_rotary_factor"] = pt_cfg["partial_rotary_factor"]


The Hugging Face config parameter partial_rotary_factor should be mapped to rotary_percentage to ensure the Keras model is configured correctly.

Suggested change

-cfg["partial_rotary_factor"] = pt_cfg["partial_rotary_factor"]
+cfg["rotary_percentage"] = pt_cfg["partial_rotary_factor"]

gemini-code-assist · 2025-07-11T00:06:51Z

keras_hub/src/models/stablelm/stablelm_backbone.py

+        tie_word_embeddings: bool, optional. Whether to tie input and output
+            embeddings. Defaults to False.


The docstring for __init__ includes the tie_word_embeddings argument, but it is not present in the method's signature. Either add the argument to the __init__ method, or remove it from the docstring.

gemini-code-assist · 2025-07-11T00:06:52Z

tools/checkpoint_conversion/convert_stablelm_checkpoints.py

+    keras_model.get_layer(
+        f"transformer_layer_{layer_index}"
+    )._self_attention_layer._query_dense.set_weights(weights)


The script accesses layer attributes using names with a leading underscore (e.g., _self_attention_layer, _query_dense). Use the public attribute names for setting weights instead.

Suggested change

-    keras_model.get_layer(
-        f"transformer_layer_{layer_index}"
-    )._self_attention_layer._query_dense.set_weights(weights)
+    keras_model.get_layer(
+        f"transformer_layer_{layer_index}"
+    ).self_attention_layer.query_dense.set_weights(weights)

divyashreepathihalli · 2025-08-25T22:13:31Z

/gemini review

gemini-code-assist

Code Review

This pull request adds the StableLM-3B 4E1T model, which is a valuable contribution. The implementation largely follows the repository's style guide, including the modular structure and naming conventions. However, there are a few critical omissions and issues that need to be addressed. Most importantly, the model presets file (stablelm_presets.py) is missing, which is required by the style guide [^1] and is essential for loading pre-trained models. The PR also lacks the required validation Colab notebooks [^2]. Additionally, I've found a critical bug in the CausalLM implementation due to the use of a removed utility function, and a few other issues in the backbone and checkpoint conversion script that I've detailed in the specific comments.

gemini-code-assist · 2025-08-25T22:15:44Z

keras_hub/src/models/stablelm/stablelm_causal_lm.py

+            end_locations = any_equal(
+                token_ids, stop_token_ids, ops.logical_not(padding_mask)
+            )


The any_equal utility function has been removed in this PR (as seen in the changes to sampler.py), but it's still being used here. This will cause a runtime error.

You should replace this with equivalent logic to check for stop tokens. Here's a suggested implementation that achieves the same result.

Suggested change

-            end_locations = any_equal(
-                token_ids, stop_token_ids, ops.logical_not(padding_mask)
-            )
+            stop_token_ids_tensor = ops.convert_to_tensor(
+                stop_token_ids, dtype=token_ids.dtype
+            )
+            is_stop_token = ops.any(
+                token_ids[..., None] == stop_token_ids_tensor, axis=-1
+            )
+            end_locations = is_stop_token & ops.logical_not(padding_mask)
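To make the intent of the suggested replacement easier to follow, here is a plain-Python sketch of the same element-wise logic on nested lists. The function name `find_end_locations` and the sample values are hypothetical; the real code operates on tensors through `keras.ops`.

```python
def find_end_locations(token_ids, stop_token_ids, padding_mask):
    """Mark positions that hold a stop token and are not set in padding_mask.

    token_ids and padding_mask are lists of rows with matching shapes.
    Mirrors `is_stop_token & logical_not(padding_mask)` element by element.
    """
    stop = set(stop_token_ids)
    return [
        [(tok in stop) and (not masked) for tok, masked in zip(row, mask_row)]
        for row, mask_row in zip(token_ids, padding_mask)
    ]


# Illustrative values only: token 2 plays the role of a stop token.
tokens = [[5, 2, 7, 2]]
mask = [[True, True, False, False]]
print(find_end_locations(tokens, [2], mask))
```

In this example the stop token at index 1 is ignored because the mask is set there, and only the one at index 3 is flagged.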

gemini-code-assist · 2025-08-25T22:15:45Z

tools/checkpoint_conversion/convert_stablelm_checkpoints.py

+cfg["max_sequence_length"] = pt_cfg["max_position_embeddings"]
+cfg["layer_norm_epsilon"] = pt_cfg["layer_norm_eps"]
+cfg["rope_max_wavelength"] = pt_cfg["rope_theta"]
+cfg["partial_rotary_factor"] = pt_cfg["partial_rotary_factor"]


The configuration key partial_rotary_factor is extracted from the Hugging Face config, but the StableLMBackbone expects rotary_percentage. This will cause a TypeError when initializing the backbone because of an unexpected keyword argument. You should use the correct key rotary_percentage.

Suggested change

-cfg["partial_rotary_factor"] = pt_cfg["partial_rotary_factor"]
+cfg["rotary_percentage"] = pt_cfg["partial_rotary_factor"]
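The failure mode described here is ordinary Python keyword-argument checking. A minimal sketch, assuming a constructor that accepts `rotary_percentage` but not `partial_rotary_factor`; `RotaryBackboneStub` and the numeric values are illustrative stand-ins, not the real backbone or checkpoint config.

```python
class RotaryBackboneStub:
    """Hypothetical stand-in for a backbone taking `rotary_percentage`."""

    def __init__(self, rope_max_wavelength=10000, rotary_percentage=0.25):
        self.rope_max_wavelength = rope_max_wavelength
        self.rotary_percentage = rotary_percentage


# Illustrative source config (values are assumptions for the example).
pt_cfg = {"rope_theta": 10000, "partial_rotary_factor": 0.25}

bad_cfg = {
    "rope_max_wavelength": pt_cfg["rope_theta"],
    "partial_rotary_factor": pt_cfg["partial_rotary_factor"],
}
good_cfg = {
    "rope_max_wavelength": pt_cfg["rope_theta"],
    "rotary_percentage": pt_cfg["partial_rotary_factor"],
}

try:
    RotaryBackboneStub(**bad_cfg)
except TypeError as err:
    print("bad_cfg failed:", err)

backbone = RotaryBackboneStub(**good_cfg)
print("good_cfg rotary_percentage:", backbone.rotary_percentage)
```

Only the key on the Keras side changes; the value is still read from the Hugging Face `partial_rotary_factor` entry.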

gemini-code-assist · 2025-08-25T22:15:45Z

keras_hub/src/models/stablelm/stablelm_backbone.py

+        self.token_embedding = ReversibleEmbedding(
+            input_dim=vocabulary_size,
+            output_dim=hidden_dim,
+            tie_weights=False,


The tie_word_embeddings argument is documented in the Args section of the docstring but is not included in the __init__ method's signature. This causes a discrepancy between the documentation and the implementation, and prevents users from configuring this behavior. ¹

To fix this, you should:

Add tie_word_embeddings=False to the __init__ signature.

Store it as self.tie_word_embeddings.

Pass tie_weights=self.tie_word_embeddings to the ReversibleEmbedding layer.

Include it in get_config().

Suggested change

-            tie_weights=False,
+            tie_weights=tie_word_embeddings,

Style Guide References

Footnotes

1. Docstrings should accurately document all parameters. The tie_word_embeddings parameter is in the docstring but not in the function signature, making it unusable.
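The four steps listed in this comment can be sketched without Keras as follows. `TiedBackboneStub` is a hypothetical class that only mimics the config plumbing; the `embedding_kwargs` dict stands in for the arguments that would be forwarded to the embedding layer.

```python
class TiedBackboneStub:
    """Hypothetical, framework-free stand-in for the config handling."""

    def __init__(self, vocabulary_size, hidden_dim, tie_word_embeddings=False):
        self.vocabulary_size = vocabulary_size
        self.hidden_dim = hidden_dim
        # Step 2: store the flag on the instance.
        self.tie_word_embeddings = tie_word_embeddings
        # Step 3: forward it where the embedding layer would be built.
        self.embedding_kwargs = {
            "input_dim": vocabulary_size,
            "output_dim": hidden_dim,
            "tie_weights": self.tie_word_embeddings,
        }

    def get_config(self):
        # Step 4: include the flag so the object can be re-created.
        return {
            "vocabulary_size": self.vocabulary_size,
            "hidden_dim": self.hidden_dim,
            "tie_word_embeddings": self.tie_word_embeddings,
        }

    @classmethod
    def from_config(cls, config):
        return cls(**config)


original = TiedBackboneStub(100, 8, tie_word_embeddings=True)
restored = TiedBackboneStub.from_config(original.get_config())
print(restored.embedding_kwargs["tie_weights"])
```

The round trip through `get_config()` is the part that is easy to miss: if the flag is left out of the config, a reloaded model silently falls back to the default.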

gemini-code-assist · 2025-08-25T22:15:45Z

keras_hub/src/models/stablelm/stablelm_backbone.py

+        if model_parallel_dim_name not in device_mesh.axis_names:
+            raise ValueError(
+                f"{model_parallel_dim_name} is not found in the "
+                f"device_mesh.axis_names. {device_mesh.axis_name=}"


There's a typo in the f-string. It should be device_mesh.axis_names instead of device_mesh.axis_name to correctly display the available axis names in the error message.

Suggested change

-                f"device_mesh.axis_names. {device_mesh.axis_name=}"
+                f"device_mesh.axis_names. {device_mesh.axis_names=}"
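A small sketch of why this typo matters, assuming the mesh object exposes `axis_names` but has no `axis_name` attribute. `DeviceMeshStub` and the two check functions are hypothetical and exist only to show how the f-string `=` specifier evaluates the expression it names.

```python
class DeviceMeshStub:
    """Hypothetical stand-in exposing only `axis_names`."""

    def __init__(self, axis_names):
        self.axis_names = axis_names


def check_axis(device_mesh, dim_name):
    if dim_name not in device_mesh.axis_names:
        raise ValueError(
            f"{dim_name} is not found in the "
            f"device_mesh.axis_names. {device_mesh.axis_names=}"
        )


def check_axis_with_typo(device_mesh, dim_name):
    if dim_name not in device_mesh.axis_names:
        # Building the message evaluates `device_mesh.axis_name`,
        # which fails before the ValueError can be raised.
        raise ValueError(
            f"{dim_name} is not found in the "
            f"device_mesh.axis_names. {device_mesh.axis_name=}"
        )


mesh = DeviceMeshStub(("batch", "model"))
for check in (check_axis, check_axis_with_typo):
    try:
        check(mesh, "missing")
    except (ValueError, AttributeError) as err:
        print(type(err).__name__, "-", err)
```

Under that assumption the typo does more than garble the message: the intended ValueError is replaced by an AttributeError raised while the message is being formatted.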

gemini-code-assist · 2025-08-25T22:15:45Z

keras_hub/src/models/stablelm/stablelm_backbone.py

+        if data_parallel_dim_name not in device_mesh.axis_names:
+            raise ValueError(
+                f"{data_parallel_dim_name} is not found in the "
+                f"device_mesh.axis_names. {device_mesh.axis_name=}"


There's a typo in the f-string. It should be device_mesh.axis_names instead of device_mesh.axis_name to correctly display the available axis names in the error message.

Suggested change

-                f"device_mesh.axis_names. {device_mesh.axis_name=}"
+                f"device_mesh.axis_names. {device_mesh.axis_names=}"

Bond099 added 9 commits March 18, 2025 22:21

stablelm_attention (79b99c5)
stablelm_decoder (a903bdd)
stablelm_backbone and stablelm_backbone_test (8540522)
stablelm_tokenizer and stablelm_tokenizer_test (a28143e)
stablelm_causal_lm_preprocessor (3083719)
stablelm_causal_lm_preprocessor_test (6172850)
stablelm_causal_lm (cb6b830)
stablelm_causal_lm_test (5867ad0)
initialization (6fd200b)

mattdangerw requested a review from divyashreepathihalli March 18, 2025 20:58

divyashreepathihalli reviewed Mar 24, 2025

divyashreepathihalli added the kokoro:force-run Runs Tests on GPU label Mar 24, 2025

kokoro-team removed the kokoro:force-run Runs Tests on GPU label Mar 24, 2025

sachinprasadhs added the WIP Pull requests which are work in progress and not ready yet for review. label Apr 11, 2025

Corrected test cases and added conversion checkpoints (5ce12a0)

Bond099 requested a review from divyashreepathihalli April 28, 2025 15:16

Bond099 marked this pull request as ready for review May 15, 2025 09:58

abheesht17 reviewed May 29, 2025

divyashreepathihalli added the kokoro:force-run Runs Tests on GPU label Jun 9, 2025

kokoro-team removed the kokoro:force-run Runs Tests on GPU label Jun 9, 2025

Bond099 added 2 commits June 17, 2025 01:30

Corrected and reformatted (a37de92)
minor corrections (b62b606)

Bond099 added 3 commits June 17, 2025 11:42

conflicts (a43e52c)
Merge branch 'master' into stablelm (eefed1e)
updated stablelm_attention (e5e5eb9)

abheesht17 and others added 5 commits June 17, 2025 21:24

Run formatting (2d3a5f0)
updated stop condition in sampler (05b1f0e)
Merge branch 'keras-team:master' into stablelm (652e525)
Merge branch 'stablelm' of github.com:Bond099/keras-hub into stablelm (cdb5f76)
comments (e836a78)

abheesht17 added the kokoro:force-run Runs Tests on GPU label Jul 3, 2025

kokoro-team removed the kokoro:force-run Runs Tests on GPU label Jul 3, 2025

abheesht17 requested changes Jul 3, 2025

gemini-code-assist bot reviewed Jul 11, 2025

sachinprasadhs added this to KerasHub Jul 16, 2025

sachinprasadhs moved this to In Progress in KerasHub Jul 16, 2025

gemini-code-assist bot reviewed Aug 25, 2025
