Add Mixtral #2196

Open · wants to merge 13 commits into master

Conversation

@kanpuriyanawab (Collaborator) commented Apr 2, 2025

This PR adds Mixtral to Keras Hub.

Reference

@kanpuriyanawab marked this pull request as ready for review on April 10, 2025, 08:40

@kanpuriyanawab (Collaborator, Author) commented:

Output matching:

[image: output matching]

@divyashreepathihalli (Collaborator) left a comment:

Left a few comments! Please provide a demo Colab.

)
self._query_dense.build(inputs_shape)

self._key_dense = keras.layers.EinsumDense(

Collaborator:

update the layer names to be compatible with enable_lora
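
For context (my reading of the current keras_hub code, not something stated in this PR): `Backbone.enable_lora()` finds the attention projections by layer name, matching names like `"query"` and `"value"`, so the `EinsumDense` layers above need to be built with those names. A standalone sketch:

```python
import keras

# Illustrative only (shapes are made up): what matters for enable_lora() is
# the `name` given to each projection, since the backbone looks up layers
# named "query"/"value" (or "query_dense"/"value_dense") when injecting LoRA.
query_dense = keras.layers.EinsumDense(
    "abc,cde->abde",
    output_shape=(None, 8, 16),  # (sequence, num_heads, head_dim)
    name="query",
)
value_dense = keras.layers.EinsumDense(
    "abc,cde->abde",
    output_shape=(None, 8, 16),
    name="value",
)
```

If the projections keep other names, `enable_lora()` would, as far as I can tell, silently skip them.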

@keras_hub_export("keras_hub.models.MixtralBackbone")
class MixtralBackbone(Backbone):
"""
The Mixtral Transformer core architecture with hyperparameters.

Collaborator:

The docstring's first line should follow directly after the opening """.
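
i.e., roughly this (only the placement of the summary line matters; the rest of the docstring stays as it is):

```python
class MixtralBackbone(Backbone):
    """The Mixtral Transformer core architecture with hyperparameters.

    ... rest of the docstring unchanged ...
    """
```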

preprocessor("League of legends")
# Tokenize a batch of sentences.
sentences = tf.constant(["Taco tuesday", "Fish taco please!"])

Collaborator:

why tf?

target_ids = keras.ops.roll(generation_ids, shift=-1, axis=1)
embeddings = None
with tf.GradientTape(watch_accessed_variables=True) as tape:

Collaborator:

why tf?

@kanpuriyanawab (Collaborator, Author):

borrowed docstring

[screenshot]


Collaborator:

We don't recommend using backend-specific examples. For generic usage, use keras.ops or numpy.
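
For instance, the quoted docstring example could be made backend-agnostic along these lines (a sketch; `preprocessor` is the instance constructed earlier in that docstring):

```python
# Tokenize a single sentence.
preprocessor("League of legends")

# Tokenize a batch of sentences using a plain Python list instead of a
# backend-specific tensor type such as tf.constant.
sentences = ["Taco tuesday", "Fish taco please!"]
preprocessor(sentences)
```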

Collaborator:

There are some conflicts in the api directory due to the recent changes; please resolve.

@kanpuriyanawab (Collaborator, Author):

conflicts resolved.

@kanpuriyanawab (Collaborator, Author):

> We don't recommend using backend-specific examples. For generic usage, use keras.ops or numpy.

@sachinprasadhs, as I mentioned above, there are already tf.GradientTape examples in existing model docstrings; those should be cleaned up in a separate PR.

@kanpuriyanawab (Collaborator, Author) commented:

Mixtral output matching:

[screenshots]

@sachinprasadhs (Collaborator) left a comment:

Added a few more comments.

This network implements a Transformer-based decoder network,
Mixtral, as described in
["Mixtral 7B"](https://arxiv.org/pdf/2310.06825.pdf).

Collaborator:

The reference provided here is for Mistral, not Mixtral; add the correct reference.

router_logits, num_experts, top_k, attention_mask=None
):
"""
Compute the load balancing auxiliary loss for a single MoE layer.

Collaborator:

This should be on the same line as the opening """, followed by a blank line before Args.
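
For reference, the Switch-Transformer-style load-balancing loss that Mixtral uses can be sketched with `keras.ops` roughly as follows; this is a generic sketch rather than this PR's implementation, and the `attention_mask` handling is omitted:

```python
from keras import ops


def load_balancing_loss_sketch(router_logits, num_experts, top_k):
    """Sketch of the load-balancing auxiliary loss for one MoE layer.

    `router_logits` has shape `(num_tokens, num_experts)`.
    """
    # Router probabilities for every token.
    routing_weights = ops.softmax(router_logits, axis=-1)
    # Experts actually selected for each token under top-k routing.
    _, selected_experts = ops.top_k(routing_weights, k=top_k)
    # One-hot per selection: shape (num_tokens, top_k, num_experts).
    expert_mask = ops.one_hot(selected_experts, num_experts)
    # Fraction of tokens whose routing includes each expert.
    tokens_per_expert = ops.mean(ops.max(expert_mask, axis=1), axis=0)
    # Mean router probability assigned to each expert.
    router_prob_per_expert = ops.mean(routing_weights, axis=0)
    # Scale by num_experts, following the Switch Transformer formulation.
    return ops.sum(tokens_per_expert * router_prob_per_expert) * num_experts
```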

from keras import ops


# TODO: Deprecate this in favor of

Collaborator:

We don't support Keras 2 anymore in Keras Hub; I guess you can get rid of this.

# Below is a workaround for `ops.triu` for Keras 2.
# TODO(tirthasheshpatel): Use `ops.triu` once Keras 2 support is
# removed.
# causal_mask = ops.triu(causal_mask, k=-self.sliding_window)

Collaborator:

Keras 2 support is removed now; you can enable this.
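
For what it's worth, a standalone sketch of the sliding-window causal mask the re-enabled `ops.triu` line would produce (sizes are illustrative):

```python
from keras import ops

seq_len, sliding_window = 6, 3
i = ops.expand_dims(ops.arange(seq_len), axis=1)  # query positions (column)
j = ops.expand_dims(ops.arange(seq_len), axis=0)  # key positions (row)
# Standard causal mask: a query may attend to keys at or before its position.
causal_mask = ops.cast(ops.less_equal(j, i), "int32")
# Sliding window: additionally drop keys more than `sliding_window` positions
# behind the query, which is what `ops.triu(..., k=-self.sliding_window)` does.
causal_mask = ops.triu(causal_mask, k=-sliding_window)
```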

class MixtralCausalLMPreprocessorTest(TestCase):
def setUp(self):
self.tokenizer = MixtralTokenizer(
# Generated using create_mixtral_test_proto.py

Collaborator:

This file is missing.
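
For reference, the other SentencePiece-based models in the repo generate their test protos with small scripts (under tools/sentencepiece_testing/, if I remember right); a sketch of what create_mixtral_test_proto.py might contain (the vocab size, special-token ids, and output file name are assumptions by analogy, not taken from this PR):

```python
import io

import sentencepiece

# Train a tiny word-level SentencePiece model on two sentences and dump the
# proto used by the tokenizer/preprocessor tests. All settings below are
# assumed by analogy with the other *_test_proto scripts, not from this PR.
bytes_io = io.BytesIO()
sentencepiece.SentencePieceTrainer.train(
    sentence_iterator=iter(["the quick brown fox", "the earth is round"]),
    model_writer=bytes_io,
    vocab_size=10,
    model_type="WORD",
    unk_id=0,
    bos_id=1,
    eos_id=2,
    pad_id=-1,
)

with open("mixtral_test_vocab.spm", "wb") as f:
    f.write(bytes_io.getvalue())
```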
