Add Moonshine to KerasHub #2093

Open · wants to merge 45 commits into master
Conversation

@harshaljanjani commented Feb 12, 2025

Moonshine ASR Model Implementation in Keras

This PR introduces the Moonshine Automatic Speech Recognition (ASR) model into the Keras ecosystem. The Moonshine model, originally developed by UsefulSensors and available via Hugging Face, is a transformer-based architecture designed to transcribe audio inputs into text. This implementation ports the model into Keras, complete with support for pre-trained weights from Hugging Face.

Overview

The Moonshine ASR model employs an encoder-decoder architecture. The encoder processes audio features, while the decoder generates text transcriptions. This implementation includes custom layers and components to mirror the original model's behavior, validated against the Hugging Face version for accuracy.

Files Added

The following files have been added to implement the Moonshine ASR model:

  • moonshine_backbone.py defines the MoonshineBackbone class, the core of the model. It integrates the encoder and decoder blocks, embeddings, and layer normalization, forming the complete encoder-decoder pipeline.

  • moonshine_decoder.py contains the MoonshineDecoderBlock class, a custom decoder block with self-attention (causal), cross-attention, and feedforward layers. It supports caching for efficient generation and uses SwiGLU activation by default.

  • moonshine_encoder.py implements the MoonshineEncoderBlock class, the encoder component with self-attention and feedforward layers. It optionally uses SwiGLU activation, matching the original model's configuration.

  • moonshine_multi_head_attention.py provides the MoonshineMultiHeadAttention class, a custom multi-head attention layer used in three ways: encoder self-attention, decoder causal self-attention, and decoder cross-attention.

  • moonshine_layers.py includes the following utility layers:

    • MoonshineRotaryEmbedding: Rotary positional embeddings with dynamic scaling support.
    • MoonshineMLP: Can be configured to use SwiGLU activation for feedforward networks or as a linear layer with GeLU activation.
  • moonshine_audio_converter.py implements the MoonshineAudioConverter class, a specialized audio preprocessing layer that converts raw audio waveforms into feature representations suitable for the Moonshine ASR model. It includes downsampling and feature extraction, normalization, and handling of attention masks.

  • moonshine_tokenizer.py provides the MoonshineTokenizer class, which extends the LlamaTokenizer to handle text tokenization for the Moonshine model. It incorporates Moonshine-specific special tokens, including position embedding tokens, hex tokens, and empty tokens, and manages the conversion between raw text and token IDs.

  • moonshine_audio_to_text.py implements the MoonshineAudioToText class, a task model that extends the Seq2SeqLM base class. This class integrates the audio converter, backbone, and tokenizer components to create a complete end-to-end ASR pipeline. It includes methods for text generation from audio inputs, with support for customizable generation parameters and built-in trimming of output sequences.

  • convert_moonshine_checkpoints.py, the weights conversion script:

    • Converts pre-trained weights from Hugging Face into a Keras-compatible format.
    • Loads them into the MoonshineBackbone model.
    • Validates the Keras implementation by comparing outputs with the Hugging Face model using random inputs.
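For reference, the rotary embeddings mentioned under moonshine_layers.py follow the standard RoPE scheme; below is a minimal NumPy sketch of that idea (illustrative only; not the actual MoonshineRotaryEmbedding code, and it omits the dynamic-scaling support):

```python
import numpy as np

def rotary_embedding(x, base=10000.0):
    """Apply rotary position embeddings to x of shape (seq_len, dim).

    Channel pairs are rotated by a position-dependent angle, so relative
    positions are encoded directly in the attention dot product.
    """
    seq_len, dim = x.shape
    half = dim // 2
    # One inverse frequency per channel pair.
    inv_freq = 1.0 / (base ** (np.arange(half) / half))
    # Rotation angle for every (position, pair) combination.
    angles = np.outer(np.arange(seq_len), inv_freq)  # (seq_len, half)
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = x[:, :half], x[:, half:]
    # 2-D rotation of each (x1, x2) channel pair.
    return np.concatenate([x1 * cos - x2 * sin, x1 * sin + x2 * cos], axis=-1)
```

Because each channel pair is only rotated, the per-position norm is preserved, which is why RoPE can be applied to queries and keys without rescaling the attention logits.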

Dependencies

  • Keras 3: Required for backend-agnostic operations.
  • Hugging Face Transformers: Needed by the weights conversion script for loading the original model.
  • Librosa: Required for audio processing.

Notes for Reviewers

  • The implementation is fully functional with pre-trained weights and ready for immediate use.
  • The modular design allows for easy extension or modification of individual components (e.g., attention layers or embeddings).
  • All custom layers are serializable with get_config() and registered with @keras.saving.register_keras_serializable, ensuring compatibility with Keras model saving/loading.
  • End-To-End Demo and Component Functionality Validation Notebook: Colab Notebook.

Closes issue #2083.


@divyashreepathihalli divyashreepathihalli left a comment


Thank you for the PR! I left some initial comments.
I would suggest following the format, structure, and naming conventions of the Whisper model here - https://github.com/keras-team/keras-hub/tree/master/keras_hub/src/models/whisper

  • Add docstrings
  • Convert the backbone to a functional model
  • Add a moonshine_audio_converter.py
  • Add a numerics verification Colab to verify the implementation

@harshaljanjani

Will make the changes at the earliest, thanks for the review!

@divyashreepathihalli

You will need to run shell/api_gen.sh and shell/format.sh at the repo root to resolve the code formatting error.

@harshaljanjani

Thanks for the review, I've made the changes! The build issue still persists.

@harshaljanjani harshaljanjani self-assigned this Feb 19, 2025
@harshaljanjani

Summary of Changes:

  1. Added MoonshineDecoderBlock (passes numeric checks; a few issues with the reversible embeddings keep me from integrating the whole decoder, but I'll try to fix that and get back).
  2. Made a testable encoder component subclassed from keras.Model, separate from the MoonshineBackbone class; it's easier to test weight loading this way since the preprocessor, decoder, and encoder each have separate weight files.

@harshaljanjani

TODO:

  1. Verify the build methods, as the sanity checks for serialization don’t pass, even though the numerics are aligned.
  2. Write weight conversion scripts.

@harshaljanjani

Status of the PR:
Weight assignment works, but the numerics differ.

Outputs of the convert_moonshine_checkpoints.py script (screenshots):

  • MD5 checksum comparison
  • Decoder weights assignment
  • Preprocessor weights assignment
  • Encoder weights assignment

@mattdangerw

What does librosa buy us? We definitely can't add it as a hard dependency. Most people using KerasHub will not be using it for audio modeling today, so adding a hard dep for them will just create headaches.

However, we could add a conditional import of librosa in our API -- attempt to use Moonshine in KerasHub, get an error message asking for librosa. And if we only need this for data preparation outside of our API, that is easiest: just use librosa in any guides and examples (but ultimately leave it up to the user).
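A generic sketch of that conditional-import pattern (the function name and error message here are illustrative, not KerasHub's actual code):

```python
try:
    import librosa
except ImportError:
    librosa = None

def load_audio(path, sample_rate=16000):
    """Load an audio file, failing with a helpful message if librosa is absent."""
    if librosa is None:
        raise ImportError(
            "`load_audio` requires the `librosa` package. "
            "Install it with `pip install librosa`."
        )
    audio, _ = librosa.load(path, sr=sample_rate)
    return audio
```

This keeps librosa out of the hard dependency list while still giving audio users an actionable error at call time.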

@harshaljanjani

@mattdangerw, that sounds about right; a conditional import seems like the way to go.
Additionally, I've written a generate() method as part of a MoonshineForConditionalGeneration task model. It looks something like this; I'll be updating the PR and the Colab Notebook soon with results from the end-to-end example using the task model.

[image: draft generate() implementation]

harshaljanjani and others added 9 commits March 14, 2025 11:25
…kbone to fix test cases

… dict for inputs in the backbone and task model
…sary classes, and improve test robustness
…taryEmbedding by using Constant initializer for inv_freq
- Added both preset configurations in weights conversion script.

- Verified edge cases and error handling.

- Implemented attention/caching speedup for improved inference performance.

- Finalized API docstrings.
…lving issues with model saving
@harshaljanjani

harshaljanjani commented Mar 24, 2025

I've incorporated all review feedback and completed the following updates:

  1. Added component-wise tests.
  2. Finalized all docstrings.
  3. Ensured our generate() API matches Hugging Face's generate() API functionality and incorporated caching for speedup.
  4. Cleaned up the codebase.
  5. Added comprehensive test cases covering all features.
  6. Included both the task model and an end-to-end example in the PR/Notebook.
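For context, the caching in item 3 is the standard decoder key/value cache; here is a minimal single-head NumPy sketch showing why incremental decoding with a cache matches full recomputation (illustrative only, not the Moonshine implementation):

```python
import numpy as np

def attention(q, k, v):
    """Scaled dot-product attention; causality comes from only passing
    keys/values up to the current step."""
    scores = q @ k.T / np.sqrt(q.shape[-1])
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v

rng = np.random.RandomState(0)
dim, steps = 8, 4
q, k, v = rng.randn(steps, dim), rng.randn(steps, dim), rng.randn(steps, dim)

# Full recomputation: step t attends over keys/values 0..t.
full = np.stack(
    [attention(q[t : t + 1], k[: t + 1], v[: t + 1])[0] for t in range(steps)]
)

# Cached decoding: append one key/value per step and reuse the cache.
k_cache, v_cache, cached = np.zeros((0, dim)), np.zeros((0, dim)), []
for t in range(steps):
    k_cache = np.concatenate([k_cache, k[t : t + 1]])
    v_cache = np.concatenate([v_cache, v[t : t + 1]])
    cached.append(attention(q[t : t + 1], k_cache, v_cache)[0])
cached = np.stack(cached)

assert np.allclose(full, cached)  # identical outputs, far less recomputation
```

The cache turns per-step attention cost from quadratic in the generated length into linear, which is where the generate() speedup comes from.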

Classes Merged/Removed:

  1. MoonshinePrecomputed + MoonshineCausal + MoonshineMultiHead → MoonshineMultiHeadAttention.
  2. MoonshineSwiGLU + MoonshineLinearGeLU → MoonshineMLP.
  3. Removed MoonshineInvFreqInitializer and MoonshineArange while ensuring functional equivalence.

Additionally, there's a bug in Hugging Face's implementation of the tiny preset, which has been corrected in the KerasHub version. I suspect the issue lies in the application of rotary positional embeddings in the HF implementation. In any case, the following code cell demonstrates the bug, and the notebook examples for each backend show how the KerasHub implementation's tiny preset overcomes it for the same audio sample: Code cell showing HF bug for the tiny preset. I've chosen five samples, including the buggy one mentioned above, each tested across all backends and both presets in the notebook.

All the bells and whistles are covered by the new test suites: caching, forward passes, standardized tests, training flows, and beyond.

Notebook: Colab Notebook

I've also shared the model weights for each preset so you can test them yourself, just as a Keras Hub user would (this would be the end-to-end example you asked about, @divyashreepathihalli and @mattdangerw):
Notebook Cell
Model Weights

I’d love to make a few updates based on your reviews and wrap up the project!

@mattdangerw

mattdangerw commented Mar 25, 2025

@harshaljanjani Thanks! I'll take a look more closely soon, but I think at a high level this is too close to huggingface's abstractions and not quite congruous with KerasHub's yet. We probably want to keep this most closely modeled on the Seq2Seq task we have in KerasHub today. We might want a new name for this, AudioToText or Transcribe, but we would probably expect the following use cases roughly...

audio_tensor = load_audio_tensor_with_any_lib()
audio_batch = ... # With a batch dim
audio_dataset = ... # Paired audio tensors and strings as a tf.data.Dataset.

# Load model arch and preprocessing.
audio_to_text = keras_hub.models.AudioToText.from_preset(
    "moonshine_preset_name_blah"
)
# Equivalent, no auto class functionality.
audio_to_text = keras_hub.models.MoonshineAudioToText.from_preset(
    "moonshine_preset_name_blah"
)
# Direct string output!
audio_to_text.generate(audio_tensor)
# List of strings output!
audio_to_text.generate(audio_batch)

# Change the generation sampler and regenerate
audio_to_text.compile(sampler="top_k")
audio_to_text.generate(audio_tensor)
audio_to_text.compile(sampler=keras_hub.samplers.Greedy())
audio_to_text.generate(audio_tensor)

# Fine-tune with a dataset!
audio_to_text.compile(optimizer=...)
audio_to_text.enable_lora(4)  # Optional.
audio_to_text.fit(audio_dataset)

# Set max sequence length for encoder and decoder inputs.
audio_to_text.preprocessor.encoder_sequence_length = 1024
audio_to_text.preprocessor.decoder_sequence_length = 512
audio_to_text.generate(audio_tensor)

# Strip preprocessing from the generate function entirely.
preprocessor = audio_to_text.preprocessor
audio_to_text.preprocessor = None
# Run preprocessing separately.
preprocessed_batch = preprocessor.generate_preprocess(audio_batch)
# Returns a token id tensor!
generated_batch = audio_to_text.generate(preprocessed_batch)
# Converts to strings!
preprocessor.generate_postprocess(generated_batch)

I'd maybe start by trying to move the generation from this huggingface port to our infra. Can you compile() your task with a keras_hub.samplers object and change the generation strategy? When we get there we are getting on the right track.

…date conversion script)
… changes related to self_attention_cache_update_index to compare implementations apples-to-apples
@harshaljanjani

harshaljanjani commented Apr 2, 2025

Please do look into the issue, thanks!

…the issue still persists
…asHub; issue still persists
… issue still persists
@JyotinderSingh JyotinderSingh removed their request for review April 9, 2025 15:25
@divyashreepathihalli

@harshaljanjani caching logic is complicated. It is out of scope for the maintainers to debug contribution code. Please take your time to debug the model to get matching outputs with that of the HF model.

@sachinprasadhs sachinprasadhs added the WIP Pull requests which are work in progress and not ready yet for review. label Apr 11, 2025
@harshaljanjani harshaljanjani marked this pull request as ready for review April 12, 2025 12:25
… the PyTorch backend, integrated into the KerasHub infra!