Apriel SSM/Hybrid #258

oleksost · 2025-05-09T13:03:06Z

✨ Description

This PR makes minor improvements to the SSM/Hybrid classes, adds loading and exporting of Apriel SSM and hybrid SSM models (with the corresponding modeling.py classes), and adds the embeddings_lr_scale argument.

🔍 Type of change

Select all that apply:

🐛 Bug fix (non-breaking change that addresses a specific issue)
🚀 New feature (non-breaking change that adds functionality)
⚠️ Breaking change (a change that could affect existing functionality)
📈 Performance improvement/optimization (improves speed, memory usage, or efficiency)
🛠️ Code refactor (non-functional changes that improve code readability, structure, etc.)
📦 Dependency bump (updates dependencies, including Dockerfile or package changes)
📝 Documentation change (updates documentation, including new content or typo fixes)
🔧 Infrastructure/Build change (affects build process, CI/CD, or dependencies)

📝 Changes

List the key changes introduced in this PR:

Add modeling.py classes for Apriel SSM and hybrid
Import & Export of Apriel SSM and hybrid models
Added embeddings_lr_scale
Added output_lr_scale
Fix parsing of lr_schedule when it's provided as a string
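
For readers unfamiliar with per-layer learning-rate scaling, here is a minimal sketch of the idea behind embeddings_lr_scale and output_lr_scale. It is not the Fast-LLM implementation: the helper name build_param_groups, the "embeddings." / "output." name prefixes, and the plain-dict group layout are all assumptions made for illustration.

```python
def build_param_groups(named_params, base_lr,
                       embeddings_lr_scale=1.0, output_lr_scale=1.0):
    """Group parameters by role and scale the base learning rate per group.

    Hypothetical helper: assumes the first dotted component of each
    parameter name identifies its role ("embeddings", "output", or other).
    """
    scales = {"embeddings": embeddings_lr_scale, "output": output_lr_scale}
    groups = {}
    for name, param in named_params:
        role = name.split(".", 1)[0]
        # Parameters without a dedicated scale keep the base learning rate.
        scale = scales.get(role, 1.0)
        group = groups.setdefault(scale, {"params": [], "lr": base_lr * scale})
        group["params"].append(param)
    return list(groups.values())


# Placeholder strings stand in for real parameter tensors.
params = [
    ("embeddings.word", "E"),
    ("layers.0.mixer", "M"),
    ("output.head", "H"),
]
groups = build_param_groups(
    params, base_lr=1e-3, embeddings_lr_scale=0.1, output_lr_scale=0.5
)
for group in groups:
    print(group["params"], group["lr"])
```

A list of dicts with "params" and "lr" keys is the shape PyTorch optimizers accept for per-parameter-group options, so a sketch like this could feed an optimizer directly once the placeholders are replaced with real parameters.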

✅ Checklist

Make sure the following tasks are completed before submitting the PR:

General

📜 I have read and followed the contributing guidelines.
🏷️ I am using a clear and descriptive PR title that summarizes the key change or feature introduced.
🎉 The functionality is complete, and I have tested the changes.
📝 I have updated the documentation if needed.
⚠️ The change does not introduce any new issues (e.g., runtime warnings, type checker errors, linting problems, unhandled edge cases).
🧩 I have commented my code, especially in hard-to-understand areas.

Dependencies and Configuration

🐋 I have updated the Docker configuration or dependencies, if applicable.
🔄 I have ensured compatibility with the existing setup after dependency changes.

Testing

🧪 I have added or updated tests to cover my changes.
✔️ New and existing tests pass locally with my changes.
🚦 I have tested these changes on GPUs and verified training stability.
🏋️ I have tested the changes on realistic training workloads, if applicable.

Performance Impact

📊 I have run benchmarks where applicable to evaluate the performance impact.
✅ The benchmarks show no performance regression.
🚀 The benchmarks indicate a potential performance improvement.
⚠️ The benchmarks indicate a potential performance degradation.
📈 I have provided benchmark results and detailed any performance impact below, if applicable.

📊 Performance Impact Details

🗒️ Additional Notes

tscholak

That's a great addition!

Please note that @bigximik is working on a first-class integration of the lm-eval harness into Fast-LLM, both during training and as a standalone command (fast-llm evaluate). As part of this work, we will also be able to run generate() for any Fast-LLM model directly, without first materializing a converted HF checkpoint. Long-term, we will want to integrate SSMs and hybrids into that workflow, too.

jlamypoirier added 30 commits March 26, 2025 00:10

stuff

5137757

Merge remote-tracking branch 'origin/main' into config_updates

f0cb32a

Update pretrained config

f26010e

stuff

b930a39

Merge branch 'config_updates' into update_pretrained_config

918a7a8

fixes

8117c47

fix

1c995d3

Merge branch 'main' into config_updates

3f90475

Merge branch 'config_updates' into update_pretrained_config

e389058

fixes

506fe92

fixes

971d3ef

Tests wip

6bf20cb

misc

c13fb19

tests

a20fcec

Merge branch 'main' into config_updates

9af26a7

Tests, fixes, remove tuple format

9af372d

fix

dded00a

Merge remote-tracking branch 'origin/main' into config_updates

42d5ca4

fix

986f9f3

Merge branch 'config_updates' into update_pretrained_config

5abc087

fixes

8e3e795

fixes

da6eb7b

Merge branch 'main' into config_updates

67e08aa

Merge branch 'config_updates' into update_pretrained_config

a09e6f3

fix

baad705

Test, fixes

b702837

Knowledge distillation, fix cross-entropy

a8684f8

Fixes, distillation

b781729

fixes

db6504b

Merge remote-tracking branch 'origin/main' into config_updates

7c2933a

jlamypoirier and others added 22 commits May 2, 2025 13:14

fix

9d95064

Merge remote-tracking branch 'origin/main' into reference_model_preprocessing

2c96abb

fix

935c470

fix shuffled tokens

9aff3b7

Merge remote-tracking branch 'origin/main' into reference_model_preprocessing

d82ddbf

Merge branch 'reference_model_preprocessing' into distillation_loss_mask

6949c49

Merge remote-tracking branch 'origin/main' into distillation_loss_mask

9c105e7

fixes

ae4d111

fixes

deb7ce6

innit like in mamba in llama

eaba34f

embeddings_lr_scale

f8ca122

fix

2db740b

hybrid model loading and exporting

4160b1f

wip

30ad8b8

Merge branch 'main' into oleksiy/apriel-ssm

ea55ae2

Merge remote-tracking branch 'origin/distillation_loss_mask' into oleksiy/apriel-ssm

cd4edd5

wip

1784dca

nvm

1e3cc28

hybrid modeling

2dc945b

modeling

4277e67

Merge branch 'main' into oleksiy/apriel-ssm

6153c33

nvm

c71cb16

oleksost changed the title ~~Oleksiy/apriel ssm~~ Apriel SSM/Hybrid May 9, 2025

oleksost marked this pull request as draft May 9, 2025 13:06

oleksost added 3 commits May 9, 2025 13:14

output lr scale

be04c19

output_lr_scale

1311f5b

nvm

baf4011

tscholak approved these changes May 9, 2025

oleksost requested a review from jlamypoirier May 9, 2025 20:57

eval

6cf26c5
