Apriel SSM/Hybrid #258
That's a great addition!
Please note that @bigximik is working on a first-class integration of the lm-eval harness into Fast-LLM, both during training and as a standalone command (fast-llm evaluate). As part of this work, we will also be able to run generate() for any Fast-LLM model directly, without first materializing a converted HF checkpoint. Long-term, we will want to integrate SSMs and hybrids into that workflow, too.
✨ Description
This PR makes some minor improvements to the SSM/Hybrid classes, adds support for loading and exporting Apriel SSM and hybrid SSM models (with the corresponding modeling.py classes), and adds an embeddings_lr_scale argument.
🔍 Type of change
Select all that apply:
📝 Changes
List the key changes introduced in this PR:
embeddings_lr_scale
output_lr_scale
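As a rough illustration of what the two scale arguments above are meant to do, here is a minimal sketch, assuming each scale simply multiplies the base learning rate for the embedding or output-layer parameters. This is not Fast-LLM's implementation: the parameter-name prefixes and the helpers lr_scale_for and build_param_groups are hypothetical. The result uses the list-of-dicts ("params", "lr") shape that PyTorch optimizers accept as parameter groups, with plain strings standing in for tensors so the sketch runs without PyTorch.

```python
from typing import Dict, List, Sequence

# Hypothetical parameter-name prefixes; real Fast-LLM layer names may differ.
EMBEDDING_PREFIX = "embeddings."
OUTPUT_PREFIX = "output."


def lr_scale_for(name: str, embeddings_lr_scale: float, output_lr_scale: float) -> float:
    """Return the multiplier applied to the base learning rate for one parameter."""
    if name.startswith(EMBEDDING_PREFIX):
        return embeddings_lr_scale
    if name.startswith(OUTPUT_PREFIX):
        return output_lr_scale
    # All other parameters (mixers, MLPs, norms, ...) keep the base learning rate.
    return 1.0


def build_param_groups(
    names: Sequence[str],
    base_lr: float,
    embeddings_lr_scale: float = 1.0,
    output_lr_scale: float = 1.0,
) -> List[Dict]:
    """Group parameters that share an effective learning rate into one optimizer group."""
    groups: Dict[float, List[str]] = {}
    for name in names:
        scale = lr_scale_for(name, embeddings_lr_scale, output_lr_scale)
        groups.setdefault(scale, []).append(name)
    return [{"params": params, "lr": base_lr * scale} for scale, params in groups.items()]


if __name__ == "__main__":
    # Hypothetical parameter names for a tiny hybrid model.
    names = ["embeddings.weight", "layers.0.mixer.weight", "layers.0.mlp.weight", "output.weight"]
    for group in build_param_groups(names, base_lr=3e-4, embeddings_lr_scale=0.1, output_lr_scale=0.5):
        print(group["lr"], group["params"])
```

Parameters whose scales coincide end up in the same group, so setting either scale to 1.0 simply folds those parameters back into the default group.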
✅ Checklist
Make sure the following tasks are completed before submitting the PR:
General
Dependencies and Configuration
Testing
Performance Impact
📊 Performance Impact Details
🗒️ Additional Notes