
Conversation

@GandalfTea (Contributor) commented Nov 25, 2025

Summary

Add support for new model architectures:

  • Olmo3
  • GLM4
  • Qwen3-MoE

Changes

 src/dnet/core/models/__init__.py  |   6 +++
 src/dnet/core/models/glm4.py      | 119 ++++++++++++++++++++++++++++++++++++++++++
 src/dnet/core/models/olmo3.py     | 120 ++++++++++++++++++++++++++++++++++++++++++
 src/dnet/core/models/qwen3_moe.py | 187 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 4 files changed, 432 insertions(+)
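The six insertions in `src/dnet/core/models/__init__.py` presumably just expose the new architecture modules. A hypothetical sketch of what such a diff typically looks like; the class names and `__all__` entries are illustrative, not dnet's actual API:

```python
# Hypothetical content of the 6-line __init__.py addition: import and
# re-export the new architecture modules so the model loader can resolve
# them. Names are illustrative, not dnet's actual API.
from .glm4 import Model as GLM4Model
from .olmo3 import Model as Olmo3Model
from .qwen3_moe import Model as Qwen3MoEModel

__all__ = ["GLM4Model", "Olmo3Model", "Qwen3MoEModel"]
```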

Testing

  Olmo 3
! mlx-community/Olmo-3-1025-7B-4bit          [FAIL] (junk output) 
+ mlx-community/Olmo-3-7B-Think-4bit
+ mlx-community/Olmo-3-7B-Think-SFT-4bit
+ mlx-community/Olmo-3-7B-Instruct-4bit
+ mlx-community/Olmo-3-7B-Instruct-SFT-4bit

+ mlx-community/Olmo-3-1025-7B-8bit
+ mlx-community/Olmo-3-7B-Think-8bit
+ mlx-community/Olmo-3-7B-Think-SFT-8bit
+ mlx-community/Olmo-3-7B-Instruct-8bit
+ mlx-community/Olmo-3-7B-Instruct-SFT-8bit

! mlx-community/Olmo-3-7B-Instruct-bf16         [FAIL] (bf16 fails sampling)
! mlx-community/Olmo-3-7B-Instruct-SFT-bfloat16 [FAIL]
! mlx-community/Olmo-3-7B-Think-bfloat16        [FAIL]
! mlx-community/Olmo-3-7B-Think-SFT-bfloat16    [FAIL]
! mlx-community/Olmo-3-1025-7B-bfloat16         [FAIL]

+ mlx-community/Olmo-3-1125-32B-4bit
+ mlx-community/Olmo-3-1125-32B-8bit


  GLM
+ mlx-community/GLM-4-9B-0414-4bit
+ mlx-community/GLM-Z1-9B-0414-4bit
+ mlx-community/GLM-4-9B-0414-8bit 
+ mlx-community/GLM-Z1-9B-0414-8bit
+ mlx-community/GLM-4-32B-0414-4bit
+ mlx-community/GLM-Z1-32B-0414-4bit
! mlx-community/GLM-Z1-9B-0414-bf16 [FAIL] (failed sampling)
! mlx-community/GLM-4-9B-0414-bf16  [FAIL] (failed sampling)

  Qwen3-MoE
+ mlx-community/Qwen3-30B-A3B-4bit
+ mlx-community/Qwen3-Coder-30B-A3B-Instruct-4bit
+ mlx-community/Qwen3-Coder-30B-A3B-Instruct-8bit
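For context on what `qwen3_moe.py` implements: Qwen3-MoE layers replace the dense feed-forward block with a bank of experts and route each token through a top-k subset of them. Below is a generic, minimal sketch of that routing pattern in MLX; the class, sizes, and naive dispatch loop are illustrative, not dnet's actual implementation:

```python
import mlx.core as mx
import mlx.nn as nn

class MoEBlock(nn.Module):
    """Generic top-k mixture-of-experts feed-forward sketch (illustrative)."""

    def __init__(self, dims: int, num_experts: int = 8, top_k: int = 2):
        super().__init__()
        self.top_k = top_k
        self.gate = nn.Linear(dims, num_experts, bias=False)  # router
        self.experts = [
            nn.Sequential(nn.Linear(dims, 4 * dims), nn.SiLU(), nn.Linear(4 * dims, dims))
            for _ in range(num_experts)
        ]

    def __call__(self, x: mx.array) -> mx.array:
        scores = mx.softmax(self.gate(x), axis=-1)  # (..., num_experts)
        # Indices of the top_k highest-scoring experts per token.
        idx = mx.argpartition(-scores, kth=self.top_k - 1, axis=-1)[..., : self.top_k]
        weights = mx.take_along_axis(scores, idx, axis=-1)
        weights = weights / weights.sum(axis=-1, keepdims=True)  # renormalize
        # Naive dense dispatch: evaluate every expert, mask to the selected
        # ones, and mix. Real implementations gather tokens per expert.
        out = mx.zeros_like(x)
        for k in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = (idx[..., k : k + 1] == e).astype(x.dtype)
                out = out + mask * weights[..., k : k + 1] * expert(x)
        return out
```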

Also modified an existing catalogue entry:

- mlx-community/Qwen3-8B-bf16 (failed sampling)

Dependencies

This commit depends on the distilp PRs firstbatchxyz/distilp#18 and firstbatchxyz/distilp#17.

@GandalfTea force-pushed the oto/add-models branch 2 times, most recently from bb41004 to a282bef on November 25, 2025 14:54
@GandalfTea (Contributor, Author) commented:

mlx-community/GLM-Z1-9B-0414-bf16 fails with:

2025-11-25 07:52:38,633 - dnet - ERROR - fit_in_memory.py:164 - End-shard sampling failed: [matmul] Last dimension of first input with shape (1,14,4096) must match second to last dimension of second input with shape (151552,4096).
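For reference, the failing shapes look consistent with the output projection being applied with a (vocab, hidden) weight, as tied embeddings are usually stored, without a transpose. A minimal sketch of just the shapes from the log, assuming MLX; the zero tensors are purely illustrative:

```python
import mlx.core as mx

# Shapes from the error log: hidden states (1, 14, 4096) and a weight
# stored as (vocab, hidden) = (151552, 4096).
h = mx.zeros((1, 14, 4096), dtype=mx.bfloat16)
w = mx.zeros((151552, 4096), dtype=mx.bfloat16)

# h @ w reproduces the failure: inner dimensions 4096 vs 151552 mismatch.
# Projecting onto the vocabulary requires the transpose:
logits = h @ w.T
print(logits.shape)  # (1, 14, 151552)
```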

@andthattoo (Member) commented:

Models are now tracked within the catalogue; see the catalog.

@GandalfTea (Contributor, Author) commented:

Added the working models to the catalogue. I'll look into the 6-bit and bf16 quantization problems.
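As a starting point for the bf16 investigation, one common mitigation (hypothetical here, not a confirmed fix for this failure) is upcasting logits to float32 before sampling, since some sampling paths misbehave on bfloat16. A minimal MLX sketch; the function name is illustrative:

```python
import mlx.core as mx

def sample_token(logits: mx.array, temperature: float = 1.0) -> mx.array:
    # Upcast from bfloat16 before sampling; softmax/categorical are
    # numerically more robust in float32.
    logits = logits.astype(mx.float32)
    if temperature == 0:
        return mx.argmax(logits, axis=-1)
    return mx.random.categorical(logits / temperature, axis=-1)
```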

@GandalfTea marked this pull request as ready for review November 25, 2025 18:07
@GandalfTea marked this pull request as draft November 26, 2025 05:08