
feat: add sharding support for mlx-lm models #5

@andthattoo

Description


Integrate all MLX-LM model architectures into dnet, with the sharding augmentations each one needs for distributed inference.

Priority is based on production deployments, Hugging Face downloads, and benchmark performance:

  • gpt_oss
  • deepseek_v2
  • deepseek_v3
  • llama
  • llama4
  • qwen3
  • qwen3_moe
  • qwen3_next
  • qwen2
  • qwen2_moe
  • internlm3
  • gemma3
  • gemma3_text
  • gemma3n
  • glm4
  • glm4_moe
  • olmo2
  • olmo3
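This issue does not describe how dnet splits a model across devices. The following is a minimal sketch that assumes pipeline-style sharding: each device holds a contiguous range of decoder layers and forwards its activations to the next device. `assign_layer_ranges`, `run_shard`, and the per-device weights are hypothetical names used only for illustration. They are not dnet or mlx-lm APIs, and plain Python callables stand in for MLX layers.

```python
from typing import Callable, List, Sequence, Tuple


def assign_layer_ranges(num_layers: int, weights: Sequence[float]) -> List[Tuple[int, int]]:
    """Split layers [0, num_layers) into contiguous half-open ranges, one per shard.

    Hypothetical helper: each shard's share is proportional to its weight
    (e.g. available device memory), and every shard gets at least one layer.
    """
    num_shards = len(weights)
    if num_shards == 0 or num_layers < num_shards:
        raise ValueError("need at least one layer per shard")
    if any(w <= 0 for w in weights):
        raise ValueError("weights must be positive")

    total = sum(weights)
    ranges: List[Tuple[int, int]] = []
    start = 0
    cumulative = 0.0
    for i, w in enumerate(weights):
        cumulative += w
        remaining_shards = num_shards - i - 1
        if remaining_shards == 0:
            end = num_layers  # the last shard takes whatever is left
        else:
            # Ideal boundary from the cumulative weight, rounded to a layer index.
            end = round(num_layers * cumulative / total)
            # Keep at least one layer for this shard and for each remaining shard.
            end = max(end, start + 1)
            end = min(end, num_layers - remaining_shards)
        ranges.append((start, end))
        start = end
    return ranges


def run_shard(layers: Sequence[Callable], layer_range: Tuple[int, int], hidden):
    """Apply only this shard's layers; the result would be sent to the next device."""
    start, end = layer_range
    for layer in layers[start:end]:
        hidden = layer(hidden)
    return hidden


if __name__ == "__main__":
    print(assign_layer_ranges(32, [1.0, 1.0]))  # [(0, 16), (16, 32)]

    # Toy "layers": chaining the shards in order must equal one full forward pass.
    toy_layers = [lambda x, k=k: x + k for k in range(8)]
    h = 0
    for r in assign_layer_ranges(len(toy_layers), [2.0, 1.0, 1.0]):
        h = run_shard(toy_layers, r, h)
    print(h)  # 28
```

Weighting by available memory is only one possible policy. A real scheduler might also account for per-layer cost, which can differ considerably between dense architectures and MoE ones such as `qwen3_moe` or `deepseek_v3`.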

Labels: enhancement (New feature or request)
