Implement sharded model support in model_manager #1348
Closed
james-martinez wants to merge 8 commits into lemonade-sdk:main
Conversation
Added support for sharded model variants by automatically discovering and adding remaining parts to the download queue.
Member
@claude please review. In particular, does this PR have any unaddressed corner cases?
Contributor
I'll analyze this and get back to you.
Contributor (Author)
I'd like to address all cases. Focus on using a particular GGUF quant in any folder.

This adds support for downloading a particular quantization when all of a repo's GGUF files live in one directory.
For the checkpoint `NVIDIA-Nemotron-3-Super-120B-A12B:Q4_K_M`, resolution proceeds as follows:
Rule 3 (original): checks for an exact `Q4_K_M.gguf` match. -> Fails.
Added rule: checks whether the repo contains any file matching `Q4_K_M-00001-of-`. -> Success! It finds `NVIDIA-Nemotron-3-Super-120B-A12B-Q4_K_M-00001-of-00003.gguf` and sets it as the primary file.
Rule 4 (folder check): skipped entirely, because a match was already found in Rule 3.
The new bottom logic: inspects the primary file, sees that it spans 3 parts, and automatically adds parts 00002 and 00003 to the download queue.
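The resolution order above can be sketched roughly as follows. This is an illustrative sketch, not the actual `model_manager` code; the function names (`find_primary_file`, `expand_shards`) and the exact matching rules are assumptions based on the description.

```python
import re


def find_primary_file(file_list, variant):
    """Resolve a quant variant (e.g. "Q4_K_M") to a primary .gguf file.

    Illustrative sketch: try an exact single-file match first (Rule 3),
    then fall back to the first shard of a split model (added rule).
    """
    # Rule 3 (original): exact single-file match, e.g. "...-Q4_K_M.gguf"
    for name in file_list:
        if name.endswith(f"-{variant}.gguf"):
            return name
    # Added rule: first shard of a sharded variant,
    # e.g. "...-Q4_K_M-00001-of-00003.gguf"
    for name in file_list:
        if re.search(rf"{re.escape(variant)}-00001-of-\d+\.gguf$", name):
            return name
    return None


def expand_shards(primary):
    """If the primary file is shard 00001-of-N, return the remaining parts."""
    m = re.search(r"-(\d+)-of-(\d+)\.gguf$", primary)
    if not m:
        return []  # single-file model, nothing more to queue
    width, total = len(m.group(1)), int(m.group(2))
    prefix = primary[: m.start()]
    return [
        f"{prefix}-{i:0{width}d}-of-{m.group(2)}.gguf"
        for i in range(2, total + 1)
    ]
```

In this sketch, `find_primary_file` selects the `-00001-of-` shard as the primary file, and `expand_shards` derives the remaining part names from the `NNNNN-of-NNNNN` suffix so they can be appended to the download queue.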
Example: https://huggingface.co/lmstudio-community/NVIDIA-Nemotron-3-Super-120B-A12B-GGUF/tree/main contains:
Q4_K_M Quantization
NVIDIA-Nemotron-3-Super-120B-A12B-Q4_K_M-00001-of-00003.gguf
NVIDIA-Nemotron-3-Super-120B-A12B-Q4_K_M-00002-of-00003.gguf
NVIDIA-Nemotron-3-Super-120B-A12B-Q4_K_M-00003-of-00003.gguf
Q6_K Quantization
NVIDIA-Nemotron-3-Super-120B-A12B-Q6_K-00001-of-00003.gguf
NVIDIA-Nemotron-3-Super-120B-A12B-Q6_K-00002-of-00003.gguf
NVIDIA-Nemotron-3-Super-120B-A12B-Q6_K-00003-of-00003.gguf
Q8_0 Quantization
NVIDIA-Nemotron-3-Super-120B-A12B-Q8_0-00001-of-00004.gguf
NVIDIA-Nemotron-3-Super-120B-A12B-Q8_0-00002-of-00004.gguf
NVIDIA-Nemotron-3-Super-120B-A12B-Q8_0-00003-of-00004.gguf
NVIDIA-Nemotron-3-Super-120B-A12B-Q8_0-00004-of-00004.gguf
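Note that the Q8_0 variant in this repo spans four parts while the other two span three, so the part count must be parsed from each filename rather than assumed. A self-contained sketch of that parsing (the helper name `shard_info` is made up for illustration):

```python
import re


def shard_info(filename):
    """Return (part_index, total_parts) for a sharded GGUF filename,
    or None for a single-file model. Illustrative helper only."""
    m = re.search(r"-(\d+)-of-(\d+)\.gguf$", filename)
    return (int(m.group(1)), int(m.group(2))) if m else None


print(shard_info("NVIDIA-Nemotron-3-Super-120B-A12B-Q8_0-00001-of-00004.gguf"))    # (1, 4)
print(shard_info("NVIDIA-Nemotron-3-Super-120B-A12B-Q4_K_M-00002-of-00003.gguf"))  # (2, 3)
print(shard_info("some-model-Q4_K_M.gguf"))                                        # None
```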