Implement sharded model support in model_manager #1348

Closed
james-martinez wants to merge 8 commits into lemonade-sdk:main from james-martinez:feature/gguf-support

Conversation

@james-martinez
Contributor

Added support for sharded model variants by automatically discovering and adding remaining parts to the download queue.

This adds support for downloading a particular quantization when all the GGUF files are in a single directory.

For example, the checkpoint would be NVIDIA-Nemotron-3-Super-120B-A12B:Q4_K_M.

Rule 3 (original): It checks for exactly Q4_K_M.gguf. -> Fails.

Added rule: It checks whether the repo contains any file matching Q4_K_M-00001-of-. -> Success! It finds NVIDIA-Nemotron-3-Super-120B-A12B-Q4_K_M-00001-of-00003.gguf and sets it as the primary file.

Rule 4 (folder check): This step is skipped entirely because a match was already found in Rule 3.

New logic at the bottom: It inspects the primary file, sees that it spans 3 parts, and automatically adds parts 00002 and 00003 to the download queue.
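
The matching and shard-expansion steps described above can be sketched roughly as follows. This is a minimal illustration in Python, not the code from this PR: the function names `find_primary_file` and `expand_shards`, the regular expression, and the reading of the original rule as "file name ends with `<variant>.gguf`" are all assumptions made for the example.

```python
import re

# Assumed shard suffix shape: "<prefix>-00001-of-00003.gguf" (hypothetical pattern).
SHARD_RE = re.compile(r"^(?P<prefix>.+)-(?P<index>\d+)-of-(?P<total>\d+)\.gguf$")


def find_primary_file(repo_files, variant):
    """Return the primary file for a variant such as 'Q4_K_M', or None."""
    # Original rule (assumed interpretation): a single file ending in '<variant>.gguf'.
    for name in repo_files:
        if name.endswith(f"{variant}.gguf"):
            return name
    # Added rule: the first shard of a split model, matched on '<variant>-00001-of-'.
    marker = f"{variant}-00001-of-"
    for name in repo_files:
        if marker in name and name.endswith(".gguf"):
            return name
    return None


def expand_shards(primary):
    """Return every shard name implied by the primary file, in order."""
    match = SHARD_RE.match(primary)
    if match is None:
        return [primary]  # not a sharded file, nothing to add
    prefix = match["prefix"]
    total = int(match["total"])
    width = len(match["index"])  # keep the zero padding used by the primary file
    return [
        f"{prefix}-{i:0{width}d}-of-{match['total']}.gguf"
        for i in range(1, total + 1)
    ]


repo_files = [
    "NVIDIA-Nemotron-3-Super-120B-A12B-Q4_K_M-00001-of-00003.gguf",
    "NVIDIA-Nemotron-3-Super-120B-A12B-Q4_K_M-00002-of-00003.gguf",
    "NVIDIA-Nemotron-3-Super-120B-A12B-Q4_K_M-00003-of-00003.gguf",
    "NVIDIA-Nemotron-3-Super-120B-A12B-Q8_0-00001-of-00004.gguf",
]
primary = find_primary_file(repo_files, "Q4_K_M")
download_queue = expand_shards(primary)
```

With this listing, `download_queue` holds the three Q4_K_M parts and none of the Q8_0 files. Note that the sketch derives the remaining names from the primary file alone; checking that each derived name actually exists in the repo listing would be a separate step.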

Example repo:
https://huggingface.co/lmstudio-community/NVIDIA-Nemotron-3-Super-120B-A12B-GGUF/tree/main

It contains:
Q4_K_M Quantization
NVIDIA-Nemotron-3-Super-120B-A12B-Q4_K_M-00001-of-00003.gguf

NVIDIA-Nemotron-3-Super-120B-A12B-Q4_K_M-00002-of-00003.gguf

NVIDIA-Nemotron-3-Super-120B-A12B-Q4_K_M-00003-of-00003.gguf

Q6_K Quantization
NVIDIA-Nemotron-3-Super-120B-A12B-Q6_K-00001-of-00003.gguf

NVIDIA-Nemotron-3-Super-120B-A12B-Q6_K-00002-of-00003.gguf

NVIDIA-Nemotron-3-Super-120B-A12B-Q6_K-00003-of-00003.gguf

Q8_0 Quantization
NVIDIA-Nemotron-3-Super-120B-A12B-Q8_0-00001-of-00004.gguf

NVIDIA-Nemotron-3-Super-120B-A12B-Q8_0-00002-of-00004.gguf

NVIDIA-Nemotron-3-Super-120B-A12B-Q8_0-00003-of-00004.gguf

NVIDIA-Nemotron-3-Super-120B-A12B-Q8_0-00004-of-00004.gguf

@james-martinez marked this pull request as ready for review March 12, 2026 04:33
@jeremyfowers
Member

@claude please review. In particular, does this PR have any unaddressed corner cases?

@github-actions
Contributor

Claude Code is working…

I'll analyze this and get back to you.

@james-martinez
Contributor Author

> @claude please review. In particular, does this PR have any unaddressed corner cases?

I'd like to address all cases. Focus on using a particular GGUF quant in any folder.

@james-martinez deleted the feature/gguf-support branch April 12, 2026 13:56