Add resumable model download with retry, timeout, and offline mode#77

Open
janhilgard wants to merge 1 commit into waybarrios:main from janhilgard:feat/resumable-download

@janhilgard
Collaborator

Summary

  • Adds a pre-download step with configurable retry (exponential backoff) and timeout before load_model() is called, so interrupted downloads of large models can be resumed
  • New CLI flags for serve: --download-timeout, --download-retries, --offline
  • New standalone subcommand: vllm-mlx download <model> for pre-warming HF caches (useful for CI/CD)
  • Replaces direct snapshot_download() call in tokenizer fallback path with the new retry-aware wrapper
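
The retry-with-backoff behaviour described above can be sketched roughly as follows. This is a minimal illustration, not the PR's actual code: the helper name `call_with_retries`, its parameters, and the error message are all hypothetical, and the real wrapper may differ in how it handles timeouts and which exceptions it retries.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_retries(
    fn: Callable[[], T],
    retries: int = 3,          # assumed default, not taken from the PR
    base_delay: float = 1.0,   # assumed default, in seconds
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn(); on failure, retry up to `retries` more times.

    The wait before retry k (0-based) is base_delay * 2**k seconds.
    """
    attempt = 0
    while True:
        try:
            return fn()
        except Exception as exc:
            if attempt >= retries:
                # Out of retries: surface one clear error with the cause attached.
                raise RuntimeError(
                    f"download failed after {attempt + 1} attempts: {exc}"
                ) from exc
            sleep(base_delay * (2 ** attempt))
            attempt += 1
```

In a setup like this, `fn` would presumably wrap `huggingface_hub.snapshot_download(repo_id=...)`, so that each retry can pick up files already present in the local cache. An offline mode could plausibly map to that function's `local_files_only=True` argument, though whether the PR does it this way is an assumption.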

Motivation

Addresses #75 — HuggingFace downloads hang or fail around 10GB for large models with no way to resume.

Usage

# Download model to cache without starting server
vllm-mlx download mlx-community/Qwen3-Next-80B-A3B-Instruct-6bit

# Serve with custom retry/timeout
vllm-mlx serve <model> --download-timeout 600 --download-retries 5

# Offline mode (only locally cached models)
vllm-mlx serve <model> --offline
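
For readers who want to see how flags like these might be wired up, here is a small `argparse` sketch. It is an assumption-laden illustration rather than the PR's parser: the function name `build_parser`, the default values, the timeout unit (seconds), and the choice to attach the flags only to `serve` are all hypothetical.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical CLI layout mirroring the commands shown above."""
    parser = argparse.ArgumentParser(prog="vllm-mlx")
    sub = parser.add_subparsers(dest="command", required=True)

    # `serve <model>` with the three download-related flags.
    serve = sub.add_parser("serve")
    serve.add_argument("model")
    serve.add_argument("--download-timeout", type=int, default=300)  # assumed default, seconds
    serve.add_argument("--download-retries", type=int, default=3)    # assumed default
    serve.add_argument("--offline", action="store_true")

    # `download <model>` only fetches the model into the local cache.
    download = sub.add_parser("download")
    download.add_argument("model")
    return parser


args = build_parser().parse_args(
    ["serve", "some/model", "--download-timeout", "600", "--download-retries", "5"]
)
print(args.command, args.download_timeout, args.download_retries, args.offline)
```

`argparse` converts the dashes in flag names to underscores, so `--download-timeout` is read back as `args.download_timeout`.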

Test plan

  • 12 unit tests pass (pytest tests/test_download.py -v)
  • Manual test: vllm-mlx download mlx-community/Qwen3-0.6B-4bit succeeds
  • Manual test: nonexistent model fails with clear error message after retries
  • ruff check and black pass on all changed files

🤖 Generated with Claude Code

@waybarrios
Owner

@janhilgard
For next time, could you please organize your commits a bit better? Having so many commits in a single PR makes the changes difficult to review. I recommend squashing them all into one commit for this and future PRs.

janhilgard force-pushed the feat/resumable-download branch from 47e726b to 5b9db2b (February 13, 2026 08:48)
@janhilgard
Collaborator Author

You're right, sorry about that! I've squashed everything into a single clean commit now.

janhilgard force-pushed the feat/resumable-download branch from 5b9db2b to ee5d6be (February 13, 2026 08:51)
Large model downloads via huggingface_hub often hang or fail around 10GB.
This adds a pre-download step with configurable retry/timeout before
load_model() is called, so interrupted downloads can be resumed.

New CLI flags for `serve`: --download-timeout, --download-retries, --offline
New subcommand: `vllm-mlx download <model>` for pre-warming caches

Closes waybarrios#75

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
janhilgard force-pushed the feat/resumable-download branch from 8e75792 to a510953 (February 15, 2026 17:15)
