Add resumable model download with retry, timeout, and offline mode by janhilgard · Pull Request #77 · waybarrios/vllm-mlx

janhilgard · 2026-02-12T18:46:37Z

Summary

Adds a pre-download step with configurable retry (exponential backoff) and timeout before load_model() is called, so interrupted downloads of large models can be resumed
New CLI flags for serve: --download-timeout, --download-retries, --offline
New standalone subcommand: vllm-mlx download <model> for pre-warming HF caches (useful for CI/CD)
Replaces direct snapshot_download() call in tokenizer fallback path with the new retry-aware wrapper
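
For readers who want a picture of the retry behaviour described above, here is a minimal sketch of retrying with exponential backoff. It is not the code from this PR: the helper name `retry_with_backoff`, its parameters, and the choice to treat `retries` as the number of extra attempts after the first are all assumptions made for illustration.

```python
import time


def retry_with_backoff(fetch, retries=3, base_delay=1.0, sleep=time.sleep):
    """Call fetch() until it succeeds or the retry budget is exhausted.

    Hypothetical helper, not the PR's implementation. `retries` counts the
    extra attempts made after the first failure (an assumption).
    """
    last_error = None
    for attempt in range(retries + 1):
        try:
            return fetch()
        except Exception as exc:  # a real wrapper would catch narrower errors
            last_error = exc
            if attempt == retries:
                break
            # Exponential backoff: base_delay, 2x, 4x, ... between attempts.
            sleep(base_delay * 2 ** attempt)
    raise RuntimeError(
        f"download failed after {retries + 1} attempts"
    ) from last_error
```

With huggingface_hub installed, `fetch` could be something like `lambda: snapshot_download(repo_id)`. Each retry then presumably continues from whatever is already in the local cache, which is the resumption this PR relies on.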

Motivation

Addresses #75: Hugging Face downloads of large models hang or fail around 10GB, with no way to resume.

Usage

# Download model to cache without starting server
vllm-mlx download mlx-community/Qwen3-Next-80B-A3B-Instruct-6bit

# Serve with custom retry/timeout
vllm-mlx serve <model> --download-timeout 600 --download-retries 5

# Offline mode (only locally cached models)
vllm-mlx serve <model> --offline
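
The commands above could be wired up with argparse roughly as follows. This is a sketch under stated assumptions, not the PR's actual CLI code: the function name `build_parser`, the default values (300 seconds, 3 retries), and the `int` types are hypothetical, and whether `download` accepts the same flags as `serve` is not stated in the PR.

```python
import argparse


def build_parser():
    """Hypothetical parser mirroring the flags named in this PR."""
    parser = argparse.ArgumentParser(prog="vllm-mlx")
    sub = parser.add_subparsers(dest="command", required=True)

    serve = sub.add_parser("serve", help="start the server")
    serve.add_argument("model")
    # Default values below are assumptions, not taken from the PR.
    serve.add_argument("--download-timeout", type=int, default=300)
    serve.add_argument("--download-retries", type=int, default=3)
    serve.add_argument("--offline", action="store_true")

    download = sub.add_parser("download", help="pre-warm the model cache")
    download.add_argument("model")
    return parser
```

argparse converts the dashes in flag names to underscores, so the parsed values are read as `args.download_timeout`, `args.download_retries`, and `args.offline`.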

Test plan

12 unit tests pass (pytest tests/test_download.py -v)
Manual test: vllm-mlx download mlx-community/Qwen3-0.6B-4bit succeeds
Manual test: nonexistent model fails with clear error message after retries
ruff check and black pass on all changed files

🤖 Generated with Claude Code

waybarrios · 2026-02-13T04:51:40Z

@janhilgard
For next time, could you please organize your commits a bit better? Having so many commits in a single PR makes the changes difficult to review. I recommend squashing them all into one commit for this and future PRs.

janhilgard · 2026-02-13T08:49:57Z

You're right, sorry about that! I've squashed everything into a single clean commit now.

Large model downloads via huggingface_hub often hang or fail around 10GB. This adds a pre-download step with configurable retry/timeout before load_model() is called, so interrupted downloads can be resumed.

New CLI flags for `serve`: --download-timeout, --download-retries, --offline
New subcommand: `vllm-mlx download <model>` for pre-warming caches

Closes waybarrios#75

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

janhilgard force-pushed the feat/resumable-download branch from 47e726b to 5b9db2b Compare February 13, 2026 08:48

janhilgard force-pushed the feat/resumable-download branch from 5b9db2b to ee5d6be Compare February 13, 2026 08:51

janhilgard force-pushed the feat/resumable-download branch from 8e75792 to a510953 Compare February 15, 2026 17:15

janhilgard wants to merge 1 commit into waybarrios:main from janhilgard:feat/resumable-download