feat: recipe-aware model compatibility for HF search and cache discovery #1390

Draft
ianbmacdonald wants to merge 11 commits into lemonade-sdk:main from ianbmacdonald:feature/recipe-aware-model-compatibility

Conversation


@ianbmacdonald ianbmacdonald commented Mar 17, 2026

Summary

This builds on the recently added Hugging Face search support for GGUFs and extends search across all backends.

Some of the newer backends are currently pinned to a small set of authors. That still provides value: it can surface models that are not yet in the curated release set, such as newer Qwen3.5 models for FLM at the time of writing. It also gives us room to expand coverage as backend support evolves, for example as whisper.cpp gains support for newer GGUF formats.

Backend Strategy
  • llamacpp: filter=gguf
  • sd-cpp: filter=safetensors,text-to-image
  • kokoro: filter=onnx,text-to-speech
  • whispercpp: author=ggerganov (pinned)
  • flm: author=FastFlowLM (pinned)
  • ryzenai-llm: author=amd + filter=onnx
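As a hypothetical sketch, the strategy above maps to per-backend HF search parameters roughly like this. The `BACKEND_SEARCH` table and `search_params` helper are illustrative names, not the shipped code:

```python
# Illustrative per-backend HF search parameters mirroring the strategy table.
BACKEND_SEARCH = {
    "llamacpp":    {"filter": "gguf"},
    "sd-cpp":      {"filter": "safetensors,text-to-image"},
    "kokoro":      {"filter": "onnx,text-to-speech"},
    "whispercpp":  {"author": "ggerganov"},    # pinned provider
    "flm":         {"author": "FastFlowLM"},   # pinned provider
    "ryzenai-llm": {"author": "amd", "filter": "onnx"},
}

def search_params(backend: str, query: str) -> dict:
    """Merge the user query with the backend's pinned author/format filters."""
    params = {"search": query, "sort": "downloads", "direction": "-1"}
    params.update(BACKEND_SEARCH.get(backend, {}))
    return params
```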

A helper script, tests/hf_model_tags.py, queries the HF API and reports pipeline_tag plus relevant library, task, and format tags for suggested models or any HF model ID. It is useful for building and refining backend filter logic, especially for evolving image and audio cases.

For example, llama.cpp is currently treated as “all GGUF except unsupported tasks.” If overlap with other GGUF-consuming backends becomes too broad, we can invert that logic and move to a more additive model. Analyzing the current suggested models reveals tag options for a more targeted approach; note that 15 models have no pipeline tags upstream.

~/src/lemonade/test$ python3 hf_model_tags.py --summary --llamacpp
...
============================================================
TAG SUMMARY BY RECIPE
============================================================

[llamacpp] (57 models)
   pipeline tags: (none)×15, image-text-to-text, sentence-similarity, text-generation, text-ranking
         formats: gguf
           tasks: conversational, feature-extraction, image-text-to-text, sentence-similarity, text-generation, text-ranking
       libraries: (none)×15, llama.cpp, pytorch, sentence-transformers, transformers, transformers.js, vllm
           other: af, am, ar, az, ba, be, bg, bn, bs, ca, ce, chat, co, code, codeqwen, cross-encoder, cs, custom_code, cy, da, de, deepseek, deploy:azure, dv, edge, el, en, endpoints_compatible, eo, es, et, eu, fa, facebook, fi, fr, fy, ga, gd, gemma, gemma3, gguf-my-repo, gl, gn, google, gpt_oss, granite-4.0, gu, gv, ha, he, hi, hr, ht, hu, hy, id, ig, image-generation, imatrix, it, ja, jv, km, kn, ko, ku, ky, la, language, lfm2, lfm2.5, liquid, llama, llama-3, llama-4, llama-cpp, llama4, lo, lt, lv, math, meta, mg, mi, microsoft, mistral-common, mk, ml, mn, moe, mr, ms, multilingual, mxfp4, my, ne, nl, nlp, nn, no, nvidia, ny, openai, pa, phi, phi3, phi4, pl, prompt-compression, prompt-engineering, prompt-expansion, ps, pt, q4_k_m, quantized, qwen, qwen-coder, qwen3, qwen3_5_moe, qwen3_moe, qwen3_next, reranker, ro, ru, sd, si, sk, sl, sm, sn, so, sq, sr, st, su, sv, sw, ta, te, text-embeddings-inference, tg, th, tl, tn, tr, ug, uk, unsloth, ur, uz, vi, xh, yi, yo, zh, zu

If the GGUF filter combined with task filters draws in too many unsupported models while still missing some that lack pipeline tags, the logic might evolve into something like:

(hf_api_formats = GGUF) AND (hf_api_pipeline_tags_set OR hf_api_task_tags_set OR hf_api_other_tags_set)
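A minimal Python sketch of that predicate, assuming the tag groupings reported by hf_model_tags.py; the allow-list sets below are placeholders, not the real filter lists:

```python
# Placeholder allow-lists; the real sets would be derived from
# hf_model_tags.py output like the summary above.
SUPPORTED_PIPELINE_TAGS = {"text-generation", "image-text-to-text",
                           "sentence-similarity", "text-ranking"}
SUPPORTED_TASK_TAGS = {"conversational", "feature-extraction"}
SUPPORTED_OTHER_TAGS = {"llama-cpp", "gguf-my-repo", "imatrix"}

def passes_gguf_filter(model: dict) -> bool:
    """(formats = GGUF) AND (pipeline OR task OR other tags recognized)."""
    if "gguf" not in model.get("formats", []):
        return False
    return bool(set(model.get("pipeline_tags", [])) & SUPPORTED_PIPELINE_TAGS
                or set(model.get("task_tags", [])) & SUPPORTED_TASK_TAGS
                or set(model.get("other_tags", [])) & SUPPORTED_OTHER_TAGS)
```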

Model Repo Links

HF search often requires opening the repo page to inspect quant sizes, runtime flags, chat template details, and example usage. This PR adds the repo link to each searched model so users can quickly click through and validate details in a new tab.

HF search proxy with auth

  • Adds a GET /hf/search proxy that forwards server-side HF_TOKEN, doubling the default rate limit from 500 to 1000 requests per 5 minutes
  • Adds cursor-based pagination with ‹ N › controls
  • Adds adaptive cooldown behavior based on the RateLimit response header
  • Improves rate-limit messaging by showing the exact retry time and suggesting HF_TOKEN when unauthenticated
  • Whitelists the author parameter in the proxy for pinned-provider queries
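The adaptive cooldown could be sketched like this, assuming a `remaining=N, reset=S` item format in the RateLimit header (the exact header shape HF returns may differ):

```python
import re

def cooldown_seconds(ratelimit_header: str, floor: float = 0.0) -> float:
    """Sketch of adaptive cooldown from a RateLimit-style response header.
    Waits out the window when the quota is exhausted, and otherwise spreads
    remaining requests across the time left before reset."""
    fields = dict(re.findall(r"(\w+)=(\d+)", ratelimit_header or ""))
    remaining = int(fields.get("remaining", 1))
    reset = int(fields.get("reset", 0))
    if remaining <= 0:
        return float(reset)                      # quota exhausted
    return max(floor, reset / (remaining + 1))   # pace remaining requests
```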

HF Cache

The HF cache is often shared across tools and services. Exposing it in the model manager lets Lemonade discover models that were added outside Lemonade and register them without re-downloading. This is especially useful on systems with shared model stores, multi-OS setups, or large mounted repositories where users want to “take models off the shelf” into Lemonade and optionally leave the files in place when removing them later.

This PR adds HF cache discovery and a UI option to keep cached files when removing a model from Lemonade.

  • Adds GET /cache/models to scan the local HF cache for downloaded models that are not yet registered
  • Lets users register cached models without re-downloading, which is especially useful for testing recipe setups
  • Adds a FROM HF CACHE section in the UI with quant dropdowns, recipe badges, size display, and one-click registration
  • Rolls up multiple quants from the same author, which is helpful when comparing quants and promoting one into Lemonade
  • Handles symlinked HF cache layouts, sharded models, root-level and folder-based layouts, and mmproj detection
  • Skips downloads for models already present in cache
  • Adds a server-side keep_files option so removal can unregister the model without deleting cached files
  • Makes removed models reappear in the HF cache section for easy re-registration
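The cache scan can be sketched against the standard hub layout (`models--<org>--<name>/snapshots/<rev>/` with symlinked blobs); `scan_hf_cache` below is illustrative, not the C++ implementation in model_manager.cpp:

```python
from pathlib import Path

def scan_hf_cache(cache_dir: str) -> list[dict]:
    """Sketch of HF cache discovery: walk repo directories, resolve
    symlinks, and sum real file sizes for each cached model."""
    found = []
    for repo_dir in sorted(Path(cache_dir).glob("models--*--*")):
        _, org, name = repo_dir.name.split("--", 2)
        snapshots = repo_dir / "snapshots"
        if not snapshots.is_dir():
            continue
        files = [p for p in snapshots.rglob("*") if p.is_file() or p.is_symlink()]
        size = sum(p.resolve().stat().st_size
                   for p in files if p.resolve().is_file())
        found.append({"model_id": f"{org}/{name}", "size_bytes": size,
                      "files": [p.name for p in files]})
    return found
```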

Filter Dialog

The filter dialog now supports including or excluding any backend from search, so users can narrow results to the backends they care about.

It also makes the active filter state more visible by reflecting both default behavior and current backend availability. Backends that are not active are automatically excluded from search. Users can also hide the new HF Cache and HF Online sections to restore the previous suggested-models-only layout.

  • Backend chips are color-coded by state: green = installed, yellow = available, red = unsupported

Model Quants

Improves quant detection and ordering across flat and nested folder layouts.

  • Adds MXFP4 recognition across all quant regexes
  • Expands regex support for compact forms like q4k and extended names like UD-Q3_K_XL
  • Adds UD quant support
  • Fixes ordering to match HF’s ascending bit-depth progression
  • Deduplicates root-level sharded GGUF files in the quant dropdown
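A rough sketch of the detection and ordering logic; the regex and bit-depth table are placeholders, not the shipped regexes:

```python
import re

# Ascending bit-depth ranking matching the HF ordering described above.
QUANT_ORDER = ["IQ1", "Q2", "Q3", "Q4", "Q5", "Q6", "Q8", "BF16", "F16", "F32"]
QUANT_RE = re.compile(
    r"(UD-)?(IQ\d|Q\d(?:_?K(?:_[SML]|_XL)?|_0|_1)?|MXFP4|BF16|F16|F32)", re.I)

def quant_sort_key(filename: str):
    """Sort key: bit-depth rank first, with UD variants right after
    their non-UD equivalent. Unrecognized quants sort last."""
    m = QUANT_RE.search(filename)
    if not m:
        return (len(QUANT_ORDER), False, filename)
    ud, quant = bool(m.group(1)), m.group(2).upper()
    base = next((i for i, p in enumerate(QUANT_ORDER) if quant.startswith(p)),
                len(QUANT_ORDER))
    return (base, ud, quant)
```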

Core: recipe-aware classification

  • Adds recipeCompatibility.ts with classifyModel(), which prioritizes Hugging Face pipeline_tag metadata over file-format-only detection
  • Adds RECIPE_FORMATS to gate classification by file-format compatibility, for example preventing sd-cpp from matching GGUF models
  • Uses a three-pass classification flow: pipeline_tag → repo tags → name pattern matching
  • Introduces four confidence levels: supported, likely, experimental, and incompatible
  • Marks experimental models with a yellow ? badge and requires confirmation before install
  • Aligns badging to backend compatibility rather than raw file format
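The three-pass flow might look like this in outline; the mapping tables below are abbreviated placeholders, and the real logic lives in recipeCompatibility.ts:

```python
import re

# Abbreviated placeholder maps for illustration only.
PIPELINE_RECIPE = {"text-generation": "llamacpp",
                   "image-text-to-text": "llamacpp",
                   "text-to-image": "sd-cpp",
                   "text-to-speech": "kokoro",
                   "automatic-speech-recognition": "whispercpp"}
TAG_RECIPE = {"image-generation": "sd-cpp", "conversational": "llamacpp"}
NAME_RECIPE = [(re.compile(r"whisper", re.I), "whispercpp"),
               (re.compile(r"embed|rerank", re.I), "llamacpp")]

def classify_model(model: dict) -> tuple:
    """Returns (recipe, confidence) using the three-pass flow:
    pipeline_tag -> repo tags -> name pattern matching."""
    tag = model.get("pipeline_tag")
    if tag in PIPELINE_RECIPE:
        return PIPELINE_RECIPE[tag], "supported"     # pass 1: pipeline_tag
    for t in model.get("tags", []):                  # pass 2: repo tags
        if t in TAG_RECIPE:
            return TAG_RECIPE[t], "likely"
    for pat, recipe in NAME_RECIPE:                  # pass 3: name patterns
        if pat.search(model.get("id", "")):
            return recipe, "experimental"
    return None, "incompatible"
```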

Vision model (mmproj) handling

  • Returns mmproj_files from the cache endpoint for vision model detection
  • If multiple mmproj files are found, opens the Add Model dialog for user selection
  • If exactly one mmproj file is found, auto-selects it and installs directly
  • Prefers BF16 > F16 > F32 when picking a default mmproj
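The preference order reduces to a simple substring scan; `pick_default_mmproj` is a sketch, not the shipped TypeScript:

```python
def pick_default_mmproj(files):
    """Sketch of the BF16 > F16 > F32 default mmproj preference.
    Checking 'bf16' before 'f16' matters, since 'f16' is a substring
    of any BF16 filename."""
    for pref in ("bf16", "f16", "f32"):
        for f in files:
            if pref in f.lower():
                return f
    return files[0] if files else None
```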

Other UX improvements

  • Single-quant models now show the quant in the dropdown for consistency with multi-quant models
  • Adds a recipe mismatch warning in the Add Model panel when the checkpoint suggests a different modality
  • Fixes the [object Object] error toast for model load failures
  • Shows “No more results” on later pages instead of “No compatible models found”

Backend fixes

  • Computes model sizes from actual files on disk for user-registered models, including sharded models
  • Updates HttpClient::get to capture response headers for rate-limit parsing
  • Skips download after user registration when files already exist in the HF cache

Files changed

File Description
New: src/app/src/renderer/utils/recipeCompatibility.ts Task-to-recipe mapping and model classification
src/app/src/renderer/ModelManager.tsx HF cache section, pagination, proxy integration, vision handling
src/app/src/renderer/AddModelPanel.tsx mmproj default selection, recipe mismatch warning
src/app/src/renderer/ConfirmDialog.tsx Optional checkbox support for keep-files
src/app/src/renderer/components/ConnectedBackendRow.tsx Updated for new confirm dialog return type
src/app/src/renderer/utils/backendInstaller.ts labels field, keep_files param, error message fix
src/app/styles.css Experimental badge, pagination, cooldown animation, warning styles, checkbox
src/cpp/include/lemon/model_manager.h discover_hf_cache_models(), delete_model(keep_files)
src/cpp/server/model_manager.cpp Cache discovery, GGUF path resolver for sd-cpp, size computation, skip-download logic
src/cpp/server/server.cpp /cache/models and /hf/search endpoints, keep_files support
src/cpp/server/utils/http_client.cpp Response header capture for rate-limit parsing

Test plan

  • Search for stable-diffusion — SD models should show the sd.cpp badge, not llama.cpp
  • Search for whisper — only .bin-based whisper models should be shown; GGUF-only models should be hidden
  • Search for Qwen3-VL — vision models should show the llama.cpp badge and detect mmproj
  • HF cache section shows unregistered models with correct recipe badges
  • Adding from cache with mmproj files opens the edit dialog when multiple files are present, or installs directly when only one exists
  • Deleting a user model with Keep files checked makes the model reappear in the cache section
  • Pagination ‹ › controls work, and rate-limit cooldown adapts to remaining quota
  • HF_TOKEN in the environment doubles the rate limit; verify authenticated: true in the proxy response
  • Existing curated models, including Whisper-Large-v3-Turbo, still load correctly
  • Search "qwen" with only llamacpp chip → only GGUF results
  • Enable flm chip, search "qwen" → FastFlowLM Qwen NPU models appear alongside GGUF
  • Enable kokoro chip, search "kokoro" → ONNX TTS models appear
  • Backend chips show correct colors based on system hardware
  • Unsupported backends default to disabled, can be toggled on (red chip)

Tested on Ubuntu 26.04 via the web interface using lemonade-server.


🤖 Generated with Claude Code using the 1M context window on Opus with no compaction

…ery (lemonade-sdk#1381)

Resolves lemonade-sdk#1381 — Model search was showing incompatible non-LLM models
because all GGUF files were blindly routed to llama.cpp. This PR
introduces task-first model classification that separates format, task,
and backend/recipe into distinct concepts.

## Recipe-aware classification

- New `recipeCompatibility.ts` module with `classifyModel()` that
  prioritizes HuggingFace `pipeline_tag` over file format detection
- Supports `text-generation`, `image-text-to-text`, `text-to-image`,
  `automatic-speech-recognition`, `text-to-speech` pipeline tags
- Four confidence levels: supported, likely, experimental, incompatible
- Experimental models show yellow badge with `?` and require
  confirmation before install

## HF cache discovery

- New `GET /cache/models` endpoint scans the HF cache directory for
  downloaded models not yet registered in the model registry
- Frontend "FROM HF CACHE" section with quant dropdowns, recipe badges,
  and one-click registration
- Handles symlinked HF cache layouts (canonical path resolution)
- Skips re-download for models already present in cache
- Properly groups sharded models (folder-based and root-level)

## HF search improvements

- New `GET /hf/search` proxy endpoint passes `HF_TOKEN` from server
  environment for doubled rate limits (500 → 1000 req/5min)
- Cursor-based pagination with `‹ N ›` controls
- Adaptive rate limiting cooldown based on `RateLimit` response header
- Rate limit message shows exact retry time and suggests HF_TOKEN
- Model names are clickable links to HuggingFace pages

## Whisper.cpp handling

- Whisper models use `.bin` files for quant selection (not `.gguf`)
- GGUF-only whisper models are hidden until whisper.cpp GGUF support
- Removed whisper from GGUF path resolver (was breaking `.bin` models)

## SD.cpp / image model handling

- SD models with GGUF files correctly route to `sd-cpp` (not `llamacpp`)
- GGUF path resolver extended to `sd-cpp` for variant matching

## Vision model (mmproj) handling

- Cache endpoint returns `mmproj_files` for vision model detection
- Multiple mmproj files → opens Add Model dialog for user selection
- Single mmproj → auto-selects and installs directly
- Default mmproj preference: BF16 > F16 > F32
- Recipe mismatch warning in Add Model panel

## Model deletion with keep-files option

- Delete dialog for `user.*` models shows "Keep downloaded files in HF
  cache" checkbox (unchecked by default)
- Server accepts `keep_files` parameter — removes from registry only
- Model reappears in HF cache section for easy re-registration

## Additional fixes

- Fixed `[object Object]` error toast for load failures (server returns
  nested error objects)
- Model sizes computed from actual files on disk for user-registered
  models (including sharded models)
- MXFP4 quantization format recognized in all quant regexes
- Quant regex widened for compact forms (q4k) and extended names
  (UD-Q3_K_XL)
- Root-level sharded GGUF files deduplicated in quant dropdown
- Non-quant folders (e.g. `whisper.cpp/`) expand to individual files
- Confirm dialog supports optional checkbox with controlled state
- HttpClient::get now captures response headers (for rate limit parsing)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@ianbmacdonald ianbmacdonald marked this pull request as ready for review March 17, 2026 07:47
ianbmacdonald and others added 7 commits March 17, 2026 11:15
Rename ConfirmCheckbox/checkbox/checkboxChecked to
KeepFilesOption/keepFilesOption/keepFiles throughout ConfirmDialog
and its callers to better reflect the HF cache preservation feature.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Move HF cache discovery state below HF search state instead of
interleaved in the middle.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Server-side: discover_hf_cache_models() fetches pipeline_tag from HF API
for each cached model, enabling accurate recipe classification.

Frontend: cache models grouped by provider slug (e.g., "unsloth (4)")
using collapsible sections. Extracted CacheModelInfo interface and
renderCacheModelItem/renderCacheProviderGroup functions.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…earch fixes

- Recipe filter chips (llama.cpp, sd.cpp, whisper.cpp, Kokoro, FLM,
  RyzenAI) apply across all three sections: suggested, cache, and search
- Section visibility toggles to show/hide suggested, HF cache, and
  HF search independently
- Filter icon turns green when non-default filters are active
- Multi-word search matches each word independently ("Qwen Image"
  matches "Qwen-Image-GGUF")
- Fix URL encoding in HF search proxy for queries with spaces
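The independent-word matching can be sketched as (illustrative helper name):

```python
def matches_query(name: str, query: str) -> bool:
    """Each whitespace-separated query word must match independently (AND),
    so "Qwen Image" matches "Qwen-Image-GGUF"."""
    return all(w.lower() in name.lower() for w in query.split())
```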

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add TASK_RECIPE_MAP entries for embedding models (sentence-similarity,
feature-extraction pipeline tags) and reranking models (text-ranking)
so they route to llamacpp instead of being marked incompatible.

Expand sd-cpp hfTags with image-generation and image-editing to catch
FLUX models via repository tags.

Add name pattern fallbacks: /embed/i, /nomic/i, /rerank/i.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Sort quantizations by bit-depth (ascending) matching HuggingFace
  ordering: IQ1 → Q2 → Q3 → Q4 → Q5 → Q6 → Q8 → BF16/F16/F32
- UD (Unsloth Dynamic) quants labeled with (UD) suffix for clarity
- UD variants sort right after their non-UD equivalent
- Default quant selection prefers Q4_K_M when available
- Tooltip shows count when >10 quants ("scroll for more")
- URL encode search proxy params (fixes spaces in queries)
- Green filter icon when non-default filters active
- Multi-word search matches each word independently

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The GGUF path resolver was catching sd-cpp models with safetensors
checkpoints (e.g. Z-Image-Turbo), finding no .gguf files, and returning
the directory path instead of falling through to the generic resolver.

Now skips the GGUF resolver when the variant explicitly contains a
non-GGUF extension (.safetensors, .onnx, .bin).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@ianbmacdonald ianbmacdonald marked this pull request as draft March 18, 2026 17:28
Resolve merge conflicts with main, keeping recipe-aware filtering.

HF Search:
- Issue parallel search queries per enabled recipe backend
- llamacpp: filter=gguf, sd-cpp: filter=safetensors,text-to-image,
  kokoro: filter=onnx,text-to-speech
- Pinned providers: whispercpp→ggerganov, flm→FastFlowLM,
  ryzenai-llm→amd (surfaces new models without registry updates)
- Merge, deduplicate, sort by downloads across all backends
- Per-recipe cursor tracking for correct multi-backend pagination
- Pre-filter: skip detectBackend for models with unsupported
  pipeline_tag (saves 2 HF API calls per incompatible model)
- Add author param to HF search proxy whitelist
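The merge/dedupe/sort step might look like this in outline (`merge_results` is an illustrative name, not the shipped code):

```python
def merge_results(per_backend: dict) -> list:
    """Sketch of merging parallel per-backend search results:
    deduplicate by model id (first backend seen wins) and sort the
    merged list by downloads, descending."""
    seen, merged = set(), []
    for backend, models in per_backend.items():
        for m in models:
            if m["id"] not in seen:
                seen.add(m["id"])
                merged.append({**m, "backend": backend})
    return sorted(merged, key=lambda m: m.get("downloads", 0), reverse=True)
```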

Format gating (recipeCompatibility.ts):
- RECIPE_FORMATS map defines supported file formats per backend
- hasRequiredFormat() gates classification by format compatibility
  (e.g. sd-cpp requires safetensors, not gguf)
- Uses HF format tags with file extension fallback
- SUPPORTED_PIPELINE_TAGS exported for pre-filter
- Add translation and image-to-text to LLM_PIPELINE_TAGS

Filter chips:
- Color-coded by backend state: green (installed), yellow
  (available/installable), red (unsupported)
- Inactive chips show subtle state tint (unsupported distinguishable
  from deselected)
- Default to only viable backends on system detection
- Filter indicator compares against viable count

Test tooling:
- New test/hf_model_tags.py: HF model tag analysis with --detect,
  --summary, per-recipe flags, rate limiting, HF_TOKEN support

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@ianbmacdonald ianbmacdonald marked this pull request as ready for review March 19, 2026 06:00
These Python-only quantization formats require bitsandbytes/autoawq/auto-gptq
runtimes and cannot be loaded by any C++ backend (llamacpp, sd-cpp, etc.).
Check model ID for bnb/awq/gptq markers before classification to avoid
false positives (e.g. bnb-4bit safetensors models classified as sd-cpp).
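The guard reduces to a marker check on the model ID; the regex below is a sketch, not the shipped pattern:

```python
import re

# Markers for Python-only quantization runtimes (bitsandbytes/AWQ/GPTQ).
PYTHON_ONLY_QUANT = re.compile(r"\b(bnb|awq|gptq)\b|autoawq|auto-gptq", re.I)

def is_python_only_quant(model_id: str) -> bool:
    """Pre-classification guard: these checkpoints need Python runtimes
    and cannot load in any C++ backend, so skip classification entirely."""
    return bool(PYTHON_ONLY_QUANT.search(model_id))
```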

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@ianbmacdonald ianbmacdonald marked this pull request as draft March 19, 2026 14:41
@ianbmacdonald

Search heuristics on Hugging Face models are a Pandora's box until the newer backends support more model nuances: sd.cpp [almost] works with Qwen-Image, just not on ROCm (goes to black halfway through the layers); whisper.cpp almost works with GGUF but not yet; and FLM is built around a single-user/Windows use case and doesn't support the HF ecosystem properly (FastFlowLM/FastFlowLM#406), so dynamic search-and-add for new models doesn't fit the use case for a shared model shelf unless you are dragging around the proprietary folder format. Moving this back to draft. I may cherry-pick a PR for the HF cache piece, which is the best feature IMHO for anyone sharing model shelves between users, OSs, etc., but I'm gone for the next couple of weeks. @jeremyfowers

Move "Downloaded only" toggle above section toggles for better discoverability,
rename "Group suggested models" to "Suggested models".

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
