Skip to content

feat: add RAG-Anything Studio — standalone multimodal-aware Web UI#270

Open
devinlovekoala wants to merge 15 commits into
HKUDS:mainfrom
devinlovekoala:gui-support
Open

feat: add RAG-Anything Studio — standalone multimodal-aware Web UI#270
devinlovekoala wants to merge 15 commits into
HKUDS:mainfrom
devinlovekoala:gui-support

Conversation

@devinlovekoala

@devinlovekoala devinlovekoala commented Apr 28, 2026

Copy link
Copy Markdown

Summary

This PR introduces RAG-Anything Studio, a standalone, local-first Web UI for RAG-Anything.

Motivation

While RAG-Anything provides strong multimodal capabilities, it currently lacks a practical interface for:

  • inspecting multimodal parsing outputs
  • debugging retrieval behavior across text and images
  • configuring heterogeneous model providers and storage backends

This PR provides a minimal, low-risk UI layer to improve usability and observability without modifying the core pipeline.

Scope

This PR does not redesign RAG-Anything.
It adds an optional UI layer on top of the existing system.

All code lives under /raganything_studio and interacts with RAG-Anything only through its public APIs.

What's included

  • Backend (raganything_studio/backend/) — FastAPI service wrapping RAGAnything without modifying its internals
  • Frontend (raganything_studio/frontend/) — React + TypeScript SPA (prebuilt and served from backend/static/, no Node.js required)

Key capabilities

  • Multimodal interaction

    • Multimodal query interface with image-evidence display
    • Image preview support for knowledge graph nodes
  • System configuration

    • Full settings page for LLM / VLM / embedding providers
    • Remote vector DB and hybrid storage configuration
    • Provider model discovery and embedding dimension auto-detection
  • Reliability & usability

    • Test-connection and readiness guards before insert/query
    • Phased concurrency controls and file-hash caching for insert jobs
  • Visualization

    • Knowledge graph visualization (Cytoscape fcose layout)

Design principles

  • Zero risk to existing users — fully isolated under /raganything_studio
  • No core pipeline changes — uses only public APIs
  • Optional dependency — installed via extras (pip install ".[studio]")
  • Maintainability-first — Studio evolves independently from core

Test plan

Automated checks:

  • python -m pip install -e . completes in a clean virtual environment
  • cd raganything_studio/frontend && npm ci && npm run build
  • python -m raganything_studio --host 127.0.0.1 --port 7860
  • curl -fsS http://127.0.0.1:7860/api/health
  • pytest tests/studio

Manual checks:

  • Open http://127.0.0.1:7860 and verify the Studio UI loads
  • In Settings, save valid model/storage settings and verify invalid storage combinations are rejected
  • Use Test connection for the configured model provider and any configured database storage backend
  • Upload a document, start processing via the UI, and confirm the job reaches succeeded and the document becomes indexed
  • Open Knowledge Graph and confirm graph nodes/edges render
  • Run a query with multimodal retrieval enabled and confirm the answer includes sources plus image/media preview when the indexed document contains visual content

Related issue: #269

…ved job errors

- Add ReadinessContext: globally tracks llm/embedding key status and indexed doc count
- UploadPage: show gate banner with missing keys listed, disable process button when unconfigured
- QueryPage: dual guard — blocks query if unconfigured OR no indexed docs, shows contextual action
- Dashboard: replace static metrics with 3-step onboarding track (Config → Upload → Query), auto-advances active step based on real state; keep metrics + recent doc list below
- JobPage: extract error summary from traceback (last non-File line), show collapsible traceback behind 'Show details'; log console auto-scrolls to bottom on every log update
- styles.css: full rewrite with CSS custom properties, refined sidebar (sticky, footer status dot), improved typography scale, gate-banner, onboarding-track, error-summary, log-section components
Backend:
- POST /api/settings/test-connection: tests LLM/Embedding/Vision using
  unsaved form values; falls back to saved key when form field is blank
- Embedding test returns detected_dim by measuring actual vector length
- LLM/Vision test sends a minimal 4-token prompt and measures latency

Frontend:
- SettingsPage: each provider card gets a 'Test connection' button with
  spinner, green latency badge on success, red Failed badge on error
- Embedding card: on successful test, detected_dim is auto-applied to
  the Dimension field and shown as 'Auto-detected: N' alongside it
- API key placeholder shows hint when key is already saved so user
  knows they can leave it blank during a re-test
- Backend: POST /api/settings/list-models — fetches /v1/models from any
  OpenAI-compatible provider; falls back to saved API key when form key
  is blank; returns ModelListResponse{ok, models[], error}
- Provider registry: 15 known platforms (SiliconFlow, 阿里云百炼, 百度千帆,
  火山引擎, OpenRouter, DeepSeek, Groq, etc.) with pre-filled base URLs;
  selecting a known provider auto-fills Base URL and hides the URL field
- Frontend: ProviderSection replaces free-text model input with a
  two-mode ModelPickerField — text input + "Load models" button before
  fetch, grouped searchable dropdown after; groups by owned_by field
  matching Cherry Studio UX
- Provider select reordered into optgroups: Popular (CN), International,
  Self-hosted, Other
- CSS: model-dropdown-panel, model-group, model-option, base-url-display,
  model-picker-row, load-models-btn and related selectors
- pyproject.toml: add httpx>=0.25 as explicit dependency
- _test_vision: replace text-only openai_complete_if_cache call with a
  direct httpx multimodal POST containing a 1×1 white PNG; plain text
  calls are rejected by GLM-4V and most VLMs with InvalidResponseError
- _is_vision_capable: pattern-match model IDs against known VL keywords
  (vl, vision, glm-4v, qwen-vl, llava, gpt-4o, claude-3, etc.) to tag
  ModelInfo.vision_capable in the /list-models response
- ModelPickerField: when kind==='vision', default visionOnly filter to
  true and show "VL only" toggle button; VL badge shown on capable models
- styles: .model-badge--vision (blue), .vision-filter-btn + .active state
@privat655

Copy link
Copy Markdown

wow great idea

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants