Reference implementation for a Retrieval Augmented Generation pipeline. Python 3.11+ backend (FastAPI + LangChain), React/TypeScript frontend, deployable via Docker Compose or Helm.
src/nvidia_rag/
├── rag_server/ # RAG query/response server (FastAPI)
├── ingestor_server/ # Document ingestion server (FastAPI)
└── utils/ # Shared utilities
frontend/ # React + TypeScript UI (pnpm)
deploy/
├── compose/ # Docker Compose files and env configs
└── helm/ # Helm charts (standard + MIG-slicing)
docs/ # User-facing documentation (Sphinx, RST/MD)
tests/
├── unit/ # No network calls allowed
└── integration/ # Network calls permitted
notebooks/ # Jupyter notebooks for evaluation and examples
uv sync # Install all deps
uv run pytest tests/unit/ # Unit tests
uv run pytest tests/integration/ # Integration tests
ruff check --fix src/ # Lint + autofix
ruff format src/ # Format
pre-commit run --all-files # Run all pre-commit hookscd frontend
pnpm install
pnpm run dev # Dev server
pnpm run lint # ESLint
pnpm exec tsc --noEmit # Type check
pnpm run test:run # Tests- Python: Ruff for linting and formatting (line-length 88, double quotes, space indent). Config in
pyproject.toml. - Type hints: Required on all function signatures.
- Imports: Sorted by isort via Ruff. No in-function imports.
- Tests: Mirror source tree (
src/nvidia_rag/rag_server/server.py→tests/unit/rag_server/test_server.py). - Frontend: ESLint + TypeScript strict mode. Function components with hooks.
- Env files:
deploy/compose/nvdev.env(NVIDIA-hosted NIMs) anddeploy/compose/.env(self-hosted). These are the source of truth for Docker deployments — shell-only exports are lost on restart.
- Docker Compose —
deploy/compose/with env-file configs. Multiple profiles: standard, retrieval-only, NVIDIA-hosted. - Helm —
deploy/helm/nvidia-blueprint-rag/chart withvalues.yaml. Supports MIG GPU slicing viadeploy/helm/mig-slicing/. - Library — Import
nvidia_ragas a Python package for custom pipelines.
pyproject.toml— All Python deps, ruff config, project metadatadeploy/compose/nvdev.env— Default env file for NVIDIA API Catalog deploymentssrc/nvidia_rag/rag_server/prompt.yaml— System prompt templatesdocs/support-matrix.md— GPU requirements per deployment modedocs/service-port-gpu-reference.md— Port mappings and GPU assignments
- Target the
developbranch, nevermain. - All commits must be signed off (DCO).
- Run
pre-commit run --all-filesbefore submitting. - See
CONTRIBUTING.mdfor full workflow.
For any operational task, use the rag-blueprint skill (.agents/skills/rag-blueprint/).
- Deploy — Docker Compose (standard, retrieval-only, NVIDIA-hosted), Helm, MIG-slicing, library mode
- Configure — VLM, guardrails, query rewriting, ingestion, search & retrieval, models, observability, summarization, multimodal, MCP, evaluation, notebooks, UI, and more
- Troubleshoot — Debug unhealthy services, container errors, GPU issues, connectivity failures
- Shutdown — Stop, tear down, and clean up services