docs: added example for a frontend shared across multiple models #3008

hutm · 2025-09-11T02:47:14Z

Overview:

added example for a frontend shared across multiple models

Details:

added example for a frontend shared across multiple models

Where should the reviewer start?

review all the files

Summary by CodeRabbit

New Features
- Added a Kubernetes example for a Shared Frontend that serves multiple models with a shared model cache.
- Provides manifests to deploy the stack and expose endpoints for listing models (/v1/models) and chat completions.
Documentation
- New README with end-to-end deployment steps: install chart, create access token secret, apply manifests, port-forward, and test requests.
- Includes guidance for verifying model availability and sample payloads, plus a reference for benchmarking.

copy-pr-bot · 2025-09-11T02:47:19Z

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

coderabbitai · 2025-09-11T02:57:35Z

Walkthrough

Adds a new Kubernetes example for a shared Dynamo frontend. Introduces a README with deployment steps and a manifest defining a PVC, a frontend deployment, a vLLM aggregation worker, and an agg-qwen stack (encode, VLM, processor), all using a shared HF cache and token secret.

Changes

Cohort / File(s)	Summary
Docs: Kubernetes shared frontend README `examples/basics/kubernetes/shared_frontend/README.md`	New README describing deployment to the dynamo namespace, Helm install, HF token secret, applying shared_frontend.yaml, port-forwarding on 8000, listing /v1/models, sample chat completion, and GenAI-Perf reference.
Kubernetes manifests: shared frontend stack `examples/basics/kubernetes/shared_frontend/shared_frontend.yaml`	New manifest adding: PVC `dynamo-model-cache` (100Gi); `DynamoGraphDeployment` frontend (namespace: dynamo); `DynamoGraphDeployment` vllm-agg (VllmDecodeWorker, GPU 1, HF cache/token); `DynamoGraphDeployment` agg-qwen (EncodeWorker, VLMWorker prefill, Processor) with shared PVC mounts and command entries.

Sequence Diagram(s)

sequenceDiagram
    autonumber
    actor User
    participant Frontend as Frontend (dynamo)
    participant vLLMAgg as VllmDecodeWorker (vllm-agg)
    participant AggQwen as agg-qwen Services
    participant HFCache as Shared PVC (/root/.cache/huggingface)
    participant HF as Hugging Face Hub

    User->>Frontend: HTTP request (/v1/*)
    alt Text decode
        Frontend->>vLLMAgg: Generate/Decode request
        vLLMAgg-->>HFCache: Read/Write model weights
        HFCache-->>HF: Fetch missing weights (via HF token)
        vLLMAgg-->>Frontend: Tokens/Result
    else Multimodal pipeline
        Frontend->>AggQwen: Encode request
        AggQwen->>AggQwen: EncodeWorker → VLMWorker(prefill) → Processor
        AggQwen-->>HFCache: Read/Write model weights
        HFCache-->>HF: Fetch missing weights (via HF token)
        AggQwen-->>Frontend: Pipeline result
    end
    Frontend-->>User: Response

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

chore: add agg_qwen.yaml to multimodal deploy #2872 — Introduces an agg-qwen DynamoGraphDeployment with similar Encode/VLM/Processor services and configs.
docs: SNS agg k8s example #2773 — Adds related Kubernetes DynamoGraphDeployment examples for frontend plus vLLM workers using shared PVC and HF secrets.
chore: Change vllm K8s from dynamo-run to python -m dynamo.frontend #2055 — Adjusts frontend startup/port conventions aligned with this example’s frontend (HTTP port 8000).

Poem

A rabbit twitches whiskers, keen,
New pods arise in namespaces clean—
One cache to share, the models hum,
Frontend routes and tokens come.
VLLM sings, Qwen joins the thread,
Burrows of YAML, neatly spread.
Hop! The cluster’s green lights led.

Tip

👮 Agentic pre-merge checks are now available in preview!

Pro plan users can now enable pre-merge checks in their settings to enforce checklists before merging PRs.

Built-in checks – Quickly apply ready-made checks to enforce title conventions, require pull request descriptions that follow templates, validate linked issues for compliance, and more.
Custom agentic checks – Define your own rules using CodeRabbit’s advanced agentic capabilities to enforce organization-specific policies and workflows. For example, you can instruct CodeRabbit’s agent to verify that API documentation is updated whenever API schema files are modified in a PR. Note: Upto 5 custom checks are currently allowed during the preview period. Pricing for this feature will be announced in a few weeks.

Please see the documentation for more information.

Example:

reviews:
  pre_merge_checks:
    custom_checks:
      - name: "Undocumented Breaking Changes"
        mode: "warning"
        instructions: |
          Pass/fail criteria: All breaking changes to public APIs, CLI flags, environment variables, configuration keys, database schemas, or HTTP/GraphQL endpoints must be documented in the "Breaking Change" section of the PR description and in CHANGELOG.md. Exclude purely internal or private changes (e.g., code not exported from package entry points or explicitly marked as internal).

Please share your feedback with us on this Discord post.

Pre-merge checks (2 passed, 1 inconclusive)

❌ Failed checks (1 inconclusive)

Check name	Status	Explanation	Resolution
Description Check	❓ Inconclusive	The description includes the template headings (Overview, Details, Where should the reviewer start?) but the content is minimal and largely repetitive, offering no file-level guidance, summary of key changes, or testing instructions. The "Where should the reviewer start?" entry only says "review all the files," which does not meet the template's intent to call out specific files or risk areas. Because important details required by the template are missing, the description is insufficient to confidently assess the PR.	Please expand Details to briefly summarize the added files and key changes (for example, examples/basics/kubernetes/shared_frontend/README.md and shared_frontend.yaml) and note important resources such as the new PVC and DynamoGraphDeployment entries and any manual test steps; replace "review all the files" with explicit starting points and callouts (specific files, sections, or commands to run) and add a Related Issues line if applicable. After those additions the description will meet the repository template and can be marked as pass.

✅ Passed checks (2 passed)

Check name	Status	Explanation
Docstring Coverage	✅ Passed	No functions found in the changes. Docstring coverage check skipped.
Title Check	✅ Passed	The title accurately and concisely summarizes the primary change: adding an example for a frontend shared across multiple models (documentation plus Kubernetes manifests). It directly matches the added README and shared_frontend.yaml in the changeset and is clear for a teammate scanning PR history. The phrasing is specific and avoids unnecessary noise.

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 7

🧹 Nitpick comments (4)

examples/basics/kubernetes/shared_frontend/shared_frontend.yaml (2)
39-41: HF token secret must exist in each runtime namespace.

envFromSecret assumes hf-token-secret resides in the pod namespace. Ensure the secret is created in: vllm-agg, agg-qwen (and dynamo if frontend needs it). Update README accordingly. I can provide a patch.

Also applies to: 68-70, 89-91, 110-112

43-46: Add CPU/memory requests for reliable scheduling.

Only GPU limits are set. Add cpu/memory requests (and limits as needed). Example (tune values):
       resources:
-        limits:
-          gpu: "1"
+        requests:
+          cpu: "2"
+          memory: "8Gi"
+        limits:
+          gpu: "1"
+          memory: "16Gi"
Also applies to: 72-75, 94-96, 115-116
examples/basics/kubernetes/shared_frontend/README.md (2)
18-21: Port-forward: pin to the frontend namespace.

Service name may vary by operator, but namespace should be dynamo.
-kubectl port-forward svc/frontend-frontend 8000:8000 -n ${NAMESPACE}
+kubectl port-forward svc/frontend-frontend 8000:8000 -n dynamo
1-8: Trailing whitespace fixed by pre-commit—keep it clean.

Pre-merge hook reported trailing whitespace on these blocks; ensure editors trim on save to avoid CI churn.

Also applies to: 20-22

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 2eced09 and c0e89f5.

📒 Files selected for processing (2)

examples/basics/kubernetes/shared_frontend/README.md (1 hunks)
examples/basics/kubernetes/shared_frontend/shared_frontend.yaml (1 hunks)

🧰 Additional context used

🪛 GitHub Actions: Pre Merge Validation of (ai-dynamo/dynamo/refs/pull/3008/merge) by hutm.

examples/basics/kubernetes/shared_frontend/README.md

[error] 1-8: Trailing whitespace detected by pre-commit (hook: trailing-whitespace). The hook fixed issues in this file.

[error] 20-22: Trailing whitespace detected by pre-commit (hook: trailing-whitespace) on code fence line. The hook fixed issues.

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)

GitHub Check: Build and Test - dynamo

🔇 Additional comments (2)

examples/basics/kubernetes/shared_frontend/shared_frontend.yaml (2)
43-46: No change required — CRD expects resources.limits.gpu
DynamoGraphDeployment CRD uses plain "gpu" under resources.limits (not "nvidia.com/gpu"); the YAML snippet is correct.

16-18: ```shell
#!/bin/bash
set -euo pipefail

echo "PWD: $(pwd)"
TARGET="examples/basics/kubernetes/shared_frontend/shared_frontend.yaml"

echo
if [ -f "$TARGET" ]; then
echo "---- $TARGET (full) ----"
sed -n '1,200p' "$TARGET"
else
echo "FILE NOT FOUND: $TARGET"
fi

echo
echo "---- Search for 'dynamoNamespace' ----"
rg -n --hidden -S "dynamoNamespace" || true

echo
echo "---- Search for 'dynamo' (word) ----"
rg -n --hidden -S "\bdynamo\b" || true

echo
echo "---- YAML files under examples/basics/kubernetes ----"
find examples/basics/kubernetes -maxdepth 4 -type f -name "*.yaml" -print | sed -n '1,200p' || true

echo
echo "---- Search for 'namespace:' occurrences in examples (show context) ----"
rg -n -C2 --hidden -S "namespace:" examples || true

echo
echo "---- Show 'metadata:' blocks in that examples path (context) ----"
rg -n -C2 --hidden -S "metadata:" examples/basics/kubernetes || true
</blockquote></details>

</blockquote></details>

</details>

coderabbitai · 2025-09-11T02:57:38Z

examples/basics/kubernetes/shared_frontend/README.md

+This folder contains kubernetes manifests to deploy Dynamo frontend component as a standalone DynamoGraphDeploymen (DGD)
+and two models.
+Frontend is shared across the two models. Frontend is deployed to  dynamo namespace `dynamo`, which is a reserved dynamo 
+namespace name for frontend to observe deployed models across all dynamo namespaces.  
+A shared PVC is configured to store model checkpoint weights fetched from HF.


🛠️ Refactor suggestion

⚠️ Potential issue

Fix typos and clarify DGD/namespace behavior.

“DynamoGraphDeploymen” → “DynamoGraphDeployment”.

Clarify that the CR schedules workloads to multiple namespaces and that secrets/PVCs must exist there.

-This folder contains kubernetes manifests to deploy Dynamo frontend component as a standalone DynamoGraphDeploymen (DGD) -and two models. -Frontend is shared across the two models. Frontend is deployed to dynamo namespace `dynamo`, which is a reserved dynamo -namespace name for frontend to observe deployed models across all dynamo namespaces. +This folder contains Kubernetes manifests to deploy the Dynamo frontend as a standalone DynamoGraphDeployment (DGD) +and two models. +The frontend is shared across the two models. It is deployed into the reserved Dynamo namespace `dynamo`, which allows +the frontend to discover models across all Dynamo namespaces. +A shared PVC stores model checkpoint weights fetched from Hugging Face.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

This folder contains kubernetes manifests to deploy Dynamo frontend component as a standalone DynamoGraphDeploymen (DGD)

and two models.

Frontend is shared across the two models. Frontend is deployed to dynamo namespace `dynamo`, which is a reserved dynamo

namespace name for frontend to observe deployed models across all dynamo namespaces.

A shared PVC is configured to store model checkpoint weights fetched from HF.

This folder contains Kubernetes manifests to deploy the Dynamo frontend as a standalone DynamoGraphDeployment (DGD)

and two models.

The frontend is shared across the two models. It is deployed into the reserved Dynamo namespace `dynamo`, which allows

the frontend to discover models across all Dynamo namespaces.

A shared PVC stores model checkpoint weights fetched from Hugging Face.

🤖 Prompt for AI Agents

In examples/basics/kubernetes/shared_frontend/README.md around lines 2 to 6, fix typos and clarify CR/namespace behavior: change “DynamoGraphDeploymen” to “DynamoGraphDeployment”, correct spacing and punctuation, and update the description to state that the DGD CR schedules workloads into multiple target namespaces (not just deploys the frontend), and that any required secrets and PVCs must be created in each target namespace where models run; also note the frontend itself is deployed into the reserved dynamo namespace `dynamo` which watches those namespaces.

coderabbitai · 2025-09-11T02:57:38Z

examples/basics/kubernetes/shared_frontend/README.md

+1. Install Dynamo k8s platform helm chart
+2. Create a K8S secret with your Huggingface token and then render k8s manifests
+```sh
+export HF_TOKEN=YOUR_HF_TOKEN
+kubectl create secret generic hf-token-secret \
+    --from-literal=HF_TOKEN=${HF_TOKEN} \
+    --namespace ${NAMESPACE}
+kubectl apply -f shared_frontend.yaml --namespace ${NAMESPACE}
+```


⚠️ Potential issue

Secret creation: create in each runtime namespace; avoid apply --namespace.

The pods run in namespaces: dynamo, vllm-agg, agg-qwen. Create the secret in each, and apply the manifest without overriding namespaces.

-2. Create a K8S secret with your Huggingface token and then render k8s manifests +2. Create a K8S secret with your Hugging Face token in each runtime namespace, then apply the manifests ```sh export HF_TOKEN=YOUR_HF_TOKEN -kubectl create secret generic hf-token-secret \ - --from-literal=HF_TOKEN=${HF_TOKEN} \ - --namespace ${NAMESPACE} -kubectl apply -f shared_frontend.yaml --namespace ${NAMESPACE} +for ns in dynamo vllm-agg agg-qwen; do + kubectl create namespace "$ns" --dry-run=client -o yaml | kubectl apply -f - + kubectl -n "$ns" create secret generic hf-token-secret \ + --from-literal=HF_TOKEN="${HF_TOKEN}" --dry-run=client -o yaml | kubectl apply -f - +done +kubectl apply -f shared_frontend.yaml

🤖 Prompt for AI Agents

In examples/basics/kubernetes/shared_frontend/README.md around lines 8 to 16, the secret creation instructions currently create the HF_TOKEN secret only in a single namespace and apply the manifest with an overridden namespace; update instructions to create the hf-token-secret in each runtime namespace (dynamo, vllm-agg, agg-qwen) and ensure those namespaces exist (create them if missing) using kubectl dry-run + apply, then create the secret in each namespace via kubectl -n <ns> create secret ... --from-literal=HF_TOKEN="$HF_TOKEN" --dry-run=client -o yaml | kubectl apply -f -, and finally apply the shared_frontend.yaml without passing --namespace so the manifests’ own namespaces are respected.

examples/basics/kubernetes/shared_frontend/README.md

examples/basics/kubernetes/shared_frontend/shared_frontend.yaml

Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com> Signed-off-by: Maksim Khadkevich <[email protected]>

hutm requested review from nnshah1, whoisj, nealvaidya, ishandhanani and a team as code owners September 11, 2025 02:47

pull-request-size bot added the size/L label Sep 11, 2025

hutm changed the title ~~added example for a frontend shared across multiple models~~ [docs] added example for a frontend shared across multiple models Sep 11, 2025

hutm changed the title ~~[docs] added example for a frontend shared across multiple models~~ docs: added example for a frontend shared across multiple models Sep 11, 2025

github-actions bot added the docs label Sep 11, 2025

added example for a frontend shared across multiple models

f01958f

coderabbitai bot reviewed Sep 11, 2025

View reviewed changes

hutm force-pushed the mkhadkevich/addSeparateFrontEndExample branch from c0e89f5 to f01958f Compare September 11, 2025 03:00

hutm and others added 5 commits September 15, 2025 19:09

Update examples/basics/kubernetes/shared_frontend/shared_frontend.yaml

4089609

Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com> Signed-off-by: Maksim Khadkevich <[email protected]>

Update examples/basics/kubernetes/shared_frontend/shared_frontend.yaml

95c87d1

Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com> Signed-off-by: Maksim Khadkevich <[email protected]>

Update examples/basics/kubernetes/shared_frontend/README.md

b7f3377

Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com> Signed-off-by: Maksim Khadkevich <[email protected]>

Update examples/basics/kubernetes/shared_frontend/shared_frontend.yaml

4ca674e

Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com> Signed-off-by: Maksim Khadkevich <[email protected]>

Update examples/basics/kubernetes/shared_frontend/shared_frontend.yaml

ef108af

Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com> Signed-off-by: Maksim Khadkevich <[email protected]>

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

docs: added example for a frontend shared across multiple models #3008

docs: added example for a frontend shared across multiple models #3008

hutm commented Sep 11, 2025 •

edited by coderabbitai bot

Loading

Uh oh!

copy-pr-bot bot commented Sep 11, 2025

Uh oh!

coderabbitai bot commented Sep 11, 2025 •

edited

Loading

Uh oh!

coderabbitai bot left a comment

Uh oh!

coderabbitai bot Sep 11, 2025

Uh oh!

coderabbitai bot Sep 11, 2025

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

docs: added example for a frontend shared across multiple models #3008

Are you sure you want to change the base?

docs: added example for a frontend shared across multiple models #3008

Conversation

hutm commented Sep 11, 2025 • edited by coderabbitai bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Overview:

Details:

Where should the reviewer start?

Summary by CodeRabbit

Uh oh!

copy-pr-bot bot commented Sep 11, 2025

Uh oh!

coderabbitai bot commented Sep 11, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Sequence Diagram(s)

Estimated code review effort

Possibly related PRs

Poem

Pre-merge checks (2 passed, 1 inconclusive)

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Sep 11, 2025

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Sep 11, 2025

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

hutm commented Sep 11, 2025 •

edited by coderabbitai bot

Loading

coderabbitai bot commented Sep 11, 2025 •

edited

Loading