Skip to content

Conversation

hutm
Copy link
Contributor

@hutm hutm commented Sep 11, 2025

Overview:

added example for a frontend shared across multiple models

Details:

added example for a frontend shared across multiple models

Where should the reviewer start?

review all the files

Summary by CodeRabbit

  • New Features
    • Added a Kubernetes example for a Shared Frontend that serves multiple models with a shared model cache.
    • Provides manifests to deploy the stack and expose endpoints for listing models (/v1/models) and chat completions.
  • Documentation
    • New README with end-to-end deployment steps: install chart, create access token secret, apply manifests, port-forward, and test requests.
    • Includes guidance for verifying model availability and sample payloads, plus a reference for benchmarking.

Copy link

copy-pr-bot bot commented Sep 11, 2025

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@hutm hutm changed the title added example for a frontend shared across multiple models [docs] added example for a frontend shared across multiple models Sep 11, 2025
@hutm hutm changed the title [docs] added example for a frontend shared across multiple models docs: added example for a frontend shared across multiple models Sep 11, 2025
@github-actions github-actions bot added the docs label Sep 11, 2025
Copy link
Contributor

coderabbitai bot commented Sep 11, 2025

Walkthrough

Adds a new Kubernetes example for a shared Dynamo frontend. Introduces a README with deployment steps and a manifest defining a PVC, a frontend deployment, a vLLM aggregation worker, and an agg-qwen stack (encode, VLM, processor), all using a shared HF cache and token secret.

Changes

Cohort / File(s) Summary
Docs: Kubernetes shared frontend README
examples/basics/kubernetes/shared_frontend/README.md
New README describing deployment to the dynamo namespace, Helm install, HF token secret, applying shared_frontend.yaml, port-forwarding on 8000, listing /v1/models, sample chat completion, and GenAI-Perf reference.
Kubernetes manifests: shared frontend stack
examples/basics/kubernetes/shared_frontend/shared_frontend.yaml
New manifest adding: PVC dynamo-model-cache (100Gi); DynamoGraphDeployment frontend (namespace: dynamo); DynamoGraphDeployment vllm-agg (VllmDecodeWorker, GPU 1, HF cache/token); DynamoGraphDeployment agg-qwen (EncodeWorker, VLMWorker prefill, Processor) with shared PVC mounts and command entries.

Sequence Diagram(s)

sequenceDiagram
    autonumber
    actor User
    participant Frontend as Frontend (dynamo)
    participant vLLMAgg as VllmDecodeWorker (vllm-agg)
    participant AggQwen as agg-qwen Services
    participant HFCache as Shared PVC (/root/.cache/huggingface)
    participant HF as Hugging Face Hub

    User->>Frontend: HTTP request (/v1/*)
    alt Text decode
        Frontend->>vLLMAgg: Generate/Decode request
        vLLMAgg-->>HFCache: Read/Write model weights
        HFCache-->>HF: Fetch missing weights (via HF token)
        vLLMAgg-->>Frontend: Tokens/Result
    else Multimodal pipeline
        Frontend->>AggQwen: Encode request
        AggQwen->>AggQwen: EncodeWorker → VLMWorker(prefill) → Processor
        AggQwen-->>HFCache: Read/Write model weights
        HFCache-->>HF: Fetch missing weights (via HF token)
        AggQwen-->>Frontend: Pipeline result
    end
    Frontend-->>User: Response
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

Poem

A rabbit twitches whiskers, keen,
New pods arise in namespaces clean—
One cache to share, the models hum,
Frontend routes and tokens come.
VLLM sings, Qwen joins the thread,
Burrows of YAML, neatly spread.
Hop! The cluster’s green lights led.

Tip

👮 Agentic pre-merge checks are now available in preview!

Pro plan users can now enable pre-merge checks in their settings to enforce checklists before merging PRs.

  • Built-in checks – Quickly apply ready-made checks to enforce title conventions, require pull request descriptions that follow templates, validate linked issues for compliance, and more.
  • Custom agentic checks – Define your own rules using CodeRabbit’s advanced agentic capabilities to enforce organization-specific policies and workflows. For example, you can instruct CodeRabbit’s agent to verify that API documentation is updated whenever API schema files are modified in a PR. Note: Upto 5 custom checks are currently allowed during the preview period. Pricing for this feature will be announced in a few weeks.

Please see the documentation for more information.

Example:

reviews:
  pre_merge_checks:
    custom_checks:
      - name: "Undocumented Breaking Changes"
        mode: "warning"
        instructions: |
          Pass/fail criteria: All breaking changes to public APIs, CLI flags, environment variables, configuration keys, database schemas, or HTTP/GraphQL endpoints must be documented in the "Breaking Change" section of the PR description and in CHANGELOG.md. Exclude purely internal or private changes (e.g., code not exported from package entry points or explicitly marked as internal).

Please share your feedback with us on this Discord post.

Pre-merge checks (2 passed, 1 inconclusive)

❌ Failed checks (1 inconclusive)
Check name Status Explanation Resolution
Description Check ❓ Inconclusive The description includes the template headings (Overview, Details, Where should the reviewer start?) but the content is minimal and largely repetitive, offering no file-level guidance, summary of key changes, or testing instructions. The "Where should the reviewer start?" entry only says "review all the files," which does not meet the template's intent to call out specific files or risk areas. Because important details required by the template are missing, the description is insufficient to confidently assess the PR. Please expand Details to briefly summarize the added files and key changes (for example, examples/basics/kubernetes/shared_frontend/README.md and shared_frontend.yaml) and note important resources such as the new PVC and DynamoGraphDeployment entries and any manual test steps; replace "review all the files" with explicit starting points and callouts (specific files, sections, or commands to run) and add a Related Issues line if applicable. After those additions the description will meet the repository template and can be marked as pass.
✅ Passed checks (2 passed)
Check name Status Explanation
Docstring Coverage ✅ Passed No functions found in the changes. Docstring coverage check skipped.
Title Check ✅ Passed The title accurately and concisely summarizes the primary change: adding an example for a frontend shared across multiple models (documentation plus Kubernetes manifests). It directly matches the added README and shared_frontend.yaml in the changeset and is clear for a teammate scanning PR history. The phrasing is specific and avoids unnecessary noise.

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 7

🧹 Nitpick comments (4)
examples/basics/kubernetes/shared_frontend/shared_frontend.yaml (2)

39-41: HF token secret must exist in each runtime namespace.

envFromSecret assumes hf-token-secret resides in the pod namespace. Ensure the secret is created in: vllm-agg, agg-qwen (and dynamo if frontend needs it). Update README accordingly. I can provide a patch.

Also applies to: 68-70, 89-91, 110-112


43-46: Add CPU/memory requests for reliable scheduling.

Only GPU limits are set. Add cpu/memory requests (and limits as needed). Example (tune values):

       resources:
-        limits:
-          gpu: "1"
+        requests:
+          cpu: "2"
+          memory: "8Gi"
+        limits:
+          gpu: "1"
+          memory: "16Gi"

Also applies to: 72-75, 94-96, 115-116

examples/basics/kubernetes/shared_frontend/README.md (2)

18-21: Port-forward: pin to the frontend namespace.

Service name may vary by operator, but namespace should be dynamo.

-kubectl port-forward svc/frontend-frontend 8000:8000 -n ${NAMESPACE}
+kubectl port-forward svc/frontend-frontend 8000:8000 -n dynamo

1-8: Trailing whitespace fixed by pre-commit—keep it clean.

Pre-merge hook reported trailing whitespace on these blocks; ensure editors trim on save to avoid CI churn.

Also applies to: 20-22

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 2eced09 and c0e89f5.

📒 Files selected for processing (2)
  • examples/basics/kubernetes/shared_frontend/README.md (1 hunks)
  • examples/basics/kubernetes/shared_frontend/shared_frontend.yaml (1 hunks)
🧰 Additional context used
🪛 GitHub Actions: Pre Merge Validation of (ai-dynamo/dynamo/refs/pull/3008/merge) by hutm.
examples/basics/kubernetes/shared_frontend/README.md

[error] 1-8: Trailing whitespace detected by pre-commit (hook: trailing-whitespace). The hook fixed issues in this file.


[error] 20-22: Trailing whitespace detected by pre-commit (hook: trailing-whitespace) on code fence line. The hook fixed issues.

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Build and Test - dynamo
🔇 Additional comments (2)
examples/basics/kubernetes/shared_frontend/shared_frontend.yaml (2)

43-46: No change required — CRD expects resources.limits.gpu
DynamoGraphDeployment CRD uses plain "gpu" under resources.limits (not "nvidia.com/gpu"); the YAML snippet is correct.


16-18: ```shell
#!/bin/bash
set -euo pipefail

echo "PWD: $(pwd)"
TARGET="examples/basics/kubernetes/shared_frontend/shared_frontend.yaml"

echo
if [ -f "$TARGET" ]; then
echo "---- $TARGET (full) ----"
sed -n '1,200p' "$TARGET"
else
echo "FILE NOT FOUND: $TARGET"
fi

echo
echo "---- Search for 'dynamoNamespace' ----"
rg -n --hidden -S "dynamoNamespace" || true

echo
echo "---- Search for 'dynamo' (word) ----"
rg -n --hidden -S "\bdynamo\b" || true

echo
echo "---- YAML files under examples/basics/kubernetes ----"
find examples/basics/kubernetes -maxdepth 4 -type f -name "*.yaml" -print | sed -n '1,200p' || true

echo
echo "---- Search for 'namespace:' occurrences in examples (show context) ----"
rg -n -C2 --hidden -S "namespace:" examples || true

echo
echo "---- Show 'metadata:' blocks in that examples path (context) ----"
rg -n -C2 --hidden -S "metadata:" examples/basics/kubernetes || true


</blockquote></details>

</blockquote></details>

</details>

<!-- This is an auto-generated comment by CodeRabbit for review status -->

Comment on lines 2 to 6
This folder contains kubernetes manifests to deploy Dynamo frontend component as a standalone DynamoGraphDeploymen (DGD)
and two models.
Frontend is shared across the two models. Frontend is deployed to dynamo namespace `dynamo`, which is a reserved dynamo
namespace name for frontend to observe deployed models across all dynamo namespaces.
A shared PVC is configured to store model checkpoint weights fetched from HF.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

⚠️ Potential issue

Fix typos and clarify DGD/namespace behavior.

  • “DynamoGraphDeploymen” → “DynamoGraphDeployment”.
  • Clarify that the CR schedules workloads to multiple namespaces and that secrets/PVCs must exist there.
-This folder contains kubernetes manifests to deploy Dynamo frontend component as a standalone DynamoGraphDeploymen (DGD)
-and two models.
-Frontend is shared across the two models. Frontend is deployed to  dynamo namespace `dynamo`, which is a reserved dynamo 
-namespace name for frontend to observe deployed models across all dynamo namespaces.  
+This folder contains Kubernetes manifests to deploy the Dynamo frontend as a standalone DynamoGraphDeployment (DGD)
+and two models.
+The frontend is shared across the two models. It is deployed into the reserved Dynamo namespace `dynamo`, which allows
+the frontend to discover models across all Dynamo namespaces.
+A shared PVC stores model checkpoint weights fetched from Hugging Face.
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
This folder contains kubernetes manifests to deploy Dynamo frontend component as a standalone DynamoGraphDeploymen (DGD)
and two models.
Frontend is shared across the two models. Frontend is deployed to dynamo namespace `dynamo`, which is a reserved dynamo
namespace name for frontend to observe deployed models across all dynamo namespaces.
A shared PVC is configured to store model checkpoint weights fetched from HF.
This folder contains Kubernetes manifests to deploy the Dynamo frontend as a standalone DynamoGraphDeployment (DGD)
and two models.
The frontend is shared across the two models. It is deployed into the reserved Dynamo namespace `dynamo`, which allows
the frontend to discover models across all Dynamo namespaces.
A shared PVC stores model checkpoint weights fetched from Hugging Face.
🤖 Prompt for AI Agents
In examples/basics/kubernetes/shared_frontend/README.md around lines 2 to 6, fix
typos and clarify CR/namespace behavior: change “DynamoGraphDeploymen” to
“DynamoGraphDeployment”, correct spacing and punctuation, and update the
description to state that the DGD CR schedules workloads into multiple target
namespaces (not just deploys the frontend), and that any required secrets and
PVCs must be created in each target namespace where models run; also note the
frontend itself is deployed into the reserved dynamo namespace `dynamo` which
watches those namespaces.

Comment on lines +8 to +16
1. Install Dynamo k8s platform helm chart
2. Create a K8S secret with your Huggingface token and then render k8s manifests
```sh
export HF_TOKEN=YOUR_HF_TOKEN
kubectl create secret generic hf-token-secret \
--from-literal=HF_TOKEN=${HF_TOKEN} \
--namespace ${NAMESPACE}
kubectl apply -f shared_frontend.yaml --namespace ${NAMESPACE}
```
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Secret creation: create in each runtime namespace; avoid apply --namespace.

The pods run in namespaces: dynamo, vllm-agg, agg-qwen. Create the secret in each, and apply the manifest without overriding namespaces.

-2. Create a K8S secret with your Huggingface token and then render k8s manifests
+2. Create a K8S secret with your Hugging Face token in each runtime namespace, then apply the manifests
 ```sh
 export HF_TOKEN=YOUR_HF_TOKEN
-kubectl create secret generic hf-token-secret \
-    --from-literal=HF_TOKEN=${HF_TOKEN} \
-    --namespace ${NAMESPACE}
-kubectl apply -f shared_frontend.yaml --namespace ${NAMESPACE}
+for ns in dynamo vllm-agg agg-qwen; do
+  kubectl create namespace "$ns" --dry-run=client -o yaml | kubectl apply -f -
+  kubectl -n "$ns" create secret generic hf-token-secret \
+    --from-literal=HF_TOKEN="${HF_TOKEN}" --dry-run=client -o yaml | kubectl apply -f -
+done
+kubectl apply -f shared_frontend.yaml
🤖 Prompt for AI Agents
In examples/basics/kubernetes/shared_frontend/README.md around lines 8 to 16,
the secret creation instructions currently create the HF_TOKEN secret only in a
single namespace and apply the manifest with an overridden namespace; update
instructions to create the hf-token-secret in each runtime namespace (dynamo,
vllm-agg, agg-qwen) and ensure those namespaces exist (create them if missing)
using kubectl dry-run + apply, then create the secret in each namespace via
kubectl -n <ns> create secret ... --from-literal=HF_TOKEN="$HF_TOKEN"
--dry-run=client -o yaml | kubectl apply -f -, and finally apply the
shared_frontend.yaml without passing --namespace so the manifests’ own
namespaces are respected.

@hutm hutm force-pushed the mkhadkevich/addSeparateFrontEndExample branch from c0e89f5 to f01958f Compare September 11, 2025 03:00
hutm and others added 5 commits September 15, 2025 19:09
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
Signed-off-by: Maksim Khadkevich <[email protected]>
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
Signed-off-by: Maksim Khadkevich <[email protected]>
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
Signed-off-by: Maksim Khadkevich <[email protected]>
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
Signed-off-by: Maksim Khadkevich <[email protected]>
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
Signed-off-by: Maksim Khadkevich <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant