-
Notifications
You must be signed in to change notification settings - Fork 593
docs: added example for a frontend shared across multiple models #3008
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
WalkthroughAdds a new Kubernetes example for a shared Dynamo frontend. Introduces a README with deployment steps and a manifest defining a PVC, a frontend deployment, a vLLM aggregation worker, and an agg-qwen stack (encode, VLM, processor), all using a shared HF cache and token secret. Changes
Sequence Diagram(s)sequenceDiagram
autonumber
actor User
participant Frontend as Frontend (dynamo)
participant vLLMAgg as VllmDecodeWorker (vllm-agg)
participant AggQwen as agg-qwen Services
participant HFCache as Shared PVC (/root/.cache/huggingface)
participant HF as Hugging Face Hub
User->>Frontend: HTTP request (/v1/*)
alt Text decode
Frontend->>vLLMAgg: Generate/Decode request
vLLMAgg-->>HFCache: Read/Write model weights
HFCache-->>HF: Fetch missing weights (via HF token)
vLLMAgg-->>Frontend: Tokens/Result
else Multimodal pipeline
Frontend->>AggQwen: Encode request
AggQwen->>AggQwen: EncodeWorker → VLMWorker(prefill) → Processor
AggQwen-->>HFCache: Read/Write model weights
HFCache-->>HF: Fetch missing weights (via HF token)
AggQwen-->>Frontend: Pipeline result
end
Frontend-->>User: Response
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Possibly related PRs
Poem
Tip 👮 Agentic pre-merge checks are now available in preview!Pro plan users can now enable pre-merge checks in their settings to enforce checklists before merging PRs.
Please see the documentation for more information. Example: reviews:
pre_merge_checks:
custom_checks:
- name: "Undocumented Breaking Changes"
mode: "warning"
instructions: |
Pass/fail criteria: All breaking changes to public APIs, CLI flags, environment variables, configuration keys, database schemas, or HTTP/GraphQL endpoints must be documented in the "Breaking Change" section of the PR description and in CHANGELOG.md. Exclude purely internal or private changes (e.g., code not exported from package entry points or explicitly marked as internal). Please share your feedback with us on this Discord post. Pre-merge checks (2 passed, 1 inconclusive)❌ Failed checks (1 inconclusive)
✅ Passed checks (2 passed)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 7
🧹 Nitpick comments (4)
examples/basics/kubernetes/shared_frontend/shared_frontend.yaml (2)
39-41
: HF token secret must exist in each runtime namespace.envFromSecret assumes hf-token-secret resides in the pod namespace. Ensure the secret is created in: vllm-agg, agg-qwen (and dynamo if frontend needs it). Update README accordingly. I can provide a patch.
Also applies to: 68-70, 89-91, 110-112
43-46
: Add CPU/memory requests for reliable scheduling.Only GPU limits are set. Add cpu/memory requests (and limits as needed). Example (tune values):
resources: - limits: - gpu: "1" + requests: + cpu: "2" + memory: "8Gi" + limits: + gpu: "1" + memory: "16Gi"Also applies to: 72-75, 94-96, 115-116
examples/basics/kubernetes/shared_frontend/README.md (2)
18-21
: Port-forward: pin to the frontend namespace.Service name may vary by operator, but namespace should be
dynamo
.-kubectl port-forward svc/frontend-frontend 8000:8000 -n ${NAMESPACE} +kubectl port-forward svc/frontend-frontend 8000:8000 -n dynamo
1-8
: Trailing whitespace fixed by pre-commit—keep it clean.Pre-merge hook reported trailing whitespace on these blocks; ensure editors trim on save to avoid CI churn.
Also applies to: 20-22
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
examples/basics/kubernetes/shared_frontend/README.md
(1 hunks)examples/basics/kubernetes/shared_frontend/shared_frontend.yaml
(1 hunks)
🧰 Additional context used
🪛 GitHub Actions: Pre Merge Validation of (ai-dynamo/dynamo/refs/pull/3008/merge) by hutm.
examples/basics/kubernetes/shared_frontend/README.md
[error] 1-8: Trailing whitespace detected by pre-commit (hook: trailing-whitespace). The hook fixed issues in this file.
[error] 20-22: Trailing whitespace detected by pre-commit (hook: trailing-whitespace) on code fence line. The hook fixed issues.
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Build and Test - dynamo
🔇 Additional comments (2)
examples/basics/kubernetes/shared_frontend/shared_frontend.yaml (2)
43-46
: No change required — CRD expects resources.limits.gpu
DynamoGraphDeployment CRD uses plain "gpu" under resources.limits (not "nvidia.com/gpu"); the YAML snippet is correct.
16-18
: ```shell
#!/bin/bash
set -euo pipefailecho "PWD: $(pwd)"
TARGET="examples/basics/kubernetes/shared_frontend/shared_frontend.yaml"echo
if [ -f "$TARGET" ]; then
echo "---- $TARGET (full) ----"
sed -n '1,200p' "$TARGET"
else
echo "FILE NOT FOUND: $TARGET"
fiecho
echo "---- Search for 'dynamoNamespace' ----"
rg -n --hidden -S "dynamoNamespace" || trueecho
echo "---- Search for 'dynamo' (word) ----"
rg -n --hidden -S "\bdynamo\b" || trueecho
echo "---- YAML files under examples/basics/kubernetes ----"
find examples/basics/kubernetes -maxdepth 4 -type f -name "*.yaml" -print | sed -n '1,200p' || trueecho
echo "---- Search for 'namespace:' occurrences in examples (show context) ----"
rg -n -C2 --hidden -S "namespace:" examples || trueecho
echo "---- Show 'metadata:' blocks in that examples path (context) ----"
rg -n -C2 --hidden -S "metadata:" examples/basics/kubernetes || true</blockquote></details> </blockquote></details> </details> <!-- This is an auto-generated comment by CodeRabbit for review status -->
This folder contains kubernetes manifests to deploy Dynamo frontend component as a standalone DynamoGraphDeploymen (DGD) | ||
and two models. | ||
Frontend is shared across the two models. Frontend is deployed to dynamo namespace `dynamo`, which is a reserved dynamo | ||
namespace name for frontend to observe deployed models across all dynamo namespaces. | ||
A shared PVC is configured to store model checkpoint weights fetched from HF. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
Fix typos and clarify DGD/namespace behavior.
- “DynamoGraphDeploymen” → “DynamoGraphDeployment”.
- Clarify that the CR schedules workloads to multiple namespaces and that secrets/PVCs must exist there.
-This folder contains kubernetes manifests to deploy Dynamo frontend component as a standalone DynamoGraphDeploymen (DGD)
-and two models.
-Frontend is shared across the two models. Frontend is deployed to dynamo namespace `dynamo`, which is a reserved dynamo
-namespace name for frontend to observe deployed models across all dynamo namespaces.
+This folder contains Kubernetes manifests to deploy the Dynamo frontend as a standalone DynamoGraphDeployment (DGD)
+and two models.
+The frontend is shared across the two models. It is deployed into the reserved Dynamo namespace `dynamo`, which allows
+the frontend to discover models across all Dynamo namespaces.
+A shared PVC stores model checkpoint weights fetched from Hugging Face.
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
This folder contains kubernetes manifests to deploy Dynamo frontend component as a standalone DynamoGraphDeploymen (DGD) | |
and two models. | |
Frontend is shared across the two models. Frontend is deployed to dynamo namespace `dynamo`, which is a reserved dynamo | |
namespace name for frontend to observe deployed models across all dynamo namespaces. | |
A shared PVC is configured to store model checkpoint weights fetched from HF. | |
This folder contains Kubernetes manifests to deploy the Dynamo frontend as a standalone DynamoGraphDeployment (DGD) | |
and two models. | |
The frontend is shared across the two models. It is deployed into the reserved Dynamo namespace `dynamo`, which allows | |
the frontend to discover models across all Dynamo namespaces. | |
A shared PVC stores model checkpoint weights fetched from Hugging Face. |
🤖 Prompt for AI Agents
In examples/basics/kubernetes/shared_frontend/README.md around lines 2 to 6, fix
typos and clarify CR/namespace behavior: change “DynamoGraphDeploymen” to
“DynamoGraphDeployment”, correct spacing and punctuation, and update the
description to state that the DGD CR schedules workloads into multiple target
namespaces (not just deploys the frontend), and that any required secrets and
PVCs must be created in each target namespace where models run; also note the
frontend itself is deployed into the reserved dynamo namespace `dynamo` which
watches those namespaces.
1. Install Dynamo k8s platform helm chart | ||
2. Create a K8S secret with your Huggingface token and then render k8s manifests | ||
```sh | ||
export HF_TOKEN=YOUR_HF_TOKEN | ||
kubectl create secret generic hf-token-secret \ | ||
--from-literal=HF_TOKEN=${HF_TOKEN} \ | ||
--namespace ${NAMESPACE} | ||
kubectl apply -f shared_frontend.yaml --namespace ${NAMESPACE} | ||
``` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Secret creation: create in each runtime namespace; avoid apply --namespace.
The pods run in namespaces: dynamo, vllm-agg, agg-qwen. Create the secret in each, and apply the manifest without overriding namespaces.
-2. Create a K8S secret with your Huggingface token and then render k8s manifests
+2. Create a K8S secret with your Hugging Face token in each runtime namespace, then apply the manifests
```sh
export HF_TOKEN=YOUR_HF_TOKEN
-kubectl create secret generic hf-token-secret \
- --from-literal=HF_TOKEN=${HF_TOKEN} \
- --namespace ${NAMESPACE}
-kubectl apply -f shared_frontend.yaml --namespace ${NAMESPACE}
+for ns in dynamo vllm-agg agg-qwen; do
+ kubectl create namespace "$ns" --dry-run=client -o yaml | kubectl apply -f -
+ kubectl -n "$ns" create secret generic hf-token-secret \
+ --from-literal=HF_TOKEN="${HF_TOKEN}" --dry-run=client -o yaml | kubectl apply -f -
+done
+kubectl apply -f shared_frontend.yaml
🤖 Prompt for AI Agents
In examples/basics/kubernetes/shared_frontend/README.md around lines 8 to 16,
the secret creation instructions currently create the HF_TOKEN secret only in a
single namespace and apply the manifest with an overridden namespace; update
instructions to create the hf-token-secret in each runtime namespace (dynamo,
vllm-agg, agg-qwen) and ensure those namespaces exist (create them if missing)
using kubectl dry-run + apply, then create the secret in each namespace via
kubectl -n <ns> create secret ... --from-literal=HF_TOKEN="$HF_TOKEN"
--dry-run=client -o yaml | kubectl apply -f -, and finally apply the
shared_frontend.yaml without passing --namespace so the manifests’ own
namespaces are respected.
examples/basics/kubernetes/shared_frontend/shared_frontend.yaml
Outdated
Show resolved
Hide resolved
examples/basics/kubernetes/shared_frontend/shared_frontend.yaml
Outdated
Show resolved
Hide resolved
examples/basics/kubernetes/shared_frontend/shared_frontend.yaml
Outdated
Show resolved
Hide resolved
examples/basics/kubernetes/shared_frontend/shared_frontend.yaml
Outdated
Show resolved
Hide resolved
c0e89f5
to
f01958f
Compare
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com> Signed-off-by: Maksim Khadkevich <[email protected]>
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com> Signed-off-by: Maksim Khadkevich <[email protected]>
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com> Signed-off-by: Maksim Khadkevich <[email protected]>
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com> Signed-off-by: Maksim Khadkevich <[email protected]>
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com> Signed-off-by: Maksim Khadkevich <[email protected]>
Overview:
added example for a frontend shared across multiple models
Details:
added example for a frontend shared across multiple models
Where should the reviewer start?
review all the files
Summary by CodeRabbit