LiteLLM Kubernetes Deployment

AI model gateway

Quick Start

# 1. Clone repo
git clone https://github.com/aihpi/litellm-k8s.git
cd litellm-k8s

# 2. Create secrets (see secrets/README.md)
cp secrets/example-secrets.yaml secrets/secrets.yaml
# Edit secrets/secrets.yaml with your values

# 3. Deploy
./scripts/deploy.sh dev

# 4. Port-forward
kubectl port-forward -n litellm service/litellm-service 4000:4000

# 5. Access UI
open http://localhost:4000/ui/login/

Architecture

Internet -> nginx proxy (LoadBalancer) -> LiteLLM (ClusterIP)
                                               |
                                               v
                                         ClusterIP Services
                                               |
                                               v
                                         vLLM Model Pods

An nginx reverse proxy sits in front of LiteLLM as the external-facing LoadBalancer. It returns 403 for external requests to the /metrics endpoint, while internal Prometheus scrapers can still reach LiteLLM directly via the ClusterIP service.
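
A minimal sketch of the nginx rule that enforces this split (illustrative only; the upstream name and ports are assumptions, not the repo's actual proxy config):

```nginx
server {
    listen 80;

    # Deny external access to Prometheus metrics.
    location /metrics {
        return 403;
    }

    # Everything else proxies through to the LiteLLM ClusterIP service.
    location / {
        proxy_pass http://litellm-service.litellm.svc.cluster.local:4000;
        proxy_set_header Host $host;
    }
}
```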

Staging Namespace

Use ./scripts/deploy.sh staging to create a litellm-staging namespace without redeploying the GPU model workloads.

This staging overlay deploys LiteLLM, the KISZ Auth Wrapper, Postgres, Qdrant, and the Postgres PVC in litellm-staging, then points LiteLLM at the existing model services in litellm via cluster DNS. The staging litellm-service and kisz-auth-wrapper-service are both exposed as LoadBalancer services.

./scripts/deploy.sh staging
kubectl get svc -n litellm-staging litellm-service
kubectl get svc -n litellm-staging kisz-auth-wrapper-service
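
The cross-namespace wiring relies on ordinary cluster DNS: a model entry in the staging LiteLLM config can reference a service in the litellm namespace by its fully qualified name. A sketch of such an entry (model name, service name, and port are assumptions, not the repo's actual config):

```yaml
model_list:
  - model_name: llama-3b
    litellm_params:
      model: openai/llama-3b
      # Model service in the original namespace, reachable from litellm-staging:
      api_base: http://llama-3b-service.litellm.svc.cluster.local:8000/v1
```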

For the wrapper OIDC flow, create a temporary HTTPS hostname such as llm-portal-staging.<your-domain> that points to the wrapper LoadBalancer and set Authentik's redirect URI to:

https://llm-portal-staging.<your-domain>/callback

Adding Models

See docs/adding-models.md

Infrastructure

  • Cluster: HPI K8s (40x A30)
  • Namespace: litellm
  • GPU Scheduling: model deployments declare GPU resource requests, so the scheduler places their pods on nodes with free GPUs
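
Concretely, GPU placement works because each model deployment requests a GPU in its container spec; a hedged excerpt (the resource name assumes the standard NVIDIA device plugin):

```yaml
# Excerpt from a model Deployment's container spec: the scheduler only
# places this pod on a node advertising a free GPU via the device plugin.
resources:
  limits:
    nvidia.com/gpu: 1
```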

Maintenance

  • Logs: kubectl logs -n litellm deployment/litellm-proxy -f
  • Restart: kubectl rollout restart -n litellm deployment/litellm-proxy
  • Scale: kubectl scale -n litellm deployment/llama-3b --replicas=2

Handoff / Recent Changes

  • Added scripts:
    • scripts/call_qwen_image_edit.py (image edit via LiteLLM /v1/images/edits)
    • scripts/test_octen_embedding.py (embeddings via LiteLLM /v1/embeddings)
  • Added octen-embedding-8b to LiteLLM model list (default encoding_format: float).
  • Added models/gpt-oss-120b (deployment/service/pvc) with vLLM config mounted from models/gpt-oss-120b/configmap.yaml using GPT-OSS_EAGLE3_Hopper.yaml. (Note: model is not yet added to LiteLLM proxy config.)

Apply model resources:

kubectl apply -k models
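
The -k flag assumes models/ contains a kustomization.yaml aggregating each model's manifests; an illustrative sketch (file names are assumptions):

```yaml
# models/kustomization.yaml (illustrative)
resources:
  - gpt-oss-120b/deployment.yaml
  - gpt-oss-120b/service.yaml
  - gpt-oss-120b/pvc.yaml
  - gpt-oss-120b/configmap.yaml
```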

Calling the API (via LiteLLM)

Port-forward in dev or access via your ingress.

kubectl port-forward -n litellm service/litellm-service 4000:4000

Chat/completions (example)

curl -sS -H "Authorization: Bearer $LITELLM_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model":"llama-3b","messages":[{"role":"user","content":"Hello"}]}' \
  http://localhost:4000/v1/chat/completions
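
The same call from Python using only the standard library (a sketch against the port-forwarded endpoint above, not one of the repo's scripts):

```python
import json
import os
import urllib.request

API_BASE = "http://localhost:4000"  # via kubectl port-forward

def build_chat_payload(model: str, content: str) -> dict:
    """OpenAI-compatible single-turn chat payload, as LiteLLM expects."""
    return {"model": model, "messages": [{"role": "user", "content": content}]}

def chat(model: str, content: str) -> dict:
    """POST to LiteLLM's /v1/chat/completions and return the parsed response."""
    req = urllib.request.Request(
        f"{API_BASE}/v1/chat/completions",
        data=json.dumps(build_chat_payload(model, content)).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {os.environ['LITELLM_API_KEY']}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```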

Embeddings (octen-embedding-8b)

curl -sS -H "Authorization: Bearer $LITELLM_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model":"octen-embedding-8b","input":"Hello from octen","encoding_format":"float"}' \
  http://localhost:4000/v1/embeddings
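
A common next step with the returned float vectors is similarity scoring; a minimal cosine-similarity helper (pure Python, no external dependencies):

```python
import math

def cosine(a, b):
    """Cosine similarity of two equal-length embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

# e.g. compare two vectors from the /v1/embeddings response:
# vec_a = resp["data"][0]["embedding"]
# vec_b = resp["data"][1]["embedding"]
# score = cosine(vec_a, vec_b)
```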

Or run:

LITELLM_API_KEY=sk-... python3 scripts/test_octen_embedding.py

Image edits (qwen-image-edit)

LITELLM_API_KEY=sk-... python3 scripts/call_qwen_image_edit.py \
  --api-base http://localhost:4000 \
  --prompt "Remove the sleeves; keep fabric/lighting unchanged"

UI Login

Default credentials:

  • Username: admin
  • Password: the value of your LITELLM_MASTER_KEY

Contributors

  • Felix Boelter (@felixboelter)
