Welcome to the LLM Dynamic Model Routing quickstart! Use this to quickly deploy a semantic router with LoRA adapters and a fallback base model using LiteLLM and vLLM.
This quickstart demonstrates how to build a cost-efficient and scalable LLM deployment by combining semantic routing, LoRA adapters, and vLLM on Red Hat OpenShift AI. Incoming user requests are processed through LiteLLM, semantically evaluated, and routed to the most appropriate LoRA adapter or the default base model to ensure more precise and context-aware responses without hosting multiple full models.
The architecture integrates several components for efficient request handling and accurate responses. Open WebUI gives users an intuitive interface to the system, while LiteLLM acts as a proxy and uses the semantic router to determine the most suitable destination: either the base model or a specialized LoRA adapter. Based on this decision, LiteLLM forwards the request to vLLM for inference.
- Creating cost-effective specialized AI solutions with LoRA adapters on Red Hat OpenShift AI
- https://le.qun.ch/en/blog/2023/09/11/multi-lora-potentials/
- https://le.qun.ch/en/blog/2023/05/13/transformer-batching/
This demo is designed to run a model using GPU acceleration. The following hardware resources are required:
- CPU: 1 vCPU
- Memory: 4 GiB
- GPU: 1 NVIDIA GPU (e.g., A10, A100, L40S, or similar)
- Red Hat OpenShift
- Red Hat OpenShift AI
- Standard user. No elevated cluster permissions required.
Let’s dive into the technical aspects of building this setup with a practical example. We use the Phi-2 LLM as the base model and two LoRA adapters: Phi2-Doctor28e and phi-2-dcot.
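To illustrate what "one base model plus adapters" means at the vLLM level, here is a minimal sketch using vLLM's offline Python API; the quickstart itself deploys vLLM as a server, so this is only an illustration. The `microsoft/phi-2` checkpoint is the assumed base model, and the adapter path is a placeholder you would replace with the downloaded Phi2-Doctor28e weights.

```python
# A minimal sketch of serving one base model with LoRA adapters in vLLM,
# using the offline Python API (the quickstart deploys vLLM as a server).
# The adapter path below is a placeholder for the downloaded adapter weights.
from vllm import LLM, SamplingParams
from vllm.lora.request import LoRARequest

llm = LLM(model="microsoft/phi-2", enable_lora=True)
params = SamplingParams(max_tokens=128)

# Without a LoRARequest, the shared Phi-2 base weights answer the prompt.
base = llm.generate(["Summarize what a LoRA adapter is."], params)

# With a LoRARequest, the same engine applies the adapter's low-rank deltas.
doctor = llm.generate(
    ["Is there medication for my symptoms?"],
    params,
    lora_request=LoRARequest("doctor", 1, "/path/to/Phi2-Doctor28e"),  # placeholder
)
print(doctor[0].outputs[0].text)
```

Because each adapter only contributes small low-rank weight deltas, the GPU footprint stays roughly that of a single Phi-2 deployment rather than one full model per specialization.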
This repository packages all the required components as a Helm chart, so the whole stack can be deployed with a single `helm install` command.
```bash
git clone https://github.com/rh-ai-quickstart/dynamic-model-router
cd dynamic-model-router/chart

PROJECT="dynamic-model-router-demo"
oc new-project ${PROJECT}

helm install dynamic-model-router . --namespace ${PROJECT}
```
After deployment, verify that the setup works as expected by logging into Open WebUI and sending various queries. By checking the LiteLLM logs, you can ensure that requests are correctly routed to either the base model or the relevant adapter.
For instance, in a clinical healthcare scenario, if you ask, “Is there medication for my symptoms?” the query will be routed to the phi-2-doctor adapter. You can then verify the routing by checking the LiteLLM logs, as shown below.
Please keep in mind that while the model may provide useful information, all output should be reviewed for suitability, and it is essential to consult a professional for personalized advice.
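You can also send the same query without Open WebUI by pointing any OpenAI-compatible client at the LiteLLM proxy. The sketch below assumes the proxy has been made reachable at http://localhost:4000 (for example with `oc port-forward`), that a placeholder API key is accepted, and that requests target the `phi2` alias; check the chart's LiteLLM configuration for the actual service name, port, and keys.

```python
# A minimal sketch of sending a routed request, assuming the LiteLLM proxy is
# reachable at http://localhost:4000 (e.g. via `oc port-forward`) and that the
# base model is exposed under the alias "phi2" -- adjust for your deployment.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:4000/v1", api_key="placeholder-key")

# The client targets the base alias; the proxy's semantic router may rewrite
# the model to the "doctor" or "dcot" LoRA adapter before calling vLLM.
response = client.chat.completions.create(
    model="phi2",
    messages=[{"role": "user", "content": "Is there medication for my symptoms?"}],
)
print(response.choices[0].message.content)
```

The routing decision happens inside the proxy, so the client never needs to know which adapter ultimately serves the request.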
To validate the connection between Open WebUI and the LiteLLM proxy, click the model selector in the top left; you should see the phi2, dcot, and doctor models listed. To confirm that the proxy is routing requests, watch the LiteLLM pod logs as you ask questions; you should see output similar to the following.
```
LiteLLM: Proxy initialized with Config, Set models:
dcot
doctor
phi2
INFO: 10.131.166.110:55626 - "GET /models HTTP/1.1" 200 OK
INFO: 10.131.166.110:52568 - "GET /models HTTP/1.1" 200 OK
[RouteChoice(name='dcot', function_call=None, similarity_score=0.6218189497571418), RouteChoice(name='doctor', function_call=None, similarity_score=0.6802293320602901)]
doctor
INFO: 10.131.166.110:52576 - "POST /chat/completions HTTP/1.1" 200 OK
```
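If you prefer to check from a client rather than the UI, the following sketch (same proxy URL and key assumptions as above) lists the models registered with the proxy, which should match the `GET /models` entries in the log.

```python
# Same URL/key assumptions as above: list the models registered with the proxy.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:4000/v1", api_key="placeholder-key")
print(sorted(m.id for m in client.models.list()))  # expected: ['dcot', 'doctor', 'phi2']
```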
The semantic router is invoked by a LiteLLM pre-invoke function and runs before the call to the actual LLM endpoint is made. This function uses the semantic-router framework to decide which model the request should be sent to.
The code is located in `litellm-config/custom_router.py` if you'd like to make changes.
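For orientation, here is a rough sketch of the kind of decision such a hook makes, built with the semantic-router library's `Route` and `RouteLayer` classes. The utterances shown are invented for illustration and class names vary between library versions; the actual routes used by the demo live in `litellm-config/custom_router.py`.

```python
# A rough sketch of a semantic routing decision (illustration only; the demo's
# real routes are defined in litellm-config/custom_router.py).
# Requires the semantic-router package with local-embedding extras installed.
from semantic_router import Route
from semantic_router.encoders import HuggingFaceEncoder
from semantic_router.layer import RouteLayer

# Example utterances per route -- these are invented for illustration.
routes = [
    Route(name="doctor", utterances=[
        "Is there medication for my symptoms?",
        "What could be causing this pain?",
    ]),
    Route(name="dcot", utterances=[
        "Think step by step and solve this puzzle.",
        "Reason carefully through this math problem.",
    ]),
]

layer = RouteLayer(encoder=HuggingFaceEncoder(), routes=routes)

choice = layer("Is there medication for my symptoms?")
# If no route is similar enough, choice.name is None and the request stays on
# the base model; otherwise LiteLLM forwards it to the matching adapter.
model = choice.name or "phi2"
print(model)
```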
```bash
helm uninstall dynamic-model-router --namespace ${PROJECT}
```