From e87efbef23cd6db25380db8a417febe1b2e85eb6 Mon Sep 17 00:00:00 2001 From: alexsin368 Date: Tue, 1 Apr 2025 18:27:50 -0700 Subject: [PATCH 01/13] clean up and simplify chatqna tutorials Signed-off-by: alexsin368 --- tutorial/ChatQnA/deploy/gaudi.md | 788 +++++++------------------------ tutorial/ChatQnA/deploy/xeon.md | 735 ++++++---------------------- tutorial/CodeGen/deploy/xeon.md | 104 +--- 3 files changed, 310 insertions(+), 1317 deletions(-) diff --git a/tutorial/ChatQnA/deploy/gaudi.md b/tutorial/ChatQnA/deploy/gaudi.md index 851355cc..dec2e63f 100644 --- a/tutorial/ChatQnA/deploy/gaudi.md +++ b/tutorial/ChatQnA/deploy/gaudi.md @@ -1,19 +1,10 @@ # Single node on-prem deployment with vLLM or TGI on Gaudi AI Accelerator -This deployment section covers single-node on-prem deployment of the ChatQnA -example with OPEA comps to deploy using vLLM or TGI service. There are several -slice-n-dice ways to enable RAG with vectordb and LLM models, but here we will -be covering one option of doing it for convenience : we will be showcasing how -to build an e2e chatQnA with Redis VectorDB and meta-llama/Meta-Llama-3-8B-Instruct model, -deployed on Intel® Gaudi® AI Accelerators. To quickly learn about OPEA in just 5 minutes -and set up the required hardware and software, please follow the instructions in the -[Getting Started](../../../getting-started/README.md) section. +This deployment section covers single-node on-prem deployment of the ChatQnA example using the vLLM or TGI LLM service. There are several ways to enable RAG with vectordb and LLM models, but this tutorial will be covering how to build an end-to-end ChatQnA pipeline with the Redis vector database and meta-llama/Meta-Llama-3-8B-Instruct model deployed on Intel® Gaudi® AI Accelerators. To quickly learn about OPEA and set up the required hardware and software, follow the instructions in the [Getting Started Guide](../../../getting-started/README.md). ## Overview -There are several ways to setup a ChatQnA use case. Here in this tutorial, we -will walk through how to enable the below list of microservices from OPEA -GenAIComps to deploy a single node vLLM or TGI megaservice solution. +The list of microservices from OPEA GenAIComps are used to deploy a single node vLLM or TGI megaservice solution for ChatQnA. 1. Data Prep 2. Embedding @@ -21,49 +12,22 @@ GenAIComps to deploy a single node vLLM or TGI megaservice solution. 4. Reranking 5. LLM with vLLM or TGI -The solution is aimed to show how to use Redis vectordb for RAG and -Meta-Llama-3-8B-Instruct model on Intel Gaudi AI Accelerator. We will go through -how to setup docker container to start a microservices and megaservice . The -solution will then utilize a sample Nike dataset which is in PDF format. Users -can then ask a question about Nike and get a chat-like response by default for -up to 1024 tokens. The solution is deployed with a UI. There are 2 modes you can -use: +The solution is aimed to show how to use Redis vectorDB for RAG and Meta-Llama-3-8B-Instruct model for LLM inference on Intel® Xeon® Scalable processors. Steps will include setting up docker containers, utilizing a sample Nike dataset in PDF format, and asking a question about Nike to get a response. There are 2 modes of UI that can be deployed: 1. Basic UI 2. Conversational UI -Conversational UI is optional, but a feature supported in this example if you -are interested to use. - -To summarize, Below is the flow of contents we will be covering in this tutorial: - -1. Prerequisites -2. 
Prepare (Building / Pulling) Docker images -3. Use case setup -4. Deploy the use case -5. Interacting with ChatQnA deployment - ## Prerequisites -The first step is to clone the GenAIExamples and GenAIComps projects. GenAIComps are -fundamental necessary components used to build the examples you find in -GenAIExamples and deploy them as microservices. Set an environment -variable for the desired release version with the **number only** -(i.e. 1.0, 1.1, etc) and checkout using the tag with that version. +The first step is to clone the [GenAIExamples](https://github.com/opea-project/GenAIExamples) GitHub repo. Set a workspace path and the desired release version with the **number only** (i.e. 1.0, 1.1, etc) and checkout that version using the tag. ```bash # Set workspace -export WORKSPACE= +export WORKSPACE= cd $WORKSPACE # Set desired release version - number only -export RELEASE_VERSION= - -# GenAIComps -git clone https://github.com/opea-project/GenAIComps.git -cd GenAIComps -git checkout tags/v${RELEASE_VERSION} -cd .. +export RELEASE_VERSION= # GenAIExamples git clone https://github.com/opea-project/GenAIExamples.git @@ -72,211 +36,27 @@ git checkout tags/v${RELEASE_VERSION} cd .. ``` -The examples utilize model weights from HuggingFace and Langchain. - -Setup your [HuggingFace](https://huggingface.co/) account and generate -[user access token](https://huggingface.co/docs/transformers.js/en/guides/private#step-1-generating-a-user-access-token). - -Setup the HuggingFace token +Set up the HuggingFace token: ```bash export HUGGINGFACEHUB_API_TOKEN="Your_Huggingface_API_Token" ``` -The example requires you to set the `host_ip` to deploy the microservices on -endpoint enabled with ports. Set the host_ip env variable -```bash -export host_ip=$(hostname -I | awk '{print $1}') -``` - -Make sure to setup Proxies if you are behind a firewall -```bash -export no_proxy=${your_no_proxy},$host_ip -export http_proxy=${your_http_proxy} -export https_proxy=${your_http_proxy} -``` - -## Prepare (Building / Pulling) Docker images - -This step will involve building/pulling relevant docker -images with step-by-step process along with sanity check in the end. For -ChatQnA, the following docker images will be needed: embedding, retriever, -rerank, LLM and dataprep. Additionally, you will need to build docker images for -ChatQnA megaservice, and UI (conversational React UI is optional). In total, -there are 8 required and 1 optional docker images. - -### Build/Pull Microservice images - -::::::{tab-set} - -:::::{tab-item} Pull -:sync: Pull - -If you decide to pull the docker containers and not build them locally, -you can proceed to the next step where all the necessary containers will -be pulled in from Docker Hub. - -::::: -:::::{tab-item} Build -:sync: Build - -Follow the steps below to build the docker images from within the `GenAIComps` folder. -**Note:** For RELEASE_VERSIONS older than 1.0, you will need to add a 'v' in front -of ${RELEASE_VERSION} to reference the correct image on Docker Hub. - -```bash -cd $WORKSPACE/GenAIComps -``` - -#### Build Dataprep Image - -```bash -docker build --no-cache -t opea/dataprep-redis:${RELEASE_VERSION} --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f comps/dataprep/redis/langchain/Dockerfile . -``` - -#### Build Embedding Image - -```bash -docker build --no-cache -t opea/embedding-tei:${RELEASE_VERSION} --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f comps/embeddings/tei/langchain/Dockerfile . 
-``` - -#### Build Retriever Image - -```bash -docker build --no-cache -t opea/retriever-redis:${RELEASE_VERSION} --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f comps/retrievers/redis/langchain/Dockerfile . -``` - -#### Build Rerank Image - -```bash -docker build --no-cache -t opea/reranking-tei:${RELEASE_VERSION} --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f comps/reranks/tei/Dockerfile . -``` - -#### Build LLM Image - -::::{tab-set} - -:::{tab-item} vllm -:sync: vllm - -Build vLLM docker image with hpu support -```bash -bash ./comps/llms/text-generation/vllm/langchain/dependency/build_docker_vllm.sh hpu -``` - -Build vLLM Microservice image -```bash -docker build --no-cache -t opea/llm-vllm:${RELEASE_VERSION} --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f comps/llms/text-generation/vllm/langchain/Dockerfile . -cd .. -``` -::: -:::{tab-item} TGI -:sync: TGI - -```bash -docker build --no-cache -t opea/llm-tgi:${RELEASE_VERSION} --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f comps/llms/text-generation/tgi/Dockerfile . -``` -::: -:::: - -### Build TEI Gaudi Image - -Since a TEI Gaudi Docker image hasn't been published, we'll need to build it from the [tei-gaudi](https://github.com/huggingface/tei-gaudi) repository. - -```bash -git clone https://github.com/huggingface/tei-gaudi -cd tei-gaudi/ -docker build --no-cache -f Dockerfile-hpu -t opea/tei-gaudi:${RELEASE_VERSION} . -cd .. -``` - - -### Build Mega Service images - -The Megaservice is a pipeline that channels data through different -microservices, each performing varied tasks. We define the different -microservices and the flow of data between them in the `chatqna.py` file, say in -this example the output of embedding microservice will be the input of retrieval -microservice which will in turn passes data to the reranking microservice and so -on. You can also add newer or remove some microservices and customize the -megaservice to suit the needs. - -Build the megaservice image for this use case - -```bash -cd $WORKSPACE/GenAIExamples/ChatQnA -``` - -```bash -docker build --no-cache -t opea/chatqna:${RELEASE_VERSION} --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f Dockerfile . -``` - -### Build Other Service images - -If you want to enable guardrails microservice in the pipeline, please use the below command instead: - -```bash -docker build --no-cache -t opea/chatqna-guardrails:${RELEASE_VERSION} --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f Dockerfile.guardrails . -``` - -### Build the UI Image - -As mentioned, you can build 2 modes of UI - -*Basic UI* - +The example requires setting the `host_ip` to "localhost" to deploy the microservices on endpoints enabled with ports. ```bash -cd $WORKSPACE/GenAIExamples/ChatQnA/ui/ -docker build --no-cache -t opea/chatqna-ui:${RELEASE_VERSION} --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f ./docker/Dockerfile . +export host_ip="localhost" ``` -*Conversation UI* -If you want a conversational experience with chatqna megaservice. - +For machines behind a firewall, set up the proxy environment variables: ```bash -cd $WORKSPACE/GenAIExamples/ChatQnA/ui/ -docker build --no-cache -t opea/chatqna-conversation-ui:${RELEASE_VERSION} --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f ./docker/Dockerfile.react . 
+export http_proxy="Your_HTTP_Proxy" +export https_proxy="Your_HTTPs_Proxy" +# Example: no_proxy="localhost, 127.0.0.1, 192.168.1.1" +export no_proxy="Your_No_Proxy",chatqna-xeon-ui-server,chatqna-xeon-backend-server,dataprep-redis-service,tei-embedding-service,retriever,tei-reranking-service,tgi-service,vllm-service ``` -### Sanity Check -Check if you have the below set of docker images before moving on to the next step: - -::::{tab-set} -:::{tab-item} vllm -:sync: vllm - -* opea/dataprep-redis:${RELEASE_VERSION} -* opea/embedding-tei:${RELEASE_VERSION} -* opea/retriever-redis:${RELEASE_VERSION} -* opea/reranking-tei:${RELEASE_VERSION} -* opea/tei-gaudi:${RELEASE_VERSION} -* opea/chatqna:${RELEASE_VERSION} or opea/chatqna-guardrails:${RELEASE_VERSION} -* opea/chatqna:${RELEASE_VERSION} -* opea/chatqna-ui:${RELEASE_VERSION} -* opea/vllm:${RELEASE_VERSION} -* opea/llm-vllm:${RELEASE_VERSION} - -::: -:::{tab-item} TGI -:sync: TGI - -* opea/dataprep-redis:${RELEASE_VERSION} -* opea/embedding-tei:${RELEASE_VERSION} -* opea/retriever-redis:${RELEASE_VERSION} -* opea/reranking-tei:${RELEASE_VERSION} -* opea/tei-gaudi:${RELEASE_VERSION} -* opea/chatqna:${RELEASE_VERSION} or opea/chatqna-guardrails:${RELEASE_VERSION} -* opea/chatqna-ui:${RELEASE_VERSION} -* opea/llm-tgi:${RELEASE_VERSION} -::: -:::: - -::::: -:::::: - ## Use Case Setup -As mentioned the use case will use the following combination of the GenAIComps -with the tools +ChatQnA will use the following GenAIComps and corresponding tools. Tools and models mentioned in the table are configurable either through environment variables in the `set_env.sh` or `compose.yaml` file. ::::{tab-set} @@ -307,13 +87,10 @@ environment variable or `compose.yaml` file. |LLM | TGI | meta-llama/Meta-Llama-3-8B-Instruct|OPEA Microservice | |UI | | NA | Gateway Service | -Tools and models mentioned in the table are configurable either through the -environment variable or `compose.yaml` file. ::: :::: -Set the necessary environment variables to setup the use case. If you want to swap -out models, modify `set_env.sh` before running. +Set the necessary environment variables to set up the use case. To swap out models, modify `set_env.sh` before running it. For example, the environment variable `LLM_MODEL_ID` can be changed to another model by specifying the HuggingFace model card ID. ```bash cd $WORKSPACE/GenAIExamples/ChatQnA/docker_compose/intel/hpu/gaudi @@ -322,9 +99,7 @@ source ./set_env.sh ## Deploy the use case -In this tutorial, we will be deploying via docker compose with the provided -YAML file. The docker compose instructions should be starting all the -above mentioned services as containers. +Run `docker compose` with the provided YAML file to start all the services mentioned above as containers. ::::{tab-set} :::{tab-item} vllm @@ -352,83 +127,81 @@ docker compose -f compose_guardrails.yaml up -d ::: :::: -### Validate microservice -#### Check Env Variables -Check the start up log by `docker compose -f compose.yaml logs`. -The warning messages print out the variables if they are **NOT** set. +### Check Env Variables +After running `docker compose`, check for warning messages for environment variables that are **NOT** set. Address them if needed. 
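+
+If needed, one optional way to reproduce these warnings later without re-running `docker compose up` is to re-render the compose file, which should report any variables that are still unset. This is only a sanity-check sketch; substitute `compose_tgi.yaml` when using the TGI deployment.
+
+```bash
+# Re-render the compose file; unset variables are reported as WARN lines
+docker compose -f compose.yaml config > /dev/null
+```
+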
::::{tab-set} :::{tab-item} vllm :sync: vllm -```bash - ubuntu@xeon-vm:~/GenAIExamples/ChatQnA/docker_compose/intel/hpu/gaudi$ docker compose -f ./compose.yaml up -d - [+] Running 12/12 - ✔ Network gaudi_default Created 0.1s - ✔ Container tei-embedding-gaudi-server Started 1.3s - ✔ Container vllm-gaudi-server Started 1.3s - ✔ Container tei-reranking-gaudi-server Started 0.8s - ✔ Container redis-vector-db Started 0.7s - ✔ Container reranking-tei-gaudi-server Started 1.7s - ✔ Container retriever-redis-server Started 1.3s - ✔ Container llm-vllm-gaudi-server Started 2.1s - ✔ Container dataprep-redis-server Started 2.1s - ✔ Container embedding-tei-server Started 2.0s - ✔ Container chatqna-gaudi-backend-server Started 2.3s - ✔ Container chatqna-gaudi-ui-server Started 2.6s -``` - + ubuntu@xeon-vm:~/GenAIExamples/ChatQnA/docker_compose/intel/hpu/gaudi$ docker compose -f compose.yaml up -d + WARN[0000] The "LANGCHAIN_API_KEY" variable is not set. Defaulting to a blank string. + WARN[0000] The "LANGCHAIN_TRACING_V2" variable is not set. Defaulting to a blank string. + WARN[0000] The "LANGCHAIN_API_KEY" variable is not set. Defaulting to a blank string. + WARN[0000] The "LANGCHAIN_TRACING_V2" variable is not set. Defaulting to a blank string. + WARN[0000] The "LANGCHAIN_API_KEY" variable is not set. Defaulting to a blank string. + WARN[0000] The "LANGCHAIN_TRACING_V2" variable is not set. Defaulting to a blank string. + WARN[0000] The "LANGCHAIN_API_KEY" variable is not set. Defaulting to a blank string. + WARN[0000] The "LANGCHAIN_TRACING_V2" variable is not set. Defaulting to a blank string. + WARN[0000] /home/ubuntu/GenAIExamples/ChatQnA/docker_compose/intel/hpu/gaudi/compose.yaml: `version` is obsolete ::: + :::{tab-item} TGI :sync: TGI -```bash - ubuntu@xeon-vm:~/GenAIExamples/ChatQnA/docker_compose/intel/hpu/gaudi$ docker compose -f ./compose.yaml up -d - [+] Running 12/12 - ✔ Network gaudi_default Created 0.1s - ✔ Container tei-reranking-gaudi-server Started 1.1s - ✔ Container tgi-gaudi-server Started 0.8s - ✔ Container redis-vector-db Started 1.5s - ✔ Container tei-embedding-gaudi-server Started 1.1s - ✔ Container retriever-redis-server Started 2.7s - ✔ Container reranking-tei-gaudi-server Started 2.0s - ✔ Container dataprep-redis-server Started 2.5s - ✔ Container embedding-tei-server Started 2.1s - ✔ Container llm-tgi-gaudi-server Started 1.8s - ✔ Container chatqna-gaudi-backend-server Started 2.9s - ✔ Container chatqna-gaudi-ui-server Started 3.3s -``` + ubuntu@xeon-vm:~/GenAIExamples/ChatQnA/docker_compose/intel/hpu/gaudi$ docker compose -f compose_tgi.yaml up -d + WARN[0000] The "LANGCHAIN_API_KEY" variable is not set. Defaulting to a blank string. + WARN[0000] The "LANGCHAIN_TRACING_V2" variable is not set. Defaulting to a blank string. + WARN[0000] The "LANGCHAIN_API_KEY" variable is not set. Defaulting to a blank string. + WARN[0000] The "LANGCHAIN_TRACING_V2" variable is not set. Defaulting to a blank string. + WARN[0000] The "LANGCHAIN_API_KEY" variable is not set. Defaulting to a blank string. + WARN[0000] The "LANGCHAIN_TRACING_V2" variable is not set. Defaulting to a blank string. + WARN[0000] The "LANGCHAIN_API_KEY" variable is not set. Defaulting to a blank string. + WARN[0000] The "LANGCHAIN_TRACING_V2" variable is not set. Defaulting to a blank string. 
+ WARN[0000] /home/ubuntu/GenAIExamples/ChatQnA/docker_compose/intel/hpu/gaudi/compose_tgi.yaml: `version` is obsolete ::: :::: -#### Check the container status +### Check container statuses + +Check if all the containers launched via `docker compose` are running i.e. each container's `STATUS` is `Up` and `Healthy`. -Check if all the containers launched via docker compose has started +Run this command to see this info: +```bash +docker ps -a +``` -For example, the ChatQnA example starts 11 docker (services), check these docker -containers are all running, i.e, all the containers `STATUS` are `Up` -To do a quick sanity check, try `docker ps -a` to see if all the containers are running. +The sample output is for OPEA release v1.2. ::::{tab-set} :::{tab-item} vllm :sync: vllm ```bash -CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES -42c8d5ec67e9 opea/chatqna-ui:${RELEASE_VERSION} "docker-entrypoint.s…" About a minute ago Up About a minute 0.0.0.0:5173->5173/tcp, :::5173->5173/tcp chatqna-gaudi-ui-server -7f7037a75f8b opea/chatqna:${RELEASE_VERSION} "python chatqna.py" About a minute ago Up About a minute 0.0.0.0:8888->8888/tcp, :::8888->8888/tcp chatqna-gaudi-backend-server -4049c181da93 opea/embedding-tei:${RELEASE_VERSION} "python embedding_te…" About a minute ago Up About a minute 0.0.0.0:6000->6000/tcp, :::6000->6000/tcp embedding-tei-server -171816f0a789 opea/dataprep-redis:${RELEASE_VERSION} "python prepare_doc_…" About a minute ago Up About a minute 0.0.0.0:6007->6007/tcp, :::6007->6007/tcp dataprep-redis-server -10ee6dec7d37 opea/llm-vllm:${RELEASE_VERSION} "bash entrypoint.sh" About a minute ago Up About a minute 0.0.0.0:9000->9000/tcp, :::9000->9000/tcp llm-vllm-gaudi-server -ce4e7802a371 opea/retriever-redis:${RELEASE_VERSION} "python retriever_re…" About a minute ago Up About a minute 0.0.0.0:7000->7000/tcp, :::7000->7000/tcp retriever-redis-server -be6cd2d0ea38 opea/reranking-tei:${RELEASE_VERSION} "python reranking_te…" About a minute ago Up About a minute 0.0.0.0:8000->8000/tcp, :::8000->8000/tcp reranking-tei-gaudi-server -cc45ff032e8c opea/tei-gaudi:${RELEASE_VERSION} "text-embeddings-rou…" About a minute ago Up About a minute 0.0.0.0:8090->80/tcp, :::8090->80/tcp tei-embedding-gaudi-server -4969ec3aea02 opea/vllm-gaudi:${RELEASE_VERSION} "/bin/bash -c 'expor…" About a minute ago Up About a minute 0.0.0.0:8007->80/tcp, :::8007->80/tcp vllm-gaudi-server -0657cb66df78 redis/redis-stack:7.2.0-v9 "/entrypoint.sh" About a minute ago Up About a minute 0.0.0.0:6379->6379/tcp, :::6379->6379/tcp, 0.0.0.0:8001->8001/tcp, :::8001->8001/tcp redis-vector-db -684d3e9d204a ghcr.io/huggingface/text-embeddings-inference:cpu-1.2 "text-embeddings-rou…" About a minute ago Up About a minute 0.0.0.0:8808->80/tcp, :::8808->80/tcp tei-reranking-gaudi-server +CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS + NAMES +eabb930edad6 opea/nginx:1.2 "/docker-entrypoint.…" 9 seconds ago Up 8 seconds 0.0.0.0:80->80/tcp, [::]:80->80/tcp + chatqna-gaudi-nginx-server +7e3c16a791b1 opea/chatqna-ui:1.2 "docker-entrypoint.s…" 9 seconds ago Up 8 seconds 0.0.0.0:5173->5173/tcp, [::]:5173->5173/tcp + chatqna-gaudi-ui-server +482365a6e945 opea/chatqna:1.2 "python chatqna.py" 9 seconds ago Up 9 seconds 0.0.0.0:8888->8888/tcp, [::]:8888->8888/tcp + chatqna-gaudi-backend-server +1379226ad3ff opea/dataprep:1.2 "sh -c 'python $( [ …" 9 seconds ago Up 9 seconds 0.0.0.0:6007->5000/tcp, [::]:6007->5000/tcp + dataprep-redis-server +1cebe2d70e40 opea/retriever:1.2 "python opea_retriev…" 9 seconds ago Up 9 seconds 
0.0.0.0:7000->7000/tcp, [::]:7000->7000/tcp + retriever-redis-server +bfe41a5353b6 ghcr.io/huggingface/tei-gaudi:1.5.0 "text-embeddings-rou…" 10 seconds ago Up 9 seconds 0.0.0.0:8808->80/tcp, [::]:8808->80/tcp + tei-reranking-gaudi-server +11a94e7ce3c9 opea/vllm-gaudi:1.2 "python3 -m vllm.ent…" 10 seconds ago Up 9 seconds (health: starting) 0.0.0.0:8007->80/tcp, [::]:8007->80/tcp + vllm-gaudi-server +4d7b9aab82b1 redis/redis-stack:7.2.0-v9 "/entrypoint.sh" 10 seconds ago Up 9 seconds 0.0.0.0:6379->6379/tcp, [::]:6379->6379/tcp, 0.0.0. +0:8001->8001/tcp, [::]:8001->8001/tcp redis-vector-db +9e0d0807bbf6 ghcr.io/huggingface/text-embeddings-inference:cpu-1.5 "text-embeddings-rou…" 10 seconds ago Up 9 seconds 0.0.0.0:8090->80/tcp, [::]:8090->80/tcp + tei-embedding-gaudi-server ``` ::: :::{tab-item} TGI :sync: TGI +TODO: update ```bash CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES 0355d705484a opea/chatqna-ui:${RELEASE_VERSION} "docker-entrypoint.s…" 2 minutes ago Up 2 minutes 0.0.0.0:5173->5173/tcp, :::5173->5173/tcp chatqna-gaudi-ui-server @@ -447,85 +220,19 @@ c59178629901 redis/redis-stack:7.2.0-v9 "/entrypo ::: :::: -## Interacting with ChatQnA deployment - -This section will walk you through what are the different ways to interact with -the microservices deployed. - -### Dataprep Microservice(Optional) - -If you want to add/update the default knowledge base, you can use the following -commands. The dataprep microservice extracts the texts from variety of data -sources, chunks the data, embeds each chunk using embedding microservice and -store the embedded vectors in the redis vector database. - -`nke-10k-2023.pdf` is Nike's annual report on a form 10-K. Run this command to get the file on a terminal: +Each docker container's log can also be checked using: ```bash -wget https://github.com/opea-project/GenAIComps/blob/v1.1/comps/retrievers/redis/data/nke-10k-2023.pdf +docker logs ``` -Upload the file: +## Validate microservices -```bash -curl -X POST "http://${host_ip}:6007/v1/dataprep" \ - -H "Content-Type: multipart/form-data" \ - -F "files=@./nke-10k-2023.pdf" -``` - -This command updates a knowledge base by uploading a local file for processing. -Update the file path according to your environment. - -Add Knowledge Base via HTTP Links: - -```bash -curl -X POST "http://${host_ip}:6007/v1/dataprep" \ - -H "Content-Type: multipart/form-data" \ - -F 'link_list=["https://opea.dev"]' -``` - -This command updates a knowledge base by submitting a list of HTTP links for processing. - -Also, you are able to get the file list that you uploaded: - -```bash -curl -X POST "http://${host_ip}:6007/v1/dataprep/get_file" \ - -H "Content-Type: application/json" - -``` - -To delete the file/link you uploaded you can use the following commands: - -#### Delete link -```bash -# The dataprep service will add a .txt postfix for link file - -curl -X POST "http://${host_ip}:6007/v1/dataprep/delete_file" \ - -d '{"file_path": "https://opea.dev.txt"}' \ - -H "Content-Type: application/json" -``` - -#### Delete file - -```bash -curl -X POST "http://${host_ip}:6007/v1/dataprep/delete_file" \ - -d '{"file_path": "nke-10k-2023.pdf"}' \ - -H "Content-Type: application/json" -``` - -#### Delete all uploaded files and links - -```bash -curl -X POST "http://${host_ip}:6007/v1/dataprep/delete_file" \ - -d '{"file_path": "all"}' \ - -H "Content-Type: application/json" -``` +This section will walk through the different ways to interact with the microservices deployed. 
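+
+Before exercising each service individually, an optional reachability sweep over the ports used in this tutorial can confirm that every endpoint is at least listening. The loop below is only a sketch based on the services deployed above (use 8005 instead of 8007 for the TGI deployment); a non-200 response code still means the container is reachable, while `000` indicates no connection.
+
+```bash
+# Ports used in this tutorial: 8090 TEI embedding, 7000 retriever,
+# 8808 TEI reranking, 8007 vLLM (8005 for TGI), 6007 dataprep, 8888 megaservice
+for port in 8090 7000 8808 8007 6007 8888; do
+  code=$(curl -s -o /dev/null -w "%{http_code}" http://${host_ip}:${port})
+  echo "port ${port} -> HTTP ${code}"
+done
+```
+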
### TEI Embedding Service -The TEI embedding service takes in a string as input, embeds the string into a -vector of a specific length determined by the embedding model and returns this -embedded vector. +The TEI embedding service takes in a string as input, embeds the string into a vector of a specific length determined by the embedding model, and returns this vector. ```bash curl ${host_ip}:8090/embed \ @@ -534,31 +241,13 @@ curl ${host_ip}:8090/embed \ -H 'Content-Type: application/json' ``` -In this example the embedding model used is "BAAI/bge-base-en-v1.5", which has a -vector size of 768. So the output of the curl command is a embedded vector of -length 768. +In this example, the embedding model used is `BAAI/bge-base-en-v1.5`, which has a vector size of 768. Therefore, the output of the curl command is a vector of length 768. -### Embedding Microservice -The embedding microservice depends on the TEI embedding service. In terms of -input parameters, it takes in a string, embeds it into a vector using the TEI -embedding service and pads other default parameters that are required for the -retrieval microservice and returns it. - -```bash -curl http://${host_ip}:6000/v1/embeddings\ - -X POST \ - -d '{"text":"hello"}' \ - -H 'Content-Type: application/json' -``` ### Retriever Microservice -To consume the retriever microservice, you need to generate a mock embedding -vector by Python script. The length of embedding vector is determined by the -embedding model. Here we use the -model EMBEDDING_MODEL_ID="BAAI/bge-base-en-v1.5", which vector size is 768. +To consume the retriever microservice, generate a mock embedding vector with a Python script. The length of the embedding vector is determined by the embedding model. The model is set with the environment variable EMBEDDING_MODEL_ID="BAAI/bge-base-en-v1.5", which has a vector size of 768. -Check the vector dimension of your embedding model and set -`your_embedding` dimension equal to it. +Check the vector dimension of the embedding model used and set `your_embedding` dimension equal to it. ```bash export your_embedding=$(python3 -c "import random; embedding = [random.uniform(-1, 1) for _ in range(768)]; print(embedding)") @@ -569,24 +258,19 @@ curl http://${host_ip}:7000/v1/retrieval \ -H 'Content-Type: application/json' ``` -The output of the retriever microservice comprises of the a unique id for the -request, initial query or the input to the retrieval microservice, a list of top -`n` retrieved documents relevant to the input query, and top_n where n refers to -the number of documents to be returned. +The output of the retriever microservice comprises of the a unique id for the request, initial query or the input to the retrieval microservice, a list of top +`n` retrieved documents relevant to the input query, and top_n where n refers to the number of documents to be returned. -The output is retrieved text that relevant to the input data: +The output is retrieved text that is relevant to the input data: ```bash -{"id":"27210945c7c6c054fa7355bdd4cde818","retrieved_docs":[{"id":"0c1dd04b31ab87a5468d65f98e33a9f6","text":"Company: Nike. 
financial instruments are subject to master netting arrangements that allow for the offset of assets and liabilities in the event of default or early termination of the contract.\nAny amounts of cash collateral received related to these instruments associated with the Company's credit-related contingent features are recorded in Cash and\nequivalents and Accrued liabilities, the latter of which would further offset against the Company's derivative asset balance. Any amounts of cash collateral posted related\nto these instruments associated with the Company's credit-related contingent features are recorded in Prepaid expenses and other current assets, which would further\noffset against the Company's derivative liability balance. Cash collateral received or posted related to the Company's credit-related contingent features is presented in the\nCash provided by operations component of the Consolidated Statements of Cash Flows. The Company does not recognize amounts of non-cash collateral received, such\nas securities, on the Consolidated Balance Sheets. For further information related to credit risk, refer to Note 12 — Risk Management and Derivatives.\n2023 FORM 10-K 68Table of Contents\nThe following tables present information about the Company's derivative assets and liabilities measured at fair value on a recurring basis and indicate the level in the fair\nvalue hierarchy in which the Company classifies the fair value measurement:\nMAY 31, 2023\nDERIVATIVE ASSETS\nDERIVATIVE LIABILITIES"},{"id":"1d742199fb1a86aa8c3f7bcd580d94af","text": ... } - +{"id":"b16024e140e78e39a60e8678622be630","retrieved_docs":[],"initial_query":"test","top_n":1,"metadata":[]} ``` ### TEI Reranking Service -The TEI Reranking Service reranks the documents returned by the retrieval -service. It consumes the query and list of documents and returns the document -index based on decreasing order of the similarity score. The document -corresponding to the returned index with the highest score is the most relevant -document for the input query. +The TEI Reranking Service reranks the documents returned by the retrieval service. It consumes the query and list of documents and returns the document +index in decreasing order of the similarity score. The document corresponding to the index with the highest score is the most relevant document for the input query. + ```bash curl http://${host_ip}:8808/rerank \ -X POST \ @@ -594,65 +278,27 @@ curl http://${host_ip}:8808/rerank \ -H 'Content-Type: application/json' ``` -Output is: `[{"index":1,"score":0.9988041},{"index":0,"score":0.022948774}]` - - -### Reranking Microservice - - -The reranking microservice consumes the TEI Reranking service and pads the -response with default parameters required for the llm microservice. - +Sample output: ```bash -curl http://${host_ip}:8000/v1/reranking \ - -X POST \ - -d '{"initial_query":"What is Deep Learning?", "retrieved_docs": [{"text":"Deep Learning is not..."}, {"text":"Deep learning is..."}]}' \ - -H 'Content-Type: application/json' +[{"index":1,"score":0.94238955},{"index":0,"score":0.120219156}] ``` -The input to the microservice is the `initial_query` and a list of retrieved -documents and it outputs the most relevant document to the initial query along -with other default parameter such as temperature, `repetition_penalty`, -`chat_template` and so on. We can also get top n documents by setting `top_n` as one -of the input parameters. 
For example: - -```bash -curl http://${host_ip}:8000/v1/reranking \ - -X POST \ - -d '{"initial_query":"What is Deep Learning?" ,"top_n":2, "retrieved_docs": [{"text":"Deep Learning is not..."}, {"text":"Deep learning is..."}]}' \ - -H 'Content-Type: application/json' -``` - -Here is the output: - -```bash -{"id":"e1eb0e44f56059fc01aa0334b1dac313","query":"Human: Answer the question based only on the following context:\n Deep learning is...\n Question: What is Deep Learning?","max_new_tokens":1024,"top_k":10,"top_p":0.95,"typical_p":0.95,"temperature":0.01,"repetition_penalty":1.03,"streaming":true} - -``` -You may notice reranking microservice are with state ('ID' and other meta data), -while reranking service are not. - ### vLLM and TGI Service -In first startup, this service will take more time to download the model files. -After it's finished, the service will be ready. +In first startup, this service will take a few minutes to download the model files and perform warm up. After it's finished, the service will be ready. -Try the command below to check whether the LLM serving is ready. +::::{tab-set} -```bash -docker logs ${CONTAINER_ID} | grep Connected -``` +:::{tab-item} vllm +:sync: vllm -If the service is ready, you will get the response like below. +Try the command below to check whether the LLM service is ready. The output should be "Application startup complete." +```bash +docker logs vllm-service 2>&1 | grep complete ``` -2024-09-03T02:47:53.402023Z INFO text_generation_router::server: router/src/server.rs:2311: Connected -``` - -::::{tab-set} -:::{tab-item} vllm -:sync: vllm +Run the command below to use the vLLM service to generate text for the input prompt. Sample output is also shown. ```bash curl http://${host_ip}:8007/v1/completions \ @@ -665,19 +311,22 @@ curl http://${host_ip}:8007/v1/completions \ }' ``` -vLLM service generate text for the input prompt. Here is the expected result -from vllm: - -``` +```bash {"id":"cmpl-be8e1d681eb045f082a7b26d5dba42ff","object":"text_completion","created":1726269914,"model":"meta-llama/Meta-Llama-3-8B-Instruct","choices":[{"index":0,"text":"\n\nDeep Learning is a subset of Machine Learning that is concerned with algorithms inspired by the structure and function of the brain. It is a part of Artificial","logprobs":null,"finish_reason":"length","stop_reason":null}],"usage":{"prompt_tokens":6,"total_tokens":38,"completion_tokens":32}}d ``` -**NOTE**: After launch the vLLM, it takes few minutes for vLLM server to load -LLM model and warm up. ::: :::{tab-item} TGI :sync: TGI +Try the command below to check whether the LLM service is ready. The output should be "INFO text_generation_router::server: router/src/server.rs:2311: Connected" + +```bash +docker logs tgi-service | grep Connected +``` + +Run the command below to use the TGI service to generate text for the input prompt. Sample output is also shown. + ```bash curl http://${host_ip}:8005/generate \ -X POST \ @@ -685,71 +334,70 @@ curl http://${host_ip}:8005/generate \ -H 'Content-Type: application/json' ``` -TGI service generate text for the input prompt. Here is the expected result from TGI: - ```bash {"generated_text":"Artificial Intelligence (AI) has become a very popular buzzword in the tech industry. While the phrase conjures images of sentient robots and self-driving cars, our current AI landscape is much more subtle. 
In fact, it most often manifests in the forms of algorithms that help recognize the faces of"} ``` -**NOTE**: After launch the TGI, it takes few minutes for TGI server to load LLM model and warm up. ::: :::: +### Dataprep Microservice -### LLM Microservice +The knowledge base can be updated using the dataprep microservice, which extracts text from a variety of data sources, chunks the data, embeds each chunk using the embedding microservice. Finally, the embedded vectors are stored in the Redis vector database. -This service depends on the above LLM backend service startup. Give it a couple minutes to be ready on the first startup. +`nke-10k-2023.pdf` is Nike's annual report on a form 10-K. Run this command to download the file: +```bash +wget https://github.com/opea-project/GenAIComps/blob/v1.1/comps/retrievers/redis/data/nke-10k-2023.pdf +``` -::::{tab-set} -:::{tab-item} vllm -:sync: vllm +Upload the file: ```bash -curl http://${host_ip}:9000/v1/chat/completions \ - -X POST \ - -d '{"query":"What is Deep Learning?","max_tokens":17,"top_p":1,"temperature":0.7,\ - "frequency_penalty":0,"presence_penalty":0, "streaming":true}' \ - -H 'Content-Type: application/json' +curl -X POST "http://${host_ip}:6007/v1/dataprep" \ + -H "Content-Type: multipart/form-data" \ + -F "files=@./nke-10k-2023.pdf" ``` -For parameters in vLLM modes, can refer to [LangChain VLLMOpenAI API](https://huggingface.co/docs/huggingface_hub/package_reference/inference_client#huggingface_hub.InferenceClient.text_generation) -::: -:::{tab-item} TGI -:sync: TGI +HTTP links can also be added to the knowledge base. This command adds the opea.dev website. ```bash -curl http://${host_ip}:9000/v1/chat/completions \ - -X POST \ - -d '{"query":"What is Deep Learning?","max_new_tokens":17,"top_k":10,"top_p":0.95,"typical_p":0.95,"temperature":0.01,"repetition_penalty":1.03,"streaming":true}' \ - -H 'Content-Type: application/json' +curl -X POST "http://${host_ip}:6007/v1/dataprep" \ + -H "Content-Type: multipart/form-data" \ + -F 'link_list=["https://opea.dev"]' ``` -For parameters in TGI modes, please refer to [HuggingFace InferenceClient API](https://huggingface.co/docs/huggingface_hub/package_reference/inference_client#huggingface_hub.InferenceClient.text_generation) (except we rename "max_new_tokens" to "max_tokens".) 
-::: -:::: +The list of uploaded files can be retrieved using this command: +```bash +curl -X POST "http://${host_ip}:6007/v1/dataprep/get_file" \ + -H "Content-Type: application/json" +``` -You will get generated text from LLM: +To delete the file or link, use the following commands: +#### Delete link ```bash -data: b'\n' -data: b'\n' -data: b'Deep' -data: b' Learning' -data: b' is' -data: b' a' -data: b' subset' -data: b' of' -data: b' Machine' -data: b' Learning' -data: b' that' -data: b' is' -data: b' concerned' -data: b' with' -data: b' algorithms' -data: b' inspired' -data: b' by' -data: [DONE] +# The dataprep service will add a .txt postfix for link file + +curl -X POST "http://${host_ip}:6007/v1/dataprep/delete_file" \ + -d '{"file_path": "https://opea.dev.txt"}' \ + -H "Content-Type: application/json" +``` + +#### Delete file + +```bash +curl -X POST "http://${host_ip}:6007/v1/dataprep/delete_file" \ + -d '{"file_path": "nke-10k-2023.pdf"}' \ + -H "Content-Type: application/json" +``` + +#### Delete all uploaded files and links + +```bash +curl -X POST "http://${host_ip}:6007/v1/dataprep/delete_file" \ + -d '{"file_path": "all"}' \ + -H "Content-Type: application/json" ``` -### MegaService +### ChatQnA MegaService ```bash curl http://${host_ip}:8888/v1/chatqna -H "Content-Type: application/json" -d '{ @@ -795,9 +443,8 @@ data: b'' data: [DONE] ``` -#### Guardrail Microservice -If you had enabled Guardrail microservice, access via the below curl command - +#### (Optional) Guardrail Microservice +If the Guardrail microservice is enabled, test it using the command below: ```bash curl http://${host_ip}:9090/v1/guardrails\ -X POST \ @@ -806,152 +453,49 @@ curl http://${host_ip}:9090/v1/guardrails\ ``` ## Launch UI + ### Basic UI -To access the frontend, open the following URL in your browser: http://{host_ip}:5173. By default, the UI runs on port 5173 internally. If you prefer to use a different host port to access the frontend, you can modify the port mapping in the compose.yaml file as shown below: -```bash - chaqna-gaudi-ui-server: +To access the frontend, open the following URL in a web browser: http://{host_ip}:80. By default, the UI runs on port 5173 internally. A different host port can be used to access the frontend. Simply modify the port mapping in the `compose.yaml` file as shown below: +```yaml + chatqna-gaudi-ui-server: image: opea/chatqna-ui:${TAG:-latest} ... ports: - "5173:5173" ``` -### Conversational UI -To access the Conversational UI (react based) frontend, modify the UI service in the `compose.yaml` file. Replace `chaqna-gaudi-ui-server` service with the `chatqna-gaudi-conversation-ui-server` service as per the config below: -```bash -chaqna-gaudi-conversation-ui-server: +### (Optional) Conversational UI + +To access the Conversational UI (react based) frontend, modify the UI service in the `compose.yaml` file. Replace `chatqna-gaudi-ui-server` service with the `chatqna-gaudi-conversation-ui-server` service as shown below: +```yaml +chaqtna-gaudi-conversation-ui-server: image: opea/chatqna-conversation-ui:${TAG:-latest} container_name: chatqna-gaudi-conversation-ui-server environment: - APP_BACKEND_SERVICE_ENDPOINT=${BACKEND_SERVICE_ENDPOINT} - APP_DATA_PREP_SERVICE_URL=${DATAPREP_SERVICE_ENDPOINT} ports: - - "5174:5174" + - "5174:80" depends_on: - - chaqna-gaudi-backend-server + - chatqna-gaudi-backend-server ipc: host restart: always ``` -Once the services are up, open the following URL in your browser: http://{host_ip}:5174. 
By default, the UI runs on port 80 internally. If you prefer to use a different host port to access the frontend, you can modify the port mapping in the `compose.yaml` file as shown below: -```bash - chaqna-gaudi-conversation-ui-server: - image: opea/chatqna-conversation-ui:${TAG:-latest} - ... - ports: - - "80:80" -``` - -## Check docker container log - -Check the log of container by: - -`docker logs -t` - - -Check the log by `docker logs f7a08f9867f9 -t`. - -``` -2024-06-05T01:30:30.695934928Z error: a value is required for '--model-id ' but none was supplied -2024-06-05T01:30:30.697123534Z -2024-06-05T01:30:30.697148330Z For more information, try '--help'. - -``` - -The log indicates the `MODEL_ID` is not set. - - -::::{tab-set} - -:::{tab-item} vllm -:sync: vllm - -View the docker input parameters in `$WORKSPACE/GenAIExamples/ChatQnA/docker_compose/intel/hpu/gaudi/compose.yaml` - -```yaml - vllm-service: - image: ${REGISTRY:-opea}/vllm-gaudi:${RELEASE_VERSION:-latest} - container_name: vllm-gaudi-server - ports: - - "8007:80" - volumes: - - "./data:/data" - environment: - no_proxy: ${no_proxy} - http_proxy: ${http_proxy} - https_proxy: ${https_proxy} - HF_TOKEN: ${HUGGINGFACEHUB_API_TOKEN} - HABANA_VISIBLE_DEVICES: all - OMPI_MCA_btl_vader_single_copy_mechanism: none - LLM_MODEL_ID: ${LLM_MODEL_ID} - runtime: habana - cap_add: - - SYS_NICE - ipc: host - command: /bin/bash -c "export VLLM_CPU_KVCACHE_SPACE=40 && python3 -m vllm.entrypoints.openai.api_server --enforce-eager --model $LLM_MODEL_ID --tensor-parallel-size 1 --host 0.0.0.0 --port 80 --block-size 128 --max-num-seqs 256 --max-seq_len-to-capture 2048" -``` - -::: -:::{tab-item} TGI -:sync: TGI -View the docker input parameters in `$WORKSPACE/GenAIExamples/ChatQnA/docker_compose/intel/hpu/gaudi/compose_tgi.yaml` +In addition, modify the `chatqna-gaudi-nginx-server` `depends_on` field to include `chatqna-gaudi-conversation-ui-server` instead of `chatqna-gaudi-ui-server`. +Once the services are up, open the following URL in a web browser: http://{host_ip}:5174. By default, the UI runs on port 80 internally. A different host port can be used to access the frontend. Simply modify the port mapping in the `compose.yaml` file as shown below: ```yaml - tgi-service: - image: ghcr.io/huggingface/tgi-gaudi:2.0.1 - container_name: tgi-gaudi-server + chatqna-gaudi-conversation-ui-server: + image: opea/chatqna-conversation-ui:${TAG:-latest} + ... ports: - - "8005:80" - volumes: - - "./data:/data" - environment: - no_proxy: ${no_proxy} - http_proxy: ${http_proxy} - https_proxy: ${https_proxy} - HF_TOKEN: ${HUGGINGFACEHUB_API_TOKEN} - HF_HUB_DISABLE_PROGRESS_BARS: 1 - HF_HUB_ENABLE_HF_TRANSFER: 0 - HABANA_VISIBLE_DEVICES: ${llm_service_devices} - OMPI_MCA_btl_vader_single_copy_mechanism: none - runtime: habana - cap_add: - - SYS_NICE - ipc: host - command: --model-id ${LLM_MODEL_ID} --max-input-length 1024 --max-total-tokens 2048 -``` -::: -:::: - - -The input `MODEL_ID` is `${LLM_MODEL_ID}` - -Check environment variable `LLM_MODEL_ID` is set correctly, spelled correctly. -Set the `LLM_MODEL_ID` then restart the containers. - -Also you can check overall logs with the following command, where the -compose.yaml is the mega service docker-compose configuration file. 
- -::::{tab-set} - -:::{tab-item} vllm -:sync: vllm - -```bash -docker compose -f compose.yaml logs -``` -::: -:::{tab-item} TGI -:sync: TGI - -```bash -docker compose -f compose_tgi.yaml logs + - "80:80" ``` -::: -:::: ## Stop the services -Once you are done with the entire pipeline and wish to stop and remove all the containers, use the command below: +To stop and remove all the containers, use the command below: ::::{tab-set} :::{tab-item} vllm diff --git a/tutorial/ChatQnA/deploy/xeon.md b/tutorial/ChatQnA/deploy/xeon.md index fa59120f..1ff48f77 100644 --- a/tutorial/ChatQnA/deploy/xeon.md +++ b/tutorial/ChatQnA/deploy/xeon.md @@ -1,19 +1,10 @@ # Single node on-prem deployment with vLLM or TGI on Xeon Scalable processors -This deployment section covers single-node on-prem deployment of the ChatQnA -example with OPEA comps to deploy using vLLM or TGI service. There are several -slice-n-dice ways to enable RAG with vectordb and LLM models, but here we will -be covering one option of doing it for convenience : we will be showcasing how -to build an e2e chatQnA with Redis VectorDB and meta-llama/Meta-Llama-3-8B-Instruct model, -deployed on Intel® Xeon® Scalable processors. To quickly learn about OPEA in just 5 minutes -and set up the required hardware and software, please follow the instructions in the -[Getting Started](../../../getting-started/README.md) section. +This deployment section covers single-node on-prem deployment of the ChatQnA example using the vLLM or TGI LLM service. There are several ways to enable RAG with vectordb and LLM models, but this tutorial will be covering how to build an end-to-end ChatQnA pipeline with the Redis vector database and meta-llama/Meta-Llama-3-8B-Instruct model deployed on Intel® Xeon® Scalable processors. To quickly learn about OPEA and set up the required hardware and software, follow the instructions in the [Getting Started Guide](../../../getting-started/README.md). ## Overview -There are several ways to setup a ChatQnA use case. Here in this tutorial, we -will walk through how to enable the below list of microservices from OPEA -GenAIComps to deploy a single node vLLM or TGI megaservice solution. +The list of microservices from OPEA GenAIComps are used to deploy a single node vLLM or TGI megaservice solution for ChatQnA. 1. Data Prep 2. Embedding @@ -21,41 +12,22 @@ GenAIComps to deploy a single node vLLM or TGI megaservice solution. 4. Reranking 5. LLM with vLLM or TGI -The solution is aimed to show how to use Redis vectordb for RAG and -Meta-Llama-3-8B-Instruct model on Intel Xeon Scalable processors. We will go through -how to setup docker container to start a microservices and megaservice . The -solution will then utilize a sample Nike dataset which is in PDF format. Users -can then ask a question about Nike and get a chat-like response by default for -up to 1024 tokens. The solution is deployed with a UI. There are 2 modes you can -use: +The solution is aimed to show how to use Redis vectorDB for RAG and Meta-Llama-3-8B-Instruct model for LLM inference on Intel® Xeon® Scalable processors. Steps will include setting up docker containers, utilizing a sample Nike dataset in PDF format, and asking a question about Nike to get a response. There are 2 modes of UI that can be deployed: 1. Basic UI 2. Conversational UI -Conversational UI is optional, but a feature supported in this example if you -are interested to use. - ## Prerequisites -The first step is to clone the GenAIExamples and GenAIComps projects. 
GenAIComps are -fundamental necessary components used to build the examples you find in -GenAIExamples and deploy them as microservices. Set an environment -variable for the desired release version with the **number only** -(i.e. 1.0, 1.1, etc) and checkout using the tag with that version. +The first step is to clone the [GenAIExamples](https://github.com/opea-project/GenAIExamples) GitHub repo. Set a workspace path and the desired release version with the **number only** (i.e. 1.0, 1.1, etc) and checkout that version using the tag. ```bash # Set workspace -export WORKSPACE= +export WORKSPACE= cd $WORKSPACE # Set desired release version - number only -export RELEASE_VERSION= - -# GenAIComps -git clone https://github.com/opea-project/GenAIComps.git -cd GenAIComps -git checkout tags/v${RELEASE_VERSION} -cd .. +export RELEASE_VERSION= # GenAIExamples git clone https://github.com/opea-project/GenAIExamples.git @@ -64,199 +36,27 @@ git checkout tags/v${RELEASE_VERSION} cd .. ``` -Setup the HuggingFace token +Set up the HuggingFace token: ```bash export HUGGINGFACEHUB_API_TOKEN="Your_Huggingface_API_Token" ``` -The example requires you to set the `host_ip` to deploy the microservices on -endpoint enabled with ports. Set the host_ip env variable -```bash -export host_ip=$(hostname -I | awk '{print $1}') -``` - -Make sure to setup Proxies if you are behind a firewall -```bash -export no_proxy=${your_no_proxy},$host_ip -export http_proxy=${your_http_proxy} -export https_proxy=${your_http_proxy} -``` - -## Prepare (Building / Pulling) Docker images - -This step will involve building/pulling relevant docker -images with step-by-step process along with sanity check in the end. For -ChatQnA, the following docker images will be needed: embedding, retriever, -rerank, LLM and dataprep. Additionally, you will need to build docker images for -ChatQnA megaservice, and UI (conversational React UI is optional). In total, -there are 8 required and an optional docker images. - -### Build/Pull Microservice images - -::::::{tab-set} - -:::::{tab-item} Pull -:sync: Pull - -If you decide to pull the docker containers and not build them locally, -you can proceed to the next step where all the necessary containers will -be pulled in from Docker Hub. - -::::: -:::::{tab-item} Build -:sync: Build - -Follow the steps below to build the docker images from within the `GenAIComps` folder. -**Note:** For RELEASE_VERSIONS older than 1.0, you will need to add a 'v' in front -of ${RELEASE_VERSION} to reference the correct image on Docker Hub. - -```bash -cd $WORKSPACE/GenAIComps -``` - -#### Build Dataprep Image - -```bash -docker build --no-cache -t opea/dataprep-redis:${RELEASE_VERSION} --build-arg https_proxy=$https_proxy \ - --build-arg http_proxy=$http_proxy -f comps/dataprep/redis/langchain/Dockerfile . -``` - -#### Build Embedding Image - -```bash -docker build --no-cache -t opea/embedding-tei:${RELEASE_VERSION} --build-arg https_proxy=$https_proxy \ - --build-arg http_proxy=$http_proxy -f comps/embeddings/tei/langchain/Dockerfile . -``` - -#### Build Retriever Image - -```bash - docker build --no-cache -t opea/retriever-redis:${RELEASE_VERSION} --build-arg https_proxy=$https_proxy \ - --build-arg http_proxy=$http_proxy -f comps/retrievers/redis/langchain/Dockerfile . -``` - -#### Build Rerank Image - -```bash -docker build --no-cache -t opea/reranking-tei:${RELEASE_VERSION} --build-arg https_proxy=$https_proxy \ - --build-arg http_proxy=$http_proxy -f comps/reranks/tei/Dockerfile . 
-``` - -#### Build LLM Image - -::::{tab-set} - -:::{tab-item} vllm -:sync: vllm - -We build the vllm docker image from source +The example requires setting the `host_ip` to "localhost" to deploy the microservices on endpoints enabled with ports. ```bash -git clone https://github.com/vllm-project/vllm.git -cd vllm -docker build --no-cache -t opea/vllm:${RELEASE_VERSION} --build-arg https_proxy=$https_proxy \ - --build-arg http_proxy=$http_proxy -f Dockerfile.cpu . -cd .. +export host_ip="localhost" ``` -Next, we'll build the vllm microservice docker. This will set the entry point -needed for the vllm to suit the ChatQnA examples +For machines behind a firewall, set up the proxy environment variables: ```bash -docker build --no-cache -t opea/llm-vllm:${RELEASE_VERSION} --build-arg https_proxy=$https_proxy \ - --build-arg http_proxy=$http_proxy \ - -f comps/llms/text-generation/vllm/langchain/Dockerfile.microservice . - +export http_proxy="Your_HTTP_Proxy" +export https_proxy="Your_HTTPs_Proxy" +# Example: no_proxy="localhost, 127.0.0.1, 192.168.1.1" +export no_proxy="Your_No_Proxy",chatqna-xeon-ui-server,chatqna-xeon-backend-server,dataprep-redis-service,tei-embedding-service,retriever,tei-reranking-service,tgi-service,vllm-service ``` -::: -:::{tab-item} TGI -:sync: TGI - -```bash -docker build --no-cache -t opea/llm-tgi:${RELEASE_VERSION} --build-arg https_proxy=$https_proxy \ - --build-arg http_proxy=$http_proxy -f comps/llms/text-generation/tgi/Dockerfile . -``` -::: -:::: - -### Build Mega Service images - -The Megaservice is a pipeline that channels data through different -microservices, each performing varied tasks. We define the different -microservices and the flow of data between them in the `chatqna.py` file, say in -this example the output of embedding microservice will be the input of retrieval -microservice which will in turn passes data to the reranking microservice and so -on. You can also add newer or remove some microservices and customize the -megaservice to suit the needs. - -Build the megaservice image for this use case - -```bash -cd $WORKSPACE/GenAIExamples/ChatQnA -``` - -```bash -docker build --no-cache -t opea/chatqna:${RELEASE_VERSION} --build-arg https_proxy=$https_proxy \ - --build-arg http_proxy=$http_proxy -f Dockerfile . -``` - -### Build Other Service images - -#### Build the UI Image - -As mentioned, you can build 2 modes of UI - -*Basic UI* - -```bash -cd $WORKSPACE/GenAIExamples/ChatQnA/ui/ -docker build --no-cache -t opea/chatqna-ui:${RELEASE_VERSION} --build-arg https_proxy=$https_proxy \ - --build-arg http_proxy=$http_proxy -f ./docker/Dockerfile . -``` - -*Conversation UI* -If you want a conversational experience with chatqna megaservice. - -```bash -cd $WORKSPACE/GenAIExamples/ChatQnA/ui/ -docker build --no-cache -t opea/chatqna-conversation-ui:${RELEASE_VERSION} --build-arg https_proxy=$https_proxy \ - --build-arg http_proxy=$http_proxy -f ./docker/Dockerfile.react . 
-``` - -### Sanity Check -Check if you have the below set of docker images, before moving on to the next step: - -::::{tab-set} -:::{tab-item} vllm -:sync: vllm - -* opea/dataprep-redis:${RELEASE_VERSION} -* opea/embedding-tei:${RELEASE_VERSION} -* opea/retriever-redis:${RELEASE_VERSION} -* opea/reranking-tei:${RELEASE_VERSION} -* opea/vllm:${RELEASE_VERSION} -* opea/chatqna:${RELEASE_VERSION} -* opea/chatqna-ui:${RELEASE_VERSION} -* opea/llm-vllm:${RELEASE_VERSION} -::: -:::{tab-item} TGI -:sync: TGI - -* opea/dataprep-redis:${RELEASE_VERSION} -* opea/embedding-tei:${RELEASE_VERSION} -* opea/retriever-redis:${RELEASE_VERSION} -* opea/reranking-tei:${RELEASE_VERSION} -* opea/chatqna:${RELEASE_VERSION} -* opea/chatqna-ui:${RELEASE_VERSION} -* opea/llm-tgi:${RELEASE_VERSION} -::: -:::: - -::::: -:::::: ## Use Case Setup -As mentioned the use case will use the following combination of the GenAIComps -with the tools +ChatQnA will use the following GenAIComps and corresponding tools. Tools and models mentioned in the table are configurable either through environment variables in the `set_env.sh` or `compose.yaml` file. ::::{tab-set} @@ -272,8 +72,6 @@ with the tools |LLM | vLLM | meta-llama/Meta-Llama-3-8B-Instruct | OPEA Microservice | |UI | | NA | Gateway Service | -Tools and models mentioned in the table are configurable either through the -environment variable or `compose.yaml` file. ::: :::{tab-item} TGI :sync: TGI @@ -287,13 +85,10 @@ environment variable or `compose.yaml` file. |LLM | TGI | meta-llama/Meta-Llama-3-8B-Instruct | OPEA Microservice | |UI | | NA | Gateway Service | -Tools and models mentioned in the table are configurable either through the -environment variable or `compose.yaml` file. ::: :::: -Set the necessary environment variables to setup the use case. If you want to swap -out models, modify `set_env.sh` before running. +Set the necessary environment variables to set up the use case. To swap out models, modify `set_env.sh` before running it. For example, the environment variable `LLM_MODEL_ID` can be changed to another model by specifying the HuggingFace model card ID. ```bash cd $WORKSPACE/GenAIExamples/ChatQnA/docker_compose/intel/cpu/xeon @@ -302,9 +97,7 @@ source ./set_env.sh ## Deploy the use case -In this tutorial, we will be deploying via docker compose with the provided -YAML file. The docker compose instructions should be starting all the -above mentioned services as containers. +Run `docker compose` with the provided YAML file to start all the services mentioned above as containers. ::::{tab-set} :::{tab-item} vllm @@ -323,16 +116,13 @@ docker compose -f compose_tgi.yaml up -d ::: :::: -### Validate microservice -#### Check Env Variables +### Check Env Variables +After running `docker compose`, check for warning messages for environment variables that are **NOT** set. Address them if needed. ::::{tab-set} :::{tab-item} vllm :sync: vllm - Check the start up log by `docker compose -f ./compose.yaml logs`. -The warning messages print out the variables if they are **NOT** set. - - ubuntu@xeon-vm:~/GenAIExamples/ChatQnA/docker_compose/intel/cpu/xeon$ docker compose -f ./compose.yaml up -d + ubuntu@xeon-vm:~/GenAIExamples/ChatQnA/docker_compose/intel/cpu/xeon$ docker compose -f compose.yaml up -d WARN[0000] The "LANGCHAIN_API_KEY" variable is not set. Defaulting to a blank string. WARN[0000] The "LANGCHAIN_TRACING_V2" variable is not set. Defaulting to a blank string. WARN[0000] The "LANGCHAIN_API_KEY" variable is not set. Defaulting to a blank string. 
@@ -343,13 +133,11 @@ The warning messages print out the variables if they are **NOT** set. WARN[0000] The "LANGCHAIN_TRACING_V2" variable is not set. Defaulting to a blank string. WARN[0000] /home/ubuntu/GenAIExamples/ChatQnA/docker_compose/intel/cpu/xeon/compose.yaml: `version` is obsolete ::: + :::{tab-item} TGI :sync: TGI - Check the start up log by `docker compose -f ./compose.yaml logs`. -The warning messages print out the variables if they are **NOT** set. - - ubuntu@xeon-vm:~/GenAIExamples/ChatQnA/docker_compose/intel/cpu/xeon$ docker compose -f ./compose.yaml up -d + ubuntu@xeon-vm:~/GenAIExamples/ChatQnA/docker_compose/intel/cpu/xeon$ docker compose -f compose_tgi.yaml up -d WARN[0000] The "LANGCHAIN_API_KEY" variable is not set. Defaulting to a blank string. WARN[0000] The "LANGCHAIN_TRACING_V2" variable is not set. Defaulting to a blank string. WARN[0000] The "LANGCHAIN_API_KEY" variable is not set. Defaulting to a blank string. @@ -358,17 +146,20 @@ The warning messages print out the variables if they are **NOT** set. WARN[0000] The "LANGCHAIN_TRACING_V2" variable is not set. Defaulting to a blank string. WARN[0000] The "LANGCHAIN_API_KEY" variable is not set. Defaulting to a blank string. WARN[0000] The "LANGCHAIN_TRACING_V2" variable is not set. Defaulting to a blank string. - WARN[0000] /home/ubuntu/GenAIExamples/ChatQnA/docker_compose/intel/cpu/xeon/compose.yaml: `version` is obsolete + WARN[0000] /home/ubuntu/GenAIExamples/ChatQnA/docker_compose/intel/cpu/xeon/compose_tgi.yaml: `version` is obsolete ::: :::: -#### Check the container status +### Check container statuses + +Check if all the containers launched via `docker compose` are running i.e. each container's `STATUS` is `Up` and `Healthy`. -Check if all the containers launched via docker compose has started +Run this command to see this info: +```bash +docker ps -a +``` -For example, the ChatQnA example starts 11 docker (services), check these docker -containers are all running, i.e, all the containers `STATUS` are `Up` -To do a quick sanity check, try `docker ps -a` to see if all the containers are running +The sample output is for OPEA release v1.2. 
::::{tab-set} @@ -376,117 +167,49 @@ To do a quick sanity check, try `docker ps -a` to see if all the containers are :sync: vllm ```bash -CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES -3b5fa9a722da opea/chatqna-ui:${RELEASE_VERSION} "docker-entrypoint.s…" 32 hours ago Up 2 hours 0.0.0.0:5173->5173/tcp, :::5173->5173/tcp chatqna-xeon-ui-server -d3b37f3d1faa opea/chatqna:${RELEASE_VERSION} "python chatqna.py" 32 hours ago Up 2 hours 0.0.0.0:8888->8888/tcp, :::8888->8888/tcp chatqna-xeon-backend-server -b3e1388fa2ca opea/reranking-tei:${RELEASE_VERSION} "python reranking_te…" 32 hours ago Up 2 hours 0.0.0.0:8000->8000/tcp, :::8000->8000/tcp reranking-tei-xeon-server -24a240f8ad1c opea/retriever-redis:${RELEASE_VERSION} "python retriever_re…" 32 hours ago Up 2 hours 0.0.0.0:7000->7000/tcp, :::7000->7000/tcp retriever-redis-server -9c0d2a2553e8 opea/embedding-tei:${RELEASE_VERSION} "python embedding_te…" 32 hours ago Up 2 hours 0.0.0.0:6000->6000/tcp, :::6000->6000/tcp embedding-tei-server -24cae0db1a70 opea/llm-vllm:${RELEASE_VERSION} "bash entrypoint.sh" 32 hours ago Up 2 hours 0.0.0.0:9000->9000/tcp, :::9000->9000/tcp llm-vllm-server -ea3986c3cf82 opea/dataprep-redis:${RELEASE_VERSION} "python prepare_doc_…" 32 hours ago Up 2 hours 0.0.0.0:6007->6007/tcp, :::6007->6007/tcp dataprep-redis-server -e10dd14497a8 redis/redis-stack:7.2.0-v9 "/entrypoint.sh" 32 hours ago Up 2 hours 0.0.0.0:6379->6379/tcp, :::6379->6379/tcp, 0.0.0.0:8001->8001/tcp, :::8001->8001/tcp redis-vector-db -b98fa07a4f5c opea/vllm:${RELEASE_VERSION} "python3 -m vllm.ent…" 32 hours ago Up 2 hours 0.0.0.0:9009->80/tcp, :::9009->80/tcp vllm-service -79276cf45a47 ghcr.io/huggingface/text-embeddings-inference:cpu-1.2 "text-embeddings-rou…" 32 hours ago Up 2 hours 0.0.0.0:6006->80/tcp, :::6006->80/tcp tei-embedding-server -4943e5f6cd80 ghcr.io/huggingface/text-embeddings-inference:cpu-1.2 "text-embeddings-rou…" 32 hours ago Up 2 hours 0.0.0.0:8808->80/tcp, :::8808->80/tcp tei-reranking-server +CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES +25964cd40c51 opea/nginx:1.2 "/docker-entrypoint.…" 37 minutes ago Up 37 minutes 0.0.0.0:80->80/tcp, [::]:80->80/tcp chatqna-xeon-nginx-server +bca19cf35370 opea/chatqna-ui:1.2 "docker-entrypoint.s…" 37 minutes ago Up 37 minutes 0.0.0.0:5173->5173/tcp, [::]:5173->5173/tcp chatqna-xeon-ui-server +e9622436428a opea/chatqna:1.2 "python chatqna.py" 37 minutes ago Up 37 minutes 0.0.0.0:8888->8888/tcp, [::]:8888->8888/tcp chatqna-xeon-backend-server +514acfb8f398 opea/dataprep:1.2 "sh -c 'python $( [ …" 37 minutes ago Up 37 minutes 0.0.0.0:6007->5000/tcp, [::]:6007->5000/tcp dataprep-redis-server +dbaf2116ae4b opea/retriever:1.2 "python opea_retriev…" 37 minutes ago Up 37 minutes 0.0.0.0:7000->7000/tcp, [::]:7000->7000/tcp retriever-redis-server +82d802dd79c0 ghcr.io/huggingface/text-embeddings-inference:cpu-1.5 "text-embeddings-rou…" 37 minutes ago Up 37 minutes 0.0.0.0:8808->80/tcp, [::]:8808->80/tcp tei-reranking-server +20aebf41b92b opea/vllm:1.2 "python3 -m vllm.ent…" 37 minutes ago Up 37 minutes (unhealthy) 0.0.0.0:9009->80/tcp, [::]:9009->80/tcp vllm-service +590ee468e4b7 redis/redis-stack:7.2.0-v9 "/entrypoint.sh" 37 minutes ago Up 37 minutes 0.0.0.0:6379->6379/tcp, [::]:6379->6379/tcp, 0.0.0.0:8001->8001/tcp, [::]:8001->8001/tcp redis-vector-db +df543e8425ea ghcr.io/huggingface/text-embeddings-inference:cpu-1.5 "text-embeddings-rou…" 37 minutes ago Up 37 minutes 0.0.0.0:6006->80/tcp, [::]:6006->80/tcp tei-embedding-server ``` ::: :::{tab-item} TGI :sync: TGI ```bash -CONTAINER 
ID IMAGE COMMAND CREATED STATUS PORTS NAMES -3b5fa9a722da opea/chatqna-ui:${RELEASE_VERSION} "docker-entrypoint.s…" 32 hours ago Up 2 hours 0.0.0.0:5173->5173/tcp, :::5173->5173/tcp chatqna-xeon-ui-server -d3b37f3d1faa opea/chatqna:${RELEASE_VERSION} "python chatqna.py" 32 hours ago Up 2 hours 0.0.0.0:8888->8888/tcp, :::8888->8888/tcp chatqna-xeon-backend-server -b3e1388fa2ca opea/reranking-tei:${RELEASE_VERSION} "python reranking_te…" 32 hours ago Up 2 hours 0.0.0.0:8000->8000/tcp, :::8000->8000/tcp reranking-tei-xeon-server -24a240f8ad1c opea/retriever-redis:${RELEASE_VERSION} "python retriever_re…" 32 hours ago Up 2 hours 0.0.0.0:7000->7000/tcp, :::7000->7000/tcp retriever-redis-server -9c0d2a2553e8 opea/embedding-tei:${RELEASE_VERSION} "python embedding_te…" 32 hours ago Up 2 hours 0.0.0.0:6000->6000/tcp, :::6000->6000/tcp embedding-tei-server -24cae0db1a70 opea/llm-tgi:${RELEASE_VERSION} "bash entrypoint.sh" 32 hours ago Up 2 hours 0.0.0.0:9000->9000/tcp, :::9000->9000/tcp llm-tgi-server -ea3986c3cf82 opea/dataprep-redis:${RELEASE_VERSION} "python prepare_doc_…" 32 hours ago Up 2 hours 0.0.0.0:6007->6007/tcp, :::6007->6007/tcp dataprep-redis-server -e10dd14497a8 redis/redis-stack:7.2.0-v9 "/entrypoint.sh" 32 hours ago Up 2 hours 0.0.0.0:6379->6379/tcp, :::6379->6379/tcp, 0.0.0.0:8001->8001/tcp, :::8001->8001/tcp redis-vector-db -79276cf45a47 ghcr.io/huggingface/text-embeddings-inference:cpu-1.2 "text-embeddings-rou…" 32 hours ago Up 2 hours 0.0.0.0:6006->80/tcp, :::6006->80/tcp tei-embedding-server -4943e5f6cd80 ghcr.io/huggingface/text-embeddings-inference:cpu-1.2 "text-embeddings-rou…" 32 hours ago Up 2 hours 0.0.0.0:8808->80/tcp, :::8808->80/tcp tei-reranking-server +CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES +f303bf48dd43 opea/nginx:1.2 "/docker-entrypoint.…" 4 seconds ago Up 3 seconds 0.0.0.0:80->80/tcp, [::]:80->80/tcp chatqna-xeon-nginx-server +0a2597a4baa0 opea/chatqna-ui:1.2 "docker-entrypoint.s…" 4 seconds ago Up 3 seconds 0.0.0.0:5173->5173/tcp, [::]:5173->5173/tcp chatqna-xeon-ui-server +5b5a37ba59ed opea/chatqna:1.2 "python chatqna.py" 4 seconds ago Up 3 seconds 0.0.0.0:8888->8888/tcp, [::]:8888->8888/tcp chatqna-xeon-backend-server +b2ec04f4d3d5 opea/dataprep:1.2 "sh -c 'python $( [ …" 4 seconds ago Up 3 seconds 0.0.0.0:6007->5000/tcp, [::]:6007->5000/tcp dataprep-redis-server +c6347c8758e4 opea/retriever:1.2 "python opea_retriev…" 4 seconds ago Up 3 seconds 0.0.0.0:7000->7000/tcp, [::]:7000->7000/tcp retriever-redis-server +13403b62e768 ghcr.io/huggingface/text-generation-inference:2.4.0-intel-cpu "text-generation-lau…" 4 seconds ago Up 3 seconds 0.0.0.0:9009->80/tcp, [::]:9009->80/tcp tgi-service +00509c41487b redis/redis-stack:7.2.0-v9 "/entrypoint.sh" 4 seconds ago Up 3 seconds 0.0.0.0:6379->6379/tcp, [::]:6379->6379/tcp, 0.0.0.0:8001->8001/tcp, [::]:8001->8001/tcp redis-vector-db +3e6e650f73a9 ghcr.io/huggingface/text-embeddings-inference:cpu-1.5 "text-embeddings-rou…" 4 seconds ago Up 3 seconds 0.0.0.0:8808->80/tcp, [::]:8808->80/tcp tei-reranking-server +105d130b80ac ghcr.io/huggingface/text-embeddings-inference:cpu-1.5 "text-embeddings-rou…" 4 seconds ago Up 3 seconds 0.0.0.0:6006->80/tcp, [::]:6006->80/tcp tei-embedding-server ``` ::: :::: -## Interacting with ChatQnA deployment - -This section will walk you through what are the different ways to interact with -the microservices deployed - -### Dataprep Microservice(Optional) - -If you want to add/update the default knowledge base, you can use the following -commands. 
The dataprep microservice extracts the texts from variety of data -sources, chunks the data, embeds each chunk using embedding microservice and -store the embedded vectors in the redis vector database. - -`nke-10k-2023.pdf` is Nike's annual report on a form 10-K. Run this command to get the file on a terminal: - -```bash -wget https://github.com/opea-project/GenAIComps/blob/v1.1/comps/retrievers/redis/data/nke-10k-2023.pdf -``` - -Upload the file: - -```bash -curl -X POST "http://${host_ip}:6007/v1/dataprep" \ - -H "Content-Type: multipart/form-data" \ - -F "files=@./nke-10k-2023.pdf" -``` - -This command updates a knowledge base by uploading a local file for processing. -Update the file path according to your environment. - -Add Knowledge Base via HTTP Links: +Each docker container's log can also be checked using: ```bash -curl -X POST "http://${host_ip}:6007/v1/dataprep" \ - -H "Content-Type: multipart/form-data" \ - -F 'link_list=["https://opea.dev"]' +docker logs ``` -This command updates a knowledge base by submitting a list of HTTP links for processing. - -Also, you are able to get the file list that you uploaded: - -```bash -curl -X POST "http://${host_ip}:6007/v1/dataprep/get_file" \ - -H "Content-Type: application/json" - -``` +## Validate microservices -To delete the file/link you uploaded you can use the following commands: - -#### Delete link -```bash -# The dataprep service will add a .txt postfix for link file - -curl -X POST "http://${host_ip}:6007/v1/dataprep/delete_file" \ - -d '{"file_path": "https://opea.dev.txt"}' \ - -H "Content-Type: application/json" -``` +This section will walk through the different ways to interact with the microservices deployed. -#### Delete file - -```bash -curl -X POST "http://${host_ip}:6007/v1/dataprep/delete_file" \ - -d '{"file_path": "nke-10k-2023.pdf"}' \ - -H "Content-Type: application/json" -``` - -#### Delete all uploaded files and links - -```bash -curl -X POST "http://${host_ip}:6007/v1/dataprep/delete_file" \ - -d '{"file_path": "all"}' \ - -H "Content-Type: application/json" -``` ### TEI Embedding Service -The TEI embedding service takes in a string as input, embeds the string into a -vector of a specific length determined by the embedding model and returns this -embedded vector. +The TEI embedding service takes in a string as input, embeds the string into a vector of a specific length determined by the embedding model, and returns this vector. ```bash curl ${host_ip}:6006/embed \ @@ -495,31 +218,13 @@ curl ${host_ip}:6006/embed \ -H 'Content-Type: application/json' ``` -In this example the embedding model used is "BAAI/bge-base-en-v1.5", which has a -vector size of 768. So the output of the curl command is a embedded vector of -length 768. - -### Embedding Microservice -The embedding microservice depends on the TEI embedding service. In terms of -input parameters, it takes in a string, embeds it into a vector using the TEI -embedding service and pads other default parameters that are required for the -retrieval microservice and returns it. +In this example, the embedding model used is `BAAI/bge-base-en-v1.5`, which has a vector size of 768. Therefore, the output of the curl command is a vector of length 768. -```bash -curl http://${host_ip}:6000/v1/embeddings\ - -X POST \ - -d '{"text":"hello"}' \ - -H 'Content-Type: application/json' -``` ### Retriever Microservice -To consume the retriever microservice, you need to generate a mock embedding -vector by Python script. 
The length of embedding vector is determined by the -embedding model. Here we use the -model EMBEDDING_MODEL_ID="BAAI/bge-base-en-v1.5", which vector size is 768. +To consume the retriever microservice, generate a mock embedding vector with a Python script. The length of the embedding vector is determined by the embedding model. The model is set with the environment variable EMBEDDING_MODEL_ID="BAAI/bge-base-en-v1.5", which has a vector size of 768. -Check the vector dimension of your embedding model and set -`your_embedding` dimension equal to it. +Check the vector dimension of the embedding model used and set `your_embedding` dimension equal to it. ```bash export your_embedding=$(python3 -c "import random; embedding = [random.uniform(-1, 1) for _ in range(768)]; print(embedding)") @@ -528,26 +233,21 @@ curl http://${host_ip}:7000/v1/retrieval \ -X POST \ -d "{\"text\":\"test\",\"embedding\":${your_embedding}}" \ -H 'Content-Type: application/json' - ``` -The output of the retriever microservice comprises of the a unique id for the -request, initial query or the input to the retrieval microservice, a list of top -`n` retrieved documents relevant to the input query, and top_n where n refers to -the number of documents to be returned. -The output is retrieved text that relevant to the input data: -```bash -{"id":"27210945c7c6c054fa7355bdd4cde818","retrieved_docs":[{"id":"0c1dd04b31ab87a5468d65f98e33a9f6","text":"Company: Nike. financial instruments are subject to master netting arrangements that allow for the offset of assets and liabilities in the event of default or early termination of the contract.\nAny amounts of cash collateral received related to these instruments associated with the Company's credit-related contingent features are recorded in Cash and\nequivalents and Accrued liabilities, the latter of which would further offset against the Company's derivative asset balance. Any amounts of cash collateral posted related\nto these instruments associated with the Company's credit-related contingent features are recorded in Prepaid expenses and other current assets, which would further\noffset against the Company's derivative liability balance. Cash collateral received or posted related to the Company's credit-related contingent features is presented in the\nCash provided by operations component of the Consolidated Statements of Cash Flows. The Company does not recognize amounts of non-cash collateral received, such\nas securities, on the Consolidated Balance Sheets. For further information related to credit risk, refer to Note 12 — Risk Management and Derivatives.\n2023 FORM 10-K 68Table of Contents\nThe following tables present information about the Company's derivative assets and liabilities measured at fair value on a recurring basis and indicate the level in the fair\nvalue hierarchy in which the Company classifies the fair value measurement:\nMAY 31, 2023\nDERIVATIVE ASSETS\nDERIVATIVE LIABILITIES"},{"id":"1d742199fb1a86aa8c3f7bcd580d94af","text": ... } +The output of the retriever microservice comprises of the a unique id for the request, initial query or the input to the retrieval microservice, a list of top +`n` retrieved documents relevant to the input query, and top_n where n refers to the number of documents to be returned. 
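+
+To exercise the embedding and retriever services together, a real embedding from the TEI service can be used instead of the random mock vector. The snippet below is a minimal sketch; it assumes the default ports shown above (6006 for TEI embedding, 7000 for the retriever) and that `jq` is installed.
+
+```bash
+# Embed a real query with the TEI service (it returns a list of vectors; take the first),
+# then pass it to the retriever. The retrieved list will be empty until documents
+# are ingested with the dataprep microservice described later in this guide.
+query="What was the Nike revenue in 2023"
+embedding=$(curl -s http://${host_ip}:6006/embed \
+    -X POST \
+    -d "{\"inputs\":\"${query}\"}" \
+    -H 'Content-Type: application/json' | jq -c '.[0]')
+curl http://${host_ip}:7000/v1/retrieval \
+    -X POST \
+    -d "{\"text\":\"${query}\",\"embedding\":${embedding}}" \
+    -H 'Content-Type: application/json'
+```
+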
+The output is retrieved text that is relevant to the input data: +```bash +{"id":"b16024e140e78e39a60e8678622be630","retrieved_docs":[],"initial_query":"test","top_n":1,"metadata":[]} ``` ### TEI Reranking Service -The TEI Reranking Service reranks the documents returned by the retrieval -service. It consumes the query and list of documents and returns the document -index based on decreasing order of the similarity score. The document -corresponding to the returned index with the highest score is the most relevant -document for the input query. +The TEI Reranking Service reranks the documents returned by the retrieval service. It consumes the query and list of documents and returns the document +index in decreasing order of the similarity score. The document corresponding to the index with the highest score is the most relevant document for the input query. + ```bash curl http://${host_ip}:8808/rerank \ -X POST \ @@ -555,179 +255,116 @@ curl http://${host_ip}:8808/rerank \ -H 'Content-Type: application/json' ``` -Output is: `[{"index":1,"score":0.9988041},{"index":0,"score":0.022948774}]` - - -### Reranking Microservice - - -The reranking microservice consumes the TEI Reranking service and pads the -response with default parameters required for the llm microservice. - +Sample output: ```bash -curl http://${host_ip}:8000/v1/reranking\ - -X POST \ - -d '{"initial_query":"What is Deep Learning?", "retrieved_docs": \ - [{"text":"Deep Learning is not..."}, {"text":"Deep learning is..."}]}' \ - -H 'Content-Type: application/json' -``` - -The input to the microservice is the `initial_query` and a list of retrieved -documents and it outputs the most relevant document to the initial query along -with other default parameter such as temperature, `repetition_penalty`, -`chat_template` and so on. We can also get top n documents by setting `top_n` as one -of the input parameters. For example: - -```bash -curl http://${host_ip}:8000/v1/reranking\ - -X POST \ - -d '{"initial_query":"What is Deep Learning?" ,"top_n":2, "retrieved_docs": \ - [{"text":"Deep Learning is not..."}, {"text":"Deep learning is..."}]}' \ - -H 'Content-Type: application/json' +[{"index":1,"score":0.94238955},{"index":0,"score":0.120219156}] ``` -Here is the output: - -```bash -{"id":"e1eb0e44f56059fc01aa0334b1dac313","query":"Human: Answer the question based only on the following context:\n Deep learning is...\n Question: What is Deep Learning?","max_new_tokens":1024,"top_k":10,"top_p":0.95,"typical_p":0.95,"temperature":0.01,"repetition_penalty":1.03,"streaming":true} - -``` -You may notice reranking microservice are with state ('ID' and other meta data), -while reranking service are not. - ### vLLM and TGI Service -In first startup, this service will take more time to download the model files. -After it's finished, the service will be ready. - -Try the command below to check whether the LLM serving is ready. - -```bash -docker logs ${CONTAINER_ID} | grep Connected -``` - -If the service is ready, you will get the response like below. - -``` -2024-09-03T02:47:53.402023Z INFO text_generation_router::server: router/src/server.rs:2311: Connected -``` +In first startup, this service will take a few minutes to download the model files and perform warm up. After it's finished, the service will be ready. ::::{tab-set} :::{tab-item} vllm :sync: vllm +Try the command below to check whether the LLM service is ready. The output should be "Application startup complete." 
+ ```bash -curl http://${host_ip}:9009/v1/completions \ - -H "Content-Type: application/json" \ - -d '{"model": "meta-llama/Meta-Llama-3-8B-Instruct", \ - "prompt": "What is Deep Learning?", \ - "max_tokens": 32, "temperature": 0}' +docker logs vllm-service 2>&1 | grep complete ``` -vLLM service generates text for the input prompt. Here is the expected result -from vllm: +::: +:::{tab-item} TGI +:sync: TGI + +Try the command below to check whether the LLM service is ready. The output should be "INFO text_generation_router::server: router/src/server.rs:2311: Connected" ```bash -{"generated_text":"We have all heard the buzzword, but our understanding of it is still growing. It’s a sub-field of Machine Learning, and it’s the cornerstone of today’s Machine Learning breakthroughs.\n\nDeep Learning makes machines act more like humans through their ability to generalize from very large"} +docker logs tgi-service | grep Connected ``` -**NOTE**: After launch the vLLM, it takes few minutes for vLLM server to load -LLM model and warm up. ::: -:::{tab-item} TGI -:sync: TGI +:::: +Run the command below to use the vLLM or TGI service to generate text for the input prompt. Sample output is also shown. ```bash -curl http://${host_ip}:9009/generate \ +curl http://${host_ip}:9009/v1/chat/completions \ -X POST \ - -d '{"inputs":"What is Deep Learning?", \ - "parameters":{"max_new_tokens":17, "do_sample": true}}' \ + -d '{"model": "meta-llama/Meta-Llama-3-8B-Instruct", "messages": [{"role": "user", "content": "What is Deep Learning?"}], "max_tokens":17}' \ -H 'Content-Type: application/json' - ``` -TGI service generate text for the input prompt. Here is the expected result from TGI: - ```bash -{"generated_text":"We have all heard the buzzword, but our understanding of it is still growing. It’s a sub-field of Machine Learning, and it’s the cornerstone of today’s Machine Learning breakthroughs.\n\nDeep Learning makes machines act more like humans through their ability to generalize from very large"} +{"id":"chatcmpl-cc4300a173af48989cac841f54ebca09","object":"chat.completion","created":1743553002,"model":"meta-llama/Meta-Llama-3-8B-Instruct","choices":[{"index":0,"message":{"role":"assistant","content":"Deep learning is a subfield of machine learning that is inspired by the structure and function","tool_calls":[]},"logprobs":null,"finish_reason":"length","stop_reason":null}],"usage":{"prompt_tokens":15,"total_tokens":32,"completion_tokens":17,"prompt_tokens_details":null},"prompt_logprobs":null} ``` -**NOTE**: After launch the TGI, it takes few minutes for TGI server to load LLM model and warm up. -::: -:::: - +### Dataprep Microservice -### LLM Microservice +The knowledge base can be updated using the dataprep microservice, which extracts text from a variety of data sources, chunks the data, embeds each chunk using the embedding microservice. Finally, the embedded vectors are stored in the Redis vector database. -This service depends on above LLM backend service startup. It will be ready after long time, -to wait for them being ready in first startup. +`nke-10k-2023.pdf` is Nike's annual report on a form 10-K. 
Run this command to download the file: +```bash +wget https://github.com/opea-project/GenAIComps/blob/v1.1/comps/retrievers/redis/data/nke-10k-2023.pdf +``` -::::{tab-set} +Upload the file: +```bash +curl -X POST "http://${host_ip}:6007/v1/dataprep" \ + -H "Content-Type: multipart/form-data" \ + -F "files=@./nke-10k-2023.pdf" +``` -:::{tab-item} vllm -:sync: vllm +HTTP links can also be added to the knowledge base. This command adds the opea.dev website. +```bash +curl -X POST "http://${host_ip}:6007/v1/dataprep" \ + -H "Content-Type: multipart/form-data" \ + -F 'link_list=["https://opea.dev"]' +``` +The list of uploaded files can be retrieved using this command: ```bash -curl http://${host_ip}:9000/v1/chat/completions \ - -X POST \ - -d '{"query":"What is Deep Learning?","max_tokens":17,"top_p":1,"temperature":0.7,\ - "frequency_penalty":0,"presence_penalty":0, "streaming":true}' \ - -H 'Content-Type: application/json' +curl -X POST "http://${host_ip}:6007/v1/dataprep/get_file" \ + -H "Content-Type: application/json" ``` -For parameters in vLLM modes, can refer to [LangChain VLLMOpenAI API](https://huggingface.co/docs/huggingface_hub/package_reference/inference_client#huggingface_hub.InferenceClient.text_generation) -::: -:::{tab-item} TGI -:sync: TGI +To delete the file or link, use the following commands: +#### Delete link ```bash -curl http://${host_ip}:9000/v1/chat/completions\ - -X POST \ - -d '{"query":"What is Deep Learning?","max_new_tokens":17,"top_k":10,"top_p":0.95,\ - "typical_p":0.95,"temperature":0.01,"repetition_penalty":1.03,"streaming":true}' \ - -H 'Content-Type: application/json' +# The dataprep service will add a .txt postfix for link file +curl -X POST "http://${host_ip}:6007/v1/dataprep/delete_file" \ + -d '{"file_path": "https://opea.dev.txt"}' \ + -H "Content-Type: application/json" ``` -For parameters in TGI modes, please refer to [HuggingFace InferenceClient API](https://huggingface.co/docs/huggingface_hub/package_reference/inference_client#huggingface_hub.InferenceClient.text_generation) (except we rename "max_new_tokens" to "max_tokens".) -::: -:::: +#### Delete file + +```bash +curl -X POST "http://${host_ip}:6007/v1/dataprep/delete_file" \ + -d '{"file_path": "nke-10k-2023.pdf"}' \ + -H "Content-Type: application/json" +``` -You will get generated text from LLM: +#### Delete all uploaded files and links ```bash -data: b'\n' -data: b'\n' -data: b'Deep' -data: b' learning' -data: b' is' -data: b' a' -data: b' subset' -data: b' of' -data: b' machine' -data: b' learning' -data: b' that' -data: b' uses' -data: b' algorithms' -data: b' to' -data: b' learn' -data: b' from' -data: b' data' -data: [DONE] +curl -X POST "http://${host_ip}:6007/v1/dataprep/delete_file" \ + -d '{"file_path": "all"}' \ + -H "Content-Type: application/json" ``` -### MegaService +### ChatQnA MegaService ```bash curl http://${host_ip}:8888/v1/chatqna -H "Content-Type: application/json" -d '{ "messages": "What is the revenue of Nike in 2023?" }' - ``` -Here is the output for your reference: - +Here is the output for reference: ```bash data: b'\n' data: b'An' @@ -764,149 +401,51 @@ data: b'' data: [DONE] ``` -## Check docker container log - -Check the log of container by: - -`docker logs -t` - - -Check the log by `docker logs f7a08f9867f9 -t`. - -``` -2024-06-05T01:30:30.695934928Z error: a value is required for '--model-id ' but none was supplied -2024-06-05T01:30:30.697123534Z -2024-06-05T01:30:30.697148330Z For more information, try '--help'. 
- -``` - -The log indicates the `MODEL_ID` is not set. - - -::::{tab-set} - -:::{tab-item} vllm -:sync: vllm - -View the docker input parameters in `$WORKSPACE/GenAIExamples/ChatQnA/docker_compose/intel/cpu/xeon/compose.yaml` - -```yaml -vllm_service: - image: ${REGISTRY:-opea}/vllm:${TAG:-latest} - container_name: vllm-service - ports: - - "9009:80" - volumes: - - "./data:/data" - shm_size: 128g - environment: - no_proxy: ${no_proxy} - http_proxy: ${http_proxy} - https_proxy: ${https_proxy} - HF_TOKEN: ${HUGGINGFACEHUB_API_TOKEN} - LLM_MODEL_ID: ${LLM_MODEL_ID} - command: --model $LLM_MODEL_ID --host 0.0.0.0 --port 80 - -``` -::: -:::{tab-item} TGI -:sync: TGI - -View the docker input parameters in `$WORKSPACE/GenAIExamples/ChatQnA/docker_compose/intel/cpu/xeon/compose_tgi.yaml` - -```yaml - tgi-service: - image: ghcr.io/huggingface/text-generation-inference:sha-e4201f4-intel-cpu - container_name: tgi-service - ports: - - "9009:80" - volumes: - - "./data:/data" - shm_size: 1g - environment: - no_proxy: ${no_proxy} - http_proxy: ${http_proxy} - https_proxy: ${https_proxy} - HF_TOKEN: ${HUGGINGFACEHUB_API_TOKEN} - HF_HUB_DISABLE_PROGRESS_BARS: 1 - HF_HUB_ENABLE_HF_TRANSFER: 0 - command: --model-id ${LLM_MODEL_ID} --cuda-graphs 0 - -``` -::: -:::: - - -The input `MODEL_ID` is `${LLM_MODEL_ID}` - -Check environment variable `LLM_MODEL_ID` is set correctly, spelled correctly. -Set the `LLM_MODEL_ID` then restart the containers. - -Also you can check overall logs with the following command, where the -compose.yaml is the mega service docker-compose configuration file. - -::::{tab-set} - -:::{tab-item} vllm -:sync: vllm - -```bash -docker compose -f compose.yaml logs -``` -::: -:::{tab-item} TGI -:sync: TGI - -```bash -docker compose -f compose_tgi.yaml logs -``` -::: -:::: - ## Launch UI ### Basic UI -To access the frontend, open the following URL in your browser: http://{host_ip}:5173. By default, the UI runs on port 5173 internally. If you prefer to use a different host port to access the frontend, you can modify the port mapping in the compose.yaml file as shown below: +To access the frontend, open the following URL in a web browser: http://{host_ip}:80. By default, the UI runs on port 5173 internally. A different host port can be used to access the frontend. Simply modify the port mapping in the `compose.yaml` file as shown below: ```yaml - chaqna-xeon-ui-server: + chatqna-xeon-ui-server: image: opea/chatqna-ui:${TAG:-latest} ... ports: - "5173:5173" ``` -### Conversational UI +### (Optional) Conversational UI -To access the Conversational UI (react based) frontend, modify the UI service in the `compose.yaml` file. Replace `chaqna-xeon-ui-server` service with the `chatqna-xeon-conversation-ui-server` service as per the config below: +To access the Conversational UI (react based) frontend, modify the UI service in the `compose.yaml` file. Replace `chatqna-xeon-ui-server` service with the `chatqna-xeon-conversation-ui-server` service as shown below: ```yaml -chaqna-xeon-conversation-ui-server: +chaqtna-xeon-conversation-ui-server: image: opea/chatqna-conversation-ui:${TAG:-latest} container_name: chatqna-xeon-conversation-ui-server environment: - APP_BACKEND_SERVICE_ENDPOINT=${BACKEND_SERVICE_ENDPOINT} - APP_DATA_PREP_SERVICE_URL=${DATAPREP_SERVICE_ENDPOINT} ports: - - "5174:5174" + - "5174:80" depends_on: - - chaqna-xeon-backend-server + - chatqna-xeon-backend-server ipc: host restart: always ``` -Once the services are up, open the following URL in your browser: http://{host_ip}:5174. 
By default, the UI runs on port 80 internally. If you prefer to use a different host port to access the frontend, you can modify the port mapping in the `compose.yaml` file as shown below: +In addition, modify the `chatqna-xeon-nginx-server` `depends_on` field to include `chatqna-xeon-conversation-ui-server` instead of `chatqna-xeon-ui-server`. +Once the services are up, open the following URL in a web browser: http://{host_ip}:5174. By default, the UI runs on port 80 internally. A different host port can be used to access the frontend. Simply modify the port mapping in the `compose.yaml` file as shown below: ```yaml - chaqna-xeon-conversation-ui-server: + chatqna-xeon-conversation-ui-server: image: opea/chatqna-conversation-ui:${TAG:-latest} ... ports: - "80:80" ``` -### Stop the services +## Stop the services -Once you are done with the entire pipeline and wish to stop and remove all the containers, use the command below: +To stop and remove all the containers, use the command below: ::::{tab-set} :::{tab-item} vllm @@ -920,7 +459,7 @@ docker compose -f compose.yaml down :sync: TGI ```bash -docker compose -f compose.yaml down +docker compose -f compose_tgi.yaml down ``` ::: :::: diff --git a/tutorial/CodeGen/deploy/xeon.md b/tutorial/CodeGen/deploy/xeon.md index 4541d970..771ac968 100644 --- a/tutorial/CodeGen/deploy/xeon.md +++ b/tutorial/CodeGen/deploy/xeon.md @@ -49,12 +49,6 @@ cd $WORKSPACE # Set desired release version - number only export RELEASE_VERSION= -# GenAIComps -git clone https://github.com/opea-project/GenAIComps.git -cd GenAIComps -git checkout tags/v${RELEASE_VERSION} -cd .. - # GenAIExamples git clone https://github.com/opea-project/GenAIExamples.git cd GenAIExamples @@ -85,94 +79,9 @@ export http_proxy=${your_http_proxy} export https_proxy=${your_http_proxy} ``` -## Prepare (Building / Pulling) Docker images - -This step will involve building/pulling relevant docker -images with step-by-step process along with sanity check in the end. For -CodeGen, the following docker images will be needed: LLM with TGI. -Additionally, you will need to build docker images for the -CodeGen megaservice, and UI (React UI is optional). In total, -there are **3 required docker images** and an optional docker image. - -### Build/Pull Microservice image - -::::::{tab-set} - -:::::{tab-item} Pull -:sync: Pull - -If you decide to pull the docker containers and not build them locally, -you can proceed to the next step where all the necessary containers will -be pulled in from Docker Hub. - -::::: -:::::{tab-item} Build -:sync: Build - -Follow the steps below to build the docker images from within the `GenAIComps` folder. -**Note:** For RELEASE_VERSIONS older than 1.0, you will need to add a 'v' in front -of ${RELEASE_VERSION} to reference the correct image on Docker Hub. - -```bash -cd $WORKSPACE/GenAIComps -``` - -#### Build LLM Image - -```bash -docker build -t opea/llm-textgen:${RELEASE_VERSION} --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f comps/llms/src/text-generation/Dockerfile . -``` - -### Build Mega Service images - -The Megaservice is a pipeline that channels data through different -microservices, each performing varied tasks. The LLM microservice and -flow of data are defined in the `codegen.py` file. You can also add or -remove microservices and customize the megaservice to suit your needs. 
- -Build the megaservice image for this use case - -```bash -cd $WORKSPACE/GenAIExamples/CodeGen -``` - -```bash -docker build -t opea/codegen:${RELEASE_VERSION} --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f Dockerfile . -``` - -### Build the UI Image - -You can build 2 modes of UI - -*Basic UI* - -```bash -cd $WORKSPACE/GenAIExamples/CodeGen/ui/ -docker build -t opea/codegen-ui:${RELEASE_VERSION} --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f ./docker/Dockerfile . -``` - -*React UI (Optional)* -If you want a React-based frontend. - -```bash -cd $WORKSPACE/GenAIExamples/CodeGen/ui/ -docker build --no-cache -t opea/codegen-react-ui:${RELEASE_VERSION} --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f ./docker/Dockerfile.react . -``` - -### Sanity Check -Check if you have the following set of docker images by running the command `docker images` before moving on to the next step: - -* `opea/llm-tgi:${RELEASE_VERSION}` -* `opea/codegen:${RELEASE_VERSION}` -* `opea/codegen-ui:${RELEASE_VERSION}` -* `opea/codegen-react-ui:${RELEASE_VERSION}` (optional) - -::::: -:::::: - ## Use Case Setup -The use case will use the following combination of GenAIComps and tools +The use case will use the following combination of GenAIComps and tools: |Use Case Components | Tools | Model | Service Type | |---------------- |--------------|-----------------------------|-------| @@ -239,11 +148,12 @@ containers are all running, i.e, all the containers `STATUS` are `Up`. You can do this with the `docker ps -a` command. ```bash -CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES -bbd235074c3d opea/codegen-ui:${RELEASE_VERSION} "docker-entrypoint.s…" About a minute ago Up About a minute 0.0.0.0:5173->5173/tcp, :::5173->5173/tcp codegen-xeon-ui-server -8d3872ca66fa opea/codegen:${RELEASE_VERSION} "python codegen.py" About a minute ago Up About a minute 0.0.0.0:7778->7778/tcp, :::7778->7778/tcp codegen-xeom-backend-server -b9fc39f51cdb opea/llm-tgi:${RELEASE_VERSION} "bash entrypoint.sh" About a minute ago Up About a minute 0.0.0.0:9000->9000/tcp, :::9000->9000/tcp llm-tgi-xeon-server -39994e007f15 ghcr.io/huggingface/text-generation-inference:2.4.0-intel-cpu "text-generation-lau…" About a minute ago Up About a minute 0.0.0.0:8028->80/tcp, :::8028->80/tcp tgi-server +CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES +2b8b191b30f7 opea/codegen-ui:${RELEASE_VERSION} "docker-entrypoint.s…" 8 minutes ago Up 6 minutes 0.0.0.0:5173->5173/tcp, [::]:5173->5173/tcp codegen-xeon-ui-server +01000c65d1b8 opea/codegen:${RELEASE_VERSION} "python codegen.py" 8 minutes ago Up 6 minutes 0.0.0.0:7778->7778/tcp, [::]:7778->7778/tcp codegen-xeon-backend-server +aa1b05a9a148 opea/llm-textgen:${RELEASE_VERSION} "bash entrypoint.sh" 8 minutes ago Up 6 minutes 0.0.0.0:9000->9000/tcp, [::]:9000->9000/tcp llm-textgen-server +948d45c46721 ghcr.io/huggingface/text-generation-inference:2.4.0-intel-cpu "text-generation-lau…" 8 minutes ago Up 8 minutes (healthy) 0.0.0.0:8028->80/tcp, [::]:8028->80/tcp tgi-service + ``` ## Interacting with CodeGen for Deployment From 9e43a2a0e1a631b762fe8adc1bd35a2ef9daa0e8 Mon Sep 17 00:00:00 2001 From: alexsin368 Date: Tue, 1 Apr 2025 18:34:03 -0700 Subject: [PATCH 02/13] undo CodeGen changes Signed-off-by: alexsin368 --- tutorial/CodeGen/deploy/xeon.md | 106 +++++++++++++++++++++++++++++--- 1 file changed, 98 insertions(+), 8 deletions(-) diff --git a/tutorial/CodeGen/deploy/xeon.md b/tutorial/CodeGen/deploy/xeon.md index 
771ac968..8cdefb4e 100644 --- a/tutorial/CodeGen/deploy/xeon.md +++ b/tutorial/CodeGen/deploy/xeon.md @@ -49,6 +49,12 @@ cd $WORKSPACE # Set desired release version - number only export RELEASE_VERSION= +# GenAIComps +git clone https://github.com/opea-project/GenAIComps.git +cd GenAIComps +git checkout tags/v${RELEASE_VERSION} +cd .. + # GenAIExamples git clone https://github.com/opea-project/GenAIExamples.git cd GenAIExamples @@ -79,9 +85,94 @@ export http_proxy=${your_http_proxy} export https_proxy=${your_http_proxy} ``` +## Prepare (Building / Pulling) Docker images + +This step will involve building/pulling relevant docker +images with step-by-step process along with sanity check in the end. For +CodeGen, the following docker images will be needed: LLM with TGI. +Additionally, you will need to build docker images for the +CodeGen megaservice, and UI (React UI is optional). In total, +there are **3 required docker images** and an optional docker image. + +### Build/Pull Microservice image + +::::::{tab-set} + +:::::{tab-item} Pull +:sync: Pull + +If you decide to pull the docker containers and not build them locally, +you can proceed to the next step where all the necessary containers will +be pulled in from Docker Hub. + +::::: +:::::{tab-item} Build +:sync: Build + +Follow the steps below to build the docker images from within the `GenAIComps` folder. +**Note:** For RELEASE_VERSIONS older than 1.0, you will need to add a 'v' in front +of ${RELEASE_VERSION} to reference the correct image on Docker Hub. + +```bash +cd $WORKSPACE/GenAIComps +``` + +#### Build LLM Image + +```bash +docker build -t opea/llm-textgen:${RELEASE_VERSION} --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f comps/llms/src/text-generation/Dockerfile . +``` + +### Build Mega Service images + +The Megaservice is a pipeline that channels data through different +microservices, each performing varied tasks. The LLM microservice and +flow of data are defined in the `codegen.py` file. You can also add or +remove microservices and customize the megaservice to suit your needs. + +Build the megaservice image for this use case + +```bash +cd $WORKSPACE/GenAIExamples/CodeGen +``` + +```bash +docker build -t opea/codegen:${RELEASE_VERSION} --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f Dockerfile . +``` + +### Build the UI Image + +You can build 2 modes of UI + +*Basic UI* + +```bash +cd $WORKSPACE/GenAIExamples/CodeGen/ui/ +docker build -t opea/codegen-ui:${RELEASE_VERSION} --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f ./docker/Dockerfile . +``` + +*React UI (Optional)* +If you want a React-based frontend. + +```bash +cd $WORKSPACE/GenAIExamples/CodeGen/ui/ +docker build --no-cache -t opea/codegen-react-ui:${RELEASE_VERSION} --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f ./docker/Dockerfile.react . 
+``` + +### Sanity Check +Check if you have the following set of docker images by running the command `docker images` before moving on to the next step: + +* `opea/llm-tgi:${RELEASE_VERSION}` +* `opea/codegen:${RELEASE_VERSION}` +* `opea/codegen-ui:${RELEASE_VERSION}` +* `opea/codegen-react-ui:${RELEASE_VERSION}` (optional) + +::::: +:::::: + ## Use Case Setup -The use case will use the following combination of GenAIComps and tools: +The use case will use the following combination of GenAIComps and tools |Use Case Components | Tools | Model | Service Type | |---------------- |--------------|-----------------------------|-------| @@ -148,12 +239,11 @@ containers are all running, i.e, all the containers `STATUS` are `Up`. You can do this with the `docker ps -a` command. ```bash -CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES -2b8b191b30f7 opea/codegen-ui:${RELEASE_VERSION} "docker-entrypoint.s…" 8 minutes ago Up 6 minutes 0.0.0.0:5173->5173/tcp, [::]:5173->5173/tcp codegen-xeon-ui-server -01000c65d1b8 opea/codegen:${RELEASE_VERSION} "python codegen.py" 8 minutes ago Up 6 minutes 0.0.0.0:7778->7778/tcp, [::]:7778->7778/tcp codegen-xeon-backend-server -aa1b05a9a148 opea/llm-textgen:${RELEASE_VERSION} "bash entrypoint.sh" 8 minutes ago Up 6 minutes 0.0.0.0:9000->9000/tcp, [::]:9000->9000/tcp llm-textgen-server -948d45c46721 ghcr.io/huggingface/text-generation-inference:2.4.0-intel-cpu "text-generation-lau…" 8 minutes ago Up 8 minutes (healthy) 0.0.0.0:8028->80/tcp, [::]:8028->80/tcp tgi-service - +CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES +bbd235074c3d opea/codegen-ui:${RELEASE_VERSION} "docker-entrypoint.s…" About a minute ago Up About a minute 0.0.0.0:5173->5173/tcp, :::5173->5173/tcp codegen-xeon-ui-server +8d3872ca66fa opea/codegen:${RELEASE_VERSION} "python codegen.py" About a minute ago Up About a minute 0.0.0.0:7778->7778/tcp, :::7778->7778/tcp codegen-xeom-backend-server +b9fc39f51cdb opea/llm-tgi:${RELEASE_VERSION} "bash entrypoint.sh" About a minute ago Up About a minute 0.0.0.0:9000->9000/tcp, :::9000->9000/tcp llm-tgi-xeon-server +39994e007f15 ghcr.io/huggingface/text-generation-inference:2.4.0-intel-cpu "text-generation-lau…" About a minute ago Up About a minute 0.0.0.0:8028->80/tcp, :::8028->80/tcp tgi-server ``` ## Interacting with CodeGen for Deployment @@ -296,4 +386,4 @@ the newly selected model. Once you are done with the entire pipeline and wish to stop and remove all the containers, use the command below: ```bash docker compose down -``` +``` \ No newline at end of file From d06b05e7591af7f9bba7f29fc341d3fc5cfa6625 Mon Sep 17 00:00:00 2001 From: alexsin368 Date: Wed, 2 Apr 2025 14:50:41 -0700 Subject: [PATCH 03/13] update remaining docs, typos, NGINX Signed-off-by: alexsin368 --- tutorial/ChatQnA/deploy/aipc.md | 538 +++++++---------------------- tutorial/ChatQnA/deploy/gaudi.md | 70 ++-- tutorial/ChatQnA/deploy/nvidia.md | 543 +++++------------------------- tutorial/ChatQnA/deploy/xeon.md | 38 ++- 4 files changed, 266 insertions(+), 923 deletions(-) diff --git a/tutorial/ChatQnA/deploy/aipc.md b/tutorial/ChatQnA/deploy/aipc.md index 43ba6229..8def0abd 100644 --- a/tutorial/ChatQnA/deploy/aipc.md +++ b/tutorial/ChatQnA/deploy/aipc.md @@ -1,17 +1,10 @@ # Single node on-prem deployment with Ollama on AIPC -This deployment section covers single-node on-prem deployment of the ChatQnA -example with OPEA comps to deploy using Ollama. 
There are several -slice-n-dice ways to enable RAG with vectordb and LLM models, but here we will -be covering one option of doing it for convenience : we will be showcasing how -to build an e2e chatQnA with Redis VectorDB and the llama-3 model, -deployed on the client CPU. +This deployment section covers single-node on-prem deployment of the ChatQnA example using the Ollama. There are several ways to enable RAG with vectordb and LLM models, but this tutorial will be covering how to build an end-to-end ChatQnA pipeline with the Redis vector database and a llama-3 model deployed on the client CPU. ## Overview -There are several ways to setup a ChatQnA use case. Here in this tutorial, we -will walk through how to enable the below list of microservices from OPEA -GenAIComps to deploy a single node Ollama megaservice solution. +The list of microservices from OPEA GenAIComps are used to deploy a single node Ollama megaservice solution for ChatQnA. 1. Data Prep 2. Embedding @@ -19,20 +12,11 @@ GenAIComps to deploy a single node Ollama megaservice solution. 4. Reranking 5. LLM with Ollama -The solution is aimed to show how to use Redis vectordb for RAG and -the llama-3 model on Intel Client PCs. We will go through -how to setup docker container to start microservices and megaservice. -The solution will then utilize a sample Nike dataset which is in PDF format. Users -can then ask a question about Nike and get a chat-like response by default for -up to 1024 tokens. The solution is deployed with a UI. +The solution is aimed to show how to use Redis vectorDB for RAG and the llama-3 model for LLM inference on Intel Client PCs. Steps will include setting up docker containers, utilizing a sample Nike dataset in PDF format, and asking a question about Nike to get a response. There are 2 modes of UI that can be deployed: ## Prerequisites -The first step is to clone the GenAIExamples and GenAIComps projects. GenAIComps are -fundamental necessary components used to build the examples you find in -GenAIExamples and deploy them as microservices. Set an environment -variable for the desired release version with the **number only** -(i.e. 1.0, 1.1, etc) and checkout using the tag with that version. +The first step is to clone the [GenAIExamples](https://github.com/opea-project/GenAIExamples) GitHub repo. Set a workspace path and the desired release version with the **number only** (i.e. 1.0, 1.1, etc) and checkout that version using the tag. ```bash # Set workspace @@ -42,12 +26,6 @@ cd $WORKSPACE # Set desired release version - number only export RELEASE_VERSION= -# GenAIComps -git clone https://github.com/opea-project/GenAIComps.git -cd GenAIComps -git checkout tags/v${RELEASE_VERSION} -cd .. - # GenAIExamples git clone https://github.com/opea-project/GenAIExamples.git cd GenAIExamples @@ -55,33 +33,36 @@ git checkout tags/v${RELEASE_VERSION} cd .. ``` -Setup your [HuggingFace](https://huggingface.co/) account and generate -[user access token](https://huggingface.co/docs/transformers.js/en/guides/private#step-1-generating-a-user-access-token). - -Setup the HuggingFace token +Set up a [HuggingFace](https://huggingface.co/) account and generate a [user access token](https://huggingface.co/docs/transformers.js/en/guides/private#step-1-generating-a-user-access-token). Then set an environment variable with the HuggingFace token: ```bash export HUGGINGFACEHUB_API_TOKEN="Your_Huggingface_API_Token" ``` -The example requires you to set the `host_ip` to deploy the microservices on -endpoint enabled with ports. 
Set the host_ip env variable +The example requires setting the `host_ip` to "localhost" to deploy the microservices on endpoints enabled with ports. ```bash -export host_ip=$(hostname -I | awk '{print $1}') +export host_ip="localhost" ``` -Make sure to setup Proxies if you are behind a firewall +Set the NGINX port. ```bash -export no_proxy=${your_no_proxy},$host_ip -export http_proxy=${your_http_proxy} -export https_proxy=${your_http_proxy} +# Example: NGINX_PORT=80 +export NGINX_PORT= +``` + +For machines behind a firewall, set up the proxy environment variables: +```bash +export http_proxy="Your_HTTP_Proxy" +export https_proxy="Your_HTTPs_Proxy" +# Example: no_proxy="localhost, 127.0.0.1, 192.168.1.1" +export no_proxy="Your_No_Proxy",chatqna-xeon-ui-server,chatqna-xeon-backend-server,dataprep-redis-service,tei-embedding-service,retriever,tei-reranking-service,tgi-service,vllm-service ``` The examples utilize model weights from Ollama and langchain. ### Set Up Ollama LLM Service -We use [Ollama](https://ollama.com/) as our LLM service for AIPC. +Use [Ollama](https://ollama.com/) as the LLM service for AIPC. -Please follow the instructions to set up Ollama on your PC. This will set the entrypoint needed for the Ollama to suit the ChatQnA examples. +Please follow the instructions to set up Ollama on the PC. This will set the entrypoint needed for the Ollama to work with the ChatQnA example. #### Install Ollama Service @@ -94,7 +75,7 @@ curl -fsSL https://ollama.com/install.sh | sh #### Set Ollama Service Configuration Ollama Service Configuration file is /etc/systemd/system/ollama.service. Edit the file to set OLLAMA_HOST environment. -Replace **** with your host IPV4 (please use external public IP). For example the host_ip is 10.132.x.y, then `Environment="OLLAMA_HOST=10.132.x.y:11434"'. +Replace **** with the host IPV4 (please use external public IP). For example if the host_ip is 10.132.x.y, then `Environment="OLLAMA_HOST=10.132.x.y:11434"'. ```bash Environment="OLLAMA_HOST=host_ip:11434" @@ -102,8 +83,7 @@ Environment="OLLAMA_HOST=host_ip:11434" #### Set https_proxy environment for Ollama -If your system access network through proxy, add https_proxy in Ollama Service Configuration file - +If the system access network is through a proxy, add https_proxy in the Ollama Service Configuration file: ```bash Environment="https_proxy=Your_HTTPS_Proxy" ``` @@ -115,13 +95,13 @@ sudo systemctl daemon-reload sudo systemctl restart ollama.service ``` -#### Check the service started +#### Check if the service started ```bash netstat -tuln | grep 11434 ``` -The output are: +The output is: ```bash tcp 0 0 10.132.x.y:11434 0.0.0.0:* LISTEN @@ -137,7 +117,7 @@ export OLLAMA_HOST=http://${host_ip}:11434 ollama pull llama3.2 ``` -After downloaded the models, you can list the models by `ollama list`. +After downloading the models, list the models by `ollama list`. The output should be similar to the following: @@ -148,13 +128,13 @@ llama3.2:latest a80c4f17acd5 2.0 GB 2 minutes ago ### Consume Ollama LLM Service -Access ollama service to verify that the ollama is functioning correctly. +Access ollama service to verify that Ollama is functioning correctly. ```bash curl http://${host_ip}:11434/api/generate -d '{"model": "llama3.2", "prompt":"What is Deep Learning?"}' ``` -The outputs are similar to these: +The output may look like this: ```bash {"model":"llama3.2","created_at":"2024-10-12T12:55:28.098813868Z","response":"Deep","done":false} @@ -173,146 +153,9 @@ The outputs are similar to these: ... 
``` -## Prepare (Building / Pulling) Docker images - -This step will involve building/pulling relevant docker -images with step-by-step process along with sanity check in the end. For -ChatQnA, the following docker images will be needed: embedding, retriever, -rerank, LLM and dataprep. Additionally, you will need to build docker images for -ChatQnA megaservice, and UI. In total, there are 7 required docker images. - -The docker images needed to setup the example needs to be build local, however -the images will be pushed to docker hub soon by Intel. - -### Build/Pull Microservice images - -::::::{tab-set} - -:::::{tab-item} Pull -:sync: Pull - -If you decide to pull the docker containers and not build them locally, -you can proceed to the next step where all the necessary containers will -be pulled in from Docker Hub. - -::::: -:::::{tab-item} Build -:sync: Build - -Follow the steps below to build the docker images from within the `GenAIComps` folder. -**Note:** For RELEASE_VERSIONS older than 1.0, you will need to add a 'v' in front -of ${RELEASE_VERSION} to reference the correct image on Docker Hub. - -```bash -cd $WORKSPACE/GenAIComps -``` - -#### Build Dataprep Image - -```bash -docker build --no-cache -t opea/dataprep-redis:${RELEASE_VERSION} --build-arg https_proxy=$https_proxy \ - --build-arg http_proxy=$http_proxy -f comps/dataprep/redis/langchain/Dockerfile . -``` - -#### Build Embedding Image - -```bash -docker build --no-cache -t opea/embedding-tei:${RELEASE_VERSION} --build-arg https_proxy=$https_proxy \ - --build-arg http_proxy=$http_proxy -f comps/embeddings/tei/langchain/Dockerfile . -``` - -#### Build Retriever Image - -```bash - docker build --no-cache -t opea/retriever-redis:${RELEASE_VERSION} --build-arg https_proxy=$https_proxy \ - --build-arg http_proxy=$http_proxy -f comps/retrievers/redis/langchain/Dockerfile . -``` - -#### Build Rerank Image - -```bash -docker build --no-cache -t opea/reranking-tei:${RELEASE_VERSION} --build-arg https_proxy=$https_proxy \ - --build-arg http_proxy=$http_proxy -f comps/reranks/tei/Dockerfile . -``` - -#### Build LLM Image - -::::{tab-set} - -:::{tab-item} Ollama -:sync: Ollama - - -Next, we'll build the Ollama microservice docker. This will set the entry point -needed for Ollama to suit the ChatQnA examples -```bash -docker build --no-cache -t opea/llm-ollama:${RELEASE_VERSION} --build-arg https_proxy=$https_proxy \ - --build-arg http_proxy=$http_proxy -f comps/llms/text-generation/ollama/langchain/Dockerfile . -``` - -::: -:::: - - -### Build Mega Service images - -The Megaservice is a pipeline that channels data through different -microservices, each performing varied tasks. We define the different -microservices and the flow of data between them in the `chatqna.py` file, say in -this example the output of embedding microservice will be the input of retrieval -microservice which will in turn passes data to the reranking microservice and so -on. You can also add newer or remove some microservices and customize the -megaservice to suit the needs. - -Build the megaservice image for this use case - -```bash -cd $WORKSPACE/GenAIExamples/ChatQnA -``` - -```bash -docker build --no-cache -t opea/chatqna:${RELEASE_VERSION} --build-arg https_proxy=$https_proxy \ - --build-arg http_proxy=$http_proxy -f Dockerfile . 
-``` - -### Build Other Service images - -#### Build the UI Image - -*UI* - -```bash -cd $WORKSPACE/GenAIExamples/ChatQnA/ui/ -docker build --no-cache -t opea/chatqna-ui:${RELEASE_VERSION} --build-arg https_proxy=$https_proxy \ - --build-arg http_proxy=$http_proxy -f ./docker/Dockerfile . -``` - -### Sanity Check -Check if you have the below set of docker images, before moving on to the next step: - -::::{tab-set} -:::{tab-item} Ollama -:sync: Ollama - -* opea/dataprep-redis:${RELEASE_VERSION} -* opea/embedding-tei:${RELEASE_VERSION} -* opea/retriever-redis:${RELEASE_VERSION} -* opea/reranking-tei:${RELEASE_VERSION} -* opea/llm-ollama:${RELEASE_VERSION} -* opea/chatqna:${RELEASE_VERSION} -* opea/chatqna-ui:${RELEASE_VERSION} -::: - -:::: - -::::: -:::::: - - ## Use Case Setup -As mentioned the use case will use the following combination of the GenAIComps -with the tools +ChatQnA will use the following GenAIComps and corresponding tools. Tools and models mentioned in the table are configurable either through environment variables in the `set_env.sh` or `compose.yaml` file. ::::{tab-set} @@ -328,24 +171,19 @@ with the tools |LLM | Ollama | llama3 |OPEA Microservice | |UI | | NA | Gateway Service | -Tools and models mentioned in the table are configurable either through the -environment variable or `compose.yaml` file. ::: :::: -Set the necessary environment variables to setup the use case. If you want to swap -out models, modify `set_env.sh` before running. +Set the necessary environment variables to set up the use case. To swap out models, modify `set_env.sh` before running it. For example, the environment variable `LLM_MODEL_ID` can be changed to another model by specifying the HuggingFace model card ID. ```bash cd $WORKSPACE/GenAIExamples/ChatQnA/docker_compose/intel/cpu/aipc source ./set_env.sh ``` -## Deploy the use case +## Deploy the Use Case -In this tutorial, we will be deploying via docker compose with the provided -YAML file. The docker compose instructions should be starting all the -above mentioned services as containers. +Run `docker compose` with the provided YAML file to start all the services mentioned above as containers. ::::{tab-set} :::{tab-item} Ollama @@ -358,12 +196,8 @@ docker compose -f compose.yaml up -d ::: :::: - -### Validate microservice - #### Check Env Variables -Check the start up log by `docker compose -f ./compose.yaml logs`. -The warning messages print out the variables if they are **NOT** set. +After running `docker compose`, check for warning messages for environment variables that are **NOT** set. Address them if needed. ::::{tab-set} :::{tab-item} Ollama @@ -384,9 +218,14 @@ The warning messages print out the variables if they are **NOT** set. #### Check the container status -Check if all the containers launched via docker compose has started. +Check if all the containers launched via `docker compose` are running i.e. each container's `STATUS` is `Up` and `Healthy`. -For example, the ChatQnA example starts 11 docker (services), check these docker containers are all running. That is, all the containers `STATUS` are `Up`. To do a quick sanity check, try `docker ps -a` to see if all the containers are running. +Run this command to see this info: +```bash +docker ps -a +``` + +The sample output is for OPEA release v1.2. 
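+
+Note that the Ollama service runs natively on the host rather than as a container, so it does not appear in the `docker ps` output. The check below is a minimal sketch that assumes Ollama is listening on the default port 11434 configured earlier.
+
+```bash
+# List the models served by Ollama to confirm the endpoint is reachable
+# and that the llama3.2 model was pulled successfully.
+curl http://${host_ip}:11434/api/tags
+```
+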
::::{tab-set} @@ -395,12 +234,12 @@ For example, the ChatQnA example starts 11 docker (services), check these docker ```bash CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES -5db065a9fdf9 opea/chatqna-ui:${RELEASE_VERSION} "docker-entrypoint.s…" 29 seconds ago Up 25 seconds 0.0.0.0:5173->5173/tcp, :::5173->5173/tcp chatqna-aipc-ui-server -6fa87927d00c opea/chatqna:${RELEASE_VERSION} "python chatqna.py" 29 seconds ago Up 25 seconds 0.0.0.0:8888->8888/tcp, :::8888->8888/tcp chatqna-aipc-backend-server -bdc93be9ce0c opea/retriever-redis:${RELEASE_VERSION} "python retriever_re…" 29 seconds ago Up 3 seconds 0.0.0.0:7000->7000/tcp, :::7000->7000/tcp retriever-redis-server -add761b504bc opea/reranking-tei:${RELEASE_VERSION} "python reranking_te…" 29 seconds ago Up 26 seconds 0.0.0.0:8000->8000/tcp, :::8000->8000/tcp reranking-tei-aipc-server -d6b540a423ac opea/dataprep-redis:${RELEASE_VERSION} "python prepare_doc_…" 29 seconds ago Up 26 seconds 0.0.0.0:6007->6007/tcp, :::6007->6007/tcp dataprep-redis-server -6662d857a154 opea/embedding-tei:${RELEASE_VERSION} "python embedding_te…" 29 seconds ago Up 26 seconds 0.0.0.0:6000->6000/tcp, :::6000->6000/tcp embedding-tei-server +5db065a9fdf9 opea/chatqna-ui:1.2 "docker-entrypoint.s…" 29 seconds ago Up 25 seconds 0.0.0.0:5173->5173/tcp, :::5173->5173/tcp chatqna-aipc-ui-server +6fa87927d00c opea/chatqna:1.2 "python chatqna.py" 29 seconds ago Up 25 seconds 0.0.0.0:8888->8888/tcp, :::8888->8888/tcp chatqna-aipc-backend-server +bdc93be9ce0c opea/retriever-redis:1.2 "python retriever_re…" 29 seconds ago Up 3 seconds 0.0.0.0:7000->7000/tcp, :::7000->7000/tcp retriever-redis-server +add761b504bc opea/reranking-tei:1.2 "python reranking_te…" 29 seconds ago Up 26 seconds 0.0.0.0:8000->8000/tcp, :::8000->8000/tcp reranking-tei-aipc-server +d6b540a423ac opea/dataprep-redis:1.2 "python prepare_doc_…" 29 seconds ago Up 26 seconds 0.0.0.0:6007->6007/tcp, :::6007->6007/tcp dataprep-redis-server +6662d857a154 opea/embedding-tei:1.2 "python embedding_te…" 29 seconds ago Up 26 seconds 0.0.0.0:6000->6000/tcp, :::6000->6000/tcp embedding-tei-server 8b226edcd9db ghcr.io/huggingface/text-embeddings-inference:cpu-1.5 "text-embeddings-rou…" 29 seconds ago Up 27 seconds 0.0.0.0:8808->80/tcp, :::8808->80/tcp tei-reranking-server e1fc81b1d542 redis/redis-stack:7.2.0-v9 "/entrypoint.sh" 29 seconds ago Up 27 seconds 0.0.0.0:6379->6379/tcp, :::6379->6379/tcp, 0.0.0.0:8001->8001/tcp, :::8001->8001/tcp redis-vector-db 051e0d68e263 ghcr.io/huggingface/text-embeddings-inference:cpu-1.5 "text-embeddings-rou…" 29 seconds ago Up 27 seconds 0.0.0.0:6006->80/tcp, :::6006->80/tcp tei-embedding-server @@ -409,83 +248,19 @@ e1fc81b1d542 redis/redis-stack:7.2.0-v9 "/entrypo ::: :::: -## Interacting with ChatQnA deployment - -This section will walk you through what are the different ways to interact with -the microservices deployed - -### Dataprep Microservice(Optional) -If you want to add/update the default knowledge base, you can use the following -commands. The dataprep microservice extracts the texts from variety of data -sources, chunks the data, embeds each chunk using embedding microservice and -store the embedded vectors in the redis vector database. - -`nke-10k-2023.pdf` is Nike's annual report on a form 10-K. 
Run in a terminal window this command to download the file: +Each docker container's log can also be checked using: ```bash -wget https://github.com/opea-project/GenAIComps/blob/v1.1/comps/retrievers/redis/data/nke-10k-2023.pdf -``` - -Upload the file: - -```bash -curl -X POST "http://${host_ip}:6007/v1/dataprep" \ - -H "Content-Type: multipart/form-data" \ - -F "files=@./nke-10k-2023.pdf" -``` - -This command updates a knowledge base by uploading a local file for processing. -Update the file path according to your environment. - -Add Knowledge Base via HTTP Links: - -```bash -curl -X POST "http://${host_ip}:6007/v1/dataprep" \ - -H "Content-Type: multipart/form-data" \ - -F 'link_list=["https://opea.dev"]' -``` - -This command updates a knowledge base by submitting a list of HTTP links for processing. - -Also, you are able to get the file list that you uploaded: - -```bash -curl -X POST "http://${host_ip}:6007/v1/dataprep/get_file" \ - -H "Content-Type: application/json" - -``` - -To delete the file/link you uploaded you can use the following commands: - -#### Delete link -```bash -# The dataprep service will add a .txt postfix for link file - -curl -X POST "http://${host_ip}:6007/v1/dataprep/delete_file" \ - -d '{"file_path": "https://opea.dev.txt"}' \ - -H "Content-Type: application/json" +docker logs ``` -#### Delete file +## Validate Microservices -```bash -curl -X POST "http://${host_ip}:6007/v1/dataprep/delete_file" \ - -d '{"file_path": "nke-10k-2023.pdf"}' \ - -H "Content-Type: application/json" -``` +This section will walk through the different ways to interact with the microservices deployed. -#### Delete all uploaded files and links - -```bash -curl -X POST "http://${host_ip}:6007/v1/dataprep/delete_file" \ - -d '{"file_path": "all"}' \ - -H "Content-Type: application/json" -``` ### TEI Embedding Service -The TEI embedding service takes in a string as input, embeds the string into a -vector of a specific length determined by the embedding model and returns this -embedded vector. +The TEI embedding service takes in a string as input, embeds the string into a vector of a specific length determined by the embedding model, and returns this vector. ```bash curl ${host_ip}:6006/embed \ @@ -494,31 +269,13 @@ curl ${host_ip}:6006/embed \ -H 'Content-Type: application/json' ``` -In this example the embedding model used is "BAAI/bge-base-en-v1.5", which has a -vector size of 768. So the output of the curl command is a embedded vector of -length 768. - -### Embedding Microservice -The embedding microservice depends on the TEI embedding service. In terms of -input parameters, it takes in a string, embeds it into a vector using the TEI -embedding service and adds other default parameters that are required for the -retrieval microservice and returns it. +In this example, the embedding model used is `BAAI/bge-base-en-v1.5`, which has a vector size of 768. Therefore, the output of the curl command is a vector of length 768. -```bash -curl http://${host_ip}:6000/v1/embeddings\ - -X POST \ - -d '{"text":"hello"}' \ - -H 'Content-Type: application/json' -``` ### Retriever Microservice -To consume the retriever microservice, you need to generate a mock embedding -vector using Python script. The length of embedding vector is determined by the -embedding model. Here we use the -model EMBEDDING_MODEL_ID="BAAI/bge-base-en-v1.5", which vector size is 768. +To consume the retriever microservice, generate a mock embedding vector with a Python script. 
The length of the embedding vector is determined by the embedding model. The model is set with the environment variable EMBEDDING_MODEL_ID="BAAI/bge-base-en-v1.5", which has a vector size of 768. -Check the vector dimension of your embedding model and set -`your_embedding` dimension equal to it. +Check the vector dimension of the embedding model used and set `your_embedding` dimension equal to it. ```bash export your_embedding=$(python3 -c "import random; embedding = [random.uniform(-1, 1) for _ in range(768)]; print(embedding)") @@ -527,25 +284,19 @@ curl http://${host_ip}:7000/v1/retrieval \ -X POST \ -d "{\"text\":\"test\",\"embedding\":${your_embedding}}" \ -H 'Content-Type: application/json' - ``` -The output of the retriever microservice comprises of the a unique id for the -request, initial query or the input to the retrieval microservice, a list of top -`n` retrieved documents relevant to the input query, and top_n where n refers to -the number of documents to be returned. -The output is retrieved text that relevant to the input data: -```bash -{"id":"27210945c7c6c054fa7355bdd4cde818","retrieved_docs":[{"id":"0c1dd04b31ab87a5468d65f98e33a9f6","text":"Company: Nike. financial instruments are subject to master netting arrangements that allow for the offset of assets and liabilities in the event of default or early termination of the contract.\nAny amounts of cash collateral received related to these instruments associated with the Company's credit-related contingent features are recorded in Cash and\nequivalents and Accrued liabilities, the latter of which would further offset against the Company's derivative asset balance. Any amounts of cash collateral posted related\nto these instruments associated with the Company's credit-related contingent features are recorded in Prepaid expenses and other current assets, which would further\noffset against the Company's derivative liability balance. Cash collateral received or posted related to the Company's credit-related contingent features is presented in the\nCash provided by operations component of the Consolidated Statements of Cash Flows. The Company does not recognize amounts of non-cash collateral received, such\nas securities, on the Consolidated Balance Sheets. For further information related to credit risk, refer to Note 12 — Risk Management and Derivatives.\n2023 FORM 10-K 68Table of Contents\nThe following tables present information about the Company's derivative assets and liabilities measured at fair value on a recurring basis and indicate the level in the fair\nvalue hierarchy in which the Company classifies the fair value measurement:\nMAY 31, 2023\nDERIVATIVE ASSETS\nDERIVATIVE LIABILITIES"},{"id":"1d742199fb1a86aa8c3f7bcd580d94af","text": ... } +The output of the retriever microservice comprises of the a unique id for the request, initial query or the input to the retrieval microservice, a list of top `n` retrieved documents relevant to the input query, and top_n where n refers to the number of documents to be returned. + +The output is retrieved text that is relevant to the input data: +```bash +{"id":"b16024e140e78e39a60e8678622be630","retrieved_docs":[],"initial_query":"test","top_n":1,"metadata":[]} ``` ### TEI Reranking Service -The TEI Reranking Service reranks the documents returned by the retrieval -service. It consumes the query and list of documents and returns the document -index based on decreasing order of the similarity score. 
The document -corresponding to the returned index with the highest score is the most relevant -document for the input query. +The TEI Reranking Service reranks the documents returned by the retrieval service. It consumes the query and list of documents and returns the document index in decreasing order of the similarity score. The document corresponding to the index with the highest score is the most relevant document for the input query. + ```bash curl http://${host_ip}:8808/rerank \ -X POST \ @@ -553,60 +304,19 @@ curl http://${host_ip}:8808/rerank \ -H 'Content-Type: application/json' ``` -Output is: `[{"index":1,"score":0.9988041},{"index":0,"score":0.022948774}]` - - -### Reranking Microservice - - -The reranking microservice consumes the TEI Reranking service and pads the -response with default parameters required for the LLM microservice. - +Sample output: ```bash -curl http://${host_ip}:8000/v1/reranking\ - -X POST \ - -d '{"initial_query":"What is Deep Learning?", "retrieved_docs": \ - [{"text":"Deep Learning is not..."}, {"text":"Deep learning is..."}]}' \ - -H 'Content-Type: application/json' -``` - -The input to the microservice is the `initial_query` and a list of retrieved -documents and it outputs the most relevant document to the initial query along -with other default parameter such as the temperature, `repetition_penalty`, -`chat_template` and so on. We can also get top n documents by setting `top_n` as one -of the input parameters. For example: - -```bash -curl http://${host_ip}:8000/v1/reranking\ - -X POST \ - -d '{"initial_query":"What is Deep Learning?" ,"top_n":2, "retrieved_docs": \ - [{"text":"Deep Learning is not..."}, {"text":"Deep learning is..."}]}' \ - -H 'Content-Type: application/json' +[{"index":1,"score":0.94238955},{"index":0,"score":0.120219156}] ``` -Here is the output: - -```bash -{"id":"e1eb0e44f56059fc01aa0334b1dac313","query":"Human: Answer the question based only on the following context:\n Deep learning is...\n Question: What is Deep Learning?","max_new_tokens":1024,"top_k":10,"top_p":0.95,"typical_p":0.95,"temperature":0.01,"repetition_penalty":1.03,"streaming":true} - -``` -You may notice reranking microservice are with state ('ID' and other meta data), -while reranking service are not. - ### Ollama Service -::::{tab-set} - -:::{tab-item} Ollama -:sync: Ollama - +Run the command below to use Ollama to generate text fo the input prompt. ```bash curl http://${host_ip}:11434/api/generate -d '{"model": "llama3", "prompt":"What is Deep Learning?"}' ``` -Ollama service generates text for the input prompt. Here is the expected result -from Ollama: - +Ollama service generates text for the input prompt. Here is the expected result from Ollama: ```bash {"model":"llama3","created_at":"2024-09-05T08:47:17.160752424Z","response":"Deep","done":false} {"model":"llama3","created_at":"2024-09-05T08:47:18.229472564Z","response":" learning","done":false} @@ -624,58 +334,74 @@ from Ollama: {"model":"llama3","created_at":"2024-09-05T08:47:32.231884525Z","response":" of","done":false} {"model":"llama3","created_at":"2024-09-05T08:47:33.510913894Z","response":" artificial","done":false} {"model":"llama3","created_at":"2024-09-05T08:47:34.516291108Z","response":" neural","done":false} -... ``` -::: -:::: +### Dataprep Microservice +The knowledge base can be updated using the dataprep microservice, which extracts text from a variety of data sources, chunks the data, embeds each chunk using the embedding microservice. 
Finally, the embedded vectors are stored in the Redis vector database. +`nke-10k-2023.pdf` is Nike's annual report on a form 10-K. Run this command to download the file: +```bash +wget https://github.com/opea-project/GenAIComps/blob/v1.1/comps/retrievers/redis/data/nke-10k-2023.pdf +``` -### LLM Microservice +Upload the file: ```bash -curl http://${host_ip}:9000/v1/chat/completions\ - -X POST \ - -d '{"query":"What is Deep Learning?","max_new_tokens":17,"top_k":10,"top_p":0.95,\ - "typical_p":0.95,"temperature":0.01,"repetition_penalty":1.03,"streaming":true}' \ - -H 'Content-Type: application/json' +curl -X POST "http://${host_ip}:6007/v1/dataprep" \ + -H "Content-Type: multipart/form-data" \ + -F "files=@./nke-10k-2023.pdf" +``` +HTTP links can also be added to the knowledge base. This command adds the opea.dev website. +```bash +curl -X POST "http://${host_ip}:6007/v1/dataprep" \ + -H "Content-Type: multipart/form-data" \ + -F 'link_list=["https://opea.dev"]' ``` -You will get the below generated text from LLM: +The list of uploaded files can be retrieved using this command: +```bash +curl -X POST "http://${host_ip}:6007/v1/dataprep/get_file" \ + -H "Content-Type: application/json" +``` +To delete the file or link, use the following commands: + +#### Delete link ```bash -data: b'\n' -data: b'\n' -data: b'Deep' -data: b' learning' -data: b' is' -data: b' a' -data: b' subset' -data: b' of' -data: b' machine' -data: b' learning' -data: b' that' -data: b' uses' -data: b' algorithms' -data: b' to' -data: b' learn' -data: b' from' -data: b' data' -data: [DONE] +# The dataprep service will add a .txt postfix for link file + +curl -X POST "http://${host_ip}:6007/v1/dataprep/delete_file" \ + -d '{"file_path": "https://opea.dev.txt"}' \ + -H "Content-Type: application/json" ``` -### MegaService +#### Delete file + +```bash +curl -X POST "http://${host_ip}:6007/v1/dataprep/delete_file" \ + -d '{"file_path": "nke-10k-2023.pdf"}' \ + -H "Content-Type: application/json" +``` + +#### Delete all uploaded files and links + +```bash +curl -X POST "http://${host_ip}:6007/v1/dataprep/delete_file" \ + -d '{"file_path": "all"}' \ + -H "Content-Type: application/json" +``` + +### ChatQnA MegaService ```bash curl http://${host_ip}:8888/v1/chatqna -H "Content-Type: application/json" -d '{ "model": "'"${OLLAMA_MODEL}"'", "messages": "What is the revenue of Nike in 2023?" }' - ``` -Here is the output for your reference: +Here is the output for reference: ```bash data: b'\n' @@ -713,56 +439,38 @@ data: b'' data: [DONE] ``` -## Check docker container log - -Check the log of container by: - -`docker logs -t` - - -Check the log by `docker logs f7a08f9867f9 -t`. - - - - - - -Also you can check overall logs with the following command, where the -compose.yaml is the mega service docker-compose configuration file. - - -::::{tab-set} - -:::{tab-item} Ollama -:sync: Ollama +### NGINX Service +This will ensure the NGINX ervice is working properly. ```bash -docker compose -f compose.yaml logs +curl http://${host_ip}:${NGINX_PORT}/v1/chatqna \ + -H "Content-Type: application/json" \ + -d '{"messages": "What is the revenue of Nike in 2023?"}' ``` -::: -:::: + +The output will be similar to that of the ChatQnA megaservice. ## Launch UI -To access the frontend, open the following URL in your browser: http://{host_ip}:5173. By default, the UI runs on port 5173 internally. 
If you prefer to use a different host port to access the frontend, you can modify the port mapping in the `compose.yaml` file as shown below: +To access the frontend, open the following URL in a web browser: http://{host_ip}:{NGINX_PORT}. By default, the UI runs on port 5173 internally. A different host port can be used to access the frontend. Simply modify the port mapping in the `compose.yaml` file as shown below: ```yaml - chaqna-aipc-ui-server: + chatqna-aipc-ui-server: image: opea/chatqna-ui${TAG:-latest} ... ports: - "5173:5173" ``` -### Stop the services +### Stop the Services -Once you are done with the entire pipeline and wish to stop and remove all the containers, use the command below: ::::{tab-set} :::{tab-item} Ollama :sync: Ollama - +To stop and remove all the containers, use the command below: ```bash docker compose -f compose.yaml down ``` + ::: :::: diff --git a/tutorial/ChatQnA/deploy/gaudi.md b/tutorial/ChatQnA/deploy/gaudi.md index dec2e63f..1482a091 100644 --- a/tutorial/ChatQnA/deploy/gaudi.md +++ b/tutorial/ChatQnA/deploy/gaudi.md @@ -36,7 +36,7 @@ git checkout tags/v${RELEASE_VERSION} cd .. ``` -Set up the HuggingFace token: +Set up a [HuggingFace](https://huggingface.co/) account and generate a [user access token](https://huggingface.co/docs/transformers.js/en/guides/private#step-1-generating-a-user-access-token). Then set an environment variable with the HuggingFace token: ```bash export HUGGINGFACEHUB_API_TOKEN="Your_Huggingface_API_Token" ``` @@ -46,6 +46,12 @@ The example requires setting the `host_ip` to "localhost" to deploy the microser export host_ip="localhost" ``` +Set the NGINX port. +```bash +# Example: NGINX_PORT=80 +export NGINX_PORT= +``` + For machines behind a firewall, set up the proxy environment variables: ```bash export http_proxy="Your_HTTP_Proxy" @@ -72,8 +78,6 @@ ChatQnA will use the following GenAIComps and corresponding tools. Tools and mod |LLM | vLLM | meta-llama/Meta-Llama-3-8B-Instruct | OPEA Microservice | |UI | | NA | Gateway Service | -Tools and models mentioned in the table are configurable either through the -environment variable or `compose.yaml` file. ::: :::{tab-item} TGI :sync: TGI @@ -97,7 +101,7 @@ cd $WORKSPACE/GenAIExamples/ChatQnA/docker_compose/intel/hpu/gaudi source ./set_env.sh ``` -## Deploy the use case +## Deploy the Use Case Run `docker compose` with the provided YAML file to start all the services mentioned above as containers. @@ -160,7 +164,7 @@ After running `docker compose`, check for warning messages for environment varia ::: :::: -### Check container statuses +### Check Container Statuses Check if all the containers launched via `docker compose` are running i.e. each container's `STATUS` is `Up` and `Healthy`. 
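If any service shows a different status, it can be isolated quickly with standard Docker CLI filters. The commands below are a sketch rather than part of the required steps; they assume no unrelated workloads on the host are currently exited or unhealthy.

```bash
# List containers that stopped unexpectedly.
docker ps -a --filter "status=exited"

# List containers whose health check is currently failing.
docker ps --filter "health=unhealthy"
```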
@@ -201,20 +205,18 @@ bfe41a5353b6 ghcr.io/huggingface/tei-gaudi:1.5.0 "text-emb ::: :::{tab-item} TGI :sync: TGI -TODO: update -```bash -CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES -0355d705484a opea/chatqna-ui:${RELEASE_VERSION} "docker-entrypoint.s…" 2 minutes ago Up 2 minutes 0.0.0.0:5173->5173/tcp, :::5173->5173/tcp chatqna-gaudi-ui-server -29a7a43abcef opea/chatqna:${RELEASE_VERSION} "python chatqna.py" 2 minutes ago Up 2 minutes 0.0.0.0:8888->8888/tcp, :::8888->8888/tcp chatqna-gaudi-backend-server -1eb6f5ad6f85 opea/llm-tgi:${RELEASE_VERSION} "bash entrypoint.sh" 2 minutes ago Up 2 minutes 0.0.0.0:9000->9000/tcp, :::9000->9000/tcp llm-tgi-gaudi-server -ad27729caf68 opea/reranking-tei:${RELEASE_VERSION} "python reranking_te…" 2 minutes ago Up 2 minutes 0.0.0.0:8000->8000/tcp, :::8000->8000/tcp reranking-tei-gaudi-server -84f02cf2a904 opea/dataprep-redis:${RELEASE_VERSION} "python prepare_doc_…" 2 minutes ago Up 2 minutes 0.0.0.0:6007->6007/tcp, :::6007->6007/tcp dataprep-redis-server -367459f6e65b opea/embedding-tei:${RELEASE_VERSION} "python embedding_te…" 2 minutes ago Up 2 minutes 0.0.0.0:6000->6000/tcp, :::6000->6000/tcp embedding-tei-server -8c78cde9f588 opea/retriever-redis:${RELEASE_VERSION} "python retriever_re…" 2 minutes ago Up 2 minutes 0.0.0.0:7000->7000/tcp, :::7000->7000/tcp retriever-redis-server -fa80772de92c ghcr.io/huggingface/tgi-gaudi:2.0.1 "text-generation-lau…" 2 minutes ago Up 2 minutes 0.0.0.0:8005->80/tcp, :::8005->80/tcp tgi-gaudi-server -581687a2cc1a opea/tei-gaudi:${RELEASE_VERSION} "text-embeddings-rou…" 2 minutes ago Up 2 minutes 0.0.0.0:8090->80/tcp, :::8090->80/tcp tei-embedding-gaudi-server -c59178629901 redis/redis-stack:7.2.0-v9 "/entrypoint.sh" 2 minutes ago Up 2 minutes 0.0.0.0:6379->6379/tcp, :::6379->6379/tcp, 0.0.0.0:8001->8001/tcp, :::8001->8001/tcp redis-vector-db -5c3a78144498 ghcr.io/huggingface/text-embeddings-inference:cpu-1.5 "text-embeddings-rou…" 2 minutes ago Up 2 minutes 0.0.0.0:8808->80/tcp, :::8808->80/tcp tei-reranking-gaudi-server +```bash +CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES +353775bfa0dc opea/nginx:1.2 "/docker-entrypoint.…" 52 seconds ago Up 50 seconds 0.0.0.0:8010->80/tcp, [::]:8010->80/tcp chatqna-gaudi-nginx-server +c4f75d75f18e opea/chatqna-ui:1.2 "docker-entrypoint.s…" 52 seconds ago Up 50 seconds 0.0.0.0:5173->5173/tcp, [::]:5173->5173/tcp chatqna-gaudi-ui-server +4c5dc803c8c8 opea/chatqna:1.2 "python chatqna.py" 52 seconds ago Up 51 seconds 0.0.0.0:8888->8888/tcp, [::]:8888->8888/tcp chatqna-gaudi-backend-server +6bdfebe016c0 opea/dataprep:1.2 "sh -c 'python $( [ …" 52 seconds ago Up 51 seconds 0.0.0.0:6007->5000/tcp, [::]:6007->5000/tcp dataprep-redis-server +6fb5264a8465 opea/retriever:1.2 "python opea_retriev…" 52 seconds ago Up 51 seconds 0.0.0.0:7000->7000/tcp, [::]:7000->7000/tcp retriever-redis-server +1f4f4f691d36 ghcr.io/huggingface/tgi-gaudi:2.0.6 "text-generation-lau…" 55 seconds ago Up 51 seconds 0.0.0.0:8005->80/tcp, [::]:8005->80/tcp tgi-gaudi-server +9c50dfc17428 ghcr.io/huggingface/tei-gaudi:1.5.0 "text-embeddings-rou…" 55 seconds ago Up 51 seconds 0.0.0.0:8808->80/tcp, [::]:8808->80/tcp tei-reranking-gaudi-server +a8de74b4594d ghcr.io/huggingface/text-embeddings-inference:cpu-1.5 "text-embeddings-rou…" 55 seconds ago Up 51 seconds 0.0.0.0:8090->80/tcp, [::]:8090->80/tcp tei-embedding-gaudi-server +e01438eafa7d redis/redis-stack:7.2.0-v9 "/entrypoint.sh" 55 seconds ago Up 51 seconds 0.0.0.0:6379->6379/tcp, [::]:6379->6379/tcp, 0.0.0.0:8001->8001/tcp, [::]:8001->8001/tcp 
redis-vector-db +b8ecf10c0c2d jaegertracing/all-in-one:latest "/go/bin/all-in-one-…" 55 seconds ago Up 51 seconds 0.0.0.0:4317-4318->4317-4318/tcp, [::]:4317-4318->4317-4318/tcp, 14250/tcp, 0.0.0.0:9411->9411/tcp, [::]:9411->9411/tcp, 0.0.0.0:16686->16686/tcp, [::]:16686->16686/tcp, 14268/tcp jaeger ``` ::: @@ -226,7 +228,7 @@ Each docker container's log can also be checked using: docker logs ``` -## Validate microservices +## Validate Microservices This section will walk through the different ways to interact with the microservices deployed. @@ -258,8 +260,7 @@ curl http://${host_ip}:7000/v1/retrieval \ -H 'Content-Type: application/json' ``` -The output of the retriever microservice comprises of the a unique id for the request, initial query or the input to the retrieval microservice, a list of top -`n` retrieved documents relevant to the input query, and top_n where n refers to the number of documents to be returned. +The output of the retriever microservice comprises of the a unique id for the request, initial query or the input to the retrieval microservice, a list of top `n` retrieved documents relevant to the input query, and top_n where n refers to the number of documents to be returned. The output is retrieved text that is relevant to the input data: ```bash @@ -268,8 +269,7 @@ The output is retrieved text that is relevant to the input data: ### TEI Reranking Service -The TEI Reranking Service reranks the documents returned by the retrieval service. It consumes the query and list of documents and returns the document -index in decreasing order of the similarity score. The document corresponding to the index with the highest score is the most relevant document for the input query. +The TEI Reranking Service reranks the documents returned by the retrieval service. It consumes the query and list of documents and returns the document index in decreasing order of the similarity score. The document corresponding to the index with the highest score is the most relevant document for the input query. ```bash curl http://${host_ip}:8808/rerank \ @@ -398,15 +398,14 @@ curl -X POST "http://${host_ip}:6007/v1/dataprep/delete_file" \ ``` ### ChatQnA MegaService - +This will ensure the megaservice is working properly. ```bash curl http://${host_ip}:8888/v1/chatqna -H "Content-Type: application/json" -d '{ "messages": "What is the revenue of Nike in 2023?" }' ``` -Here is the output for your reference: - +Here is the output for reference: ```bash data: b'\n' data: b'An' @@ -443,6 +442,17 @@ data: b'' data: [DONE] ``` +### NGINX Service + +This will ensure the NGINX ervice is working properly. +```bash +curl http://${host_ip}:${NGINX_PORT}/v1/chatqna \ + -H "Content-Type: application/json" \ + -d '{"messages": "What is the revenue of Nike in 2023?"}' +``` + +The output will be similar to that of the ChatQnA megaservice. + #### (Optional) Guardrail Microservice If the Guardrail microservice is enabled, test it using the command below: ```bash @@ -455,7 +465,7 @@ curl http://${host_ip}:9090/v1/guardrails\ ## Launch UI ### Basic UI -To access the frontend, open the following URL in a web browser: http://{host_ip}:80. By default, the UI runs on port 5173 internally. A different host port can be used to access the frontend. Simply modify the port mapping in the `compose.yaml` file as shown below: +To access the frontend, open the following URL in a web browser: http://{host_ip}:{NGINX_PORT}. By default, the UI runs on port 5173 internally. A different host port can be used to access the frontend. 
Simply modify the port mapping in the `compose.yaml` file as shown below: ```yaml chatqna-gaudi-ui-server: image: opea/chatqna-ui:${TAG:-latest} @@ -484,7 +494,7 @@ chaqtna-gaudi-conversation-ui-server: In addition, modify the `chatqna-gaudi-nginx-server` `depends_on` field to include `chatqna-gaudi-conversation-ui-server` instead of `chatqna-gaudi-ui-server`. -Once the services are up, open the following URL in a web browser: http://{host_ip}:5174. By default, the UI runs on port 80 internally. A different host port can be used to access the frontend. Simply modify the port mapping in the `compose.yaml` file as shown below: +Once the services are up, open the following URL in a web browser: http://{host_ip}:5174. By default, the UI runs on port {NGINX_PORT} internally. A different host port can be used to access the frontend. Simply modify the port mapping in the `compose.yaml` file as shown below: ```yaml chatqna-gaudi-conversation-ui-server: image: opea/chatqna-conversation-ui:${TAG:-latest} @@ -493,7 +503,7 @@ Once the services are up, open the following URL in a web browser: http://{host_ - "80:80" ``` -## Stop the services +## Stop the Services To stop and remove all the containers, use the command below: ::::{tab-set} diff --git a/tutorial/ChatQnA/deploy/nvidia.md b/tutorial/ChatQnA/deploy/nvidia.md index 404f7d8c..4223e5e4 100644 --- a/tutorial/ChatQnA/deploy/nvidia.md +++ b/tutorial/ChatQnA/deploy/nvidia.md @@ -1,16 +1,10 @@ # Single node on-prem deployment with TGI on Nvidia gpu -This deployment section covers single-node on-prem deployment of the ChatQnA -example with OPEA comps to deploy using TGI service. There are several -slice-n-dice ways to enable RAG with vectordb and LLM models, but here we will -be covering one option of doing it for convenience : we will be showcasing how -to build an e2e chatQnA with Redis VectorDB and meta-llama/Meta-Llama-3-8B-Instruct model, -deployed on on-prem. +This deployment section covers single-node on-prem deployment of the ChatQnA example using the TGI LLM service. There are several ways to enable RAG with vectordb and LLM models, but this tutorial will be covering how to build an end-to-end ChatQnA pipeline with the Redis vector database and meta-llama/Meta-Llama-3-8B-Instruct model deployed on NVIDIA GPUs. + ## Overview -There are several ways to setup a ChatQnA use case. Here in this tutorial, we -will walk through how to enable the below list of microservices from OPEA -GenAIComps to deploy a single node vLLM or TGI megaservice solution. +The list of microservices from OPEA GenAIComps are used to deploy a single node vLLM or TGI megaservice solution for ChatQnA. 1. Data Prep 2. Embedding @@ -18,41 +12,22 @@ GenAIComps to deploy a single node vLLM or TGI megaservice solution. 4. Reranking 5. LLM with TGI -The solution is aimed to show how to use Redis vectordb for RAG and -meta-llama/Meta-Llama-3-8B-Instruct model on Nvidia GPU. We will go through -how to setup docker container to start a microservices and megaservice . The -solution will then utilize a sample Nike dataset which is in PDF format. Users -can then ask a question about Nike and get a chat-like response by default for -up to 1024 tokens. The solution is deployed with a UI. There are 2 modes you can -use: +The solution is aimed to show how to use Redis vectorDB for RAG and Meta-Llama-3-8B-Instruct model for LLM inference on NVIDIA GPUs. 
Steps will include setting up docker containers, utilizing a sample Nike dataset in PDF format, and asking a question about Nike to get a response. There are 2 modes of UI that can be deployed: 1. Basic UI 2. Conversational UI -Conversational UI is optional, but a feature supported in this example if you -are interested to use. - ## Prerequisites -The first step is to clone the GenAIExamples and GenAIComps projects. GenAIComps are -fundamental necessary components used to build the examples you find in -GenAIExamples and deploy them as microservices. Set an environment -variable for the desired release version with the **number only** -(i.e. 1.0, 1.1, etc) and checkout using the tag with that version. +The first step is to clone the [GenAIExamples](https://github.com/opea-project/GenAIExamples) GitHub repo. Set a workspace path and the desired release version with the **number only** (i.e. 1.0, 1.1, etc) and checkout that version using the tag. ```bash # Set workspace -export WORKSPACE= +export WORKSPACE= cd $WORKSPACE # Set desired release version - number only -export RELEASE_VERSION= - -# GenAIComps -git clone https://github.com/opea-project/GenAIComps.git -cd GenAIComps -git checkout tags/v${RELEASE_VERSION} -cd .. +export RELEASE_VERSION= # GenAIExamples git clone https://github.com/opea-project/GenAIExamples.git @@ -61,164 +36,33 @@ git checkout tags/v${RELEASE_VERSION} cd .. ``` -The examples utilize model weights from HuggingFace and langchain. - -Setup your [HuggingFace](https://huggingface.co/) account and generate -[user access token](https://huggingface.co/docs/transformers.js/en/guides/private#step-1-generating-a-user-access-token). - -Setup the HuggingFace token +Set up a [HuggingFace](https://huggingface.co/) account and generate a [user access token](https://huggingface.co/docs/transformers.js/en/guides/private#step-1-generating-a-user-access-token). Then set an environment variable with the HuggingFace token: ```bash export HUGGINGFACEHUB_API_TOKEN="Your_Huggingface_API_Token" ``` -The example requires you to set the `host_ip` to deploy the microservices on -endpoint enabled with ports. Set the host_ip env variable -```bash -export host_ip=$(hostname -I | awk '{print $1}') -``` - -Make sure to setup Proxies if you are behind a firewall -```bash -export no_proxy=${your_no_proxy},$host_ip -export http_proxy=${your_http_proxy} -export https_proxy=${your_http_proxy} -``` - -## Prepare (Building / Pulling) Docker images - -This step will involve building/pulling ( maybe in future) relevant docker -images with step-by-step process along with sanity check in the end. For -ChatQnA, the following docker images will be needed: embedding, retriever, -rerank, LLM and dataprep. Additionally, you will need to build docker images for -ChatQnA megaservice, and UI (conversational React UI is optional). In total, -there are 8 required and 1 optional docker images. - -### Build/Pull Microservice images - -::::::{tab-set} - -:::::{tab-item} Pull -:sync: Pull - -If you decide to pull the docker containers and not build them locally, -you can proceed to the next step where all the necessary containers will -be pulled in from Docker Hub. - -::::: -:::::{tab-item} Build -:sync: Build - -Follow the steps below to build the docker images from within the `GenAIComps` folder. -**Note:** For RELEASE_VERSIONS older than 1.0, you will need to add a 'v' in front -of ${RELEASE_VERSION} to reference the correct image on Docker Hub. 
- -```bash -cd $WORKSPACE/GenAIComps -``` - -#### Build Dataprep Image - -```bash -docker build --no-cache -t opea/dataprep-redis:${RELEASE_VERSION} --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f comps/dataprep/redis/langchain/Dockerfile . -``` - -#### Build Embedding Image - -```bash -docker build --no-cache -t opea/embedding-tei:${RELEASE_VERSION} --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f comps/embeddings/tei/langchain/Dockerfile . -``` - -#### Build Retriever Image - -```bash - docker build --no-cache -t opea/retriever-redis:${RELEASE_VERSION} --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f comps/retrievers/redis/langchain/Dockerfile . -``` - -#### Build Rerank Image - -```bash -docker build --no-cache -t opea/reranking-tei:${RELEASE_VERSION} --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f comps/reranks/tei/Dockerfile . -``` - -#### Build LLM Image - -::::{tab-set} - -:::{tab-item} TGI -:sync: TGI - -```bash -docker build --no-cache -t opea/llm-tgi:${RELEASE_VERSION} --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f comps/llms/text-generation/tgi/Dockerfile . -``` -::: -:::: - -### Build Mega Service images - -The Megaservice is a pipeline that channels data through different -microservices, each performing varied tasks. We define the different -microservices and the flow of data between them in the `chatqna.py` file, say in -this example the output of embedding microservice will be the input of retrieval -microservice which will in turn passes data to the reranking microservice and so -on. You can also add newer or remove some microservices and customize the -megaservice to suit the needs. - -Build the megaservice image for this use case - -```bash -cd $WORKSPACE/GenAIExamples/ChatQnA -``` - +The example requires setting the `host_ip` to "localhost" to deploy the microservices on endpoints enabled with ports. ```bash -docker build --no-cache -t opea/chatqna:${RELEASE_VERSION} --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f Dockerfile . +export host_ip="localhost" ``` -### Build Other Service images - -#### Build the UI Image - -As mentioned, you can build 2 modes of UI - -*Basic UI* - +Set the NGINX port. ```bash -cd $WORKSPACE/GenAIExamples/ChatQnA/ui/ -docker build --no-cache -t opea/chatqna-ui:${RELEASE_VERSION} --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f ./docker/Dockerfile . +# Example: NGINX_PORT=80 +export NGINX_PORT= ``` -*Conversation UI* -If you want a conversational experience with chatqna megaservice. - +For machines behind a firewall, set up the proxy environment variables: ```bash -cd $WORKSPACE/GenAIExamples/ChatQnA/ui/ -docker build --no-cache -t opea/chatqna-conversation-ui:${RELEASE_VERSION} --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f ./docker/Dockerfile.react . 
+export http_proxy="Your_HTTP_Proxy" +export https_proxy="Your_HTTPs_Proxy" +# Example: no_proxy="localhost, 127.0.0.1, 192.168.1.1" +export no_proxy="Your_No_Proxy",chatqna-xeon-ui-server,chatqna-xeon-backend-server,dataprep-redis-service,tei-embedding-service,retriever,tei-reranking-service,tgi-service,vllm-service ``` -### Sanity Check -Check if you have the below set of docker images, before moving on to the next step: - -::::{tab-set} - -:::{tab-item} TGI -:sync: TGI - -* opea/dataprep-redis:${RELEASE_VERSION} -* opea/embedding-tei:${RELEASE_VERSION} -* opea/retriever-redis:${RELEASE_VERSION} -* opea/reranking-tei:${RELEASE_VERSION} -* opea/chatqna:${RELEASE_VERSION} -* opea/chatqna-ui:${RELEASE_VERSION} -* opea/llm-tgi:${RELEASE_VERSION} -::: -:::: - -::::: -:::::: - ## Use Case Setup -As mentioned the use case will use the following combination of the GenAIComps -with the tools +ChatQnA will use the following GenAIComps and corresponding tools. Tools and models mentioned in the table are configurable either through environment variables in the `set_env.sh` or `compose.yaml` file. ::::{tab-set} @@ -234,24 +78,19 @@ with the tools |LLM | TGI | meta-llama/Meta-Llama-3-8B-Instruct | OPEA Microservice | |UI | | NA | Gateway Service | -Tools and models mentioned in the table are configurable either through the -environment variable or `compose.yaml` file. ::: :::: -Set the necessary environment variables to setup the use case. If you want to swap -out models, modify `set_env.sh` before running. +Set the necessary environment variables to set up the use case. To swap out models, modify `set_env.sh` before running it. For example, the environment variable `LLM_MODEL_ID` can be changed to another model by specifying the HuggingFace model card ID. ```bash cd $WORKSPACE/GenAIExamples/ChatQnA/docker_compose/nvidia/gpu source ./set_env.sh ``` -## Deploy the use case +## Deploy the Use Case -In this tutorial, we will be deploying via docker compose with the provided -YAML file. The docker compose instructions should be starting all the -above mentioned services as containers. +Run `docker compose` with the provided YAML file to start all the services mentioned above as containers. ::::{tab-set} :::{tab-item} TGI @@ -264,10 +103,8 @@ docker compose -f compose.yaml up -d ::: :::: -### Validate microservice #### Check Env Variables -Check the start up log by `docker compose -f ./compose.yaml logs`. -The warning messages print out the variables if they are **NOT** set. +After running `docker compose`, check for warning messages for environment variables that are **NOT** set. Address them if needed. ::::{tab-set} :::{tab-item} TGI @@ -286,13 +123,16 @@ The warning messages print out the variables if they are **NOT** set. ::: :::: -#### Check the container status +### Check Container Statuses -Check if all the containers launched via docker compose has started +Check if all the containers launched via `docker compose` are running i.e. each container's `STATUS` is `Up` and `Healthy`. + +Run this command to see this info: +```bash +docker ps -a +``` -For example, the ChatQnA example starts 11 docker (services), check these docker -containers are all running, i.e, all the containers `STATUS` are `Up` -To do a quick sanity check, try `docker ps -a` to see if all the containers are running +The sample output will vary based on the `RELEASE_VERSION`. 
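When other containers are running on the same host, it can help to scope the check to this deployment only. The command below is a small sketch using Docker Compose's own status listing; it assumes it is run from the directory that contains `compose.yaml`.

```bash
# List only the services defined in this compose file, with their state and published ports.
docker compose -f compose.yaml ps
```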
::::{tab-set} :::{tab-item} TGI @@ -314,84 +154,19 @@ e10dd14497a8 redis/redis-stack:7.2.0-v9 "/entrypo ::: :::: -## Interacting with ChatQnA deployment - -This section will walk you through what are the different ways to interact with -the microservices deployed - -### Dataprep Microservice(Optional) - -If you want to add/update the default knowledge base, you can use the following -commands. The dataprep microservice extracts the texts from variety of data -sources, chunks the data, embeds each chunk using embedding microservice and -store the embedded vectors in the redis vector database. - -`nke-10k-2023.pdf` is Nike's annual report on a form 10-K. Run this command to get the file on a terminal: - -```bash -wget https://github.com/opea-project/GenAIComps/blob/v1.1/comps/retrievers/redis/data/nke-10k-2023.pdf -``` - -Upload the file: - -```bash -curl -X POST "http://${host_ip}:6007/v1/dataprep" \ - -H "Content-Type: multipart/form-data" \ - -F "files=@./nke-10k-2023.pdf" -``` - -This command updates a knowledge base by uploading a local file for processing. -Update the file path according to your environment. - -Add Knowledge Base via HTTP Links: +Each docker container's log can also be checked using: ```bash -curl -X POST "http://${host_ip}:6007/v1/dataprep" \ - -H "Content-Type: multipart/form-data" \ - -F 'link_list=["https://opea.dev"]' +docker logs ``` -This command updates a knowledge base by submitting a list of HTTP links for processing. +## Validate Microservices -Also, you are able to get the file list that you uploaded: +This section will walk through the different ways to interact with the microservices deployed. -```bash -curl -X POST "http://${host_ip}:6007/v1/dataprep/get_file" \ - -H "Content-Type: application/json" - -``` - -To delete the file/link you uploaded you can use the following commands: - -#### Delete link -```bash -# The dataprep service will add a .txt postfix for link file - -curl -X POST "http://${host_ip}:6007/v1/dataprep/delete_file" \ - -d '{"file_path": "https://opea.dev.txt"}' \ - -H "Content-Type: application/json" -``` - -#### Delete file - -```bash -curl -X POST "http://${host_ip}:6007/v1/dataprep/delete_file" \ - -d '{"file_path": "nke-10k-2023.pdf"}' \ - -H "Content-Type: application/json" -``` - -#### Delete all uploaded files and links - -```bash -curl -X POST "http://${host_ip}:6007/v1/dataprep/delete_file" \ - -d '{"file_path": "all"}' \ - -H "Content-Type: application/json" -``` ### TEI Embedding Service -The TEI embedding service takes in a string as input, embeds the string into a -vector of a specific length determined by the embedding model and returns this -embedded vector. +The TEI embedding service takes in a string as input, embeds the string into a vector of a specific length determined by the embedding model, and returns this vector. ```bash curl ${host_ip}:8090/embed \ @@ -400,31 +175,13 @@ curl ${host_ip}:8090/embed \ -H 'Content-Type: application/json' ``` -In this example the embedding model used is "BAAI/bge-base-en-v1.5", which has a -vector size of 768. So the output of the curl command is a embedded vector of -length 768. +In this example, the embedding model used is `BAAI/bge-base-en-v1.5`, which has a vector size of 768. Therefore, the output of the curl command is a vector of length 768. -### Embedding Microservice -The embedding microservice depends on the TEI embedding service. 
In terms of -input parameters, it takes in a string, embeds it into a vector using the TEI -embedding service and pads other default parameters that are required for the -retrieval microservice and returns it. - -```bash -curl http://${host_ip}:6000/v1/embeddings\ - -X POST \ - -d '{"text":"hello"}' \ - -H 'Content-Type: application/json' -``` ### Retriever Microservice -To consume the retriever microservice, you need to generate a mock embedding -vector by Python script. The length of embedding vector is determined by the -embedding model. Here we use the -model EMBEDDING_MODEL_ID="BAAI/bge-base-en-v1.5", which vector size is 768. +To consume the retriever microservice, generate a mock embedding vector with a Python script. The length of the embedding vector is determined by the embedding model. The model is set with the environment variable EMBEDDING_MODEL_ID="BAAI/bge-base-en-v1.5", which has a vector size of 768. -Check the vector dimension of your embedding model and set -`your_embedding` dimension equal to it. +Check the vector dimension of the embedding model used and set `your_embedding` dimension equal to it. ```bash export your_embedding=$(python3 -c "import random; embedding = [random.uniform(-1, 1) for _ in range(768)]; print(embedding)") @@ -433,26 +190,19 @@ curl http://${host_ip}:7000/v1/retrieval \ -X POST \ -d "{\"text\":\"test\",\"embedding\":${your_embedding}}" \ -H 'Content-Type: application/json' - ``` -The output of the retriever microservice comprises of the a unique id for the -request, initial query or the input to the retrieval microservice, a list of top -`n` retrieved documents relevant to the input query, and top_n where n refers to -the number of documents to be returned. -The output is retrieved text that relevant to the input data: -```bash -{"id":"27210945c7c6c054fa7355bdd4cde818","retrieved_docs":[{"id":"0c1dd04b31ab87a5468d65f98e33a9f6","text":"Company: Nike. financial instruments are subject to master netting arrangements that allow for the offset of assets and liabilities in the event of default or early termination of the contract.\nAny amounts of cash collateral received related to these instruments associated with the Company's credit-related contingent features are recorded in Cash and\nequivalents and Accrued liabilities, the latter of which would further offset against the Company's derivative asset balance. Any amounts of cash collateral posted related\nto these instruments associated with the Company's credit-related contingent features are recorded in Prepaid expenses and other current assets, which would further\noffset against the Company's derivative liability balance. Cash collateral received or posted related to the Company's credit-related contingent features is presented in the\nCash provided by operations component of the Consolidated Statements of Cash Flows. The Company does not recognize amounts of non-cash collateral received, such\nas securities, on the Consolidated Balance Sheets. For further information related to credit risk, refer to Note 12 — Risk Management and Derivatives.\n2023 FORM 10-K 68Table of Contents\nThe following tables present information about the Company's derivative assets and liabilities measured at fair value on a recurring basis and indicate the level in the fair\nvalue hierarchy in which the Company classifies the fair value measurement:\nMAY 31, 2023\nDERIVATIVE ASSETS\nDERIVATIVE LIABILITIES"},{"id":"1d742199fb1a86aa8c3f7bcd580d94af","text": ... 
} +The output of the retriever microservice comprises of the a unique id for the request, initial query or the input to the retrieval microservice, a list of top `n` retrieved documents relevant to the input query, and top_n where n refers to the number of documents to be returned. +The output is retrieved text that is relevant to the input data: +```bash +{"id":"b16024e140e78e39a60e8678622be630","retrieved_docs":[],"initial_query":"test","top_n":1,"metadata":[]} ``` ### TEI Reranking Service -The TEI Reranking Service reranks the documents returned by the retrieval -service. It consumes the query and list of documents and returns the document -index based on decreasing order of the similarity score. The document -corresponding to the returned index with the highest score is the most relevant -document for the input query. +The TEI Reranking Service reranks the documents returned by the retrieval service. It consumes the query and list of documents and returns the document index in decreasing order of the similarity score. The document corresponding to the index with the highest score is the most relevant document for the input query. + ```bash curl http://${host_ip}:8808/rerank \ -X POST \ @@ -460,136 +210,51 @@ curl http://${host_ip}:8808/rerank \ -H 'Content-Type: application/json' ``` -Output is: `[{"index":1,"score":0.9988041},{"index":0,"score":0.022948774}]` - - -### Reranking Microservice - - -The reranking microservice consumes the TEI Reranking service and pads the -response with default parameters required for the llm microservice. - -```bash -curl http://${host_ip}:8000/v1/reranking\ - -X POST \ - -d '{"initial_query":"What is Deep Learning?", "retrieved_docs": \ - [{"text":"Deep Learning is not..."}, {"text":"Deep learning is..."}]}' \ - -H 'Content-Type: application/json' -``` - -The input to the microservice is the `initial_query` and a list of retrieved -documents and it outputs the most relevant document to the initial query along -with other default parameter such as temperature, `repetition_penalty`, -`chat_template` and so on. We can also get top n documents by setting `top_n` as one -of the input parameters. For example: - -```bash -curl http://${host_ip}:8000/v1/reranking\ - -X POST \ - -d '{"initial_query":"What is Deep Learning?" ,"top_n":2, "retrieved_docs": \ - [{"text":"Deep Learning is not..."}, {"text":"Deep learning is..."}]}' \ - -H 'Content-Type: application/json' -``` - -Here is the output: - +Sample output: ```bash -{"id":"e1eb0e44f56059fc01aa0334b1dac313","query":"Human: Answer the question based only on the following context:\n Deep learning is...\n Question: What is Deep Learning?","max_new_tokens":1024,"top_k":10,"top_p":0.95,"typical_p":0.95,"temperature":0.01,"repetition_penalty":1.03,"streaming":true} - +[{"index":1,"score":0.94238955},{"index":0,"score":0.120219156}] ``` -You may notice reranking microservice are with state ('ID' and other meta data), -while reranking service are not. ### TGI Service +In first startup, this service will take a few minutes to download the model files and perform warm up. After it's finished, the service will be ready. + ::::{tab-set} :::{tab-item} TGI :sync: TGI +Try the command below to check whether the LLM service is ready. The output should be "INFO text_generation_router::server: router/src/server.rs:2311: Connected" +```bash +docker logs tgi-service | grep Connected +``` + +Run the command below to use the TGI service to generate text for the input prompt. Sample output is also shown. 
```bash curl http://${host_ip}:8008/generate \ -X POST \ -d '{"inputs":"What is Deep Learning?", \ "parameters":{"max_new_tokens":17, "do_sample": true}}' \ -H 'Content-Type: application/json' - ``` -TGI service generates text for the input prompt. Here is the expected result from TGI: - ```bash -{"generated_text":"We have all heard the buzzword, but our understanding of it is still growing. It’s a sub-field of Machine Learning, and it’s the cornerstone of today’s Machine Learning breakthroughs.\n\nDeep Learning makes machines act more like humans through their ability to generalize from very large"} +{"id":"chatcmpl-cc4300a173af48989cac841f54ebca09","object":"chat.completion","created":1743553002,"model":"meta-llama/Meta-Llama-3-8B-Instruct","choices":[{"index":0,"message":{"role":"assistant","content":"Deep learning is a subfield of machine learning that is inspired by the structure and function","tool_calls":[]},"logprobs":null,"finish_reason":"length","stop_reason":null}],"usage":{"prompt_tokens":15,"total_tokens":32,"completion_tokens":17,"prompt_tokens_details":null},"prompt_logprobs":null} ``` -**NOTE**: After launch the TGI, it takes few minutes for TGI server to load LLM model and warm up. ::: :::: +### ChatQnA MegaService -If you get - -``` -curl: (7) Failed to connect to 100.81.104.168 port 8008 after 0 ms: Connection refused - -``` - -and the log shows model warm up, please wait for a while and try it later. - -``` -2024-06-05T05:45:27.707509646Z 2024-06-05T05:45:27.707361Z WARN text_generation_router: router/src/main.rs:357: `--revision` is not set -2024-06-05T05:45:27.707539740Z 2024-06-05T05:45:27.707379Z WARN text_generation_router: router/src/main.rs:358: We strongly advise to set it to a known supported commit. -2024-06-05T05:45:27.852525522Z 2024-06-05T05:45:27.852437Z INFO text_generation_router: router/src/main.rs:379: Serving revision bdd31cf498d13782cc7497cba5896996ce429f91 of model meta-llama/Meta-Llama-3-8B-Instruct -2024-06-05T05:45:27.867833811Z 2024-06-05T05:45:27.867759Z INFO text_generation_router: router/src/main.rs:221: Warming up model - -``` - -### LLM Microservice - -```bash -curl http://${host_ip}:9000/v1/chat/completions\ - -X POST \ - -d '{"query":"What is Deep Learning?","max_new_tokens":17,"top_k":10,"top_p":0.95,\ - "typical_p":0.95,"temperature":0.01,"repetition_penalty":1.03,"streaming":true}' \ - -H 'Content-Type: application/json' - -``` - -You will get generated text from LLM: - -```bash -data: b'\n' -data: b'\n' -data: b'Deep' -data: b' learning' -data: b' is' -data: b' a' -data: b' subset' -data: b' of' -data: b' machine' -data: b' learning' -data: b' that' -data: b' uses' -data: b' algorithms' -data: b' to' -data: b' learn' -data: b' from' -data: b' data' -data: [DONE] -``` - -### MegaService - +This will ensure the megaservice is working properly. ```bash curl http://${host_ip}:8888/v1/chatqna -H "Content-Type: application/json" -d '{ - "model": "meta-llama/Meta-Llama-3-8B-Instruct", "messages": "What is the revenue of Nike in 2023?" }' - ``` -Here is the output for your reference: - +Here is the output for reference: ```bash data: b'\n' data: b'An' @@ -626,80 +291,24 @@ data: b'' data: [DONE] ``` -## Check docker container log - -Check the log of container by: - -`docker logs -t` - - -Check the log by `docker logs f7a08f9867f9 -t`. - -``` -2024-06-05T01:30:30.695934928Z error: a value is required for '--model-id ' but none was supplied -2024-06-05T01:30:30.697123534Z -2024-06-05T01:30:30.697148330Z For more information, try '--help'. 
- -``` - -The log indicates the `MODEL_ID` is not set. - - -::::{tab-set} -:::{tab-item} TGI -:sync: TGI - -View the docker input parameters in `$WORKSPACE/GenAIExamples/ChatQnA/docker_compose/nvidia/gpu/compose.yaml` - -```yaml - tgi-service: - image: ghcr.io/huggingface/text-generation-inference:2.2.0 - container_name: tgi-service - ports: - - "9009:80" - volumes: - - "./data:/data" - shm_size: 1g - environment: - no_proxy: ${no_proxy} - http_proxy: ${http_proxy} - https_proxy: ${https_proxy} - HF_TOKEN: ${HUGGINGFACEHUB_API_TOKEN} - HF_HUB_DISABLE_PROGRESS_BARS: 1 - HF_HUB_ENABLE_HF_TRANSFER: 0 - command: --model-id ${LLM_MODEL_ID} --cuda-graphs 0 - -``` -::: -:::: - - -The input `MODEL_ID` is `${LLM_MODEL_ID}` - -Check environment variable `LLM_MODEL_ID` is set correctly, spelled correctly. -Set the `LLM_MODEL_ID` then restart the containers. - -Also you can check overall logs with the following command, where the -compose.yaml is the mega service docker-compose configuration file. - -::::{tab-set} - -:::{tab-item} TGI -:sync: TGI +### NGINX Service +This will ensure the NGINX ervice is working properly. ```bash -docker compose -f ./docker_compose/nvidia/gpu/compose.yaml logs +curl http://${host_ip}:${NGINX_PORT}/v1/chatqna \ + -H "Content-Type: application/json" \ + -d '{"messages": "What is the revenue of Nike in 2023?"}' ``` -::: -:::: + +The output will be similar to that of the ChatQnA megaservice. ## Launch UI ### Basic UI -To access the frontend, open the following URL in your browser: http://{host_ip}:5173. By default, the UI runs on port 5173 internally. If you prefer to use a different host port to access the frontend, you can modify the port mapping in the compose.yaml file as shown below: +To access the frontend, open the following URL in your browser: http://{host_ip}:{NGINX_PORT}. By default, the UI runs on port 5173 internally. If you prefer to use a different host port to access the frontend, you can modify the port mapping in the compose.yaml file as shown below: ```yaml - chaqna-ui-server: + chatqna-ui-server: image: opea/chatqna-ui:${TAG:-latest} ... ports: @@ -708,35 +317,35 @@ To access the frontend, open the following URL in your browser: http://{host_ip} ### Conversational UI -To access the Conversational UI (react based) frontend, modify the UI service in the `compose.yaml` file. Replace `chaqna-ui-server` service with the `chatqna-conversation-ui-server` service as per the config below: +To access the Conversational UI (react based) frontend, modify the UI service in the `compose.yaml` file. Replace `chatqna-ui-server` service with the `chatqna-conversation-ui-server` service as per the config below: ```yaml -chaqna-conversation-ui-server: +chatqna-conversation-ui-server: image: opea/chatqna-conversation-ui:${TAG:-latest} container_name: chatqna-conversation-ui-server environment: - APP_BACKEND_SERVICE_ENDPOINT=${BACKEND_SERVICE_ENDPOINT} - APP_DATA_PREP_SERVICE_URL=${DATAPREP_SERVICE_ENDPOINT} ports: - - "5174:5174" + - "5174:80" depends_on: - - chaqna-backend-server + - chatqna-backend-server ipc: host restart: always ``` -Once the services are up, open the following URL in your browser: http://{host_ip}:5174. By default, the UI runs on port 80 internally. If you prefer to use a different host port to access the frontend, you can modify the port mapping in the `compose.yaml` file as shown below: +Once the services are up, open the following URL in a web browser: http://{host_ip}:5174. By default, the UI runs on port {NGINX_PORT} internally. 
A different host port can be used to access the frontend. Simply modify the port mapping in the `compose.yaml` file as shown below: ```yaml - chaqna-conversation-ui-server: + chatqna-conversation-ui-server: image: opea/chatqna-conversation-ui:${TAG:-latest} ... ports: - "80:80" ``` -### Stop the services +### Stop the Services -Once you are done with the entire pipeline and wish to stop and remove all the containers, use the command below: +To stop and remove all the containers, use the command below: ::::{tab-set} :::{tab-item} TGI :sync: TGI diff --git a/tutorial/ChatQnA/deploy/xeon.md b/tutorial/ChatQnA/deploy/xeon.md index 1ff48f77..38bd821e 100644 --- a/tutorial/ChatQnA/deploy/xeon.md +++ b/tutorial/ChatQnA/deploy/xeon.md @@ -36,7 +36,7 @@ git checkout tags/v${RELEASE_VERSION} cd .. ``` -Set up the HuggingFace token: +Set up a [HuggingFace](https://huggingface.co/) account and generate a [user access token](https://huggingface.co/docs/transformers.js/en/guides/private#step-1-generating-a-user-access-token). Then set an environment variable with the HuggingFace token: ```bash export HUGGINGFACEHUB_API_TOKEN="Your_Huggingface_API_Token" ``` @@ -46,6 +46,12 @@ The example requires setting the `host_ip` to "localhost" to deploy the microser export host_ip="localhost" ``` +Set the NGINX port. +```bash +# Example: NGINX_PORT=80 +export NGINX_PORT= +``` + For machines behind a firewall, set up the proxy environment variables: ```bash export http_proxy="Your_HTTP_Proxy" @@ -95,7 +101,7 @@ cd $WORKSPACE/GenAIExamples/ChatQnA/docker_compose/intel/cpu/xeon source ./set_env.sh ``` -## Deploy the use case +## Deploy the Use Case Run `docker compose` with the provided YAML file to start all the services mentioned above as containers. @@ -150,7 +156,7 @@ After running `docker compose`, check for warning messages for environment varia ::: :::: -### Check container statuses +### Check Container Statuses Check if all the containers launched via `docker compose` are running i.e. each container's `STATUS` is `Up` and `Healthy`. @@ -203,7 +209,7 @@ Each docker container's log can also be checked using: docker logs ``` -## Validate microservices +## Validate Microservices This section will walk through the different ways to interact with the microservices deployed. @@ -235,8 +241,7 @@ curl http://${host_ip}:7000/v1/retrieval \ -H 'Content-Type: application/json' ``` -The output of the retriever microservice comprises of the a unique id for the request, initial query or the input to the retrieval microservice, a list of top -`n` retrieved documents relevant to the input query, and top_n where n refers to the number of documents to be returned. +The output of the retriever microservice comprises of the a unique id for the request, initial query or the input to the retrieval microservice, a list of top `n` retrieved documents relevant to the input query, and top_n where n refers to the number of documents to be returned. The output is retrieved text that is relevant to the input data: ```bash @@ -245,8 +250,7 @@ The output is retrieved text that is relevant to the input data: ### TEI Reranking Service -The TEI Reranking Service reranks the documents returned by the retrieval service. It consumes the query and list of documents and returns the document -index in decreasing order of the similarity score. The document corresponding to the index with the highest score is the most relevant document for the input query. +The TEI Reranking Service reranks the documents returned by the retrieval service. 
It consumes the query and list of documents and returns the document index in decreasing order of the similarity score. The document corresponding to the index with the highest score is the most relevant document for the input query. ```bash curl http://${host_ip}:8808/rerank \ @@ -358,6 +362,7 @@ curl -X POST "http://${host_ip}:6007/v1/dataprep/delete_file" \ ### ChatQnA MegaService +This will ensure the megaservice is working properly. ```bash curl http://${host_ip}:8888/v1/chatqna -H "Content-Type: application/json" -d '{ "messages": "What is the revenue of Nike in 2023?" @@ -401,11 +406,22 @@ data: b'' data: [DONE] ``` +### NGINX Service + +This will ensure the NGINX ervice is working properly. +```bash +curl http://${host_ip}:${NGINX_PORT}/v1/chatqna \ + -H "Content-Type: application/json" \ + -d '{"messages": "What is the revenue of Nike in 2023?"}' +``` + +The output will be similar to that of the ChatQnA megaservice. + ## Launch UI ### Basic UI -To access the frontend, open the following URL in a web browser: http://{host_ip}:80. By default, the UI runs on port 5173 internally. A different host port can be used to access the frontend. Simply modify the port mapping in the `compose.yaml` file as shown below: +To access the frontend, open the following URL in a web browser: http://{host_ip}:{NGINX_PORT}. By default, the UI runs on port 5173 internally. A different host port can be used to access the frontend. Simply modify the port mapping in the `compose.yaml` file as shown below: ```yaml chatqna-xeon-ui-server: image: opea/chatqna-ui:${TAG:-latest} @@ -434,7 +450,7 @@ chaqtna-xeon-conversation-ui-server: In addition, modify the `chatqna-xeon-nginx-server` `depends_on` field to include `chatqna-xeon-conversation-ui-server` instead of `chatqna-xeon-ui-server`. -Once the services are up, open the following URL in a web browser: http://{host_ip}:5174. By default, the UI runs on port 80 internally. A different host port can be used to access the frontend. Simply modify the port mapping in the `compose.yaml` file as shown below: +Once the services are up, open the following URL in a web browser: http://{host_ip}:5174. By default, the UI runs on port {NGINX_PORT} internally. A different host port can be used to access the frontend. Simply modify the port mapping in the `compose.yaml` file as shown below: ```yaml chatqna-xeon-conversation-ui-server: image: opea/chatqna-conversation-ui:${TAG:-latest} @@ -443,7 +459,7 @@ Once the services are up, open the following URL in a web browser: http://{host_ - "80:80" ``` -## Stop the services +## Stop the Services To stop and remove all the containers, use the command below: ::::{tab-set} From ce483e7d04726a9df684b77c8f53ab9926d7e7b4 Mon Sep 17 00:00:00 2001 From: alexsin368 Date: Wed, 2 Apr 2025 15:30:48 -0700 Subject: [PATCH 04/13] fix headers Signed-off-by: alexsin368 --- tutorial/ChatQnA/deploy/aipc.md | 4 ++-- tutorial/ChatQnA/deploy/nvidia.md | 2 +- 2 files changed, 3 insertions(+), 3 deletions(-) diff --git a/tutorial/ChatQnA/deploy/aipc.md b/tutorial/ChatQnA/deploy/aipc.md index 8def0abd..1a8f6c4a 100644 --- a/tutorial/ChatQnA/deploy/aipc.md +++ b/tutorial/ChatQnA/deploy/aipc.md @@ -196,7 +196,7 @@ docker compose -f compose.yaml up -d ::: :::: -#### Check Env Variables +### Check Env Variables After running `docker compose`, check for warning messages for environment variables that are **NOT** set. Address them if needed. 
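One way to surface these warnings is to render the compose configuration and filter for the warning text. This is only a sketch and assumes the command is run from the directory containing `compose.yaml`:

```bash
# docker compose warns about every referenced variable that is not set
docker compose -f compose.yaml config 2>&1 | grep -i "variable is not set"
```

Any variable reported here should be exported before relaunching the services.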
::::{tab-set} @@ -216,7 +216,7 @@ After running `docker compose`, check for warning messages for environment varia ::: :::: -#### Check the container status +### Check the container status Check if all the containers launched via `docker compose` are running i.e. each container's `STATUS` is `Up` and `Healthy`. diff --git a/tutorial/ChatQnA/deploy/nvidia.md b/tutorial/ChatQnA/deploy/nvidia.md index 4223e5e4..b8ec522d 100644 --- a/tutorial/ChatQnA/deploy/nvidia.md +++ b/tutorial/ChatQnA/deploy/nvidia.md @@ -103,7 +103,7 @@ docker compose -f compose.yaml up -d ::: :::: -#### Check Env Variables +### Check Env Variables After running `docker compose`, check for warning messages for environment variables that are **NOT** set. Address them if needed. ::::{tab-set} From 0bf350ac02093945423d8b36dab49fffbe01eb36 Mon Sep 17 00:00:00 2001 From: alexsin368 Date: Tue, 8 Apr 2025 09:33:00 -0700 Subject: [PATCH 05/13] fix typos Signed-off-by: alexsin368 --- tutorial/ChatQnA/deploy/gaudi.md | 2 +- tutorial/ChatQnA/deploy/xeon.md | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/tutorial/ChatQnA/deploy/gaudi.md b/tutorial/ChatQnA/deploy/gaudi.md index 1482a091..a3e25aa2 100644 --- a/tutorial/ChatQnA/deploy/gaudi.md +++ b/tutorial/ChatQnA/deploy/gaudi.md @@ -478,7 +478,7 @@ To access the frontend, open the following URL in a web browser: http://{host_ip To access the Conversational UI (react based) frontend, modify the UI service in the `compose.yaml` file. Replace `chatqna-gaudi-ui-server` service with the `chatqna-gaudi-conversation-ui-server` service as shown below: ```yaml -chaqtna-gaudi-conversation-ui-server: +chatqna-gaudi-conversation-ui-server: image: opea/chatqna-conversation-ui:${TAG:-latest} container_name: chatqna-gaudi-conversation-ui-server environment: diff --git a/tutorial/ChatQnA/deploy/xeon.md b/tutorial/ChatQnA/deploy/xeon.md index 38bd821e..c9db1464 100644 --- a/tutorial/ChatQnA/deploy/xeon.md +++ b/tutorial/ChatQnA/deploy/xeon.md @@ -434,7 +434,7 @@ To access the frontend, open the following URL in a web browser: http://{host_ip To access the Conversational UI (react based) frontend, modify the UI service in the `compose.yaml` file. Replace `chatqna-xeon-ui-server` service with the `chatqna-xeon-conversation-ui-server` service as shown below: ```yaml -chaqtna-xeon-conversation-ui-server: +chatqna-xeon-conversation-ui-server: image: opea/chatqna-conversation-ui:${TAG:-latest} container_name: chatqna-xeon-conversation-ui-server environment: From a237be17d13fce0559ddbdd81abab98d92e6570e Mon Sep 17 00:00:00 2001 From: alexsin368 Date: Wed, 9 Apr 2025 16:14:53 -0700 Subject: [PATCH 06/13] add expected output and next steps, remove 2nd person words Signed-off-by: alexsin368 --- tutorial/ChatQnA/ChatQnA_Guide.rst | 102 +++++++++++++++++++++-------- 1 file changed, 76 insertions(+), 26 deletions(-) diff --git a/tutorial/ChatQnA/ChatQnA_Guide.rst b/tutorial/ChatQnA/ChatQnA_Guide.rst index 335eba6f..26e59e29 100644 --- a/tutorial/ChatQnA/ChatQnA_Guide.rst +++ b/tutorial/ChatQnA/ChatQnA_Guide.rst @@ -3,14 +3,11 @@ ChatQnA #################### -.. note:: This guide is in its early development and is a work-in-progress with - placeholder content. - Overview ******** -Chatbots are a widely adopted use case for leveraging the powerful chat and -reasoning capabilities of large language models (LLMs). The ChatQnA example +Chatbots are a widely adopted use case for leveraging the powerful chat and +reasoning capabilities of large language models (LLMs). 
The ChatQnA example provides the starting point for developers to begin working in the GenAI space. Consider it the “hello world” of GenAI applications and can be leveraged for solutions across wide enterprise verticals, both internally and externally. @@ -38,16 +35,22 @@ generating human-like responses. Developers can easily swap out the generative model or vector database with their own custom models or databases. This allows developers to build chatbots that are tailored to their specific use cases and requirements. By combining the generative model with the vector database, RAG -can provide accurate and contextually relevant responses specific to your users' +can provide accurate and contextually relevant responses specific to users' queries. The ChatQnA example is designed to be a simple, yet powerful, demonstration of the RAG architecture. It is a great starting point for developers looking to build chatbots that can provide accurate and up-to-date information to users. -To facilitate sharing of individual services across multiple GenAI applications, use the GenAI Microservices Connector (GMC) to deploy your application. Apart from service sharing , it also supports specifying sequential, parallel, and alternative steps in a GenAI pipeline. In so doing, it supports dynamic switching between models used in any stage of a GenAI pipeline. For example, within the ChatQnA pipeline, using GMC one could switch the model used in the embedder, re-ranker, and/or the LLM. -Upstream Vanilla Kubernetes or Red Hat OpenShift Container -Platform (RHOCP) can be used with or without GMC, while use with GMC provides additional features. +To facilitate sharing of individual services across multiple GenAI applications, +use the GenAI Microservices Connector (GMC) to deploy the application. Apart +from service sharing , it also supports specifying sequential, parallel, and +alternative steps in a GenAI pipeline. In so doing, it supports dynamic switching +between models used in any stage of a GenAI pipeline. For example, within the +ChatQnA pipeline, using GMC one could switch the model used in the embedder, +re-ranker, and/or the LLM. Upstream Vanilla Kubernetes or Red Hat OpenShift Container +Platform (RHOCP) can be used with or without GMC, while use with GMC provides +additional features. The ChatQnA provides several deployment options, including single-node deployments on-premise or in a cloud environment using hardware such as Xeon @@ -113,7 +116,51 @@ The architecture follows a series of steps to process user queries and generate Expected Output =============== -TBD +After launching the ChatQnA application, a curl command can be used to ensure the +megaservice is working properly. The example below assumes a document containing +new information is uploaded to the vector database before querying. +.. code-block:: bash + curl http://${host_ip}:8888/v1/chatqna -H "Content-Type: application/json" -d '{ + "messages": "What is the revenue of Nike in 2023?" + }' + +Here is the output for reference: +.. code-block:: bash + data: b'\n' + data: b'An' + data: b'swer' + data: b':' + data: b' In' + data: b' fiscal' + data: b' ' + data: b'2' + data: b'0' + data: b'2' + data: b'3' + data: b',' + data: b' N' + data: b'I' + data: b'KE' + data: b',' + data: b' Inc' + data: b'.' + data: b' achieved' + data: b' record' + data: b' Rev' + data: b'en' + data: b'ues' + data: b' of' + data: b' $' + data: b'5' + data: b'1' + data: b'.' + data: b'2' + data: b' billion' + data: b'.' 
+ data: b'' + data: [DONE] + +The UI will show a similar response with formatted output. Validation Matrix and Prerequisites =================================== @@ -204,9 +251,9 @@ The gateway serves as the interface for users to access. The gateway routes inco Deployment ********** -Here are some deployment options depending on your hardware and environment. +Here are some deployment options depending on the hardware and environment. It includes both single-node and orchestrated multi-node configurations. -Choose the one that best fits your requirements. +Choose the one that best fits requirements. Single Node *********** @@ -242,7 +289,7 @@ Troubleshooting Q:For example, started ChatQnA example in IBM Cloud and trying to access the UI interface. By default, typing the :5173 resolves to https://:5173. Chrome shows the following warning message:xx.xx.xx.xx doesn't support a secure connection - A: That is because by default, the browser resolves xx.xx.xx.xx:5173 to https://xx.xx.xx.xx:5173. But to meet security requirements, users need to deploy a certificate to enable HTTPS support in some cloud environments. OPEA provides HTTP services by default,but also supports HTTPS. To enable HTTPS, you can specify the certificate file paths in the MicroService class. For more details, please refer to the `source code `_. + A: That is because by default, the browser resolves xx.xx.xx.xx:5173 to https://xx.xx.xx.xx:5173. But to meet security requirements, users need to deploy a certificate to enable HTTPS support in some cloud environments. OPEA provides HTTP services by default,but also supports HTTPS. To enable HTTPS, specify the certificate file paths in the MicroService class. For more details, please refer to the `source code `_. 2. For other troubles, please check the `doc `_. @@ -250,11 +297,9 @@ Troubleshooting Monitoring ********** -Now that you have deployed the ChatQnA example, let's talk about monitoring the performance of the microservices in the ChatQnA pipeline. - -Monitoring the performance of microservices is crucial for ensuring the smooth operation of the generative AI systems. By monitoring metrics such as latency and throughput, you can identify bottlenecks, detect anomalies, and optimize the performance of individual microservices. This allows us to proactively address any issues and ensure that the ChatQnA pipeline is running efficiently. +Monitoring the performance of microservices is crucial for ensuring the smooth operation of the generative AI systems. Monitoring metrics such as latency and throughput can identify bottlenecks, detect anomalies, and optimize the performance of individual microservices. This helps proactively address any issues and ensure that the ChatQnA pipeline is running efficiently. -This document will help you understand how to monitor in real time the latency, throughput, and other metrics of different microservices. You will use **Prometheus** and **Grafana**, both open-source toolkits, to collect metrics and visualize them in a dashboard. +**Prometheus** and **Grafana**, both open-source toolkits, are used to collect metrics including latency and throughput of different microservices in real time, and visualize them in a dashboard. Set Up the Prometheus Server ============================ @@ -290,7 +335,7 @@ Edit the `prometheus.yml` file: vim prometheus.yml -Change the ``job_name`` to the name of the microservice you want to monitor. Also change the ``targets`` to the job target endpoint of that microservice. 
Make sure the service is running and the port is open, and that it exposes the metrics that follow Prometheus convention at the ``/metrics`` endpoint. +Change the ``job_name`` to the name of the microservice to monitor. Also change the ``targets`` to the job target endpoint of that microservice. Make sure the service is running and the port is open, and that it exposes the metrics that follow Prometheus convention at the ``/metrics`` endpoint. Here is an example of exporting metrics data from a TGI microservice to Prometheus: @@ -333,7 +378,7 @@ nohup ./prometheus --config.file=./prometheus.yml & >Note: Before starting Prometheus, ensure that no other processes are running on the designated port (default is 9090). Otherwise, Prometheus will not be able to scrape the metrics. -On the Prometheus UI, you can see the status of the targets and the metrics that are being scraped. You can search for a metrics variable by typing it in the search bar. +On the Prometheus UI, look at the status of the targets and the metrics that are being scraped. To search for a metrics variable, type it in the search bar. The TGI metrics can be accessed at: @@ -372,7 +417,7 @@ Run the Grafana server, without hanging-up the process: nohup ./bin/grafana-server & 3. Access the Grafana dashboard UI: - On your browser, access the Grafana dashboard UI at the following URL: + On a web browser, access the Grafana dashboard UI at the following URL: .. code-block:: bash @@ -388,23 +433,28 @@ Log in to Grafana using the default credentials: password: admin 4. Add Prometheus as a data source: - You need to configure the data source for Grafana to scrape data from. Click on the "Data Source" button, select Prometheus, and specify the Prometheus URL ``http://localhost:9090``. + The data source for Grafana needs to be configured to scrape data. Click on the "Data Source" button, select Prometheus, and specify the Prometheus URL ``http://localhost:9090``. - Then, you need to upload a JSON file for the dashboard's configuration. You can upload it in the Grafana UI under ``Home > Dashboards > Import dashboard``. A sample JSON file is supported here: `tgi_grafana.json `_ + Then, upload a JSON file for the dashboard's configuration. Upload it in the Grafana UI under ``Home > Dashboards > Import dashboard``. A sample JSON file is supported here: `tgi_grafana.json `_ 5. View the dashboard: - Finally, open the dashboard in the Grafana UI, and you will see different panels displaying the metrics data. + Finally, open the dashboard in the Grafana UI to see different panels displaying the metrics data. - Taking the TGI microservice as an example, you can see the following metrics: + Taking the TGI microservice as an example, look at the following metrics: * Time to first token * Decode per-token latency * Throughput (generated tokens/sec) * Number of tokens per prompt * Number of generated tokens per request - You can also monitor the incoming requests to the microservice, the response time per token, etc., in real time. + Incoming requests to the microservice, the response time per token, etc., can also be monitored in real time. Summary and Next Steps ======================= -TBD \ No newline at end of file +The ChatQnA application deploys a RAG architecture consisting of the following microservices - +embedding, vectorDB, retrieval, reranker, and LLM text generation. It is a chatbot that can +leverage new information from uploaded documents and websites to provide more accurate answers. 
+The microservices can be customized by modifying and building them in `GenAIComponents `_. +Explore additional `GenAIExamples `_ and use them +as starting points for other use cases. \ No newline at end of file From 623134c02ee50e544fdaa0a379db7de25cae337d Mon Sep 17 00:00:00 2001 From: alexsin368 Date: Wed, 9 Apr 2025 16:44:18 -0700 Subject: [PATCH 07/13] fix indentation Signed-off-by: alexsin368 --- tutorial/ChatQnA/ChatQnA_Guide.rst | 2 ++ 1 file changed, 2 insertions(+) diff --git a/tutorial/ChatQnA/ChatQnA_Guide.rst b/tutorial/ChatQnA/ChatQnA_Guide.rst index 5adf07bb..fd988891 100644 --- a/tutorial/ChatQnA/ChatQnA_Guide.rst +++ b/tutorial/ChatQnA/ChatQnA_Guide.rst @@ -133,12 +133,14 @@ After launching the ChatQnA application, a curl command can be used to ensure th megaservice is working properly. The example below assumes a document containing new information is uploaded to the vector database before querying. .. code-block:: bash + curl http://${host_ip}:8888/v1/chatqna -H "Content-Type: application/json" -d '{ "messages": "What is the revenue of Nike in 2023?" }' Here is the output for reference: .. code-block:: bash + data: b'\n' data: b'An' data: b'swer' From 726cc77e984ff615ebe5d03f9f469f9de70b34bc Mon Sep 17 00:00:00 2001 From: alexsin368 Date: Mon, 14 Apr 2025 17:52:31 -0700 Subject: [PATCH 08/13] remove extra UI info, update sample output, remove optional microservice validation Signed-off-by: alexsin368 --- tutorial/ChatQnA/deploy/aipc.md | 21 +++++---- tutorial/ChatQnA/deploy/gaudi.md | 72 +++++++------------------------ tutorial/ChatQnA/deploy/nvidia.md | 54 +++++------------------ tutorial/ChatQnA/deploy/xeon.md | 63 +++++++-------------------- 4 files changed, 51 insertions(+), 159 deletions(-) diff --git a/tutorial/ChatQnA/deploy/aipc.md b/tutorial/ChatQnA/deploy/aipc.md index 1a8f6c4a..fdde63a3 100644 --- a/tutorial/ChatQnA/deploy/aipc.md +++ b/tutorial/ChatQnA/deploy/aipc.md @@ -12,7 +12,7 @@ The list of microservices from OPEA GenAIComps are used to deploy a single node 4. Reranking 5. LLM with Ollama -The solution is aimed to show how to use Redis vectorDB for RAG and the llama-3 model for LLM inference on Intel Client PCs. Steps will include setting up docker containers, utilizing a sample Nike dataset in PDF format, and asking a question about Nike to get a response. There are 2 modes of UI that can be deployed: +The solution is aimed to show how to use Redis vectorDB for RAG and the llama-3 model for LLM inference on Intel Client PCs. Steps will include setting up docker containers, utilizing a sample Nike dataset in PDF format, and asking a question about Nike to get a response. There are multiple versions of the UI that can be deployed but only the default one will be covered in this tutorial. ## Prerequisites @@ -54,7 +54,7 @@ For machines behind a firewall, set up the proxy environment variables: export http_proxy="Your_HTTP_Proxy" export https_proxy="Your_HTTPs_Proxy" # Example: no_proxy="localhost, 127.0.0.1, 192.168.1.1" -export no_proxy="Your_No_Proxy",chatqna-xeon-ui-server,chatqna-xeon-backend-server,dataprep-redis-service,tei-embedding-service,retriever,tei-reranking-service,tgi-service,vllm-service +export no_proxy="Your_No_Proxy",chatqna-xeon-ui-server,chatqna-xeon-backend-server,dataprep-redis-service,tei-embedding-service,retriever,tei-reranking-service,tgi-service,vllm-service,llm-faqgen ``` The examples utilize model weights from Ollama and langchain. 
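As an example, the Llama 3 weights can be pulled ahead of time with the Ollama CLI so the first query does not wait on a download. This is a sketch: the `llama3` tag is an assumption and should match the model name configured for this tutorial.

```bash
# Pull the Llama 3 weights into the local Ollama cache
ollama pull llama3

# Confirm the model is available locally
ollama list
```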
@@ -225,8 +225,7 @@ Run this command to see this info: docker ps -a ``` -The sample output is for OPEA release v1.2. - +Sample output: ::::{tab-set} :::{tab-item} Ollama @@ -234,12 +233,12 @@ The sample output is for OPEA release v1.2. ```bash CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES -5db065a9fdf9 opea/chatqna-ui:1.2 "docker-entrypoint.s…" 29 seconds ago Up 25 seconds 0.0.0.0:5173->5173/tcp, :::5173->5173/tcp chatqna-aipc-ui-server -6fa87927d00c opea/chatqna:1.2 "python chatqna.py" 29 seconds ago Up 25 seconds 0.0.0.0:8888->8888/tcp, :::8888->8888/tcp chatqna-aipc-backend-server -bdc93be9ce0c opea/retriever-redis:1.2 "python retriever_re…" 29 seconds ago Up 3 seconds 0.0.0.0:7000->7000/tcp, :::7000->7000/tcp retriever-redis-server -add761b504bc opea/reranking-tei:1.2 "python reranking_te…" 29 seconds ago Up 26 seconds 0.0.0.0:8000->8000/tcp, :::8000->8000/tcp reranking-tei-aipc-server -d6b540a423ac opea/dataprep-redis:1.2 "python prepare_doc_…" 29 seconds ago Up 26 seconds 0.0.0.0:6007->6007/tcp, :::6007->6007/tcp dataprep-redis-server -6662d857a154 opea/embedding-tei:1.2 "python embedding_te…" 29 seconds ago Up 26 seconds 0.0.0.0:6000->6000/tcp, :::6000->6000/tcp embedding-tei-server +5db065a9fdf9 opea/chatqna-ui:latest "docker-entrypoint.s…" 29 seconds ago Up 25 seconds 0.0.0.0:5173->5173/tcp, :::5173->5173/tcp chatqna-aipc-ui-server +6fa87927d00c opea/chatqna:latest "python chatqna.py" 29 seconds ago Up 25 seconds 0.0.0.0:8888->8888/tcp, :::8888->8888/tcp chatqna-aipc-backend-server +bdc93be9ce0c opea/retriever-redis:latest "python retriever_re…" 29 seconds ago Up 3 seconds 0.0.0.0:7000->7000/tcp, :::7000->7000/tcp retriever-redis-server +add761b504bc opea/reranking-tei:latest "python reranking_te…" 29 seconds ago Up 26 seconds 0.0.0.0:8000->8000/tcp, :::8000->8000/tcp reranking-tei-aipc-server +d6b540a423ac opea/dataprep-redis:latest "python prepare_doc_…" 29 seconds ago Up 26 seconds 0.0.0.0:6007->6007/tcp, :::6007->6007/tcp dataprep-redis-server +6662d857a154 opea/embedding-tei:latest "python embedding_te…" 29 seconds ago Up 26 seconds 0.0.0.0:6000->6000/tcp, :::6000->6000/tcp embedding-tei-server 8b226edcd9db ghcr.io/huggingface/text-embeddings-inference:cpu-1.5 "text-embeddings-rou…" 29 seconds ago Up 27 seconds 0.0.0.0:8808->80/tcp, :::8808->80/tcp tei-reranking-server e1fc81b1d542 redis/redis-stack:7.2.0-v9 "/entrypoint.sh" 29 seconds ago Up 27 seconds 0.0.0.0:6379->6379/tcp, :::6379->6379/tcp, 0.0.0.0:8001->8001/tcp, :::8001->8001/tcp redis-vector-db 051e0d68e263 ghcr.io/huggingface/text-embeddings-inference:cpu-1.5 "text-embeddings-rou…" 29 seconds ago Up 27 seconds 0.0.0.0:6006->80/tcp, :::6006->80/tcp tei-embedding-server @@ -452,7 +451,7 @@ The output will be similar to that of the ChatQnA megaservice. ## Launch UI -To access the frontend, open the following URL in a web browser: http://{host_ip}:{NGINX_PORT}. By default, the UI runs on port 5173 internally. A different host port can be used to access the frontend. Simply modify the port mapping in the `compose.yaml` file as shown below: +To access the frontend, open the following URL in a web browser: http://${host_ip}:${NGINX_PORT}. By default, the UI runs on port 5173 internally. A different host port can be used to access the frontend. 
Simply modify the port mapping in the `compose.yaml` file as shown below: ```yaml chatqna-aipc-ui-server: image: opea/chatqna-ui${TAG:-latest} diff --git a/tutorial/ChatQnA/deploy/gaudi.md b/tutorial/ChatQnA/deploy/gaudi.md index a3e25aa2..ef749696 100644 --- a/tutorial/ChatQnA/deploy/gaudi.md +++ b/tutorial/ChatQnA/deploy/gaudi.md @@ -12,10 +12,7 @@ The list of microservices from OPEA GenAIComps are used to deploy a single node 4. Reranking 5. LLM with vLLM or TGI -The solution is aimed to show how to use Redis vectorDB for RAG and Meta-Llama-3-8B-Instruct model for LLM inference on Intel® Xeon® Scalable processors. Steps will include setting up docker containers, utilizing a sample Nike dataset in PDF format, and asking a question about Nike to get a response. There are 2 modes of UI that can be deployed: - -1. Basic UI -2. Conversational UI +The solution is aimed to show how to use Redis vectorDB for RAG and Meta-Llama-3-8B-Instruct model for LLM inference on Intel® Xeon® Scalable processors. Steps will include setting up docker containers, utilizing a sample Nike dataset in PDF format, and asking a question about Nike to get a response. There are multiple versions of the UI that can be deployed but only the default one will be covered in this tutorial. ## Prerequisites @@ -57,7 +54,7 @@ For machines behind a firewall, set up the proxy environment variables: export http_proxy="Your_HTTP_Proxy" export https_proxy="Your_HTTPs_Proxy" # Example: no_proxy="localhost, 127.0.0.1, 192.168.1.1" -export no_proxy="Your_No_Proxy",chatqna-xeon-ui-server,chatqna-xeon-backend-server,dataprep-redis-service,tei-embedding-service,retriever,tei-reranking-service,tgi-service,vllm-service +export no_proxy="Your_No_Proxy",chatqna-xeon-ui-server,chatqna-xeon-backend-server,dataprep-redis-service,tei-embedding-service,retriever,tei-reranking-service,tgi-service,vllm-service,llm-faqgen ``` ## Use Case Setup @@ -173,8 +170,7 @@ Run this command to see this info: docker ps -a ``` -The sample output is for OPEA release v1.2. - +Sample output: ::::{tab-set} :::{tab-item} vllm @@ -182,19 +178,19 @@ The sample output is for OPEA release v1.2. 
```bash CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES -eabb930edad6 opea/nginx:1.2 "/docker-entrypoint.…" 9 seconds ago Up 8 seconds 0.0.0.0:80->80/tcp, [::]:80->80/tcp +eabb930edad6 opea/nginx:latest "/docker-entrypoint.…" 9 seconds ago Up 8 seconds 0.0.0.0:80->80/tcp, [::]:80->80/tcp chatqna-gaudi-nginx-server -7e3c16a791b1 opea/chatqna-ui:1.2 "docker-entrypoint.s…" 9 seconds ago Up 8 seconds 0.0.0.0:5173->5173/tcp, [::]:5173->5173/tcp +7e3c16a791b1 opea/chatqna-ui:latest "docker-entrypoint.s…" 9 seconds ago Up 8 seconds 0.0.0.0:5173->5173/tcp, [::]:5173->5173/tcp chatqna-gaudi-ui-server -482365a6e945 opea/chatqna:1.2 "python chatqna.py" 9 seconds ago Up 9 seconds 0.0.0.0:8888->8888/tcp, [::]:8888->8888/tcp +482365a6e945 opea/chatqna:latest "python chatqna.py" 9 seconds ago Up 9 seconds 0.0.0.0:8888->8888/tcp, [::]:8888->8888/tcp chatqna-gaudi-backend-server -1379226ad3ff opea/dataprep:1.2 "sh -c 'python $( [ …" 9 seconds ago Up 9 seconds 0.0.0.0:6007->5000/tcp, [::]:6007->5000/tcp +1379226ad3ff opea/dataprep:latest "sh -c 'python $( [ …" 9 seconds ago Up 9 seconds 0.0.0.0:6007->5000/tcp, [::]:6007->5000/tcp dataprep-redis-server -1cebe2d70e40 opea/retriever:1.2 "python opea_retriev…" 9 seconds ago Up 9 seconds 0.0.0.0:7000->7000/tcp, [::]:7000->7000/tcp +1cebe2d70e40 opea/retriever:latest "python opea_retriev…" 9 seconds ago Up 9 seconds 0.0.0.0:7000->7000/tcp, [::]:7000->7000/tcp retriever-redis-server bfe41a5353b6 ghcr.io/huggingface/tei-gaudi:1.5.0 "text-embeddings-rou…" 10 seconds ago Up 9 seconds 0.0.0.0:8808->80/tcp, [::]:8808->80/tcp tei-reranking-gaudi-server -11a94e7ce3c9 opea/vllm-gaudi:1.2 "python3 -m vllm.ent…" 10 seconds ago Up 9 seconds (health: starting) 0.0.0.0:8007->80/tcp, [::]:8007->80/tcp +11a94e7ce3c9 opea/vllm-gaudi:latest "python3 -m vllm.ent…" 10 seconds ago Up 9 seconds (health: starting) 0.0.0.0:8007->80/tcp, [::]:8007->80/tcp vllm-gaudi-server 4d7b9aab82b1 redis/redis-stack:7.2.0-v9 "/entrypoint.sh" 10 seconds ago Up 9 seconds 0.0.0.0:6379->6379/tcp, [::]:6379->6379/tcp, 0.0.0. 
0:8001->8001/tcp, [::]:8001->8001/tcp redis-vector-db @@ -207,11 +203,11 @@ bfe41a5353b6 ghcr.io/huggingface/tei-gaudi:1.5.0 "text-emb :sync: TGI ```bash CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES -353775bfa0dc opea/nginx:1.2 "/docker-entrypoint.…" 52 seconds ago Up 50 seconds 0.0.0.0:8010->80/tcp, [::]:8010->80/tcp chatqna-gaudi-nginx-server -c4f75d75f18e opea/chatqna-ui:1.2 "docker-entrypoint.s…" 52 seconds ago Up 50 seconds 0.0.0.0:5173->5173/tcp, [::]:5173->5173/tcp chatqna-gaudi-ui-server -4c5dc803c8c8 opea/chatqna:1.2 "python chatqna.py" 52 seconds ago Up 51 seconds 0.0.0.0:8888->8888/tcp, [::]:8888->8888/tcp chatqna-gaudi-backend-server -6bdfebe016c0 opea/dataprep:1.2 "sh -c 'python $( [ …" 52 seconds ago Up 51 seconds 0.0.0.0:6007->5000/tcp, [::]:6007->5000/tcp dataprep-redis-server -6fb5264a8465 opea/retriever:1.2 "python opea_retriev…" 52 seconds ago Up 51 seconds 0.0.0.0:7000->7000/tcp, [::]:7000->7000/tcp retriever-redis-server +353775bfa0dc opea/nginx:latest "/docker-entrypoint.…" 52 seconds ago Up 50 seconds 0.0.0.0:8010->80/tcp, [::]:8010->80/tcp chatqna-gaudi-nginx-server +c4f75d75f18e opea/chatqna-ui:latest "docker-entrypoint.s…" 52 seconds ago Up 50 seconds 0.0.0.0:5173->5173/tcp, [::]:5173->5173/tcp chatqna-gaudi-ui-server +4c5dc803c8c8 opea/chatqna:latest "python chatqna.py" 52 seconds ago Up 51 seconds 0.0.0.0:8888->8888/tcp, [::]:8888->8888/tcp chatqna-gaudi-backend-server +6bdfebe016c0 opea/dataprep:latest "sh -c 'python $( [ …" 52 seconds ago Up 51 seconds 0.0.0.0:6007->5000/tcp, [::]:6007->5000/tcp dataprep-redis-server +6fb5264a8465 opea/retriever:latest "python opea_retriev…" 52 seconds ago Up 51 seconds 0.0.0.0:7000->7000/tcp, [::]:7000->7000/tcp retriever-redis-server 1f4f4f691d36 ghcr.io/huggingface/tgi-gaudi:2.0.6 "text-generation-lau…" 55 seconds ago Up 51 seconds 0.0.0.0:8005->80/tcp, [::]:8005->80/tcp tgi-gaudi-server 9c50dfc17428 ghcr.io/huggingface/tei-gaudi:1.5.0 "text-embeddings-rou…" 55 seconds ago Up 51 seconds 0.0.0.0:8808->80/tcp, [::]:8808->80/tcp tei-reranking-gaudi-server a8de74b4594d ghcr.io/huggingface/text-embeddings-inference:cpu-1.5 "text-embeddings-rou…" 55 seconds ago Up 51 seconds 0.0.0.0:8090->80/tcp, [::]:8090->80/tcp tei-embedding-gaudi-server @@ -453,19 +449,10 @@ curl http://${host_ip}:${NGINX_PORT}/v1/chatqna \ The output will be similar to that of the ChatQnA megaservice. -#### (Optional) Guardrail Microservice -If the Guardrail microservice is enabled, test it using the command below: -```bash -curl http://${host_ip}:9090/v1/guardrails\ - -X POST \ - -d '{"text":"How do you buy a tiger in the US?","parameters":{"max_new_tokens":32}}' \ - -H 'Content-Type: application/json' -``` - ## Launch UI ### Basic UI -To access the frontend, open the following URL in a web browser: http://{host_ip}:{NGINX_PORT}. By default, the UI runs on port 5173 internally. A different host port can be used to access the frontend. Simply modify the port mapping in the `compose.yaml` file as shown below: +To access the frontend, open the following URL in a web browser: http://${host_ip}:${NGINX_PORT}. By default, the UI runs on port 5173 internally. A different host port can be used to access the frontend. 
Simply modify the port mapping in the `compose.yaml` file as shown below: ```yaml chatqna-gaudi-ui-server: image: opea/chatqna-ui:${TAG:-latest} @@ -474,35 +461,6 @@ To access the frontend, open the following URL in a web browser: http://{host_ip - "5173:5173" ``` -### (Optional) Conversational UI - -To access the Conversational UI (react based) frontend, modify the UI service in the `compose.yaml` file. Replace `chatqna-gaudi-ui-server` service with the `chatqna-gaudi-conversation-ui-server` service as shown below: -```yaml -chatqna-gaudi-conversation-ui-server: - image: opea/chatqna-conversation-ui:${TAG:-latest} - container_name: chatqna-gaudi-conversation-ui-server - environment: - - APP_BACKEND_SERVICE_ENDPOINT=${BACKEND_SERVICE_ENDPOINT} - - APP_DATA_PREP_SERVICE_URL=${DATAPREP_SERVICE_ENDPOINT} - ports: - - "5174:80" - depends_on: - - chatqna-gaudi-backend-server - ipc: host - restart: always -``` - -In addition, modify the `chatqna-gaudi-nginx-server` `depends_on` field to include `chatqna-gaudi-conversation-ui-server` instead of `chatqna-gaudi-ui-server`. - -Once the services are up, open the following URL in a web browser: http://{host_ip}:5174. By default, the UI runs on port {NGINX_PORT} internally. A different host port can be used to access the frontend. Simply modify the port mapping in the `compose.yaml` file as shown below: -```yaml - chatqna-gaudi-conversation-ui-server: - image: opea/chatqna-conversation-ui:${TAG:-latest} - ... - ports: - - "80:80" -``` - ## Stop the Services To stop and remove all the containers, use the command below: diff --git a/tutorial/ChatQnA/deploy/nvidia.md b/tutorial/ChatQnA/deploy/nvidia.md index b8ec522d..9c9a1a66 100644 --- a/tutorial/ChatQnA/deploy/nvidia.md +++ b/tutorial/ChatQnA/deploy/nvidia.md @@ -12,10 +12,7 @@ The list of microservices from OPEA GenAIComps are used to deploy a single node 4. Reranking 5. LLM with TGI -The solution is aimed to show how to use Redis vectorDB for RAG and Meta-Llama-3-8B-Instruct model for LLM inference on NVIDIA GPUs. Steps will include setting up docker containers, utilizing a sample Nike dataset in PDF format, and asking a question about Nike to get a response. There are 2 modes of UI that can be deployed: - -1. Basic UI -2. Conversational UI +The solution is aimed to show how to use Redis vectorDB for RAG and Meta-Llama-3-8B-Instruct model for LLM inference on NVIDIA GPUs. Steps will include setting up docker containers, utilizing a sample Nike dataset in PDF format, and asking a question about Nike to get a response. There are multiple versions of the UI that can be deployed but only the default one will be covered in this tutorial. ## Prerequisites @@ -57,7 +54,7 @@ For machines behind a firewall, set up the proxy environment variables: export http_proxy="Your_HTTP_Proxy" export https_proxy="Your_HTTPs_Proxy" # Example: no_proxy="localhost, 127.0.0.1, 192.168.1.1" -export no_proxy="Your_No_Proxy",chatqna-xeon-ui-server,chatqna-xeon-backend-server,dataprep-redis-service,tei-embedding-service,retriever,tei-reranking-service,tgi-service,vllm-service +export no_proxy="Your_No_Proxy",chatqna-xeon-ui-server,chatqna-xeon-backend-server,dataprep-redis-service,tei-embedding-service,retriever,tei-reranking-service,tgi-service,vllm-service,llm-faqgen ``` ## Use Case Setup @@ -132,21 +129,20 @@ Run this command to see this info: docker ps -a ``` -The sample output will vary based on the `RELEASE_VERSION`. 
- +Sample output: ::::{tab-set} :::{tab-item} TGI :sync: TGI ```bash CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES -3b5fa9a722da opea/chatqna-ui:${RELEASE_VERSION} "docker-entrypoint.s…" 32 hours ago Up 2 hours 0.0.0.0:5173->5173/tcp, :::5173->5173/tcp chatqna-ui-server -d3b37f3d1faa opea/chatqna:${RELEASE_VERSION} "python chatqna.py" 32 hours ago Up 2 hours 0.0.0.0:8888->8888/tcp, :::8888->8888/tcp chatqna-backend-server -b3e1388fa2ca opea/reranking-tei:${RELEASE_VERSION} "python reranking_te…" 32 hours ago Up 2 hours 0.0.0.0:8000->8000/tcp, :::8000->8000/tcp reranking-tei-server -24a240f8ad1c opea/retriever-redis:${RELEASE_VERSION} "python retriever_re…" 32 hours ago Up 2 hours 0.0.0.0:7000->7000/tcp, :::7000->7000/tcp retriever-redis-server -9c0d2a2553e8 opea/embedding-tei:${RELEASE_VERSION} "python embedding_te…" 32 hours ago Up 2 hours 0.0.0.0:6000->6000/tcp, :::6000->6000/tcp embedding-tei-server -24cae0db1a70 opea/llm-tgi:${RELEASE_VERSION} "bash entrypoint.sh" 32 hours ago Up 2 hours 0.0.0.0:9000->9000/tcp, :::9000->9000/tcp llm-tgi-server -ea3986c3cf82 opea/dataprep-redis:${RELEASE_VERSION} "python prepare_doc_…" 32 hours ago Up 2 hours 0.0.0.0:6007->6007/tcp, :::6007->6007/tcp dataprep-redis-server +3b5fa9a722da opea/chatqna-ui:latest "docker-entrypoint.s…" 32 hours ago Up 2 hours 0.0.0.0:5173->5173/tcp, :::5173->5173/tcp chatqna-ui-server +d3b37f3d1faa opea/chatqna:latest "python chatqna.py" 32 hours ago Up 2 hours 0.0.0.0:8888->8888/tcp, :::8888->8888/tcp chatqna-backend-server +b3e1388fa2ca opea/reranking-tei:latest "python reranking_te…" 32 hours ago Up 2 hours 0.0.0.0:8000->8000/tcp, :::8000->8000/tcp reranking-tei-server +24a240f8ad1c opea/retriever-redis:latest "python retriever_re…" 32 hours ago Up 2 hours 0.0.0.0:7000->7000/tcp, :::7000->7000/tcp retriever-redis-server +9c0d2a2553e8 opea/embedding-tei:latest "python embedding_te…" 32 hours ago Up 2 hours 0.0.0.0:6000->6000/tcp, :::6000->6000/tcp embedding-tei-server +24cae0db1a70 opea/llm-tgi:latest "bash entrypoint.sh" 32 hours ago Up 2 hours 0.0.0.0:9000->9000/tcp, :::9000->9000/tcp llm-tgi-server +ea3986c3cf82 opea/dataprep-redis:latest "python prepare_doc_…" 32 hours ago Up 2 hours 0.0.0.0:6007->6007/tcp, :::6007->6007/tcp dataprep-redis-server e10dd14497a8 redis/redis-stack:7.2.0-v9 "/entrypoint.sh" 32 hours ago Up 2 hours 0.0.0.0:6379->6379/tcp, :::6379->6379/tcp, 0.0.0.0:8001->8001/tcp, :::8001->8001/tcp redis-vector-db 79276cf45a47 ghcr.io/huggingface/text-embeddings-inference:cpu-1.2 "text-embeddings-rou…" 32 hours ago Up 2 hours 0.0.0.0:8090->80/tcp, :::8090->80/tcp tei-embedding-server 4943e5f6cd80 ghcr.io/huggingface/text-embeddings-inference:cpu-1.2 "text-embeddings-rou…" 32 hours ago Up 2 hours 0.0.0.0:8808->80/tcp, :::8808->80/tcp tei-reranking-server @@ -306,7 +302,7 @@ The output will be similar to that of the ChatQnA megaservice. ### Basic UI -To access the frontend, open the following URL in your browser: http://{host_ip}:{NGINX_PORT}. By default, the UI runs on port 5173 internally. If you prefer to use a different host port to access the frontend, you can modify the port mapping in the compose.yaml file as shown below: +To access the frontend, open the following URL in your browser: http://${host_ip}:${NGINX_PORT}. By default, the UI runs on port 5173 internally. 
If you prefer to use a different host port to access the frontend, you can modify the port mapping in the compose.yaml file as shown below: ```yaml chatqna-ui-server: image: opea/chatqna-ui:${TAG:-latest} @@ -315,34 +311,6 @@ To access the frontend, open the following URL in your browser: http://{host_ip} - "5173:5173" ``` -### Conversational UI - -To access the Conversational UI (react based) frontend, modify the UI service in the `compose.yaml` file. Replace `chatqna-ui-server` service with the `chatqna-conversation-ui-server` service as per the config below: -```yaml -chatqna-conversation-ui-server: - image: opea/chatqna-conversation-ui:${TAG:-latest} - container_name: chatqna-conversation-ui-server - environment: - - APP_BACKEND_SERVICE_ENDPOINT=${BACKEND_SERVICE_ENDPOINT} - - APP_DATA_PREP_SERVICE_URL=${DATAPREP_SERVICE_ENDPOINT} - ports: - - "5174:80" - depends_on: - - chatqna-backend-server - ipc: host - restart: always -``` - -Once the services are up, open the following URL in a web browser: http://{host_ip}:5174. By default, the UI runs on port {NGINX_PORT} internally. A different host port can be used to access the frontend. Simply modify the port mapping in the `compose.yaml` file as shown below: - -```yaml - chatqna-conversation-ui-server: - image: opea/chatqna-conversation-ui:${TAG:-latest} - ... - ports: - - "80:80" -``` - ### Stop the Services To stop and remove all the containers, use the command below: diff --git a/tutorial/ChatQnA/deploy/xeon.md b/tutorial/ChatQnA/deploy/xeon.md index c9db1464..d27fde34 100644 --- a/tutorial/ChatQnA/deploy/xeon.md +++ b/tutorial/ChatQnA/deploy/xeon.md @@ -12,10 +12,7 @@ The list of microservices from OPEA GenAIComps are used to deploy a single node 4. Reranking 5. LLM with vLLM or TGI -The solution is aimed to show how to use Redis vectorDB for RAG and Meta-Llama-3-8B-Instruct model for LLM inference on Intel® Xeon® Scalable processors. Steps will include setting up docker containers, utilizing a sample Nike dataset in PDF format, and asking a question about Nike to get a response. There are 2 modes of UI that can be deployed: - -1. Basic UI -2. Conversational UI +The solution is aimed to show how to use Redis vectorDB for RAG and Meta-Llama-3-8B-Instruct model for LLM inference on Intel® Xeon® Scalable processors. Steps will include setting up docker containers, utilizing a sample Nike dataset in PDF format, and asking a question about Nike to get a response. There are multiple versions of the UI that can be deployed but only the default one will be covered in this tutorial. ## Prerequisites @@ -57,7 +54,7 @@ For machines behind a firewall, set up the proxy environment variables: export http_proxy="Your_HTTP_Proxy" export https_proxy="Your_HTTPs_Proxy" # Example: no_proxy="localhost, 127.0.0.1, 192.168.1.1" -export no_proxy="Your_No_Proxy",chatqna-xeon-ui-server,chatqna-xeon-backend-server,dataprep-redis-service,tei-embedding-service,retriever,tei-reranking-service,tgi-service,vllm-service +export no_proxy="Your_No_Proxy",chatqna-xeon-ui-server,chatqna-xeon-backend-server,dataprep-redis-service,tei-embedding-service,retriever,tei-reranking-service,tgi-service,vllm-service,llm-faqgen ``` ## Use Case Setup @@ -165,8 +162,7 @@ Run this command to see this info: docker ps -a ``` -The sample output is for OPEA release v1.2. - +Sample output: ::::{tab-set} :::{tab-item} vllm @@ -174,13 +170,13 @@ The sample output is for OPEA release v1.2. 
```bash CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES -25964cd40c51 opea/nginx:1.2 "/docker-entrypoint.…" 37 minutes ago Up 37 minutes 0.0.0.0:80->80/tcp, [::]:80->80/tcp chatqna-xeon-nginx-server -bca19cf35370 opea/chatqna-ui:1.2 "docker-entrypoint.s…" 37 minutes ago Up 37 minutes 0.0.0.0:5173->5173/tcp, [::]:5173->5173/tcp chatqna-xeon-ui-server -e9622436428a opea/chatqna:1.2 "python chatqna.py" 37 minutes ago Up 37 minutes 0.0.0.0:8888->8888/tcp, [::]:8888->8888/tcp chatqna-xeon-backend-server -514acfb8f398 opea/dataprep:1.2 "sh -c 'python $( [ …" 37 minutes ago Up 37 minutes 0.0.0.0:6007->5000/tcp, [::]:6007->5000/tcp dataprep-redis-server -dbaf2116ae4b opea/retriever:1.2 "python opea_retriev…" 37 minutes ago Up 37 minutes 0.0.0.0:7000->7000/tcp, [::]:7000->7000/tcp retriever-redis-server +25964cd40c51 opea/nginx:latest "/docker-entrypoint.…" 37 minutes ago Up 37 minutes 0.0.0.0:80->80/tcp, [::]:80->80/tcp chatqna-xeon-nginx-server +bca19cf35370 opea/chatqna-ui:latest "docker-entrypoint.s…" 37 minutes ago Up 37 minutes 0.0.0.0:5173->5173/tcp, [::]:5173->5173/tcp chatqna-xeon-ui-server +e9622436428a opea/chatqna:latest "python chatqna.py" 37 minutes ago Up 37 minutes 0.0.0.0:8888->8888/tcp, [::]:8888->8888/tcp chatqna-xeon-backend-server +514acfb8f398 opea/dataprep:latest "sh -c 'python $( [ …" 37 minutes ago Up 37 minutes 0.0.0.0:6007->5000/tcp, [::]:6007->5000/tcp dataprep-redis-server +dbaf2116ae4b opea/retriever:latest "python opea_retriev…" 37 minutes ago Up 37 minutes 0.0.0.0:7000->7000/tcp, [::]:7000->7000/tcp retriever-redis-server 82d802dd79c0 ghcr.io/huggingface/text-embeddings-inference:cpu-1.5 "text-embeddings-rou…" 37 minutes ago Up 37 minutes 0.0.0.0:8808->80/tcp, [::]:8808->80/tcp tei-reranking-server -20aebf41b92b opea/vllm:1.2 "python3 -m vllm.ent…" 37 minutes ago Up 37 minutes (unhealthy) 0.0.0.0:9009->80/tcp, [::]:9009->80/tcp vllm-service +20aebf41b92b opea/vllm:latest "python3 -m vllm.ent…" 37 minutes ago Up 37 minutes (unhealthy) 0.0.0.0:9009->80/tcp, [::]:9009->80/tcp vllm-service 590ee468e4b7 redis/redis-stack:7.2.0-v9 "/entrypoint.sh" 37 minutes ago Up 37 minutes 0.0.0.0:6379->6379/tcp, [::]:6379->6379/tcp, 0.0.0.0:8001->8001/tcp, [::]:8001->8001/tcp redis-vector-db df543e8425ea ghcr.io/huggingface/text-embeddings-inference:cpu-1.5 "text-embeddings-rou…" 37 minutes ago Up 37 minutes 0.0.0.0:6006->80/tcp, [::]:6006->80/tcp tei-embedding-server ``` @@ -190,11 +186,11 @@ df543e8425ea ghcr.io/huggingface/text-embeddings-inference:cpu-1.5 "text-emb ```bash CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES -f303bf48dd43 opea/nginx:1.2 "/docker-entrypoint.…" 4 seconds ago Up 3 seconds 0.0.0.0:80->80/tcp, [::]:80->80/tcp chatqna-xeon-nginx-server -0a2597a4baa0 opea/chatqna-ui:1.2 "docker-entrypoint.s…" 4 seconds ago Up 3 seconds 0.0.0.0:5173->5173/tcp, [::]:5173->5173/tcp chatqna-xeon-ui-server -5b5a37ba59ed opea/chatqna:1.2 "python chatqna.py" 4 seconds ago Up 3 seconds 0.0.0.0:8888->8888/tcp, [::]:8888->8888/tcp chatqna-xeon-backend-server -b2ec04f4d3d5 opea/dataprep:1.2 "sh -c 'python $( [ …" 4 seconds ago Up 3 seconds 0.0.0.0:6007->5000/tcp, [::]:6007->5000/tcp dataprep-redis-server -c6347c8758e4 opea/retriever:1.2 "python opea_retriev…" 4 seconds ago Up 3 seconds 0.0.0.0:7000->7000/tcp, [::]:7000->7000/tcp retriever-redis-server +f303bf48dd43 opea/nginx:latest "/docker-entrypoint.…" 4 seconds ago Up 3 seconds 0.0.0.0:80->80/tcp, [::]:80->80/tcp chatqna-xeon-nginx-server +0a2597a4baa0 opea/chatqna-ui:latest "docker-entrypoint.s…" 4 seconds ago Up 3 
seconds 0.0.0.0:5173->5173/tcp, [::]:5173->5173/tcp chatqna-xeon-ui-server +5b5a37ba59ed opea/chatqna:latest "python chatqna.py" 4 seconds ago Up 3 seconds 0.0.0.0:8888->8888/tcp, [::]:8888->8888/tcp chatqna-xeon-backend-server +b2ec04f4d3d5 opea/dataprep:latest "sh -c 'python $( [ …" 4 seconds ago Up 3 seconds 0.0.0.0:6007->5000/tcp, [::]:6007->5000/tcp dataprep-redis-server +c6347c8758e4 opea/retriever:latest "python opea_retriev…" 4 seconds ago Up 3 seconds 0.0.0.0:7000->7000/tcp, [::]:7000->7000/tcp retriever-redis-server 13403b62e768 ghcr.io/huggingface/text-generation-inference:2.4.0-intel-cpu "text-generation-lau…" 4 seconds ago Up 3 seconds 0.0.0.0:9009->80/tcp, [::]:9009->80/tcp tgi-service 00509c41487b redis/redis-stack:7.2.0-v9 "/entrypoint.sh" 4 seconds ago Up 3 seconds 0.0.0.0:6379->6379/tcp, [::]:6379->6379/tcp, 0.0.0.0:8001->8001/tcp, [::]:8001->8001/tcp redis-vector-db 3e6e650f73a9 ghcr.io/huggingface/text-embeddings-inference:cpu-1.5 "text-embeddings-rou…" 4 seconds ago Up 3 seconds 0.0.0.0:8808->80/tcp, [::]:8808->80/tcp tei-reranking-server @@ -421,7 +417,7 @@ The output will be similar to that of the ChatQnA megaservice. ### Basic UI -To access the frontend, open the following URL in a web browser: http://{host_ip}:{NGINX_PORT}. By default, the UI runs on port 5173 internally. A different host port can be used to access the frontend. Simply modify the port mapping in the `compose.yaml` file as shown below: +To access the frontend, open the following URL in a web browser: http://${host_ip}:${NGINX_PORT}. By default, the UI runs on port 5173 internally. A different host port can be used to access the frontend. Simply modify the port mapping in the `compose.yaml` file as shown below: ```yaml chatqna-xeon-ui-server: image: opea/chatqna-ui:${TAG:-latest} @@ -430,35 +426,6 @@ To access the frontend, open the following URL in a web browser: http://{host_ip - "5173:5173" ``` -### (Optional) Conversational UI - -To access the Conversational UI (react based) frontend, modify the UI service in the `compose.yaml` file. Replace `chatqna-xeon-ui-server` service with the `chatqna-xeon-conversation-ui-server` service as shown below: -```yaml -chatqna-xeon-conversation-ui-server: - image: opea/chatqna-conversation-ui:${TAG:-latest} - container_name: chatqna-xeon-conversation-ui-server - environment: - - APP_BACKEND_SERVICE_ENDPOINT=${BACKEND_SERVICE_ENDPOINT} - - APP_DATA_PREP_SERVICE_URL=${DATAPREP_SERVICE_ENDPOINT} - ports: - - "5174:80" - depends_on: - - chatqna-xeon-backend-server - ipc: host - restart: always -``` - -In addition, modify the `chatqna-xeon-nginx-server` `depends_on` field to include `chatqna-xeon-conversation-ui-server` instead of `chatqna-xeon-ui-server`. - -Once the services are up, open the following URL in a web browser: http://{host_ip}:5174. By default, the UI runs on port {NGINX_PORT} internally. A different host port can be used to access the frontend. Simply modify the port mapping in the `compose.yaml` file as shown below: -```yaml - chatqna-xeon-conversation-ui-server: - image: opea/chatqna-conversation-ui:${TAG:-latest} - ... 
- ports: - - "80:80" -``` - ## Stop the Services To stop and remove all the containers, use the command below: From 405a0ab46cb9f5476624b6cbb498a0256ba6f5f3 Mon Sep 17 00:00:00 2001 From: alexsin368 Date: Tue, 15 Apr 2025 16:07:43 -0700 Subject: [PATCH 09/13] make specifying release version optional, add model access info Signed-off-by: alexsin368 --- tutorial/ChatQnA/deploy/aipc.md | 20 ++++++++++---------- tutorial/ChatQnA/deploy/gaudi.md | 18 +++++++++--------- tutorial/ChatQnA/deploy/nvidia.md | 18 +++++++++--------- tutorial/ChatQnA/deploy/xeon.md | 18 +++++++++--------- 4 files changed, 37 insertions(+), 37 deletions(-) diff --git a/tutorial/ChatQnA/deploy/aipc.md b/tutorial/ChatQnA/deploy/aipc.md index fdde63a3..2f6cb53e 100644 --- a/tutorial/ChatQnA/deploy/aipc.md +++ b/tutorial/ChatQnA/deploy/aipc.md @@ -16,24 +16,24 @@ The solution is aimed to show how to use Redis vectorDB for RAG and the llama-3 ## Prerequisites -The first step is to clone the [GenAIExamples](https://github.com/opea-project/GenAIExamples) GitHub repo. Set a workspace path and the desired release version with the **number only** (i.e. 1.0, 1.1, etc) and checkout that version using the tag. - +Set up a workspace and clone the [GenAIExamples](https://github.com/opea-project/GenAIExamples) GitHub repo. ```bash -# Set workspace -export WORKSPACE= +export WORKSPACE= cd $WORKSPACE +git clone https://github.com/opea-project/GenAIExamples.git # GenAIExamples +``` -# Set desired release version - number only -export RELEASE_VERSION= - -# GenAIExamples -git clone https://github.com/opea-project/GenAIExamples.git +**Optional** It is recommended to use a stable release version by setting `RELEASE_VERSION` to a **number only** (i.e. 1.0, 1.1, etc) and checkout that version using the tag. Otherwise, by default, the main branch with the latest updates will be used. +```bash +export RELEASE_VERSION= # Set desired release version - number only cd GenAIExamples git checkout tags/v${RELEASE_VERSION} cd .. ``` -Set up a [HuggingFace](https://huggingface.co/) account and generate a [user access token](https://huggingface.co/docs/transformers.js/en/guides/private#step-1-generating-a-user-access-token). Then set an environment variable with the HuggingFace token: +Set up a [HuggingFace](https://huggingface.co/) account and generate a [user access token](https://huggingface.co/docs/transformers.js/en/guides/private#step-1-generating-a-user-access-token). Request access to the [meta-llama/Meta-Llama-3-8B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct) model. + +Set an environment variable with the HuggingFace token: ```bash export HUGGINGFACEHUB_API_TOKEN="Your_Huggingface_API_Token" ``` diff --git a/tutorial/ChatQnA/deploy/gaudi.md b/tutorial/ChatQnA/deploy/gaudi.md index ef749696..b6e46eca 100644 --- a/tutorial/ChatQnA/deploy/gaudi.md +++ b/tutorial/ChatQnA/deploy/gaudi.md @@ -16,24 +16,24 @@ The solution is aimed to show how to use Redis vectorDB for RAG and Meta-Llama-3 ## Prerequisites -The first step is to clone the [GenAIExamples](https://github.com/opea-project/GenAIExamples) GitHub repo. Set a workspace path and the desired release version with the **number only** (i.e. 1.0, 1.1, etc) and checkout that version using the tag. - +Set up a workspace and clone the [GenAIExamples](https://github.com/opea-project/GenAIExamples) GitHub repo. 
```bash -# Set workspace export WORKSPACE= cd $WORKSPACE +git clone https://github.com/opea-project/GenAIExamples.git # GenAIExamples +``` -# Set desired release version - number only -export RELEASE_VERSION= - -# GenAIExamples -git clone https://github.com/opea-project/GenAIExamples.git +**Optional** It is recommended to use a stable release version by setting `RELEASE_VERSION` to a **number only** (i.e. 1.0, 1.1, etc) and checkout that version using the tag. Otherwise, by default, the main branch with the latest updates will be used. +```bash +export RELEASE_VERSION= # Set desired release version - number only cd GenAIExamples git checkout tags/v${RELEASE_VERSION} cd .. ``` -Set up a [HuggingFace](https://huggingface.co/) account and generate a [user access token](https://huggingface.co/docs/transformers.js/en/guides/private#step-1-generating-a-user-access-token). Then set an environment variable with the HuggingFace token: +Set up a [HuggingFace](https://huggingface.co/) account and generate a [user access token](https://huggingface.co/docs/transformers.js/en/guides/private#step-1-generating-a-user-access-token). Request access to the [meta-llama/Meta-Llama-3-8B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct) model. + +Set an environment variable with the HuggingFace token: ```bash export HUGGINGFACEHUB_API_TOKEN="Your_Huggingface_API_Token" ``` diff --git a/tutorial/ChatQnA/deploy/nvidia.md b/tutorial/ChatQnA/deploy/nvidia.md index 9c9a1a66..f09f2672 100644 --- a/tutorial/ChatQnA/deploy/nvidia.md +++ b/tutorial/ChatQnA/deploy/nvidia.md @@ -16,24 +16,24 @@ The solution is aimed to show how to use Redis vectorDB for RAG and Meta-Llama-3 ## Prerequisites -The first step is to clone the [GenAIExamples](https://github.com/opea-project/GenAIExamples) GitHub repo. Set a workspace path and the desired release version with the **number only** (i.e. 1.0, 1.1, etc) and checkout that version using the tag. - +Set up a workspace and clone the [GenAIExamples](https://github.com/opea-project/GenAIExamples) GitHub repo. ```bash -# Set workspace export WORKSPACE= cd $WORKSPACE +git clone https://github.com/opea-project/GenAIExamples.git # GenAIExamples +``` -# Set desired release version - number only -export RELEASE_VERSION= - -# GenAIExamples -git clone https://github.com/opea-project/GenAIExamples.git +**Optional** It is recommended to use a stable release version by setting `RELEASE_VERSION` to a **number only** (i.e. 1.0, 1.1, etc) and checkout that version using the tag. Otherwise, by default, the main branch with the latest updates will be used. +```bash +export RELEASE_VERSION= # Set desired release version - number only cd GenAIExamples git checkout tags/v${RELEASE_VERSION} cd .. ``` -Set up a [HuggingFace](https://huggingface.co/) account and generate a [user access token](https://huggingface.co/docs/transformers.js/en/guides/private#step-1-generating-a-user-access-token). Then set an environment variable with the HuggingFace token: +Set up a [HuggingFace](https://huggingface.co/) account and generate a [user access token](https://huggingface.co/docs/transformers.js/en/guides/private#step-1-generating-a-user-access-token). Request access to the [meta-llama/Meta-Llama-3-8B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct) model. 
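Once access is granted, the token can be sanity-checked against the gated repository before deploying. This sketch uses the public Hugging Face Hub API, which is an assumption outside this tutorial, and the same token exported in the next step:

```bash
# A 200 status means the token can read the gated model; 401 or 403 means access is still pending
curl -s -o /dev/null -w "%{http_code}\n" \
  -H "Authorization: Bearer ${HUGGINGFACEHUB_API_TOKEN}" \
  https://huggingface.co/api/models/meta-llama/Meta-Llama-3-8B-Instruct
```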
+ +Set an environment variable with the HuggingFace token: ```bash export HUGGINGFACEHUB_API_TOKEN="Your_Huggingface_API_Token" ``` diff --git a/tutorial/ChatQnA/deploy/xeon.md b/tutorial/ChatQnA/deploy/xeon.md index d27fde34..ae62130c 100644 --- a/tutorial/ChatQnA/deploy/xeon.md +++ b/tutorial/ChatQnA/deploy/xeon.md @@ -16,24 +16,24 @@ The solution is aimed to show how to use Redis vectorDB for RAG and Meta-Llama-3 ## Prerequisites -The first step is to clone the [GenAIExamples](https://github.com/opea-project/GenAIExamples) GitHub repo. Set a workspace path and the desired release version with the **number only** (i.e. 1.0, 1.1, etc) and checkout that version using the tag. - +Set up a workspace and clone the [GenAIExamples](https://github.com/opea-project/GenAIExamples) GitHub repo. ```bash -# Set workspace export WORKSPACE= cd $WORKSPACE +git clone https://github.com/opea-project/GenAIExamples.git # GenAIExamples +``` -# Set desired release version - number only -export RELEASE_VERSION= - -# GenAIExamples -git clone https://github.com/opea-project/GenAIExamples.git +**Optional** It is recommended to use a stable release version by setting `RELEASE_VERSION` to a **number only** (i.e. 1.0, 1.1, etc) and checkout that version using the tag. Otherwise, by default, the main branch with the latest updates will be used. +```bash +export RELEASE_VERSION= # Set desired release version - number only cd GenAIExamples git checkout tags/v${RELEASE_VERSION} cd .. ``` -Set up a [HuggingFace](https://huggingface.co/) account and generate a [user access token](https://huggingface.co/docs/transformers.js/en/guides/private#step-1-generating-a-user-access-token). Then set an environment variable with the HuggingFace token: +Set up a [HuggingFace](https://huggingface.co/) account and generate a [user access token](https://huggingface.co/docs/transformers.js/en/guides/private#step-1-generating-a-user-access-token). Request access to the [meta-llama/Meta-Llama-3-8B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct) model. + +Set an environment variable with the HuggingFace token: ```bash export HUGGINGFACEHUB_API_TOKEN="Your_Huggingface_API_Token" ``` From 72a4d5e66445f9c43ae479828186c89ee2e7118a Mon Sep 17 00:00:00 2001 From: alexsin368 Date: Tue, 15 Apr 2025 16:10:32 -0700 Subject: [PATCH 10/13] add parentheses Signed-off-by: alexsin368 --- tutorial/ChatQnA/deploy/aipc.md | 2 +- tutorial/ChatQnA/deploy/gaudi.md | 2 +- tutorial/ChatQnA/deploy/nvidia.md | 2 +- tutorial/ChatQnA/deploy/xeon.md | 2 +- 4 files changed, 4 insertions(+), 4 deletions(-) diff --git a/tutorial/ChatQnA/deploy/aipc.md b/tutorial/ChatQnA/deploy/aipc.md index 2f6cb53e..71eaae51 100644 --- a/tutorial/ChatQnA/deploy/aipc.md +++ b/tutorial/ChatQnA/deploy/aipc.md @@ -23,7 +23,7 @@ cd $WORKSPACE git clone https://github.com/opea-project/GenAIExamples.git # GenAIExamples ``` -**Optional** It is recommended to use a stable release version by setting `RELEASE_VERSION` to a **number only** (i.e. 1.0, 1.1, etc) and checkout that version using the tag. Otherwise, by default, the main branch with the latest updates will be used. +**(Optional)** It is recommended to use a stable release version by setting `RELEASE_VERSION` to a **number only** (i.e. 1.0, 1.1, etc) and checkout that version using the tag. Otherwise, by default, the main branch with the latest updates will be used. 
```bash export RELEASE_VERSION= # Set desired release version - number only cd GenAIExamples diff --git a/tutorial/ChatQnA/deploy/gaudi.md b/tutorial/ChatQnA/deploy/gaudi.md index b6e46eca..33d99b75 100644 --- a/tutorial/ChatQnA/deploy/gaudi.md +++ b/tutorial/ChatQnA/deploy/gaudi.md @@ -23,7 +23,7 @@ cd $WORKSPACE git clone https://github.com/opea-project/GenAIExamples.git # GenAIExamples ``` -**Optional** It is recommended to use a stable release version by setting `RELEASE_VERSION` to a **number only** (i.e. 1.0, 1.1, etc) and checkout that version using the tag. Otherwise, by default, the main branch with the latest updates will be used. +**(Optional)** It is recommended to use a stable release version by setting `RELEASE_VERSION` to a **number only** (i.e. 1.0, 1.1, etc) and checkout that version using the tag. Otherwise, by default, the main branch with the latest updates will be used. ```bash export RELEASE_VERSION= # Set desired release version - number only cd GenAIExamples diff --git a/tutorial/ChatQnA/deploy/nvidia.md b/tutorial/ChatQnA/deploy/nvidia.md index f09f2672..f4a776d9 100644 --- a/tutorial/ChatQnA/deploy/nvidia.md +++ b/tutorial/ChatQnA/deploy/nvidia.md @@ -23,7 +23,7 @@ cd $WORKSPACE git clone https://github.com/opea-project/GenAIExamples.git # GenAIExamples ``` -**Optional** It is recommended to use a stable release version by setting `RELEASE_VERSION` to a **number only** (i.e. 1.0, 1.1, etc) and checkout that version using the tag. Otherwise, by default, the main branch with the latest updates will be used. +**(Optional)** It is recommended to use a stable release version by setting `RELEASE_VERSION` to a **number only** (i.e. 1.0, 1.1, etc) and checkout that version using the tag. Otherwise, by default, the main branch with the latest updates will be used. ```bash export RELEASE_VERSION= # Set desired release version - number only cd GenAIExamples diff --git a/tutorial/ChatQnA/deploy/xeon.md b/tutorial/ChatQnA/deploy/xeon.md index ae62130c..145bab5a 100644 --- a/tutorial/ChatQnA/deploy/xeon.md +++ b/tutorial/ChatQnA/deploy/xeon.md @@ -23,7 +23,7 @@ cd $WORKSPACE git clone https://github.com/opea-project/GenAIExamples.git # GenAIExamples ``` -**Optional** It is recommended to use a stable release version by setting `RELEASE_VERSION` to a **number only** (i.e. 1.0, 1.1, etc) and checkout that version using the tag. Otherwise, by default, the main branch with the latest updates will be used. +**(Optional)** It is recommended to use a stable release version by setting `RELEASE_VERSION` to a **number only** (i.e. 1.0, 1.1, etc) and checkout that version using the tag. Otherwise, by default, the main branch with the latest updates will be used. 
```bash export RELEASE_VERSION= # Set desired release version - number only cd GenAIExamples From e7a6a13fc44053a56713e2a389d785b00f9a4cef Mon Sep 17 00:00:00 2001 From: alexsin368 Date: Wed, 16 Apr 2025 09:09:12 -0700 Subject: [PATCH 11/13] minor fixes Signed-off-by: alexsin368 --- tutorial/ChatQnA/deploy/aipc.md | 17 ++++++++++++----- tutorial/ChatQnA/deploy/gaudi.md | 13 ++++++++++--- tutorial/ChatQnA/deploy/nvidia.md | 14 ++++++++++---- tutorial/ChatQnA/deploy/xeon.md | 13 ++++++++++--- 4 files changed, 42 insertions(+), 15 deletions(-) diff --git a/tutorial/ChatQnA/deploy/aipc.md b/tutorial/ChatQnA/deploy/aipc.md index 71eaae51..830d6ce5 100644 --- a/tutorial/ChatQnA/deploy/aipc.md +++ b/tutorial/ChatQnA/deploy/aipc.md @@ -51,10 +51,9 @@ export NGINX_PORT= For machines behind a firewall, set up the proxy environment variables: ```bash -export http_proxy="Your_HTTP_Proxy" export https_proxy="Your_HTTPs_Proxy" # Example: no_proxy="localhost, 127.0.0.1, 192.168.1.1" -export no_proxy="Your_No_Proxy",chatqna-xeon-ui-server,chatqna-xeon-backend-server,dataprep-redis-service,tei-embedding-service,retriever,tei-reranking-service,tgi-service,vllm-service,llm-faqgen +export no_proxy=$no_proxy,chatqna-aipc-backend-server,tei-embedding-service,retriever,tei-reranking-service,redis-vector-db,dataprep-redis-service,ollama-service ``` The examples utilize model weights from Ollama and langchain. @@ -190,7 +189,6 @@ Run `docker compose` with the provided YAML file to start all the services menti :sync: Ollama ```bash -cd $WORKSPACE/GenAIExamples/ChatQnA/docker_compose/intel/cpu/aipc docker compose -f compose.yaml up -d ``` ::: @@ -451,17 +449,26 @@ The output will be similar to that of the ChatQnA megaservice. ## Launch UI -To access the frontend, open the following URL in a web browser: http://${host_ip}:${NGINX_PORT}. By default, the UI runs on port 5173 internally. A different host port can be used to access the frontend. Simply modify the port mapping in the `compose.yaml` file as shown below: +To access the frontend, open the following URL in a web browser: http://${host_ip}:${NGINX_PORT}. By default, the UI runs on port 5173 internally. A different host port can be used to access the frontend by modifying the port mapping in the `compose.yaml` file as shown below: ```yaml chatqna-aipc-ui-server: image: opea/chatqna-ui${TAG:-latest} ... ports: - - "5173:5173" + - "YOUR_HOST_PORT:5173" # Change YOUR_HOST_PORT to the desired port ``` +After making this change, rebuild and restart the containers for the change to take effect. + ### Stop the Services +Navigate to the `docker compose` directory for this hardware platform. 
+```bash +cd $WORKSPACE/GenAIExamples/ChatQnA/docker_compose/intel/cpu/aipc +``` + +To stop and remove all the containers, use the command below: + ::::{tab-set} :::{tab-item} Ollama diff --git a/tutorial/ChatQnA/deploy/gaudi.md b/tutorial/ChatQnA/deploy/gaudi.md index 33d99b75..ae85a3cb 100644 --- a/tutorial/ChatQnA/deploy/gaudi.md +++ b/tutorial/ChatQnA/deploy/gaudi.md @@ -54,7 +54,7 @@ For machines behind a firewall, set up the proxy environment variables: export http_proxy="Your_HTTP_Proxy" export https_proxy="Your_HTTPs_Proxy" # Example: no_proxy="localhost, 127.0.0.1, 192.168.1.1" -export no_proxy="Your_No_Proxy",chatqna-xeon-ui-server,chatqna-xeon-backend-server,dataprep-redis-service,tei-embedding-service,retriever,tei-reranking-service,tgi-service,vllm-service,llm-faqgen +export no_proxy="Your_No_Proxy",chatqna-gaudi-ui-server,chatqna-gaudi-backend-server,dataprep-redis-service,tei-embedding-service,retriever,tei-reranking-service,tgi-service,vllm-service,guardrails,jaeger,prometheus,grafana,gaudi-node-exporter-1 ``` ## Use Case Setup @@ -452,17 +452,24 @@ The output will be similar to that of the ChatQnA megaservice. ## Launch UI ### Basic UI -To access the frontend, open the following URL in a web browser: http://${host_ip}:${NGINX_PORT}. By default, the UI runs on port 5173 internally. A different host port can be used to access the frontend. Simply modify the port mapping in the `compose.yaml` file as shown below: +To access the frontend, open the following URL in a web browser: http://${host_ip}:${NGINX_PORT}. By default, the UI runs on port 5173 internally. A different host port can be used to access the frontend by modifying the port mapping in the `compose.yaml` file as shown below: ```yaml chatqna-gaudi-ui-server: image: opea/chatqna-ui:${TAG:-latest} ... ports: - - "5173:5173" + - "YOUR_HOST_PORT:5173" # Change YOUR_HOST_PORT to the desired port ``` +After making this change, rebuild and restart the containers for the change to take effect. + ## Stop the Services +Navigate to the `docker compose` directory for this hardware platform. +```bash +cd $WORKSPACE/GenAIExamples/ChatQnA/docker_compose/intel/hpu/gaudi +``` + To stop and remove all the containers, use the command below: ::::{tab-set} diff --git a/tutorial/ChatQnA/deploy/nvidia.md b/tutorial/ChatQnA/deploy/nvidia.md index f4a776d9..e6f8a9a9 100644 --- a/tutorial/ChatQnA/deploy/nvidia.md +++ b/tutorial/ChatQnA/deploy/nvidia.md @@ -54,7 +54,7 @@ For machines behind a firewall, set up the proxy environment variables: export http_proxy="Your_HTTP_Proxy" export https_proxy="Your_HTTPs_Proxy" # Example: no_proxy="localhost, 127.0.0.1, 192.168.1.1" -export no_proxy="Your_No_Proxy",chatqna-xeon-ui-server,chatqna-xeon-backend-server,dataprep-redis-service,tei-embedding-service,retriever,tei-reranking-service,tgi-service,vllm-service,llm-faqgen +export no_proxy="Your_No_Proxy",chatqna-ui-server,chatqna-backend-server,dataprep-redis-service,tei-embedding-service,retriever,tei-reranking-service,tgi-service ``` ## Use Case Setup @@ -94,7 +94,6 @@ Run `docker compose` with the provided YAML file to start all the services menti :sync: TGI ```bash -cd $WORKSPACE/GenAIExamples/ChatQnA/docker_compose/nvidia/gpu/ docker compose -f compose.yaml up -d ``` ::: @@ -302,17 +301,24 @@ The output will be similar to that of the ChatQnA megaservice. ### Basic UI -To access the frontend, open the following URL in your browser: http://${host_ip}:${NGINX_PORT}. By default, the UI runs on port 5173 internally. 
If you prefer to use a different host port to access the frontend, you can modify the port mapping in the compose.yaml file as shown below:
+To access the frontend, open the following URL in your browser: http://${host_ip}:${NGINX_PORT}. By default, the UI runs on port 5173 internally. A different host port can be used to access the frontend by modifying the port mapping in the `compose.yaml` file as shown below:
 ```yaml
   chatqna-ui-server:
     image: opea/chatqna-ui:${TAG:-latest}
     ...
     ports:
-      - "5173:5173"
+      - "YOUR_HOST_PORT:5173" # Change YOUR_HOST_PORT to the desired port
 ```
 
+After making this change, rebuild and restart the containers for the change to take effect.
+
 ### Stop the Services
 
+Navigate to the `docker compose` directory for this hardware platform.
+```bash
+cd $WORKSPACE/GenAIExamples/ChatQnA/docker_compose/nvidia/gpu
+```
+
 To stop and remove all the containers, use the command below:
 ::::{tab-set}
 :::{tab-item} TGI
diff --git a/tutorial/ChatQnA/deploy/xeon.md b/tutorial/ChatQnA/deploy/xeon.md
index 145bab5a..e39048ae 100644
--- a/tutorial/ChatQnA/deploy/xeon.md
+++ b/tutorial/ChatQnA/deploy/xeon.md
@@ -54,7 +54,7 @@ For machines behind a firewall, set up the proxy environment variables:
 export http_proxy="Your_HTTP_Proxy"
 export https_proxy="Your_HTTPs_Proxy"
 # Example: no_proxy="localhost, 127.0.0.1, 192.168.1.1"
-export no_proxy="Your_No_Proxy",chatqna-xeon-ui-server,chatqna-xeon-backend-server,dataprep-redis-service,tei-embedding-service,retriever,tei-reranking-service,tgi-service,vllm-service,llm-faqgen
+export no_proxy="Your_No_Proxy",chatqna-xeon-ui-server,chatqna-xeon-backend-server,dataprep-redis-service,tei-embedding-service,retriever,tei-reranking-service,tgi-service,vllm-service,llm-faqgen,jaeger,prometheus,grafana,xeon-node-exporter-1
 ```
 
 ## Use Case Setup
@@ -417,17 +417,24 @@ The output will be similar to that of the ChatQnA megaservice.
 
 ### Basic UI
 
-To access the frontend, open the following URL in a web browser: http://${host_ip}:${NGINX_PORT}. By default, the UI runs on port 5173 internally. A different host port can be used to access the frontend. Simply modify the port mapping in the `compose.yaml` file as shown below:
+To access the frontend, open the following URL in a web browser: http://${host_ip}:${NGINX_PORT}. By default, the UI runs on port 5173 internally. A different host port can be used to access the frontend by modifying the port mapping in the `compose.yaml` file as shown below:
 ```yaml
   chatqna-xeon-ui-server:
     image: opea/chatqna-ui:${TAG:-latest}
     ...
     ports:
-      - "5173:5173"
+      - "YOUR_HOST_PORT:5173" # Change YOUR_HOST_PORT to the desired port
 ```
 
+After making this change, rebuild and restart the containers for the change to take effect.
+
 ## Stop the Services
 
+Navigate to the `docker compose` directory for this hardware platform.
+```bash +cd $WORKSPACE/GenAIExamples/ChatQnA/docker_compose/intel/cpu/xeon +``` + To stop and remove all the containers, use the command below: ::::{tab-set} From e27a9b1b658b12780caaa1d5cdf82b66e7c9efa1 Mon Sep 17 00:00:00 2001 From: alexsin368 Date: Thu, 17 Apr 2025 15:42:38 -0700 Subject: [PATCH 12/13] address all comments Signed-off-by: alexsin368 --- tutorial/ChatQnA/ChatQnA_Guide.rst | 6 +-- tutorial/ChatQnA/deploy/aipc.md | 25 +++++------ tutorial/ChatQnA/deploy/gaudi.md | 20 ++++----- tutorial/ChatQnA/deploy/nvidia.md | 72 ++++++++++++++++++++++++++---- tutorial/ChatQnA/deploy/xeon.md | 20 ++++----- 5 files changed, 99 insertions(+), 44 deletions(-) diff --git a/tutorial/ChatQnA/ChatQnA_Guide.rst b/tutorial/ChatQnA/ChatQnA_Guide.rst index fd988891..b749a92b 100644 --- a/tutorial/ChatQnA/ChatQnA_Guide.rst +++ b/tutorial/ChatQnA/ChatQnA_Guide.rst @@ -46,7 +46,7 @@ To facilitate sharing of individual services across multiple GenAI applications, use the GenAI Microservices Connector (GMC) to deploy the application. Apart from service sharing , it also supports specifying sequential, parallel, and alternative steps in a GenAI pipeline. In so doing, it supports dynamic switching -between models used in any stage of a GenAI pipeline. For example, within the +between models used in any stage of a GenAI pipeline. For example, within the ChatQnA pipeline, using GMC one could switch the model used in the embedder, re-ranker, and/or the LLM. Upstream Vanilla Kubernetes or Red Hat OpenShift Container Platform (RHOCP) can be used with or without GMC, while use with GMC provides @@ -302,9 +302,9 @@ Troubleshooting 1. Browser interface https link failed - Q:For example, started ChatQnA example in IBM Cloud and trying to access the UI interface. By default, typing the :5173 resolves to https://:5173. Chrome shows the following warning message:xx.xx.xx.xx doesn't support a secure connection + Q: For example, started ChatQnA example in IBM Cloud and trying to access the UI interface. By default, typing the :5173 resolves to https://:5173. Chrome shows the following warning message:xx.xx.xx.xx doesn't support a secure connection - A: That is because by default, the browser resolves xx.xx.xx.xx:5173 to https://xx.xx.xx.xx:5173. But to meet security requirements, users need to deploy a certificate to enable HTTPS support in some cloud environments. OPEA provides HTTP services by default,but also supports HTTPS. To enable HTTPS, specify the certificate file paths in the MicroService class. For more details, please refer to the `source code `_. + A: By default, the browser resolves xx.xx.xx.xx:5173 to https://xx.xx.xx.xx:5173. But to meet security requirements, users need to deploy a certificate to enable HTTPS support in some cloud environments. OPEA provides HTTP services by default,but also supports HTTPS. To enable HTTPS, specify the certificate file paths in the MicroService class. For more details, please refer to the `source code `_. 2. For other troubles, please check the `doc `_. diff --git a/tutorial/ChatQnA/deploy/aipc.md b/tutorial/ChatQnA/deploy/aipc.md index 830d6ce5..4f2a66ec 100644 --- a/tutorial/ChatQnA/deploy/aipc.md +++ b/tutorial/ChatQnA/deploy/aipc.md @@ -1,10 +1,10 @@ # Single node on-prem deployment with Ollama on AIPC -This deployment section covers single-node on-prem deployment of the ChatQnA example using the Ollama. 
There are several ways to enable RAG with vectordb and LLM models, but this tutorial will be covering how to build an end-to-end ChatQnA pipeline with the Redis vector database and a llama-3 model deployed on the client CPU. +This section covers single-node on-prem deployment of the ChatQnA example using Ollama. There are several ways to enable RAG with vectordb and LLM models, but this tutorial will be covering how to build an end-to-end ChatQnA pipeline with the Redis vector database and a llama-3 model deployed on the client CPU. ## Overview -The list of microservices from OPEA GenAIComps are used to deploy a single node Ollama megaservice solution for ChatQnA. +The OPEA GenAIComps microservices used to deploy a single node vLLM or TGI magaservice solution for ChatQnA are listed below: 1. Data Prep 2. Embedding @@ -12,7 +12,7 @@ The list of microservices from OPEA GenAIComps are used to deploy a single node 4. Reranking 5. LLM with Ollama -The solution is aimed to show how to use Redis vectorDB for RAG and the llama-3 model for LLM inference on Intel Client PCs. Steps will include setting up docker containers, utilizing a sample Nike dataset in PDF format, and asking a question about Nike to get a response. There are multiple versions of the UI that can be deployed but only the default one will be covered in this tutorial. +This solution is designed to demonstrate the use of Redis vectorDB for RAG and the Meta-Llama-3-8B-Instruct model for LLM inference on Intel Client PCs. The steps will involve setting up Docker containers, using a sample Nike dataset in PDF format, and posing a question about Nike to receive a response. Although multiple versions of the UI can be deployed, this tutorial will focus solely on the default version. ## Prerequisites @@ -33,7 +33,7 @@ cd .. Set up a [HuggingFace](https://huggingface.co/) account and generate a [user access token](https://huggingface.co/docs/transformers.js/en/guides/private#step-1-generating-a-user-access-token). Request access to the [meta-llama/Meta-Llama-3-8B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct) model. -Set an environment variable with the HuggingFace token: +Set the `HUGGINGFACEHUB_API_TOKEN` environment variable to the value of the Hugging Face token by executing the following command: ```bash export HUGGINGFACEHUB_API_TOKEN="Your_Huggingface_API_Token" ``` @@ -73,8 +73,7 @@ curl -fsSL https://ollama.com/install.sh | sh #### Set Ollama Service Configuration -Ollama Service Configuration file is /etc/systemd/system/ollama.service. Edit the file to set OLLAMA_HOST environment. -Replace **** with the host IPV4 (please use external public IP). For example if the host_ip is 10.132.x.y, then `Environment="OLLAMA_HOST=10.132.x.y:11434"'. +The Ollama Service Configuration file is /etc/systemd/system/ollama.service. Edit the file to set OLLAMA_HOST environment, replacing with the hosts IPV4 external public IP address. For example, if the host_ip is 10.132.x.y, then `Environment="OLLAMA_HOST=10.132.x.y:11434"' should be used. 
```bash Environment="OLLAMA_HOST=host_ip:11434" @@ -82,7 +81,7 @@ Environment="OLLAMA_HOST=host_ip:11434" #### Set https_proxy environment for Ollama -If the system access network is through a proxy, add https_proxy in the Ollama Service Configuration file: +If the system's network is accessed through a proxy, add a https_proxy entry to the Ollama Service Configuration file: ```bash Environment="https_proxy=Your_HTTPS_Proxy" ``` @@ -116,7 +115,7 @@ export OLLAMA_HOST=http://${host_ip}:11434 ollama pull llama3.2 ``` -After downloading the models, list the models by `ollama list`. +After downloading the models, list the models by executing the `ollama list` command. The output should be similar to the following: @@ -154,7 +153,7 @@ The output may look like this: ## Use Case Setup -ChatQnA will use the following GenAIComps and corresponding tools. Tools and models mentioned in the table are configurable either through environment variables in the `set_env.sh` or `compose.yaml` file. +ChatQnA will utilize the following GenAIComps services and associated tools. The tools and models listed in the table can be configured via environment variables in either the `set_env.sh` script or the `compose.yaml` file. ::::{tab-set} @@ -253,7 +252,7 @@ docker logs ## Validate Microservices -This section will walk through the different ways to interact with the microservices deployed. +This section will guide through the various methods for interacting with the deployed microservices. ### TEI Embedding Service @@ -308,7 +307,7 @@ Sample output: ### Ollama Service -Run the command below to use Ollama to generate text fo the input prompt. +Run the command below to use Ollama to generate text for the input prompt. ```bash curl http://${host_ip}:11434/api/generate -d '{"model": "llama3", "prompt":"What is Deep Learning?"}' ``` @@ -335,7 +334,7 @@ Ollama service generates text for the input prompt. Here is the expected result ### Dataprep Microservice -The knowledge base can be updated using the dataprep microservice, which extracts text from a variety of data sources, chunks the data, embeds each chunk using the embedding microservice. Finally, the embedded vectors are stored in the Redis vector database. +The knowledge base can be updated using the dataprep microservice, which extracts text from a variety of data sources, chunks the data, and embeds each chunk using the embedding microservice. Finally, the embedded vectors are stored in the Redis vector database. `nke-10k-2023.pdf` is Nike's annual report on a form 10-K. Run this command to download the file: ```bash @@ -438,7 +437,7 @@ data: [DONE] ### NGINX Service -This will ensure the NGINX ervice is working properly. +This will ensure the NGINX service is working properly. ```bash curl http://${host_ip}:${NGINX_PORT}/v1/chatqna \ -H "Content-Type: application/json" \ diff --git a/tutorial/ChatQnA/deploy/gaudi.md b/tutorial/ChatQnA/deploy/gaudi.md index ae85a3cb..a4bf7624 100644 --- a/tutorial/ChatQnA/deploy/gaudi.md +++ b/tutorial/ChatQnA/deploy/gaudi.md @@ -1,10 +1,10 @@ # Single node on-prem deployment with vLLM or TGI on Gaudi AI Accelerator -This deployment section covers single-node on-prem deployment of the ChatQnA example using the vLLM or TGI LLM service. There are several ways to enable RAG with vectordb and LLM models, but this tutorial will be covering how to build an end-to-end ChatQnA pipeline with the Redis vector database and meta-llama/Meta-Llama-3-8B-Instruct model deployed on Intel® Gaudi® AI Accelerators. 
To quickly learn about OPEA and set up the required hardware and software, follow the instructions in the [Getting Started Guide](../../../getting-started/README.md). +This section covers single-node on-prem deployment of the ChatQnA example using the vLLM or TGI LLM service. There are several ways to enable RAG with vectordb and LLM models, but this tutorial will be covering how to build an end-to-end ChatQnA pipeline with the Redis vector database and meta-llama/Meta-Llama-3-8B-Instruct model deployed on Intel® Gaudi® AI Accelerators. To quickly learn about OPEA and set up the required hardware and software, follow the instructions in the [Getting Started Guide](../../../getting-started/README.md). ## Overview -The list of microservices from OPEA GenAIComps are used to deploy a single node vLLM or TGI megaservice solution for ChatQnA. +The OPEA GenAIComps microservices used to deploy a single node vLLM or TGI magaservice solution for ChatQnA are listed below: 1. Data Prep 2. Embedding @@ -12,7 +12,7 @@ The list of microservices from OPEA GenAIComps are used to deploy a single node 4. Reranking 5. LLM with vLLM or TGI -The solution is aimed to show how to use Redis vectorDB for RAG and Meta-Llama-3-8B-Instruct model for LLM inference on Intel® Xeon® Scalable processors. Steps will include setting up docker containers, utilizing a sample Nike dataset in PDF format, and asking a question about Nike to get a response. There are multiple versions of the UI that can be deployed but only the default one will be covered in this tutorial. +This solution is designed to demonstrate the use of Redis vectorDB for RAG and the Meta-Llama-3-8B-Instruct model for LLM inference on Intel® Gaudi® AI Accelerators. The steps will involve setting up Docker containers, using a sample Nike dataset in PDF format, and posing a question about Nike to receive a response. Although multiple versions of the UI can be deployed, this tutorial will focus solely on the default version. ## Prerequisites @@ -33,7 +33,7 @@ cd .. Set up a [HuggingFace](https://huggingface.co/) account and generate a [user access token](https://huggingface.co/docs/transformers.js/en/guides/private#step-1-generating-a-user-access-token). Request access to the [meta-llama/Meta-Llama-3-8B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct) model. -Set an environment variable with the HuggingFace token: +Set the `HUGGINGFACEHUB_API_TOKEN` environment variable to the value of the Hugging Face token by executing the following command: ```bash export HUGGINGFACEHUB_API_TOKEN="Your_Huggingface_API_Token" ``` @@ -59,7 +59,7 @@ export no_proxy="Your_No_Proxy",chatqna-gaudi-ui-server,chatqna-gaudi-backend-se ## Use Case Setup -ChatQnA will use the following GenAIComps and corresponding tools. Tools and models mentioned in the table are configurable either through environment variables in the `set_env.sh` or `compose.yaml` file. +ChatQnA will utilize the following GenAIComps services and associated tools. The tools and models listed in the table can be configured via environment variables in either the `set_env.sh` script or the `compose.yaml` file. ::::{tab-set} @@ -226,7 +226,7 @@ docker logs ## Validate Microservices -This section will walk through the different ways to interact with the microservices deployed. +This section will guide through the various methods for interacting with the deployed microservices. 
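Before validating individual endpoints, it can be useful to confirm that all containers started cleanly. The check below is a minimal sketch (assuming the Docker CLI is available on the host and the containers were started with the compose file above):

```bash
# List running containers with their status and published ports.
# A container that is missing or shown as "Restarting" should be inspected
# with `docker logs <container_name>` before proceeding.
docker ps --format "table {{.Names}}\t{{.Status}}\t{{.Ports}}"
```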
### TEI Embedding Service @@ -281,14 +281,14 @@ Sample output: ### vLLM and TGI Service -In first startup, this service will take a few minutes to download the model files and perform warm up. After it's finished, the service will be ready. +During the initial startup, this service will take a few minutes to download the model files and complete the warm-up process. Once this is finished, the service will be ready for use. ::::{tab-set} :::{tab-item} vllm :sync: vllm -Try the command below to check whether the LLM service is ready. The output should be "Application startup complete." +Run the command below to check whether the LLM service is ready. The output should be "Application startup complete." ```bash docker logs vllm-service 2>&1 | grep complete @@ -315,7 +315,7 @@ curl http://${host_ip}:8007/v1/completions \ :::{tab-item} TGI :sync: TGI -Try the command below to check whether the LLM service is ready. The output should be "INFO text_generation_router::server: router/src/server.rs:2311: Connected" +Run the command below to check whether the LLM service is ready. The output should be "INFO text_generation_router::server: router/src/server.rs:2311: Connected" ```bash docker logs tgi-service | grep Connected @@ -339,7 +339,7 @@ curl http://${host_ip}:8005/generate \ ### Dataprep Microservice -The knowledge base can be updated using the dataprep microservice, which extracts text from a variety of data sources, chunks the data, embeds each chunk using the embedding microservice. Finally, the embedded vectors are stored in the Redis vector database. +The knowledge base can be updated using the dataprep microservice, which extracts text from a variety of data sources, chunks the data, and embeds each chunk using the embedding microservice. Finally, the embedded vectors are stored in the Redis vector database. `nke-10k-2023.pdf` is Nike's annual report on a form 10-K. Run this command to download the file: ```bash diff --git a/tutorial/ChatQnA/deploy/nvidia.md b/tutorial/ChatQnA/deploy/nvidia.md index e6f8a9a9..3da256a6 100644 --- a/tutorial/ChatQnA/deploy/nvidia.md +++ b/tutorial/ChatQnA/deploy/nvidia.md @@ -1,10 +1,10 @@ # Single node on-prem deployment with TGI on Nvidia gpu -This deployment section covers single-node on-prem deployment of the ChatQnA example using the TGI LLM service. There are several ways to enable RAG with vectordb and LLM models, but this tutorial will be covering how to build an end-to-end ChatQnA pipeline with the Redis vector database and meta-llama/Meta-Llama-3-8B-Instruct model deployed on NVIDIA GPUs. +This section covers single-node on-prem deployment of the ChatQnA example using the TGI LLM service. There are several ways to enable RAG with vectordb and LLM models, but this tutorial will be covering how to build an end-to-end ChatQnA pipeline with the Redis vector database and meta-llama/Meta-Llama-3-8B-Instruct model deployed on NVIDIA GPUs. ## Overview -The list of microservices from OPEA GenAIComps are used to deploy a single node vLLM or TGI megaservice solution for ChatQnA. +The OPEA GenAIComps microservices used to deploy a single node vLLM or TGI magaservice solution for ChatQnA are listed below: 1. Data Prep 2. Embedding @@ -12,7 +12,7 @@ The list of microservices from OPEA GenAIComps are used to deploy a single node 4. Reranking 5. LLM with TGI -The solution is aimed to show how to use Redis vectorDB for RAG and Meta-Llama-3-8B-Instruct model for LLM inference on NVIDIA GPUs. 
Steps will include setting up docker containers, utilizing a sample Nike dataset in PDF format, and asking a question about Nike to get a response. There are multiple versions of the UI that can be deployed but only the default one will be covered in this tutorial. +This solution is designed to demonstrate the use of Redis vectorDB for RAG and the Meta-Llama-3-8B-Instruct model for LLM inference on NVIDIA GPUs. The steps will involve setting up Docker containers, using a sample Nike dataset in PDF format, and posing a question about Nike to receive a response. Although multiple versions of the UI can be deployed, this tutorial will focus solely on the default version. ## Prerequisites @@ -33,7 +33,7 @@ cd .. Set up a [HuggingFace](https://huggingface.co/) account and generate a [user access token](https://huggingface.co/docs/transformers.js/en/guides/private#step-1-generating-a-user-access-token). Request access to the [meta-llama/Meta-Llama-3-8B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct) model. -Set an environment variable with the HuggingFace token: +Set the `HUGGINGFACEHUB_API_TOKEN` environment variable to the value of the Hugging Face token by executing the following command: ```bash export HUGGINGFACEHUB_API_TOKEN="Your_Huggingface_API_Token" ``` @@ -59,7 +59,7 @@ export no_proxy="Your_No_Proxy",chatqna-ui-server,chatqna-backend-server,datapre ## Use Case Setup -ChatQnA will use the following GenAIComps and corresponding tools. Tools and models mentioned in the table are configurable either through environment variables in the `set_env.sh` or `compose.yaml` file. +ChatQnA will utilize the following GenAIComps services and associated tools. The tools and models listed in the table can be configured via environment variables in either the `set_env.sh` script or the `compose.yaml` file. ::::{tab-set} @@ -157,7 +157,7 @@ docker logs ## Validate Microservices -This section will walk through the different ways to interact with the microservices deployed. +This section will guide through the various methods for interacting with the deployed microservices. ### TEI Embedding Service @@ -212,14 +212,14 @@ Sample output: ### TGI Service -In first startup, this service will take a few minutes to download the model files and perform warm up. After it's finished, the service will be ready. +During the initial startup, this service will take a few minutes to download the model files and complete the warm-up process. Once this is finished, the service will be ready for use. ::::{tab-set} :::{tab-item} TGI :sync: TGI -Try the command below to check whether the LLM service is ready. The output should be "INFO text_generation_router::server: router/src/server.rs:2311: Connected" +Run the command below to check whether the LLM service is ready. The output should be "INFO text_generation_router::server: router/src/server.rs:2311: Connected" ```bash docker logs tgi-service | grep Connected ``` @@ -240,6 +240,62 @@ curl http://${host_ip}:8008/generate \ ::: :::: +### Dataprep Microservice + +The knowledge base can be updated using the dataprep microservice, which extracts text from a variety of data sources, chunks the data, and embeds each chunk using the embedding microservice. Finally, the embedded vectors are stored in the Redis vector database. + +`nke-10k-2023.pdf` is Nike's annual report on a form 10-K. 
Run this command to download the file: +```bash +wget https://github.com/opea-project/GenAIComps/blob/v1.1/comps/retrievers/redis/data/nke-10k-2023.pdf +``` + +Upload the file: +```bash +curl -X POST "http://${host_ip}:6007/v1/dataprep" \ + -H "Content-Type: multipart/form-data" \ + -F "files=@./nke-10k-2023.pdf" +``` + +HTTP links can also be added to the knowledge base. This command adds the opea.dev website. +```bash +curl -X POST "http://${host_ip}:6007/v1/dataprep" \ + -H "Content-Type: multipart/form-data" \ + -F 'link_list=["https://opea.dev"]' +``` + +The list of uploaded files can be retrieved using this command: +```bash +curl -X POST "http://${host_ip}:6007/v1/dataprep/get_file" \ + -H "Content-Type: application/json" +``` + +To delete the file or link, use the following commands: + +#### Delete link +```bash +# The dataprep service will add a .txt postfix for link file + +curl -X POST "http://${host_ip}:6007/v1/dataprep/delete_file" \ + -d '{"file_path": "https://opea.dev.txt"}' \ + -H "Content-Type: application/json" +``` + +#### Delete file + +```bash +curl -X POST "http://${host_ip}:6007/v1/dataprep/delete_file" \ + -d '{"file_path": "nke-10k-2023.pdf"}' \ + -H "Content-Type: application/json" +``` + +#### Delete all uploaded files and links + +```bash +curl -X POST "http://${host_ip}:6007/v1/dataprep/delete_file" \ + -d '{"file_path": "all"}' \ + -H "Content-Type: application/json" +``` + ### ChatQnA MegaService This will ensure the megaservice is working properly. diff --git a/tutorial/ChatQnA/deploy/xeon.md b/tutorial/ChatQnA/deploy/xeon.md index e39048ae..2e8e06d9 100644 --- a/tutorial/ChatQnA/deploy/xeon.md +++ b/tutorial/ChatQnA/deploy/xeon.md @@ -1,10 +1,10 @@ # Single node on-prem deployment with vLLM or TGI on Xeon Scalable processors -This deployment section covers single-node on-prem deployment of the ChatQnA example using the vLLM or TGI LLM service. There are several ways to enable RAG with vectordb and LLM models, but this tutorial will be covering how to build an end-to-end ChatQnA pipeline with the Redis vector database and meta-llama/Meta-Llama-3-8B-Instruct model deployed on Intel® Xeon® Scalable processors. To quickly learn about OPEA and set up the required hardware and software, follow the instructions in the [Getting Started Guide](../../../getting-started/README.md). +This section covers single-node on-prem deployment of the ChatQnA example using the vLLM or TGI LLM service. There are several ways to enable RAG with vectordb and LLM models, but this tutorial will be covering how to build an end-to-end ChatQnA pipeline with the Redis vector database and meta-llama/Meta-Llama-3-8B-Instruct model deployed on Intel® Xeon® Scalable processors. To quickly learn about OPEA and set up the required hardware and software, follow the instructions in the [Getting Started Guide](../../../getting-started/README.md). ## Overview -The list of microservices from OPEA GenAIComps are used to deploy a single node vLLM or TGI megaservice solution for ChatQnA. +The OPEA GenAIComps microservices used to deploy a single node vLLM or TGI magaservice solution for ChatQnA are listed below: 1. Data Prep 2. Embedding @@ -12,7 +12,7 @@ The list of microservices from OPEA GenAIComps are used to deploy a single node 4. Reranking 5. LLM with vLLM or TGI -The solution is aimed to show how to use Redis vectorDB for RAG and Meta-Llama-3-8B-Instruct model for LLM inference on Intel® Xeon® Scalable processors. 
Steps will include setting up docker containers, utilizing a sample Nike dataset in PDF format, and asking a question about Nike to get a response. There are multiple versions of the UI that can be deployed but only the default one will be covered in this tutorial. +This solution is designed to demonstrate the use of Redis vectorDB for RAG and the Meta-Llama-3-8B-Instruct model for LLM inference on Intel® Xeon® Scalable processors. The steps will involve setting up Docker containers, using a sample Nike dataset in PDF format, and posing a question about Nike to receive a response. Although multiple versions of the UI can be deployed, this tutorial will focus solely on the default version. ## Prerequisites @@ -33,7 +33,7 @@ cd .. Set up a [HuggingFace](https://huggingface.co/) account and generate a [user access token](https://huggingface.co/docs/transformers.js/en/guides/private#step-1-generating-a-user-access-token). Request access to the [meta-llama/Meta-Llama-3-8B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct) model. -Set an environment variable with the HuggingFace token: +Set the `HUGGINGFACEHUB_API_TOKEN` environment variable to the value of the Hugging Face token by executing the following command: ```bash export HUGGINGFACEHUB_API_TOKEN="Your_Huggingface_API_Token" ``` @@ -59,7 +59,7 @@ export no_proxy="Your_No_Proxy",chatqna-xeon-ui-server,chatqna-xeon-backend-serv ## Use Case Setup -ChatQnA will use the following GenAIComps and corresponding tools. Tools and models mentioned in the table are configurable either through environment variables in the `set_env.sh` or `compose.yaml` file. +ChatQnA will utilize the following GenAIComps services and associated tools. The tools and models listed in the table can be configured via environment variables in either the `set_env.sh` script or the `compose.yaml` file. ::::{tab-set} @@ -207,7 +207,7 @@ docker logs ## Validate Microservices -This section will walk through the different ways to interact with the microservices deployed. +This section will guide through the various methods for interacting with the deployed microservices. ### TEI Embedding Service @@ -262,14 +262,14 @@ Sample output: ### vLLM and TGI Service -In first startup, this service will take a few minutes to download the model files and perform warm up. After it's finished, the service will be ready. +During the initial startup, this service will take a few minutes to download the model files and complete the warm-up process. Once this is finished, the service will be ready for use. ::::{tab-set} :::{tab-item} vllm :sync: vllm -Try the command below to check whether the LLM service is ready. The output should be "Application startup complete." +Run the command below to check whether the LLM service is ready. The output should be "Application startup complete." ```bash docker logs vllm-service 2>&1 | grep complete @@ -279,7 +279,7 @@ docker logs vllm-service 2>&1 | grep complete :::{tab-item} TGI :sync: TGI -Try the command below to check whether the LLM service is ready. The output should be "INFO text_generation_router::server: router/src/server.rs:2311: Connected" +Run the command below to check whether the LLM service is ready. 
The output should be "INFO text_generation_router::server: router/src/server.rs:2311: Connected" ```bash docker logs tgi-service | grep Connected @@ -302,7 +302,7 @@ curl http://${host_ip}:9009/v1/chat/completions \ ### Dataprep Microservice -The knowledge base can be updated using the dataprep microservice, which extracts text from a variety of data sources, chunks the data, embeds each chunk using the embedding microservice. Finally, the embedded vectors are stored in the Redis vector database. +The knowledge base can be updated using the dataprep microservice, which extracts text from a variety of data sources, chunks the data, and embeds each chunk using the embedding microservice. Finally, the embedded vectors are stored in the Redis vector database. `nke-10k-2023.pdf` is Nike's annual report on a form 10-K. Run this command to download the file: ```bash From 7d3d2e043b10bfa81f34a90cf20ebbd1168d11c6 Mon Sep 17 00:00:00 2001 From: alexsin368 Date: Thu, 17 Apr 2025 18:13:02 -0700 Subject: [PATCH 13/13] fix typo Signed-off-by: alexsin368 --- tutorial/ChatQnA/deploy/aipc.md | 2 +- tutorial/ChatQnA/deploy/gaudi.md | 2 +- tutorial/ChatQnA/deploy/nvidia.md | 2 +- tutorial/ChatQnA/deploy/xeon.md | 2 +- 4 files changed, 4 insertions(+), 4 deletions(-) diff --git a/tutorial/ChatQnA/deploy/aipc.md b/tutorial/ChatQnA/deploy/aipc.md index 4f2a66ec..4a88e5a9 100644 --- a/tutorial/ChatQnA/deploy/aipc.md +++ b/tutorial/ChatQnA/deploy/aipc.md @@ -4,7 +4,7 @@ This section covers single-node on-prem deployment of the ChatQnA example using ## Overview -The OPEA GenAIComps microservices used to deploy a single node vLLM or TGI magaservice solution for ChatQnA are listed below: +The OPEA GenAIComps microservices used to deploy a single node vLLM or TGI megaservice solution for ChatQnA are listed below: 1. Data Prep 2. Embedding diff --git a/tutorial/ChatQnA/deploy/gaudi.md b/tutorial/ChatQnA/deploy/gaudi.md index a4bf7624..4a43de53 100644 --- a/tutorial/ChatQnA/deploy/gaudi.md +++ b/tutorial/ChatQnA/deploy/gaudi.md @@ -4,7 +4,7 @@ This section covers single-node on-prem deployment of the ChatQnA example using ## Overview -The OPEA GenAIComps microservices used to deploy a single node vLLM or TGI magaservice solution for ChatQnA are listed below: +The OPEA GenAIComps microservices used to deploy a single node vLLM or TGI megaservice solution for ChatQnA are listed below: 1. Data Prep 2. Embedding diff --git a/tutorial/ChatQnA/deploy/nvidia.md b/tutorial/ChatQnA/deploy/nvidia.md index 3da256a6..e65de390 100644 --- a/tutorial/ChatQnA/deploy/nvidia.md +++ b/tutorial/ChatQnA/deploy/nvidia.md @@ -4,7 +4,7 @@ This section covers single-node on-prem deployment of the ChatQnA example using ## Overview -The OPEA GenAIComps microservices used to deploy a single node vLLM or TGI magaservice solution for ChatQnA are listed below: +The OPEA GenAIComps microservices used to deploy a single node vLLM or TGI megaservice solution for ChatQnA are listed below: 1. Data Prep 2. 
Embedding diff --git a/tutorial/ChatQnA/deploy/xeon.md b/tutorial/ChatQnA/deploy/xeon.md index 2e8e06d9..955dfce5 100644 --- a/tutorial/ChatQnA/deploy/xeon.md +++ b/tutorial/ChatQnA/deploy/xeon.md @@ -4,7 +4,7 @@ This section covers single-node on-prem deployment of the ChatQnA example using ## Overview -The OPEA GenAIComps microservices used to deploy a single node vLLM or TGI magaservice solution for ChatQnA are listed below: +The OPEA GenAIComps microservices used to deploy a single node vLLM or TGI megaservice solution for ChatQnA are listed below: 1. Data Prep 2. Embedding