diff --git a/tutorial/ChatQnA/ChatQnA_Guide.rst b/tutorial/ChatQnA/ChatQnA_Guide.rst index 2dfe2374..b749a92b 100644 --- a/tutorial/ChatQnA/ChatQnA_Guide.rst +++ b/tutorial/ChatQnA/ChatQnA_Guide.rst @@ -3,14 +3,11 @@ ChatQnA #################### -.. note:: This guide is in its early development and is a work-in-progress with - placeholder content. - Overview ******** -Chatbots are a widely adopted use case for leveraging the powerful chat and -reasoning capabilities of large language models (LLMs). The ChatQnA example +Chatbots are a widely adopted use case for leveraging the powerful chat and +reasoning capabilities of large language models (LLMs). The ChatQnA example provides the starting point for developers to begin working in the GenAI space. Consider it the “hello world” of GenAI applications and can be leveraged for solutions across wide enterprise verticals, both internally and externally. @@ -38,16 +35,22 @@ generating human-like responses. Developers can easily swap out the generative model or vector database with their own custom models or databases. This allows developers to build chatbots that are tailored to their specific use cases and requirements. By combining the generative model with the vector database, RAG -can provide accurate and contextually relevant responses specific to your users' +can provide accurate and contextually relevant responses specific to users' queries. The ChatQnA example is designed to be a simple, yet powerful, demonstration of the RAG architecture. It is a great starting point for developers looking to build chatbots that can provide accurate and up-to-date information to users. -To facilitate sharing of individual services across multiple GenAI applications, use the GenAI Microservices Connector (GMC) to deploy your application. Apart from service sharing , it also supports specifying sequential, parallel, and alternative steps in a GenAI pipeline. In so doing, it supports dynamic switching between models used in any stage of a GenAI pipeline. For example, within the ChatQnA pipeline, using GMC one could switch the model used in the embedder, re-ranker, and/or the LLM. -Upstream Vanilla Kubernetes or Red Hat OpenShift Container -Platform (RHOCP) can be used with or without GMC, while use with GMC provides additional features. +To facilitate sharing of individual services across multiple GenAI applications, +use the GenAI Microservices Connector (GMC) to deploy the application. Apart +from service sharing , it also supports specifying sequential, parallel, and +alternative steps in a GenAI pipeline. In so doing, it supports dynamic switching +between models used in any stage of a GenAI pipeline. For example, within the +ChatQnA pipeline, using GMC one could switch the model used in the embedder, +re-ranker, and/or the LLM. Upstream Vanilla Kubernetes or Red Hat OpenShift Container +Platform (RHOCP) can be used with or without GMC, while use with GMC provides +additional features. The ChatQnA provides several deployment options, including single-node deployments on-premise or in a cloud environment using hardware such as Xeon @@ -126,7 +129,53 @@ For more details, please refer to the following document: Expected Output =============== -TBD +After launching the ChatQnA application, a curl command can be used to ensure the +megaservice is working properly. The example below assumes a document containing +new information is uploaded to the vector database before querying. +.. 
code-block:: bash + + curl http://${host_ip}:8888/v1/chatqna -H "Content-Type: application/json" -d '{ + "messages": "What is the revenue of Nike in 2023?" + }' + +Here is the output for reference: +.. code-block:: bash + + data: b'\n' + data: b'An' + data: b'swer' + data: b':' + data: b' In' + data: b' fiscal' + data: b' ' + data: b'2' + data: b'0' + data: b'2' + data: b'3' + data: b',' + data: b' N' + data: b'I' + data: b'KE' + data: b',' + data: b' Inc' + data: b'.' + data: b' achieved' + data: b' record' + data: b' Rev' + data: b'en' + data: b'ues' + data: b' of' + data: b' $' + data: b'5' + data: b'1' + data: b'.' + data: b'2' + data: b' billion' + data: b'.' + data: b'' + data: [DONE] + +The UI will show a similar response with formatted output. Validation Matrix and Prerequisites =================================== @@ -217,9 +266,9 @@ The gateway serves as the interface for users to access. The gateway routes inco Deployment ********** -Here are some deployment options depending on your hardware and environment. +Here are some deployment options depending on the hardware and environment. It includes both single-node and orchestrated multi-node configurations. -Choose the one that best fits your requirements. +Choose the one that best fits requirements. Single Node *********** @@ -253,9 +302,9 @@ Troubleshooting 1. Browser interface https link failed - Q:For example, started ChatQnA example in IBM Cloud and trying to access the UI interface. By default, typing the :5173 resolves to https://:5173. Chrome shows the following warning message:xx.xx.xx.xx doesn't support a secure connection + Q: For example, started ChatQnA example in IBM Cloud and trying to access the UI interface. By default, typing the :5173 resolves to https://:5173. Chrome shows the following warning message:xx.xx.xx.xx doesn't support a secure connection - A: That is because by default, the browser resolves xx.xx.xx.xx:5173 to https://xx.xx.xx.xx:5173. But to meet security requirements, users need to deploy a certificate to enable HTTPS support in some cloud environments. OPEA provides HTTP services by default,but also supports HTTPS. To enable HTTPS, you can specify the certificate file paths in the MicroService class. For more details, please refer to the `source code `_. + A: By default, the browser resolves xx.xx.xx.xx:5173 to https://xx.xx.xx.xx:5173. But to meet security requirements, users need to deploy a certificate to enable HTTPS support in some cloud environments. OPEA provides HTTP services by default,but also supports HTTPS. To enable HTTPS, specify the certificate file paths in the MicroService class. For more details, please refer to the `source code `_. 2. For other troubles, please check the `doc `_. @@ -263,11 +312,9 @@ Troubleshooting Monitoring ********** -Now that you have deployed the ChatQnA example, let's talk about monitoring the performance of the microservices in the ChatQnA pipeline. - -Monitoring the performance of microservices is crucial for ensuring the smooth operation of the generative AI systems. By monitoring metrics such as latency and throughput, you can identify bottlenecks, detect anomalies, and optimize the performance of individual microservices. This allows us to proactively address any issues and ensure that the ChatQnA pipeline is running efficiently. +Monitoring the performance of microservices is crucial for ensuring the smooth operation of the generative AI systems. 
Monitoring metrics such as latency and throughput can identify bottlenecks, detect anomalies, and optimize the performance of individual microservices. This helps proactively address any issues and ensure that the ChatQnA pipeline is running efficiently. -This document will help you understand how to monitor in real time the latency, throughput, and other metrics of different microservices. You will use **Prometheus** and **Grafana**, both open-source toolkits, to collect metrics and visualize them in a dashboard. +**Prometheus** and **Grafana**, both open-source toolkits, are used to collect metrics including latency and throughput of different microservices in real time, and visualize them in a dashboard. Set Up the Prometheus Server ============================ @@ -303,7 +350,7 @@ Edit the `prometheus.yml` file: vim prometheus.yml -Change the ``job_name`` to the name of the microservice you want to monitor. Also change the ``targets`` to the job target endpoint of that microservice. Make sure the service is running and the port is open, and that it exposes the metrics that follow Prometheus convention at the ``/metrics`` endpoint. +Change the ``job_name`` to the name of the microservice to monitor. Also change the ``targets`` to the job target endpoint of that microservice. Make sure the service is running and the port is open, and that it exposes the metrics that follow Prometheus convention at the ``/metrics`` endpoint. Here is an example of exporting metrics data from a TGI microservice to Prometheus: @@ -346,7 +393,7 @@ nohup ./prometheus --config.file=./prometheus.yml & >Note: Before starting Prometheus, ensure that no other processes are running on the designated port (default is 9090). Otherwise, Prometheus will not be able to scrape the metrics. -On the Prometheus UI, you can see the status of the targets and the metrics that are being scraped. You can search for a metrics variable by typing it in the search bar. +On the Prometheus UI, look at the status of the targets and the metrics that are being scraped. To search for a metrics variable, type it in the search bar. The TGI metrics can be accessed at: @@ -385,7 +432,7 @@ Run the Grafana server, without hanging-up the process: nohup ./bin/grafana-server & 3. Access the Grafana dashboard UI: - On your browser, access the Grafana dashboard UI at the following URL: + On a web browser, access the Grafana dashboard UI at the following URL: .. code-block:: bash @@ -401,23 +448,28 @@ Log in to Grafana using the default credentials: password: admin 4. Add Prometheus as a data source: - You need to configure the data source for Grafana to scrape data from. Click on the "Data Source" button, select Prometheus, and specify the Prometheus URL ``http://localhost:9090``. + The data source for Grafana needs to be configured to scrape data. Click on the "Data Source" button, select Prometheus, and specify the Prometheus URL ``http://localhost:9090``. - Then, you need to upload a JSON file for the dashboard's configuration. You can upload it in the Grafana UI under ``Home > Dashboards > Import dashboard``. A sample JSON file is supported here: `tgi_grafana.json `_ + Then, upload a JSON file for the dashboard's configuration. Upload it in the Grafana UI under ``Home > Dashboards > Import dashboard``. A sample JSON file is supported here: `tgi_grafana.json `_ 5. View the dashboard: - Finally, open the dashboard in the Grafana UI, and you will see different panels displaying the metrics data. 
+ Finally, open the dashboard in the Grafana UI to see different panels displaying the metrics data. - Taking the TGI microservice as an example, you can see the following metrics: + Taking the TGI microservice as an example, look at the following metrics: * Time to first token * Decode per-token latency * Throughput (generated tokens/sec) * Number of tokens per prompt * Number of generated tokens per request - You can also monitor the incoming requests to the microservice, the response time per token, etc., in real time. + Incoming requests to the microservice, the response time per token, etc., can also be monitored in real time. Summary and Next Steps ======================= -TBD +The ChatQnA application deploys a RAG architecture consisting of the following microservices - +embedding, vectorDB, retrieval, reranker, and LLM text generation. It is a chatbot that can +leverage new information from uploaded documents and websites to provide more accurate answers. +The microservices can be customized by modifying and building them in `GenAIComponents `_. +Explore additional `GenAIExamples `_ and use them +as starting points for other use cases. diff --git a/tutorial/ChatQnA/deploy/aipc.md b/tutorial/ChatQnA/deploy/aipc.md index 43ba6229..4a88e5a9 100644 --- a/tutorial/ChatQnA/deploy/aipc.md +++ b/tutorial/ChatQnA/deploy/aipc.md @@ -1,17 +1,10 @@ # Single node on-prem deployment with Ollama on AIPC -This deployment section covers single-node on-prem deployment of the ChatQnA -example with OPEA comps to deploy using Ollama. There are several -slice-n-dice ways to enable RAG with vectordb and LLM models, but here we will -be covering one option of doing it for convenience : we will be showcasing how -to build an e2e chatQnA with Redis VectorDB and the llama-3 model, -deployed on the client CPU. +This section covers single-node on-prem deployment of the ChatQnA example using Ollama. There are several ways to enable RAG with vectordb and LLM models, but this tutorial will be covering how to build an end-to-end ChatQnA pipeline with the Redis vector database and a llama-3 model deployed on the client CPU. ## Overview -There are several ways to setup a ChatQnA use case. Here in this tutorial, we -will walk through how to enable the below list of microservices from OPEA -GenAIComps to deploy a single node Ollama megaservice solution. +The OPEA GenAIComps microservices used to deploy a single node vLLM or TGI megaservice solution for ChatQnA are listed below: 1. Data Prep 2. Embedding @@ -19,69 +12,56 @@ GenAIComps to deploy a single node Ollama megaservice solution. 4. Reranking 5. LLM with Ollama -The solution is aimed to show how to use Redis vectordb for RAG and -the llama-3 model on Intel Client PCs. We will go through -how to setup docker container to start microservices and megaservice. -The solution will then utilize a sample Nike dataset which is in PDF format. Users -can then ask a question about Nike and get a chat-like response by default for -up to 1024 tokens. The solution is deployed with a UI. +This solution is designed to demonstrate the use of Redis vectorDB for RAG and the Meta-Llama-3-8B-Instruct model for LLM inference on Intel Client PCs. The steps will involve setting up Docker containers, using a sample Nike dataset in PDF format, and posing a question about Nike to receive a response. Although multiple versions of the UI can be deployed, this tutorial will focus solely on the default version. 
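For orientation, the sketch below shows the single request that exercises every stage listed above once deployment is complete. It is a preview rather than a step to run now; the gateway port (8888), route (`/v1/chatqna`), and payload shape are the ones used in the validation steps later in this tutorial.

```bash
# End-to-end smoke test of the deployed megaservice (run only after completing "Deploy the Use Case").
# host_ip is set in the Prerequisites section below.
curl http://${host_ip}:8888/v1/chatqna \
    -H "Content-Type: application/json" \
    -d '{"messages": "What is the revenue of Nike in 2023?"}'
```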
## Prerequisites -The first step is to clone the GenAIExamples and GenAIComps projects. GenAIComps are -fundamental necessary components used to build the examples you find in -GenAIExamples and deploy them as microservices. Set an environment -variable for the desired release version with the **number only** -(i.e. 1.0, 1.1, etc) and checkout using the tag with that version. - +Set up a workspace and clone the [GenAIExamples](https://github.com/opea-project/GenAIExamples) GitHub repo. ```bash -# Set workspace -export WORKSPACE= +export WORKSPACE= cd $WORKSPACE +git clone https://github.com/opea-project/GenAIExamples.git # GenAIExamples +``` -# Set desired release version - number only -export RELEASE_VERSION= - -# GenAIComps -git clone https://github.com/opea-project/GenAIComps.git -cd GenAIComps -git checkout tags/v${RELEASE_VERSION} -cd .. - -# GenAIExamples -git clone https://github.com/opea-project/GenAIExamples.git +**(Optional)** It is recommended to use a stable release version by setting `RELEASE_VERSION` to a **number only** (i.e. 1.0, 1.1, etc) and checkout that version using the tag. Otherwise, by default, the main branch with the latest updates will be used. +```bash +export RELEASE_VERSION= # Set desired release version - number only cd GenAIExamples git checkout tags/v${RELEASE_VERSION} cd .. ``` -Setup your [HuggingFace](https://huggingface.co/) account and generate -[user access token](https://huggingface.co/docs/transformers.js/en/guides/private#step-1-generating-a-user-access-token). +Set up a [HuggingFace](https://huggingface.co/) account and generate a [user access token](https://huggingface.co/docs/transformers.js/en/guides/private#step-1-generating-a-user-access-token). Request access to the [meta-llama/Meta-Llama-3-8B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct) model. -Setup the HuggingFace token +Set the `HUGGINGFACEHUB_API_TOKEN` environment variable to the value of the Hugging Face token by executing the following command: ```bash export HUGGINGFACEHUB_API_TOKEN="Your_Huggingface_API_Token" ``` -The example requires you to set the `host_ip` to deploy the microservices on -endpoint enabled with ports. Set the host_ip env variable +The example requires setting the `host_ip` to "localhost" to deploy the microservices on endpoints enabled with ports. +```bash +export host_ip="localhost" +``` + +Set the NGINX port. ```bash -export host_ip=$(hostname -I | awk '{print $1}') +# Example: NGINX_PORT=80 +export NGINX_PORT= ``` -Make sure to setup Proxies if you are behind a firewall +For machines behind a firewall, set up the proxy environment variables: ```bash -export no_proxy=${your_no_proxy},$host_ip -export http_proxy=${your_http_proxy} -export https_proxy=${your_http_proxy} +export https_proxy="Your_HTTPs_Proxy" +# Example: no_proxy="localhost, 127.0.0.1, 192.168.1.1" +export no_proxy=$no_proxy,chatqna-aipc-backend-server,tei-embedding-service,retriever,tei-reranking-service,redis-vector-db,dataprep-redis-service,ollama-service ``` The examples utilize model weights from Ollama and langchain. ### Set Up Ollama LLM Service -We use [Ollama](https://ollama.com/) as our LLM service for AIPC. +Use [Ollama](https://ollama.com/) as the LLM service for AIPC. -Please follow the instructions to set up Ollama on your PC. This will set the entrypoint needed for the Ollama to suit the ChatQnA examples. +Please follow the instructions to set up Ollama on the PC. This will set the entrypoint needed for the Ollama to work with the ChatQnA example. 
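Before running the installer in the next step, it may help to check whether Ollama is already present on the PC. This is an optional check and assumes a Linux shell:

```bash
# Optional: if this prints a version, the install step below can be skipped.
ollama --version 2>/dev/null || echo "Ollama is not installed yet"
```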
#### Install Ollama Service @@ -93,8 +73,7 @@ curl -fsSL https://ollama.com/install.sh | sh #### Set Ollama Service Configuration -Ollama Service Configuration file is /etc/systemd/system/ollama.service. Edit the file to set OLLAMA_HOST environment. -Replace **** with your host IPV4 (please use external public IP). For example the host_ip is 10.132.x.y, then `Environment="OLLAMA_HOST=10.132.x.y:11434"'. +The Ollama Service Configuration file is /etc/systemd/system/ollama.service. Edit the file to set OLLAMA_HOST environment, replacing with the hosts IPV4 external public IP address. For example, if the host_ip is 10.132.x.y, then `Environment="OLLAMA_HOST=10.132.x.y:11434"' should be used. ```bash Environment="OLLAMA_HOST=host_ip:11434" @@ -102,8 +81,7 @@ Environment="OLLAMA_HOST=host_ip:11434" #### Set https_proxy environment for Ollama -If your system access network through proxy, add https_proxy in Ollama Service Configuration file - +If the system's network is accessed through a proxy, add a https_proxy entry to the Ollama Service Configuration file: ```bash Environment="https_proxy=Your_HTTPS_Proxy" ``` @@ -115,13 +93,13 @@ sudo systemctl daemon-reload sudo systemctl restart ollama.service ``` -#### Check the service started +#### Check if the service started ```bash netstat -tuln | grep 11434 ``` -The output are: +The output is: ```bash tcp 0 0 10.132.x.y:11434 0.0.0.0:* LISTEN @@ -137,7 +115,7 @@ export OLLAMA_HOST=http://${host_ip}:11434 ollama pull llama3.2 ``` -After downloaded the models, you can list the models by `ollama list`. +After downloading the models, list the models by executing the `ollama list` command. The output should be similar to the following: @@ -148,13 +126,13 @@ llama3.2:latest a80c4f17acd5 2.0 GB 2 minutes ago ### Consume Ollama LLM Service -Access ollama service to verify that the ollama is functioning correctly. +Access ollama service to verify that Ollama is functioning correctly. ```bash curl http://${host_ip}:11434/api/generate -d '{"model": "llama3.2", "prompt":"What is Deep Learning?"}' ``` -The outputs are similar to these: +The output may look like this: ```bash {"model":"llama3.2","created_at":"2024-10-12T12:55:28.098813868Z","response":"Deep","done":false} @@ -173,146 +151,9 @@ The outputs are similar to these: ... ``` -## Prepare (Building / Pulling) Docker images - -This step will involve building/pulling relevant docker -images with step-by-step process along with sanity check in the end. For -ChatQnA, the following docker images will be needed: embedding, retriever, -rerank, LLM and dataprep. Additionally, you will need to build docker images for -ChatQnA megaservice, and UI. In total, there are 7 required docker images. - -The docker images needed to setup the example needs to be build local, however -the images will be pushed to docker hub soon by Intel. - -### Build/Pull Microservice images - -::::::{tab-set} - -:::::{tab-item} Pull -:sync: Pull - -If you decide to pull the docker containers and not build them locally, -you can proceed to the next step where all the necessary containers will -be pulled in from Docker Hub. - -::::: -:::::{tab-item} Build -:sync: Build - -Follow the steps below to build the docker images from within the `GenAIComps` folder. -**Note:** For RELEASE_VERSIONS older than 1.0, you will need to add a 'v' in front -of ${RELEASE_VERSION} to reference the correct image on Docker Hub. 
- -```bash -cd $WORKSPACE/GenAIComps -``` - -#### Build Dataprep Image - -```bash -docker build --no-cache -t opea/dataprep-redis:${RELEASE_VERSION} --build-arg https_proxy=$https_proxy \ - --build-arg http_proxy=$http_proxy -f comps/dataprep/redis/langchain/Dockerfile . -``` - -#### Build Embedding Image - -```bash -docker build --no-cache -t opea/embedding-tei:${RELEASE_VERSION} --build-arg https_proxy=$https_proxy \ - --build-arg http_proxy=$http_proxy -f comps/embeddings/tei/langchain/Dockerfile . -``` - -#### Build Retriever Image - -```bash - docker build --no-cache -t opea/retriever-redis:${RELEASE_VERSION} --build-arg https_proxy=$https_proxy \ - --build-arg http_proxy=$http_proxy -f comps/retrievers/redis/langchain/Dockerfile . -``` - -#### Build Rerank Image - -```bash -docker build --no-cache -t opea/reranking-tei:${RELEASE_VERSION} --build-arg https_proxy=$https_proxy \ - --build-arg http_proxy=$http_proxy -f comps/reranks/tei/Dockerfile . -``` - -#### Build LLM Image - -::::{tab-set} - -:::{tab-item} Ollama -:sync: Ollama - - -Next, we'll build the Ollama microservice docker. This will set the entry point -needed for Ollama to suit the ChatQnA examples -```bash -docker build --no-cache -t opea/llm-ollama:${RELEASE_VERSION} --build-arg https_proxy=$https_proxy \ - --build-arg http_proxy=$http_proxy -f comps/llms/text-generation/ollama/langchain/Dockerfile . -``` - -::: -:::: - - -### Build Mega Service images - -The Megaservice is a pipeline that channels data through different -microservices, each performing varied tasks. We define the different -microservices and the flow of data between them in the `chatqna.py` file, say in -this example the output of embedding microservice will be the input of retrieval -microservice which will in turn passes data to the reranking microservice and so -on. You can also add newer or remove some microservices and customize the -megaservice to suit the needs. - -Build the megaservice image for this use case - -```bash -cd $WORKSPACE/GenAIExamples/ChatQnA -``` - -```bash -docker build --no-cache -t opea/chatqna:${RELEASE_VERSION} --build-arg https_proxy=$https_proxy \ - --build-arg http_proxy=$http_proxy -f Dockerfile . -``` - -### Build Other Service images - -#### Build the UI Image - -*UI* - -```bash -cd $WORKSPACE/GenAIExamples/ChatQnA/ui/ -docker build --no-cache -t opea/chatqna-ui:${RELEASE_VERSION} --build-arg https_proxy=$https_proxy \ - --build-arg http_proxy=$http_proxy -f ./docker/Dockerfile . -``` - -### Sanity Check -Check if you have the below set of docker images, before moving on to the next step: - -::::{tab-set} -:::{tab-item} Ollama -:sync: Ollama - -* opea/dataprep-redis:${RELEASE_VERSION} -* opea/embedding-tei:${RELEASE_VERSION} -* opea/retriever-redis:${RELEASE_VERSION} -* opea/reranking-tei:${RELEASE_VERSION} -* opea/llm-ollama:${RELEASE_VERSION} -* opea/chatqna:${RELEASE_VERSION} -* opea/chatqna-ui:${RELEASE_VERSION} -::: - -:::: - -::::: -:::::: - - ## Use Case Setup -As mentioned the use case will use the following combination of the GenAIComps -with the tools +ChatQnA will utilize the following GenAIComps services and associated tools. The tools and models listed in the table can be configured via environment variables in either the `set_env.sh` script or the `compose.yaml` file. 
::::{tab-set} @@ -328,42 +169,32 @@ with the tools |LLM | Ollama | llama3 |OPEA Microservice | |UI | | NA | Gateway Service | -Tools and models mentioned in the table are configurable either through the -environment variable or `compose.yaml` file. ::: :::: -Set the necessary environment variables to setup the use case. If you want to swap -out models, modify `set_env.sh` before running. +Set the necessary environment variables to set up the use case. To swap out models, modify `set_env.sh` before running it. For example, the environment variable `LLM_MODEL_ID` can be changed to another model by specifying the HuggingFace model card ID. ```bash cd $WORKSPACE/GenAIExamples/ChatQnA/docker_compose/intel/cpu/aipc source ./set_env.sh ``` -## Deploy the use case +## Deploy the Use Case -In this tutorial, we will be deploying via docker compose with the provided -YAML file. The docker compose instructions should be starting all the -above mentioned services as containers. +Run `docker compose` with the provided YAML file to start all the services mentioned above as containers. ::::{tab-set} :::{tab-item} Ollama :sync: Ollama ```bash -cd $WORKSPACE/GenAIExamples/ChatQnA/docker_compose/intel/cpu/aipc docker compose -f compose.yaml up -d ``` ::: :::: - -### Validate microservice - -#### Check Env Variables -Check the start up log by `docker compose -f ./compose.yaml logs`. -The warning messages print out the variables if they are **NOT** set. +### Check Env Variables +After running `docker compose`, check for warning messages for environment variables that are **NOT** set. Address them if needed. ::::{tab-set} :::{tab-item} Ollama @@ -382,12 +213,16 @@ The warning messages print out the variables if they are **NOT** set. ::: :::: -#### Check the container status +### Check the container status -Check if all the containers launched via docker compose has started. +Check if all the containers launched via `docker compose` are running i.e. each container's `STATUS` is `Up` and `Healthy`. -For example, the ChatQnA example starts 11 docker (services), check these docker containers are all running. That is, all the containers `STATUS` are `Up`. To do a quick sanity check, try `docker ps -a` to see if all the containers are running. 
+Run this command to see this info: +```bash +docker ps -a +``` +Sample output: ::::{tab-set} :::{tab-item} Ollama @@ -395,12 +230,12 @@ For example, the ChatQnA example starts 11 docker (services), check these docker ```bash CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES -5db065a9fdf9 opea/chatqna-ui:${RELEASE_VERSION} "docker-entrypoint.s…" 29 seconds ago Up 25 seconds 0.0.0.0:5173->5173/tcp, :::5173->5173/tcp chatqna-aipc-ui-server -6fa87927d00c opea/chatqna:${RELEASE_VERSION} "python chatqna.py" 29 seconds ago Up 25 seconds 0.0.0.0:8888->8888/tcp, :::8888->8888/tcp chatqna-aipc-backend-server -bdc93be9ce0c opea/retriever-redis:${RELEASE_VERSION} "python retriever_re…" 29 seconds ago Up 3 seconds 0.0.0.0:7000->7000/tcp, :::7000->7000/tcp retriever-redis-server -add761b504bc opea/reranking-tei:${RELEASE_VERSION} "python reranking_te…" 29 seconds ago Up 26 seconds 0.0.0.0:8000->8000/tcp, :::8000->8000/tcp reranking-tei-aipc-server -d6b540a423ac opea/dataprep-redis:${RELEASE_VERSION} "python prepare_doc_…" 29 seconds ago Up 26 seconds 0.0.0.0:6007->6007/tcp, :::6007->6007/tcp dataprep-redis-server -6662d857a154 opea/embedding-tei:${RELEASE_VERSION} "python embedding_te…" 29 seconds ago Up 26 seconds 0.0.0.0:6000->6000/tcp, :::6000->6000/tcp embedding-tei-server +5db065a9fdf9 opea/chatqna-ui:latest "docker-entrypoint.s…" 29 seconds ago Up 25 seconds 0.0.0.0:5173->5173/tcp, :::5173->5173/tcp chatqna-aipc-ui-server +6fa87927d00c opea/chatqna:latest "python chatqna.py" 29 seconds ago Up 25 seconds 0.0.0.0:8888->8888/tcp, :::8888->8888/tcp chatqna-aipc-backend-server +bdc93be9ce0c opea/retriever-redis:latest "python retriever_re…" 29 seconds ago Up 3 seconds 0.0.0.0:7000->7000/tcp, :::7000->7000/tcp retriever-redis-server +add761b504bc opea/reranking-tei:latest "python reranking_te…" 29 seconds ago Up 26 seconds 0.0.0.0:8000->8000/tcp, :::8000->8000/tcp reranking-tei-aipc-server +d6b540a423ac opea/dataprep-redis:latest "python prepare_doc_…" 29 seconds ago Up 26 seconds 0.0.0.0:6007->6007/tcp, :::6007->6007/tcp dataprep-redis-server +6662d857a154 opea/embedding-tei:latest "python embedding_te…" 29 seconds ago Up 26 seconds 0.0.0.0:6000->6000/tcp, :::6000->6000/tcp embedding-tei-server 8b226edcd9db ghcr.io/huggingface/text-embeddings-inference:cpu-1.5 "text-embeddings-rou…" 29 seconds ago Up 27 seconds 0.0.0.0:8808->80/tcp, :::8808->80/tcp tei-reranking-server e1fc81b1d542 redis/redis-stack:7.2.0-v9 "/entrypoint.sh" 29 seconds ago Up 27 seconds 0.0.0.0:6379->6379/tcp, :::6379->6379/tcp, 0.0.0.0:8001->8001/tcp, :::8001->8001/tcp redis-vector-db 051e0d68e263 ghcr.io/huggingface/text-embeddings-inference:cpu-1.5 "text-embeddings-rou…" 29 seconds ago Up 27 seconds 0.0.0.0:6006->80/tcp, :::6006->80/tcp tei-embedding-server @@ -409,83 +244,19 @@ e1fc81b1d542 redis/redis-stack:7.2.0-v9 "/entrypo ::: :::: -## Interacting with ChatQnA deployment - -This section will walk you through what are the different ways to interact with -the microservices deployed - -### Dataprep Microservice(Optional) -If you want to add/update the default knowledge base, you can use the following -commands. The dataprep microservice extracts the texts from variety of data -sources, chunks the data, embeds each chunk using embedding microservice and -store the embedded vectors in the redis vector database. - -`nke-10k-2023.pdf` is Nike's annual report on a form 10-K. 
Run in a terminal window this command to download the file: +Each docker container's log can also be checked using: ```bash -wget https://github.com/opea-project/GenAIComps/blob/v1.1/comps/retrievers/redis/data/nke-10k-2023.pdf +docker logs ``` -Upload the file: +## Validate Microservices -```bash -curl -X POST "http://${host_ip}:6007/v1/dataprep" \ - -H "Content-Type: multipart/form-data" \ - -F "files=@./nke-10k-2023.pdf" -``` - -This command updates a knowledge base by uploading a local file for processing. -Update the file path according to your environment. - -Add Knowledge Base via HTTP Links: - -```bash -curl -X POST "http://${host_ip}:6007/v1/dataprep" \ - -H "Content-Type: multipart/form-data" \ - -F 'link_list=["https://opea.dev"]' -``` - -This command updates a knowledge base by submitting a list of HTTP links for processing. - -Also, you are able to get the file list that you uploaded: - -```bash -curl -X POST "http://${host_ip}:6007/v1/dataprep/get_file" \ - -H "Content-Type: application/json" +This section will guide through the various methods for interacting with the deployed microservices. -``` - -To delete the file/link you uploaded you can use the following commands: - -#### Delete link -```bash -# The dataprep service will add a .txt postfix for link file - -curl -X POST "http://${host_ip}:6007/v1/dataprep/delete_file" \ - -d '{"file_path": "https://opea.dev.txt"}' \ - -H "Content-Type: application/json" -``` - -#### Delete file - -```bash -curl -X POST "http://${host_ip}:6007/v1/dataprep/delete_file" \ - -d '{"file_path": "nke-10k-2023.pdf"}' \ - -H "Content-Type: application/json" -``` - -#### Delete all uploaded files and links - -```bash -curl -X POST "http://${host_ip}:6007/v1/dataprep/delete_file" \ - -d '{"file_path": "all"}' \ - -H "Content-Type: application/json" -``` ### TEI Embedding Service -The TEI embedding service takes in a string as input, embeds the string into a -vector of a specific length determined by the embedding model and returns this -embedded vector. +The TEI embedding service takes in a string as input, embeds the string into a vector of a specific length determined by the embedding model, and returns this vector. ```bash curl ${host_ip}:6006/embed \ @@ -494,31 +265,13 @@ curl ${host_ip}:6006/embed \ -H 'Content-Type: application/json' ``` -In this example the embedding model used is "BAAI/bge-base-en-v1.5", which has a -vector size of 768. So the output of the curl command is a embedded vector of -length 768. +In this example, the embedding model used is `BAAI/bge-base-en-v1.5`, which has a vector size of 768. Therefore, the output of the curl command is a vector of length 768. -### Embedding Microservice -The embedding microservice depends on the TEI embedding service. In terms of -input parameters, it takes in a string, embeds it into a vector using the TEI -embedding service and adds other default parameters that are required for the -retrieval microservice and returns it. - -```bash -curl http://${host_ip}:6000/v1/embeddings\ - -X POST \ - -d '{"text":"hello"}' \ - -H 'Content-Type: application/json' -``` ### Retriever Microservice -To consume the retriever microservice, you need to generate a mock embedding -vector using Python script. The length of embedding vector is determined by the -embedding model. Here we use the -model EMBEDDING_MODEL_ID="BAAI/bge-base-en-v1.5", which vector size is 768. +To consume the retriever microservice, generate a mock embedding vector with a Python script. 
The length of the embedding vector is determined by the embedding model. The model is set with the environment variable EMBEDDING_MODEL_ID="BAAI/bge-base-en-v1.5", which has a vector size of 768. -Check the vector dimension of your embedding model and set -`your_embedding` dimension equal to it. +Check the vector dimension of the embedding model used and set `your_embedding` dimension equal to it. ```bash export your_embedding=$(python3 -c "import random; embedding = [random.uniform(-1, 1) for _ in range(768)]; print(embedding)") @@ -527,25 +280,19 @@ curl http://${host_ip}:7000/v1/retrieval \ -X POST \ -d "{\"text\":\"test\",\"embedding\":${your_embedding}}" \ -H 'Content-Type: application/json' - ``` -The output of the retriever microservice comprises of the a unique id for the -request, initial query or the input to the retrieval microservice, a list of top -`n` retrieved documents relevant to the input query, and top_n where n refers to -the number of documents to be returned. -The output is retrieved text that relevant to the input data: -```bash -{"id":"27210945c7c6c054fa7355bdd4cde818","retrieved_docs":[{"id":"0c1dd04b31ab87a5468d65f98e33a9f6","text":"Company: Nike. financial instruments are subject to master netting arrangements that allow for the offset of assets and liabilities in the event of default or early termination of the contract.\nAny amounts of cash collateral received related to these instruments associated with the Company's credit-related contingent features are recorded in Cash and\nequivalents and Accrued liabilities, the latter of which would further offset against the Company's derivative asset balance. Any amounts of cash collateral posted related\nto these instruments associated with the Company's credit-related contingent features are recorded in Prepaid expenses and other current assets, which would further\noffset against the Company's derivative liability balance. Cash collateral received or posted related to the Company's credit-related contingent features is presented in the\nCash provided by operations component of the Consolidated Statements of Cash Flows. The Company does not recognize amounts of non-cash collateral received, such\nas securities, on the Consolidated Balance Sheets. For further information related to credit risk, refer to Note 12 — Risk Management and Derivatives.\n2023 FORM 10-K 68Table of Contents\nThe following tables present information about the Company's derivative assets and liabilities measured at fair value on a recurring basis and indicate the level in the fair\nvalue hierarchy in which the Company classifies the fair value measurement:\nMAY 31, 2023\nDERIVATIVE ASSETS\nDERIVATIVE LIABILITIES"},{"id":"1d742199fb1a86aa8c3f7bcd580d94af","text": ... } +The output of the retriever microservice comprises of the a unique id for the request, initial query or the input to the retrieval microservice, a list of top `n` retrieved documents relevant to the input query, and top_n where n refers to the number of documents to be returned. + +The output is retrieved text that is relevant to the input data: +```bash +{"id":"b16024e140e78e39a60e8678622be630","retrieved_docs":[],"initial_query":"test","top_n":1,"metadata":[]} ``` ### TEI Reranking Service -The TEI Reranking Service reranks the documents returned by the retrieval -service. It consumes the query and list of documents and returns the document -index based on decreasing order of the similarity score. 
The document -corresponding to the returned index with the highest score is the most relevant -document for the input query. +The TEI Reranking Service reranks the documents returned by the retrieval service. It consumes the query and list of documents and returns the document index in decreasing order of the similarity score. The document corresponding to the index with the highest score is the most relevant document for the input query. + ```bash curl http://${host_ip}:8808/rerank \ -X POST \ @@ -553,60 +300,19 @@ curl http://${host_ip}:8808/rerank \ -H 'Content-Type: application/json' ``` -Output is: `[{"index":1,"score":0.9988041},{"index":0,"score":0.022948774}]` - - -### Reranking Microservice - - -The reranking microservice consumes the TEI Reranking service and pads the -response with default parameters required for the LLM microservice. - +Sample output: ```bash -curl http://${host_ip}:8000/v1/reranking\ - -X POST \ - -d '{"initial_query":"What is Deep Learning?", "retrieved_docs": \ - [{"text":"Deep Learning is not..."}, {"text":"Deep learning is..."}]}' \ - -H 'Content-Type: application/json' +[{"index":1,"score":0.94238955},{"index":0,"score":0.120219156}] ``` -The input to the microservice is the `initial_query` and a list of retrieved -documents and it outputs the most relevant document to the initial query along -with other default parameter such as the temperature, `repetition_penalty`, -`chat_template` and so on. We can also get top n documents by setting `top_n` as one -of the input parameters. For example: - -```bash -curl http://${host_ip}:8000/v1/reranking\ - -X POST \ - -d '{"initial_query":"What is Deep Learning?" ,"top_n":2, "retrieved_docs": \ - [{"text":"Deep Learning is not..."}, {"text":"Deep learning is..."}]}' \ - -H 'Content-Type: application/json' -``` - -Here is the output: - -```bash -{"id":"e1eb0e44f56059fc01aa0334b1dac313","query":"Human: Answer the question based only on the following context:\n Deep learning is...\n Question: What is Deep Learning?","max_new_tokens":1024,"top_k":10,"top_p":0.95,"typical_p":0.95,"temperature":0.01,"repetition_penalty":1.03,"streaming":true} - -``` -You may notice reranking microservice are with state ('ID' and other meta data), -while reranking service are not. - ### Ollama Service -::::{tab-set} - -:::{tab-item} Ollama -:sync: Ollama - +Run the command below to use Ollama to generate text for the input prompt. ```bash curl http://${host_ip}:11434/api/generate -d '{"model": "llama3", "prompt":"What is Deep Learning?"}' ``` -Ollama service generates text for the input prompt. Here is the expected result -from Ollama: - +Ollama service generates text for the input prompt. Here is the expected result from Ollama: ```bash {"model":"llama3","created_at":"2024-09-05T08:47:17.160752424Z","response":"Deep","done":false} {"model":"llama3","created_at":"2024-09-05T08:47:18.229472564Z","response":" learning","done":false} @@ -624,58 +330,74 @@ from Ollama: {"model":"llama3","created_at":"2024-09-05T08:47:32.231884525Z","response":" of","done":false} {"model":"llama3","created_at":"2024-09-05T08:47:33.510913894Z","response":" artificial","done":false} {"model":"llama3","created_at":"2024-09-05T08:47:34.516291108Z","response":" neural","done":false} -... ``` -::: -:::: +### Dataprep Microservice +The knowledge base can be updated using the dataprep microservice, which extracts text from a variety of data sources, chunks the data, and embeds each chunk using the embedding microservice. 
Finally, the embedded vectors are stored in the Redis vector database. +`nke-10k-2023.pdf` is Nike's annual report on a form 10-K. Run this command to download the file: +```bash +wget https://github.com/opea-project/GenAIComps/blob/v1.1/comps/retrievers/redis/data/nke-10k-2023.pdf +``` -### LLM Microservice +Upload the file: ```bash -curl http://${host_ip}:9000/v1/chat/completions\ - -X POST \ - -d '{"query":"What is Deep Learning?","max_new_tokens":17,"top_k":10,"top_p":0.95,\ - "typical_p":0.95,"temperature":0.01,"repetition_penalty":1.03,"streaming":true}' \ - -H 'Content-Type: application/json' +curl -X POST "http://${host_ip}:6007/v1/dataprep" \ + -H "Content-Type: multipart/form-data" \ + -F "files=@./nke-10k-2023.pdf" +``` + +HTTP links can also be added to the knowledge base. This command adds the opea.dev website. +```bash +curl -X POST "http://${host_ip}:6007/v1/dataprep" \ + -H "Content-Type: multipart/form-data" \ + -F 'link_list=["https://opea.dev"]' +``` +The list of uploaded files can be retrieved using this command: +```bash +curl -X POST "http://${host_ip}:6007/v1/dataprep/get_file" \ + -H "Content-Type: application/json" ``` -You will get the below generated text from LLM: +To delete the file or link, use the following commands: +#### Delete link ```bash -data: b'\n' -data: b'\n' -data: b'Deep' -data: b' learning' -data: b' is' -data: b' a' -data: b' subset' -data: b' of' -data: b' machine' -data: b' learning' -data: b' that' -data: b' uses' -data: b' algorithms' -data: b' to' -data: b' learn' -data: b' from' -data: b' data' -data: [DONE] +# The dataprep service will add a .txt postfix for link file + +curl -X POST "http://${host_ip}:6007/v1/dataprep/delete_file" \ + -d '{"file_path": "https://opea.dev.txt"}' \ + -H "Content-Type: application/json" ``` -### MegaService +#### Delete file + +```bash +curl -X POST "http://${host_ip}:6007/v1/dataprep/delete_file" \ + -d '{"file_path": "nke-10k-2023.pdf"}' \ + -H "Content-Type: application/json" +``` + +#### Delete all uploaded files and links + +```bash +curl -X POST "http://${host_ip}:6007/v1/dataprep/delete_file" \ + -d '{"file_path": "all"}' \ + -H "Content-Type: application/json" +``` + +### ChatQnA MegaService ```bash curl http://${host_ip}:8888/v1/chatqna -H "Content-Type: application/json" -d '{ "model": "'"${OLLAMA_MODEL}"'", "messages": "What is the revenue of Nike in 2023?" }' - ``` -Here is the output for your reference: +Here is the output for reference: ```bash data: b'\n' @@ -713,56 +435,47 @@ data: b'' data: [DONE] ``` -## Check docker container log - -Check the log of container by: - -`docker logs -t` - - -Check the log by `docker logs f7a08f9867f9 -t`. - - - - - - -Also you can check overall logs with the following command, where the -compose.yaml is the mega service docker-compose configuration file. - - -::::{tab-set} - -:::{tab-item} Ollama -:sync: Ollama +### NGINX Service +This will ensure the NGINX service is working properly. ```bash -docker compose -f compose.yaml logs +curl http://${host_ip}:${NGINX_PORT}/v1/chatqna \ + -H "Content-Type: application/json" \ + -d '{"messages": "What is the revenue of Nike in 2023?"}' ``` -::: -:::: + +The output will be similar to that of the ChatQnA megaservice. ## Launch UI -To access the frontend, open the following URL in your browser: http://{host_ip}:5173. By default, the UI runs on port 5173 internally. 
If you prefer to use a different host port to access the frontend, you can modify the port mapping in the `compose.yaml` file as shown below: +To access the frontend, open the following URL in a web browser: http://${host_ip}:${NGINX_PORT}. By default, the UI runs on port 5173 internally. A different host port can be used to access the frontend by modifying the port mapping in the `compose.yaml` file as shown below: ```yaml - chaqna-aipc-ui-server: + chatqna-aipc-ui-server: image: opea/chatqna-ui${TAG:-latest} ... ports: - - "5173:5173" + - "YOUR_HOST_PORT:5173" # Change YOUR_HOST_PORT to the desired port ``` -### Stop the services +After making this change, rebuild and restart the containers for the change to take effect. + +### Stop the Services + +Navigate to the `docker compose` directory for this hardware platform. +```bash +cd $WORKSPACE/GenAIExamples/ChatQnA/docker_compose/intel/cpu/aipc +``` + +To stop and remove all the containers, use the command below: -Once you are done with the entire pipeline and wish to stop and remove all the containers, use the command below: ::::{tab-set} :::{tab-item} Ollama :sync: Ollama - +To stop and remove all the containers, use the command below: ```bash docker compose -f compose.yaml down ``` + ::: :::: diff --git a/tutorial/ChatQnA/deploy/gaudi.md b/tutorial/ChatQnA/deploy/gaudi.md index 851355cc..4a43de53 100644 --- a/tutorial/ChatQnA/deploy/gaudi.md +++ b/tutorial/ChatQnA/deploy/gaudi.md @@ -1,19 +1,10 @@ # Single node on-prem deployment with vLLM or TGI on Gaudi AI Accelerator -This deployment section covers single-node on-prem deployment of the ChatQnA -example with OPEA comps to deploy using vLLM or TGI service. There are several -slice-n-dice ways to enable RAG with vectordb and LLM models, but here we will -be covering one option of doing it for convenience : we will be showcasing how -to build an e2e chatQnA with Redis VectorDB and meta-llama/Meta-Llama-3-8B-Instruct model, -deployed on Intel® Gaudi® AI Accelerators. To quickly learn about OPEA in just 5 minutes -and set up the required hardware and software, please follow the instructions in the -[Getting Started](../../../getting-started/README.md) section. +This section covers single-node on-prem deployment of the ChatQnA example using the vLLM or TGI LLM service. There are several ways to enable RAG with vectordb and LLM models, but this tutorial will be covering how to build an end-to-end ChatQnA pipeline with the Redis vector database and meta-llama/Meta-Llama-3-8B-Instruct model deployed on Intel® Gaudi® AI Accelerators. To quickly learn about OPEA and set up the required hardware and software, follow the instructions in the [Getting Started Guide](../../../getting-started/README.md). ## Overview -There are several ways to setup a ChatQnA use case. Here in this tutorial, we -will walk through how to enable the below list of microservices from OPEA -GenAIComps to deploy a single node vLLM or TGI megaservice solution. +The OPEA GenAIComps microservices used to deploy a single node vLLM or TGI megaservice solution for ChatQnA are listed below: 1. Data Prep 2. Embedding @@ -21,262 +12,54 @@ GenAIComps to deploy a single node vLLM or TGI megaservice solution. 4. Reranking 5. LLM with vLLM or TGI -The solution is aimed to show how to use Redis vectordb for RAG and -Meta-Llama-3-8B-Instruct model on Intel Gaudi AI Accelerator. We will go through -how to setup docker container to start a microservices and megaservice . 
The -solution will then utilize a sample Nike dataset which is in PDF format. Users -can then ask a question about Nike and get a chat-like response by default for -up to 1024 tokens. The solution is deployed with a UI. There are 2 modes you can -use: - -1. Basic UI -2. Conversational UI - -Conversational UI is optional, but a feature supported in this example if you -are interested to use. - -To summarize, Below is the flow of contents we will be covering in this tutorial: - -1. Prerequisites -2. Prepare (Building / Pulling) Docker images -3. Use case setup -4. Deploy the use case -5. Interacting with ChatQnA deployment +This solution is designed to demonstrate the use of Redis vectorDB for RAG and the Meta-Llama-3-8B-Instruct model for LLM inference on Intel® Gaudi® AI Accelerators. The steps will involve setting up Docker containers, using a sample Nike dataset in PDF format, and posing a question about Nike to receive a response. Although multiple versions of the UI can be deployed, this tutorial will focus solely on the default version. ## Prerequisites -The first step is to clone the GenAIExamples and GenAIComps projects. GenAIComps are -fundamental necessary components used to build the examples you find in -GenAIExamples and deploy them as microservices. Set an environment -variable for the desired release version with the **number only** -(i.e. 1.0, 1.1, etc) and checkout using the tag with that version. - +Set up a workspace and clone the [GenAIExamples](https://github.com/opea-project/GenAIExamples) GitHub repo. ```bash -# Set workspace -export WORKSPACE= +export WORKSPACE= cd $WORKSPACE +git clone https://github.com/opea-project/GenAIExamples.git # GenAIExamples +``` -# Set desired release version - number only -export RELEASE_VERSION= - -# GenAIComps -git clone https://github.com/opea-project/GenAIComps.git -cd GenAIComps -git checkout tags/v${RELEASE_VERSION} -cd .. - -# GenAIExamples -git clone https://github.com/opea-project/GenAIExamples.git +**(Optional)** It is recommended to use a stable release version by setting `RELEASE_VERSION` to a **number only** (i.e. 1.0, 1.1, etc) and checkout that version using the tag. Otherwise, by default, the main branch with the latest updates will be used. +```bash +export RELEASE_VERSION= # Set desired release version - number only cd GenAIExamples git checkout tags/v${RELEASE_VERSION} cd .. ``` -The examples utilize model weights from HuggingFace and Langchain. - -Setup your [HuggingFace](https://huggingface.co/) account and generate -[user access token](https://huggingface.co/docs/transformers.js/en/guides/private#step-1-generating-a-user-access-token). +Set up a [HuggingFace](https://huggingface.co/) account and generate a [user access token](https://huggingface.co/docs/transformers.js/en/guides/private#step-1-generating-a-user-access-token). Request access to the [meta-llama/Meta-Llama-3-8B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct) model. -Setup the HuggingFace token +Set the `HUGGINGFACEHUB_API_TOKEN` environment variable to the value of the Hugging Face token by executing the following command: ```bash export HUGGINGFACEHUB_API_TOKEN="Your_Huggingface_API_Token" ``` -The example requires you to set the `host_ip` to deploy the microservices on -endpoint enabled with ports. 
Set the host_ip env variable -```bash -export host_ip=$(hostname -I | awk '{print $1}') -``` - -Make sure to setup Proxies if you are behind a firewall -```bash -export no_proxy=${your_no_proxy},$host_ip -export http_proxy=${your_http_proxy} -export https_proxy=${your_http_proxy} -``` - -## Prepare (Building / Pulling) Docker images - -This step will involve building/pulling relevant docker -images with step-by-step process along with sanity check in the end. For -ChatQnA, the following docker images will be needed: embedding, retriever, -rerank, LLM and dataprep. Additionally, you will need to build docker images for -ChatQnA megaservice, and UI (conversational React UI is optional). In total, -there are 8 required and 1 optional docker images. - -### Build/Pull Microservice images - -::::::{tab-set} - -:::::{tab-item} Pull -:sync: Pull - -If you decide to pull the docker containers and not build them locally, -you can proceed to the next step where all the necessary containers will -be pulled in from Docker Hub. - -::::: -:::::{tab-item} Build -:sync: Build - -Follow the steps below to build the docker images from within the `GenAIComps` folder. -**Note:** For RELEASE_VERSIONS older than 1.0, you will need to add a 'v' in front -of ${RELEASE_VERSION} to reference the correct image on Docker Hub. - -```bash -cd $WORKSPACE/GenAIComps -``` - -#### Build Dataprep Image - -```bash -docker build --no-cache -t opea/dataprep-redis:${RELEASE_VERSION} --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f comps/dataprep/redis/langchain/Dockerfile . -``` - -#### Build Embedding Image - -```bash -docker build --no-cache -t opea/embedding-tei:${RELEASE_VERSION} --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f comps/embeddings/tei/langchain/Dockerfile . -``` - -#### Build Retriever Image - -```bash -docker build --no-cache -t opea/retriever-redis:${RELEASE_VERSION} --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f comps/retrievers/redis/langchain/Dockerfile . -``` - -#### Build Rerank Image - -```bash -docker build --no-cache -t opea/reranking-tei:${RELEASE_VERSION} --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f comps/reranks/tei/Dockerfile . -``` - -#### Build LLM Image - -::::{tab-set} - -:::{tab-item} vllm -:sync: vllm - -Build vLLM docker image with hpu support -```bash -bash ./comps/llms/text-generation/vllm/langchain/dependency/build_docker_vllm.sh hpu -``` - -Build vLLM Microservice image -```bash -docker build --no-cache -t opea/llm-vllm:${RELEASE_VERSION} --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f comps/llms/text-generation/vllm/langchain/Dockerfile . -cd .. -``` -::: -:::{tab-item} TGI -:sync: TGI - -```bash -docker build --no-cache -t opea/llm-tgi:${RELEASE_VERSION} --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f comps/llms/text-generation/tgi/Dockerfile . -``` -::: -:::: - -### Build TEI Gaudi Image - -Since a TEI Gaudi Docker image hasn't been published, we'll need to build it from the [tei-gaudi](https://github.com/huggingface/tei-gaudi) repository. - -```bash -git clone https://github.com/huggingface/tei-gaudi -cd tei-gaudi/ -docker build --no-cache -f Dockerfile-hpu -t opea/tei-gaudi:${RELEASE_VERSION} . -cd .. -``` - - -### Build Mega Service images - -The Megaservice is a pipeline that channels data through different -microservices, each performing varied tasks. 
We define the different -microservices and the flow of data between them in the `chatqna.py` file, say in -this example the output of embedding microservice will be the input of retrieval -microservice which will in turn passes data to the reranking microservice and so -on. You can also add newer or remove some microservices and customize the -megaservice to suit the needs. - -Build the megaservice image for this use case - -```bash -cd $WORKSPACE/GenAIExamples/ChatQnA -``` - -```bash -docker build --no-cache -t opea/chatqna:${RELEASE_VERSION} --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f Dockerfile . -``` - -### Build Other Service images - -If you want to enable guardrails microservice in the pipeline, please use the below command instead: - +The example requires setting the `host_ip` to "localhost" to deploy the microservices on endpoints enabled with ports. ```bash -docker build --no-cache -t opea/chatqna-guardrails:${RELEASE_VERSION} --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f Dockerfile.guardrails . +export host_ip="localhost" ``` -### Build the UI Image - -As mentioned, you can build 2 modes of UI - -*Basic UI* - +Set the NGINX port. ```bash -cd $WORKSPACE/GenAIExamples/ChatQnA/ui/ -docker build --no-cache -t opea/chatqna-ui:${RELEASE_VERSION} --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f ./docker/Dockerfile . +# Example: NGINX_PORT=80 +export NGINX_PORT= ``` -*Conversation UI* -If you want a conversational experience with chatqna megaservice. - +For machines behind a firewall, set up the proxy environment variables: ```bash -cd $WORKSPACE/GenAIExamples/ChatQnA/ui/ -docker build --no-cache -t opea/chatqna-conversation-ui:${RELEASE_VERSION} --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f ./docker/Dockerfile.react . +export http_proxy="Your_HTTP_Proxy" +export https_proxy="Your_HTTPs_Proxy" +# Example: no_proxy="localhost, 127.0.0.1, 192.168.1.1" +export no_proxy="Your_No_Proxy",chatqna-gaudi-ui-server,chatqna-gaudi-backend-server,dataprep-redis-service,tei-embedding-service,retriever,tei-reranking-service,tgi-service,vllm-service,guardrails,jaeger,prometheus,grafana,gaudi-node-exporter-1 ``` -### Sanity Check -Check if you have the below set of docker images before moving on to the next step: - -::::{tab-set} -:::{tab-item} vllm -:sync: vllm - -* opea/dataprep-redis:${RELEASE_VERSION} -* opea/embedding-tei:${RELEASE_VERSION} -* opea/retriever-redis:${RELEASE_VERSION} -* opea/reranking-tei:${RELEASE_VERSION} -* opea/tei-gaudi:${RELEASE_VERSION} -* opea/chatqna:${RELEASE_VERSION} or opea/chatqna-guardrails:${RELEASE_VERSION} -* opea/chatqna:${RELEASE_VERSION} -* opea/chatqna-ui:${RELEASE_VERSION} -* opea/vllm:${RELEASE_VERSION} -* opea/llm-vllm:${RELEASE_VERSION} - -::: -:::{tab-item} TGI -:sync: TGI - -* opea/dataprep-redis:${RELEASE_VERSION} -* opea/embedding-tei:${RELEASE_VERSION} -* opea/retriever-redis:${RELEASE_VERSION} -* opea/reranking-tei:${RELEASE_VERSION} -* opea/tei-gaudi:${RELEASE_VERSION} -* opea/chatqna:${RELEASE_VERSION} or opea/chatqna-guardrails:${RELEASE_VERSION} -* opea/chatqna-ui:${RELEASE_VERSION} -* opea/llm-tgi:${RELEASE_VERSION} -::: -:::: - -::::: -:::::: - ## Use Case Setup -As mentioned the use case will use the following combination of the GenAIComps -with the tools +ChatQnA will utilize the following GenAIComps services and associated tools. 
The tools and models listed in the table can be configured via environment variables in either the `set_env.sh` script or the `compose.yaml` file. ::::{tab-set} @@ -292,8 +75,6 @@ with the tools |LLM | vLLM | meta-llama/Meta-Llama-3-8B-Instruct | OPEA Microservice | |UI | | NA | Gateway Service | -Tools and models mentioned in the table are configurable either through the -environment variable or `compose.yaml` file. ::: :::{tab-item} TGI :sync: TGI @@ -307,24 +88,19 @@ environment variable or `compose.yaml` file. |LLM | TGI | meta-llama/Meta-Llama-3-8B-Instruct|OPEA Microservice | |UI | | NA | Gateway Service | -Tools and models mentioned in the table are configurable either through the -environment variable or `compose.yaml` file. ::: :::: -Set the necessary environment variables to setup the use case. If you want to swap -out models, modify `set_env.sh` before running. +Set the necessary environment variables to set up the use case. To swap out models, modify `set_env.sh` before running it. For example, the environment variable `LLM_MODEL_ID` can be changed to another model by specifying the HuggingFace model card ID. ```bash cd $WORKSPACE/GenAIExamples/ChatQnA/docker_compose/intel/hpu/gaudi source ./set_env.sh ``` -## Deploy the use case +## Deploy the Use Case -In this tutorial, we will be deploying via docker compose with the provided -YAML file. The docker compose instructions should be starting all the -above mentioned services as containers. +Run `docker compose` with the provided YAML file to start all the services mentioned above as containers. ::::{tab-set} :::{tab-item} vllm @@ -352,180 +128,109 @@ docker compose -f compose_guardrails.yaml up -d ::: :::: -### Validate microservice -#### Check Env Variables -Check the start up log by `docker compose -f compose.yaml logs`. -The warning messages print out the variables if they are **NOT** set. +### Check Env Variables +After running `docker compose`, check for warning messages for environment variables that are **NOT** set. Address them if needed. ::::{tab-set} :::{tab-item} vllm :sync: vllm -```bash - ubuntu@xeon-vm:~/GenAIExamples/ChatQnA/docker_compose/intel/hpu/gaudi$ docker compose -f ./compose.yaml up -d - [+] Running 12/12 - ✔ Network gaudi_default Created 0.1s - ✔ Container tei-embedding-gaudi-server Started 1.3s - ✔ Container vllm-gaudi-server Started 1.3s - ✔ Container tei-reranking-gaudi-server Started 0.8s - ✔ Container redis-vector-db Started 0.7s - ✔ Container reranking-tei-gaudi-server Started 1.7s - ✔ Container retriever-redis-server Started 1.3s - ✔ Container llm-vllm-gaudi-server Started 2.1s - ✔ Container dataprep-redis-server Started 2.1s - ✔ Container embedding-tei-server Started 2.0s - ✔ Container chatqna-gaudi-backend-server Started 2.3s - ✔ Container chatqna-gaudi-ui-server Started 2.6s -``` - + ubuntu@xeon-vm:~/GenAIExamples/ChatQnA/docker_compose/intel/hpu/gaudi$ docker compose -f compose.yaml up -d + WARN[0000] The "LANGCHAIN_API_KEY" variable is not set. Defaulting to a blank string. + WARN[0000] The "LANGCHAIN_TRACING_V2" variable is not set. Defaulting to a blank string. + WARN[0000] The "LANGCHAIN_API_KEY" variable is not set. Defaulting to a blank string. + WARN[0000] The "LANGCHAIN_TRACING_V2" variable is not set. Defaulting to a blank string. + WARN[0000] The "LANGCHAIN_API_KEY" variable is not set. Defaulting to a blank string. + WARN[0000] The "LANGCHAIN_TRACING_V2" variable is not set. Defaulting to a blank string. + WARN[0000] The "LANGCHAIN_API_KEY" variable is not set. 
Defaulting to a blank string. + WARN[0000] The "LANGCHAIN_TRACING_V2" variable is not set. Defaulting to a blank string. + WARN[0000] /home/ubuntu/GenAIExamples/ChatQnA/docker_compose/intel/hpu/gaudi/compose.yaml: `version` is obsolete ::: + :::{tab-item} TGI :sync: TGI -```bash - ubuntu@xeon-vm:~/GenAIExamples/ChatQnA/docker_compose/intel/hpu/gaudi$ docker compose -f ./compose.yaml up -d - [+] Running 12/12 - ✔ Network gaudi_default Created 0.1s - ✔ Container tei-reranking-gaudi-server Started 1.1s - ✔ Container tgi-gaudi-server Started 0.8s - ✔ Container redis-vector-db Started 1.5s - ✔ Container tei-embedding-gaudi-server Started 1.1s - ✔ Container retriever-redis-server Started 2.7s - ✔ Container reranking-tei-gaudi-server Started 2.0s - ✔ Container dataprep-redis-server Started 2.5s - ✔ Container embedding-tei-server Started 2.1s - ✔ Container llm-tgi-gaudi-server Started 1.8s - ✔ Container chatqna-gaudi-backend-server Started 2.9s - ✔ Container chatqna-gaudi-ui-server Started 3.3s -``` + ubuntu@xeon-vm:~/GenAIExamples/ChatQnA/docker_compose/intel/hpu/gaudi$ docker compose -f compose_tgi.yaml up -d + WARN[0000] The "LANGCHAIN_API_KEY" variable is not set. Defaulting to a blank string. + WARN[0000] The "LANGCHAIN_TRACING_V2" variable is not set. Defaulting to a blank string. + WARN[0000] The "LANGCHAIN_API_KEY" variable is not set. Defaulting to a blank string. + WARN[0000] The "LANGCHAIN_TRACING_V2" variable is not set. Defaulting to a blank string. + WARN[0000] The "LANGCHAIN_API_KEY" variable is not set. Defaulting to a blank string. + WARN[0000] The "LANGCHAIN_TRACING_V2" variable is not set. Defaulting to a blank string. + WARN[0000] The "LANGCHAIN_API_KEY" variable is not set. Defaulting to a blank string. + WARN[0000] The "LANGCHAIN_TRACING_V2" variable is not set. Defaulting to a blank string. + WARN[0000] /home/ubuntu/GenAIExamples/ChatQnA/docker_compose/intel/hpu/gaudi/compose_tgi.yaml: `version` is obsolete ::: :::: -#### Check the container status +### Check Container Statuses -Check if all the containers launched via docker compose has started +Check if all the containers launched via `docker compose` are running i.e. each container's `STATUS` is `Up` and `Healthy`. -For example, the ChatQnA example starts 11 docker (services), check these docker -containers are all running, i.e, all the containers `STATUS` are `Up` -To do a quick sanity check, try `docker ps -a` to see if all the containers are running. 
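+Containers can take a short while to report healthy after startup. One optional way to poll their status until they settle, assuming the `watch` utility is available, is:
+```bash
+# Re-list container names and statuses every 5 seconds
+watch -n 5 'docker ps --format "table {{.Names}}\t{{.Status}}"'
+```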
+Run this command to see this info: +```bash +docker ps -a +``` +Sample output: ::::{tab-set} :::{tab-item} vllm :sync: vllm ```bash -CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES -42c8d5ec67e9 opea/chatqna-ui:${RELEASE_VERSION} "docker-entrypoint.s…" About a minute ago Up About a minute 0.0.0.0:5173->5173/tcp, :::5173->5173/tcp chatqna-gaudi-ui-server -7f7037a75f8b opea/chatqna:${RELEASE_VERSION} "python chatqna.py" About a minute ago Up About a minute 0.0.0.0:8888->8888/tcp, :::8888->8888/tcp chatqna-gaudi-backend-server -4049c181da93 opea/embedding-tei:${RELEASE_VERSION} "python embedding_te…" About a minute ago Up About a minute 0.0.0.0:6000->6000/tcp, :::6000->6000/tcp embedding-tei-server -171816f0a789 opea/dataprep-redis:${RELEASE_VERSION} "python prepare_doc_…" About a minute ago Up About a minute 0.0.0.0:6007->6007/tcp, :::6007->6007/tcp dataprep-redis-server -10ee6dec7d37 opea/llm-vllm:${RELEASE_VERSION} "bash entrypoint.sh" About a minute ago Up About a minute 0.0.0.0:9000->9000/tcp, :::9000->9000/tcp llm-vllm-gaudi-server -ce4e7802a371 opea/retriever-redis:${RELEASE_VERSION} "python retriever_re…" About a minute ago Up About a minute 0.0.0.0:7000->7000/tcp, :::7000->7000/tcp retriever-redis-server -be6cd2d0ea38 opea/reranking-tei:${RELEASE_VERSION} "python reranking_te…" About a minute ago Up About a minute 0.0.0.0:8000->8000/tcp, :::8000->8000/tcp reranking-tei-gaudi-server -cc45ff032e8c opea/tei-gaudi:${RELEASE_VERSION} "text-embeddings-rou…" About a minute ago Up About a minute 0.0.0.0:8090->80/tcp, :::8090->80/tcp tei-embedding-gaudi-server -4969ec3aea02 opea/vllm-gaudi:${RELEASE_VERSION} "/bin/bash -c 'expor…" About a minute ago Up About a minute 0.0.0.0:8007->80/tcp, :::8007->80/tcp vllm-gaudi-server -0657cb66df78 redis/redis-stack:7.2.0-v9 "/entrypoint.sh" About a minute ago Up About a minute 0.0.0.0:6379->6379/tcp, :::6379->6379/tcp, 0.0.0.0:8001->8001/tcp, :::8001->8001/tcp redis-vector-db -684d3e9d204a ghcr.io/huggingface/text-embeddings-inference:cpu-1.2 "text-embeddings-rou…" About a minute ago Up About a minute 0.0.0.0:8808->80/tcp, :::8808->80/tcp tei-reranking-gaudi-server +CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS + NAMES +eabb930edad6 opea/nginx:latest "/docker-entrypoint.…" 9 seconds ago Up 8 seconds 0.0.0.0:80->80/tcp, [::]:80->80/tcp + chatqna-gaudi-nginx-server +7e3c16a791b1 opea/chatqna-ui:latest "docker-entrypoint.s…" 9 seconds ago Up 8 seconds 0.0.0.0:5173->5173/tcp, [::]:5173->5173/tcp + chatqna-gaudi-ui-server +482365a6e945 opea/chatqna:latest "python chatqna.py" 9 seconds ago Up 9 seconds 0.0.0.0:8888->8888/tcp, [::]:8888->8888/tcp + chatqna-gaudi-backend-server +1379226ad3ff opea/dataprep:latest "sh -c 'python $( [ …" 9 seconds ago Up 9 seconds 0.0.0.0:6007->5000/tcp, [::]:6007->5000/tcp + dataprep-redis-server +1cebe2d70e40 opea/retriever:latest "python opea_retriev…" 9 seconds ago Up 9 seconds 0.0.0.0:7000->7000/tcp, [::]:7000->7000/tcp + retriever-redis-server +bfe41a5353b6 ghcr.io/huggingface/tei-gaudi:1.5.0 "text-embeddings-rou…" 10 seconds ago Up 9 seconds 0.0.0.0:8808->80/tcp, [::]:8808->80/tcp + tei-reranking-gaudi-server +11a94e7ce3c9 opea/vllm-gaudi:latest "python3 -m vllm.ent…" 10 seconds ago Up 9 seconds (health: starting) 0.0.0.0:8007->80/tcp, [::]:8007->80/tcp + vllm-gaudi-server +4d7b9aab82b1 redis/redis-stack:7.2.0-v9 "/entrypoint.sh" 10 seconds ago Up 9 seconds 0.0.0.0:6379->6379/tcp, [::]:6379->6379/tcp, 0.0.0. 
+0:8001->8001/tcp, [::]:8001->8001/tcp redis-vector-db +9e0d0807bbf6 ghcr.io/huggingface/text-embeddings-inference:cpu-1.5 "text-embeddings-rou…" 10 seconds ago Up 9 seconds 0.0.0.0:8090->80/tcp, [::]:8090->80/tcp + tei-embedding-gaudi-server ``` ::: :::{tab-item} TGI :sync: TGI ```bash -CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES -0355d705484a opea/chatqna-ui:${RELEASE_VERSION} "docker-entrypoint.s…" 2 minutes ago Up 2 minutes 0.0.0.0:5173->5173/tcp, :::5173->5173/tcp chatqna-gaudi-ui-server -29a7a43abcef opea/chatqna:${RELEASE_VERSION} "python chatqna.py" 2 minutes ago Up 2 minutes 0.0.0.0:8888->8888/tcp, :::8888->8888/tcp chatqna-gaudi-backend-server -1eb6f5ad6f85 opea/llm-tgi:${RELEASE_VERSION} "bash entrypoint.sh" 2 minutes ago Up 2 minutes 0.0.0.0:9000->9000/tcp, :::9000->9000/tcp llm-tgi-gaudi-server -ad27729caf68 opea/reranking-tei:${RELEASE_VERSION} "python reranking_te…" 2 minutes ago Up 2 minutes 0.0.0.0:8000->8000/tcp, :::8000->8000/tcp reranking-tei-gaudi-server -84f02cf2a904 opea/dataprep-redis:${RELEASE_VERSION} "python prepare_doc_…" 2 minutes ago Up 2 minutes 0.0.0.0:6007->6007/tcp, :::6007->6007/tcp dataprep-redis-server -367459f6e65b opea/embedding-tei:${RELEASE_VERSION} "python embedding_te…" 2 minutes ago Up 2 minutes 0.0.0.0:6000->6000/tcp, :::6000->6000/tcp embedding-tei-server -8c78cde9f588 opea/retriever-redis:${RELEASE_VERSION} "python retriever_re…" 2 minutes ago Up 2 minutes 0.0.0.0:7000->7000/tcp, :::7000->7000/tcp retriever-redis-server -fa80772de92c ghcr.io/huggingface/tgi-gaudi:2.0.1 "text-generation-lau…" 2 minutes ago Up 2 minutes 0.0.0.0:8005->80/tcp, :::8005->80/tcp tgi-gaudi-server -581687a2cc1a opea/tei-gaudi:${RELEASE_VERSION} "text-embeddings-rou…" 2 minutes ago Up 2 minutes 0.0.0.0:8090->80/tcp, :::8090->80/tcp tei-embedding-gaudi-server -c59178629901 redis/redis-stack:7.2.0-v9 "/entrypoint.sh" 2 minutes ago Up 2 minutes 0.0.0.0:6379->6379/tcp, :::6379->6379/tcp, 0.0.0.0:8001->8001/tcp, :::8001->8001/tcp redis-vector-db -5c3a78144498 ghcr.io/huggingface/text-embeddings-inference:cpu-1.5 "text-embeddings-rou…" 2 minutes ago Up 2 minutes 0.0.0.0:8808->80/tcp, :::8808->80/tcp tei-reranking-gaudi-server +CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES +353775bfa0dc opea/nginx:latest "/docker-entrypoint.…" 52 seconds ago Up 50 seconds 0.0.0.0:8010->80/tcp, [::]:8010->80/tcp chatqna-gaudi-nginx-server +c4f75d75f18e opea/chatqna-ui:latest "docker-entrypoint.s…" 52 seconds ago Up 50 seconds 0.0.0.0:5173->5173/tcp, [::]:5173->5173/tcp chatqna-gaudi-ui-server +4c5dc803c8c8 opea/chatqna:latest "python chatqna.py" 52 seconds ago Up 51 seconds 0.0.0.0:8888->8888/tcp, [::]:8888->8888/tcp chatqna-gaudi-backend-server +6bdfebe016c0 opea/dataprep:latest "sh -c 'python $( [ …" 52 seconds ago Up 51 seconds 0.0.0.0:6007->5000/tcp, [::]:6007->5000/tcp dataprep-redis-server +6fb5264a8465 opea/retriever:latest "python opea_retriev…" 52 seconds ago Up 51 seconds 0.0.0.0:7000->7000/tcp, [::]:7000->7000/tcp retriever-redis-server +1f4f4f691d36 ghcr.io/huggingface/tgi-gaudi:2.0.6 "text-generation-lau…" 55 seconds ago Up 51 seconds 0.0.0.0:8005->80/tcp, [::]:8005->80/tcp tgi-gaudi-server +9c50dfc17428 ghcr.io/huggingface/tei-gaudi:1.5.0 "text-embeddings-rou…" 55 seconds ago Up 51 seconds 0.0.0.0:8808->80/tcp, [::]:8808->80/tcp tei-reranking-gaudi-server +a8de74b4594d ghcr.io/huggingface/text-embeddings-inference:cpu-1.5 "text-embeddings-rou…" 55 seconds ago Up 51 seconds 0.0.0.0:8090->80/tcp, [::]:8090->80/tcp tei-embedding-gaudi-server +e01438eafa7d 
redis/redis-stack:7.2.0-v9 "/entrypoint.sh" 55 seconds ago Up 51 seconds 0.0.0.0:6379->6379/tcp, [::]:6379->6379/tcp, 0.0.0.0:8001->8001/tcp, [::]:8001->8001/tcp redis-vector-db +b8ecf10c0c2d jaegertracing/all-in-one:latest "/go/bin/all-in-one-…" 55 seconds ago Up 51 seconds 0.0.0.0:4317-4318->4317-4318/tcp, [::]:4317-4318->4317-4318/tcp, 14250/tcp, 0.0.0.0:9411->9411/tcp, [::]:9411->9411/tcp, 0.0.0.0:16686->16686/tcp, [::]:16686->16686/tcp, 14268/tcp jaeger ``` ::: :::: -## Interacting with ChatQnA deployment - -This section will walk you through what are the different ways to interact with -the microservices deployed. - -### Dataprep Microservice(Optional) - -If you want to add/update the default knowledge base, you can use the following -commands. The dataprep microservice extracts the texts from variety of data -sources, chunks the data, embeds each chunk using embedding microservice and -store the embedded vectors in the redis vector database. - -`nke-10k-2023.pdf` is Nike's annual report on a form 10-K. Run this command to get the file on a terminal: - -```bash -wget https://github.com/opea-project/GenAIComps/blob/v1.1/comps/retrievers/redis/data/nke-10k-2023.pdf -``` - -Upload the file: - -```bash -curl -X POST "http://${host_ip}:6007/v1/dataprep" \ - -H "Content-Type: multipart/form-data" \ - -F "files=@./nke-10k-2023.pdf" -``` - -This command updates a knowledge base by uploading a local file for processing. -Update the file path according to your environment. - -Add Knowledge Base via HTTP Links: - -```bash -curl -X POST "http://${host_ip}:6007/v1/dataprep" \ - -H "Content-Type: multipart/form-data" \ - -F 'link_list=["https://opea.dev"]' -``` - -This command updates a knowledge base by submitting a list of HTTP links for processing. - -Also, you are able to get the file list that you uploaded: - -```bash -curl -X POST "http://${host_ip}:6007/v1/dataprep/get_file" \ - -H "Content-Type: application/json" - -``` - -To delete the file/link you uploaded you can use the following commands: - -#### Delete link -```bash -# The dataprep service will add a .txt postfix for link file - -curl -X POST "http://${host_ip}:6007/v1/dataprep/delete_file" \ - -d '{"file_path": "https://opea.dev.txt"}' \ - -H "Content-Type: application/json" -``` - -#### Delete file +Each docker container's log can also be checked using: ```bash -curl -X POST "http://${host_ip}:6007/v1/dataprep/delete_file" \ - -d '{"file_path": "nke-10k-2023.pdf"}' \ - -H "Content-Type: application/json" +docker logs ``` -#### Delete all uploaded files and links +## Validate Microservices -```bash -curl -X POST "http://${host_ip}:6007/v1/dataprep/delete_file" \ - -d '{"file_path": "all"}' \ - -H "Content-Type: application/json" -``` +This section will guide through the various methods for interacting with the deployed microservices. ### TEI Embedding Service -The TEI embedding service takes in a string as input, embeds the string into a -vector of a specific length determined by the embedding model and returns this -embedded vector. +The TEI embedding service takes in a string as input, embeds the string into a vector of a specific length determined by the embedding model, and returns this vector. ```bash curl ${host_ip}:8090/embed \ @@ -534,31 +239,13 @@ curl ${host_ip}:8090/embed \ -H 'Content-Type: application/json' ``` -In this example the embedding model used is "BAAI/bge-base-en-v1.5", which has a -vector size of 768. So the output of the curl command is a embedded vector of -length 768. 
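+To quickly confirm the shape of the response, the dimension count of the first returned embedding can be extracted, assuming `jq` is installed and the service returns a JSON array of embeddings:
+```bash
+# Print the number of dimensions in the first embedding of the response
+curl -s ${host_ip}:8090/embed \
+    -X POST \
+    -d '{"inputs":"What is Deep Learning?"}' \
+    -H 'Content-Type: application/json' | jq '.[0] | length'
+```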
+In this example, the embedding model used is `BAAI/bge-base-en-v1.5`, which has a vector size of 768. Therefore, the output of the curl command is a vector of length 768. -### Embedding Microservice -The embedding microservice depends on the TEI embedding service. In terms of -input parameters, it takes in a string, embeds it into a vector using the TEI -embedding service and pads other default parameters that are required for the -retrieval microservice and returns it. - -```bash -curl http://${host_ip}:6000/v1/embeddings\ - -X POST \ - -d '{"text":"hello"}' \ - -H 'Content-Type: application/json' -``` ### Retriever Microservice -To consume the retriever microservice, you need to generate a mock embedding -vector by Python script. The length of embedding vector is determined by the -embedding model. Here we use the -model EMBEDDING_MODEL_ID="BAAI/bge-base-en-v1.5", which vector size is 768. +To consume the retriever microservice, generate a mock embedding vector with a Python script. The length of the embedding vector is determined by the embedding model. The model is set with the environment variable EMBEDDING_MODEL_ID="BAAI/bge-base-en-v1.5", which has a vector size of 768. -Check the vector dimension of your embedding model and set -`your_embedding` dimension equal to it. +Check the vector dimension of the embedding model used and set `your_embedding` dimension equal to it. ```bash export your_embedding=$(python3 -c "import random; embedding = [random.uniform(-1, 1) for _ in range(768)]; print(embedding)") @@ -569,24 +256,17 @@ curl http://${host_ip}:7000/v1/retrieval \ -H 'Content-Type: application/json' ``` -The output of the retriever microservice comprises of the a unique id for the -request, initial query or the input to the retrieval microservice, a list of top -`n` retrieved documents relevant to the input query, and top_n where n refers to -the number of documents to be returned. +The output of the retriever microservice comprises of the a unique id for the request, initial query or the input to the retrieval microservice, a list of top `n` retrieved documents relevant to the input query, and top_n where n refers to the number of documents to be returned. -The output is retrieved text that relevant to the input data: +The output is retrieved text that is relevant to the input data: ```bash -{"id":"27210945c7c6c054fa7355bdd4cde818","retrieved_docs":[{"id":"0c1dd04b31ab87a5468d65f98e33a9f6","text":"Company: Nike. financial instruments are subject to master netting arrangements that allow for the offset of assets and liabilities in the event of default or early termination of the contract.\nAny amounts of cash collateral received related to these instruments associated with the Company's credit-related contingent features are recorded in Cash and\nequivalents and Accrued liabilities, the latter of which would further offset against the Company's derivative asset balance. Any amounts of cash collateral posted related\nto these instruments associated with the Company's credit-related contingent features are recorded in Prepaid expenses and other current assets, which would further\noffset against the Company's derivative liability balance. Cash collateral received or posted related to the Company's credit-related contingent features is presented in the\nCash provided by operations component of the Consolidated Statements of Cash Flows. The Company does not recognize amounts of non-cash collateral received, such\nas securities, on the Consolidated Balance Sheets. 
For further information related to credit risk, refer to Note 12 — Risk Management and Derivatives.\n2023 FORM 10-K 68Table of Contents\nThe following tables present information about the Company's derivative assets and liabilities measured at fair value on a recurring basis and indicate the level in the fair\nvalue hierarchy in which the Company classifies the fair value measurement:\nMAY 31, 2023\nDERIVATIVE ASSETS\nDERIVATIVE LIABILITIES"},{"id":"1d742199fb1a86aa8c3f7bcd580d94af","text": ... } - +{"id":"b16024e140e78e39a60e8678622be630","retrieved_docs":[],"initial_query":"test","top_n":1,"metadata":[]} ``` ### TEI Reranking Service -The TEI Reranking Service reranks the documents returned by the retrieval -service. It consumes the query and list of documents and returns the document -index based on decreasing order of the similarity score. The document -corresponding to the returned index with the highest score is the most relevant -document for the input query. +The TEI Reranking Service reranks the documents returned by the retrieval service. It consumes the query and list of documents and returns the document index in decreasing order of the similarity score. The document corresponding to the index with the highest score is the most relevant document for the input query. + ```bash curl http://${host_ip}:8808/rerank \ -X POST \ @@ -594,65 +274,27 @@ curl http://${host_ip}:8808/rerank \ -H 'Content-Type: application/json' ``` -Output is: `[{"index":1,"score":0.9988041},{"index":0,"score":0.022948774}]` - - -### Reranking Microservice - - -The reranking microservice consumes the TEI Reranking service and pads the -response with default parameters required for the llm microservice. - -```bash -curl http://${host_ip}:8000/v1/reranking \ - -X POST \ - -d '{"initial_query":"What is Deep Learning?", "retrieved_docs": [{"text":"Deep Learning is not..."}, {"text":"Deep learning is..."}]}' \ - -H 'Content-Type: application/json' -``` - -The input to the microservice is the `initial_query` and a list of retrieved -documents and it outputs the most relevant document to the initial query along -with other default parameter such as temperature, `repetition_penalty`, -`chat_template` and so on. We can also get top n documents by setting `top_n` as one -of the input parameters. For example: - -```bash -curl http://${host_ip}:8000/v1/reranking \ - -X POST \ - -d '{"initial_query":"What is Deep Learning?" ,"top_n":2, "retrieved_docs": [{"text":"Deep Learning is not..."}, {"text":"Deep learning is..."}]}' \ - -H 'Content-Type: application/json' -``` - -Here is the output: - +Sample output: ```bash -{"id":"e1eb0e44f56059fc01aa0334b1dac313","query":"Human: Answer the question based only on the following context:\n Deep learning is...\n Question: What is Deep Learning?","max_new_tokens":1024,"top_k":10,"top_p":0.95,"typical_p":0.95,"temperature":0.01,"repetition_penalty":1.03,"streaming":true} - +[{"index":1,"score":0.94238955},{"index":0,"score":0.120219156}] ``` -You may notice reranking microservice are with state ('ID' and other meta data), -while reranking service are not. ### vLLM and TGI Service -In first startup, this service will take more time to download the model files. -After it's finished, the service will be ready. +During the initial startup, this service will take a few minutes to download the model files and complete the warm-up process. Once this is finished, the service will be ready for use. -Try the command below to check whether the LLM serving is ready. 
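+To follow the model download and warm-up progress, the serving container's logs can be tailed. The container names below match the sample `docker ps` output shown earlier; adjust them if the deployment names differ:
+```bash
+# vLLM backend
+docker logs -f vllm-gaudi-server
+# TGI backend
+docker logs -f tgi-gaudi-server
+```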
+::::{tab-set} -```bash -docker logs ${CONTAINER_ID} | grep Connected -``` +:::{tab-item} vllm +:sync: vllm -If the service is ready, you will get the response like below. +Run the command below to check whether the LLM service is ready. The output should be "Application startup complete." -``` -2024-09-03T02:47:53.402023Z INFO text_generation_router::server: router/src/server.rs:2311: Connected +```bash +docker logs vllm-service 2>&1 | grep complete ``` -::::{tab-set} - -:::{tab-item} vllm -:sync: vllm +Run the command below to use the vLLM service to generate text for the input prompt. Sample output is also shown. ```bash curl http://${host_ip}:8007/v1/completions \ @@ -665,19 +307,22 @@ curl http://${host_ip}:8007/v1/completions \ }' ``` -vLLM service generate text for the input prompt. Here is the expected result -from vllm: - -``` +```bash {"id":"cmpl-be8e1d681eb045f082a7b26d5dba42ff","object":"text_completion","created":1726269914,"model":"meta-llama/Meta-Llama-3-8B-Instruct","choices":[{"index":0,"text":"\n\nDeep Learning is a subset of Machine Learning that is concerned with algorithms inspired by the structure and function of the brain. It is a part of Artificial","logprobs":null,"finish_reason":"length","stop_reason":null}],"usage":{"prompt_tokens":6,"total_tokens":38,"completion_tokens":32}}d ``` -**NOTE**: After launch the vLLM, it takes few minutes for vLLM server to load -LLM model and warm up. ::: :::{tab-item} TGI :sync: TGI +Run the command below to check whether the LLM service is ready. The output should be "INFO text_generation_router::server: router/src/server.rs:2311: Connected" + +```bash +docker logs tgi-service | grep Connected +``` + +Run the command below to use the TGI service to generate text for the input prompt. Sample output is also shown. + ```bash curl http://${host_ip}:8005/generate \ -X POST \ @@ -685,80 +330,78 @@ curl http://${host_ip}:8005/generate \ -H 'Content-Type: application/json' ``` -TGI service generate text for the input prompt. Here is the expected result from TGI: - ```bash {"generated_text":"Artificial Intelligence (AI) has become a very popular buzzword in the tech industry. While the phrase conjures images of sentient robots and self-driving cars, our current AI landscape is much more subtle. In fact, it most often manifests in the forms of algorithms that help recognize the faces of"} ``` -**NOTE**: After launch the TGI, it takes few minutes for TGI server to load LLM model and warm up. ::: :::: +### Dataprep Microservice -### LLM Microservice +The knowledge base can be updated using the dataprep microservice, which extracts text from a variety of data sources, chunks the data, and embeds each chunk using the embedding microservice. Finally, the embedded vectors are stored in the Redis vector database. -This service depends on the above LLM backend service startup. Give it a couple minutes to be ready on the first startup. +`nke-10k-2023.pdf` is Nike's annual report on a form 10-K. 
Run this command to download the file: +```bash +wget https://github.com/opea-project/GenAIComps/blob/v1.1/comps/retrievers/redis/data/nke-10k-2023.pdf +``` -::::{tab-set} -:::{tab-item} vllm -:sync: vllm +Upload the file: ```bash -curl http://${host_ip}:9000/v1/chat/completions \ - -X POST \ - -d '{"query":"What is Deep Learning?","max_tokens":17,"top_p":1,"temperature":0.7,\ - "frequency_penalty":0,"presence_penalty":0, "streaming":true}' \ - -H 'Content-Type: application/json' +curl -X POST "http://${host_ip}:6007/v1/dataprep" \ + -H "Content-Type: multipart/form-data" \ + -F "files=@./nke-10k-2023.pdf" ``` -For parameters in vLLM modes, can refer to [LangChain VLLMOpenAI API](https://huggingface.co/docs/huggingface_hub/package_reference/inference_client#huggingface_hub.InferenceClient.text_generation) -::: -:::{tab-item} TGI -:sync: TGI +HTTP links can also be added to the knowledge base. This command adds the opea.dev website. ```bash -curl http://${host_ip}:9000/v1/chat/completions \ - -X POST \ - -d '{"query":"What is Deep Learning?","max_new_tokens":17,"top_k":10,"top_p":0.95,"typical_p":0.95,"temperature":0.01,"repetition_penalty":1.03,"streaming":true}' \ - -H 'Content-Type: application/json' +curl -X POST "http://${host_ip}:6007/v1/dataprep" \ + -H "Content-Type: multipart/form-data" \ + -F 'link_list=["https://opea.dev"]' ``` -For parameters in TGI modes, please refer to [HuggingFace InferenceClient API](https://huggingface.co/docs/huggingface_hub/package_reference/inference_client#huggingface_hub.InferenceClient.text_generation) (except we rename "max_new_tokens" to "max_tokens".) -::: -:::: +The list of uploaded files can be retrieved using this command: +```bash +curl -X POST "http://${host_ip}:6007/v1/dataprep/get_file" \ + -H "Content-Type: application/json" +``` -You will get generated text from LLM: +To delete the file or link, use the following commands: +#### Delete link ```bash -data: b'\n' -data: b'\n' -data: b'Deep' -data: b' Learning' -data: b' is' -data: b' a' -data: b' subset' -data: b' of' -data: b' Machine' -data: b' Learning' -data: b' that' -data: b' is' -data: b' concerned' -data: b' with' -data: b' algorithms' -data: b' inspired' -data: b' by' -data: [DONE] +# The dataprep service will add a .txt postfix for link file + +curl -X POST "http://${host_ip}:6007/v1/dataprep/delete_file" \ + -d '{"file_path": "https://opea.dev.txt"}' \ + -H "Content-Type: application/json" +``` + +#### Delete file + +```bash +curl -X POST "http://${host_ip}:6007/v1/dataprep/delete_file" \ + -d '{"file_path": "nke-10k-2023.pdf"}' \ + -H "Content-Type: application/json" ``` -### MegaService +#### Delete all uploaded files and links + +```bash +curl -X POST "http://${host_ip}:6007/v1/dataprep/delete_file" \ + -d '{"file_path": "all"}' \ + -H "Content-Type: application/json" +``` +### ChatQnA MegaService +This will ensure the megaservice is working properly. ```bash curl http://${host_ip}:8888/v1/chatqna -H "Content-Type: application/json" -d '{ "messages": "What is the revenue of Nike in 2023?" }' ``` -Here is the output for your reference: - +Here is the output for reference: ```bash data: b'\n' data: b'An' @@ -795,163 +438,39 @@ data: b'' data: [DONE] ``` -#### Guardrail Microservice -If you had enabled Guardrail microservice, access via the below curl command +### NGINX Service +This will ensure the NGINX ervice is working properly. 
```bash -curl http://${host_ip}:9090/v1/guardrails\ - -X POST \ - -d '{"text":"How do you buy a tiger in the US?","parameters":{"max_new_tokens":32}}' \ - -H 'Content-Type: application/json' +curl http://${host_ip}:${NGINX_PORT}/v1/chatqna \ + -H "Content-Type: application/json" \ + -d '{"messages": "What is the revenue of Nike in 2023?"}' ``` +The output will be similar to that of the ChatQnA megaservice. + ## Launch UI + ### Basic UI -To access the frontend, open the following URL in your browser: http://{host_ip}:5173. By default, the UI runs on port 5173 internally. If you prefer to use a different host port to access the frontend, you can modify the port mapping in the compose.yaml file as shown below: -```bash - chaqna-gaudi-ui-server: +To access the frontend, open the following URL in a web browser: http://${host_ip}:${NGINX_PORT}. By default, the UI runs on port 5173 internally. A different host port can be used to access the frontend by modifying the port mapping in the `compose.yaml` file as shown below: +```yaml + chatqna-gaudi-ui-server: image: opea/chatqna-ui:${TAG:-latest} ... ports: - - "5173:5173" -``` - -### Conversational UI -To access the Conversational UI (react based) frontend, modify the UI service in the `compose.yaml` file. Replace `chaqna-gaudi-ui-server` service with the `chatqna-gaudi-conversation-ui-server` service as per the config below: -```bash -chaqna-gaudi-conversation-ui-server: - image: opea/chatqna-conversation-ui:${TAG:-latest} - container_name: chatqna-gaudi-conversation-ui-server - environment: - - APP_BACKEND_SERVICE_ENDPOINT=${BACKEND_SERVICE_ENDPOINT} - - APP_DATA_PREP_SERVICE_URL=${DATAPREP_SERVICE_ENDPOINT} - ports: - - "5174:5174" - depends_on: - - chaqna-gaudi-backend-server - ipc: host - restart: always -``` -Once the services are up, open the following URL in your browser: http://{host_ip}:5174. By default, the UI runs on port 80 internally. If you prefer to use a different host port to access the frontend, you can modify the port mapping in the `compose.yaml` file as shown below: -```bash - chaqna-gaudi-conversation-ui-server: - image: opea/chatqna-conversation-ui:${TAG:-latest} - ... - ports: - - "80:80" -``` - -## Check docker container log - -Check the log of container by: - -`docker logs -t` - - -Check the log by `docker logs f7a08f9867f9 -t`. - -``` -2024-06-05T01:30:30.695934928Z error: a value is required for '--model-id ' but none was supplied -2024-06-05T01:30:30.697123534Z -2024-06-05T01:30:30.697148330Z For more information, try '--help'. - -``` - -The log indicates the `MODEL_ID` is not set. 
- - -::::{tab-set} - -:::{tab-item} vllm -:sync: vllm - -View the docker input parameters in `$WORKSPACE/GenAIExamples/ChatQnA/docker_compose/intel/hpu/gaudi/compose.yaml` - -```yaml - vllm-service: - image: ${REGISTRY:-opea}/vllm-gaudi:${RELEASE_VERSION:-latest} - container_name: vllm-gaudi-server - ports: - - "8007:80" - volumes: - - "./data:/data" - environment: - no_proxy: ${no_proxy} - http_proxy: ${http_proxy} - https_proxy: ${https_proxy} - HF_TOKEN: ${HUGGINGFACEHUB_API_TOKEN} - HABANA_VISIBLE_DEVICES: all - OMPI_MCA_btl_vader_single_copy_mechanism: none - LLM_MODEL_ID: ${LLM_MODEL_ID} - runtime: habana - cap_add: - - SYS_NICE - ipc: host - command: /bin/bash -c "export VLLM_CPU_KVCACHE_SPACE=40 && python3 -m vllm.entrypoints.openai.api_server --enforce-eager --model $LLM_MODEL_ID --tensor-parallel-size 1 --host 0.0.0.0 --port 80 --block-size 128 --max-num-seqs 256 --max-seq_len-to-capture 2048" -``` - -::: -:::{tab-item} TGI -:sync: TGI - -View the docker input parameters in `$WORKSPACE/GenAIExamples/ChatQnA/docker_compose/intel/hpu/gaudi/compose_tgi.yaml` - -```yaml - tgi-service: - image: ghcr.io/huggingface/tgi-gaudi:2.0.1 - container_name: tgi-gaudi-server - ports: - - "8005:80" - volumes: - - "./data:/data" - environment: - no_proxy: ${no_proxy} - http_proxy: ${http_proxy} - https_proxy: ${https_proxy} - HF_TOKEN: ${HUGGINGFACEHUB_API_TOKEN} - HF_HUB_DISABLE_PROGRESS_BARS: 1 - HF_HUB_ENABLE_HF_TRANSFER: 0 - HABANA_VISIBLE_DEVICES: ${llm_service_devices} - OMPI_MCA_btl_vader_single_copy_mechanism: none - runtime: habana - cap_add: - - SYS_NICE - ipc: host - command: --model-id ${LLM_MODEL_ID} --max-input-length 1024 --max-total-tokens 2048 + - "YOUR_HOST_PORT:5173" # Change YOUR_HOST_PORT to the desired port ``` -::: -:::: - - -The input `MODEL_ID` is `${LLM_MODEL_ID}` -Check environment variable `LLM_MODEL_ID` is set correctly, spelled correctly. -Set the `LLM_MODEL_ID` then restart the containers. - -Also you can check overall logs with the following command, where the -compose.yaml is the mega service docker-compose configuration file. - -::::{tab-set} - -:::{tab-item} vllm -:sync: vllm +After making this change, rebuild and restart the containers for the change to take effect. -```bash -docker compose -f compose.yaml logs -``` -::: -:::{tab-item} TGI -:sync: TGI +## Stop the Services +Navigate to the `docker compose` directory for this hardware platform. ```bash -docker compose -f compose_tgi.yaml logs +cd $WORKSPACE/GenAIExamples/ChatQnA/docker_compose/intel/hpu/gaudi ``` -::: -:::: - -## Stop the services -Once you are done with the entire pipeline and wish to stop and remove all the containers, use the command below: +To stop and remove all the containers, use the command below: ::::{tab-set} :::{tab-item} vllm diff --git a/tutorial/ChatQnA/deploy/nvidia.md b/tutorial/ChatQnA/deploy/nvidia.md index 404f7d8c..e65de390 100644 --- a/tutorial/ChatQnA/deploy/nvidia.md +++ b/tutorial/ChatQnA/deploy/nvidia.md @@ -1,16 +1,10 @@ # Single node on-prem deployment with TGI on Nvidia gpu -This deployment section covers single-node on-prem deployment of the ChatQnA -example with OPEA comps to deploy using TGI service. There are several -slice-n-dice ways to enable RAG with vectordb and LLM models, but here we will -be covering one option of doing it for convenience : we will be showcasing how -to build an e2e chatQnA with Redis VectorDB and meta-llama/Meta-Llama-3-8B-Instruct model, -deployed on on-prem. 
+This section covers single-node on-prem deployment of the ChatQnA example using the TGI LLM service. There are several ways to enable RAG with vectordb and LLM models, but this tutorial will be covering how to build an end-to-end ChatQnA pipeline with the Redis vector database and meta-llama/Meta-Llama-3-8B-Instruct model deployed on NVIDIA GPUs. + ## Overview -There are several ways to setup a ChatQnA use case. Here in this tutorial, we -will walk through how to enable the below list of microservices from OPEA -GenAIComps to deploy a single node vLLM or TGI megaservice solution. +The OPEA GenAIComps microservices used to deploy a single node vLLM or TGI megaservice solution for ChatQnA are listed below: 1. Data Prep 2. Embedding @@ -18,207 +12,54 @@ GenAIComps to deploy a single node vLLM or TGI megaservice solution. 4. Reranking 5. LLM with TGI -The solution is aimed to show how to use Redis vectordb for RAG and -meta-llama/Meta-Llama-3-8B-Instruct model on Nvidia GPU. We will go through -how to setup docker container to start a microservices and megaservice . The -solution will then utilize a sample Nike dataset which is in PDF format. Users -can then ask a question about Nike and get a chat-like response by default for -up to 1024 tokens. The solution is deployed with a UI. There are 2 modes you can -use: - -1. Basic UI -2. Conversational UI - -Conversational UI is optional, but a feature supported in this example if you -are interested to use. +This solution is designed to demonstrate the use of Redis vectorDB for RAG and the Meta-Llama-3-8B-Instruct model for LLM inference on NVIDIA GPUs. The steps will involve setting up Docker containers, using a sample Nike dataset in PDF format, and posing a question about Nike to receive a response. Although multiple versions of the UI can be deployed, this tutorial will focus solely on the default version. ## Prerequisites -The first step is to clone the GenAIExamples and GenAIComps projects. GenAIComps are -fundamental necessary components used to build the examples you find in -GenAIExamples and deploy them as microservices. Set an environment -variable for the desired release version with the **number only** -(i.e. 1.0, 1.1, etc) and checkout using the tag with that version. - +Set up a workspace and clone the [GenAIExamples](https://github.com/opea-project/GenAIExamples) GitHub repo. ```bash -# Set workspace -export WORKSPACE= +export WORKSPACE= cd $WORKSPACE +git clone https://github.com/opea-project/GenAIExamples.git # GenAIExamples +``` -# Set desired release version - number only -export RELEASE_VERSION= - -# GenAIComps -git clone https://github.com/opea-project/GenAIComps.git -cd GenAIComps -git checkout tags/v${RELEASE_VERSION} -cd .. - -# GenAIExamples -git clone https://github.com/opea-project/GenAIExamples.git +**(Optional)** It is recommended to use a stable release version by setting `RELEASE_VERSION` to a **number only** (i.e. 1.0, 1.1, etc) and checkout that version using the tag. Otherwise, by default, the main branch with the latest updates will be used. +```bash +export RELEASE_VERSION= # Set desired release version - number only cd GenAIExamples git checkout tags/v${RELEASE_VERSION} cd .. ``` -The examples utilize model weights from HuggingFace and langchain. +Set up a [HuggingFace](https://huggingface.co/) account and generate a [user access token](https://huggingface.co/docs/transformers.js/en/guides/private#step-1-generating-a-user-access-token). 
Request access to the [meta-llama/Meta-Llama-3-8B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct) model. -Setup your [HuggingFace](https://huggingface.co/) account and generate -[user access token](https://huggingface.co/docs/transformers.js/en/guides/private#step-1-generating-a-user-access-token). - -Setup the HuggingFace token +Set the `HUGGINGFACEHUB_API_TOKEN` environment variable to the value of the Hugging Face token by executing the following command: ```bash export HUGGINGFACEHUB_API_TOKEN="Your_Huggingface_API_Token" ``` -The example requires you to set the `host_ip` to deploy the microservices on -endpoint enabled with ports. Set the host_ip env variable -```bash -export host_ip=$(hostname -I | awk '{print $1}') -``` - -Make sure to setup Proxies if you are behind a firewall -```bash -export no_proxy=${your_no_proxy},$host_ip -export http_proxy=${your_http_proxy} -export https_proxy=${your_http_proxy} -``` - -## Prepare (Building / Pulling) Docker images - -This step will involve building/pulling ( maybe in future) relevant docker -images with step-by-step process along with sanity check in the end. For -ChatQnA, the following docker images will be needed: embedding, retriever, -rerank, LLM and dataprep. Additionally, you will need to build docker images for -ChatQnA megaservice, and UI (conversational React UI is optional). In total, -there are 8 required and 1 optional docker images. - -### Build/Pull Microservice images - -::::::{tab-set} - -:::::{tab-item} Pull -:sync: Pull - -If you decide to pull the docker containers and not build them locally, -you can proceed to the next step where all the necessary containers will -be pulled in from Docker Hub. - -::::: -:::::{tab-item} Build -:sync: Build - -Follow the steps below to build the docker images from within the `GenAIComps` folder. -**Note:** For RELEASE_VERSIONS older than 1.0, you will need to add a 'v' in front -of ${RELEASE_VERSION} to reference the correct image on Docker Hub. - -```bash -cd $WORKSPACE/GenAIComps -``` - -#### Build Dataprep Image - -```bash -docker build --no-cache -t opea/dataprep-redis:${RELEASE_VERSION} --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f comps/dataprep/redis/langchain/Dockerfile . -``` - -#### Build Embedding Image - -```bash -docker build --no-cache -t opea/embedding-tei:${RELEASE_VERSION} --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f comps/embeddings/tei/langchain/Dockerfile . -``` - -#### Build Retriever Image - -```bash - docker build --no-cache -t opea/retriever-redis:${RELEASE_VERSION} --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f comps/retrievers/redis/langchain/Dockerfile . -``` - -#### Build Rerank Image - -```bash -docker build --no-cache -t opea/reranking-tei:${RELEASE_VERSION} --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f comps/reranks/tei/Dockerfile . -``` - -#### Build LLM Image - -::::{tab-set} - -:::{tab-item} TGI -:sync: TGI - -```bash -docker build --no-cache -t opea/llm-tgi:${RELEASE_VERSION} --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f comps/llms/text-generation/tgi/Dockerfile . -``` -::: -:::: - -### Build Mega Service images - -The Megaservice is a pipeline that channels data through different -microservices, each performing varied tasks. 
We define the different -microservices and the flow of data between them in the `chatqna.py` file, say in -this example the output of embedding microservice will be the input of retrieval -microservice which will in turn passes data to the reranking microservice and so -on. You can also add newer or remove some microservices and customize the -megaservice to suit the needs. - -Build the megaservice image for this use case - -```bash -cd $WORKSPACE/GenAIExamples/ChatQnA -``` - +The example requires setting the `host_ip` to "localhost" to deploy the microservices on endpoints enabled with ports. ```bash -docker build --no-cache -t opea/chatqna:${RELEASE_VERSION} --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f Dockerfile . +export host_ip="localhost" ``` -### Build Other Service images - -#### Build the UI Image - -As mentioned, you can build 2 modes of UI - -*Basic UI* - +Set the NGINX port. ```bash -cd $WORKSPACE/GenAIExamples/ChatQnA/ui/ -docker build --no-cache -t opea/chatqna-ui:${RELEASE_VERSION} --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f ./docker/Dockerfile . +# Example: NGINX_PORT=80 +export NGINX_PORT= ``` -*Conversation UI* -If you want a conversational experience with chatqna megaservice. - +For machines behind a firewall, set up the proxy environment variables: ```bash -cd $WORKSPACE/GenAIExamples/ChatQnA/ui/ -docker build --no-cache -t opea/chatqna-conversation-ui:${RELEASE_VERSION} --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f ./docker/Dockerfile.react . +export http_proxy="Your_HTTP_Proxy" +export https_proxy="Your_HTTPs_Proxy" +# Example: no_proxy="localhost, 127.0.0.1, 192.168.1.1" +export no_proxy="Your_No_Proxy",chatqna-ui-server,chatqna-backend-server,dataprep-redis-service,tei-embedding-service,retriever,tei-reranking-service,tgi-service ``` -### Sanity Check -Check if you have the below set of docker images, before moving on to the next step: - -::::{tab-set} - -:::{tab-item} TGI -:sync: TGI - -* opea/dataprep-redis:${RELEASE_VERSION} -* opea/embedding-tei:${RELEASE_VERSION} -* opea/retriever-redis:${RELEASE_VERSION} -* opea/reranking-tei:${RELEASE_VERSION} -* opea/chatqna:${RELEASE_VERSION} -* opea/chatqna-ui:${RELEASE_VERSION} -* opea/llm-tgi:${RELEASE_VERSION} -::: -:::: - -::::: -:::::: - ## Use Case Setup -As mentioned the use case will use the following combination of the GenAIComps -with the tools +ChatQnA will utilize the following GenAIComps services and associated tools. The tools and models listed in the table can be configured via environment variables in either the `set_env.sh` script or the `compose.yaml` file. ::::{tab-set} @@ -234,40 +75,32 @@ with the tools |LLM | TGI | meta-llama/Meta-Llama-3-8B-Instruct | OPEA Microservice | |UI | | NA | Gateway Service | -Tools and models mentioned in the table are configurable either through the -environment variable or `compose.yaml` file. ::: :::: -Set the necessary environment variables to setup the use case. If you want to swap -out models, modify `set_env.sh` before running. +Set the necessary environment variables to set up the use case. To swap out models, modify `set_env.sh` before running it. For example, the environment variable `LLM_MODEL_ID` can be changed to another model by specifying the HuggingFace model card ID. 
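+As a sketch, the line exporting `LLM_MODEL_ID` in `set_env.sh` could be edited before sourcing the script; the model ID below is only an illustration, and gated models still require access approval on HuggingFace:
+```bash
+# Illustrative edit inside set_env.sh: point LLM_MODEL_ID at another HuggingFace model card ID
+export LLM_MODEL_ID="meta-llama/Meta-Llama-3.1-8B-Instruct"
+```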
```bash cd $WORKSPACE/GenAIExamples/ChatQnA/docker_compose/nvidia/gpu source ./set_env.sh ``` -## Deploy the use case +## Deploy the Use Case -In this tutorial, we will be deploying via docker compose with the provided -YAML file. The docker compose instructions should be starting all the -above mentioned services as containers. +Run `docker compose` with the provided YAML file to start all the services mentioned above as containers. ::::{tab-set} :::{tab-item} TGI :sync: TGI ```bash -cd $WORKSPACE/GenAIExamples/ChatQnA/docker_compose/nvidia/gpu/ docker compose -f compose.yaml up -d ``` ::: :::: -### Validate microservice -#### Check Env Variables -Check the start up log by `docker compose -f ./compose.yaml logs`. -The warning messages print out the variables if they are **NOT** set. +### Check Env Variables +After running `docker compose`, check for warning messages for environment variables that are **NOT** set. Address them if needed. ::::{tab-set} :::{tab-item} TGI @@ -286,27 +119,29 @@ The warning messages print out the variables if they are **NOT** set. ::: :::: -#### Check the container status +### Check Container Statuses -Check if all the containers launched via docker compose has started +Check if all the containers launched via `docker compose` are running i.e. each container's `STATUS` is `Up` and `Healthy`. -For example, the ChatQnA example starts 11 docker (services), check these docker -containers are all running, i.e, all the containers `STATUS` are `Up` -To do a quick sanity check, try `docker ps -a` to see if all the containers are running +Run this command to see this info: +```bash +docker ps -a +``` +Sample output: ::::{tab-set} :::{tab-item} TGI :sync: TGI ```bash CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES -3b5fa9a722da opea/chatqna-ui:${RELEASE_VERSION} "docker-entrypoint.s…" 32 hours ago Up 2 hours 0.0.0.0:5173->5173/tcp, :::5173->5173/tcp chatqna-ui-server -d3b37f3d1faa opea/chatqna:${RELEASE_VERSION} "python chatqna.py" 32 hours ago Up 2 hours 0.0.0.0:8888->8888/tcp, :::8888->8888/tcp chatqna-backend-server -b3e1388fa2ca opea/reranking-tei:${RELEASE_VERSION} "python reranking_te…" 32 hours ago Up 2 hours 0.0.0.0:8000->8000/tcp, :::8000->8000/tcp reranking-tei-server -24a240f8ad1c opea/retriever-redis:${RELEASE_VERSION} "python retriever_re…" 32 hours ago Up 2 hours 0.0.0.0:7000->7000/tcp, :::7000->7000/tcp retriever-redis-server -9c0d2a2553e8 opea/embedding-tei:${RELEASE_VERSION} "python embedding_te…" 32 hours ago Up 2 hours 0.0.0.0:6000->6000/tcp, :::6000->6000/tcp embedding-tei-server -24cae0db1a70 opea/llm-tgi:${RELEASE_VERSION} "bash entrypoint.sh" 32 hours ago Up 2 hours 0.0.0.0:9000->9000/tcp, :::9000->9000/tcp llm-tgi-server -ea3986c3cf82 opea/dataprep-redis:${RELEASE_VERSION} "python prepare_doc_…" 32 hours ago Up 2 hours 0.0.0.0:6007->6007/tcp, :::6007->6007/tcp dataprep-redis-server +3b5fa9a722da opea/chatqna-ui:latest "docker-entrypoint.s…" 32 hours ago Up 2 hours 0.0.0.0:5173->5173/tcp, :::5173->5173/tcp chatqna-ui-server +d3b37f3d1faa opea/chatqna:latest "python chatqna.py" 32 hours ago Up 2 hours 0.0.0.0:8888->8888/tcp, :::8888->8888/tcp chatqna-backend-server +b3e1388fa2ca opea/reranking-tei:latest "python reranking_te…" 32 hours ago Up 2 hours 0.0.0.0:8000->8000/tcp, :::8000->8000/tcp reranking-tei-server +24a240f8ad1c opea/retriever-redis:latest "python retriever_re…" 32 hours ago Up 2 hours 0.0.0.0:7000->7000/tcp, :::7000->7000/tcp retriever-redis-server +9c0d2a2553e8 opea/embedding-tei:latest "python embedding_te…" 32 hours ago Up 2 
hours 0.0.0.0:6000->6000/tcp, :::6000->6000/tcp embedding-tei-server +24cae0db1a70 opea/llm-tgi:latest "bash entrypoint.sh" 32 hours ago Up 2 hours 0.0.0.0:9000->9000/tcp, :::9000->9000/tcp llm-tgi-server +ea3986c3cf82 opea/dataprep-redis:latest "python prepare_doc_…" 32 hours ago Up 2 hours 0.0.0.0:6007->6007/tcp, :::6007->6007/tcp dataprep-redis-server e10dd14497a8 redis/redis-stack:7.2.0-v9 "/entrypoint.sh" 32 hours ago Up 2 hours 0.0.0.0:6379->6379/tcp, :::6379->6379/tcp, 0.0.0.0:8001->8001/tcp, :::8001->8001/tcp redis-vector-db 79276cf45a47 ghcr.io/huggingface/text-embeddings-inference:cpu-1.2 "text-embeddings-rou…" 32 hours ago Up 2 hours 0.0.0.0:8090->80/tcp, :::8090->80/tcp tei-embedding-server 4943e5f6cd80 ghcr.io/huggingface/text-embeddings-inference:cpu-1.2 "text-embeddings-rou…" 32 hours ago Up 2 hours 0.0.0.0:8808->80/tcp, :::8808->80/tcp tei-reranking-server @@ -314,84 +149,19 @@ e10dd14497a8 redis/redis-stack:7.2.0-v9 "/entrypo ::: :::: -## Interacting with ChatQnA deployment - -This section will walk you through what are the different ways to interact with -the microservices deployed - -### Dataprep Microservice(Optional) - -If you want to add/update the default knowledge base, you can use the following -commands. The dataprep microservice extracts the texts from variety of data -sources, chunks the data, embeds each chunk using embedding microservice and -store the embedded vectors in the redis vector database. - -`nke-10k-2023.pdf` is Nike's annual report on a form 10-K. Run this command to get the file on a terminal: +Each docker container's log can also be checked using: ```bash -wget https://github.com/opea-project/GenAIComps/blob/v1.1/comps/retrievers/redis/data/nke-10k-2023.pdf +docker logs ``` -Upload the file: - -```bash -curl -X POST "http://${host_ip}:6007/v1/dataprep" \ - -H "Content-Type: multipart/form-data" \ - -F "files=@./nke-10k-2023.pdf" -``` - -This command updates a knowledge base by uploading a local file for processing. -Update the file path according to your environment. - -Add Knowledge Base via HTTP Links: - -```bash -curl -X POST "http://${host_ip}:6007/v1/dataprep" \ - -H "Content-Type: multipart/form-data" \ - -F 'link_list=["https://opea.dev"]' -``` - -This command updates a knowledge base by submitting a list of HTTP links for processing. - -Also, you are able to get the file list that you uploaded: - -```bash -curl -X POST "http://${host_ip}:6007/v1/dataprep/get_file" \ - -H "Content-Type: application/json" - -``` - -To delete the file/link you uploaded you can use the following commands: - -#### Delete link -```bash -# The dataprep service will add a .txt postfix for link file - -curl -X POST "http://${host_ip}:6007/v1/dataprep/delete_file" \ - -d '{"file_path": "https://opea.dev.txt"}' \ - -H "Content-Type: application/json" -``` - -#### Delete file - -```bash -curl -X POST "http://${host_ip}:6007/v1/dataprep/delete_file" \ - -d '{"file_path": "nke-10k-2023.pdf"}' \ - -H "Content-Type: application/json" -``` +## Validate Microservices -#### Delete all uploaded files and links +This section will guide through the various methods for interacting with the deployed microservices. -```bash -curl -X POST "http://${host_ip}:6007/v1/dataprep/delete_file" \ - -d '{"file_path": "all"}' \ - -H "Content-Type: application/json" -``` ### TEI Embedding Service -The TEI embedding service takes in a string as input, embeds the string into a -vector of a specific length determined by the embedding model and returns this -embedded vector. 
+The TEI embedding service takes in a string as input, embeds the string into a vector of a specific length determined by the embedding model, and returns this vector. ```bash curl ${host_ip}:8090/embed \ @@ -400,31 +170,13 @@ curl ${host_ip}:8090/embed \ -H 'Content-Type: application/json' ``` -In this example the embedding model used is "BAAI/bge-base-en-v1.5", which has a -vector size of 768. So the output of the curl command is a embedded vector of -length 768. +In this example, the embedding model used is `BAAI/bge-base-en-v1.5`, which has a vector size of 768. Therefore, the output of the curl command is a vector of length 768. -### Embedding Microservice -The embedding microservice depends on the TEI embedding service. In terms of -input parameters, it takes in a string, embeds it into a vector using the TEI -embedding service and pads other default parameters that are required for the -retrieval microservice and returns it. - -```bash -curl http://${host_ip}:6000/v1/embeddings\ - -X POST \ - -d '{"text":"hello"}' \ - -H 'Content-Type: application/json' -``` ### Retriever Microservice -To consume the retriever microservice, you need to generate a mock embedding -vector by Python script. The length of embedding vector is determined by the -embedding model. Here we use the -model EMBEDDING_MODEL_ID="BAAI/bge-base-en-v1.5", which vector size is 768. +To consume the retriever microservice, generate a mock embedding vector with a Python script. The length of the embedding vector is determined by the embedding model. The model is set with the environment variable EMBEDDING_MODEL_ID="BAAI/bge-base-en-v1.5", which has a vector size of 768. -Check the vector dimension of your embedding model and set -`your_embedding` dimension equal to it. +Check the vector dimension of the embedding model used and set `your_embedding` dimension equal to it. ```bash export your_embedding=$(python3 -c "import random; embedding = [random.uniform(-1, 1) for _ in range(768)]; print(embedding)") @@ -433,26 +185,19 @@ curl http://${host_ip}:7000/v1/retrieval \ -X POST \ -d "{\"text\":\"test\",\"embedding\":${your_embedding}}" \ -H 'Content-Type: application/json' - ``` -The output of the retriever microservice comprises of the a unique id for the -request, initial query or the input to the retrieval microservice, a list of top -`n` retrieved documents relevant to the input query, and top_n where n refers to -the number of documents to be returned. -The output is retrieved text that relevant to the input data: -```bash -{"id":"27210945c7c6c054fa7355bdd4cde818","retrieved_docs":[{"id":"0c1dd04b31ab87a5468d65f98e33a9f6","text":"Company: Nike. financial instruments are subject to master netting arrangements that allow for the offset of assets and liabilities in the event of default or early termination of the contract.\nAny amounts of cash collateral received related to these instruments associated with the Company's credit-related contingent features are recorded in Cash and\nequivalents and Accrued liabilities, the latter of which would further offset against the Company's derivative asset balance. Any amounts of cash collateral posted related\nto these instruments associated with the Company's credit-related contingent features are recorded in Prepaid expenses and other current assets, which would further\noffset against the Company's derivative liability balance. 
Cash collateral received or posted related to the Company's credit-related contingent features is presented in the\nCash provided by operations component of the Consolidated Statements of Cash Flows. The Company does not recognize amounts of non-cash collateral received, such\nas securities, on the Consolidated Balance Sheets. For further information related to credit risk, refer to Note 12 — Risk Management and Derivatives.\n2023 FORM 10-K 68Table of Contents\nThe following tables present information about the Company's derivative assets and liabilities measured at fair value on a recurring basis and indicate the level in the fair\nvalue hierarchy in which the Company classifies the fair value measurement:\nMAY 31, 2023\nDERIVATIVE ASSETS\nDERIVATIVE LIABILITIES"},{"id":"1d742199fb1a86aa8c3f7bcd580d94af","text": ... } +The output of the retriever microservice comprises of the a unique id for the request, initial query or the input to the retrieval microservice, a list of top `n` retrieved documents relevant to the input query, and top_n where n refers to the number of documents to be returned. +The output is retrieved text that is relevant to the input data: +```bash +{"id":"b16024e140e78e39a60e8678622be630","retrieved_docs":[],"initial_query":"test","top_n":1,"metadata":[]} ``` ### TEI Reranking Service -The TEI Reranking Service reranks the documents returned by the retrieval -service. It consumes the query and list of documents and returns the document -index based on decreasing order of the similarity score. The document -corresponding to the returned index with the highest score is the most relevant -document for the input query. +The TEI Reranking Service reranks the documents returned by the retrieval service. It consumes the query and list of documents and returns the document index in decreasing order of the similarity score. The document corresponding to the index with the highest score is the most relevant document for the input query. + ```bash curl http://${host_ip}:8808/rerank \ -X POST \ @@ -460,136 +205,107 @@ curl http://${host_ip}:8808/rerank \ -H 'Content-Type: application/json' ``` -Output is: `[{"index":1,"score":0.9988041},{"index":0,"score":0.022948774}]` - - -### Reranking Microservice - - -The reranking microservice consumes the TEI Reranking service and pads the -response with default parameters required for the llm microservice. - +Sample output: ```bash -curl http://${host_ip}:8000/v1/reranking\ - -X POST \ - -d '{"initial_query":"What is Deep Learning?", "retrieved_docs": \ - [{"text":"Deep Learning is not..."}, {"text":"Deep learning is..."}]}' \ - -H 'Content-Type: application/json' +[{"index":1,"score":0.94238955},{"index":0,"score":0.120219156}] ``` -The input to the microservice is the `initial_query` and a list of retrieved -documents and it outputs the most relevant document to the initial query along -with other default parameter such as temperature, `repetition_penalty`, -`chat_template` and so on. We can also get top n documents by setting `top_n` as one -of the input parameters. For example: - -```bash -curl http://${host_ip}:8000/v1/reranking\ - -X POST \ - -d '{"initial_query":"What is Deep Learning?" 
,"top_n":2, "retrieved_docs": \ - [{"text":"Deep Learning is not..."}, {"text":"Deep learning is..."}]}' \ - -H 'Content-Type: application/json' -``` - -Here is the output: - -```bash -{"id":"e1eb0e44f56059fc01aa0334b1dac313","query":"Human: Answer the question based only on the following context:\n Deep learning is...\n Question: What is Deep Learning?","max_new_tokens":1024,"top_k":10,"top_p":0.95,"typical_p":0.95,"temperature":0.01,"repetition_penalty":1.03,"streaming":true} - -``` -You may notice reranking microservice are with state ('ID' and other meta data), -while reranking service are not. - ### TGI Service +During the initial startup, this service will take a few minutes to download the model files and complete the warm-up process. Once this is finished, the service will be ready for use. + ::::{tab-set} :::{tab-item} TGI :sync: TGI +Run the command below to check whether the LLM service is ready. The output should be "INFO text_generation_router::server: router/src/server.rs:2311: Connected" +```bash +docker logs tgi-service | grep Connected +``` + +Run the command below to use the TGI service to generate text for the input prompt. Sample output is also shown. ```bash curl http://${host_ip}:8008/generate \ -X POST \ -d '{"inputs":"What is Deep Learning?", \ "parameters":{"max_new_tokens":17, "do_sample": true}}' \ -H 'Content-Type: application/json' - ``` -TGI service generates text for the input prompt. Here is the expected result from TGI: - ```bash -{"generated_text":"We have all heard the buzzword, but our understanding of it is still growing. It’s a sub-field of Machine Learning, and it’s the cornerstone of today’s Machine Learning breakthroughs.\n\nDeep Learning makes machines act more like humans through their ability to generalize from very large"} +{"id":"chatcmpl-cc4300a173af48989cac841f54ebca09","object":"chat.completion","created":1743553002,"model":"meta-llama/Meta-Llama-3-8B-Instruct","choices":[{"index":0,"message":{"role":"assistant","content":"Deep learning is a subfield of machine learning that is inspired by the structure and function","tool_calls":[]},"logprobs":null,"finish_reason":"length","stop_reason":null}],"usage":{"prompt_tokens":15,"total_tokens":32,"completion_tokens":17,"prompt_tokens_details":null},"prompt_logprobs":null} ``` -**NOTE**: After launch the TGI, it takes few minutes for TGI server to load LLM model and warm up. ::: :::: +### Dataprep Microservice -If you get +The knowledge base can be updated using the dataprep microservice, which extracts text from a variety of data sources, chunks the data, and embeds each chunk using the embedding microservice. Finally, the embedded vectors are stored in the Redis vector database. +`nke-10k-2023.pdf` is Nike's annual report on a form 10-K. Run this command to download the file: +```bash +wget https://github.com/opea-project/GenAIComps/blob/v1.1/comps/retrievers/redis/data/nke-10k-2023.pdf ``` -curl: (7) Failed to connect to 100.81.104.168 port 8008 after 0 ms: Connection refused +Upload the file: +```bash +curl -X POST "http://${host_ip}:6007/v1/dataprep" \ + -H "Content-Type: multipart/form-data" \ + -F "files=@./nke-10k-2023.pdf" ``` -and the log shows model warm up, please wait for a while and try it later. - +HTTP links can also be added to the knowledge base. This command adds the opea.dev website. 
+```bash +curl -X POST "http://${host_ip}:6007/v1/dataprep" \ + -H "Content-Type: multipart/form-data" \ + -F 'link_list=["https://opea.dev"]' ``` -2024-06-05T05:45:27.707509646Z 2024-06-05T05:45:27.707361Z WARN text_generation_router: router/src/main.rs:357: `--revision` is not set -2024-06-05T05:45:27.707539740Z 2024-06-05T05:45:27.707379Z WARN text_generation_router: router/src/main.rs:358: We strongly advise to set it to a known supported commit. -2024-06-05T05:45:27.852525522Z 2024-06-05T05:45:27.852437Z INFO text_generation_router: router/src/main.rs:379: Serving revision bdd31cf498d13782cc7497cba5896996ce429f91 of model meta-llama/Meta-Llama-3-8B-Instruct -2024-06-05T05:45:27.867833811Z 2024-06-05T05:45:27.867759Z INFO text_generation_router: router/src/main.rs:221: Warming up model +The list of uploaded files can be retrieved using this command: +```bash +curl -X POST "http://${host_ip}:6007/v1/dataprep/get_file" \ + -H "Content-Type: application/json" ``` -### LLM Microservice +To delete the file or link, use the following commands: +#### Delete link ```bash -curl http://${host_ip}:9000/v1/chat/completions\ - -X POST \ - -d '{"query":"What is Deep Learning?","max_new_tokens":17,"top_k":10,"top_p":0.95,\ - "typical_p":0.95,"temperature":0.01,"repetition_penalty":1.03,"streaming":true}' \ - -H 'Content-Type: application/json' +# The dataprep service will add a .txt postfix for link file + +curl -X POST "http://${host_ip}:6007/v1/dataprep/delete_file" \ + -d '{"file_path": "https://opea.dev.txt"}' \ + -H "Content-Type: application/json" +``` +#### Delete file + +```bash +curl -X POST "http://${host_ip}:6007/v1/dataprep/delete_file" \ + -d '{"file_path": "nke-10k-2023.pdf"}' \ + -H "Content-Type: application/json" ``` -You will get generated text from LLM: +#### Delete all uploaded files and links ```bash -data: b'\n' -data: b'\n' -data: b'Deep' -data: b' learning' -data: b' is' -data: b' a' -data: b' subset' -data: b' of' -data: b' machine' -data: b' learning' -data: b' that' -data: b' uses' -data: b' algorithms' -data: b' to' -data: b' learn' -data: b' from' -data: b' data' -data: [DONE] +curl -X POST "http://${host_ip}:6007/v1/dataprep/delete_file" \ + -d '{"file_path": "all"}' \ + -H "Content-Type: application/json" ``` -### MegaService +### ChatQnA MegaService +This will ensure the megaservice is working properly. ```bash curl http://${host_ip}:8888/v1/chatqna -H "Content-Type: application/json" -d '{ - "model": "meta-llama/Meta-Llama-3-8B-Instruct", "messages": "What is the revenue of Nike in 2023?" }' - ``` -Here is the output for your reference: - +Here is the output for reference: ```bash data: b'\n' data: b'An' @@ -626,117 +342,40 @@ data: b'' data: [DONE] ``` -## Check docker container log - -Check the log of container by: - -`docker logs -t` - - -Check the log by `docker logs f7a08f9867f9 -t`. - -``` -2024-06-05T01:30:30.695934928Z error: a value is required for '--model-id ' but none was supplied -2024-06-05T01:30:30.697123534Z -2024-06-05T01:30:30.697148330Z For more information, try '--help'. - -``` - -The log indicates the `MODEL_ID` is not set. 
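+As a lighter-weight sanity check, only the HTTP status code can be inspected. This is a minimal probe (any short prompt works); a `200` indicates the megaservice answered the request:
+
+```bash
+# Print only the HTTP status code returned by the megaservice (200 expected)
+curl -s -o /dev/null -w "%{http_code}\n" http://${host_ip}:8888/v1/chatqna \
+    -H "Content-Type: application/json" \
+    -d '{"messages": "Hello"}'
+```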
- - -::::{tab-set} -:::{tab-item} TGI -:sync: TGI - -View the docker input parameters in `$WORKSPACE/GenAIExamples/ChatQnA/docker_compose/nvidia/gpu/compose.yaml` - -```yaml - tgi-service: - image: ghcr.io/huggingface/text-generation-inference:2.2.0 - container_name: tgi-service - ports: - - "9009:80" - volumes: - - "./data:/data" - shm_size: 1g - environment: - no_proxy: ${no_proxy} - http_proxy: ${http_proxy} - https_proxy: ${https_proxy} - HF_TOKEN: ${HUGGINGFACEHUB_API_TOKEN} - HF_HUB_DISABLE_PROGRESS_BARS: 1 - HF_HUB_ENABLE_HF_TRANSFER: 0 - command: --model-id ${LLM_MODEL_ID} --cuda-graphs 0 - -``` -::: -:::: - - -The input `MODEL_ID` is `${LLM_MODEL_ID}` - -Check environment variable `LLM_MODEL_ID` is set correctly, spelled correctly. -Set the `LLM_MODEL_ID` then restart the containers. - -Also you can check overall logs with the following command, where the -compose.yaml is the mega service docker-compose configuration file. - -::::{tab-set} - -:::{tab-item} TGI -:sync: TGI +### NGINX Service +This will ensure the NGINX ervice is working properly. ```bash -docker compose -f ./docker_compose/nvidia/gpu/compose.yaml logs +curl http://${host_ip}:${NGINX_PORT}/v1/chatqna \ + -H "Content-Type: application/json" \ + -d '{"messages": "What is the revenue of Nike in 2023?"}' ``` -::: -:::: + +The output will be similar to that of the ChatQnA megaservice. ## Launch UI ### Basic UI -To access the frontend, open the following URL in your browser: http://{host_ip}:5173. By default, the UI runs on port 5173 internally. If you prefer to use a different host port to access the frontend, you can modify the port mapping in the compose.yaml file as shown below: +To access the frontend, open the following URL in your browser: http://${host_ip}:${NGINX_PORT}. By default, the UI runs on port 5173 internally. If you prefer to use a different to access the frontend by modifying the port mapping in the `compose.yaml` file as shown below: ```yaml - chaqna-ui-server: + chatqna-ui-server: image: opea/chatqna-ui:${TAG:-latest} ... ports: - - "5173:5173" + - "YOUR_HOST_PORT:5173" # Change YOUR_HOST_PORT to the desired port ``` -### Conversational UI +After making this change, rebuild and restart the containers for the change to take effect. -To access the Conversational UI (react based) frontend, modify the UI service in the `compose.yaml` file. Replace `chaqna-ui-server` service with the `chatqna-conversation-ui-server` service as per the config below: -```yaml -chaqna-conversation-ui-server: - image: opea/chatqna-conversation-ui:${TAG:-latest} - container_name: chatqna-conversation-ui-server - environment: - - APP_BACKEND_SERVICE_ENDPOINT=${BACKEND_SERVICE_ENDPOINT} - - APP_DATA_PREP_SERVICE_URL=${DATAPREP_SERVICE_ENDPOINT} - ports: - - "5174:5174" - depends_on: - - chaqna-backend-server - ipc: host - restart: always -``` - -Once the services are up, open the following URL in your browser: http://{host_ip}:5174. By default, the UI runs on port 80 internally. If you prefer to use a different host port to access the frontend, you can modify the port mapping in the `compose.yaml` file as shown below: +### Stop the Services -```yaml - chaqna-conversation-ui-server: - image: opea/chatqna-conversation-ui:${TAG:-latest} - ... - ports: - - "80:80" +Navigate to the `docker compose` directory for this hardware platform. 
+```bash +cd $WORKSPACE/GenAIExamples/ChatQnA/docker_compose/nvidia/gpu ``` -### Stop the services - -Once you are done with the entire pipeline and wish to stop and remove all the containers, use the command below: +To stop and remove all the containers, use the command below: ::::{tab-set} :::{tab-item} TGI :sync: TGI diff --git a/tutorial/ChatQnA/deploy/xeon.md b/tutorial/ChatQnA/deploy/xeon.md index fa59120f..955dfce5 100644 --- a/tutorial/ChatQnA/deploy/xeon.md +++ b/tutorial/ChatQnA/deploy/xeon.md @@ -1,19 +1,10 @@ # Single node on-prem deployment with vLLM or TGI on Xeon Scalable processors -This deployment section covers single-node on-prem deployment of the ChatQnA -example with OPEA comps to deploy using vLLM or TGI service. There are several -slice-n-dice ways to enable RAG with vectordb and LLM models, but here we will -be covering one option of doing it for convenience : we will be showcasing how -to build an e2e chatQnA with Redis VectorDB and meta-llama/Meta-Llama-3-8B-Instruct model, -deployed on Intel® Xeon® Scalable processors. To quickly learn about OPEA in just 5 minutes -and set up the required hardware and software, please follow the instructions in the -[Getting Started](../../../getting-started/README.md) section. +This section covers single-node on-prem deployment of the ChatQnA example using the vLLM or TGI LLM service. There are several ways to enable RAG with vectordb and LLM models, but this tutorial will be covering how to build an end-to-end ChatQnA pipeline with the Redis vector database and meta-llama/Meta-Llama-3-8B-Instruct model deployed on Intel® Xeon® Scalable processors. To quickly learn about OPEA and set up the required hardware and software, follow the instructions in the [Getting Started Guide](../../../getting-started/README.md). ## Overview -There are several ways to setup a ChatQnA use case. Here in this tutorial, we -will walk through how to enable the below list of microservices from OPEA -GenAIComps to deploy a single node vLLM or TGI megaservice solution. +The OPEA GenAIComps microservices used to deploy a single node vLLM or TGI megaservice solution for ChatQnA are listed below: 1. Data Prep 2. Embedding @@ -21,242 +12,54 @@ GenAIComps to deploy a single node vLLM or TGI megaservice solution. 4. Reranking 5. LLM with vLLM or TGI -The solution is aimed to show how to use Redis vectordb for RAG and -Meta-Llama-3-8B-Instruct model on Intel Xeon Scalable processors. We will go through -how to setup docker container to start a microservices and megaservice . The -solution will then utilize a sample Nike dataset which is in PDF format. Users -can then ask a question about Nike and get a chat-like response by default for -up to 1024 tokens. The solution is deployed with a UI. There are 2 modes you can -use: - -1. Basic UI -2. Conversational UI - -Conversational UI is optional, but a feature supported in this example if you -are interested to use. +This solution is designed to demonstrate the use of Redis vectorDB for RAG and the Meta-Llama-3-8B-Instruct model for LLM inference on Intel® Xeon® Scalable processors. The steps will involve setting up Docker containers, using a sample Nike dataset in PDF format, and posing a question about Nike to receive a response. Although multiple versions of the UI can be deployed, this tutorial will focus solely on the default version. ## Prerequisites -The first step is to clone the GenAIExamples and GenAIComps projects. 
GenAIComps are -fundamental necessary components used to build the examples you find in -GenAIExamples and deploy them as microservices. Set an environment -variable for the desired release version with the **number only** -(i.e. 1.0, 1.1, etc) and checkout using the tag with that version. - +Set up a workspace and clone the [GenAIExamples](https://github.com/opea-project/GenAIExamples) GitHub repo. ```bash -# Set workspace -export WORKSPACE= +export WORKSPACE= cd $WORKSPACE - -# Set desired release version - number only -export RELEASE_VERSION= - -# GenAIComps -git clone https://github.com/opea-project/GenAIComps.git -cd GenAIComps -git checkout tags/v${RELEASE_VERSION} -cd .. - -# GenAIExamples -git clone https://github.com/opea-project/GenAIExamples.git -cd GenAIExamples -git checkout tags/v${RELEASE_VERSION} -cd .. -``` - -Setup the HuggingFace token -```bash -export HUGGINGFACEHUB_API_TOKEN="Your_Huggingface_API_Token" -``` - -The example requires you to set the `host_ip` to deploy the microservices on -endpoint enabled with ports. Set the host_ip env variable -```bash -export host_ip=$(hostname -I | awk '{print $1}') -``` - -Make sure to setup Proxies if you are behind a firewall -```bash -export no_proxy=${your_no_proxy},$host_ip -export http_proxy=${your_http_proxy} -export https_proxy=${your_http_proxy} +git clone https://github.com/opea-project/GenAIExamples.git # GenAIExamples ``` -## Prepare (Building / Pulling) Docker images - -This step will involve building/pulling relevant docker -images with step-by-step process along with sanity check in the end. For -ChatQnA, the following docker images will be needed: embedding, retriever, -rerank, LLM and dataprep. Additionally, you will need to build docker images for -ChatQnA megaservice, and UI (conversational React UI is optional). In total, -there are 8 required and an optional docker images. - -### Build/Pull Microservice images - -::::::{tab-set} - -:::::{tab-item} Pull -:sync: Pull - -If you decide to pull the docker containers and not build them locally, -you can proceed to the next step where all the necessary containers will -be pulled in from Docker Hub. - -::::: -:::::{tab-item} Build -:sync: Build - -Follow the steps below to build the docker images from within the `GenAIComps` folder. -**Note:** For RELEASE_VERSIONS older than 1.0, you will need to add a 'v' in front -of ${RELEASE_VERSION} to reference the correct image on Docker Hub. - +**(Optional)** It is recommended to use a stable release version by setting `RELEASE_VERSION` to a **number only** (i.e. 1.0, 1.1, etc) and checkout that version using the tag. Otherwise, by default, the main branch with the latest updates will be used. ```bash -cd $WORKSPACE/GenAIComps -``` - -#### Build Dataprep Image - -```bash -docker build --no-cache -t opea/dataprep-redis:${RELEASE_VERSION} --build-arg https_proxy=$https_proxy \ - --build-arg http_proxy=$http_proxy -f comps/dataprep/redis/langchain/Dockerfile . -``` - -#### Build Embedding Image - -```bash -docker build --no-cache -t opea/embedding-tei:${RELEASE_VERSION} --build-arg https_proxy=$https_proxy \ - --build-arg http_proxy=$http_proxy -f comps/embeddings/tei/langchain/Dockerfile . -``` - -#### Build Retriever Image - -```bash - docker build --no-cache -t opea/retriever-redis:${RELEASE_VERSION} --build-arg https_proxy=$https_proxy \ - --build-arg http_proxy=$http_proxy -f comps/retrievers/redis/langchain/Dockerfile . 
-``` - -#### Build Rerank Image - -```bash -docker build --no-cache -t opea/reranking-tei:${RELEASE_VERSION} --build-arg https_proxy=$https_proxy \ - --build-arg http_proxy=$http_proxy -f comps/reranks/tei/Dockerfile . -``` - -#### Build LLM Image - -::::{tab-set} - -:::{tab-item} vllm -:sync: vllm - -We build the vllm docker image from source -```bash -git clone https://github.com/vllm-project/vllm.git -cd vllm -docker build --no-cache -t opea/vllm:${RELEASE_VERSION} --build-arg https_proxy=$https_proxy \ - --build-arg http_proxy=$http_proxy -f Dockerfile.cpu . +export RELEASE_VERSION= # Set desired release version - number only +cd GenAIExamples +git checkout tags/v${RELEASE_VERSION} cd .. ``` -Next, we'll build the vllm microservice docker. This will set the entry point -needed for the vllm to suit the ChatQnA examples -```bash -docker build --no-cache -t opea/llm-vllm:${RELEASE_VERSION} --build-arg https_proxy=$https_proxy \ - --build-arg http_proxy=$http_proxy \ - -f comps/llms/text-generation/vllm/langchain/Dockerfile.microservice . - -``` -::: -:::{tab-item} TGI -:sync: TGI - -```bash -docker build --no-cache -t opea/llm-tgi:${RELEASE_VERSION} --build-arg https_proxy=$https_proxy \ - --build-arg http_proxy=$http_proxy -f comps/llms/text-generation/tgi/Dockerfile . -``` -::: -:::: - -### Build Mega Service images - -The Megaservice is a pipeline that channels data through different -microservices, each performing varied tasks. We define the different -microservices and the flow of data between them in the `chatqna.py` file, say in -this example the output of embedding microservice will be the input of retrieval -microservice which will in turn passes data to the reranking microservice and so -on. You can also add newer or remove some microservices and customize the -megaservice to suit the needs. - -Build the megaservice image for this use case +Set up a [HuggingFace](https://huggingface.co/) account and generate a [user access token](https://huggingface.co/docs/transformers.js/en/guides/private#step-1-generating-a-user-access-token). Request access to the [meta-llama/Meta-Llama-3-8B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct) model. +Set the `HUGGINGFACEHUB_API_TOKEN` environment variable to the value of the Hugging Face token by executing the following command: ```bash -cd $WORKSPACE/GenAIExamples/ChatQnA +export HUGGINGFACEHUB_API_TOKEN="Your_Huggingface_API_Token" ``` +The example requires setting the `host_ip` to "localhost" to deploy the microservices on endpoints enabled with ports. ```bash -docker build --no-cache -t opea/chatqna:${RELEASE_VERSION} --build-arg https_proxy=$https_proxy \ - --build-arg http_proxy=$http_proxy -f Dockerfile . +export host_ip="localhost" ``` -### Build Other Service images - -#### Build the UI Image - -As mentioned, you can build 2 modes of UI - -*Basic UI* - +Set the NGINX port. ```bash -cd $WORKSPACE/GenAIExamples/ChatQnA/ui/ -docker build --no-cache -t opea/chatqna-ui:${RELEASE_VERSION} --build-arg https_proxy=$https_proxy \ - --build-arg http_proxy=$http_proxy -f ./docker/Dockerfile . +# Example: NGINX_PORT=80 +export NGINX_PORT= ``` -*Conversation UI* -If you want a conversational experience with chatqna megaservice. 
- +For machines behind a firewall, set up the proxy environment variables: ```bash -cd $WORKSPACE/GenAIExamples/ChatQnA/ui/ -docker build --no-cache -t opea/chatqna-conversation-ui:${RELEASE_VERSION} --build-arg https_proxy=$https_proxy \ - --build-arg http_proxy=$http_proxy -f ./docker/Dockerfile.react . +export http_proxy="Your_HTTP_Proxy" +export https_proxy="Your_HTTPs_Proxy" +# Example: no_proxy="localhost, 127.0.0.1, 192.168.1.1" +export no_proxy="Your_No_Proxy",chatqna-xeon-ui-server,chatqna-xeon-backend-server,dataprep-redis-service,tei-embedding-service,retriever,tei-reranking-service,tgi-service,vllm-service,llm-faqgen,jaeger,prometheus,grafana,xeon-node-exporter-1 ``` -### Sanity Check -Check if you have the below set of docker images, before moving on to the next step: - -::::{tab-set} -:::{tab-item} vllm -:sync: vllm - -* opea/dataprep-redis:${RELEASE_VERSION} -* opea/embedding-tei:${RELEASE_VERSION} -* opea/retriever-redis:${RELEASE_VERSION} -* opea/reranking-tei:${RELEASE_VERSION} -* opea/vllm:${RELEASE_VERSION} -* opea/chatqna:${RELEASE_VERSION} -* opea/chatqna-ui:${RELEASE_VERSION} -* opea/llm-vllm:${RELEASE_VERSION} -::: -:::{tab-item} TGI -:sync: TGI - -* opea/dataprep-redis:${RELEASE_VERSION} -* opea/embedding-tei:${RELEASE_VERSION} -* opea/retriever-redis:${RELEASE_VERSION} -* opea/reranking-tei:${RELEASE_VERSION} -* opea/chatqna:${RELEASE_VERSION} -* opea/chatqna-ui:${RELEASE_VERSION} -* opea/llm-tgi:${RELEASE_VERSION} -::: -:::: - -::::: -:::::: - ## Use Case Setup -As mentioned the use case will use the following combination of the GenAIComps -with the tools +ChatQnA will utilize the following GenAIComps services and associated tools. The tools and models listed in the table can be configured via environment variables in either the `set_env.sh` script or the `compose.yaml` file. ::::{tab-set} @@ -272,8 +75,6 @@ with the tools |LLM | vLLM | meta-llama/Meta-Llama-3-8B-Instruct | OPEA Microservice | |UI | | NA | Gateway Service | -Tools and models mentioned in the table are configurable either through the -environment variable or `compose.yaml` file. ::: :::{tab-item} TGI :sync: TGI @@ -287,24 +88,19 @@ environment variable or `compose.yaml` file. |LLM | TGI | meta-llama/Meta-Llama-3-8B-Instruct | OPEA Microservice | |UI | | NA | Gateway Service | -Tools and models mentioned in the table are configurable either through the -environment variable or `compose.yaml` file. ::: :::: -Set the necessary environment variables to setup the use case. If you want to swap -out models, modify `set_env.sh` before running. +Set the necessary environment variables to set up the use case. To swap out models, modify `set_env.sh` before running it. For example, the environment variable `LLM_MODEL_ID` can be changed to another model by specifying the HuggingFace model card ID. ```bash cd $WORKSPACE/GenAIExamples/ChatQnA/docker_compose/intel/cpu/xeon source ./set_env.sh ``` -## Deploy the use case +## Deploy the Use Case -In this tutorial, we will be deploying via docker compose with the provided -YAML file. The docker compose instructions should be starting all the -above mentioned services as containers. +Run `docker compose` with the provided YAML file to start all the services mentioned above as containers. 
::::{tab-set} :::{tab-item} vllm @@ -323,16 +119,13 @@ docker compose -f compose_tgi.yaml up -d ::: :::: -### Validate microservice -#### Check Env Variables +### Check Env Variables +After running `docker compose`, check for warning messages for environment variables that are **NOT** set. Address them if needed. ::::{tab-set} :::{tab-item} vllm :sync: vllm - Check the start up log by `docker compose -f ./compose.yaml logs`. -The warning messages print out the variables if they are **NOT** set. - - ubuntu@xeon-vm:~/GenAIExamples/ChatQnA/docker_compose/intel/cpu/xeon$ docker compose -f ./compose.yaml up -d + ubuntu@xeon-vm:~/GenAIExamples/ChatQnA/docker_compose/intel/cpu/xeon$ docker compose -f compose.yaml up -d WARN[0000] The "LANGCHAIN_API_KEY" variable is not set. Defaulting to a blank string. WARN[0000] The "LANGCHAIN_TRACING_V2" variable is not set. Defaulting to a blank string. WARN[0000] The "LANGCHAIN_API_KEY" variable is not set. Defaulting to a blank string. @@ -343,13 +136,11 @@ The warning messages print out the variables if they are **NOT** set. WARN[0000] The "LANGCHAIN_TRACING_V2" variable is not set. Defaulting to a blank string. WARN[0000] /home/ubuntu/GenAIExamples/ChatQnA/docker_compose/intel/cpu/xeon/compose.yaml: `version` is obsolete ::: + :::{tab-item} TGI :sync: TGI - Check the start up log by `docker compose -f ./compose.yaml logs`. -The warning messages print out the variables if they are **NOT** set. - - ubuntu@xeon-vm:~/GenAIExamples/ChatQnA/docker_compose/intel/cpu/xeon$ docker compose -f ./compose.yaml up -d + ubuntu@xeon-vm:~/GenAIExamples/ChatQnA/docker_compose/intel/cpu/xeon$ docker compose -f compose_tgi.yaml up -d WARN[0000] The "LANGCHAIN_API_KEY" variable is not set. Defaulting to a blank string. WARN[0000] The "LANGCHAIN_TRACING_V2" variable is not set. Defaulting to a blank string. WARN[0000] The "LANGCHAIN_API_KEY" variable is not set. Defaulting to a blank string. @@ -358,135 +149,69 @@ The warning messages print out the variables if they are **NOT** set. WARN[0000] The "LANGCHAIN_TRACING_V2" variable is not set. Defaulting to a blank string. WARN[0000] The "LANGCHAIN_API_KEY" variable is not set. Defaulting to a blank string. WARN[0000] The "LANGCHAIN_TRACING_V2" variable is not set. Defaulting to a blank string. - WARN[0000] /home/ubuntu/GenAIExamples/ChatQnA/docker_compose/intel/cpu/xeon/compose.yaml: `version` is obsolete + WARN[0000] /home/ubuntu/GenAIExamples/ChatQnA/docker_compose/intel/cpu/xeon/compose_tgi.yaml: `version` is obsolete ::: :::: -#### Check the container status +### Check Container Statuses -Check if all the containers launched via docker compose has started +Check if all the containers launched via `docker compose` are running i.e. each container's `STATUS` is `Up` and `Healthy`. 
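+Before listing everything, a quick way to flag containers that are not yet healthy is to filter on health state. This is a sketch: the `health` filter only applies to containers that define a healthcheck, so an empty result simply means none are reporting `unhealthy`:
+
+```bash
+# List any containers currently reporting an unhealthy state (empty output is good)
+docker ps --filter "health=unhealthy" --format "{{.Names}}: {{.Status}}"
+```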
-For example, the ChatQnA example starts 11 docker (services), check these docker -containers are all running, i.e, all the containers `STATUS` are `Up` -To do a quick sanity check, try `docker ps -a` to see if all the containers are running +Run this command to see this info: +```bash +docker ps -a +``` +Sample output: ::::{tab-set} :::{tab-item} vllm :sync: vllm ```bash -CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES -3b5fa9a722da opea/chatqna-ui:${RELEASE_VERSION} "docker-entrypoint.s…" 32 hours ago Up 2 hours 0.0.0.0:5173->5173/tcp, :::5173->5173/tcp chatqna-xeon-ui-server -d3b37f3d1faa opea/chatqna:${RELEASE_VERSION} "python chatqna.py" 32 hours ago Up 2 hours 0.0.0.0:8888->8888/tcp, :::8888->8888/tcp chatqna-xeon-backend-server -b3e1388fa2ca opea/reranking-tei:${RELEASE_VERSION} "python reranking_te…" 32 hours ago Up 2 hours 0.0.0.0:8000->8000/tcp, :::8000->8000/tcp reranking-tei-xeon-server -24a240f8ad1c opea/retriever-redis:${RELEASE_VERSION} "python retriever_re…" 32 hours ago Up 2 hours 0.0.0.0:7000->7000/tcp, :::7000->7000/tcp retriever-redis-server -9c0d2a2553e8 opea/embedding-tei:${RELEASE_VERSION} "python embedding_te…" 32 hours ago Up 2 hours 0.0.0.0:6000->6000/tcp, :::6000->6000/tcp embedding-tei-server -24cae0db1a70 opea/llm-vllm:${RELEASE_VERSION} "bash entrypoint.sh" 32 hours ago Up 2 hours 0.0.0.0:9000->9000/tcp, :::9000->9000/tcp llm-vllm-server -ea3986c3cf82 opea/dataprep-redis:${RELEASE_VERSION} "python prepare_doc_…" 32 hours ago Up 2 hours 0.0.0.0:6007->6007/tcp, :::6007->6007/tcp dataprep-redis-server -e10dd14497a8 redis/redis-stack:7.2.0-v9 "/entrypoint.sh" 32 hours ago Up 2 hours 0.0.0.0:6379->6379/tcp, :::6379->6379/tcp, 0.0.0.0:8001->8001/tcp, :::8001->8001/tcp redis-vector-db -b98fa07a4f5c opea/vllm:${RELEASE_VERSION} "python3 -m vllm.ent…" 32 hours ago Up 2 hours 0.0.0.0:9009->80/tcp, :::9009->80/tcp vllm-service -79276cf45a47 ghcr.io/huggingface/text-embeddings-inference:cpu-1.2 "text-embeddings-rou…" 32 hours ago Up 2 hours 0.0.0.0:6006->80/tcp, :::6006->80/tcp tei-embedding-server -4943e5f6cd80 ghcr.io/huggingface/text-embeddings-inference:cpu-1.2 "text-embeddings-rou…" 32 hours ago Up 2 hours 0.0.0.0:8808->80/tcp, :::8808->80/tcp tei-reranking-server +CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES +25964cd40c51 opea/nginx:latest "/docker-entrypoint.…" 37 minutes ago Up 37 minutes 0.0.0.0:80->80/tcp, [::]:80->80/tcp chatqna-xeon-nginx-server +bca19cf35370 opea/chatqna-ui:latest "docker-entrypoint.s…" 37 minutes ago Up 37 minutes 0.0.0.0:5173->5173/tcp, [::]:5173->5173/tcp chatqna-xeon-ui-server +e9622436428a opea/chatqna:latest "python chatqna.py" 37 minutes ago Up 37 minutes 0.0.0.0:8888->8888/tcp, [::]:8888->8888/tcp chatqna-xeon-backend-server +514acfb8f398 opea/dataprep:latest "sh -c 'python $( [ …" 37 minutes ago Up 37 minutes 0.0.0.0:6007->5000/tcp, [::]:6007->5000/tcp dataprep-redis-server +dbaf2116ae4b opea/retriever:latest "python opea_retriev…" 37 minutes ago Up 37 minutes 0.0.0.0:7000->7000/tcp, [::]:7000->7000/tcp retriever-redis-server +82d802dd79c0 ghcr.io/huggingface/text-embeddings-inference:cpu-1.5 "text-embeddings-rou…" 37 minutes ago Up 37 minutes 0.0.0.0:8808->80/tcp, [::]:8808->80/tcp tei-reranking-server +20aebf41b92b opea/vllm:latest "python3 -m vllm.ent…" 37 minutes ago Up 37 minutes (unhealthy) 0.0.0.0:9009->80/tcp, [::]:9009->80/tcp vllm-service +590ee468e4b7 redis/redis-stack:7.2.0-v9 "/entrypoint.sh" 37 minutes ago Up 37 minutes 0.0.0.0:6379->6379/tcp, [::]:6379->6379/tcp, 0.0.0.0:8001->8001/tcp, 
[::]:8001->8001/tcp redis-vector-db +df543e8425ea ghcr.io/huggingface/text-embeddings-inference:cpu-1.5 "text-embeddings-rou…" 37 minutes ago Up 37 minutes 0.0.0.0:6006->80/tcp, [::]:6006->80/tcp tei-embedding-server ``` ::: :::{tab-item} TGI :sync: TGI ```bash -CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES -3b5fa9a722da opea/chatqna-ui:${RELEASE_VERSION} "docker-entrypoint.s…" 32 hours ago Up 2 hours 0.0.0.0:5173->5173/tcp, :::5173->5173/tcp chatqna-xeon-ui-server -d3b37f3d1faa opea/chatqna:${RELEASE_VERSION} "python chatqna.py" 32 hours ago Up 2 hours 0.0.0.0:8888->8888/tcp, :::8888->8888/tcp chatqna-xeon-backend-server -b3e1388fa2ca opea/reranking-tei:${RELEASE_VERSION} "python reranking_te…" 32 hours ago Up 2 hours 0.0.0.0:8000->8000/tcp, :::8000->8000/tcp reranking-tei-xeon-server -24a240f8ad1c opea/retriever-redis:${RELEASE_VERSION} "python retriever_re…" 32 hours ago Up 2 hours 0.0.0.0:7000->7000/tcp, :::7000->7000/tcp retriever-redis-server -9c0d2a2553e8 opea/embedding-tei:${RELEASE_VERSION} "python embedding_te…" 32 hours ago Up 2 hours 0.0.0.0:6000->6000/tcp, :::6000->6000/tcp embedding-tei-server -24cae0db1a70 opea/llm-tgi:${RELEASE_VERSION} "bash entrypoint.sh" 32 hours ago Up 2 hours 0.0.0.0:9000->9000/tcp, :::9000->9000/tcp llm-tgi-server -ea3986c3cf82 opea/dataprep-redis:${RELEASE_VERSION} "python prepare_doc_…" 32 hours ago Up 2 hours 0.0.0.0:6007->6007/tcp, :::6007->6007/tcp dataprep-redis-server -e10dd14497a8 redis/redis-stack:7.2.0-v9 "/entrypoint.sh" 32 hours ago Up 2 hours 0.0.0.0:6379->6379/tcp, :::6379->6379/tcp, 0.0.0.0:8001->8001/tcp, :::8001->8001/tcp redis-vector-db -79276cf45a47 ghcr.io/huggingface/text-embeddings-inference:cpu-1.2 "text-embeddings-rou…" 32 hours ago Up 2 hours 0.0.0.0:6006->80/tcp, :::6006->80/tcp tei-embedding-server -4943e5f6cd80 ghcr.io/huggingface/text-embeddings-inference:cpu-1.2 "text-embeddings-rou…" 32 hours ago Up 2 hours 0.0.0.0:8808->80/tcp, :::8808->80/tcp tei-reranking-server +CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES +f303bf48dd43 opea/nginx:latest "/docker-entrypoint.…" 4 seconds ago Up 3 seconds 0.0.0.0:80->80/tcp, [::]:80->80/tcp chatqna-xeon-nginx-server +0a2597a4baa0 opea/chatqna-ui:latest "docker-entrypoint.s…" 4 seconds ago Up 3 seconds 0.0.0.0:5173->5173/tcp, [::]:5173->5173/tcp chatqna-xeon-ui-server +5b5a37ba59ed opea/chatqna:latest "python chatqna.py" 4 seconds ago Up 3 seconds 0.0.0.0:8888->8888/tcp, [::]:8888->8888/tcp chatqna-xeon-backend-server +b2ec04f4d3d5 opea/dataprep:latest "sh -c 'python $( [ …" 4 seconds ago Up 3 seconds 0.0.0.0:6007->5000/tcp, [::]:6007->5000/tcp dataprep-redis-server +c6347c8758e4 opea/retriever:latest "python opea_retriev…" 4 seconds ago Up 3 seconds 0.0.0.0:7000->7000/tcp, [::]:7000->7000/tcp retriever-redis-server +13403b62e768 ghcr.io/huggingface/text-generation-inference:2.4.0-intel-cpu "text-generation-lau…" 4 seconds ago Up 3 seconds 0.0.0.0:9009->80/tcp, [::]:9009->80/tcp tgi-service +00509c41487b redis/redis-stack:7.2.0-v9 "/entrypoint.sh" 4 seconds ago Up 3 seconds 0.0.0.0:6379->6379/tcp, [::]:6379->6379/tcp, 0.0.0.0:8001->8001/tcp, [::]:8001->8001/tcp redis-vector-db +3e6e650f73a9 ghcr.io/huggingface/text-embeddings-inference:cpu-1.5 "text-embeddings-rou…" 4 seconds ago Up 3 seconds 0.0.0.0:8808->80/tcp, [::]:8808->80/tcp tei-reranking-server +105d130b80ac ghcr.io/huggingface/text-embeddings-inference:cpu-1.5 "text-embeddings-rou…" 4 seconds ago Up 3 seconds 0.0.0.0:6006->80/tcp, [::]:6006->80/tcp tei-embedding-server ``` ::: :::: -## Interacting with ChatQnA 
deployment - -This section will walk you through what are the different ways to interact with -the microservices deployed - -### Dataprep Microservice(Optional) - -If you want to add/update the default knowledge base, you can use the following -commands. The dataprep microservice extracts the texts from variety of data -sources, chunks the data, embeds each chunk using embedding microservice and -store the embedded vectors in the redis vector database. - -`nke-10k-2023.pdf` is Nike's annual report on a form 10-K. Run this command to get the file on a terminal: +Each docker container's log can also be checked using: ```bash -wget https://github.com/opea-project/GenAIComps/blob/v1.1/comps/retrievers/redis/data/nke-10k-2023.pdf +docker logs ``` -Upload the file: +## Validate Microservices -```bash -curl -X POST "http://${host_ip}:6007/v1/dataprep" \ - -H "Content-Type: multipart/form-data" \ - -F "files=@./nke-10k-2023.pdf" -``` - -This command updates a knowledge base by uploading a local file for processing. -Update the file path according to your environment. +This section will guide through the various methods for interacting with the deployed microservices. -Add Knowledge Base via HTTP Links: - -```bash -curl -X POST "http://${host_ip}:6007/v1/dataprep" \ - -H "Content-Type: multipart/form-data" \ - -F 'link_list=["https://opea.dev"]' -``` - -This command updates a knowledge base by submitting a list of HTTP links for processing. - -Also, you are able to get the file list that you uploaded: - -```bash -curl -X POST "http://${host_ip}:6007/v1/dataprep/get_file" \ - -H "Content-Type: application/json" - -``` - -To delete the file/link you uploaded you can use the following commands: - -#### Delete link -```bash -# The dataprep service will add a .txt postfix for link file - -curl -X POST "http://${host_ip}:6007/v1/dataprep/delete_file" \ - -d '{"file_path": "https://opea.dev.txt"}' \ - -H "Content-Type: application/json" -``` - -#### Delete file - -```bash -curl -X POST "http://${host_ip}:6007/v1/dataprep/delete_file" \ - -d '{"file_path": "nke-10k-2023.pdf"}' \ - -H "Content-Type: application/json" -``` - -#### Delete all uploaded files and links - -```bash -curl -X POST "http://${host_ip}:6007/v1/dataprep/delete_file" \ - -d '{"file_path": "all"}' \ - -H "Content-Type: application/json" -``` ### TEI Embedding Service -The TEI embedding service takes in a string as input, embeds the string into a -vector of a specific length determined by the embedding model and returns this -embedded vector. +The TEI embedding service takes in a string as input, embeds the string into a vector of a specific length determined by the embedding model, and returns this vector. ```bash curl ${host_ip}:6006/embed \ @@ -495,31 +220,13 @@ curl ${host_ip}:6006/embed \ -H 'Content-Type: application/json' ``` -In this example the embedding model used is "BAAI/bge-base-en-v1.5", which has a -vector size of 768. So the output of the curl command is a embedded vector of -length 768. - -### Embedding Microservice -The embedding microservice depends on the TEI embedding service. In terms of -input parameters, it takes in a string, embeds it into a vector using the TEI -embedding service and pads other default parameters that are required for the -retrieval microservice and returns it. +In this example, the embedding model used is `BAAI/bge-base-en-v1.5`, which has a vector size of 768. Therefore, the output of the curl command is a vector of length 768. 
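+To confirm the dimensionality, the element count of the returned embedding can be checked directly. The sketch below assumes `python3` is available on the host and that the service returns a JSON array with one embedding per input:
+
+```bash
+# Print the length of the first returned embedding (expected: 768 for BAAI/bge-base-en-v1.5)
+curl -s ${host_ip}:6006/embed \
+    -X POST \
+    -d '{"inputs":"What is Deep Learning?"}' \
+    -H 'Content-Type: application/json' | python3 -c "import sys, json; print(len(json.load(sys.stdin)[0]))"
+```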
-```bash -curl http://${host_ip}:6000/v1/embeddings\ - -X POST \ - -d '{"text":"hello"}' \ - -H 'Content-Type: application/json' -``` ### Retriever Microservice -To consume the retriever microservice, you need to generate a mock embedding -vector by Python script. The length of embedding vector is determined by the -embedding model. Here we use the -model EMBEDDING_MODEL_ID="BAAI/bge-base-en-v1.5", which vector size is 768. +To consume the retriever microservice, generate a mock embedding vector with a Python script. The length of the embedding vector is determined by the embedding model. The model is set with the environment variable EMBEDDING_MODEL_ID="BAAI/bge-base-en-v1.5", which has a vector size of 768. -Check the vector dimension of your embedding model and set -`your_embedding` dimension equal to it. +Check the vector dimension of the embedding model used and set `your_embedding` dimension equal to it. ```bash export your_embedding=$(python3 -c "import random; embedding = [random.uniform(-1, 1) for _ in range(768)]; print(embedding)") @@ -528,26 +235,19 @@ curl http://${host_ip}:7000/v1/retrieval \ -X POST \ -d "{\"text\":\"test\",\"embedding\":${your_embedding}}" \ -H 'Content-Type: application/json' - ``` -The output of the retriever microservice comprises of the a unique id for the -request, initial query or the input to the retrieval microservice, a list of top -`n` retrieved documents relevant to the input query, and top_n where n refers to -the number of documents to be returned. -The output is retrieved text that relevant to the input data: -```bash -{"id":"27210945c7c6c054fa7355bdd4cde818","retrieved_docs":[{"id":"0c1dd04b31ab87a5468d65f98e33a9f6","text":"Company: Nike. financial instruments are subject to master netting arrangements that allow for the offset of assets and liabilities in the event of default or early termination of the contract.\nAny amounts of cash collateral received related to these instruments associated with the Company's credit-related contingent features are recorded in Cash and\nequivalents and Accrued liabilities, the latter of which would further offset against the Company's derivative asset balance. Any amounts of cash collateral posted related\nto these instruments associated with the Company's credit-related contingent features are recorded in Prepaid expenses and other current assets, which would further\noffset against the Company's derivative liability balance. Cash collateral received or posted related to the Company's credit-related contingent features is presented in the\nCash provided by operations component of the Consolidated Statements of Cash Flows. The Company does not recognize amounts of non-cash collateral received, such\nas securities, on the Consolidated Balance Sheets. For further information related to credit risk, refer to Note 12 — Risk Management and Derivatives.\n2023 FORM 10-K 68Table of Contents\nThe following tables present information about the Company's derivative assets and liabilities measured at fair value on a recurring basis and indicate the level in the fair\nvalue hierarchy in which the Company classifies the fair value measurement:\nMAY 31, 2023\nDERIVATIVE ASSETS\nDERIVATIVE LIABILITIES"},{"id":"1d742199fb1a86aa8c3f7bcd580d94af","text": ... 
} +The output of the retriever microservice comprises of the a unique id for the request, initial query or the input to the retrieval microservice, a list of top `n` retrieved documents relevant to the input query, and top_n where n refers to the number of documents to be returned. +The output is retrieved text that is relevant to the input data: +```bash +{"id":"b16024e140e78e39a60e8678622be630","retrieved_docs":[],"initial_query":"test","top_n":1,"metadata":[]} ``` ### TEI Reranking Service -The TEI Reranking Service reranks the documents returned by the retrieval -service. It consumes the query and list of documents and returns the document -index based on decreasing order of the similarity score. The document -corresponding to the returned index with the highest score is the most relevant -document for the input query. +The TEI Reranking Service reranks the documents returned by the retrieval service. It consumes the query and list of documents and returns the document index in decreasing order of the similarity score. The document corresponding to the index with the highest score is the most relevant document for the input query. + ```bash curl http://${host_ip}:8808/rerank \ -X POST \ @@ -555,179 +255,117 @@ curl http://${host_ip}:8808/rerank \ -H 'Content-Type: application/json' ``` -Output is: `[{"index":1,"score":0.9988041},{"index":0,"score":0.022948774}]` - - -### Reranking Microservice - - -The reranking microservice consumes the TEI Reranking service and pads the -response with default parameters required for the llm microservice. - -```bash -curl http://${host_ip}:8000/v1/reranking\ - -X POST \ - -d '{"initial_query":"What is Deep Learning?", "retrieved_docs": \ - [{"text":"Deep Learning is not..."}, {"text":"Deep learning is..."}]}' \ - -H 'Content-Type: application/json' -``` - -The input to the microservice is the `initial_query` and a list of retrieved -documents and it outputs the most relevant document to the initial query along -with other default parameter such as temperature, `repetition_penalty`, -`chat_template` and so on. We can also get top n documents by setting `top_n` as one -of the input parameters. For example: - -```bash -curl http://${host_ip}:8000/v1/reranking\ - -X POST \ - -d '{"initial_query":"What is Deep Learning?" ,"top_n":2, "retrieved_docs": \ - [{"text":"Deep Learning is not..."}, {"text":"Deep learning is..."}]}' \ - -H 'Content-Type: application/json' -``` - -Here is the output: - +Sample output: ```bash -{"id":"e1eb0e44f56059fc01aa0334b1dac313","query":"Human: Answer the question based only on the following context:\n Deep learning is...\n Question: What is Deep Learning?","max_new_tokens":1024,"top_k":10,"top_p":0.95,"typical_p":0.95,"temperature":0.01,"repetition_penalty":1.03,"streaming":true} - +[{"index":1,"score":0.94238955},{"index":0,"score":0.120219156}] ``` -You may notice reranking microservice are with state ('ID' and other meta data), -while reranking service are not. ### vLLM and TGI Service -In first startup, this service will take more time to download the model files. -After it's finished, the service will be ready. - -Try the command below to check whether the LLM serving is ready. - -```bash -docker logs ${CONTAINER_ID} | grep Connected -``` - -If the service is ready, you will get the response like below. 
- -``` -2024-09-03T02:47:53.402023Z INFO text_generation_router::server: router/src/server.rs:2311: Connected -``` +During the initial startup, this service will take a few minutes to download the model files and complete the warm-up process. Once this is finished, the service will be ready for use. ::::{tab-set} :::{tab-item} vllm :sync: vllm +Run the command below to check whether the LLM service is ready. The output should be "Application startup complete." + ```bash -curl http://${host_ip}:9009/v1/completions \ - -H "Content-Type: application/json" \ - -d '{"model": "meta-llama/Meta-Llama-3-8B-Instruct", \ - "prompt": "What is Deep Learning?", \ - "max_tokens": 32, "temperature": 0}' +docker logs vllm-service 2>&1 | grep complete ``` -vLLM service generates text for the input prompt. Here is the expected result -from vllm: +::: +:::{tab-item} TGI +:sync: TGI + +Run the command below to check whether the LLM service is ready. The output should be "INFO text_generation_router::server: router/src/server.rs:2311: Connected" ```bash -{"generated_text":"We have all heard the buzzword, but our understanding of it is still growing. It’s a sub-field of Machine Learning, and it’s the cornerstone of today’s Machine Learning breakthroughs.\n\nDeep Learning makes machines act more like humans through their ability to generalize from very large"} +docker logs tgi-service | grep Connected ``` -**NOTE**: After launch the vLLM, it takes few minutes for vLLM server to load -LLM model and warm up. ::: -:::{tab-item} TGI -:sync: TGI +:::: +Run the command below to use the vLLM or TGI service to generate text for the input prompt. Sample output is also shown. ```bash -curl http://${host_ip}:9009/generate \ +curl http://${host_ip}:9009/v1/chat/completions \ -X POST \ - -d '{"inputs":"What is Deep Learning?", \ - "parameters":{"max_new_tokens":17, "do_sample": true}}' \ + -d '{"model": "meta-llama/Meta-Llama-3-8B-Instruct", "messages": [{"role": "user", "content": "What is Deep Learning?"}], "max_tokens":17}' \ -H 'Content-Type: application/json' - ``` -TGI service generate text for the input prompt. Here is the expected result from TGI: - ```bash -{"generated_text":"We have all heard the buzzword, but our understanding of it is still growing. It’s a sub-field of Machine Learning, and it’s the cornerstone of today’s Machine Learning breakthroughs.\n\nDeep Learning makes machines act more like humans through their ability to generalize from very large"} +{"id":"chatcmpl-cc4300a173af48989cac841f54ebca09","object":"chat.completion","created":1743553002,"model":"meta-llama/Meta-Llama-3-8B-Instruct","choices":[{"index":0,"message":{"role":"assistant","content":"Deep learning is a subfield of machine learning that is inspired by the structure and function","tool_calls":[]},"logprobs":null,"finish_reason":"length","stop_reason":null}],"usage":{"prompt_tokens":15,"total_tokens":32,"completion_tokens":17,"prompt_tokens_details":null},"prompt_logprobs":null} ``` -**NOTE**: After launch the TGI, it takes few minutes for TGI server to load LLM model and warm up. -::: -:::: - +### Dataprep Microservice -### LLM Microservice +The knowledge base can be updated using the dataprep microservice, which extracts text from a variety of data sources, chunks the data, and embeds each chunk using the embedding microservice. Finally, the embedded vectors are stored in the Redis vector database. -This service depends on above LLM backend service startup. 
It will be ready after long time, -to wait for them being ready in first startup. +`nke-10k-2023.pdf` is Nike's annual report on a form 10-K. Run this command to download the file: +```bash +wget https://github.com/opea-project/GenAIComps/blob/v1.1/comps/retrievers/redis/data/nke-10k-2023.pdf +``` -::::{tab-set} +Upload the file: +```bash +curl -X POST "http://${host_ip}:6007/v1/dataprep" \ + -H "Content-Type: multipart/form-data" \ + -F "files=@./nke-10k-2023.pdf" +``` -:::{tab-item} vllm -:sync: vllm +HTTP links can also be added to the knowledge base. This command adds the opea.dev website. +```bash +curl -X POST "http://${host_ip}:6007/v1/dataprep" \ + -H "Content-Type: multipart/form-data" \ + -F 'link_list=["https://opea.dev"]' +``` +The list of uploaded files can be retrieved using this command: ```bash -curl http://${host_ip}:9000/v1/chat/completions \ - -X POST \ - -d '{"query":"What is Deep Learning?","max_tokens":17,"top_p":1,"temperature":0.7,\ - "frequency_penalty":0,"presence_penalty":0, "streaming":true}' \ - -H 'Content-Type: application/json' +curl -X POST "http://${host_ip}:6007/v1/dataprep/get_file" \ + -H "Content-Type: application/json" ``` -For parameters in vLLM modes, can refer to [LangChain VLLMOpenAI API](https://huggingface.co/docs/huggingface_hub/package_reference/inference_client#huggingface_hub.InferenceClient.text_generation) -::: -:::{tab-item} TGI -:sync: TGI +To delete the file or link, use the following commands: +#### Delete link ```bash -curl http://${host_ip}:9000/v1/chat/completions\ - -X POST \ - -d '{"query":"What is Deep Learning?","max_new_tokens":17,"top_k":10,"top_p":0.95,\ - "typical_p":0.95,"temperature":0.01,"repetition_penalty":1.03,"streaming":true}' \ - -H 'Content-Type: application/json' +# The dataprep service will add a .txt postfix for link file +curl -X POST "http://${host_ip}:6007/v1/dataprep/delete_file" \ + -d '{"file_path": "https://opea.dev.txt"}' \ + -H "Content-Type: application/json" ``` -For parameters in TGI modes, please refer to [HuggingFace InferenceClient API](https://huggingface.co/docs/huggingface_hub/package_reference/inference_client#huggingface_hub.InferenceClient.text_generation) (except we rename "max_new_tokens" to "max_tokens".) -::: -:::: +#### Delete file + +```bash +curl -X POST "http://${host_ip}:6007/v1/dataprep/delete_file" \ + -d '{"file_path": "nke-10k-2023.pdf"}' \ + -H "Content-Type: application/json" +``` -You will get generated text from LLM: +#### Delete all uploaded files and links ```bash -data: b'\n' -data: b'\n' -data: b'Deep' -data: b' learning' -data: b' is' -data: b' a' -data: b' subset' -data: b' of' -data: b' machine' -data: b' learning' -data: b' that' -data: b' uses' -data: b' algorithms' -data: b' to' -data: b' learn' -data: b' from' -data: b' data' -data: [DONE] +curl -X POST "http://${host_ip}:6007/v1/dataprep/delete_file" \ + -d '{"file_path": "all"}' \ + -H "Content-Type: application/json" ``` -### MegaService +### ChatQnA MegaService +This will ensure the megaservice is working properly. ```bash curl http://${host_ip}:8888/v1/chatqna -H "Content-Type: application/json" -d '{ "messages": "What is the revenue of Nike in 2023?" }' - ``` -Here is the output for your reference: - +Here is the output for reference: ```bash data: b'\n' data: b'An' @@ -764,149 +402,40 @@ data: b'' data: [DONE] ``` -## Check docker container log - -Check the log of container by: - -`docker logs -t` - - -Check the log by `docker logs f7a08f9867f9 -t`. 
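+The answer arrives as a stream of `data:` events. As a rough convenience, the token frames can be stripped to recover plain text; this sketch assumes the default streaming format shown above and leaves escape sequences such as `\n` unprocessed:
+
+```bash
+# Re-run the query and strip the "data: b'...'" framing to get readable text
+curl -sN http://${host_ip}:8888/v1/chatqna \
+    -H "Content-Type: application/json" \
+    -d '{"messages": "What is the revenue of Nike in 2023?"}' \
+    | sed -n "s/^data: b'\(.*\)'$/\1/p" | tr -d '\n'; echo
+```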
- -``` -2024-06-05T01:30:30.695934928Z error: a value is required for '--model-id ' but none was supplied -2024-06-05T01:30:30.697123534Z -2024-06-05T01:30:30.697148330Z For more information, try '--help'. - -``` - -The log indicates the `MODEL_ID` is not set. - - -::::{tab-set} - -:::{tab-item} vllm -:sync: vllm - -View the docker input parameters in `$WORKSPACE/GenAIExamples/ChatQnA/docker_compose/intel/cpu/xeon/compose.yaml` - -```yaml -vllm_service: - image: ${REGISTRY:-opea}/vllm:${TAG:-latest} - container_name: vllm-service - ports: - - "9009:80" - volumes: - - "./data:/data" - shm_size: 128g - environment: - no_proxy: ${no_proxy} - http_proxy: ${http_proxy} - https_proxy: ${https_proxy} - HF_TOKEN: ${HUGGINGFACEHUB_API_TOKEN} - LLM_MODEL_ID: ${LLM_MODEL_ID} - command: --model $LLM_MODEL_ID --host 0.0.0.0 --port 80 - -``` -::: -:::{tab-item} TGI -:sync: TGI - -View the docker input parameters in `$WORKSPACE/GenAIExamples/ChatQnA/docker_compose/intel/cpu/xeon/compose_tgi.yaml` - -```yaml - tgi-service: - image: ghcr.io/huggingface/text-generation-inference:sha-e4201f4-intel-cpu - container_name: tgi-service - ports: - - "9009:80" - volumes: - - "./data:/data" - shm_size: 1g - environment: - no_proxy: ${no_proxy} - http_proxy: ${http_proxy} - https_proxy: ${https_proxy} - HF_TOKEN: ${HUGGINGFACEHUB_API_TOKEN} - HF_HUB_DISABLE_PROGRESS_BARS: 1 - HF_HUB_ENABLE_HF_TRANSFER: 0 - command: --model-id ${LLM_MODEL_ID} --cuda-graphs 0 - -``` -::: -:::: - - -The input `MODEL_ID` is `${LLM_MODEL_ID}` - -Check environment variable `LLM_MODEL_ID` is set correctly, spelled correctly. -Set the `LLM_MODEL_ID` then restart the containers. - -Also you can check overall logs with the following command, where the -compose.yaml is the mega service docker-compose configuration file. - -::::{tab-set} - -:::{tab-item} vllm -:sync: vllm +### NGINX Service +This will ensure the NGINX ervice is working properly. ```bash -docker compose -f compose.yaml logs +curl http://${host_ip}:${NGINX_PORT}/v1/chatqna \ + -H "Content-Type: application/json" \ + -d '{"messages": "What is the revenue of Nike in 2023?"}' ``` -::: -:::{tab-item} TGI -:sync: TGI -```bash -docker compose -f compose_tgi.yaml logs -``` -::: -:::: +The output will be similar to that of the ChatQnA megaservice. ## Launch UI ### Basic UI -To access the frontend, open the following URL in your browser: http://{host_ip}:5173. By default, the UI runs on port 5173 internally. If you prefer to use a different host port to access the frontend, you can modify the port mapping in the compose.yaml file as shown below: +To access the frontend, open the following URL in a web browser: http://${host_ip}:${NGINX_PORT}. By default, the UI runs on port 5173 internally. A different host port can be used to access the frontend by modifying the port mapping in the `compose.yaml` file as shown below: ```yaml - chaqna-xeon-ui-server: + chatqna-xeon-ui-server: image: opea/chatqna-ui:${TAG:-latest} ... ports: - - "5173:5173" + - "YOUR_HOST_PORT:5173" # Change YOUR_HOST_PORT to the desired port ``` -### Conversational UI +After making this change, rebuild and restart the containers for the change to take effect. -To access the Conversational UI (react based) frontend, modify the UI service in the `compose.yaml` file. 
Replace `chaqna-xeon-ui-server` service with the `chatqna-xeon-conversation-ui-server` service as per the config below: -```yaml -chaqna-xeon-conversation-ui-server: - image: opea/chatqna-conversation-ui:${TAG:-latest} - container_name: chatqna-xeon-conversation-ui-server - environment: - - APP_BACKEND_SERVICE_ENDPOINT=${BACKEND_SERVICE_ENDPOINT} - - APP_DATA_PREP_SERVICE_URL=${DATAPREP_SERVICE_ENDPOINT} - ports: - - "5174:5174" - depends_on: - - chaqna-xeon-backend-server - ipc: host - restart: always -``` - -Once the services are up, open the following URL in your browser: http://{host_ip}:5174. By default, the UI runs on port 80 internally. If you prefer to use a different host port to access the frontend, you can modify the port mapping in the `compose.yaml` file as shown below: +## Stop the Services -```yaml - chaqna-xeon-conversation-ui-server: - image: opea/chatqna-conversation-ui:${TAG:-latest} - ... - ports: - - "80:80" +Navigate to the `docker compose` directory for this hardware platform. +```bash +cd $WORKSPACE/GenAIExamples/ChatQnA/docker_compose/intel/cpu/xeon ``` -### Stop the services - -Once you are done with the entire pipeline and wish to stop and remove all the containers, use the command below: +To stop and remove all the containers, use the command below: ::::{tab-set} :::{tab-item} vllm @@ -920,7 +449,7 @@ docker compose -f compose.yaml down :sync: TGI ```bash -docker compose -f compose.yaml down +docker compose -f compose_tgi.yaml down ``` ::: :::: diff --git a/tutorial/CodeGen/deploy/xeon.md b/tutorial/CodeGen/deploy/xeon.md index 4541d970..8cdefb4e 100644 --- a/tutorial/CodeGen/deploy/xeon.md +++ b/tutorial/CodeGen/deploy/xeon.md @@ -386,4 +386,4 @@ the newly selected model. Once you are done with the entire pipeline and wish to stop and remove all the containers, use the command below: ```bash docker compose down -``` +``` \ No newline at end of file