
Commit 68bfc0a

Update and simplify ChatQnA tutorials (#345)

Signed-off-by: alexsin368 <[email protected]>

1 parent b7f065c

6 files changed: +672 -2220 lines changed


tutorial/ChatQnA/ChatQnA_Guide.rst

Lines changed: 79 additions & 27 deletions
@@ -3,14 +3,11 @@
 ChatQnA
 ####################
 
-.. note:: This guide is in its early development and is a work-in-progress with
-   placeholder content.
-
 Overview
 ********
 
-Chatbots are a widely adopted use case for leveraging the powerful chat and
-reasoning capabilities of large language models (LLMs). The ChatQnA example
+Chatbots are a widely adopted use case for leveraging the powerful chat and
+reasoning capabilities of large language models (LLMs). The ChatQnA example
 provides the starting point for developers to begin working in the GenAI space.
 Consider it the “hello world” of GenAI applications and can be leveraged for
 solutions across wide enterprise verticals, both internally and externally.
@@ -38,16 +35,22 @@ generating human-like responses. Developers can easily swap out the generative
 model or vector database with their own custom models or databases. This allows
 developers to build chatbots that are tailored to their specific use cases and
 requirements. By combining the generative model with the vector database, RAG
-can provide accurate and contextually relevant responses specific to your users'
+can provide accurate and contextually relevant responses specific to users'
 queries.
 
 The ChatQnA example is designed to be a simple, yet powerful, demonstration of
 the RAG architecture. It is a great starting point for developers looking to
 build chatbots that can provide accurate and up-to-date information to users.
 
-To facilitate sharing of individual services across multiple GenAI applications, use the GenAI Microservices Connector (GMC) to deploy your application. Apart from service sharing , it also supports specifying sequential, parallel, and alternative steps in a GenAI pipeline. In so doing, it supports dynamic switching between models used in any stage of a GenAI pipeline. For example, within the ChatQnA pipeline, using GMC one could switch the model used in the embedder, re-ranker, and/or the LLM.
-Upstream Vanilla Kubernetes or Red Hat OpenShift Container
-Platform (RHOCP) can be used with or without GMC, while use with GMC provides additional features.
+To facilitate sharing of individual services across multiple GenAI applications,
+use the GenAI Microservices Connector (GMC) to deploy the application. Apart
+from service sharing, it also supports specifying sequential, parallel, and
+alternative steps in a GenAI pipeline. In so doing, it supports dynamic switching
+between models used in any stage of a GenAI pipeline. For example, within the
+ChatQnA pipeline, using GMC one could switch the model used in the embedder,
+re-ranker, and/or the LLM. Upstream Vanilla Kubernetes or Red Hat OpenShift Container
+Platform (RHOCP) can be used with or without GMC, while use with GMC provides
+additional features.
 
 The ChatQnA provides several deployment options, including single-node
 deployments on-premise or in a cloud environment using hardware such as Xeon
@@ -126,7 +129,53 @@ For more details, please refer to the following document:
 Expected Output
 ===============
 
-TBD
+After launching the ChatQnA application, a curl command can be used to ensure the
+megaservice is working properly. The example below assumes a document containing
+new information is uploaded to the vector database before querying.
+
+.. code-block:: bash
+
+   curl http://${host_ip}:8888/v1/chatqna -H "Content-Type: application/json" -d '{
+      "messages": "What is the revenue of Nike in 2023?"
+      }'
+
+Here is the output for reference:
+
+.. code-block:: bash
+
+   data: b'\n'
+   data: b'An'
+   data: b'swer'
+   data: b':'
+   data: b' In'
+   data: b' fiscal'
+   data: b' '
+   data: b'2'
+   data: b'0'
+   data: b'2'
+   data: b'3'
+   data: b','
+   data: b' N'
+   data: b'I'
+   data: b'KE'
+   data: b','
+   data: b' Inc'
+   data: b'.'
+   data: b' achieved'
+   data: b' record'
+   data: b' Rev'
+   data: b'en'
+   data: b'ues'
+   data: b' of'
+   data: b' $'
+   data: b'5'
+   data: b'1'
+   data: b'.'
+   data: b'2'
+   data: b' billion'
+   data: b'.'
+   data: b'</s>'
+   data: [DONE]
+
+The UI will show a similar response with formatted output.
 
 Validation Matrix and Prerequisites
 ===================================
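
For the query above to return the new information, the document has to be ingested into the vector database first. A minimal sketch of such an upload, assuming a dataprep microservice exposed on port 6007 with a ``/v1/dataprep`` route and an example Nike 10-K file (all three are assumptions, not part of this commit; check the actual deployment):

.. code-block:: bash

   # Hypothetical ingestion call: host_ip, port 6007, the /v1/dataprep route,
   # and the file name are placeholders; verify against the running dataprep service.
   curl -X POST "http://${host_ip}:6007/v1/dataprep" \
        -F "files=@./nke-10k-2023.pdf"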
@@ -217,9 +266,9 @@ The gateway serves as the interface for users to access. The gateway routes inco
 Deployment
 **********
 
-Here are some deployment options depending on your hardware and environment.
+Here are some deployment options depending on the hardware and environment.
 It includes both single-node and orchestrated multi-node configurations.
-Choose the one that best fits your requirements.
+Choose the one that best fits the requirements.
 
 Single Node
 ***********
@@ -253,21 +302,19 @@ Troubleshooting
 
 1. Browser interface https link failed
 
-Q:For example, started ChatQnA example in IBM Cloud and trying to access the UI interface. By default, typing the xx.xx.xx.xx:5173 resolves to https://xx.xx.xx.xx:5173. Chrome shows the following warning message:xx.xx.xx.xx doesn't support a secure connection
+Q: For example, the ChatQnA example is started in IBM Cloud and the UI interface is accessed. By default, typing xx.xx.xx.xx:5173 resolves to https://xx.xx.xx.xx:5173. Chrome shows the following warning message: xx.xx.xx.xx doesn't support a secure connection
 
-A: That is because by default, the browser resolves xx.xx.xx.xx:5173 to https://xx.xx.xx.xx:5173. But to meet security requirements, users need to deploy a certificate to enable HTTPS support in some cloud environments. OPEA provides HTTP services by default,but also supports HTTPS. To enable HTTPS, you can specify the certificate file paths in the MicroService class. For more details, please refer to the `source code <https://github.com/opea-project/GenAIComps/blob/main/comps/cores/mega/micro_service.py#L33>`_.
+A: By default, the browser resolves xx.xx.xx.xx:5173 to https://xx.xx.xx.xx:5173. But to meet security requirements, users need to deploy a certificate to enable HTTPS support in some cloud environments. OPEA provides HTTP services by default, but also supports HTTPS. To enable HTTPS, specify the certificate file paths in the MicroService class. For more details, please refer to the `source code <https://github.com/opea-project/GenAIComps/blob/main/comps/cores/mega/micro_service.py#L33>`_.
 
 2. For other troubles, please check the `doc <https://opea-project.github.io/latest/GenAIExamples/ChatQnA/docker_compose/intel/hpu/gaudi/how_to_validate_service.html>`_.
 
 
 Monitoring
 **********
 
-Now that you have deployed the ChatQnA example, let's talk about monitoring the performance of the microservices in the ChatQnA pipeline.
-
-Monitoring the performance of microservices is crucial for ensuring the smooth operation of the generative AI systems. By monitoring metrics such as latency and throughput, you can identify bottlenecks, detect anomalies, and optimize the performance of individual microservices. This allows us to proactively address any issues and ensure that the ChatQnA pipeline is running efficiently.
+Monitoring the performance of microservices is crucial for ensuring the smooth operation of generative AI systems. Monitoring metrics such as latency and throughput helps identify bottlenecks, detect anomalies, and optimize the performance of individual microservices. This helps proactively address any issues and ensure that the ChatQnA pipeline is running efficiently.
 
-This document will help you understand how to monitor in real time the latency, throughput, and other metrics of different microservices. You will use **Prometheus** and **Grafana**, both open-source toolkits, to collect metrics and visualize them in a dashboard.
+**Prometheus** and **Grafana**, both open-source toolkits, are used to collect metrics including latency and throughput of different microservices in real time, and visualize them in a dashboard.
 
 Set Up the Prometheus Server
 ============================
@@ -303,7 +350,7 @@ Edit the `prometheus.yml` file:
 
    vim prometheus.yml
 
-Change the ``job_name`` to the name of the microservice you want to monitor. Also change the ``targets`` to the job target endpoint of that microservice. Make sure the service is running and the port is open, and that it exposes the metrics that follow Prometheus convention at the ``/metrics`` endpoint.
+Change the ``job_name`` to the name of the microservice to monitor. Also change the ``targets`` to the job target endpoint of that microservice. Make sure the service is running and the port is open, and that it exposes the metrics that follow Prometheus convention at the ``/metrics`` endpoint.
 
 Here is an example of exporting metrics data from a TGI microservice to Prometheus:
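
The example itself sits outside this hunk. As a rough sketch only, assuming a hypothetical TGI endpoint at ``localhost:9009``, a scrape job of the kind described could be appended to `prometheus.yml` like so:

.. code-block:: bash

   # Sketch only: the "tgi" job name and localhost:9009 target are assumed
   # placeholders; point targets at the real host:port of the service.
   cat >> prometheus.yml <<'EOF'
     - job_name: "tgi"
       metrics_path: /metrics
       static_configs:
         - targets: ["localhost:9009"]
   EOF

For Prometheus to pick the entry up, it must line up under the existing ``scrape_configs:`` key in the file.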

@@ -346,7 +393,7 @@ nohup ./prometheus --config.file=./prometheus.yml &
 
 >Note: Before starting Prometheus, ensure that no other processes are running on the designated port (default is 9090). Otherwise, Prometheus will not be able to scrape the metrics.
 
-On the Prometheus UI, you can see the status of the targets and the metrics that are being scraped. You can search for a metrics variable by typing it in the search bar.
+On the Prometheus UI, look at the status of the targets and the metrics that are being scraped. To search for a metrics variable, type it in the search bar.
 
 The TGI metrics can be accessed at:
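
The exact URL sits outside this hunk. Since monitored services expose Prometheus-convention metrics at the ``/metrics`` endpoint, a quick manual check could look like the following (the host and port are hypothetical placeholders):

.. code-block:: bash

   # Substitute the address the TGI service actually listens on.
   curl http://${host_ip}:9009/metrics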

@@ -385,7 +432,7 @@ Run the Grafana server, without hanging-up the process:
    nohup ./bin/grafana-server &
 
 3. Access the Grafana dashboard UI:
-On your browser, access the Grafana dashboard UI at the following URL:
+On a web browser, access the Grafana dashboard UI at the following URL:
 
 .. code-block:: bash
 
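The URL in the code block above falls outside this hunk. Grafana listens on port 3000 by default, so with default settings the address is typically:

.. code-block:: bash

   # Grafana's default listen port is 3000; adjust if it was changed.
   http://localhost:3000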
@@ -401,23 +448,28 @@ Log in to Grafana using the default credentials:
    password: admin
 
 4. Add Prometheus as a data source:
-You need to configure the data source for Grafana to scrape data from. Click on the "Data Source" button, select Prometheus, and specify the Prometheus URL ``http://localhost:9090``.
+The data source for Grafana needs to be configured to scrape data. Click on the "Data Source" button, select Prometheus, and specify the Prometheus URL ``http://localhost:9090``.
 
-Then, you need to upload a JSON file for the dashboard's configuration. You can upload it in the Grafana UI under ``Home > Dashboards > Import dashboard``. A sample JSON file is supported here: `tgi_grafana.json <https://github.com/huggingface/text-generation-inference/blob/main/assets/tgi_grafana.json>`_
+Then, upload a JSON file for the dashboard's configuration. Upload it in the Grafana UI under ``Home > Dashboards > Import dashboard``. A sample JSON file is provided here: `tgi_grafana.json <https://github.com/huggingface/text-generation-inference/blob/main/assets/tgi_grafana.json>`_
 
 5. View the dashboard:
-Finally, open the dashboard in the Grafana UI, and you will see different panels displaying the metrics data.
+Finally, open the dashboard in the Grafana UI to see different panels displaying the metrics data.
 
-Taking the TGI microservice as an example, you can see the following metrics:
+Taking the TGI microservice as an example, look at the following metrics:
 * Time to first token
 * Decode per-token latency
 * Throughput (generated tokens/sec)
 * Number of tokens per prompt
 * Number of generated tokens per request
 
-You can also monitor the incoming requests to the microservice, the response time per token, etc., in real time.
+Incoming requests to the microservice, the response time per token, etc., can also be monitored in real time.
 
 Summary and Next Steps
 =======================
 
-TBD
+The ChatQnA application deploys a RAG architecture consisting of the following microservices:
+embedding, vectorDB, retrieval, reranker, and LLM text generation. It is a chatbot that can
+leverage new information from uploaded documents and websites to provide more accurate answers.
+The microservices can be customized by modifying and building them in `GenAIComponents <https://github.com/opea-project/GenAIComps>`_.
+Explore additional `GenAIExamples <https://github.com/opea-project/GenAIExamples>`_ and use them
+as starting points for other use cases.
