tutorial/ChatQnA/ChatQnA_Guide.rst
@@ -3,14 +3,11 @@
ChatQnA
####################
Overview
********
Chatbots are a widely adopted use case for leveraging the powerful chat and
reasoning capabilities of large language models (LLMs). The ChatQnA example
provides the starting point for developers to begin working in the GenAI space.
Consider it the “hello world” of GenAI applications; it can be leveraged for
solutions across a wide range of enterprise verticals, both internally and externally.
@@ -38,16 +35,22 @@ generating human-like responses. Developers can easily swap out the generative
model or vector database with their own custom models or databases. This allows
developers to build chatbots that are tailored to their specific use cases and
requirements. By combining the generative model with the vector database, RAG
can provide accurate and contextually relevant responses specific to users'
queries.
The ChatQnA example is designed to be a simple, yet powerful, demonstration of
the RAG architecture. It is a great starting point for developers looking to
build chatbots that can provide accurate and up-to-date information to users.
To facilitate sharing of individual services across multiple GenAI applications,
use the GenAI Microservices Connector (GMC) to deploy the application. Apart
from service sharing, it also supports specifying sequential, parallel, and
alternative steps in a GenAI pipeline. In so doing, it supports dynamic switching
between models used in any stage of a GenAI pipeline. For example, within the
ChatQnA pipeline, GMC can be used to switch the model used in the embedder,
re-ranker, and/or the LLM. Upstream vanilla Kubernetes or Red Hat OpenShift Container
Platform (RHOCP) can be used with or without GMC, while use with GMC provides
additional features.
The ChatQnA provides several deployment options, including single-node
deployments on-premise or in a cloud environment using hardware such as Xeon
@@ -126,7 +129,53 @@ For more details, please refer to the following document:
Expected Output
===============
After launching the ChatQnA application, a curl command can be used to ensure the
megaservice is working properly. The example below assumes a document containing
new information has been uploaded to the vector database before querying.
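For instance, a document can be ingested through the dataprep microservice. This is only a sketch: the host, port (6007), endpoint path, and file name below are assumptions and should be adjusted to match the deployed dataprep service.

.. code-block:: bash

   # Upload a local document (e.g., a Nike 10-K filing) so its contents are
   # embedded and stored in the vector database. Host, port, and file name
   # are placeholders.
   curl -X POST "http://${host_ip}:6007/v1/dataprep" \
        -H "Content-Type: multipart/form-data" \
        -F "files=@./nke-10k-2023.pdf"

With a document ingested, query the megaservice: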
"messages": "What is the revenue of Nike in 2023?"
139
+
}'
Here is the output for reference:

.. code-block:: bash

   data: b'\n'
   data: b'An'
   data: b'swer'
   data: b':'
   data: b' In'
   data: b' fiscal'
   data: b''
   data: b'2'
   data: b'0'
   data: b'2'
   data: b'3'
   data: b','
   data: b' N'
   data: b'I'
   data: b'KE'
   data: b','
   data: b' Inc'
   data: b'.'
   data: b' achieved'
   data: b' record'
   data: b' Rev'
   data: b'en'
   data: b'ues'
   data: b' of'
   data: b' $'
   data: b'5'
   data: b'1'
   data: b'.'
   data: b'2'
   data: b' billion'
   data: b'.'
   data: b'</s>'
   data: [DONE]
The UI will show a similar response with formatted output.

Validation Matrix and Prerequisites
===================================
@@ -217,9 +266,9 @@ The gateway serves as the interface for users to access. The gateway routes inco
Deployment
**********
Here are some deployment options depending on the hardware and environment.
These include both single-node and orchestrated multi-node configurations.
Choose the one that best fits the requirements.
Single Node
***********
@@ -253,21 +302,19 @@ Troubleshooting
1. Browser interface https link failed

Q: For example, you started the ChatQnA example in IBM Cloud and are trying to access the UI. By default, typing xx.xx.xx.xx:5173 resolves to https://xx.xx.xx.xx:5173, and Chrome shows the following warning message: "xx.xx.xx.xx doesn't support a secure connection".
A: By default, the browser resolves xx.xx.xx.xx:5173 to https://xx.xx.xx.xx:5173. But to meet security requirements, users need to deploy a certificate to enable HTTPS support in some cloud environments. OPEA provides HTTP services by default, but also supports HTTPS. To enable HTTPS, specify the certificate file paths in the MicroService class. For more details, please refer to the `source code <https://github.com/opea-project/GenAIComps/blob/main/comps/cores/mega/micro_service.py#L33>`_.
2. For other issues, please check the `doc <https://opea-project.github.io/latest/GenAIExamples/ChatQnA/docker_compose/intel/hpu/gaudi/how_to_validate_service.html>`_.

Monitoring
**********
Monitoring the performance of microservices is crucial for ensuring the smooth operation of generative AI systems. Monitoring metrics such as latency and throughput helps identify bottlenecks, detect anomalies, and optimize the performance of individual microservices, making it possible to proactively address issues and keep the ChatQnA pipeline running efficiently.

**Prometheus** and **Grafana**, both open-source toolkits, are used to collect metrics, including the latency and throughput of different microservices, in real time and to visualize them in a dashboard.
Set Up the Prometheus Server
============================
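As a quick reference, downloading and unpacking a Prometheus release usually looks like the sketch below; the version number is only an example, so check https://prometheus.io/download/ for the current release:

.. code-block:: bash

   # Download and extract a Prometheus release (version shown is an example)
   wget https://github.com/prometheus/prometheus/releases/download/v2.52.0/prometheus-2.52.0.linux-amd64.tar.gz
   tar -xvzf prometheus-2.52.0.linux-amd64.tar.gz
   cd prometheus-2.52.0.linux-amd64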
@@ -303,7 +350,7 @@ Edit the `prometheus.yml` file:
.. code-block:: bash

   vim prometheus.yml
Change the ``job_name`` to the name of the microservice to monitor. Also change the ``targets`` to the job target endpoint of that microservice. Make sure the service is running and the port is open, and that it exposes metrics that follow the Prometheus convention at the ``/metrics`` endpoint.
Here is an example of exporting metrics data from a TGI microservice to Prometheus:
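A minimal sketch of such a scrape job is shown below; it assumes ``scrape_configs:`` is the last section of ``prometheus.yml`` and that the TGI service exposes metrics at ``localhost:9009``, so adjust the target to the actual host and port:

.. code-block:: bash

   # Append a scrape job for the TGI service to prometheus.yml
   cat >> prometheus.yml <<'EOF'
     - job_name: tgi
       metrics_path: /metrics
       static_configs:
         - targets: ["localhost:9009"]
   EOF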
.. note:: Before starting Prometheus, ensure that no other process is running on the designated port (default is 9090). Otherwise, Prometheus will not be able to start and scrape the metrics.
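One way to verify the port and then launch Prometheus in the background is sketched below; it assumes the commands are run from the directory containing the ``prometheus`` binary and the edited ``prometheus.yml``:

.. code-block:: bash

   # Verify that nothing else is listening on Prometheus's default port
   ss -ltn | grep ':9090' || echo "port 9090 is free"

   # Start Prometheus with the edited configuration, detached from the terminal
   nohup ./prometheus --config.file=./prometheus.yml > prometheus.log 2>&1 &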
On the Prometheus UI, check the status of the targets and the metrics that are being scraped. To search for a specific metric, type its name in the search bar.
The TGI metrics can be accessed at:
@@ -385,7 +432,7 @@ Run the Grafana server without hanging up the process:
.. code-block:: bash

   nohup ./bin/grafana-server &
3. Access the Grafana dashboard UI:
On a web browser, access the Grafana dashboard UI at the following URL:
.. code-block:: bash
   http://localhost:3000
@@ -401,23 +448,28 @@ Log in to Grafana using the default credentials:
.. code-block:: bash

   username: admin
   password: admin
4. Add Prometheus as a data source:
Grafana needs a data source to scrape metrics from. Click on the "Data Source" button, select Prometheus, and specify the Prometheus URL ``http://localhost:9090``.
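Alternatively, the same data source can be created from the command line through Grafana's HTTP API. The sketch below assumes Grafana is listening locally on its default port (3000) with the default admin credentials:

.. code-block:: bash

   # Register Prometheus as a Grafana data source via the HTTP API
   curl -X POST http://admin:admin@localhost:3000/api/datasources \
        -H "Content-Type: application/json" \
        -d '{
              "name": "Prometheus",
              "type": "prometheus",
              "url": "http://localhost:9090",
              "access": "proxy",
              "isDefault": true
            }'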
Then, upload a JSON file for the dashboard's configuration. Upload it in the Grafana UI under ``Home > Dashboards > Import dashboard``. A sample JSON file is provided here: `tgi_grafana.json <https://github.com/huggingface/text-generation-inference/blob/main/assets/tgi_grafana.json>`_
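To have the sample dashboard definition on hand for the import step, it can be fetched directly; the raw URL below is simply the raw-file form of the link above:

.. code-block:: bash

   # Download the sample TGI dashboard JSON, then import it in the Grafana UI
   # under Home > Dashboards > Import dashboard
   wget https://raw.githubusercontent.com/huggingface/text-generation-inference/main/assets/tgi_grafana.json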
5. View the dashboard:
Finally, open the dashboard in the Grafana UI to see different panels displaying the metrics data.

Taking the TGI microservice as an example, look at the following metrics:

* Time to first token
* Decode per-token latency
* Throughput (generated tokens/sec)
* Number of tokens per prompt
* Number of generated tokens per request
Incoming requests to the microservice, the response time per token, etc., can also be monitored in real time.
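Beyond the dashboard, individual metrics can also be pulled straight from Prometheus's HTTP API, which is handy for quick checks or scripting. The metric name below is an assumption based on TGI's ``tgi_``-prefixed metrics; browse the ``/metrics`` endpoint of the TGI service to confirm the exact names it exposes:

.. code-block:: bash

   # Ask Prometheus for the per-second TGI request rate over the last 5 minutes
   curl -sG 'http://localhost:9090/api/v1/query' \
        --data-urlencode 'query=rate(tgi_request_count[5m])'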
Summary and Next Steps
=======================
The ChatQnA application deploys a RAG architecture consisting of the following microservices:
embedding, vectorDB, retrieval, reranker, and LLM text generation. It is a chatbot that can
leverage new information from uploaded documents and websites to provide more accurate answers.
The microservices can be customized by modifying and building them in `GenAIComponents <https://github.com/opea-project/GenAIComps>`_.
Explore additional `GenAIExamples <https://github.com/opea-project/GenAIExamples>`_ and use them as a starting point for other GenAI applications.