
Commit 5e943fe

Merge pull request #92 from marklogic/feature/remove-langchain-example
Updated example project to point to ai-examples repo
2 parents 9e94b5e + 600fe1c commit 5e943fe

23 files changed: +1 −1044 lines changed

examples/langchain/.gitignore

-4
This file was deleted.

examples/langchain/README.md

+1 −166
@@ -1,166 +1 @@
# Example langchain retriever

This project demonstrates one approach for implementing a
[langchain retriever](https://python.langchain.com/docs/modules/data_connection/)
that allows for
[Retrieval Augmented Generation (RAG)](https://python.langchain.com/docs/use_cases/question_answering/)
to be supported via MarkLogic and the MarkLogic Python Client. This example uses the same data as in
[the langchain RAG quickstart guide](https://python.langchain.com/docs/use_cases/question_answering/quickstart),
but with the data having first been loaded into MarkLogic.

**This is only intended as an example** of how easily a langchain retriever can be developed
using the MarkLogic Python Client. The queries in this example are simple and naturally
do not have any knowledge of how your data is modeled in MarkLogic. You are encouraged to use
this as an example for developing your own retriever, where you can build a query based on a
question submitted to langchain that fully leverages the indexes and data models in your MarkLogic
application. Additionally, please see the
[langchain documentation on splitting text](https://python.langchain.com/docs/modules/data_connection/document_transformers/). You may need to restructure your data so that you have a larger number of
smaller documents in your database so that you do not exceed the limit that langchain imposes on how
much data a retriever can return.
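For reference, a bare-bones custom retriever built on the MarkLogic Python Client might look like the
sketch below. The class name, connection details, and the simple string search against the REST
`/v1/search` endpoint are illustrative assumptions only; the retrievers in this project use different,
more specialized queries.

```python
from typing import Any, List

from langchain_core.callbacks import CallbackManagerForRetrieverRun
from langchain_core.documents import Document
from langchain_core.retrievers import BaseRetriever
from marklogic import Client


class SimpleMarkLogicRetriever(BaseRetriever):
    """Returns MarkLogic documents matching the words in the user's question."""

    # Typed as Any so pydantic does not need special handling for the
    # requests.Session-based MarkLogic client.
    client: Any
    collection: str = "posts"
    page_length: int = 10

    def _get_relevant_documents(
        self, query: str, *, run_manager: CallbackManagerForRetrieverRun
    ) -> List[Document]:
        # Replace this simple string search with a query that leverages your own
        # indexes and data model (similar query, word query, vector query, etc.).
        response = self.client.get(
            "/v1/search",
            params={
                "q": query,
                "collection": self.collection,
                "pageLength": self.page_length,
                "format": "json",
            },
        )
        uris = [result["uri"] for result in response.json().get("results", [])]
        docs = []
        for uri in uris:
            content = self.client.get("/v1/documents", params={"uri": uri}).text
            docs.append(Document(page_content=content, metadata={"uri": uri}))
        return docs


if __name__ == "__main__":
    # Credentials and port are placeholders; adjust for your environment.
    retriever = SimpleMarkLogicRetriever(
        client=Client("http://localhost:8003", digest=("langchain-user", "password"))
    )
    print(retriever.invoke("What is task decomposition?"))
```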
# Setup

To try out this project, use [docker-compose](https://docs.docker.com/compose/) to instantiate a new MarkLogic
instance with port 8003 available (you can use your own MarkLogic instance too, just be sure that port 8003
is available):

    docker-compose up -d --build
## Deploy With Gradle

Then deploy a small REST API application to MarkLogic, which includes a basic non-admin MarkLogic user
named `langchain-user`:

    ./gradlew -i mlDeploy
## Install Python Libraries

Next, create a new Python virtual environment - [pyenv](https://github.com/pyenv/pyenv) is recommended for this -
and install the
[langchain example dependencies](https://python.langchain.com/docs/use_cases/question_answering/quickstart#dependencies),
along with the MarkLogic Python Client:

    pip install -U langchain langchain_openai langchain-community langchainhub openai chromadb bs4 marklogic_python_client
## Load Sample Data

Then run the following Python program to load text data from the langchain quickstart guide
into two different collections in the `langchain-test-content` database:

    python load_data.py
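For context, a loader along these lines might look like the following sketch, which only covers the
"posts" portion of the data. The source URL and parsing filter come from the langchain quickstart, while
the chunking parameters, URI scheme, and plain REST `PUT /v1/documents` calls are assumptions for
illustration rather than the actual contents of `load_data.py`.

```python
import bs4
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.document_loaders import WebBaseLoader
from marklogic import Client

# Connection details are placeholders; adjust for your environment.
client = Client("http://localhost:8003", digest=("langchain-user", "password"))

# Load the blog post used by the langchain RAG quickstart and split it into chunks.
loader = WebBaseLoader(
    web_paths=("https://lilianweng.github.io/posts/2023-06-23-agent/",),
    bs_kwargs={
        "parse_only": bs4.SoupStrainer(class_=("post-content", "post-title", "post-header"))
    },
)
splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
splits = splitter.split_documents(loader.load())

# Write each chunk as a small text document in the "posts" collection via the
# REST documents endpoint exposed through the client's requests.Session interface.
for i, split in enumerate(splits):
    client.put(
        "/v1/documents",
        params={"uri": f"/posts/{i}.txt", "collection": "posts"},
        data=split.page_content.encode("utf-8"),
        headers={"Content-Type": "text/plain"},
    )
```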
## Create Python Environment File

Create a ".env" file to hold your AzureOpenAI environment values. It should look
something like this:

```
OPENAI_API_VERSION=2023-12-01-preview
AZURE_OPENAI_ENDPOINT=<Your Azure OpenAI Endpoint>
AZURE_OPENAI_API_KEY=<Your Azure OpenAI API Key>
AZURE_LLM_DEPLOYMENT_NAME=gpt-test1-gpt-35-turbo
AZURE_LLM_DEPLOYMENT_MODEL=gpt-35-turbo
```
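These values are typically consumed with `python-dotenv` and the `langchain_openai` Azure wrapper; a
minimal sketch (assuming `python-dotenv` is installed) looks like this:

```python
import os

from dotenv import load_dotenv
from langchain_openai import AzureChatOpenAI

# Reads OPENAI_API_VERSION, AZURE_OPENAI_ENDPOINT, and AZURE_OPENAI_API_KEY from
# the .env file; AzureChatOpenAI picks them up from the environment.
load_dotenv()

llm = AzureChatOpenAI(
    azure_deployment=os.environ["AZURE_LLM_DEPLOYMENT_NAME"],
    model=os.environ["AZURE_LLM_DEPLOYMENT_MODEL"],
)

print(llm.invoke("Reply with one word: hello").content)
```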
# Testing the retriever

## Testing using a retriever with a basic query

You are now ready to test the example retriever. Run the following to ask a question
with the results augmented via the `marklogic_similar_query_retriever.py` module in this
project:

    python ask_similar_query.py "What is task decomposition?" posts

The retriever uses a [cts.similarQuery](https://docs.marklogic.com/cts.similarQuery) to
select from the documents loaded via `load_data.py`. It defaults to a page length of 10.
You can change this by providing a command line argument - e.g.:

    python ask_similar_query.py "What is task decomposition?" posts 15

Example of a question for the "sotu" (State of the Union speech) collection:

    python ask_similar_query.py "What are economic sanctions?" sotu 20

To use a word query instead of a similar query, along with a set of drop words, specify
"word" as the 4th argument:

    python ask_similar_query.py "What are economic sanctions?" sotu 20 word
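For orientation, the `ask_*` scripts wire a retriever like this into a standard langchain RAG chain.
A generic sketch of that wiring follows; the `simple_marklogic_retriever` module name is hypothetical
(it stands in for the retriever sketch shown earlier), and the scripts' exact prompts and structure may
differ.

```python
import os

from dotenv import load_dotenv
from langchain import hub
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough
from langchain_openai import AzureChatOpenAI
from marklogic import Client

# Hypothetical module holding the SimpleMarkLogicRetriever sketch shown earlier.
from simple_marklogic_retriever import SimpleMarkLogicRetriever

load_dotenv()

llm = AzureChatOpenAI(
    azure_deployment=os.environ["AZURE_LLM_DEPLOYMENT_NAME"],
    model=os.environ["AZURE_LLM_DEPLOYMENT_MODEL"],
)

retriever = SimpleMarkLogicRetriever(
    client=Client("http://localhost:8003", digest=("langchain-user", "password"))
)

# Standard langchain RAG wiring, as in the langchain quickstart.
prompt = hub.pull("rlm/rag-prompt")


def format_docs(docs):
    return "\n\n".join(doc.page_content for doc in docs)


rag_chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser()
)

print(rag_chain.invoke("What is task decomposition?"))
```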
## Testing using a retriever with a contextual query

There may be times when your langchain application needs to use both a question and a
structured query during the document retrieval process. To see an example of this, run
the following to ask a question that is combined with a hard-coded structured query via
the `marklogic_contextual_query_retriever.py` module in this project:

    python ask_contextual_query.py "What is task decomposition?" posts

This retriever builds a term-query using words from the question. The term-query is then
added to the structured query, and the merged query is used to select from the documents
loaded via `load_data.py`.
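The structured query side of this is plain MarkLogic structured-query JSON. A hedged illustration of
merging the question's words into such a query is shown below; the collection constraint stands in for
whatever hard-coded structured query your application needs, and the commented `client.post` call is one
way the merged query could be sent to the REST search endpoint.

```python
import json


def build_contextual_query(question: str, collection: str) -> dict:
    """Combine a term-query built from the question's words with a
    structured query (here, just a collection constraint)."""
    return {
        "query": {
            "queries": [
                {
                    "and-query": {
                        "queries": [
                            {"collection-query": {"uri": [collection]}},
                            {"term-query": {"text": question.split()}},
                        ]
                    }
                }
            ]
        }
    }


structured_query = build_contextual_query("What is task decomposition?", "posts")
print(json.dumps(structured_query, indent=2))

# The merged query can then be POSTed to the REST search endpoint, e.g.:
# response = client.post("/v1/search?format=json&pageLength=10", json=structured_query)
```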
## Testing using MarkLogic 12EA Vector Search

### MarkLogic 12EA Setup

To try this functionality out, you will need access to an instance of MarkLogic 12
(currently internal or Early Access only).
<TODO>Add info to get ML12</TODO>
You may use [docker-compose](https://docs.docker.com/compose/) to instantiate a new MarkLogic
instance with port 8003 available (you can use your own MarkLogic instance too, just be
sure that port 8003 is available):

    docker-compose -f docker-compose-12.yml up -d --build
### Deploy With Gradle

You will also need to deploy the application. However, for this example, you will need
to include an additional switch on the command line to deploy a TDE schema that takes
advantage of the vector capabilities in MarkLogic 12:

    ./gradlew -i mlDeploy -PmlSchemasPath=src/main/ml-schemas-12
### Install Python Libraries

As above, if you have not yet installed the Python libraries, install them with pip:

```
pip install -U langchain langchain_openai langchain-community langchainhub openai chromadb bs4 marklogic_python_client
```
### Create Python Environment File

The Python script for this example also generates LLM embeddings and includes them in
the documents stored in MarkLogic. In order to generate the embeddings, you'll need to
add the following environment variables (with your values) to the .env file created
above.

```
AZURE_EMBEDDING_DEPLOYMENT_NAME=text-test-embedding-ada-002
AZURE_EMBEDDING_DEPLOYMENT_MODEL=text-embedding-ada-002
```
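As with the LLM settings, these values are typically consumed through `langchain_openai`; a minimal
sketch:

```python
import os

from dotenv import load_dotenv
from langchain_openai import AzureOpenAIEmbeddings

load_dotenv()

# The endpoint, API key, and API version come from the variables already in .env.
embeddings = AzureOpenAIEmbeddings(
    azure_deployment=os.environ["AZURE_EMBEDDING_DEPLOYMENT_NAME"],
    model=os.environ["AZURE_EMBEDDING_DEPLOYMENT_MODEL"],
)

vector = embeddings.embed_query("What is task decomposition?")
print(len(vector))  # text-embedding-ada-002 produces 1536-dimensional vectors
```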
### Load Sample Data

Then run the following Python program to load text data from the langchain quickstart
guide into two different collections in the `langchain-test-content` database. Note that
this script is different from the one in the earlier setup section and loads the data
into different collections.

```
python load_data_with_embeddings.py
```
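Conceptually, each chunk ends up stored as JSON with both its text and its embedding so that the vector
query can compare embeddings server-side. The rough sketch below continues the earlier loader and
embeddings sketches; the field names, URIs, and the reused `client`, `splits`, and `embeddings` objects
are assumptions, not the exact structure produced by `load_data_with_embeddings.py`.

```python
# Continues the earlier sketches: `client`, `splits`, and `embeddings` are
# assumed to be defined as shown above.
for i, split in enumerate(splits):
    doc = {
        "text": split.page_content,
        "embedding": embeddings.embed_query(split.page_content),
    }
    client.put(
        "/v1/documents",
        params={
            "uri": f"/posts_with_embeddings/{i}.json",
            "collection": "posts_with_embeddings",
        },
        json=doc,
    )
```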
### Running the Vector Query

You are now ready to test the example vector retriever. Run the following to ask a
question with the results augmented via the `marklogic_vector_query_retriever.py` module
in this project:

    python ask_vector_query.py "What is task decomposition?" posts_with_embeddings

This retriever searches MarkLogic for candidate documents, and defaults to
using the new score-bm25 scoring method in MarkLogic 12EA. If preferred, you can adjust
this to one of the other scoring methods. After retrieving candidate documents based on
the CTS search, the retriever uses the new vector functionality to sort the documents
based on cosine similarity to the user question, and then returns the top N documents
for the retriever to package up.
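Conceptually, the re-ranking step works like the simplified sketch below; the real retriever performs
the cosine-similarity comparison inside MarkLogic 12 using its vector functions, so this pure-Python
version is only meant to illustrate the idea:

```python
import math
from typing import List


def cosine_similarity(a: List[float], b: List[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)


def rerank(
    question_embedding: List[float], candidate_docs: List[dict], top_n: int = 10
) -> List[dict]:
    """Sort bm25-scored candidate documents (each with an "embedding" field)
    by cosine similarity to the question and keep the top N."""
    return sorted(
        candidate_docs,
        key=lambda doc: cosine_similarity(question_embedding, doc["embedding"]),
        reverse=True,
    )[:top_n]
```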
This example project has been moved to the [MarkLogic AI examples repository](https://github.com/marklogic/marklogic-ai-examples).

examples/langchain/ask_contextual_query.py

-72
This file was deleted.

examples/langchain/ask_similar_query.py

-48
This file was deleted.

examples/langchain/ask_vector_query.py

-53
This file was deleted.

examples/langchain/build.gradle

-4
This file was deleted.

examples/langchain/docker-compose-12.yml

-17
This file was deleted.

examples/langchain/docker-compose.yml

-17
This file was deleted.

examples/langchain/gradle.properties

-4
This file was deleted.
Binary file not shown.

0 commit comments
