Fix all broken issues up to now #79

Open
wants to merge 2 commits into
base: master
4 changes: 3 additions & 1 deletion .gitignore
@@ -136,4 +136,6 @@ dmypy.json
 .DS_Store
 
 vectorstore.pkl
-langchain.readthedocs.io/
+python.langchain.com/
+
+.venv
6 changes: 3 additions & 3 deletions README.md
@@ -30,11 +30,11 @@ There are two components: ingestion and question-answering.
 Ingestion has the following steps:
 
 1. Pull html from documentation site
-2. Load html with LangChain's [ReadTheDocs Loader](https://langchain.readthedocs.io/en/latest/modules/document_loaders/examples/readthedocs_documentation.html)
-3. Split documents with LangChain's [TextSplitter](https://langchain.readthedocs.io/en/latest/reference/modules/text_splitter.html)
+2. Load html with LangChain's [ReadTheDocs Loader](https://python.langchain.com/en/latest/modules/document_loaders/examples/readthedocs_documentation.html)
+3. Split documents with LangChain's [TextSplitter](https://python.langchain.com/en/latest/reference/modules/text_splitter.html)
 4. Create a vectorstore of embeddings, using LangChain's [vectorstore wrapper](https://python.langchain.com/en/latest/modules/indexes/vectorstores.html) (with OpenAI's embeddings and FAISS vectorstore).
 
-Question-Answering has the following steps, all handled by [ChatVectorDBChain](https://langchain.readthedocs.io/en/latest/modules/indexes/chain_examples/chat_vector_db.html):
+Question-Answering has the following steps, all handled by [ChatVectorDBChain](https://python.langchain.com/en/latest/modules/indexes/chain_examples/chat_vector_db.html):
 
 1. Given the chat history and new user input, determine what a standalone question would be (using GPT-3).
 2. Given that standalone question, look up relevant documents from the vectorstore.
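Reviewer note: the two question-answering steps visible in this README hunk can be sketched without any LangChain dependency. Everything below is a hypothetical stand-in rather than the project's code: `condense_question` and `retrieve` are illustrative names, the string concatenation replaces the LLM rewrite, and word overlap replaces embedding similarity.

```python
from typing import List, Tuple


def condense_question(chat_history: List[Tuple[str, str]], question: str) -> str:
    """Stand-in for step 1: fold earlier turns into one standalone question.

    A real chain would ask an LLM to rewrite the question; this toy version
    only appends the previous user question as context (an assumption).
    """
    if not chat_history:
        return question
    last_question, _last_answer = chat_history[-1]
    return f"{question} (follow-up to: {last_question})"


def retrieve(standalone_question: str, documents: List[str], k: int = 2) -> List[str]:
    """Stand-in for step 2: rank documents by the number of shared words.

    A real vectorstore would compare embeddings instead of word overlap.
    """
    query_words = set(standalone_question.lower().split())
    scored = [
        (len(query_words & set(doc.lower().split())), doc) for doc in documents
    ]
    # Python's sort is stable, so documents with equal scores keep their order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    # Drop documents that share no words with the question.
    return [doc for score, doc in scored[:k] if score > 0]


if __name__ == "__main__":
    docs = [
        "text splitter splits documents",
        "faiss stores embeddings",
        "loaders read html",
    ]
    question = condense_question([], "how do loaders read html")
    print(retrieve(question, docs, k=1))
```

The point of the two-stage shape is that retrieval only ever sees a self-contained question, so follow-ups such as "and what about that one?" do not reach the vectorstore without their context.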
2 changes: 1 addition & 1 deletion ingest.py
@@ -9,7 +9,7 @@
 
 def ingest_docs():
     """Get documents from web pages."""
-    loader = ReadTheDocsLoader("langchain.readthedocs.io/en/latest/")
+    loader = ReadTheDocsLoader("python.langchain.com/en/latest/")
     raw_documents = loader.load()
     text_splitter = RecursiveCharacterTextSplitter(
         chunk_size=1000,
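Reviewer note: the `chunk_size=1000` argument visible in this hunk caps each chunk at 1000 characters. A minimal sketch of fixed-window chunking follows. It is not the separator-aware algorithm that `RecursiveCharacterTextSplitter` uses, and both the `split_text` name and the `chunk_overlap=200` default are assumptions, since the diff cuts off before the remaining arguments.

```python
from typing import List


def split_text(text: str, chunk_size: int = 1000, chunk_overlap: int = 200) -> List[str]:
    """Split text into fixed-size character windows that overlap.

    Simplified illustration only: real splitters also try to break on
    paragraph and sentence boundaries. The overlap value is an assumption.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must satisfy 0 <= overlap < chunk_size")

    step = chunk_size - chunk_overlap
    chunks: List[str] = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        # Stop once a window has reached the end of the text, so the tail
        # is not emitted a second time as a tiny redundant chunk.
        if start + chunk_size >= len(text):
            break
        start += step
    return chunks


if __name__ == "__main__":
    pieces = split_text("a" * 2500)
    print([len(piece) for piece in pieces])
```

Overlap trades a little index size for a lower chance that an answer is split across two chunks with neither one containing the full sentence.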
2 changes: 1 addition & 1 deletion ingest.sh
@@ -2,5 +2,5 @@
 # This involves scraping the data from the web and then cleaning up and putting in Weaviate.
 # Error if any command fails
 set -e
-wget -r -A.html https://langchain.readthedocs.io/en/latest/
+wget -r -A.html https://python.langchain.com/en/latest/
 python3 ingest.py
2 changes: 1 addition & 1 deletion main.py
@@ -41,7 +41,7 @@ async def websocket_endpoint(websocket: WebSocket):
     qa_chain = get_chain(vectorstore, question_handler, stream_handler)
     # Use the below line instead of the above line to enable tracing
     # Ensure `langchain-server` is running
-    # qa_chain = get_chain(vectorstore, question_handler, stream_handler, tracing=True)
+    # qa_chain = await get_chain(vectorstore, question_handler, stream_handler, tracing=True)
 
     while True:
         try:
14 changes: 7 additions & 7 deletions query_data.py
@@ -1,7 +1,7 @@
 """Create a ChatVectorDBChain for question/answering."""
-from langchain.callbacks.base import AsyncCallbackManager
+from langchain.callbacks.manager import AsyncCallbackManager
 from langchain.callbacks.tracers import LangChainTracer
-from langchain.chains import ChatVectorDBChain
+from langchain.chains import ConversationalRetrievalChain
 from langchain.chains.chat_vector_db.prompts import (CONDENSE_QUESTION_PROMPT,
                                                      QA_PROMPT)
 from langchain.chains.llm import LLMChain
@@ -12,9 +12,9 @@
 
 def get_chain(
     vectorstore: VectorStore, question_handler, stream_handler, tracing: bool = False
-) -> ChatVectorDBChain:
-    """Create a ChatVectorDBChain for question/answering."""
-    # Construct a ChatVectorDBChain with a streaming llm for combine docs
+) -> ConversationalRetrievalChain:
+    """Create a ConversationalRetrievalChain for question/answering."""
+    # Construct a ConversationalRetrievalChain with a streaming llm for combine docs
     # and a separate, non-streaming llm for question generation
     manager = AsyncCallbackManager([])
     question_manager = AsyncCallbackManager([question_handler])
@@ -45,8 +45,8 @@ def get_chain(
         streaming_llm, chain_type="stuff", prompt=QA_PROMPT, callback_manager=manager
     )
 
-    qa = ChatVectorDBChain(
-        vectorstore=vectorstore,
+    qa = ConversationalRetrievalChain(
+        retriever=vectorstore.as_retriever(),
         combine_docs_chain=doc_chain,
         question_generator=question_generator,
         callback_manager=manager,
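Reviewer note: the last hunk swaps `vectorstore=vectorstore` for `retriever=vectorstore.as_retriever()`, which means the chain now depends on a retriever interface rather than on a vectorstore directly. The sketch below illustrates that shape with toy classes. `ToyVectorStore`, `ToyRetriever`, and `answer_with` are hypothetical names, the `k` argument on `as_retriever` is a simplification, and word overlap stands in for embedding similarity; none of this is LangChain's implementation.

```python
from typing import List


class ToyVectorStore:
    """Hypothetical stand-in for a vectorstore (not the LangChain class)."""

    def __init__(self, texts: List[str]) -> None:
        self.texts = list(texts)

    def similarity_search(self, query: str, k: int = 4) -> List[str]:
        # Toy scoring: number of shared lowercase words, not real embeddings.
        query_words = set(query.lower().split())
        ranked = sorted(
            self.texts,
            key=lambda text: len(query_words & set(text.lower().split())),
            reverse=True,
        )
        return ranked[:k]

    def as_retriever(self, k: int = 4) -> "ToyRetriever":
        # Wrap the store behind the narrower retriever interface.
        return ToyRetriever(self, k)


class ToyRetriever:
    """Exposes a single lookup method, hiding how documents are found."""

    def __init__(self, store: ToyVectorStore, k: int) -> None:
        self.store = store
        self.k = k

    def get_relevant_documents(self, query: str) -> List[str]:
        return self.store.similarity_search(query, k=self.k)


def answer_with(retriever: ToyRetriever, question: str) -> List[str]:
    """A chain written against the retriever needs only this one call."""
    return retriever.get_relevant_documents(question)


if __name__ == "__main__":
    store = ToyVectorStore(["faiss index of embeddings", "html loader", "text splitter"])
    print(answer_with(store.as_retriever(k=1), "which loader reads html"))
```

Because the caller only sees `get_relevant_documents`, any lookup backend could be substituted without touching the chain code, which is presumably the motivation for the retriever-based constructor.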
1 change: 1 addition & 0 deletions requirements.txt
@@ -11,3 +11,4 @@ faiss-cpu
 bs4
 unstructured
 libmagic
+tiktoken