-
Notifications
You must be signed in to change notification settings - Fork 98
Code walkthrough #59
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Open
AmeliaYe
wants to merge
11
commits into
NVIDIA:main
Choose a base branch
from
AmeliaYe:code_walkthrough
base: main
Could not load branches
Branch not found: {{ refName }}
Loading
Could not load tags
Nothing to show
Loading
Are you sure you want to change the base?
Some commits from the old base branch may be removed from the timeline,
and old review comments may become outdated.
Open
Code walkthrough #59
Changes from 4 commits
Commits
Show all changes
11 commits
Select commit
Hold shift + click to select a range
aa5a777
Auto-generated: README.md and related content
actions-user 035a2f2
created amelia-new and making new pull request
AmeliaTaihui 8a50857
code walk through
AmeliaTaihui c17c824
Update spec.yaml
AmeliaYe 39223f9
Update README.md
sophwats 7af6893
Update README.md
sophwats cce258a
Update README.md
sophwats b628e0f
Update docs/_SUMMARY.md
sophwats bb0d498
Auto-generated: README.md and related content
actions-user 985163d
Merge branch 'main' into code_walkthrough
AmeliaYe a96385a
Auto-generated: README.md and related content
actions-user File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,162 @@ | ||
| <?xml version="1.0" encoding="utf-8"?> | ||
| <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" | ||
| "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"> | ||
| <html xmlns="http://www.w3.org/1999/xhtml"> | ||
| <head> | ||
| <meta http-equiv="Content-Type" content="text/html; charset=utf-8" /> | ||
| <meta http-equiv="Content-Style-Type" content="text/css" /> | ||
| <meta name="generator" content="pandoc" /> | ||
| <meta name="author" content="EPG TME" /> | ||
AmeliaYe marked this conversation as resolved.
Show resolved
Hide resolved
|
||
| <meta name="date" content="2024-01-01" /> | ||
| <title>How to build a RAG Chain</title> | ||
| <style type="text/css"> | ||
| code{white-space: pre-wrap;} | ||
| span.smallcaps{font-variant: small-caps;} | ||
| span.underline{text-decoration: underline;} | ||
| div.column{display: inline-block; vertical-align: top; width: 50%;} | ||
| div.hanging-indent{margin-left: 1.5em; text-indent: -1.5em;} | ||
| ul.task-list{list-style: none;} | ||
| </style> | ||
| <link rel="stylesheet" type="text/css" media="screen, projection, print" | ||
| href="https://www.w3.org/Talks/Tools/Slidy2/styles/slidy.css" /> | ||
| <script src="https://www.w3.org/Talks/Tools/Slidy2/scripts/slidy.js" | ||
| charset="utf-8" type="text/javascript"></script> | ||
| </head> | ||
| <body> | ||
| <div class="slide titlepage"> | ||
| <h1 class="title">How to build a RAG Chain</h1> | ||
| <p class="author"> | ||
| EPG TME | ||
| </p> | ||
| <p class="date">2024</p> | ||
| </div> | ||
| <div id="imports" class="title-slide slide section level1"> | ||
| <h1>Imports</h1> | ||
|
|
||
| </div> | ||
| <div id="langchain-nvidia-integration" class="slide section level2"> | ||
| <h1>Langchain NVIDIA Integration</h1> | ||
| <ul> | ||
| <li>from langchain_nvidia_ai_endpoints import ChatNVIDIA, NVIDIAEmbeddings, NVIDIARerank</li> | ||
| <li>explanation:</li> | ||
| </ul> | ||
| <p>LangChain is a framework for developing applications powered by large language models (LLMs). In the process of developing a LLM application, there are many things to consider: which language model to choose, how to load documents, how to embed and retrieve loaded documents, etc. Here LangChain acts as a orchestrator of all the components in the development process and unify them in a single framework to make them compatible with one another.</p> | ||
| <p>Here the langchain_nvidia_ai_endpoints is one of the integration provided by LangChian. NVIDIA wrapped popular language models in NIMs to make them optimized to deliver the best performance on NVIDIA accelerated infrastructure. One easy way of using NIM is to do it through LangChian endpoint.</p> | ||
| <p>NIMs provide easy, consistent, and familiar APIs for running optimized inference on an AI models you want.</p> | ||
| </div> | ||
| <div id="from-langchain_nvidia_ai_endpoints-import-chatnvidia" class="slide section level2"> | ||
| <h1>from langchain_nvidia_ai_endpoints import ChatNVIDIA</h1> | ||
| <ul> | ||
| <li>This class provides access to a NVIDIA NIM for chat. By default, it connects to a hosted NIM, but can be configured to connect to a local NIM using the base_url parameter.</li> | ||
| </ul> | ||
| </div> | ||
| <div id="other-langchain-libraries" class="slide section level2"> | ||
| <h1>Other LangChain Libraries</h1> | ||
| <ul> | ||
| <li><p>from langchain_core.documents import Document</p></li> | ||
| <li><p>from langchain.retrievers import ContextualCompressionRetriever</p></li> | ||
| <li><p>from langchain_milvus.vectorstores.milvus import Milvus</p></li> | ||
| <li><p>from langchain_community.chat_message_histories import RedisChatMessageHistory</p></li> | ||
| <li><p>from langchain_core.runnables.history import RunnableWithMessageHistory</p></li> | ||
| <li><p>from langchain_core.output_parsers import StrOutputParser</p></li> | ||
| <li><p>from langchain_core.runnables import RunnablePassthrough, chain</p></li> | ||
| </ul> | ||
| <p>Here we import other langchain packages. This is a good opportunity to explain other components in our development of a LLM chain.</p> | ||
| </div> | ||
| <div id="retriever" class="slide section level2"> | ||
| <h1>retriever</h1> | ||
| <p>Packages that are related to retrievers are: - from langchain_core.documents import Document - from langchain.retrievers import ContextualCompressionRetriever - from langchain_milvus.vectorstores.milvus import Milvus</p> | ||
| <ul> | ||
| <li><p>A retriever is an interface that returns documents given an unstructured query. In most cases, a retriever relies on a vectorestore. Retriever retrieves documents from a database that stores the digital representation(vector) of documents.</p></li> | ||
| <li><p>Document in LangChain is the class for storing a piece of text and associated information.</p></li> | ||
| <li><p>The Contextual Compression Retriever passes queries to the base retriever, takes the initial documents and passes them through the Document Compressor. The Document Compressor takes a list of documents and shortens it by reducing the contents of documents or dropping documents altogether. Here we can use a NVIDIA infrastructure optimized NIM - reranking NIM as a compressor.</p></li> | ||
| <li><p>Milvus is a database that store, index, and manage embedding vectors generated by machine learning models. We choose Milvus here because it is GPU accelarated.</p></li> | ||
| </ul> | ||
| </div> | ||
| <div id="chat-history" class="slide section level2"> | ||
| <h1>Chat history</h1> | ||
| <ul> | ||
| <li><p>from langchain_community.chat_message_histories import RedisChatMessageHistory</p></li> | ||
| <li><p>from langchain_core.runnables.history import RunnableWithMessageHistory</p></li> | ||
| <li><p>The chat history is sequence of messages. Each message has a role (e.g., “user”, “assistant”, ie, from human or from the model), content (e.g., text, multimodal data), and additional metadata.</p></li> | ||
| <li><p>use the RedisChatMessageHistory class from the langchain-redis package to store and manage chat message history using Redis. Together with RunnableWithMessageHistory, they keep track of the message history.</p></li> | ||
| </ul> | ||
| </div> | ||
|
|
||
| <div id="actual-code-for-the-chain" class="title-slide slide section level1"> | ||
| <h1>Actual Code for the chain</h1> | ||
|
|
||
| </div> | ||
| <div id="embedding-model" class="slide section level2"> | ||
| <h1>Embedding Model</h1> | ||
| <ul> | ||
| <li><p>embedding_model = NVIDIAEmbeddings( model=app_config.embedding_model.name, base_url=str(app_config.embedding_model.url), api_key=app_config.nvidia_api_key, )</p></li> | ||
| <li><p>This creates an embedding model using NVIDIA’s embeddings, which convert text or other data types into numerica</p></li> | ||
| </ul> | ||
| </div> | ||
| <div id="vector-store" class="slide section level2"> | ||
| <h1>Vector store</h1> | ||
| <ul> | ||
| <li><p>vector_store = Milvus( embedding_function=embedding_model, connection_args={“uri”: app_config.milvus.url}, collection_name=app_config.milvus.collection_name, auto_id=True, timeout=10, )</p></li> | ||
| <li><p>This block initializes a vector store using Milvus, a database optimized for vector-based retrieval.</p></li> | ||
| </ul> | ||
| </div> | ||
| <div id="vector-store-1" class="slide section level2"> | ||
| <h1>Vector store</h1> | ||
| <ul> | ||
| <li><p>retriever = vector_store.as_retriever()</p></li> | ||
| <li><p>This converts the vector_store into a retriever object, enabling it to perform similarity searches. Given a query, the retriever can find vectors in Milvus that are close (in semantic space) to the query’s embedding.</p></li> | ||
| <li><p>reranker = NVIDIARerank( model=app_config.reranking_model.name, base_url=str(app_config.reranking_model.url), api_key=app_config.nvidia_api_key, )</p></li> | ||
| <li><p>reranking_retriever = ContextualCompressionRetriever(base_compressor=reranker, base_retriever=retriever)</p></li> | ||
| <li><p>This defines a final retrieval pipeline combining the retriever with the reranker. ContextualCompressionRetriever uses the reranker (base_compressor) to compress and order the results from the initial retrieval step (base_retriever). This pipeline returns the most relevant results based on the reranking model’s assessment.</p></li> | ||
| </ul> | ||
| </div> | ||
| <div id="document-formatting" class="slide section level2"> | ||
| <h1>Document Formatting</h1> | ||
| <ul> | ||
| <li>def format_docs(docs: list[Document]) -> str: ""“Take in a list of docs and concatenate the content, separating by newlines.”"" return “”.join(doc.page_content for doc in docs)</li> | ||
| <li>format_docs is a helper function that takes a list of documents (docs) and concatenates their content with two newline characters in between. This prepares documents for output in a readable format for the user.</li> | ||
| </ul> | ||
| </div> | ||
| <div id="language-model-initialization" class="slide section level2"> | ||
| <h1>Language Model Initialization:</h1> | ||
| <ul> | ||
| <li>llm = ChatNVIDIA( model=app_config.llm_model.name, curr_mode=“nim”, base_url=str(app_config.llm_model.url), api_key=app_config.nvidia_api_key, )</li> | ||
| <li>Initializes an NVIDIA-powered language model (ChatNVIDIA)</li> | ||
| </ul> | ||
| </div> | ||
| <div id="document-retrieval-function" class="slide section level2"> | ||
| <h1>Document Retrieval Function</h1> | ||
| <ul> | ||
| <li><p><span class="citation">@chain</span> async def retrieve_context(msg, config) -> str: ""“The Retrieval part of the RAG chain.”"" use_kb = msg[“use_kb”] use_reranker = msg[“use_reranker”] question = msg[“question”]</p> | ||
| <p>if not use_kb: return ""</p> | ||
| <p>if use_reranker: return (reranking_retriever | format_docs).invoke(question, config)</p> | ||
| <p>return (retriever | format_docs).invoke(question, config)</p></li> | ||
| <li><p>This asynchronous function (retrieve_context) retrieves relevant documents based on the user’s question.</p></li> | ||
| <li><p>If use_kb is False, it returns an empty string (no retrieval). If use_reranker is True, it uses the reranking_retriever, which rerank the retrieved documents for accuracy, then formats the results. Otherwise, it uses the basic retriever without reranking</p></li> | ||
| </ul> | ||
| </div> | ||
| <div id="question-parsing-and-condensing" class="slide section level2"> | ||
| <h1>Question Parsing and Condensing</h1> | ||
| <ul> | ||
| <li><p><span class="citation">@chain</span> async def question_parsing(msg, config) -> str: ""“Condense the question with chat history”""</p> | ||
| <p>condense_question_prompt = prompts.CONDENSE_QUESTION_TEMPLATE.with_config(run_name=“condense_question_prompt”) condensed_chain = condense_question_prompt | llm | StrOutputParser().with_config(run_name=“condense_question_chain”) if msg[“history”]: return condensed_chain.invoke(msg, config) return msg[“question”]</p></li> | ||
| <li><p>This function condenses the user’s question along with any existing chat history to maintain context.</p></li> | ||
| </ul> | ||
| </div> | ||
| <div id="combining-the-chain" class="slide section level2"> | ||
| <h1>Combining the Chain</h1> | ||
| <ul> | ||
| <li>my_chain = ( { “context”: retrieve_context, “question”: question_parsing, “history”: itemgetter(“history”), } | RunnablePassthrough().with_config(run_name=“LLM Prompt Input”) | prompts.CHAT_PROMPT | llm )</li> | ||
| <li>This combines the components into a chain for question-answering: retrieve_context fetches relevant documents. question_parsing formats the question with history. RunnablePassthrough and CHAT_PROMPT set up the prompt format for the language model (llm) to process and respond.</li> | ||
| </ul> | ||
| </div> | ||
| <div id="configuring-message-history" class="slide section level2"> | ||
| <h1>Configuring Message History</h1> | ||
| <ul> | ||
| <li><p>my_chain = RunnableWithMessageHistory( my_chain, lambda session_id: RedisChatMessageHistory(session_id, url=str(app_config.redis_dsn)), input_messages_key=“question”, output_messages_key=“output”, history_messages_key=“history”, ).with_types(input_type=ChainInputs)</p></li> | ||
| <li><p>Wraps my_chain with message history using Redis, storing the chat history under a specific session ID for continuity.</p></li> | ||
| </ul> | ||
| </div> | ||
| </body> | ||
| </html> | ||
Oops, something went wrong.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Uh oh!
There was an error while loading. Please reload this page.