This repository contains examples of using vector stores with LangChain for semantic search and retrieval-augmented generation (RAG).
- `src/vectorstore.py`: Implementation of a Pinecone vector store
- `src/example.py`: Basic example using Pinecone
- `src/test_embeddings.py`: Test script for debugging embeddings and search
- `src/check_indexes.py`: Utility to check Pinecone indexes
- `src/faiss_example.py`: Example using the FAISS vector store (runs locally, no API keys needed)
- `src/rag_example.py`: Complete RAG example with document chunking and retrieval
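To make the idea of a vector store concrete, here is a minimal in-memory sketch. It is not the code in `src/vectorstore.py`. The class name `TinyVectorStore` and the bag-of-words `toy_embed` function are hypothetical stand-ins; a real setup would use a HuggingFace embedding model with FAISS or Pinecone. Only the method names `add_texts` and `similarity_search` mirror LangChain's vector store interface.

```python
import math
import re
from collections import Counter


def toy_embed(text):
    # Hypothetical stand-in for a real embedding model: lowercase word counts.
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    # Cosine similarity between two sparse count vectors; 0.0 if either is empty.
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


class TinyVectorStore:
    """Hypothetical in-memory store that keeps (text, vector, metadata) rows."""

    def __init__(self):
        self._rows = []

    def add_texts(self, texts, metadatas=None):
        texts = list(texts)
        # Default to empty metadata dicts when none are given.
        metadatas = metadatas or [{} for _ in texts]
        for text, meta in zip(texts, metadatas):
            self._rows.append((text, toy_embed(text), meta))

    def similarity_search(self, query, k=4):
        # Brute force: score every row, sort highest similarity first, keep k.
        q = toy_embed(query)
        ranked = sorted(self._rows, key=lambda row: cosine(q, row[1]), reverse=True)
        return [(text, meta) for text, _, meta in ranked[:k]]


if __name__ == "__main__":
    store = TinyVectorStore()
    store.add_texts([
        "Vector stores are useful for semantic search",
        "Python is a versatile programming language",
    ])
    print(store.similarity_search("Vector databases", k=1))
    # [('Vector stores are useful for semantic search', {})]
```

Because `toy_embed` only matches exact words, it ranks by word overlap rather than meaning; a real embedding model is what lets a query match text that shares no words with it.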
- Clone this repository.
- Install dependencies:
  ```bash
  pip install -r requirements.txt
  ```
- Copy `.env.example` to `.env` and fill in your API keys (only needed for the Pinecone examples):
  ```bash
  cp .env.example .env
  ```
For the Pinecone examples, set the following variables in `.env`:
- `PINECONE_API_KEY`: Your Pinecone API key
- `PINECONE_ENVIRONMENT`: Your Pinecone environment (e.g., "gcp-starter")
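Before running the Pinecone examples, it can help to confirm that both variables are actually set. The sketch below is a hypothetical helper (`missing_pinecone_vars` is not part of this repository). It only inspects the process environment, so if the scripts load `.env` through a library such as python-dotenv (an assumption; check the scripts), call `load_dotenv()` before using it.

```python
import os

# Variable names come from this README; the helper itself is hypothetical.
REQUIRED_PINECONE_VARS = ("PINECONE_API_KEY", "PINECONE_ENVIRONMENT")


def missing_pinecone_vars(environ=None):
    """Return the required Pinecone variables that are unset or empty."""
    if environ is None:
        environ = os.environ
    return [name for name in REQUIRED_PINECONE_VARS if not environ.get(name)]


if __name__ == "__main__":
    missing = missing_pinecone_vars()
    if missing:
        print("Missing in environment:", ", ".join(missing))
        print("The FAISS examples run without any API keys.")
    else:
        print("Pinecone variables are set.")
```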
The FAISS example runs locally and doesn't require any API keys:
```bash
python src/faiss_example.py
```
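FAISS flat indexes such as `IndexFlatL2` perform exact, brute-force nearest-neighbor search: every stored vector is compared with the query. The pure-Python sketch below (no FAISS required; the function names are illustrative, and which FAISS index the example script builds is not asserted here) does the same and also checks a useful identity. For unit-length vectors, squared L2 distance equals `2 - 2 * cosine similarity`, so ranking by L2 and ranking by cosine agree once embeddings are normalized.

```python
import math


def normalize(v):
    # Scale a vector to unit length (assumes it is not all zeros).
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]


def l2_squared(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def dot(a, b):
    # Equals cosine similarity when both vectors are unit length.
    return sum(x * y for x, y in zip(a, b))


def knn_l2(index, query, k):
    """Exact k-nearest-neighbor search: compare the query with every stored vector."""
    order = sorted(range(len(index)), key=lambda i: l2_squared(index[i], query))
    return order[:k]


if __name__ == "__main__":
    vectors = [normalize(v) for v in ([1.0, 0.0], [0.6, 0.8], [0.0, 1.0])]
    query = normalize([0.9, 0.1])
    print(knn_l2(vectors, query, k=2))  # [0, 1]
    for v in vectors:
        # For unit vectors: ||a - b||^2 == 2 - 2 * cos(a, b), up to rounding.
        print(round(l2_squared(v, query), 6), round(2 - 2 * dot(v, query), 6))
```

Exact search is simple and always correct but scales linearly with the number of stored vectors; that cost is what the approximate nearest neighbor (ANN) indexes mentioned in the sample documents trade accuracy to avoid.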
The RAG example demonstrates a complete pipeline with document chunking and retrieval:
```bash
python src/rag_example.py
```
The Pinecone example uses Pinecone as the vector store (requires API keys):
```bash
python src/example.py
```
If you encounter issues with the Pinecone examples:
- Verify your API keys are correct
- Check your Pinecone account status and quota
- Try the FAISS examples, which run locally
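One further check worth considering (a suggestion, not from the original notes): Pinecone rejects vectors whose length differs from the dimension the index was created with, so switching embedding models can break upserts and queries. The sample output in this README reports a 768-dimensional query embedding. A hypothetical pre-flight check (`check_dimension` is illustrative, not a Pinecone API):

```python
def check_dimension(vectors, index_dimension):
    """Raise ValueError if any vector's length differs from the index dimension.

    Hypothetical helper: index_dimension is whatever the Pinecone index was
    created with (for example 768, the query embedding size in the sample output).
    """
    for i, vec in enumerate(vectors):
        if len(vec) != index_dimension:
            raise ValueError(
                f"vector {i} has dimension {len(vec)}, "
                f"but the index expects {index_dimension}"
            )


if __name__ == "__main__":
    check_dimension([[0.0] * 768], 768)  # passes silently
    try:
        check_dimension([[0.0] * 384], 768)
    except ValueError as err:
        print(err)  # vector 0 has dimension 384, but the index expects 768
```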
The examples cover:
- Document chunking with metadata
- Vector embeddings using HuggingFace models
- Semantic search with different vector stores
- Complete RAG pipeline example
- Debugging utilities
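Document chunking with metadata can be sketched as fixed-size character windows that overlap, with each chunk tagged by its source document and position. This is a simplified stand-in for a LangChain text splitter (it ignores separators such as paragraph breaks). The function name and the `source`/`chunk` metadata keys are hypothetical, chosen to match the `Source: document_0, Chunk: 0` lines in the sample output.

```python
def chunk_documents(texts, chunk_size=500, chunk_overlap=50):
    """Split each text into overlapping fixed-size character windows with metadata.

    Simplified, hypothetical stand-in for a LangChain text splitter: it does not
    try to break on paragraphs, sentences, or words.
    """
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for doc_idx, text in enumerate(texts):
        if not text:
            continue  # skip empty documents
        # Stop once the remaining tail would lie entirely inside the previous window.
        starts = range(0, max(len(text) - chunk_overlap, 1), step)
        for chunk_idx, start in enumerate(starts):
            chunks.append({
                "page_content": text[start:start + chunk_size],
                "metadata": {"source": f"document_{doc_idx}", "chunk": chunk_idx},
            })
    return chunks


if __name__ == "__main__":
    for c in chunk_documents(["abcdefghij"], chunk_size=4, chunk_overlap=1):
        print(c["metadata"], c["page_content"])
```

The overlap keeps a sentence that straddles a boundary partly visible in both neighboring chunks; documents shorter than `chunk_size` produce a single chunk numbered 0.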
See `requirements.txt` for the full list of dependencies.
Sample output from a similarity search:
```
Performing similarity search for query: 'Vector databases'
Query embedding shape: 768

Searching with k=1:
Results:
- Vector stores are useful for semantic search
  Metadata: {}

Searching with k=2:
Results:
- Vector stores are useful for semantic search
  Metadata: {}
- Python is a versatile programming language
  Metadata: {}
```
Sample output from a RAG retrieval run:
```
Initializing embeddings model...
Creating sample documents...
Created 4 document chunks
Creating FAISS vector store...
Vector store created successfully

================================================================================
QUERY: What is artificial intelligence and how does it relate to machine learning?
Retrieving relevant documents...
Top 3 most relevant document chunks:

- Artificial Intelligence (AI) refers to the simulation of human intelligence in machines that are programmed to think and learn like humans. The term may also be applied to any machine that exhibits traits associated with a human mind such as learning and problem-solving.
The ideal characteristic of artificial intelligence is its ability to rationalize and take actions that have the best chance of achieving a specific goal. Machine learning, a subset of AI, focuses on the development of computer programs that can access data and use it to learn for themselves. Deep learning is a subset of machine learning that has networks capable of learning unsupervised from data that is unstructured or unlabeled.
Source: document_0, Chunk: 0

- Natural Language Processing (NLP) is a field of artificial intelligence that focuses on the interaction between computers and humans using natural language. The ultimate objective of NLP is to read, decipher, understand, and make sense of human languages in a valuable way.
NLP combines computational linguistics—rule-based modeling of human language—with statistical, machine learning, and deep learning models. These technologies enable computers to process human language in the form of text or voice data and to 'understand' its full meaning.
Source: document_3, Chunk: 0

- Vector databases are specialized database systems designed to store, manage, and search high-dimensional vectors efficiently. These vectors typically represent embeddings of data such as text, images, or audio in a mathematical space where similar items are located close to each other.
Vector databases are crucial for applications like semantic search, recommendation systems, and machine learning pipelines where finding similar items quickly is important. They use specialized indexing techniques like approximate nearest neighbor (ANN) algorithms to enable fast similarity searches.
Source: document_2, Chunk: 0

In a complete RAG system, these documents would be passed to an LLM along with the query to generate a comprehensive response.
```
```
================================================================================
QUERY: Explain the key features of Python programming language
Retrieving relevant documents...
Top 3 most relevant document chunks:

- Python is a high-level, interpreted programming language known for its readability and simplicity. Created by Guido van Rossum and first released in 1991, Python has a design philosophy that emphasizes code readability, notably using significant whitespace.
Python features a dynamic type system and automatic memory management. It supports multiple programming paradigms, including object-oriented, imperative, functional and procedural, and has a large and comprehensive standard library.
Source: document_1, Chunk: 0

- Natural Language Processing (NLP) is a field of artificial intelligence that focuses on the interaction between computers and humans using natural language. The ultimate objective of NLP is to read, decipher, understand, and make sense of human languages in a valuable way.
NLP combines computational linguistics—rule-based modeling of human language—with statistical, machine learning, and deep learning models. These technologies enable computers to process human language in the form of text or voice data and to 'understand' its full meaning.
Source: document_3, Chunk: 0

- Artificial Intelligence (AI) refers to the simulation of human intelligence in machines that are programmed to think and learn like humans. The term may also be applied to any machine that exhibits traits associated with a human mind such as learning and problem-solving.
The ideal characteristic of artificial intelligence is its ability to rationalize and take actions that have the best chance of achieving a specific goal. Machine learning, a subset of AI, focuses on the development of computer programs that can access data and use it to learn for themselves. Deep learning is a subset of machine learning that has networks capable of learning unsupervised from data that is unstructured or unlabeled.
Source: document_0, Chunk: 0

In a complete RAG system, these documents would be passed to an LLM along with the query to generate a comprehensive response.
```
```
================================================================================
QUERY: How do vector databases work and what are they used for?
Retrieving relevant documents...
Top 3 most relevant document chunks:

- Vector databases are specialized database systems designed to store, manage, and search high-dimensional vectors efficiently. These vectors typically represent embeddings of data such as text, images, or audio in a mathematical space where similar items are located close to each other.
Vector databases are crucial for applications like semantic search, recommendation systems, and machine learning pipelines where finding similar items quickly is important. They use specialized indexing techniques like approximate nearest neighbor (ANN) algorithms to enable fast similarity searches.
Source: document_2, Chunk: 0

- Natural Language Processing (NLP) is a field of artificial intelligence that focuses on the interaction between computers and humans using natural language. The ultimate objective of NLP is to read, decipher, understand, and make sense of human languages in a valuable way.
NLP combines computational linguistics—rule-based modeling of human language—with statistical, machine learning, and deep learning models. These technologies enable computers to process human language in the form of text or voice data and to 'understand' its full meaning.
Source: document_3, Chunk: 0

- Artificial Intelligence (AI) refers to the simulation of human intelligence in machines that are programmed to think and learn like humans. The term may also be applied to any machine that exhibits traits associated with a human mind such as learning and problem-solving.
The ideal characteristic of artificial intelligence is its ability to rationalize and take actions that have the best chance of achieving a specific goal. Machine learning, a subset of AI, focuses on the development of computer programs that can access data and use it to learn for themselves. Deep learning is a subset of machine learning that has networks capable of learning unsupervised from data that is unstructured or unlabeled.
Source: document_0, Chunk: 0

In a complete RAG system, these documents would be passed to an LLM along with the query to generate a comprehensive response.
```
```
================================================================================
QUERY: What is NLP and how is it connected to AI?
Retrieving relevant documents...
Top 3 most relevant document chunks:

- Natural Language Processing (NLP) is a field of artificial intelligence that focuses on the interaction between computers and humans using natural language. The ultimate objective of NLP is to read, decipher, understand, and make sense of human languages in a valuable way.
NLP combines computational linguistics—rule-based modeling of human language—with statistical, machine learning, and deep learning models. These technologies enable computers to process human language in the form of text or voice data and to 'understand' its full meaning.
Source: document_3, Chunk: 0

- Artificial Intelligence (AI) refers to the simulation of human intelligence in machines that are programmed to think and learn like humans. The term may also be applied to any machine that exhibits traits associated with a human mind such as learning and problem-solving.
The ideal characteristic of artificial intelligence is its ability to rationalize and take actions that have the best chance of achieving a specific goal. Machine learning, a subset of AI, focuses on the development of computer programs that can access data and use it to learn for themselves. Deep learning is a subset of machine learning that has networks capable of learning unsupervised from data that is unstructured or unlabeled.
Source: document_0, Chunk: 0

- Python is a high-level, interpreted programming language known for its readability and simplicity. Created by Guido van Rossum and first released in 1991, Python has a design philosophy that emphasizes code readability, notably using significant whitespace.
Python features a dynamic type system and automatic memory management. It supports multiple programming paradigms, including object-oriented, imperative, functional and procedural, and has a large and comprehensive standard library.
Source: document_1, Chunk: 0

In a complete RAG system, these documents would be passed to an LLM along with the query to generate a comprehensive response.
```
```
================================================================================
QUERY: What are the applications of deep learning?
Retrieving relevant documents...
Top 3 most relevant document chunks:

- Artificial Intelligence (AI) refers to the simulation of human intelligence in machines that are programmed to think and learn like humans. The term may also be applied to any machine that exhibits traits associated with a human mind such as learning and problem-solving.
The ideal characteristic of artificial intelligence is its ability to rationalize and take actions that have the best chance of achieving a specific goal. Machine learning, a subset of AI, focuses on the development of computer programs that can access data and use it to learn for themselves. Deep learning is a subset of machine learning that has networks capable of learning unsupervised from data that is unstructured or unlabeled.
Source: document_0, Chunk: 0

- Natural Language Processing (NLP) is a field of artificial intelligence that focuses on the interaction between computers and humans using natural language. The ultimate objective of NLP is to read, decipher, understand, and make sense of human languages in a valuable way.
NLP combines computational linguistics—rule-based modeling of human language—with statistical, machine learning, and deep learning models. These technologies enable computers to process human language in the form of text or voice data and to 'understand' its full meaning.
Source: document_3, Chunk: 0

- Vector databases are specialized database systems designed to store, manage, and search high-dimensional vectors efficiently. These vectors typically represent embeddings of data such as text, images, or audio in a mathematical space where similar items are located close to each other.
Vector databases are crucial for applications like semantic search, recommendation systems, and machine learning pipelines where finding similar items quickly is important. They use specialized indexing techniques like approximate nearest neighbor (ANN) algorithms to enable fast similarity searches.
Source: document_2, Chunk: 0

In a complete RAG system, these documents would be passed to an LLM along with the query to generate a comprehensive response.
```
The Pinecone free tier may have limitations or issues that made it difficult to get search results with these examples.

FAISS works well as a local alternative that doesn't require any API keys or external services. The RAG example demonstrates a complete pipeline, from document chunking to retrieval, that can serve as a foundation for more complex applications.
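The sample output notes that a complete RAG system would pass the retrieved chunks to an LLM along with the query. A minimal sketch of that prompt-assembly step follows. The function name, prompt wording, `max_chars` budget, and chunk shape (`page_content` plus `source`/`chunk` metadata) are all assumptions, and the LLM call itself is left out.

```python
def build_rag_prompt(query, chunks, max_chars=4000):
    """Combine retrieved chunks and the user query into one prompt string.

    Hypothetical helper: assumes each chunk is a dict with "page_content" and
    "metadata" holding "source" and "chunk" keys.
    """
    parts, used = [], 0
    for chunk in chunks:  # chunks are assumed to be ordered most relevant first
        meta = chunk["metadata"]
        part = f"[{meta['source']}, chunk {meta['chunk']}]\n{chunk['page_content']}"
        if used + len(part) > max_chars:
            break  # rough character budget so the prompt fits the model's context
        parts.append(part)
        used += len(part)
    context = "\n\n".join(parts)
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}\nAnswer:"
    )


if __name__ == "__main__":
    retrieved = [
        {"page_content": "NLP is a field of artificial intelligence.",
         "metadata": {"source": "document_3", "chunk": 0}},
    ]
    print(build_rag_prompt("What is NLP and how is it connected to AI?", retrieved))
```

Labeling each chunk with its source makes it possible to ask the model to cite where an answer came from; a production system would more likely budget by tokens than by characters.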