
Pinecone Vector Store Examples for LangChain (Free-Tier Working Examples)


staminna/vectorstorelangchain


LangChain Vector Store Examples

This repository contains examples of using vector stores with LangChain for semantic search and retrieval-augmented generation (RAG).

Repository Structure

  • src/vectorstore.py: Implementation of Pinecone vector store
  • src/example.py: Basic example using Pinecone
  • src/test_embeddings.py: Test script for debugging embeddings and search
  • src/check_indexes.py: Utility to check Pinecone indexes
  • src/faiss_example.py: Example using FAISS vector store (local, no API keys needed)
  • src/rag_example.py: Complete RAG example with document chunking and retrieval

Setup

  1. Clone this repository
  2. Install dependencies:
    pip install -r requirements.txt
  3. Copy .env.example to .env and fill in your API keys (only needed for Pinecone examples):
    cp .env.example .env

Environment Variables

For Pinecone examples:

  • PINECONE_API_KEY: Your Pinecone API key
  • PINECONE_ENVIRONMENT: Your Pinecone environment (e.g., "gcp-starter")
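The Pinecone scripts need both variables to be set after `.env` is loaded (loading is commonly done with `load_dotenv()` from the `python-dotenv` package, which this sketch leaves out). Below is a minimal sketch of checking them up front so a missing key fails with a clear message. The helper `load_pinecone_settings` is hypothetical and not part of this repository; the repository's own scripts may read these values differently.

```python
import os
from typing import Mapping


def load_pinecone_settings(env: Mapping[str, str] = os.environ) -> dict:
    """Return Pinecone settings, raising if a required variable is missing.

    Hypothetical helper: the variable names match this README, but the
    repository's scripts may read them differently.
    """
    required = ("PINECONE_API_KEY", "PINECONE_ENVIRONMENT")
    # Treat unset and empty/whitespace-only values as missing.
    missing = [name for name in required if not env.get(name, "").strip()]
    if missing:
        raise RuntimeError(
            "Missing environment variables: " + ", ".join(missing)
            + " (copy .env.example to .env and fill them in)"
        )
    return {
        "api_key": env["PINECONE_API_KEY"].strip(),
        "environment": env["PINECONE_ENVIRONMENT"].strip(),
    }


if __name__ == "__main__":
    try:
        settings = load_pinecone_settings()
        print("Pinecone environment:", settings["environment"])
    except RuntimeError as err:
        print(err)
```

Passing a plain dict instead of `os.environ` makes the check easy to exercise without touching the real environment.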

Running the Examples

FAISS Example (Recommended)

This example uses FAISS, which runs locally and doesn't require any API keys:

python src/faiss_example.py
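FAISS's simplest index type performs exact (brute-force) search: it compares the query vector with every stored vector and returns the closest ones by Euclidean (L2) distance. The pure-Python sketch below illustrates that idea so it runs without FAISS installed. The class name `FlatL2Index` and the toy 3-dimensional vectors are illustrative assumptions; real embeddings here have hundreds of dimensions, and FAISS's `IndexFlatL2` reports squared L2 distances, which rank results identically.

```python
import math
from typing import List, Sequence, Tuple


class FlatL2Index:
    """Exact nearest-neighbour search by Euclidean distance (illustrative only)."""

    def __init__(self, dim: int) -> None:
        self.dim = dim
        self.vectors: List[Tuple[float, ...]] = []

    def add(self, vectors: Sequence[Sequence[float]]) -> None:
        for v in vectors:
            if len(v) != self.dim:
                raise ValueError(f"expected dimension {self.dim}, got {len(v)}")
            self.vectors.append(tuple(float(x) for x in v))

    def search(self, query: Sequence[float], k: int) -> List[Tuple[int, float]]:
        """Return up to k (position, distance) pairs, closest first."""
        # Brute force: score every stored vector against the query.
        scored = [(i, math.dist(query, v)) for i, v in enumerate(self.vectors)]
        scored.sort(key=lambda pair: pair[1])
        return scored[:k]


index = FlatL2Index(dim=3)
index.add([[0.0, 0.0, 1.0], [1.0, 0.0, 0.0], [0.9, 0.1, 0.0]])
print(index.search([1.0, 0.0, 0.0], k=2))  # positions 1 then 2
```

Exact search is simple and accurate but scales linearly with the number of stored vectors; approximate (ANN) indexes trade some accuracy for speed on large collections.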

RAG Example

This example demonstrates a complete RAG pipeline with document chunking and retrieval:

python src/rag_example.py
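Chunking splits each document into overlapping windows so that retrieval can return focused passages while context that straddles a boundary is not lost. LangChain ships text splitters such as `RecursiveCharacterTextSplitter` for this; the sketch below is a simplified character-based stand-in, not that splitter's algorithm, and does not claim to match what `rag_example.py` uses. The default `chunk_size` and `chunk_overlap` values are arbitrary.

```python
from typing import List


def split_text(text: str, chunk_size: int = 200, chunk_overlap: int = 50) -> List[str]:
    """Split text into fixed-size character windows that overlap.

    Simplified stand-in for a real text splitter: it does not try to
    break on sentence or paragraph boundaries.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    step = chunk_size - chunk_overlap  # how far each window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks


print(split_text("abcdefghij", chunk_size=4, chunk_overlap=1))
# → ['abcd', 'defg', 'ghij']
```

Each window repeats the last `chunk_overlap` characters of the previous one, which is why `d` and `g` appear twice in the output.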

Pinecone Example

This example uses Pinecone as the vector store (requires API keys):

python src/example.py

Troubleshooting

If you encounter issues with the Pinecone examples:

  1. Verify your API keys are correct
  2. Check your Pinecone account status and quota
  3. Try the FAISS examples, which run locally and need no API keys

Features

  • Document chunking with metadata
  • Vector embeddings using HuggingFace models
  • Semantic search with different vector stores
  • Complete RAG pipeline example
  • Debugging utilities
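The sample output labels each retrieved chunk as `Source: document_N, Chunk: M`, which suggests every chunk carries its document and chunk index as metadata. Below is a minimal sketch of attaching such metadata. The keys `source` and `chunk` are inferred from the printed labels rather than confirmed from the code, and the `Chunk` dataclass is a stand-in for LangChain's `Document`, which likewise holds `page_content` and `metadata`.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Chunk:
    """Stand-in for a LangChain Document: text plus a metadata dict."""
    page_content: str
    metadata: Dict[str, object] = field(default_factory=dict)


def chunk_documents(texts: List[str], chunk_size: int = 1000) -> List[Chunk]:
    """Split each text into fixed-size pieces tagged with source and chunk index.

    Assumed metadata keys ("source", "chunk") mirror the sample output labels.
    """
    chunks = []
    for doc_id, text in enumerate(texts):
        pieces = [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]
        for chunk_id, piece in enumerate(pieces):
            chunks.append(
                Chunk(piece, {"source": f"document_{doc_id}", "chunk": chunk_id})
            )
    return chunks


for c in chunk_documents(["AI text...", "Python text..."]):
    print(f"Source: {c.metadata['source']}, Chunk: {c.metadata['chunk']}")
```

Keeping the source and position on each chunk lets a retrieval step report where a passage came from, as the output below does.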

Requirements

See requirements.txt for full list of dependencies.

Output and Expected Results

Performing similarity search for query: 'Vector databases'
Query embedding shape: 768

Searching with k=1:
Results:

  1. Vector stores are useful for semantic search
     Metadata: {}

Searching with k=2:
Results:

  1. Vector stores are useful for semantic search
     Metadata: {}

  2. Python is a versatile programming language
     Metadata: {}
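In the output above, the k=2 results begin with the single k=1 result: `k` only controls how many entries of the same similarity ranking are returned. Below is a minimal sketch of top-k ranking by cosine similarity. The 2-dimensional vectors are made up and chosen so that the ranking mirrors the sample output; they stand in for the 768-dimensional embeddings, and whether the scripts rank by cosine similarity or L2 distance is an assumption here.

```python
import math
from typing import List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity: dot product divided by the product of vector norms."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))


def top_k(query: Sequence[float], docs: List[Tuple[str, Sequence[float]]], k: int) -> List[str]:
    """Rank documents by cosine similarity to the query and keep the best k."""
    ranked = sorted(docs, key=lambda d: cosine(query, d[1]), reverse=True)
    return [text for text, _ in ranked[:k]]


# Made-up vectors, not real embeddings.
docs = [
    ("Python is a versatile programming language", [0.2, 0.9]),
    ("Vector stores are useful for semantic search", [0.9, 0.3]),
]
query = [1.0, 0.1]
print(top_k(query, docs, k=1))
print(top_k(query, docs, k=2))
```

Because the ranking is computed once and then truncated, increasing `k` never reorders the results that were already returned.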

Initializing embeddings model...

Creating sample documents...
Created 4 document chunks

Creating FAISS vector store...
Vector store created successfully

================================================================================
QUERY: What is artificial intelligence and how does it relate to machine learning?

Retrieving relevant documents...

Top 3 most relevant document chunks:

  1. Artificial Intelligence (AI) refers to the simulation of human intelligence in machines that are programmed to think and learn like humans. The term may also be applied to any machine that exhibits traits associated with a human mind such as learning and problem-solving.

     The ideal characteristic of artificial intelligence is its ability to rationalize and take 
     actions that have the best chance of achieving a specific goal. Machine learning, a subset of AI, 
     focuses on the development of computer programs that can access data and use it to learn for themselves.
     
     Deep learning is a subset of machine learning that has networks capable of learning unsupervised 
     from data that is unstructured or unlabeled.
    

    Source: document_0, Chunk: 0

  2. Natural Language Processing (NLP) is a field of artificial intelligence that focuses on the interaction between computers and humans using natural language. The ultimate objective of NLP is to read, decipher, understand, and make sense of human languages in a valuable way.

     NLP combines computational linguistics—rule-based modeling of human language—with statistical, 
     machine learning, and deep learning models. These technologies enable computers to process 
     human language in the form of text or voice data and to 'understand' its full meaning.
    

    Source: document_3, Chunk: 0

  3. Vector databases are specialized database systems designed to store, manage, and search high-dimensional vectors efficiently. These vectors typically represent embeddings of data such as text, images, or audio in a mathematical space where similar items are located close to each other.

     Vector databases are crucial for applications like semantic search, recommendation systems, 
     and machine learning pipelines where finding similar items quickly is important. They use 
     specialized indexing techniques like approximate nearest neighbor (ANN) algorithms to enable 
     fast similarity searches.
    

    Source: document_2, Chunk: 0

In a complete RAG system, these documents would be passed to an LLM along with the query to generate a comprehensive response.

================================================================================
QUERY: Explain the key features of Python programming language

Retrieving relevant documents...

Top 3 most relevant document chunks:

  1. Python is a high-level, interpreted programming language known for its readability and simplicity. Created by Guido van Rossum and first released in 1991, Python has a design philosophy that emphasizes code readability, notably using significant whitespace.

     Python features a dynamic type system and automatic memory management. It supports multiple 
     programming paradigms, including object-oriented, imperative, functional and procedural, 
     and has a large and comprehensive standard library.
    

    Source: document_1, Chunk: 0

  2. Natural Language Processing (NLP) is a field of artificial intelligence that focuses on the interaction between computers and humans using natural language. The ultimate objective of NLP is to read, decipher, understand, and make sense of human languages in a valuable way.

     NLP combines computational linguistics—rule-based modeling of human language—with statistical, 
     machine learning, and deep learning models. These technologies enable computers to process 
     human language in the form of text or voice data and to 'understand' its full meaning.
    

    Source: document_3, Chunk: 0

  3. Artificial Intelligence (AI) refers to the simulation of human intelligence in machines that are programmed to think and learn like humans. The term may also be applied to any machine that exhibits traits associated with a human mind such as learning and problem-solving.

     The ideal characteristic of artificial intelligence is its ability to rationalize and take 
     actions that have the best chance of achieving a specific goal. Machine learning, a subset of AI, 
     focuses on the development of computer programs that can access data and use it to learn for themselves.
     
     Deep learning is a subset of machine learning that has networks capable of learning unsupervised 
     from data that is unstructured or unlabeled.
    

    Source: document_0, Chunk: 0

In a complete RAG system, these documents would be passed to an LLM along with the query to generate a comprehensive response.

================================================================================
QUERY: How do vector databases work and what are they used for?

Retrieving relevant documents...

Top 3 most relevant document chunks:

  1. Vector databases are specialized database systems designed to store, manage, and search high-dimensional vectors efficiently. These vectors typically represent embeddings of data such as text, images, or audio in a mathematical space where similar items are located close to each other.

     Vector databases are crucial for applications like semantic search, recommendation systems, 
     and machine learning pipelines where finding similar items quickly is important. They use 
     specialized indexing techniques like approximate nearest neighbor (ANN) algorithms to enable 
     fast similarity searches.
    

    Source: document_2, Chunk: 0

  2. Natural Language Processing (NLP) is a field of artificial intelligence that focuses on the interaction between computers and humans using natural language. The ultimate objective of NLP is to read, decipher, understand, and make sense of human languages in a valuable way.

     NLP combines computational linguistics—rule-based modeling of human language—with statistical, 
     machine learning, and deep learning models. These technologies enable computers to process 
     human language in the form of text or voice data and to 'understand' its full meaning.
    

    Source: document_3, Chunk: 0

  3. Artificial Intelligence (AI) refers to the simulation of human intelligence in machines that are programmed to think and learn like humans. The term may also be applied to any machine that exhibits traits associated with a human mind such as learning and problem-solving.

     The ideal characteristic of artificial intelligence is its ability to rationalize and take 
     actions that have the best chance of achieving a specific goal. Machine learning, a subset of AI, 
     focuses on the development of computer programs that can access data and use it to learn for themselves.
     
     Deep learning is a subset of machine learning that has networks capable of learning unsupervised 
     from data that is unstructured or unlabeled.
    

    Source: document_0, Chunk: 0

In a complete RAG system, these documents would be passed to an LLM along with the query to generate a comprehensive response.

================================================================================
QUERY: What is NLP and how is it connected to AI?

Retrieving relevant documents...

Top 3 most relevant document chunks:

  1. Natural Language Processing (NLP) is a field of artificial intelligence that focuses on the interaction between computers and humans using natural language. The ultimate objective of NLP is to read, decipher, understand, and make sense of human languages in a valuable way.

     NLP combines computational linguistics—rule-based modeling of human language—with statistical, 
     machine learning, and deep learning models. These technologies enable computers to process 
     human language in the form of text or voice data and to 'understand' its full meaning.
    

    Source: document_3, Chunk: 0

  2. Artificial Intelligence (AI) refers to the simulation of human intelligence in machines that are programmed to think and learn like humans. The term may also be applied to any machine that exhibits traits associated with a human mind such as learning and problem-solving.

     The ideal characteristic of artificial intelligence is its ability to rationalize and take 
     actions that have the best chance of achieving a specific goal. Machine learning, a subset of AI, 
     focuses on the development of computer programs that can access data and use it to learn for themselves.
     
     Deep learning is a subset of machine learning that has networks capable of learning unsupervised 
     from data that is unstructured or unlabeled.
    

    Source: document_0, Chunk: 0

  3. Python is a high-level, interpreted programming language known for its readability and simplicity. Created by Guido van Rossum and first released in 1991, Python has a design philosophy that emphasizes code readability, notably using significant whitespace.

     Python features a dynamic type system and automatic memory management. It supports multiple 
     programming paradigms, including object-oriented, imperative, functional and procedural, 
     and has a large and comprehensive standard library.
    

    Source: document_1, Chunk: 0

In a complete RAG system, these documents would be passed to an LLM along with the query to generate a comprehensive response.

================================================================================
QUERY: What are the applications of deep learning?

Retrieving relevant documents...

Top 3 most relevant document chunks:

  1. Artificial Intelligence (AI) refers to the simulation of human intelligence in machines that are programmed to think and learn like humans. The term may also be applied to any machine that exhibits traits associated with a human mind such as learning and problem-solving.

     The ideal characteristic of artificial intelligence is its ability to rationalize and take 
     actions that have the best chance of achieving a specific goal. Machine learning, a subset of AI, 
     focuses on the development of computer programs that can access data and use it to learn for themselves.
     
     Deep learning is a subset of machine learning that has networks capable of learning unsupervised 
     from data that is unstructured or unlabeled.
    

    Source: document_0, Chunk: 0

  2. Natural Language Processing (NLP) is a field of artificial intelligence that focuses on the interaction between computers and humans using natural language. The ultimate objective of NLP is to read, decipher, understand, and make sense of human languages in a valuable way.

     NLP combines computational linguistics—rule-based modeling of human language—with statistical, 
     machine learning, and deep learning models. These technologies enable computers to process 
     human language in the form of text or voice data and to 'understand' its full meaning.
    

    Source: document_3, Chunk: 0

  3. Vector databases are specialized database systems designed to store, manage, and search high-dimensional vectors efficiently. These vectors typically represent embeddings of data such as text, images, or audio in a mathematical space where similar items are located close to each other.

     Vector databases are crucial for applications like semantic search, recommendation systems, 
     and machine learning pipelines where finding similar items quickly is important. They use 
     specialized indexing techniques like approximate nearest neighbor (ANN) algorithms to enable 
     fast similarity searches.
    

    Source: document_2, Chunk: 0

In a complete RAG system, these documents would be passed to an LLM along with the query to generate a comprehensive response.

The Pinecone free tier may have limitations that made it difficult to get search results from the Pinecone examples.

FAISS works reliably as a local alternative that requires no API keys or external services. The RAG example demonstrates a complete pipeline, from document chunking to retrieval, that can serve as a foundation for more complex applications.
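The sample output notes that a complete RAG system would pass the retrieved chunks to an LLM together with the query, which suggests the example stops at retrieval. Below is a minimal sketch of that hand-off step. The prompt template and the helper `build_rag_prompt` are hypothetical, and the LLM call itself is left out.

```python
from typing import List, Tuple


def build_rag_prompt(question: str, chunks: List[Tuple[str, str]], max_chunks: int = 3) -> str:
    """Combine retrieved (source, text) chunks and the question into one prompt.

    Hypothetical template: numbered context blocks followed by the question.
    """
    context = "\n\n".join(
        f"[{i}] ({source})\n{text.strip()}"
        for i, (source, text) in enumerate(chunks[:max_chunks], start=1)
    )
    return (
        "Answer the question using only the context below. "
        "Cite sources by their bracketed number.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )


retrieved = [
    ("document_3", "Natural Language Processing (NLP) is a field of artificial intelligence..."),
    ("document_0", "Artificial Intelligence (AI) refers to the simulation of human intelligence..."),
]
print(build_rag_prompt("What is NLP and how is it connected to AI?", retrieved))
```

Numbering the context blocks and keeping their source labels lets the model's answer be traced back to the retrieved chunks.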
