
🚀 Installation and Usage Guide

Get up and running with our Amazon tabular semantic search engine in minutes.

📋 Prerequisites

Local Setup

Install these tools on your machine:

| Tool | Purpose | Version | Download Link | Notes |
| --- | --- | --- | --- | --- |
| Python | Programming language runtime | = v3.11 | Download | Core runtime environment |
| uv | Python package installer and virtual environment manager | >= v0.4.30 | Download | Modern replacement for pip/venv/poetry |
| GNU Make | Build or task automation tool | >= v3.81 | Download | Used for running project commands |
| MongoDB Atlas CLI | Interact with MongoDB Atlas from the CLI | >= v1.33.0 | Download | Used for hosting the vector DB |

Cloud Services

You'll need access to:

| Service | Purpose | Cost | Required Environment Variables | Setup Guide |
| --- | --- | --- | --- | --- |
| OpenAI API | LLM API | Pay-per-use | OPENAI_API_KEY, OPENAI_MODEL_ID | Quick Start Guide |
| MongoDB Atlas | Vector DB | Free tier | USE_MONGO_VECTOR_DB, MONGO_CLUSTER_URL, MONGO_DATABASE_NAME, MONGO_CLUSTER_NAME, MONGO_PROJECT_ID, MONGO_API_PUBLIC_KEY, MONGO_API_PRIVATE_KEY | 1. Create a free MongoDB Atlas account; 2. Create a Cluster; 3. Add a Database User; 4. Configure a Network Connection; 5. Create an API Key; 6. Create an empty database |

Note: Find all the required environment variables in the .env.example file.
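
As a quick sanity check, the sketch below reports which of the variables listed in the table above are missing or empty in the text of a .env file. It is only an illustration: the parser handles plain KEY=VALUE lines, the helper names parse_env and missing_vars are hypothetical, and the project may validate its configuration differently.

```python
# Variable names are copied from the Cloud Services table in this guide.
REQUIRED_VARS = [
    "OPENAI_API_KEY",
    "OPENAI_MODEL_ID",
    "USE_MONGO_VECTOR_DB",
    "MONGO_CLUSTER_URL",
    "MONGO_DATABASE_NAME",
    "MONGO_CLUSTER_NAME",
    "MONGO_PROJECT_ID",
    "MONGO_API_PUBLIC_KEY",
    "MONGO_API_PRIVATE_KEY",
]


def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding quotes, if any, around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_vars(text: str, required=REQUIRED_VARS) -> list[str]:
    """Return the required names that are absent or empty in the .env text."""
    values = parse_env(text)
    return [name for name in required if not values.get(name)]
```

For example, calling missing_vars(Path(".env").read_text()) (with Path imported from pathlib) after step 2 below should return an empty list once every variable is filled in.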

💻 Setup in 4 Steps

1. Install Dependencies

Set up the project environment by running the following:

make install

This command will:

  • Create a virtual environment using uv
  • Activate the virtual environment
  • Install all dependencies from pyproject.toml

Then verify that Python 3.11.8 is installed in your new uv environment:

uv run python --version
# Output: Python 3.11.8

Note

Normally, uv picks the Python version specified in .python-version and installs it automatically if it is not already on your system. If you run into issues, install the right Python version explicitly by running make install-python.
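
If you prefer to check the interpreter from Python itself, a minimal sketch could look like the following. The required version (3.11) comes from the prerequisites table; the helper name is_supported is hypothetical and not part of the project.

```python
import sys

# Required major.minor version, taken from the prerequisites table.
REQUIRED = (3, 11)


def is_supported(version_info=sys.version_info) -> bool:
    """Return True when the major.minor version matches the required one."""
    return tuple(version_info[:2]) == REQUIRED
```

Saved to a file, this can be run inside the project environment with uv run python, for instance by printing is_supported() at the end of the script.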

2. Configure Environment

Before running any components:

  1. Create your environment file:
    cp .env.example .env
  2. Open .env and configure the required credentials following the inline comments (see superlinked_app/config.py for all options).

Important

For quick testing, set USE_MONGO_VECTOR_DB=False to use an in-memory database; otherwise, follow Step 3.

3. Configure Vector Search with MongoDB Atlas

Follow these steps to set up MongoDB Atlas for scalable vector search and get all required environment variables.

Tip

If you are more comfortable with a UI, you can also follow the steps from 📋 Prerequisites -> Cloud Services -> MongoDB Atlas, which do the same thing.

  1. Create Account & Install CLI

📚 More on getting started with MongoDB Atlas

  2. Log in to Atlas CLI
atlas auth login
  3. Create Free Cluster

Create an M0 (free) cluster in AWS EU West region:

atlas clusters create free-cluster --provider AWS --region EU_WEST_1 --tier M0

Wait for cluster creation to complete and list available clusters:

atlas clusters watch free-cluster
atlas clusters list

Set the environment variable MONGO_CLUSTER_NAME=free-cluster.

Important

The free M0 cluster has limitations but is sufficient for testing.

  4. Create Database User

Create a database user:

atlas dbusers create --username <your_mongo_database_user> --password <your_mongo_database_password> --role readWriteAnyDatabase

List users:

atlas dbusers list

These credentials will be used in the MONGO_CLUSTER_URL env var.

  5. Configure Network Access

Option 1: Allow access from anywhere (ease of use for development):

atlas accessList create "0.0.0.0/0" --type ipAddress --comment "Allow access from anywhere"

Option 2: Allow only your IP (recommended):

atlas accessList create --currentIp

To list current access list entries:

atlas accessList list

Important

For production, restrict network access to specific IPs.
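
To make the difference between the two options concrete, the sketch below flags access list entries that are open to every address, such as 0.0.0.0/0. It uses only Python's standard ipaddress module; the helper name allows_any_address is hypothetical, and the entries would be the values you see in the output of atlas accessList list.

```python
import ipaddress


def allows_any_address(entry: str) -> bool:
    """Return True if the entry covers every address (prefix length 0)."""
    # strict=False tolerates entries with host bits set; a bare IP address
    # is treated as a single-host network (/32 for IPv4, /128 for IPv6).
    network = ipaddress.ip_network(entry, strict=False)
    return network.prefixlen == 0
```

An entry like 0.0.0.0/0 returns True, while a single address such as 203.0.113.7 returns False.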

  6. Create API Keys

Create an API key with the required permissions:

atlas organizations apiKeys create --desc "Full Access API Key for 'tabular-semantic-search' project" --role ORG_OWNER --role ORG_MEMBER --role ORG_GROUP_CREATOR --role ORG_READ_ONLY

List keys to get the public key:

atlas organizations apiKeys list

Set:

  • MONGO_API_PUBLIC_KEY: Public key from the created API key
  • MONGO_API_PRIVATE_KEY: Private key shown during creation (save it immediately)

Important

Save your API private key immediately after creation - it cannot be retrieved later.

  7. Set Remaining Environment Variables

Set MONGO_PROJECT_ID to the project ID shown by:

atlas projects list

Set MONGO_CLUSTER_URL from the connection string returned by:

atlas clusters connectionStrings describe free-cluster

Drop the mongodb+srv:// prefix and prepend your credentials, so the value has the form MONGO_CLUSTER_URL={YOUR_DATABASE_USER}:{YOUR_DATABASE_PASSWORD}@free-cluster.vhxy1.mongodb.net, where the database user and password are the ones created at point 4.
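
The value can also be assembled programmatically. The sketch below strips the mongodb+srv:// prefix from the connection string and prepends the credentials. It percent-encodes the user and password, which MongoDB connection strings generally require for characters such as @ or /; whether the project expects the value already encoded is an assumption, so check how MONGO_CLUSTER_URL is consumed if your password contains special characters. The helper name build_cluster_url is hypothetical.

```python
from urllib.parse import quote_plus

SCHEME = "mongodb+srv://"


def build_cluster_url(connection_string: str, user: str, password: str) -> str:
    """Build the user:password@host value without the mongodb+srv:// prefix."""
    host = connection_string.strip()
    if host.startswith(SCHEME):
        host = host[len(SCHEME):]
    host = host.rstrip("/")
    # Percent-encode credentials so reserved characters do not break the URL.
    return f"{quote_plus(user)}:{quote_plus(password)}@{host}"
```

For a connection string of mongodb+srv://free-cluster.vhxy1.mongodb.net, the result ends in @free-cluster.vhxy1.mongodb.net, matching the format described above.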

  8. Create Database

Create the database named in MONGO_DATABASE_NAME:

make create-mongodb-database

Important

If you get SSL handshake errors, turn off your VPN or firewall, or try a different network.

Now go to MongoDB Atlas, navigate to Clusters → Browse Collections to verify that your database was created successfully.

Your final .env file should have these MongoDB-related variables:

USE_MONGO_VECTOR_DB=True
MONGO_CLUSTER_URL=username:password@free-cluster.vhxy1.mongodb.net
MONGO_DATABASE_NAME=your_database_name
MONGO_CLUSTER_NAME=free-cluster
MONGO_PROJECT_ID=your_project_id
MONGO_API_PUBLIC_KEY=your_public_key
MONGO_API_PRIVATE_KEY=your_private_key

  9. Final Thoughts

MongoDB Atlas can also be set up locally using Docker, but Superlinked isn't yet integrated with the local version: more on the local Mongo vector DB

4. Load and Process Your Data

Download and process the dataset sample:

make download-and-process-sample-dataset

The complete dataset is also supported, but processing it requires a powerful machine, a fast internet connection, and patience:

make download-and-process-full-dataset

You should see this structure in your data folder:

data/
├── processed_100_sample.jsonl
├── processed_300_sample.jsonl
├── processed_850_sample.jsonl
├── sample.json
└── sample.json.gz
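
To peek at one of the processed files, a small reader like the one below is enough. It assumes only that each non-empty line of a .jsonl file is one JSON object; the record fields are not specified in this guide, so the sketch does not rely on any of them. The helper name read_jsonl is hypothetical.

```python
import json
from pathlib import Path


def read_jsonl(path, limit=None):
    """Read up to `limit` records (a positive int, or None for all)."""
    records = []
    with Path(path).open(encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            records.append(json.loads(line))
            if limit is not None and len(records) >= limit:
                break
    return records
```

For example, read_jsonl("data/processed_100_sample.jsonl", limit=3) returns the first three records as dictionaries, which is handy for checking the available fields before loading the data.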

⚡️ Explore & Run

🔍 Interactive Notebooks

| Notebook | Description |
| --- | --- |
| Dataset exploration | Dive into the Amazon ESCI dataset |
| Tabular semantic search with natural language queries demo | See Superlinked in action |
| Text-to-SQL examples | Try LlamaIndex queries |

🚀 Launch the Superlinked Server and MongoDB Vector Database

  1. Start it up:
make start-superlinked-server

The FastAPI endpoint docs are available at http://localhost:8080/docs

  2. From a different terminal, load your data:
make load-data

Go to MongoDB Atlas, navigate to Clusters → Browse Collections → tabular-semantic-search to verify that your vector database was populated successfully.

Note: Wait about 5 minutes after loading the data before running the queries.

  3. Try some queries:
make post-filter-query
make post-semantic-query
make similar-item-query
  4. Start the Streamlit UI:
make start-ui

The UI is accessible at http://localhost:8501/

Important

If you are not getting any results when making queries from the CLI or Streamlit app, restart the Superlinked server.
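
Before restarting, it can help to confirm whether the server is responding at all. The sketch below requests the FastAPI docs page mentioned above, using only the standard library. The URL comes from this guide; the helper name is_reachable and the timeout value are assumptions. A successful response only shows that the server is up, not that the data has finished loading.

```python
import urllib.request

# Docs URL as given in this guide for the Superlinked server.
DOCS_URL = "http://localhost:8080/docs"


def is_reachable(url: str = DOCS_URL, timeout: float = 2.0) -> bool:
    """Return True if the URL answers with a 2xx status within the timeout."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 300
    except (OSError, ValueError):
        # OSError covers refused connections, timeouts, and HTTP errors;
        # ValueError covers malformed URLs.
        return False
```

If is_reachable() returns False, the server is not running or is still starting; if it returns True and queries are still empty, restarting the server as described above is the next step.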