📬 Stay Updated

Join Decoding ML for proven content on production-grade AI, GenAI, and information retrieval systems. Every week, straight to your inbox.

🚀 Installation and Usage Guide

Get up and running with our Amazon tabular semantic search engine in minutes.

📋 Prerequisites

Local Setup

Install these tools on your machine:

Tool	Purpose	Version	Download Link	Notes
Python	Programming language runtime	== v3.11	Download	Core runtime environment
uv	Python package installer and virtual environment manager	>= v0.4.30	Download	Modern replacement for pip/venv/poetry
GNU Make	Build or task automation tool	>= v3.81	Download	Used for running project commands
MongoDB Atlas CLI	Interact with MongoDB Atlas from the CLI	>= v1.33.0	Download	Used to set up the vector DB on MongoDB Atlas

Cloud Services

You'll need access to:

Service	Purpose	Cost	Required Environment Variables	Setup Guide
OpenAI API	LLM API	Pay-per-use	`OPENAI_API_KEY` `OPENAI_MODEL_ID`	Quick Start Guide
MongoDB Atlas	Vector DB	Free tier	`USE_MONGO_VECTOR_DB` `MONGO_CLUSTER_URL` `MONGO_DATABASE_NAME` `MONGO_CLUSTER_NAME` `MONGO_PROJECT_ID` `MONGO_API_PUBLIC_KEY` `MONGO_API_PRIVATE_KEY`	1. Create a free MongoDB Atlas account 2. Create a Cluster 3. Add a Database User 4. Configure a Network Connection 5. Create an API Key 6. Create an empty database

Note: Find all the required environment variables in the .env.example file.

💻 Setup in 4 Steps

1. Install Dependencies

Set up the project environment by running the following:

```
make install
```

This command will:

Create a virtual environment using uv
Activate the virtual environment
Install all dependencies from pyproject.toml

Verify that Python 3.11.8 is installed in your new uv environment:

```
uv run python --version
# Output: Python 3.11.8
```
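If you want to script the version check (for example, in a CI job), a minimal POSIX shell sketch could look like this. The function name `is_python_311` is hypothetical, not part of the project's Makefile, and it assumes `python --version` prints `Python X.Y.Z` as shown in this step.

```shell
#!/bin/sh
# Minimal sketch: fail fast when the uv environment is not running Python 3.11.
# Assumption: `python --version` prints "Python X.Y.Z".

# Return 0 if the given version string is a 3.11.x release, 1 otherwise.
is_python_311() {
  case "$1" in
    "Python 3.11."*) return 0 ;;
    *) return 1 ;;
  esac
}

# Usage, from the project root:
#   version="$(uv run python --version 2>&1)"
#   is_python_311 "$version" || echo "Unexpected interpreter: $version" >&2
```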

Note

Normally, uv picks the Python version specified in .python-version and installs it automatically if it is not already on your system. If you run into issues, install the right Python version explicitly by running:

```
make install-python
```

2. Configure Environment

Before running any components:

Create your environment file:
```
cp .env.example .env
```
Open .env and configure the required credentials following the inline comments (see superlinked_app/config.py for all options).

Important

For quick testing, set USE_MONGO_VECTOR_DB=False to use an in-memory database; otherwise, follow Step 3.
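Before moving on, you can sanity-check your .env file. The sketch below is a hypothetical helper (not a project Make target) that reports required variables that are missing or empty. It assumes plain `KEY=value` lines with unquoted values, and it takes the variable names from the Cloud Services table; superlinked_app/config.py remains the authoritative list.

```shell
#!/bin/sh
# Minimal sketch: list required variables that are missing or empty in a .env file.
# Assumptions: plain KEY=value lines with unquoted values (no `export`,
# no multi-line values); names mirror the Cloud Services table.

# Print the value of KEY from FILE (the last assignment wins), or nothing.
env_value() {
  sed -n "s/^$1=//p" "$2" | tail -n 1
}

# Print each missing or empty required variable; return 1 if any are missing.
check_env_file() {
  file="$1"
  required="OPENAI_API_KEY OPENAI_MODEL_ID"
  # The MongoDB variables only matter when the Atlas vector DB is enabled.
  if [ "$(env_value USE_MONGO_VECTOR_DB "$file")" = "True" ]; then
    required="$required MONGO_CLUSTER_URL MONGO_DATABASE_NAME MONGO_CLUSTER_NAME"
    required="$required MONGO_PROJECT_ID MONGO_API_PUBLIC_KEY MONGO_API_PRIVATE_KEY"
  fi
  status=0
  for key in $required; do
    if [ -z "$(env_value "$key" "$file")" ]; then
      echo "missing: $key"
      status=1
    fi
  done
  return "$status"
}

# Usage: check_env_file .env
```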

3. Configure Vector Search with MongoDB Atlas

Follow these steps to set up MongoDB Atlas for scalable vector search and get all required environment variables.

Tip

If you are more comfortable with a UI, you can also follow the steps from 📋 Prerequisites -> Cloud Services -> MongoDB Atlas, which do the same thing.

Create Account & Install CLI

Create a free MongoDB Atlas account
Install MongoDB Atlas CLI

📚 More on getting started with MongoDB Atlas

Log in to the Atlas CLI

```
atlas auth login
```

Create Free Cluster

Create an M0 (free) cluster in the AWS EU West (Ireland) region:

```
atlas clusters create free-cluster --provider AWS --region EU_WEST_1 --tier M0
```

Wait for cluster creation to complete and list available clusters:

```
atlas clusters watch free-cluster
atlas clusters list
```

Set the MONGO_CLUSTER_NAME=free-cluster environment variable in your .env file.

Important

The free M0 cluster has limitations but is sufficient for testing.
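Several steps in this guide ask you to set a variable in .env, starting with MONGO_CLUSTER_NAME. If you prefer to do that from the terminal, here is a minimal sketch of a hypothetical `set_env_var` helper that replaces an existing `KEY=` line or appends a new one. It writes through a temporary file instead of relying on GNU-specific `sed -i`, and as a side effect it moves the updated key to the end of the file.

```shell
#!/bin/sh
# Minimal sketch (hypothetical helper, not a project Make target):
# set KEY=value in a .env file, replacing any existing KEY= line.
# Assumption: values contain no newlines.

set_env_var() {
  key="$1"; value="$2"; file="${3:-.env}"
  touch "$file"
  tmp="$file.tmp.$$"
  # Copy every line except an existing assignment for this key.
  # grep exits 1 when it selects no lines, which is fine here.
  grep -v "^$key=" "$file" > "$tmp" || true
  # Append the new assignment, then swap the file into place.
  printf '%s=%s\n' "$key" "$value" >> "$tmp"
  mv "$tmp" "$file"
}

# Usage:
#   set_env_var MONGO_CLUSTER_NAME free-cluster .env
```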

Create Database User

Create a database user:

```
atlas dbusers create --username <your_mongo_database_user> --password <your_mongo_database_password> --role readWriteAnyDatabase
```

List users:

```
atlas dbusers list
```

These credentials will be used in the MONGO_CLUSTER_URL env var.

Configure Network Access

Option 1: Allow access from anywhere (convenient for development):

```
atlas accessList create "0.0.0.0/0" --type ipAddress --comment "Allow access from anywhere"
```

Option 2: Allow only your current IP address (recommended):

```
atlas accessList create --currentIp
```

To list current access list entries:

```
atlas accessList list
```

Important

For production, restrict network access to specific IPs.

Create API Keys

Create an API key with the required permissions:

```
atlas organizations apiKeys create --desc "Full Access API Key for 'tabular-semantic-search' project" --role ORG_OWNER --role ORG_MEMBER --role ORG_GROUP_CREATOR --role ORG_READ_ONLY
```

List the keys to get the public key:

```
atlas organizations apiKeys list
```

Set:

MONGO_API_PUBLIC_KEY: Public key from the created API key
MONGO_API_PRIVATE_KEY: Private key shown during creation (save it immediately)

Important

Save your API private key immediately after creation; it cannot be retrieved later.

Setting Remaining Environment Variables

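Set MONGO_PROJECT_ID to your project's ID, which you can find by running:

```
atlas projects list
```

Set MONGO_CLUSTER_URL. First, get your cluster's connection string:

```
atlas clusters connectionStrings describe free-cluster
```

Then set MONGO_CLUSTER_URL to that connection string without the mongodb+srv:// prefix and with your database credentials prepended, for example MONGO_CLUSTER_URL={YOUR_DATABASE_USER}:{YOUR_DATABASE_PASSWORD}@free-cluster.vhxy1.mongodb.net. Use the database user and password created in the Create Database User step; the host (here free-cluster.vhxy1.mongodb.net) comes from your own connection string.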
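The MONGO_CLUSTER_URL format can also be derived mechanically. This sketch uses a hypothetical `build_cluster_url` function: you pass the `mongodb+srv://` string yourself (copied from the `atlas clusters connectionStrings describe` output, whose exact layout may vary between CLI versions), and it strips the scheme and any trailing path. It assumes the user name and password contain no reserved characters; MongoDB requires characters such as `@`, `:`, `/`, `?` and `#` in credentials to be percent-encoded.

```shell
#!/bin/sh
# Minimal sketch: build MONGO_CLUSTER_URL (user:password@host, no scheme)
# from an SRV connection string that you pass in.
# Assumption: credentials need no percent-encoding.

build_cluster_url() {
  user="$1"; password="$2"; srv="$3"
  host="${srv#mongodb+srv://}"   # strip the scheme, as the guide requires
  host="${host%%/*}"             # drop any trailing path or options
  printf '%s:%s@%s\n' "$user" "$password" "$host"
}

# Usage (placeholder values):
#   build_cluster_url my_user my_password mongodb+srv://free-cluster.vhxy1.mongodb.net
```

Running it with the placeholder values prints `my_user:my_password@free-cluster.vhxy1.mongodb.net`, which is the shape of value to store in MONGO_CLUSTER_URL.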

Create Database

Create the database named in MONGO_DATABASE_NAME:

```
make create-mongodb-database
```

Important

If you get SSL handshake errors, try turning off your VPN or firewall, or switch to a different network.

Now go to MongoDB Atlas and navigate to Clusters → Browse Collections to verify that your database was created successfully.

Your final .env file should contain these MongoDB-related variables:

```
USE_MONGO_VECTOR_DB=True
MONGO_CLUSTER_URL=username:password@free-cluster.xxxxx.mongodb.net
MONGO_DATABASE_NAME=your_database_name
MONGO_CLUSTER_NAME=free-cluster
MONGO_PROJECT_ID=your_project_id
MONGO_API_PUBLIC_KEY=your_public_key
MONGO_API_PRIVATE_KEY=your_private_key
```
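To reuse these values in later Atlas CLI commands without copy-pasting, one option is to load the file into your current shell with the standard `set -a` idiom. This is a sketch that assumes .env contains only shell-compatible `KEY=value` lines; values with spaces or characters such as `$` must be quoted, or the shell will interpret them. The `load_env` name is hypothetical.

```shell
#!/bin/sh
# Minimal sketch: export every variable defined in a .env file into the
# current shell. Assumption: the file is valid shell syntax (KEY=value).

load_env() {
  set -a               # auto-export every variable assigned from here on
  . "${1:-./.env}"     # pass a path containing a slash, e.g. ./.env
  set +a               # stop auto-exporting
}

# Usage:
#   load_env ./.env
#   atlas clusters describe "$MONGO_CLUSTER_NAME"
```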

Final Thoughts

MongoDB Atlas can also be set up locally using Docker, but Superlinked does not yet integrate with the local version (more on the local Mongo vector DB).

4. Load and Process Your Data

Download and process the dataset sample:

```
make download-and-process-sample-dataset
```

The complete dataset is also supported, but processing it requires a powerful computer, a fast internet connection, and patience:

```
make download-and-process-full-dataset
```

You should see this structure in your data folder:

```
data/
├── processed_100_sample.jsonl
├── processed_300_sample.jsonl
├── processed_850_sample.jsonl
├── sample.json
└── sample.json.gz
```
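A quick way to confirm the sample step finished is to check that each expected file exists and is non-empty. The hypothetical `check_data_dir` helper below hard-codes the file names from the data folder tree, so adjust the list if the Make target's output differs (for example, after processing the full dataset).

```shell
#!/bin/sh
# Minimal sketch: verify the processed sample files exist and are non-empty.
# Assumption: file names match the data/ tree shown in this guide.

check_data_dir() {
  dir="${1:-data}"
  status=0
  for name in processed_100_sample.jsonl processed_300_sample.jsonl \
              processed_850_sample.jsonl sample.json sample.json.gz; do
    if [ ! -s "$dir/$name" ]; then   # -s: file exists and is non-empty
      echo "missing or empty: $dir/$name"
      status=1
    fi
  done
  return "$status"
}

# Usage: JSONL stores one JSON object per line, so `wc -l` counts records.
#   check_data_dir data && wc -l data/processed_*_sample.jsonl
```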

⚡️ Explore & Run

🔍 Interactive Notebooks

Notebook	Description
Dataset exploration	Dive into the Amazon ESCI dataset
Tabular semantic search with natural language queries demo	See Superlinked in action
Text-to-SQL examples	Try LlamaIndex queries

🚀 Launch the Superlinked Server and MongoDB Vector Database

Start it up:

```
make start-superlinked-server
```

The FastAPI endpoint docs are available at http://localhost:8080/docs.
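Because `make load-data` runs from a second terminal while the server may still be starting, you can poll the docs page until it answers. This sketch assumes `curl` is installed and treats any successful response from http://localhost:8080/docs as a sign that the server is ready; the `wait_for_server` name and its retry defaults are illustrative.

```shell
#!/bin/sh
# Minimal sketch: wait until the Superlinked server's docs page responds.
# Assumptions: curl is available; a successful /docs response means "ready".

wait_for_server() {
  url="${1:-http://localhost:8080/docs}"
  attempts="${2:-30}"
  delay="${3:-2}"
  i=1
  while [ "$i" -le "$attempts" ]; do
    # -f: fail on HTTP errors; -sS: no progress bar, but still show errors
    if curl -fsS --max-time 5 -o /dev/null "$url"; then
      echo "server is up: $url"
      return 0
    fi
    sleep "$delay"
    i=$((i + 1))
  done
  echo "server did not respond after $attempts attempts: $url" >&2
  return 1
}

# Usage (second terminal): wait_for_server && make load-data
```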

From a different terminal, load your data:

```
make load-data
```

Go to MongoDB Atlas and navigate to Clusters → Browse Collections → tabular-semantic-search to verify that your vector database was populated successfully.

Note: Wait a few minutes (around 5) before running the queries.

Try some queries:

```
make post-filter-query
make post-semantic-query
make similar-item-query
```

Start the Streamlit UI:

```
make start-ui
```

The UI is accessible at http://localhost:8501/.

Important

If you are not getting any results when making queries from the CLI or Streamlit app, restart the Superlinked server.
