Get up and running with our Amazon tabular semantic search engine in minutes.
Install these tools on your machine:
Tool | Purpose | Version | Download Link | Notes |
---|---|---|---|---|
Python | Programming language runtime | = v3.11 | Download | Core runtime environment |
uv | Python package installer and virtual environment manager | >= v0.4.30 | Download | Modern replacement for pip/venv/poetry |
GNU Make | Build or task automation tool | >= v3.81 | Download | Used for running project commands |
MongoDB Atlas CLI | Interact with MongoDB Atlas from the CLI | >= v1.33.0 | Download | Used for hosting the vector DB |
You'll need access to:
Service | Purpose | Cost | Required Environment Variables | Setup Guide |
---|---|---|---|---|
OpenAI API | LLM API | Pay-per-use | OPENAI_API_KEY, OPENAI_MODEL_ID | Quick Start Guide |
MongoDB Atlas | Vector DB | Free tier | USE_MONGO_VECTOR_DB, MONGO_CLUSTER_URL, MONGO_DATABASE_NAME, MONGO_CLUSTER_NAME, MONGO_PROJECT_ID, MONGO_API_PUBLIC_KEY, MONGO_API_PRIVATE_KEY | 1. Create a free MongoDB Atlas account 2. Create a Cluster 3. Add a Database User 4. Configure a Network Connection 5. Create an API Key 6. Create an empty database |
Note: Find all the required environment variables in the .env.example file.
Set up the project environment by running the following:
make install
Check that Python 3.11.8 is installed in your new uv environment:
uv run python --version
# Output: Python 3.11.8
The make install command will:
- Create a virtual environment using uv
- Activate the virtual environment
- Install all dependencies from pyproject.toml
Note
Normally, uv picks the Python version specified in .python-version and installs it automatically if it is not already on your system. If you run into issues, install the right Python version explicitly by running make install-python.
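If you prefer to check the interpreter version from inside Python instead of reading the CLI output, a minimal sketch might look like the following. The helper name `is_supported_python` is hypothetical and not part of the project; it only encodes the `= v3.11` requirement from the tools table.

```python
import sys


def is_supported_python(version_info=sys.version_info) -> bool:
    """Return True when the interpreter is a 3.11.x release."""
    # Compare only major and minor, so any 3.11 patch release passes.
    return tuple(version_info[:2]) == (3, 11)


if __name__ == "__main__":
    found = ".".join(str(part) for part in sys.version_info[:3])
    status = "OK" if is_supported_python() else "unsupported"
    print(f"Python {found}: {status}")
```

Saved under a name of your choice (for example check_python.py, also hypothetical), it can be run with uv run python check_python.py so that it uses the project environment.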
Before running any components:
- Create your environment file:
cp .env.example .env
- Open .env and configure the required credentials following the inline comments (see superlinked_app/config.py for all options).
Important
For quick testing, set USE_MONGO_VECTOR_DB=False to use an in-memory database. Otherwise, follow Step 3.
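Before starting any component, it can help to confirm that .env defines everything you expect. The sketch below is one way to do that with the standard library only. The variable names come from the Cloud Services table; the function names, the simple KEY=VALUE parsing, and the rule that the MongoDB variables are required only when USE_MONGO_VECTOR_DB is true are assumptions, and superlinked_app/config.py remains the authority on how settings are actually read.

```python
from pathlib import Path

# Variable names copied from the Cloud Services table.
OPENAI_VARS = ["OPENAI_API_KEY", "OPENAI_MODEL_ID"]
MONGO_VARS = [
    "MONGO_CLUSTER_URL",
    "MONGO_DATABASE_NAME",
    "MONGO_CLUSTER_NAME",
    "MONGO_PROJECT_ID",
    "MONGO_API_PUBLIC_KEY",
    "MONGO_API_PRIVATE_KEY",
]


def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Remove surrounding whitespace and optional quotes.
        values[key.strip()] = value.strip().strip("'\"")
    return values


def missing_vars(env: dict[str, str]) -> list[str]:
    """List required variables that are absent or empty."""
    required = list(OPENAI_VARS)
    # Assumption: MongoDB settings only matter when the flag is true.
    if env.get("USE_MONGO_VECTOR_DB", "").lower() == "true":
        required += MONGO_VARS
    return [name for name in required if not env.get(name)]


if __name__ == "__main__":
    env_file = Path(".env")
    if env_file.exists():
        missing = missing_vars(parse_env(env_file.read_text(encoding="utf-8")))
        print("Missing:", ", ".join(missing) if missing else "none")
    else:
        print("No .env file found; run `cp .env.example .env` first.")
```

The check is deliberately shallow: it reports empty or absent names but does not validate that a key or URL is actually correct.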
Follow these steps to set up MongoDB Atlas for scalable vector search and get all required environment variables.
Tip
If you are more comfortable with a UI, you can also follow the steps from 📋 Prerequisites -> Cloud Services -> MongoDB Atlas, which do the same thing.
- Create Account & Install CLI
📚 More on getting started with MongoDB Atlas
- Login to Atlas CLI
atlas auth login
- Create Free Cluster
Create an M0 (free) cluster in AWS EU West region:
atlas clusters create free-cluster --provider AWS --region EU_WEST_1 --tier M0
Wait for cluster creation to complete and list available clusters:
atlas clusters watch free-cluster
atlas clusters list
Set the MONGO_CLUSTER_NAME=free-cluster environment variable.
Important
The free M0 cluster has limitations but is sufficient for testing.
- Create Database User
Create database user:
atlas dbusers create --username <your_mongo_database_user> --password <your_mongo_database_password> --role readWriteAnyDatabase
List users:
atlas dbusers list
These credentials will be used in the MONGO_CLUSTER_URL env var.
- Configure Network Access
Option 1: Allow access from anywhere (ease of use for development):
atlas accessList create "0.0.0.0/0" --type ipAddress --comment "Allow access from anywhere"
Option 2: Allow only your IP (recommended):
atlas accessList create --currentIp
To list current access list entries:
atlas accessList list
Important
For production, restrict network access to specific IPs.
- Create API Keys
Create API key with required permissions:
atlas organizations apiKeys create --desc "Full Access API Key for 'tabular-semantic-search' project" --role ORG_OWNER --role ORG_MEMBER --role ORG_GROUP_CREATOR --role ORG_READ_ONLY
List keys to get the public key:
atlas organizations apiKeys list
Set:
- MONGO_API_PUBLIC_KEY: Public key from the created API key
- MONGO_API_PRIVATE_KEY: Private key shown during creation (save it immediately)
Important
Save your API private key immediately after creation - it cannot be retrieved later.
- Setting Remaining Environment Variables
Set MONGO_PROJECT_ID to the project ID shown by:
atlas projects list
Set MONGO_CLUSTER_URL. First, get the cluster's connection string:
atlas clusters connectionStrings describe free-cluster
Now set the environment variable without the mongodb+srv:// prefix, as MONGO_CLUSTER_URL={YOUR_DATABASE_USER}:{YOUR_DATABASE_PASSWORD}@free-cluster.vhxy1.mongodb.net, where the database user and password are the ones created in step 4.
- Create Database
Create the database named in MONGO_DATABASE_NAME:
make create-mongodb-database
Important
If you are getting SSL handshake errors, turn off your VPN or firewall, or try a different network.
Now go to MongoDB Atlas, navigate to Clusters → Browse Collections to verify that your database was created successfully.
Your final .env file should have these MongoDB-related variables:
USE_MONGO_VECTOR_DB=True
MONGO_CLUSTER_URL=username:password@free-cluster.vhxy1.mongodb.net
MONGO_DATABASE_NAME=your_database_name
MONGO_CLUSTER_NAME=free-cluster
MONGO_PROJECT_ID=your_project_id
MONGO_API_PUBLIC_KEY=your_public_key
MONGO_API_PRIVATE_KEY=your_private_key
- Final Thoughts
MongoDB Atlas can also be set up locally using Docker, but Superlinked isn't yet integrated with the local version. More on the local Mongo vector DB.
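The MONGO_CLUSTER_URL value described in the steps above (credentials plus host, without the mongodb+srv:// prefix) can be assembled programmatically. The sketch below is hedged: `build_cluster_url` is a hypothetical helper, and whether the project expects percent-encoded credentials is an assumption. Encoding follows the general rule for MongoDB connection strings that characters such as @, : and / in a username or password must be escaped; for credentials without special characters the output is the same either way.

```python
from urllib.parse import quote_plus

SRV_PREFIX = "mongodb+srv://"


def build_cluster_url(connection_string: str, user: str, password: str) -> str:
    """Return `user:password@host` without the mongodb+srv:// scheme."""
    host = connection_string.strip()
    if host.startswith(SRV_PREFIX):
        host = host[len(SRV_PREFIX):]
    # Keep only the host: drop any credentials, path, or query options.
    host = host.rsplit("@", 1)[-1].split("/", 1)[0].split("?", 1)[0]
    # Percent-encode so "@" or ":" inside the credentials cannot be
    # mistaken for URL delimiters.
    return f"{quote_plus(user)}:{quote_plus(password)}@{host}"


if __name__ == "__main__":
    # Host copied from the example in the steps; yours will differ.
    print(build_cluster_url(
        "mongodb+srv://free-cluster.vhxy1.mongodb.net",
        "your_mongo_database_user",
        "your_mongo_database_password",
    ))
```

Pass in the string returned by atlas clusters connectionStrings describe free-cluster together with the database user and password from step 4.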
Download and process the dataset sample:
make download-and-process-sample-dataset
The complete dataset is also supported, but processing it requires a powerful machine, a fast internet connection, and patience:
make download-and-process-full-dataset
You should see this structure in your data folder:
data/
├── processed_100_sample.jsonl
├── processed_300_sample.jsonl
├── processed_850_sample.jsonl
├── sample.json
└── sample.json.gz
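To sanity-check one of the processed files listed above, you could peek at its first few records. This is a minimal sketch under two assumptions: each line of the .jsonl file is a JSON object, and the file sits at data/processed_100_sample.jsonl as in the tree. The helper name `peek_jsonl` is hypothetical, and no particular field names are assumed.

```python
import json
from itertools import islice
from pathlib import Path


def peek_jsonl(path: Path, limit: int = 3) -> list[dict]:
    """Return the first `limit` records of a JSON Lines file."""
    with path.open(encoding="utf-8") as handle:
        # Each non-empty line holds one JSON document.
        rows = (json.loads(line) for line in handle if line.strip())
        return list(islice(rows, limit))


if __name__ == "__main__":
    sample = Path("data") / "processed_100_sample.jsonl"
    if sample.exists():
        for record in peek_jsonl(sample):
            # Print only the field names to keep the output short.
            print(sorted(record))
    else:
        print(f"{sample} not found; run the download step first.")
```

Reading lazily with islice means only the requested lines are parsed, so the same helper stays cheap on the larger files.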
Notebook | Description |
---|---|
Dataset exploration | Dive into the Amazon ESCI dataset |
Tabular semantic search with natural language queries demo | See Superlinked in action |
Text-to-SQL examples | Try LlamaIndex queries |
- Start it up:
make start-superlinked-server
The FastAPI endpoint docs are available at http://localhost:8080/docs
- From a different terminal, load your data:
make load-data
Go to MongoDB Atlas, navigate to Clusters → Browse Collections → tabular-semantic-search to verify that your vector database was populated successfully.
Note: Wait about 5 minutes before running the queries.
- Try some queries:
make post-filter-query
make post-semantic-query
make similar-item-query
- Start the Streamlit UI:
make start-ui
Accessible at http://localhost:8501/
Important
If you are not getting any results when making queries from the CLI or Streamlit app, restart the Superlinked server.
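When queries return nothing, a first step is to confirm that the server answers at all. The sketch below polls a URL with the standard library until it responds or a deadline passes. The function names are hypothetical, and using the /docs page on port 8080 as a health signal is an assumption based on the address given above; a reachable docs page does not prove that the data has finished loading.

```python
import time
import urllib.error
import urllib.request


def is_reachable(url: str, timeout: float = 2.0) -> bool:
    """Return True if an HTTP GET on `url` succeeds with a 2xx/3xx status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 400
    except (urllib.error.URLError, OSError, ValueError):
        # Connection refused, timeout, HTTP error status, or malformed URL.
        return False


def wait_for(url: str, total_s: float = 60.0, every_s: float = 2.0) -> bool:
    """Poll `url` until it is reachable or `total_s` seconds have elapsed."""
    deadline = time.monotonic() + total_s
    while True:
        if is_reachable(url):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(every_s)
```

For example, wait_for("http://localhost:8080/docs") returns True once the Superlinked server responds and False if it stays unreachable for the whole window, in which case restarting it as suggested above is the next thing to try.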