LSS — Local Semantic Search

Quick Demo

Hybrid search over local files. BM25 + embeddings + Reciprocal Rank Fusion. Real-time file watching. Runs offline or with OpenAI.

lss "authentication JWT"              # search current directory
lss "deploy kubernetes" ~/Projects    # search a specific path
lss "rate limiting" --json            # machine-readable output

0.91 NDCG@10 on our golden set. Beats ColBERTv2, Voyage-2, and Cohere embed-v3 on BEIR SciFact. See EVALS.md.


Install

# One-liner (auto-detects pipx/uv/pip)
curl -fsSL https://raw.githubusercontent.com/kortix-ai/lss/main/install.sh | bash

Or directly:

pipx install local-semantic-search       # recommended
pip install local-semantic-search
uv tool install local-semantic-search

Embedding provider

Default: OpenAI — if OPENAI_API_KEY is set, lss uses it automatically.

export OPENAI_API_KEY="sk-..."   # add to ~/.zshrc or ~/.bashrc

Offline alternative:

pip install 'local-semantic-search[local]'
lss config provider local

Uses bge-small-en-v1.5 (384d, ~125 MB). No API key, no network, no cost. Within 0.3% of OpenAI on quality, 8x faster.
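
For reference, bge-small-en-v1.5 is commonly loaded through the sentence-transformers library. The snippet below is a generic illustration of producing 384-dimensional vectors with that model; it is not a description of how lss uses it internally.

from sentence_transformers import SentenceTransformer

model = SentenceTransformer("BAAI/bge-small-en-v1.5")   # ~125 MB, runs fine on CPU
vectors = model.encode(
    ["rate limiting middleware", "JWT token rotation"],
    normalize_embeddings=True,                           # cosine similarity becomes a dot product
)
print(vectors.shape)                                     # (2, 384)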


Usage

Search

lss "query"                              # current directory
lss "query" ~/Documents                  # explicit path
lss "auth JWT" "deploy k8s"              # multiple queries
lss "config" --json                      # JSON output
lss "error" -k 5                         # top 5

# Filters (applied without re-indexing)
lss "auth" -e .py -e .ts                 # only these extensions
lss "config" -E .json -E .yaml           # exclude extensions
lss "user data" -x '\d{4}-\d{2}-\d{2}'  # exclude chunks matching regex
lss "auth" -e .py -x "test_"            # combine filters

First search auto-indexes. Subsequent searches use cached embeddings.
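
The --json flag makes results easy to consume from scripts or agents. A minimal wrapper is sketched below; the output schema is not documented here, so inspect the parsed structure before relying on specific field names.

import json
import subprocess

# Hypothetical wrapper around the CLI; assumes --json prints one JSON document to stdout
proc = subprocess.run(
    ["lss", "authentication JWT", "--json", "-k", "5"],
    capture_output=True, text=True, check=True,
)
results = json.loads(proc.stdout)
print(results)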

Index & manage

lss index ~/Projects                     # index without searching
lss status                               # DB stats, provider, config
lss ls                                   # list indexed files
lss sweep --clear-all                    # wipe database

lss watch add ~/Documents                # for lss-sync daemon
lss exclude add "*.log"                  # glob exclusion
lss include add .rst                     # custom extension
lss config provider local                # switch provider
lss eval                                 # run quality benchmarks
lss update                               # check for updates

File watcher

lss-sync                                 # watch configured paths
lss-sync --watch ~/Projects              # watch specific path

How it works

query → BM25 (FTS5 + custom rescore) ──┐
      → Embedding (OpenAI or local) ───┴→ RRF → boosts → MMR → results

1. BM25 — FTS5 retrieves candidates, custom re-scorer ranks with TF saturation + IDF (k1=1.2, b=0.75)
2. Embedding — OpenAI text-embedding-3-small (256d) or local bge-small-en-v1.5 (384d), cached permanently
3. RRF — Reciprocal Rank Fusion merges both ranked lists
4. Boosts — Jaccard overlap, phrase matching, digit co-mention
5. MMR — Maximal Marginal Relevance for diversity

See ARCHITECTURE.md for full pipeline detail.
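
As a rough sketch of how the scoring stages fit together, the snippet below spells out BM25-style term scoring, RRF, and MMR in plain Python. Function names, data shapes, the MMR trade-off lam, and the RRF constant k are invented or standard defaults, not values from lss; only the formulas and the k1=1.2 / b=0.75 constants come from the description above, and none of this is lss's actual code.

import math

def bm25_term_score(tf, df, n_docs, doc_len, avg_len, k1=1.2, b=0.75):
    # Step 1: TF saturation times IDF for a single query term in one chunk
    idf = math.log(1 + (n_docs - df + 0.5) / (df + 0.5))
    return idf * (tf * (k1 + 1)) / (tf + k1 * (1 - b + b * doc_len / avg_len))

def rrf(rankings, k=60):
    # Step 3: Reciprocal Rank Fusion over ranked lists of chunk ids (k=60 is the usual constant)
    scores = {}
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] = scores.get(chunk_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

def mmr(candidates, query_sim, pair_sim, lam=0.7, top_k=10):
    # Step 5: Maximal Marginal Relevance, trading relevance against redundancy
    selected, pool = [], list(candidates)
    while pool and len(selected) < top_k:
        best = max(pool, key=lambda c: lam * query_sim[c]
                   - (1 - lam) * max((pair_sim[(c, s)] for s in selected), default=0.0))
        selected.append(best)
        pool.remove(best)
    return selected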


Supported formats

| Category  | Extensions                                                                  |
|-----------|-----------------------------------------------------------------------------|
| Code      | .py, .js, .ts, .go, .rs, .java, .c, .cpp, .rb, .php, .swift, .kt, +40 more   |
| Markup    | .md, .rst, .tex, .html, .xml, .yaml, .json, .toml                            |
| Documents | .pdf, .docx, .xlsx, .pptx, .eml                                               |
| Data      | .csv, .jsonl, .tsv, SQLite .db/.sqlite/.sqlite3                               |

Extraction uses pdfminer.six, python-docx, openpyxl, python-pptx, beautifulsoup4, and built-in SQLite row extraction. Unknown extensions are skipped by default; add one with lss include add .ext.
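
A simplified dispatch over those libraries might look like the sketch below. It is illustrative only: the function is hypothetical, and lss's real extractors also handle spreadsheets, presentations, email, SQLite rows, and chunking.

from pathlib import Path

def extract_text_from(path: Path) -> str:
    suffix = path.suffix.lower()
    if suffix == ".pdf":
        from pdfminer.high_level import extract_text
        return extract_text(str(path))
    if suffix == ".docx":
        from docx import Document                # python-docx
        return "\n".join(p.text for p in Document(str(path)).paragraphs)
    if suffix in {".html", ".xml"}:
        from bs4 import BeautifulSoup            # beautifulsoup4
        soup = BeautifulSoup(path.read_text(errors="ignore"), "html.parser")
        return soup.get_text(" ")
    # code, markup, and plain text are read as-is; lss skips unknown extensions
    return path.read_text(errors="ignore")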


Search quality

| Method | NDCG@10 | MRR@10 | Provider |
|--------|---------|--------|----------|
| hybrid | 0.914   | 1.000  | OpenAI   |
| hybrid | 0.911   | 1.000  | Local    |
| bm25   | 0.885   | 0.988  |          |

BEIR SciFact (5,183 docs, 300 queries):

| System          | NDCG@10 |
|-----------------|---------|
| lss hybrid      | 0.729   |
| Cohere embed-v3 | 0.717   |
| Voyage-2        | 0.713   |
| ColBERTv2       | 0.693   |
| BM25 (Anserini) | 0.665   |

Full results: EVALS.md


Performance

| Scenario                 | OpenAI     | Local     |
|--------------------------|------------|-----------|
| Cold search (no cache)   | 400-800 ms | 50-200 ms |
| Warm (embeddings cached) | 100-200 ms | 5-50 ms   |
| Hot (all in LRU)         | 50-150 ms  | 2-10 ms   |

Configuration

| Variable         | Default     | Description                   |
|------------------|-------------|-------------------------------|
| OPENAI_API_KEY   |             | Required for OpenAI provider  |
| LSS_PROVIDER     | auto-detect | openai or local               |
| LSS_DIR          | ~/.lss      | Data directory                |
| BM25_K1 / BM25_B | 1.2 / 0.75  | BM25 tuning                   |
| NO_COLOR         | unset       | Disable ANSI colors           |

Config file: ~/.lss/config.json
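
These settings can interact; one plausible resolution order (an explicit LSS_PROVIDER, then config.json, then auto-detect from OPENAI_API_KEY) is sketched below. The "provider" key name and the precedence are assumptions for illustration; lss status reports the authoritative configuration.

import json
import os
from pathlib import Path

def resolve_provider() -> str:
    explicit = os.environ.get("LSS_PROVIDER")
    if explicit:
        return explicit                          # "openai" or "local"
    config = Path(os.environ.get("LSS_DIR", "~/.lss")).expanduser() / "config.json"
    if config.exists():
        provider = json.loads(config.read_text()).get("provider")   # assumed key name
        if provider:
            return provider
    return "openai" if os.environ.get("OPENAI_API_KEY") else "local"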


Programmatic use

from semantic_search import semantic_search
from lss_store import ingest_many, discover_files

# Discover indexable files under the project; new_files have not been ingested yet
all_files, new_files, _ = discover_files("/path/to/project")
ingest_many(new_files)                  # chunk, embed, and store the new files
results = semantic_search("/path/to/project", ["JWT authentication"])

Tests

361+ tests covering extraction, filtering, chunking, CLI, e2e, file watching, providers, and search quality.

python -m pytest tests/ -x -q

License

Apache-2.0
