Hybrid search over local files. BM25 + embeddings + Reciprocal Rank Fusion. Real-time file watching. Runs offline or with OpenAI.
lss "authentication JWT" # search current directory
lss "deploy kubernetes" ~/Projects # search a specific path
lss "rate limiting" --json # machine-readable output
0.91 NDCG@10 on our golden set. Beats ColBERTv2, Voyage-2, and Cohere embed-v3 on BEIR SciFact. See EVALS.md.
# One-liner (auto-detects pipx/uv/pip)
curl -fsSL https://raw.githubusercontent.com/kortix-ai/lss/main/install.sh | bashOr directly:
pipx install local-semantic-search # recommended
pip install local-semantic-search
uv tool install local-semantic-searchDefault: OpenAI — if OPENAI_API_KEY is set, lss uses it automatically.
export OPENAI_API_KEY="sk-..." # add to ~/.zshrc or ~/.bashrcOffline alternative:
pip install 'local-semantic-search[local]'
lss config provider localUses bge-small-en-v1.5 (384d, ~125 MB). No API key, no network, no cost. Within 0.3% of OpenAI on quality, 8x faster.
lss "query" # current directory
lss "query" ~/Documents # explicit path
lss "auth JWT" "deploy k8s" # multiple queries
lss "config" --json # JSON output
lss "error" -k 5 # top 5
# Filters (applied without re-indexing)
lss "auth" -e .py -e .ts # only these extensions
lss "config" -E .json -E .yaml # exclude extensions
lss "user data" -x '\d{4}-\d{2}-\d{2}' # exclude chunks matching regex
lss "auth" -e .py -x "test_" # combine filtersFirst search auto-indexes. Subsequent searches use cached embeddings.
lss index ~/Projects # index without searching
lss status # DB stats, provider, config
lss ls # list indexed files
lss sweep --clear-all # wipe database
lss watch add ~/Documents # for lss-sync daemon
lss exclude add "*.log" # glob exclusion
lss include add .rst # custom extension
lss config provider local # switch provider
lss eval # run quality benchmarks
lss update # check for updateslss-sync # watch configured paths
lss-sync --watch ~/Projects # watch specific pathquery → BM25 (FTS5 + custom rescore) ─┐
→ Embedding (OpenAI or local) ──┤→ RRF → boosts → MMR → results
- BM25 — FTS5 retrieves candidates, custom re-scorer ranks with TF saturation + IDF (k1=1.2, b=0.75)
- Embedding — OpenAI
text-embedding-3-small(256d) or localbge-small-en-v1.5(384d), cached permanently - RRF — Reciprocal Rank Fusion merges both ranked lists
- Boosts — Jaccard overlap, phrase matching, digit co-mention
- MMR — Maximal Marginal Relevance for diversity
See ARCHITECTURE.md for full pipeline detail.
| Category | Extensions |
|---|---|
| Code | .py, .js, .ts, .go, .rs, .java, .c, .cpp, .rb, .php, .swift, .kt, +40 more |
| Markup | .md, .rst, .tex, .html, .xml, .yaml, .json, .toml |
| Documents | .pdf, .docx, .xlsx, .pptx, .eml |
| Data | .csv, .jsonl, .tsv, SQLite .db/.sqlite/.sqlite3 |
Extraction via pdfminer.six, python-docx, openpyxl, python-pptx, beautifulsoup4, and built-in SQLite row extraction. Unknown extensions skipped by default; add with lss include add .ext.
| Method | NDCG@10 | MRR@10 | Provider |
|---|---|---|---|
| hybrid | 0.914 | 1.000 | OpenAI |
| hybrid | 0.911 | 1.000 | Local |
| bm25 | 0.885 | 0.988 | — |
BEIR SciFact (5,183 docs, 300 queries):
| System | NDCG@10 |
|---|---|
| lss hybrid | 0.729 |
| Cohere embed-v3 | 0.717 |
| Voyage-2 | 0.713 |
| ColBERTv2 | 0.693 |
| BM25 (Anserini) | 0.665 |
Full results: EVALS.md
| Scenario | OpenAI | Local |
|---|---|---|
| Cold search (no cache) | 400-800 ms | 50-200 ms |
| Warm (embeddings cached) | 100-200 ms | 5-50 ms |
| Hot (all in LRU) | 50-150 ms | 2-10 ms |
| Variable | Default | Description |
|---|---|---|
OPENAI_API_KEY |
— | Required for OpenAI provider |
LSS_PROVIDER |
auto-detect | openai or local |
LSS_DIR |
~/.lss |
Data directory |
BM25_K1 / BM25_B |
1.2 / 0.75 | BM25 tuning |
NO_COLOR |
unset | Disable ANSI colors |
Config file: ~/.lss/config.json
from semantic_search import semantic_search
from lss_store import ingest_many, discover_files
all_files, new_files, _ = discover_files("/path/to/project")
ingest_many(new_files)
results = semantic_search("/path/to/project", ["JWT authentication"])361+ tests covering extraction, filtering, chunking, CLI, e2e, file watching, providers, and search quality.
python -m pytest tests/ -x -qApache-2.0
