- Queue webpage(s) to ingest
- Scrape HTML
- Clean up HTML and convert to markdown
- Send the cleaned-up markdown content to an LLM (`mixtral-8x7b` via Groq) to summarize
- Vectorize the summarized content (OpenAI `text-embedding-3-small`)
- Store webpage embeddings in `pgvector`
  - with the summarized content + webpage URL as metadata
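
The ingestion steps above can be read as one sequential function per queued page. The following is a minimal sketch, not the service's actual code: `IngestSteps`, `StoredPage`, and `ingestPage` are hypothetical names, and each step is injected so that the real Puppeteer, Groq, OpenAI, and pgvector calls can be swapped in.

```typescript
// Hypothetical shape of one stored row: vector plus metadata.
interface StoredPage {
  userId: string;
  url: string;
  summary: string;
  embedding: number[];
}

// Hypothetical step signatures, one per bullet in the flow above.
interface IngestSteps {
  scrape: (url: string) => Promise<string>; // returns raw HTML
  toMarkdown: (html: string) => string; // cleanup + conversion
  summarize: (markdown: string) => Promise<string>; // LLM summary
  embed: (text: string) => Promise<number[]>; // embedding vector
  store: (row: StoredPage) => Promise<void>; // persist vector + metadata
}

// Runs the steps in order for a single queued page.
async function ingestPage(
  userId: string,
  url: string,
  steps: IngestSteps,
): Promise<StoredPage> {
  const html = await steps.scrape(url);
  const markdown = steps.toMarkdown(html);
  const summary = await steps.summarize(markdown);
  const embedding = await steps.embed(summary);
  const row: StoredPage = { userId, url, summary, embedding };
  await steps.store(row);
  return row;
}
```

Injecting the steps keeps the sketch testable with in-memory fakes; an error thrown by any step propagates to the caller.
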
- Webpage ingestion is handled asynchronously by a FIFO queue
  - The queue takes in a `user_id` and a `url` to scrape as payload
- Failures are handled by re-processing the same message with exponential backoff, up to a configurable `maxRetries` number of times
  - Any error thrown by the handler function is considered a failure
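
The retry behaviour described above could look roughly like the sketch below. Only `maxRetries` comes from this README; `baseMs`, `capMs`, `backoffDelayMs`, `processWithRetry`, and the injectable `sleep` are assumptions made for illustration, and the real queue may compute its delays differently.

```typescript
// Delay before retry number `attempt` (1-based): baseMs, 2*baseMs, 4*baseMs, ... capped at capMs.
function backoffDelayMs(attempt: number, baseMs: number, capMs: number): number {
  return Math.min(capMs, baseMs * 2 ** (attempt - 1));
}

interface RetryOptions {
  maxRetries: number; // retries allowed after the first attempt
  baseMs: number; // assumed base delay
  capMs: number; // assumed upper bound on a single delay
  sleep?: (ms: number) => Promise<void>; // injectable so tests need not wait
}

const defaultSleep = (ms: number): Promise<void> =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

// Calls handler(message); any thrown error counts as a failure and triggers a retry.
async function processWithRetry<T>(
  message: T,
  handler: (message: T) => Promise<void>,
  opts: RetryOptions,
): Promise<void> {
  const sleep = opts.sleep ?? defaultSleep;
  for (let attempt = 0; ; attempt++) {
    try {
      await handler(message);
      return; // success: stop retrying
    } catch (err) {
      if (attempt >= opts.maxRetries) throw err; // retries exhausted
      await sleep(backoffDelayMs(attempt + 1, opts.baseMs, opts.capMs));
    }
  }
}
```

With `maxRetries = 3` and `baseMs = 100`, a handler that keeps failing would wait 100 ms, 200 ms, and 400 ms between attempts before the last error is rethrown.
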
The scraping flow supports bespoke scraping logic for different hostnames
Code Reference -> app/services/webscraper
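
One way to support per-hostname scraping logic is a lookup table keyed by hostname with a generic fallback. This is a sketch under assumptions, not the code in app/services/webscraper: the `Scraper` type, `scrapersByHostname`, `pickScraper`, and the `blog.example.com` entry are all hypothetical.

```typescript
// Hypothetical scraper signature: takes raw HTML, returns the content worth keeping.
type Scraper = (html: string) => string;

// Generic fallback: keep the page as-is.
const defaultScraper: Scraper = (html) => html;

// Bespoke scrapers keyed by hostname. The entry below is illustrative only.
const scrapersByHostname = new Map<string, Scraper>([
  [
    "blog.example.com",
    // Keep only the contents of the first <article> element, if there is one.
    (html) => {
      const match = /<article>([\s\S]*?)<\/article>/.exec(html);
      return match ? match[1] : html;
    },
  ],
]);

// Picks the bespoke scraper for the URL's hostname, else the default.
function pickScraper(url: string): Scraper {
  const { hostname } = new URL(url);
  return scrapersByHostname.get(hostname) ?? defaultScraper;
}
```

Adding support for a new site then means registering one more entry in the map, without touching the rest of the flow.
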
- Submit a query
- Vectorize the user query (OpenAI `text-embedding-3-small`)
- Find the most similar* document from the corpus of documents ingested by the user
  - \* = highest cosine similarity
- Send the user query + retrieved context to an LLM (OpenAI `gpt-4o`) to generate a response
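
To make the retrieval step concrete, here is a small in-memory sketch of "most similar by cosine similarity". In the service itself the search presumably runs inside Postgres via pgvector (whose `<=>` operator computes cosine distance); the `Doc` type and the `cosineSimilarity` and `mostSimilar` helpers below are hypothetical and only illustrate the ranking rule.

```typescript
// Hypothetical in-memory view of an ingested document.
interface Doc {
  url: string;
  summary: string;
  embedding: number[];
}

// cos(a, b) = (a . b) / (|a| * |b|); returns 0 when either vector is all zeros.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("dimension mismatch");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Returns the document whose embedding scores highest against the query,
// or undefined when the corpus is empty.
function mostSimilar(query: number[], docs: Doc[]): Doc | undefined {
  let best: Doc | undefined;
  let bestScore = -Infinity;
  for (const doc of docs) {
    const score = cosineSimilarity(query, doc.embedding);
    if (score > bestScore) {
      best = doc;
      bestScore = score;
    }
  }
  return best;
}
```

The winning document's summary is what would be passed to the LLM as retrieved context alongside the user query.
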
LLM responses are semantically cached using an in-memory FAISS vector store such that unnecessary LLM requests are avoided for very similar queries...
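
The idea behind semantic caching can be sketched without FAISS: store each answered query's embedding with its response, and reuse the response when a new query's embedding is close enough. The sketch below swaps FAISS for a plain linear scan. The `SemanticCache` class, its method names, and the `threshold` parameter are assumptions for illustration and do not reflect the service's actual API.

```typescript
// Cosine similarity between two equal-length vectors (0 if either is all zeros).
function cosine(a: number[], b: number[]): number {
  let dot = 0;
  let na = 0;
  let nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return na === 0 || nb === 0 ? 0 : dot / Math.sqrt(na * nb);
}

// Hypothetical semantic cache: linear scan stands in for the FAISS index.
class SemanticCache {
  private entries: { embedding: number[]; response: string }[] = [];

  // threshold: minimum cosine similarity for a cache hit (assumed parameter).
  constructor(private readonly threshold: number) {}

  // Returns the cached response of the closest entry at or above the threshold.
  get(queryEmbedding: number[]): string | undefined {
    let best: string | undefined;
    let bestScore = this.threshold;
    for (const entry of this.entries) {
      const score = cosine(queryEmbedding, entry.embedding);
      if (score >= bestScore) {
        best = entry.response;
        bestScore = score;
      }
    }
    return best;
  }

  // Remembers the response produced for a query embedding.
  set(queryEmbedding: number[], response: string): void {
    this.entries.push({ embedding: queryEmbedding, response });
  }
}
```

A caller would check `get` with the query embedding first and only call the LLM (then `set`) on a miss. Note that this sketch grows without bound, which is the problem an LRU policy would address.
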
Code References
- Document retrieval is biased towards longer documents
- Puppeteer instance disconnects from the browser instance on error
- Need to build better mechanisms for connection recovery
- Integrate an LRU cache in the in-memory vector store to ensure a constant size
- Add support for concurrency
- Implement a dead-letter queue (DLQ) for messages that still fail after all retries
- Decouple each step in the processing pipeline to increase resilience
- Allow each step to fail and recover independently (with different retry policies since the nature of failure for different steps will be different)
- A DAG-like dependency structure for the processing flow
- Better (re)ranking (with something like Cohere Command R)
- More structured processing
- Better chunking and text splitting
- More structured retrieval system
- Convert a user question to a more structured and expressive query object
- Dynamically create a prompt with this object to improve precision and recall
- Add Observability
- Cite specific text chunks from source documents
- Structured Model Responses
- Using something like instructor-js
- Stream model responses
Set up .env from the default values in .env.development
cp .env.development .env
Install node dependencies
pnpm i
Start dependencies in docker
make start
Start the development server
pnpm run dev
Or, start the service alongside all dependencies in docker
make start-all
The service will start listening on port 4001
Migration files are tracked in the /migrations directory
make migrate [...args]
up [N] Apply all or N up migrations
down [N] Apply all or N down migrations
create NAME Create a set of timestamped up/down migrations titled NAME
Migrations are internally handled with https://github.com/golang-migrate/migrate
This project was bootstrapped from github.com/SwarnimWalavalkar/webServiceStarter