A Telegram chat history search tool that supports vector search and semantic matching. Built on OpenAI's text embedding models, it retrieves messages by meaning rather than by exact keyword match alone.
- Using a UserBot carries a risk of account suspension. Use it with caution.
- The project is in a rapid iteration phase, so the database may become incompatible between versions. Back up your data regularly.
- Real-time Monitoring: Monitor messages from specified conversations in real-time using the watch command
- Conversation Synchronization: Sync folders and conversation information using the sync command
- History Import: Support importing HTML format history exported from Telegram
- Data Export: Export message records for backup and migration
- Vector Search: Implement semantic search based on OpenAI's text embedding model
- Multi-dimensional Filtering: Filter by time range, message type, conversation, and other dimensions
- Similarity Ranking: Sort search results based on semantic similarity
- Multimedia Support: Support for various media types including text, images, videos, documents, stickers, etc.
- Media Preview: Preview media content directly in search results
- Node.js >= 20
- PostgreSQL >= 15 (with pgvector extension)
- OpenAI API Key
- Telegram API credentials (API ID and API Hash)
- Clone the repository:
git clone https://github.com/GramSearch/telegram-search.git
cd telegram-search
- Install dependencies:
pnpm install
pnpm run stub
- Configure environment:
cp config/config.example.yaml config/config.yaml
- Start the database container:
docker compose up -d
- Initialize the database:
pnpm run db:migrate
- Start services:
# Start backend service
pnpm run dev:server
# Start frontend interface
pnpm run dev:frontend
Visit http://localhost:3333 to open the search interface.
# Sync folders and conversation information
pnpm run dev:cli sync
# Monitor specified conversations
pnpm run dev:cli watch
- Import history:
# Import HTML format message records
pnpm run dev:cli import -p <path_to_html_files>
# Skip vector embedding
pnpm run dev:cli import -p <path_to_html_files> --no-embedding
- Export messages:
# Export messages (supports database format)
pnpm run dev:cli export
# Process vector embeddings for all messages
pnpm run dev:cli embed
# Start the search service
pnpm run dev:cli search
This project uses OpenAI's text-embedding-3-small model to convert text into 1536-dimensional vectors and measures semantic similarity with cosine similarity. The main steps are:
- Convert message text to vector representations via the EmbeddingService
- Store and retrieve vector data using PostgreSQL's pgvector extension
- Calculate cosine similarity between the query vector and the stored message vectors at query time
- Sort results based on similarity and return the most relevant messages
// Vector search example
const queryEmbedding = await embedding.generateEmbeddings([query])
const results = await findSimilarMessages(queryEmbedding[0], options)
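The ranking step can be illustrated in memory. This is a minimal sketch, not the project's implementation: the `StoredMessage` shape and the `rankBySimilarity` helper are hypothetical names introduced here, and in the project itself the comparison is delegated to pgvector rather than done in application code.

```typescript
// Hypothetical shape of a stored message; the project's real schema may differ.
interface StoredMessage {
  id: number
  text: string
  embedding: number[]
}

// Cosine similarity: dot(a, b) / (|a| * |b|), in the range [-1, 1].
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error('Vectors must have the same length')
  }
  let dot = 0
  let normA = 0
  let normB = 0
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i]
    normA += a[i] * a[i]
    normB += b[i] * b[i]
  }
  // A zero vector has no direction, so treat it as unrelated to everything.
  if (normA === 0 || normB === 0) {
    return 0
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB))
}

// Score every message against the query, then keep the `limit` best matches.
function rankBySimilarity(query: number[], messages: StoredMessage[], limit = 10) {
  return messages
    .map(message => ({ message, similarity: cosineSimilarity(query, message.embedding) }))
    .sort((x, y) => y.similarity - x.similarity)
    .slice(0, limit)
}
```

A linear scan like this is only practical for small collections, which is why the vectors are stored and searched inside PostgreSQL.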
The project uses gram.js to interact with the Telegram API and to collect and synchronize messages:
- Get conversation lists and message history using the Telegram API
- Process and format message content
- Generate vector representations of messages and save to the database
- Periodically sync and update message content changes
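The "process and format" step above might look roughly like the sketch below. Everything in it is an assumption made for illustration: `RawMessage`, `MessageRecord`, `toRecord`, and `selectForEmbedding` are hypothetical names, and the objects gram.js actually returns are considerably richer than this simplified shape.

```typescript
type MessageType = 'text' | 'photo' | 'video' | 'document' | 'sticker'

// Hypothetical, simplified view of a message fetched from Telegram.
interface RawMessage {
  id: number
  chatId: number
  date: number // Unix timestamp in seconds
  text?: string
  mediaType?: Exclude<MessageType, 'text'>
}

// Hypothetical normalized record, ready to be stored.
interface MessageRecord {
  id: number
  chatId: number
  createdAt: Date
  type: MessageType
  content: string
}

// Normalize a raw message: convert the timestamp, pick a type,
// and collapse runs of whitespace in the text.
function toRecord(raw: RawMessage): MessageRecord {
  return {
    id: raw.id,
    chatId: raw.chatId,
    createdAt: new Date(raw.date * 1000),
    type: raw.mediaType ?? 'text',
    content: (raw.text ?? '').replace(/\s+/g, ' ').trim(),
  }
}

// Only records with non-empty text are worth sending to the embedding model.
function selectForEmbedding(records: MessageRecord[]): MessageRecord[] {
  return records.filter(record => record.content.length > 0)
}
```

Filtering out empty content before embedding avoids spending API calls on media-only messages that carry no text.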
A PostgreSQL database is used for storage, with the following main tables:
- messages: Store message content, metadata, and vector representations
- chats: Store conversation information
- folders: Store folder information and configurations
Utilize partitioned tables and appropriate indexes to optimize query performance:
- Partition by chat_id to improve query performance
- Use ivfflat index to accelerate vector search
- Use full-text index to optimize keyword search
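As a rough illustration of the indexes listed above, the sketch below builds the kind of SQL that pgvector and PostgreSQL accept. The helper functions, the `content` and `embedding` column names, and the `lists` value are assumptions for this example; the project's actual migrations may differ. The `<=>` operator and the `vector_cosine_ops` operator class come from pgvector, where `<=>` returns cosine distance, so similarity is `1 - distance`.

```typescript
// Format a numeric array as a pgvector literal such as "[0.1,0.2,0.3]".
function toVectorLiteral(values: number[]): string {
  return `[${values.join(',')}]`
}

// ivfflat index on cosine distance; `lists` sets the number of clusters.
function ivfflatIndexSql(table: string, column: string, lists: number): string {
  return `CREATE INDEX IF NOT EXISTS ${table}_${column}_ivfflat_idx ON ${table} USING ivfflat (${column} vector_cosine_ops) WITH (lists = ${lists})`
}

// GIN index over a tsvector expression for keyword search.
function fullTextIndexSql(table: string, column: string): string {
  return `CREATE INDEX IF NOT EXISTS ${table}_${column}_fts_idx ON ${table} USING gin (to_tsvector('simple', ${column}))`
}

// Parameterized similarity query: $1 = query vector, $2 = chat id, $3 = row limit.
// Filtering on chat_id lets PostgreSQL skip partitions that cannot match.
function similaritySearchSql(table: string, column: string): string {
  const lines = [
    `SELECT id, content, 1 - (${column} <=> $1::vector) AS similarity`,
    `FROM ${table}`,
    `WHERE chat_id = $2`,
    `ORDER BY ${column} <=> $1::vector`,
    `LIMIT $3`,
  ]
  return lines.join('\n')
}
```

Ordering by the raw distance expression, rather than by the computed `similarity` alias, is what allows the planner to use the ivfflat index. The result is approximate: ivfflat trades some recall for speed.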
We plan to develop a flexible Agent framework that supports:
- Multi-model Integration: Connect to various LLM models, including OpenAI, Claude, locally deployed models, etc.
- Agent Pipeline: Build complex Agent collaboration processes to split and process complex tasks
- Custom Agent Capabilities: Allow users to define specialized Agents for specific tasks
Provide deeper chat record analysis capabilities based on vector databases and large language models:
- Conversation Summary Generation: Automatically summarize long conversations and extract key information
- Topic Clustering: Identify and categorize main topics and discussion points in conversations
- Knowledge Graph Construction: Extract entities and relationships from conversations to build knowledge networks
- User Personality Analysis: Analyze users' expression styles, emotional tendencies, and interest preferences based on message content
- Social Relationship Network: Visualize interaction relationships and intimacy between users in groups
- Emotional Trend Tracking: Analyze emotional change trends in conversations and identify important emotional turning points
- Timeline View: Display conversation development in a timeline
- Topic Heat Map: Visualize changes in discussion topic heat over different periods
- Keyword Cloud: Dynamically display high-frequency keywords in conversations
These plans will be implemented gradually and continuously optimized based on user feedback. We look forward to developing Telegram Search into a powerful tool that integrates data mining, knowledge management, and social analysis.
MIT License © 2025