
nitink23/cluecursor


🚀 GPU-Optimized Screen Capture OCR (ClueCursor)

ClueCursor is a real-time text extraction tool that captures your screen and extracts text with GPU-accelerated OCR (Optical Character Recognition). An overlay window follows your cursor and shows the recognized text in a modern, dark-themed interface.


🎥 Demo


📹 Demo Video

The demo shows:

  • 🖱️ Cursor Following: Overlay follows mouse movement
  • 📷 Real-time Capture: Automatic screen capture every 2 seconds
  • 🔍 Text Extraction: GPU-accelerated OCR processing
  • 📱 Adaptive UI: Window resizes based on text content
  • ⚡ GPU Acceleration: Fast processing with NVIDIA RTX 4070 SUPER

Note: The video demonstrates the application running with full GPU acceleration, showing real-time text extraction from various screen content. Watch the full demo on YouTube.

✨ Features

  • 🚀 GPU Acceleration: Uses CUDA for 2-10x faster processing than on CPU
  • 📱 Real-time Capture: Automatically captures the screen every 2 seconds
  • 🎯 Cursor Following: Overlay follows your mouse cursor
  • 🔍 Dual OCR Engines: Uses both Tesseract and EasyOCR for maximum accuracy
  • 🎨 Modern UI: Dark theme with GPU aesthetic
  • ⚡ Adaptive Sizing: Window resizes based on text content
  • 🖱️ Manual Control: Click the button or press Ctrl+C for instant capture
  • 🔄 Automatic Fallback: Gracefully falls back to CPU if no GPU is available
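The README does not say how the Tesseract and EasyOCR results are combined. One plausible strategy, shown here purely as an assumption, is to keep whichever engine's reading has the higher mean confidence. `OcrReading` and `pick_best_reading` are hypothetical names, not part of src/gpu_processor.py, and the sketch assumes both engines' confidences were first normalized to 0.0-1.0 (pytesseract's `image_to_data` reports 0-100).

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class OcrReading:
    """One engine's output: recognized fragments and their confidences (0.0-1.0)."""
    engine: str
    texts: List[str]
    confidences: List[float]

    def mean_confidence(self) -> float:
        # An empty reading scores 0 so it never beats a non-empty one.
        if not self.confidences:
            return 0.0
        return sum(self.confidences) / len(self.confidences)


def pick_best_reading(readings: List[OcrReading]) -> Optional[OcrReading]:
    """Return the reading with the highest mean confidence, or None if there are none."""
    if not readings:
        return None
    return max(readings, key=lambda r: r.mean_confidence())


if __name__ == "__main__":
    tesseract = OcrReading("tesseract", ["Hello", "wrld"], [0.91, 0.42])
    easyocr = OcrReading("easyocr", ["Hello", "world"], [0.95, 0.88])
    best = pick_best_reading([tesseract, easyocr])
    print(best.engine, " ".join(best.texts))  # easyocr Hello world
```

Choosing a whole reading is the simplest option; merging per word or per region would need the bounding boxes both engines report.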

🏗️ Architecture

The project is organized into modular components:

src/
├── __init__.py          # Package initialization
├── imports.py           # Library imports and detection
├── gpu_processor.py     # GPU-accelerated OCR processing
├── ui_components.py     # User interface components
├── screen_capture.py    # Screen capture functionality
├── cursor_tracker.py    # Cursor tracking and positioning
└── main_app.py          # Main application orchestrator

📋 Requirements

System Requirements

  • OS: Windows 10/11 (Linux/macOS support planned)
  • Python: 3.8 or higher
  • GPU: NVIDIA GPU with CUDA support (optional, falls back to CPU)
  • RAM: 4GB minimum, 8GB recommended
  • Storage: 2GB free space
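The README does not show how GPUProcessor decides between GPU and CPU. A minimal sketch of such a fallback, assuming PyTorch is the source of truth for GPU availability, could look like this; `select_device` is a hypothetical helper, not the repository's code.

```python
import importlib.util


def select_device(prefer_gpu: bool = True) -> str:
    """Return "cuda" if PyTorch is installed and sees a CUDA GPU, otherwise "cpu"."""
    if not prefer_gpu:
        return "cpu"
    # find_spec lets us skip the (slow) import entirely when PyTorch is absent.
    if importlib.util.find_spec("torch") is None:
        return "cpu"
    try:
        import torch
    except (ImportError, OSError):
        # A broken install (e.g. missing CUDA DLLs) should degrade to CPU, not crash.
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"
```

The result can drive EasyOCR's `gpu` flag, for example `easyocr.Reader(["en"], gpu=select_device() == "cuda")`.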

GPU Requirements (Optional)

  • CUDA: 11.8 or higher
  • GPU Memory: 4GB+ recommended
  • Driver: Latest NVIDIA drivers
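`torch.version.cuda` reports the CUDA version a PyTorch build was compiled against (`None` for CPU-only builds). The hypothetical helper below compares that string with the 11.8 minimum above. It does not check the NVIDIA driver, which must separately support that CUDA version.

```python
from typing import Optional, Tuple

# Minimum CUDA version from the GPU requirements above.
MIN_CUDA: Tuple[int, int] = (11, 8)


def parse_cuda_version(version: Optional[str]) -> Optional[Tuple[int, int]]:
    """Turn "11.8" or "12.1" into (major, minor); return None if missing or unparseable."""
    if not version:
        return None
    parts = version.split(".")
    try:
        major = int(parts[0])
        minor = int(parts[1]) if len(parts) > 1 else 0
    except ValueError:
        return None
    return major, minor


def meets_cuda_requirement(version: Optional[str], minimum: Tuple[int, int] = MIN_CUDA) -> bool:
    """True if version is at least minimum; tuples compare major first, then minor."""
    parsed = parse_cuda_version(version)
    return parsed is not None and parsed >= minimum
```

Usage: `import torch; print(meets_cuda_requirement(torch.version.cuda))`.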

🚀 Installation

1. Clone the Repository

git clone https://github.com/nitink23/cluecursor.git
cd cluecursor

2. Create Virtual Environment

python -m venv venv
venv\Scripts\activate  # On Windows
# or
source venv/bin/activate  # On Linux/macOS

3. Install Dependencies

Option A: GPU Version (Recommended)

# Install PyTorch with CUDA support
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118

# Install other dependencies
pip install -r requirements.txt

Option B: CPU Version

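# Install PyTorch CPU version (the CPU index avoids downloading CUDA-enabled wheels on Linux)
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cpu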

# Install other dependencies
pip install -r requirements.txt

4. Install Tesseract OCR (Optional)

Download and install Tesseract from: https://github.com/UB-Mannheim/tesseract/wiki

Default installation path: C:\Program Files\Tesseract-OCR\

🎯 Usage

Quick Start

python run.py

Controls

  • ESC: Close application
  • Ctrl+C: Manual capture
  • Manual Capture Button: Click for instant capture
  • Automatic: Captures every 2 seconds
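These controls map naturally onto Tkinter event bindings and its `after()` timer. The sketch below assumes the overlay is a Tk window; `CaptureControls` is a hypothetical class, and the repository's actual wiring in src/ may differ. Note that Tk key bindings only fire while the window has keyboard focus.

```python
from typing import Any, Callable, Optional

CAPTURE_INTERVAL_MS = 2000  # "Captures every 2 seconds"


class CaptureControls:
    """Bind ESC/Ctrl+C and run a repeating capture timer on a Tk root window."""

    def __init__(self, root: Any, capture: Callable[[], None],
                 interval_ms: int = CAPTURE_INTERVAL_MS) -> None:
        self.root = root              # expected to be a tkinter.Tk (or any Tk widget)
        self.capture = capture        # hypothetical callback: grab the screen and run OCR
        self.interval_ms = interval_ms
        self._job: Optional[str] = None
        # Tk passes an event object to bound callbacks; the lambdas ignore it.
        root.bind("<Escape>", lambda event: self.close())
        root.bind("<Control-c>", lambda event: self.capture())

    def start(self) -> None:
        """Schedule the first automatic capture."""
        self._job = self.root.after(self.interval_ms, self._tick)

    def _tick(self) -> None:
        self.capture()
        # Re-arm the timer so captures repeat every interval_ms.
        self._job = self.root.after(self.interval_ms, self._tick)

    def close(self) -> None:
        """Cancel the pending capture and close the window."""
        if self._job is not None:
            self.root.after_cancel(self._job)
            self._job = None
        self.root.destroy()
```

Typical use: `root = tkinter.Tk()`, `controls = CaptureControls(root, capture=my_capture)`, `controls.start()`, `root.mainloop()`; a manual capture button can reuse the callback with `tkinter.Button(root, text="Capture", command=controls.capture)`. Because `after()` callbacks run on the Tk main thread, a slow OCR call inside `capture` blocks the UI, so a real implementation might hand the work to a background thread.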

Features

  1. Automatic Capture: The app automatically captures your screen every 2 seconds
  2. Cursor Following: The overlay window follows your mouse cursor
  3. Real-time OCR: Extracts text using GPU-accelerated OCR
  4. Adaptive Display: Window resizes based on detected text
  5. Dual Engine: Uses both Tesseract and EasyOCR for best results
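Cursor following and adaptive sizing come down to two small calculations: derive a window size from the text, then place the window next to the cursor without letting it run off screen. The functions below are a hypothetical sketch; the pixel constants (8 px per character, 18 px per line, the size limits, the 20 px offset) are assumptions for a monospace font, not values taken from src/cursor_tracker.py or src/ui_components.py.

```python
from typing import Tuple


def overlay_size(text: str, char_px: int = 8, line_px: int = 18,
                 min_size: Tuple[int, int] = (200, 60),
                 max_size: Tuple[int, int] = (600, 400),
                 padding: int = 20) -> Tuple[int, int]:
    """Estimate (width, height) for the text, clamped between min_size and max_size."""
    lines = text.splitlines() or [""]
    width = max(len(line) for line in lines) * char_px + padding
    height = len(lines) * line_px + padding
    width = min(max(width, min_size[0]), max_size[0])
    height = min(max(height, min_size[1]), max_size[1])
    return width, height


def overlay_position(cursor: Tuple[int, int], size: Tuple[int, int],
                     screen: Tuple[int, int], offset: int = 20) -> Tuple[int, int]:
    """Top-left corner for the overlay: below-right of the cursor, flipped at screen edges."""
    cx, cy = cursor
    w, h = size
    sw, sh = screen
    x, y = cx + offset, cy + offset
    # Flip to the other side of the cursor if the window would run off screen.
    if x + w > sw:
        x = cx - offset - w
    if y + h > sh:
        y = cy - offset - h
    # Final clamp keeps the window on screen even when it is larger than the gap.
    x = max(0, min(x, sw - w))
    y = max(0, min(y, sh - h))
    return x, y
```

In Tkinter the inputs can come from `root.winfo_pointerxy()`, `root.winfo_screenwidth()` and `root.winfo_screenheight()`, and the result can be applied with `root.geometry(f"{w}x{h}+{x}+{y}")`.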

🔧 Configuration

GPU Settings

The application automatically detects and uses your GPU. To check GPU status:

from src.gpu_processor import GPUProcessor
processor = GPUProcessor()
status = processor.get_gpu_status()
print(status)

Tesseract Path

If Tesseract is installed in a different location, modify src/imports.py:

TESSERACT_PATH = r'C:\Your\Custom\Path\To\Tesseract-OCR\tesseract.exe'
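Instead of a single hard-coded constant, the path could be resolved at startup. The sketch below is hypothetical (`resolve_tesseract` and `configure_pytesseract` are not in the repository): it tries a custom path, then `PATH`, then the default Windows install location, and hands the result to pytesseract through its documented `pytesseract.pytesseract.tesseract_cmd` setting.

```python
import os
import shutil
from typing import Optional

# Default location used by the Windows installer (see the installation steps).
DEFAULT_WINDOWS_PATH = r"C:\Program Files\Tesseract-OCR\tesseract.exe"


def resolve_tesseract(custom_path: Optional[str] = None) -> Optional[str]:
    """Return the first existing tesseract executable: custom path, PATH, then Windows default."""
    for candidate in (custom_path, shutil.which("tesseract"), DEFAULT_WINDOWS_PATH):
        if candidate and os.path.isfile(candidate):
            return candidate
    return None


def configure_pytesseract(custom_path: Optional[str] = None) -> Optional[str]:
    """Point pytesseract at the resolved executable, if both are available."""
    path = resolve_tesseract(custom_path)
    if path is None:
        return None
    try:
        import pytesseract
    except ImportError:
        return path
    # pytesseract launches this executable for every OCR call.
    pytesseract.pytesseract.tesseract_cmd = path
    return path
```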

📊 Performance

GPU vs CPU Performance

| Operation                | CPU (ms) | GPU (ms) | Speedup |
|--------------------------|----------|----------|---------|
| Image preprocessing      | 100      | 20       | 5x      |
| EasyOCR text recognition | 500      | 50       | 10x     |
| Overall pipeline         | 600      | 70       | ~8.6x   |
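The README does not state how these timings were measured or on which hardware, so treat them as indicative. One way to reproduce comparable numbers on your own machine is a median-of-runs timer like the hypothetical helper below. CUDA kernels run asynchronously, so a GPU stage passed to it should call `torch.cuda.synchronize()` before returning, or the timer stops before the work has finished.

```python
import statistics
import time
from typing import Callable, List


def time_stage(fn: Callable[[], object], runs: int = 10, warmup: int = 2) -> float:
    """Median wall-clock time of fn() in milliseconds."""
    for _ in range(warmup):
        fn()  # absorb one-off costs such as model loading or CUDA context creation
    samples: List[float] = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(samples)


def speedup(cpu_ms: float, gpu_ms: float) -> float:
    """How many times faster the GPU timing is than the CPU timing."""
    return cpu_ms / gpu_ms
```

For the overall-pipeline row, `speedup(600, 70)` returns about 8.57.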

Supported GPU Features

  • ✅ CUDA acceleration
  • ✅ OpenCV GPU operations
  • ✅ PyTorch tensor operations
  • ✅ EasyOCR GPU mode
  • ✅ Memory optimization

🛠️ Development

Project Structure

cluecursor/
├── src/                   # Source code
│   ├── __init__.py
│   ├── imports.py         # Library imports
│   ├── gpu_processor.py   # GPU processing
│   ├── ui_components.py   # UI components
│   ├── screen_capture.py  # Screen capture
│   ├── cursor_tracker.py  # Cursor tracking
│   └── main_app.py        # Main application
├── run.py                 # Entry point
├── requirements.txt       # Dependencies
├── README.md              # This file
└── main_gpu.py            # Legacy single-file version

Adding New Features

  1. Create new module in src/
  2. Import in main_app.py
  3. Initialize in GPUScreenCaptureApp.__init__()
  4. Add to cleanup in cleanup()
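Steps 3 and 4 amount to a register-then-clean-up lifecycle. GPUScreenCaptureApp's real code is not shown in this README, so the sketch below uses a hypothetical `AppSketch` orchestrator: each new module is registered in `__init__()`, and `cleanup()` releases modules in reverse order so that later ones, which may depend on earlier ones, shut down first.

```python
from typing import List, Protocol


class Component(Protocol):
    """Anything the app owns that needs releasing on shutdown."""

    def cleanup(self) -> None: ...


class AppSketch:
    """Minimal stand-in for the main application orchestrator."""

    def __init__(self) -> None:
        self.components: List[Component] = []

    def register(self, component: Component) -> Component:
        """Track a component created in __init__ and return it for convenience."""
        self.components.append(component)
        return component

    def cleanup(self) -> None:
        # Reverse creation order; one failing component must not block the others.
        while self.components:
            component = self.components.pop()
            try:
                component.cleanup()
            except Exception as exc:
                print(f"cleanup failed for {component!r}: {exc}")
```

A new module would then be wired in with a single line in `__init__()`, for example `self.exporter = self.register(ClipboardExporter())`, where `ClipboardExporter` stands in for your own class.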

Testing

# Run with debug output
python run.py

# Check GPU status
python -c "from src.gpu_processor import GPUProcessor; print(GPUProcessor().get_gpu_status())"

🐛 Troubleshooting

Common Issues

GPU Not Detected

# Check CUDA installation
python -c "import torch; print(torch.cuda.is_available())"

# Check GPU info
python -c "import torch; print(torch.cuda.get_device_name(0) if torch.cuda.is_available() else 'No GPU')"

Tesseract Not Found

  1. Install Tesseract from official website
  2. Add to PATH: C:\Program Files\Tesseract-OCR\
  3. Verify installation: tesseract --version
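Step 3 can also be scripted, for example as part of a startup diagnostic. The hypothetical helper below runs `tesseract --version` through `subprocess` and returns the first line of its output, or `None` if the executable cannot be started.

```python
import shutil
import subprocess
from typing import Optional


def tesseract_version(executable: str = "tesseract") -> Optional[str]:
    """First line of `tesseract --version`, or None if it cannot be run."""
    # Accept either a bare command name (looked up on PATH) or a full path.
    path = shutil.which(executable) or executable
    try:
        result = subprocess.run([path, "--version"], capture_output=True,
                                text=True, timeout=10)
    except (OSError, subprocess.TimeoutExpired):
        return None
    # Some Tesseract releases print the version banner on stderr rather than stdout.
    output = (result.stdout or result.stderr).strip()
    return output.splitlines()[0] if output else None
```

Usage: `print(tesseract_version() or "Tesseract not found")`.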

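OpenCV CUDA Issues

The prebuilt opencv-python and opencv-contrib-python wheels on PyPI are compiled without CUDA, so reinstalling them does not enable OpenCV GPU operations; that requires an OpenCV build compiled with CUDA support (typically built from source). Reinstalling does fix conflicts caused by having both packages installed at once:

# Remove both OpenCV packages, then install only the contrib build
pip uninstall opencv-python opencv-contrib-python
pip install opencv-contrib-python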
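To see whether the installed OpenCV can use the GPU at all, `cv2.cuda.getCudaEnabledDeviceCount()` returns the number of usable CUDA devices; it returns 0 for CPU-only builds such as the standard PyPI wheels. The wrapper below is a hypothetical convenience that also tolerates OpenCV being absent.

```python
import importlib.util


def opencv_cuda_device_count() -> int:
    """CUDA devices usable by OpenCV: 0 for CPU-only builds or when OpenCV is not installed."""
    if importlib.util.find_spec("cv2") is None:
        return 0
    try:
        import cv2
    except ImportError:
        return 0
    cuda = getattr(cv2, "cuda", None)
    if cuda is None:
        # Builds without the cuda submodule cannot run GPU operations at all.
        return 0
    try:
        return int(cuda.getCudaEnabledDeviceCount())
    except cv2.error:
        return 0
```

A result of 0 means OpenCV preprocessing runs on the CPU regardless of the GPU installed.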

Memory Issues

  • Reduce capture frequency in screen_capture.py
  • Lower image resolution
  • Close other GPU applications
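Lowering the resolution means picking a target size that keeps the aspect ratio. The hypothetical helper below caps the longer side at `max_side`; the 1280 px default is an arbitrary assumption, not a value from the repository.

```python
from typing import Tuple


def downscaled_size(width: int, height: int, max_side: int = 1280) -> Tuple[int, int]:
    """Shrink (width, height) so the longer side is at most max_side, keeping the aspect ratio."""
    longest = max(width, height)
    if longest <= max_side:
        return width, height  # already small enough; never upscale
    scale = max_side / longest
    return max(1, round(width * scale)), max(1, round(height * scale))
```

`cv2.resize` expects `dsize` as `(width, height)`, the order returned here, so a frame can be shrunk with `cv2.resize(frame, downscaled_size(w, h), interpolation=cv2.INTER_AREA)`. Aggressive downscaling can hurt recognition of small text, so tune `max_side` against accuracy.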

Performance Optimization

  1. GPU Memory: Close other GPU applications
  2. Capture Frequency: Adjust in screen_capture.py
  3. Image Quality: Modify preprocessing in gpu_processor.py
  4. OCR Confidence: Adjust threshold in gpu_processor.py
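For item 4: EasyOCR's `Reader.readtext()` returns, by default, a list of `(bbox, text, confidence)` entries, so a confidence threshold can be applied as below. The 0.5 default is an assumption; the threshold the repository actually uses in gpu_processor.py is not shown in this README.

```python
from typing import Any, List, Sequence, Tuple

# Shape of one entry from easyocr.Reader.readtext() with the default detail=1.
OcrEntry = Tuple[Any, str, float]

DEFAULT_MIN_CONFIDENCE = 0.5  # hypothetical default; tune for your content


def filter_by_confidence(entries: Sequence[OcrEntry],
                         min_confidence: float = DEFAULT_MIN_CONFIDENCE) -> List[str]:
    """Keep non-blank text whose confidence is at least min_confidence, in original order."""
    return [text for _bbox, text, conf in entries
            if conf >= min_confidence and text.strip()]
```

Raising the threshold trades recall for precision: fewer garbled fragments, but faint or small text is more likely to be dropped.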

🀝 Contributing

We welcome contributions! Please see our Contributing Guidelines.

Development Setup

git clone https://github.com/nitink23/cluecursor.git
cd cluecursor
pip install -r requirements.txt
python run.py

Code Style

  • Follow PEP 8
  • Add docstrings to all functions
  • Include type hints
  • Write unit tests for new features
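A small example of a function and its test written to these guidelines (PEP 8, docstring, type hints, unit test). `normalize_ocr_text` and the test file location, e.g. tests/test_text_utils.py, are hypothetical.

```python
import re
import unittest


def normalize_ocr_text(raw: str) -> str:
    """Collapse runs of whitespace (including newlines) in OCR output into single spaces.

    Args:
        raw: Text as returned by an OCR engine.

    Returns:
        The text with leading/trailing whitespace removed and inner whitespace collapsed.
    """
    return re.sub(r"\s+", " ", raw).strip()


class TestNormalizeOcrText(unittest.TestCase):
    def test_collapses_whitespace(self) -> None:
        self.assertEqual(normalize_ocr_text("Hello\n  world\t!"), "Hello world !")

    def test_empty_input(self) -> None:
        self.assertEqual(normalize_ocr_text("   \n "), "")
```

If saved under tests/, `python -m unittest discover -s tests` runs it.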

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

🙏 Acknowledgments

  • PyTorch: GPU acceleration framework
  • EasyOCR: Advanced OCR engine
  • Tesseract: Fast OCR engine
  • OpenCV: Computer vision library
  • Tkinter: GUI framework

📞 Support

🗺️ Roadmap

🚀 Upcoming Features

v1.1.0 - LLM Integration

  • 🤖 AI-Powered Text Analysis: Integrate with OpenAI GPT, Claude, or local LLMs
  • 📝 Smart Text Summarization: Automatically summarize captured text
  • 🔍 Context-Aware Processing: Understand text context and meaning
  • 💬 Real-time AI Chat: Chat with AI about captured content
  • 🎯 Intelligent Filtering: Filter relevant information from large text blocks

v1.2.0 - Advanced Chat Features

  • 💬 Interactive Chat Interface: Built-in chat window with AI
  • 📚 Conversation History: Save and manage chat sessions
  • 🔄 Multi-turn Conversations: Context-aware AI responses
  • 📋 Quick Actions: AI-powered text actions (translate, summarize, explain)
  • 🎨 Chat Customization: Themes and chat preferences

v1.3.0 - Enhanced OCR & Processing

  • 📊 Table Recognition: Extract and format tables from images
  • 📈 Chart & Graph Analysis: OCR for charts and data visualization
  • 🔀 Multi-language Support: Support for 50+ languages
  • 📱 Mobile Screenshot Support: Optimized for mobile screenshots
  • 🎯 Smart Region Selection: AI-powered area selection

v2.0.0 - Advanced Features

  • 🌐 Web Integration: Browser extension for web page text extraction
  • 📱 Mobile App: iOS/Android companion app
  • ☁️ Cloud Sync: Sync settings and chat history across devices
  • 🔐 Privacy Mode: Local-only processing for sensitive data
  • 📊 Analytics Dashboard: Usage statistics and performance metrics

🛠️ Technical Improvements

  • ⚡ Performance: Further GPU optimization and parallel processing
  • 🔧 Plugin System: Extensible architecture for custom integrations
  • 📦 Package Distribution: PyPI package for easy installation
  • 🧪 Testing Suite: Comprehensive unit and integration tests
  • 📚 API Documentation: REST API for external integrations

🎯 Long-term Vision

  • 🤖 AI Assistant: Full-featured AI assistant for productivity
  • 📊 Data Analytics: Advanced analytics and insights from captured text
  • 🔗 Integration Ecosystem: Connect with popular productivity tools
  • 🌍 Global Accessibility: Support for all major languages and scripts

🔄 Changelog

v1.0.0 (2024-01-XX)

  • Initial release
  • GPU-accelerated OCR
  • Real-time screen capture
  • Cursor following overlay
  • Dual OCR engine support

Made with ❤️ and GPU power
