🎓 ScholarSift is an intelligent web scraper that automatically discovers, analyzes, and organizes scholarship opportunities from trusted sources worldwide. It features AI-powered summarization, dynamic filtering, and automated notifications.
- Smart Web Scraping: Handles both static and JavaScript-rendered pages with adaptive parsing
- AI Summarization: Uses OpenAI or local transformers to summarize scholarship descriptions
- Dynamic Filtering: Filter by country, degree level, GPA requirements, funding type, and deadlines
- Database Storage: SQLite database with JSON/CSV export capabilities
- Web Dashboard: Beautiful Flask-based interface for browsing scholarships
- Notification System: Email and Telegram alerts for new opportunities and urgent deadlines
- Ethical Scraping: Respects robots.txt and implements rate limiting
- Discovery Mode: Automatically finds new scholarship sources
- Python 3.8+
- pip package manager
- Clone and Setup
    git clone <repository-url>
    cd ScholarSift
    pip install -r requirements.txt
- Install Playwright Browsers
    playwright install
- Environment Configuration: create a .env file in the project root:
    # Database
    DATABASE_URL=sqlite:///scholarships.db
    # API Keys (Optional - for AI features)
    OPENAI_API_KEY=your_openai_api_key_here
    # Email Notifications (Optional)
    SMTP_SERVER=smtp.gmail.com
    SMTP_PORT=587
    SMTP_USERNAME=your_email@gmail.com
    SMTP_PASSWORD=your_app_password
    # Telegram Notifications (Optional)
    TELEGRAM_BOT_TOKEN=your_telegram_bot_token
    # Flask
    SECRET_KEY=your-secret-key-here
    FLASK_DEBUG=True
- Initialize Database
    python main.py
- Start Scraping
    # Scrape from seed sources
    python main.py --scrape
    # Scrape specific URLs
    python main.py --scrape --urls "https://www.daad.de/en/" "https://www.chevening.org/scholarships/"
- Start Web Dashboard
    cd dashboard
    python app.py
  Visit http://localhost:5000 to access the web interface.
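As a rough sketch of how the `.env` values from the Environment Configuration step might be read at startup, the snippet below parses `KEY=VALUE` lines using only the standard library. The helper names `load_env_file` and `get_setting` are hypothetical, and the project's actual `config.py` may well take a different approach (for instance a dotenv library).

```python
import os
from pathlib import Path


def load_env_file(path=".env"):
    """Hypothetical helper: parse KEY=VALUE lines into a dict.

    Blank lines and lines starting with '#' are skipped. This is a
    simplified reader, not a full dotenv implementation.
    """
    values = {}
    env_path = Path(path)
    if not env_path.exists():
        return values
    for raw_line in env_path.read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        # Split on the first '=' only, so values may contain '='.
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def get_setting(values, key, default=None):
    """Prefer a real environment variable, then the .env value, then the default."""
    return os.environ.get(key, values.get(key, default))
```

For example, `get_setting(load_env_file(), "DATABASE_URL", "sqlite:///scholarships.db")` would fall back to the SQLite default when neither the environment nor the file sets it.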
ScholarSift/
├── main.py # Main scraper orchestrator
├── scraper.py # Core scraping logic with Playwright
├── database.py # SQLite database operations
├── summarizer.py # AI-powered text summarization
├── notifications.py # Email and Telegram notifications
├── config.py # Configuration management
├── requirements.txt # Python dependencies
├── .env.example # Environment variables template
├── dashboard/ # Flask web application
│ ├── app.py # Flask application
│ ├── static/ # CSS, JS, images
│ └── templates/ # HTML templates
│ └── index.html # Main dashboard
├── data/ # Exported data files
│ └── scholarships.json
└── README.md # This file
# Basic scraping
python main.py --scrape
# Advanced scraping options
python main.py --scrape --discovery # Enable discovery mode
python main.py --scrape --urls URL1 URL2 # Scrape specific URLs
# Export data
python main.py --export json --filter-country "Germany"
python main.py --export json --filter-degree masters --filter-gpa 3.5
# Filter options
--filter-country # Filter by country
--filter-degree # undergraduate, masters, phd
--filter-gpa # Minimum GPA requirement

The Flask dashboard provides:
- Browse Scholarships: View all scholarships in a responsive grid
- Advanced Filtering: Filter by multiple criteria simultaneously
- Scholarship Details: Click any scholarship for full details
- Subscription Management: Subscribe to email/Telegram updates
- Real-time Search: Fast filtering and search capabilities
- GET /api/scholarships - Get scholarships with optional filters
- GET /api/scholarships/<id> - Get specific scholarship details
- GET /api/countries - Get available countries
- GET /api/funding-types - Get available funding types
- POST /api/subscribe - Subscribe to notifications
- POST /api/summarize - Summarize text using AI
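To illustrate how a client might call `GET /api/scholarships` with filters, here is a small sketch that builds the request URL. The query parameter names used in the example (`country`, `degree`) are assumptions, since the endpoint list only says the filters are optional; `scholarships_url` is a hypothetical helper, not part of the project.

```python
from urllib.parse import urlencode

BASE_URL = "http://localhost:5000"  # dashboard address from the setup steps


def scholarships_url(base_url=BASE_URL, **filters):
    """Build a URL for GET /api/scholarships.

    The filter names passed in are assumptions for illustration; check
    dashboard/app.py for the parameters the API actually accepts.
    """
    # Drop filters that were not set so the query string stays minimal.
    params = {k: v for k, v in filters.items() if v is not None}
    query = urlencode(sorted(params.items()))
    url = f"{base_url}/api/scholarships"
    return f"{url}?{query}" if query else url
```

With the dashboard running, the resulting URL could be fetched with `urllib.request.urlopen` or any HTTP client.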
Country: Filter scholarships by destination country (e.g., USA, UK, Germany, Canada)
Degree level:
- Undergraduate
- Master's
- PhD
- Postdoc
Funding type:
- Fully Funded
- Partial
- Stipend
- Other
GPA: Filter by minimum GPA requirement
Deadline:
- Next 30 days
- Next 2 months
- Next 3 months
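The filters above could be combined roughly as follows. This is a minimal sketch, not the project's implementation: the `Scholarship` record shape and the `matches` function are hypothetical, and the GPA rule assumes the filter keeps scholarships whose minimum GPA the applicant meets.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional


@dataclass
class Scholarship:
    # Hypothetical record shape; the real schema lives in database.py.
    title: str
    country: str
    degree_level: str
    funding_type: str
    min_gpa: Optional[float]
    deadline: Optional[date]


def matches(s, country=None, degree_level=None, funding_type=None,
            applicant_gpa=None, deadline_within_days=None, today=None):
    """Return True when the scholarship passes every filter that is set."""
    today = today or date.today()
    if country and s.country.lower() != country.lower():
        return False
    if degree_level and s.degree_level.lower() != degree_level.lower():
        return False
    if funding_type and s.funding_type.lower() != funding_type.lower():
        return False
    # Assumption: a scholarship with no stated GPA requirement always passes.
    if (applicant_gpa is not None and s.min_gpa is not None
            and s.min_gpa > applicant_gpa):
        return False
    if deadline_within_days is not None:
        # A deadline filter excludes records with an unknown deadline.
        if s.deadline is None:
            return False
        latest = today + timedelta(days=deadline_within_days)
        if not (today <= s.deadline <= latest):
            return False
    return True
```

The "Next 30 days" option would then correspond to `deadline_within_days=30`; how "2 months" and "3 months" map to days is left open here.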
ScholarSift uses AI to generate concise summaries of scholarship descriptions:
- OpenAI Integration: Uses GPT-3.5-turbo for high-quality summaries (requires API key)
- Local Fallback: Uses BART transformer model when OpenAI is unavailable
- Smart Extraction: Focuses on key benefits, eligibility, and application process
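The OpenAI-then-local fallback described above can be sketched as a generic pattern. The real OpenAI and BART calls are deliberately left out: `primary` and `fallback` are plain callables you would supply, and `first_sentences` is only a stand-in local summarizer for demonstration. All names here are hypothetical.

```python
import re


def summarize_with_fallback(text, primary, fallback, max_chars=4000):
    """Try the primary summarizer; on failure or empty output use the fallback.

    Returns (summary, source) where source is "primary" or "fallback".
    """
    snippet = text[:max_chars]  # keep the input bounded for either backend
    try:
        summary = primary(snippet)
        if summary and summary.strip():
            return summary.strip(), "primary"
    except Exception:
        pass  # e.g. missing API key, network error, rate limit
    return fallback(snippet).strip(), "fallback"


def first_sentences(text, count=2):
    """Stand-in local summarizer: keep the first `count` sentences."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    return " ".join(sentences[:count])
```

In practice `primary` would wrap the hosted model and `fallback` the local transformer, so a missing `OPENAI_API_KEY` degrades to the local path instead of failing.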
Automatically finds new scholarship sources by:
- Analyzing seed websites for related links
- Using search APIs to find scholarship pages
- Keyword-based content analysis
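A minimal sketch of the first and third steps (collecting links from a seed page and ranking them by keyword) might look like the following. The keyword list, class name, and scoring rule are assumptions for illustration; the search-API step is not covered.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

KEYWORDS = ("scholarship", "fellowship", "grant", "funding")  # assumed keyword list


class LinkCollector(HTMLParser):
    """Collect (absolute_url, anchor_text) pairs from <a href=...> tags."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self._href = urljoin(self.base_url, href)
                self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def candidate_links(html, base_url, keywords=KEYWORDS):
    """Return URLs whose address or anchor text mentions a keyword, best first."""
    parser = LinkCollector(base_url)
    parser.feed(html)
    parser.close()
    scored = []
    for url, text in parser.links:
        haystack = f"{url} {text}".lower()
        score = sum(1 for kw in keywords if kw in haystack)
        if score:
            scored.append((score, url))
    # Highest score first; ties broken alphabetically for stable output.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [url for _, url in scored]
```

Candidates found this way would still need the robots.txt and rate-limit checks before being fetched.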
Email:
- Weekly digest of new scholarships
- Urgent deadline alerts (closing within 2 weeks)
- Customizable preferences
Telegram:
- Real-time alerts via Telegram bot
- Weekly summaries
- Urgent deadline notifications
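The "closing within 2 weeks" rule for urgent alerts could be expressed along these lines. This is a sketch under the assumption that each scholarship is available as a `(title, deadline)` pair; the function name is hypothetical and the real `NotificationManager` may select records differently.

```python
from datetime import date, timedelta

URGENT_WINDOW_DAYS = 14  # "closing within 2 weeks"


def urgent_deadlines(scholarships, today=None, window_days=URGENT_WINDOW_DAYS):
    """Return (title, deadline) pairs due within the window, soonest first.

    Entries with no deadline or a deadline already in the past are skipped.
    """
    today = today or date.today()
    cutoff = today + timedelta(days=window_days)
    urgent = [
        (title, deadline)
        for title, deadline in scholarships
        if deadline is not None and today <= deadline <= cutoff
    ]
    return sorted(urgent, key=lambda pair: pair[1])
```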
# config.py
DEFAULT_USER_AGENT = 'ScholarSift/1.0 (Educational Research Bot)'
RESPECT_ROBOTS_TXT = True
REQUEST_DELAY = 2 # seconds between requests
MAX_RETRIES = 3

The scraper starts with these trusted sources:
- DAAD (Germany)
- Chevening Scholarships (UK)
- Mastercard Foundation
- Commonwealth Scholarships
- Opportunities for Africans
- Scholarship Roar
ScholarSift implements ethical scraping practices:
- Respects robots.txt: Checks and follows website rules
- Rate Limiting: Implements delays between requests
- User Agent: Identifies as educational research bot
- Error Handling: Gracefully handles failures and timeouts
- Data Privacy: Only collects publicly available information
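The robots.txt check and rate limiting listed above can be sketched with the standard library. The user agent and delay mirror the `config.py` values shown earlier; `build_robots_checker` and `RateLimiter` are hypothetical names, and the project's `scraper.py` may implement these checks differently.

```python
import time
from urllib.robotparser import RobotFileParser

USER_AGENT = "ScholarSift/1.0 (Educational Research Bot)"
REQUEST_DELAY = 2  # seconds between requests


def build_robots_checker(robots_txt):
    """Parse robots.txt text that has already been downloaded."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


class RateLimiter:
    """Ensure at least `delay` seconds pass between consecutive calls to wait()."""

    def __init__(self, delay=REQUEST_DELAY, clock=time.monotonic, sleep=time.sleep):
        self.delay = delay
        self._clock = clock  # injectable so the limiter can be tested without sleeping
        self._sleep = sleep
        self._last = None

    def wait(self):
        now = self._clock()
        if self._last is not None:
            remaining = self.delay - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now
```

A fetch loop would call `checker.can_fetch(USER_AGENT, url)` first, skip disallowed URLs, and call `limiter.wait()` before each permitted request.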
Set up cron jobs for automatic updates:
# Weekly scraping (Sundays at 9 AM)
0 9 * * 0 cd /path/to/ScholarSift && python main.py --scrape
# Daily urgent checks (6 AM)
0 6 * * * cd /path/to/ScholarSift && python -c "from notifications import NotificationManager; import asyncio; asyncio.run(NotificationManager().send_urgent_notifications())"
# Send weekly digest (Mondays at 8 AM)
0 8 * * 1 cd /path/to/ScholarSift && python -c "from notifications import NotificationManager; import asyncio; asyncio.run(NotificationManager().send_weekly_digest())"

- Playwright Installation
    playwright install chromium
- Database Issues
    rm scholarships.db
    python main.py
- API Rate Limits
  - Add delays in config.py
  - Use rotating user agents
  - Implement exponential backoff
- Memory Issues
  - Process scholarships in batches
  - Use pagination for large datasets
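For the exponential backoff suggestion under API Rate Limits, a minimal sketch could look like this. `MAX_RETRIES` mirrors the `config.py` value; the function names, base delay, and cap are assumptions, and `fetch` stands for whatever callable performs the request.

```python
import random
import time

MAX_RETRIES = 3  # mirrors the config.py value


def backoff_delays(retries=MAX_RETRIES, base=1.0, factor=2.0, cap=30.0):
    """Delays before each retry: base, base*factor, ... never above `cap`."""
    return [min(cap, base * factor ** attempt) for attempt in range(retries)]


def fetch_with_retries(fetch, url, retries=MAX_RETRIES, sleep=time.sleep, jitter=0.0):
    """Call fetch(url); on an exception wait and retry, up to `retries` retries.

    Raises the last error if every attempt fails.
    """
    last_error = None
    # One immediate attempt, then one attempt after each backoff delay.
    for delay in [0.0] + backoff_delays(retries):
        if delay:
            sleep(delay + random.uniform(0, jitter))
        try:
            return fetch(url)
        except Exception as exc:  # narrow this to network errors in real code
            last_error = exc
    raise last_error
```

A small non-zero `jitter` helps avoid several workers retrying at the same instant.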
Enable detailed logging:
import logging
logging.basicConfig(level=logging.DEBUG)

- Fork the repository
- Create a feature branch
- Make your changes
- Add tests if applicable
- Submit a pull request
This project is licensed under the MIT License - see the LICENSE file for details.
- Built with Playwright, BeautifulSoup, Flask, and SQLAlchemy
- AI features powered by OpenAI and Hugging Face Transformers
- Icons from Font Awesome
- Bootstrap for responsive design
For support and questions:
- Create an issue on GitHub
- Check the troubleshooting section
- Review the configuration options
ScholarSift - Making scholarship discovery smarter and more accessible! 🎓