A Python tool that automatically tracks the online availability of IEEE VIS 2025 conference papers across multiple academic sources (arXiv, Semantic Scholar). The tool scrapes paper information directly from the VIS 2025 website and maintains an incremental index showing when papers become available online with timestamps and abstracts.
New user? Follow these 4 simple steps:
- Install dependencies:
pip3 install beautifulsoup4 requests
- Get papers:
python3 vis_paper_tracker.py --scrape
- Start tracking:
python3 vis_paper_tracker.py --add-papers vis2025_papers.json
- Search & see results:
python3 vis_paper_tracker.py --update --report
Daily updates: Just run python3 vis_paper_tracker.py --update --report
- Automatic Web Scraping: Fetches papers directly from IEEE VIS 2025 website with proper encoding handling
- Multi-Source Search: Searches arXiv and Semantic Scholar APIs
- Paper Classification: Detects Full Papers and Short Papers (posters are excluded)
- Session Tracking: Captures conference session information
- Progress Logging: Optional detailed logging with statistics
- Data Export: JSON index and CSV files for analysis
- Analysis Tools: Built-in author analysis with statistics and visualizations
- Incremental Updates: Tracks discovery dates and maintains search history
- Robust Progress Saving: Saves progress every 10 papers to prevent data loss
- Data Cleaning: Built-in author name formatting and encoding issue fixes
- Python 3.7 or higher
- Internet connection
- Download or clone this repository
- Install required dependencies:
pip3 install beautifulsoup4 requests
Follow these steps in order:
# Step 1: Scrape papers from IEEE VIS 2025 website
python3 vis_paper_tracker.py --scrape
# Step 2: Add papers to tracking system (use the file created in step 1)
python3 vis_paper_tracker.py --add-papers vis2025_papers.json
# Step 3: Search for papers online (this takes time due to rate limiting)
python3 vis_paper_tracker.py --update
# Step 4: See results
python3 vis_paper_tracker.py --report
Once you've done the initial setup, just run:
# Check for newly available papers
python3 vis_paper_tracker.py --update --report
The tool uses a two-stage data system:
vis2025_papers.json - Static list of papers from VIS 2025 website
- Created by: --scrape command
- Contains: titles, authors, sessions, paper types
- Used once to populate the tracking system
paper_tracking_data/paper_index.json - Live database with search results
paper_tracking_data/paper_availability.csv - Spreadsheet export for analysis
- Updated by: --update command
- Contains: search status, URLs, abstracts, discovery dates
- Log files - Detailed activity logs (if you use --log-file)
--scrape → vis2025_papers.json → --add-papers → paper_tracking_data/ → --update → results
    ↑               ↑                  ↑                 ↑                 ↑         ↑
One-time        Seed data          Populate        Live tracking        Search    Reports
# Get papers from VIS website and start tracking
python3 vis_paper_tracker.py --scrape
python3 vis_paper_tracker.py --add-papers vis2025_papers.json
# Search for papers and see results
python3 vis_paper_tracker.py --update --report
# With detailed logging
python3 vis_paper_tracker.py --update --report --log-file daily_check.log
# Fix formatting issues in author names
python3 vis_paper_tracker.py --clean-authors
# Re-scrape if new papers are added to VIS website
python3 vis_paper_tracker.py --scrape
# Note: This overwrites vis2025_papers.json with fresh data
# Use custom data directory
python3 vis_paper_tracker.py --data-dir my_tracking_data --update
# Enable debug logging
python3 vis_paper_tracker.py --update --debug
# Just generate a report (no searching)
python3 vis_paper_tracker.py --report
{
  "title": "Paper Title",
  "authors": "Author1, Author2, Author3",
  "session": "Session Name",
  "award": "Award Type",
  "paper_type": "Full Paper"
}
- Full Paper: Main conference papers
- Short Paper: Shorter research contributions
- Unknown: When type cannot be determined
Note: Posters are automatically excluded from tracking as they are typically not published as citable papers.
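As a rough illustration of the entry shape shown above, the sketch below tallies entries by their paper_type field. It assumes vis2025_papers.json holds a JSON list of such objects (the top-level layout is an assumption), and count_by_type is a hypothetical helper, not part of the tool.

```python
import json
from collections import Counter

def count_by_type(papers):
    """Tally entries by paper_type, treating a missing or empty value as 'Unknown'."""
    return Counter(p.get("paper_type") or "Unknown" for p in papers)

# Inline sample in the documented entry shape. With a real seed file you
# would instead load it, e.g.:
#   with open("vis2025_papers.json", encoding="utf-8") as f:
#       papers = json.load(f)   # assumes the file is a list of entries
papers = json.loads("""
[
  {"title": "Paper A", "authors": "Author1, Author2", "session": "Session 1",
   "award": "", "paper_type": "Full Paper"},
  {"title": "Paper B", "authors": "Author3", "session": "Session 1",
   "award": "", "paper_type": "Short Paper"},
  {"title": "Paper C", "authors": "Author1", "session": "Session 2",
   "award": "", "paper_type": "Full Paper"},
  {"title": "Paper D", "authors": "Author4", "session": "Session 2",
   "award": ""}
]
""")

type_counts = count_by_type(papers)
for paper_type, n in type_counts.most_common():
    print(f"{paper_type}: {n}")
```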
- vis2025_papers.json: Scraped paper list from VIS 2025 website
- paper_tracking_data/paper_index.json: Persistent tracking database
- paper_tracking_data/paper_availability.csv: Export for data analysis
- Log files (optional): Detailed status reports and statistics
After setup, your project directory will look like this:
vis-paper-tracker/
├── vis_paper_tracker.py # Main script
├── analyze_authors.py # Author analysis and visualization script
├── vis2025_papers.json # Seed data (created by --scrape)
├── paper_tracking_data/ # Live tracking database
│ ├── paper_index.json # Detailed search results
│ └── paper_availability.csv # Spreadsheet export
├── README.md # This documentation
├── requirements.txt # Python dependencies
├── LICENSE # MIT license
└── .gitignore # Git ignore rules
- Web Scraping: Fetches papers from IEEE VIS 2025 website with session names and paper types (excludes posters)
- Author Cleaning: Removes double commas and normalizes author lists
- Multi-Source Search: Searches arXiv first, then Semantic Scholar for each paper
- Fuzzy Matching: Treats titles as a match when their word overlap reaches a 70% threshold
- Discovery Tracking: Records first discovery dates and maintains search history
- Progress Saving: Automatically saves progress every 10 papers to prevent data loss
- Abstract Storage: Keeps the longest abstract found across sources
- Progress Logging: Optional detailed logging with paper type and session statistics
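The title matching step in the list above can be sketched as follows. This is a minimal illustration, not the tool's actual code: it assumes "word overlap" means the fraction of the tracked title's words that also occur in the candidate title, and the names title_words, word_overlap, and is_match are hypothetical.

```python
import re

def title_words(title):
    """Lowercase a title and split it into a set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", title.lower()))

def word_overlap(query, candidate):
    """Fraction of the query title's words that also appear in the candidate."""
    query_words = title_words(query)
    if not query_words:
        return 0.0
    return len(query_words & title_words(candidate)) / len(query_words)

def is_match(query, candidate, threshold=0.7):
    """Accept the candidate when the overlap reaches the threshold (70% by default)."""
    return word_overlap(query, candidate) >= threshold

tracked = "Visual Analytics for Large Graphs"
print(is_match(tracked, "Visual Analytics for Large-Scale Graphs"))  # all 5 words found
print(is_match(tracked, "Visual Analytics of Time Series"))          # only 2 of 5 words found
```

Splitting on non-alphanumeric characters makes the comparison insensitive to case, punctuation, and hyphenation, which is often where a conference listing and a preprint title differ.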
The tracker includes analysis scripts to explore patterns in the collected data:
Analyze authorship patterns with the included analyze_authors.py script:
# Basic author analysis
python3 analyze_authors.py
# Show distribution statistics
python3 analyze_authors.py --stats-only
# Generate histogram (requires matplotlib)
pip3 install matplotlib numpy
python3 analyze_authors.py --histogram
# Top 10 authors with interactive plot
python3 analyze_authors.py --top-n 10 --histogram --show-plot
Sample Output:
TOP AUTHORS BY PAPER COUNT
- Kwan-Liu Ma: 8 papers (7 full, 1 short) - 24 collaborators
- Huamin Qu: 8 papers (7 full, 1 short) - 35 collaborators
- Cindy Xiong Bearfield: 8 papers (5 full, 3 short) - 31 collaborators
PAPER COUNT DISTRIBUTION
- 1 paper: 886 authors (82.1%)
- 2 papers: 131 authors (12.1%)
- 3+ papers: 62 authors (5.8%)
Outputs:
- Console report with top authors and collaboration statistics
- author_analysis.csv - Detailed spreadsheet with all authors
- author_histogram.png - Visual distribution chart
# Analysis script options
python3 analyze_authors.py --help
# Key parameters:
--top-n 20 # Number of top authors to show
--histogram # Generate visual histogram
--stats-only # Show statistics without full report
--csv filename.csv # Custom CSV output filename
--hist-file plot.png # Custom histogram filename
--show-plot # Display plot interactively
The tracking data is stored in standard JSON/CSV formats, making it easy to:
- Import into R, Python pandas, or Excel for custom analysis
- Create visualizations with your preferred tools
- Analyze collaboration networks, temporal patterns, or subject areas
- Compare productivity across institutions or research groups
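For instance, a first pass at per-author paper counts needs only the standard library. The sketch below assumes entries shaped like the seed-file example (authors as one comma-separated string). The helpers split_authors and papers_per_author are hypothetical and are not functions of the tool or of analyze_authors.py.

```python
from collections import Counter

def split_authors(authors):
    """Split a comma-separated author string, dropping empty pieces
    such as those left behind by a doubled comma."""
    return [name.strip() for name in authors.split(",") if name.strip()]

def papers_per_author(papers):
    """Count how many papers each author appears on (once per paper)."""
    counts = Counter()
    for paper in papers:
        counts.update(set(split_authors(paper.get("authors", ""))))
    return counts

# Made-up sample entries; real data would come from the JSON/CSV files.
sample = [
    {"title": "Paper A", "authors": "Author1, Author2,, Author3"},
    {"title": "Paper B", "authors": "Author1, Author4"},
]

author_counts = papers_per_author(sample)
for name, n in author_counts.most_common(3):
    print(f"{name}: {n}")
```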
Error: "No such file or directory"
# Make sure you use the correct filename
ls *.json # See what files exist
python3 vis_paper_tracker.py --add-papers vis2025_papers.json # Use existing file
Error: "ModuleNotFoundError"
# Install missing dependencies
pip3 install beautifulsoup4 requests
Slow performance during updates
- This is normal! The tool waits 1.5 seconds between API calls to respect rate limits
- A full update of 290 papers takes ~7-8 minutes
- Progress is saved every 10 papers, so interruptions won't lose much work
- You'll see progress counters like "Checking (45/290): ..." showing current status
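The timing above follows largely from the delay alone: 290 papers × 1.5 s = 435 s, or about 7.25 minutes, before any network latency. The sketch below shows that estimate and a loop with periodic saving. It assumes roughly one delayed call per paper, and run_update, check_paper, and save_index are hypothetical names, not the tool's actual functions.

```python
import time

def estimate_minutes(n_papers, delay_s=1.5):
    """Lower bound on update time: the delay alone, ignoring network latency."""
    return n_papers * delay_s / 60

def run_update(papers, check_paper, save_index, delay_s=1.5, save_every=10):
    """Check each paper in turn, pausing between calls and saving periodically."""
    total = len(papers)
    for i, paper in enumerate(papers, start=1):
        print(f"Checking ({i}/{total}): {paper}")
        check_paper(paper)
        if i % save_every == 0:
            save_index()      # an interruption loses at most save_every papers
        time.sleep(delay_s)   # stay under the API rate limits
    save_index()              # final save covers the last partial batch

print(estimate_minutes(290))
```

Saving on a fixed interval rather than after every paper keeps disk writes low while bounding the work lost to an interruption.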
No papers found
- Check your internet connection
- Try running with debug logging: --debug
- Some papers may not be available on arXiv or Semantic Scholar yet
Double commas in author names
# Clean up existing data
python3 vis_paper_tracker.py --clean-authors
Garbled characters in paper titles (like âThey Arenât Built For Meâ)
- This is from encoding issues in older scraped data
- Re-scrape to get clean data:
python3 vis_paper_tracker.py --scrape
rm -rf paper_tracking_data/
python3 vis_paper_tracker.py --add-papers vis2025_papers.json
Papers show "Added" but report shows 0 papers
- This was a bug that has been fixed
- If you encounter this, update to the latest version of the code
If you encounter issues:
- Run with debug logging: python3 vis_paper_tracker.py --debug --update
- Check the log file for detailed error messages
- Verify your Python version: python3 --version (needs 3.7+)
- arXiv API: No authentication required, 15-second timeout
- Semantic Scholar: No API key needed, returns top 5 results
- IEEE VIS 2025: Direct HTML scraping, no authentication needed
- Rate Limiting: 1.5-second delay between API requests
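The tool's exact requests are not shown here, so the following is only a sketch of how title queries against the two public endpoints could be built. The endpoint URLs and parameter names are those of the public arXiv API and the Semantic Scholar Graph API. The function names, the max_results value for arXiv, and the requested fields are assumptions made for illustration.

```python
from urllib.parse import urlencode

ARXIV_ENDPOINT = "http://export.arxiv.org/api/query"
S2_ENDPOINT = "https://api.semanticscholar.org/graph/v1/paper/search"

def arxiv_title_query(title, max_results=5):
    """Build an arXiv API URL that searches the title field for a phrase."""
    params = {"search_query": f'ti:"{title}"', "max_results": max_results}
    return f"{ARXIV_ENDPOINT}?{urlencode(params)}"

def s2_title_query(title, limit=5):
    """Build a Semantic Scholar search URL returning the top `limit` results."""
    params = {"query": title, "limit": limit, "fields": "title,abstract,url"}
    return f"{S2_ENDPOINT}?{urlencode(params)}"

arxiv_url = arxiv_title_query("Paper Title")
s2_url = s2_title_query("Paper Title")
print(arxiv_url)
print(s2_url)

# A request would then look like requests.get(url, timeout=15),
# followed by time.sleep(1.5) before the next call.
```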
Contributions are welcome! Please feel free to submit issues and pull requests.
This project is licensed under the MIT License - see the LICENSE file for details.
Claudio Silva & Claude
- IEEE VIS 2025 conference organizers
- arXiv and Semantic Scholar for their open APIs