This repository was archived by the owner on May 7, 2026. It is now read-only.

Setup Guide

This guide will help you set up and deploy Paper Pulse on GitHub Pages.

Prerequisites

Step 1: Repository Setup

  1. Push this code to your GitHub repository (replace master with main if that is your default branch):
    git add .
    git commit -m "Initial commit: Paper Pulse"
    git push origin master

Step 2: Configure GitHub Secrets

  1. Go to your repository on GitHub
  2. Click on Settings → Secrets and variables → Actions
  3. Click New repository secret
  4. Add the following secret:
    • Name: DASHSCOPE_API_KEY (or MODELSCOPE_API_KEY)
    • Value: Your DashScope/ModelScope API key

Step 3: Enable GitHub Actions

  1. Go to the Actions tab in your repository
  2. If prompted, click I understand my workflows, go ahead and enable them
  3. The workflow should now be enabled

Step 4: Enable GitHub Pages

  1. Go to Settings → Pages
  2. Under Source, select:
    • Source: Deploy from a branch
    • Branch: master (or main if that's your default branch)
    • Folder: / (root)
  3. Click Save
  4. Wait a few minutes for the site to deploy
  5. Your site will be available at: https://<username>.github.io/<repo-name>/

Step 5: First Run

Option A: Manual Trigger (Recommended for first run)

  1. Go to the Actions tab
  2. Click on the Fetch Papers workflow
  3. Click Run workflow, then confirm with the Run workflow button
  4. Wait for the workflow to complete (5-10 minutes, depending on the number of papers)

Option B: Wait for Automatic Run

The workflow runs automatically every day at 00:00 UTC.

Step 6: Verify

  1. After the workflow completes, check that data/papers.json has been created
  2. Visit your GitHub Pages URL
  3. You should see the papers displayed in card format with bilingual summaries
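
The first check can also be run from a local clone after pulling the workflow's commit. The following is a minimal sketch, not part of the project: the helper name count_papers is hypothetical, and because this guide does not describe the schema of data/papers.json, it only reports how many top-level entries the file contains.

```python
import json
from pathlib import Path


def count_papers(path="data/papers.json"):
    """Return the number of top-level entries in the JSON file, or None if it is missing."""
    file = Path(path)
    if not file.is_file():
        return None
    # Assumption: the top level is a JSON array or object, so len() is defined.
    data = json.loads(file.read_text(encoding="utf-8"))
    return len(data)
```

A result of None means the workflow has not written the file yet; a result of 0 suggests the run succeeded but no paper matched the filters.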

Troubleshooting

Workflow fails with "API key not set"

  • Make sure you've added the secret in Step 2
  • The secret name must be DASHSCOPE_API_KEY or MODELSCOPE_API_KEY
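
To illustrate why the exact secret name matters, here is a rough sketch of how a script could look up the key from its environment. It is an assumption about the behavior, not the project's actual code: the function name find_api_key and the lookup order are hypothetical; only the two variable names come from this guide.

```python
import os

# Secret names taken from this guide; checking them in this order is an assumption.
KEY_NAMES = ("DASHSCOPE_API_KEY", "MODELSCOPE_API_KEY")


def find_api_key(env=None):
    """Return (name, value) for the first non-empty key, or None if neither is set."""
    env = os.environ if env is None else env
    for name in KEY_NAMES:
        value = env.get(name, "").strip()
        if value:
            return name, value
    return None
```

A misspelled secret name, or a secret containing only whitespace, would make a lookup like this return None, which is the kind of condition that produces an "API key not set" failure.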

No papers showing up

  • Check the workflow logs to see if papers were fetched
  • Papers must match your keyword filters in keywords.txt
  • Only papers from the configured time period are kept (default: 7 days)

GitHub Pages shows 404

  • Make sure you selected / (root) as the folder in Pages settings
  • Wait a few minutes after enabling Pages for the first deployment to finish
  • Check that index.html exists in your repository root

Rate limiting issues

  • The default delays (3s for arXiv, 1s for summarization) should prevent rate limiting
  • If you still hit limits, increase delays in config.toml

Customization

Change keyword filters

Edit keywords.txt in the repository root to customize which papers are included.

Format:

  • Each line is an OR condition
  • Multiple words on the same line use AND logic (all must match)
  • Lines starting with # are comments
  • Empty lines are ignored

Examples:

# Match papers with "transformer" OR "attention"
transformer
attention

# Match papers with BOTH "neural" AND "backdoor" (both words must appear)
neural backdoor

# Match papers with BOTH "federated" AND "learning" (same AND logic as above)
federated learning

A paper will be included if it matches ANY line in the file.
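
The rules above (OR across lines, AND within a line) can be sketched as follows. This is an illustration only, not the project's implementation: the function names are hypothetical, and case-insensitive substring matching is an assumption that this guide does not confirm.

```python
def parse_keywords(text):
    """Turn keywords.txt content into a list of rules; each rule is a list of words."""
    rules = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip empty lines and comments
        rules.append(line.lower().split())
    return rules


def matches(paper_text, rules):
    """True if every word of at least one rule occurs in paper_text."""
    haystack = paper_text.lower()  # assumption: matching ignores case
    return any(all(word in haystack for word in rule) for rule in rules)


rules = parse_keywords("# comment\ntransformer\n\nneural backdoor\n")
print(matches("Backdoor attacks on neural networks", rules))  # both words present
print(matches("Neural networks for vision", rules))           # "backdoor" missing
```

Under these assumptions the first call prints True and the second prints False, because a line with several words only matches when all of them appear.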

Change configuration settings

Edit config.toml to customize:

Retention period:

[general]
days_back = 7  # Keep papers from the last 7 days
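
As a rough illustration of what a retention window means, the sketch below drops entries older than days_back days. It rests on assumptions not stated in this guide: the function name keep_recent is hypothetical, and each paper is assumed to be a dict with a "published" field holding an ISO 8601 timestamp that includes a UTC offset.

```python
from datetime import datetime, timedelta, timezone


def keep_recent(papers, days_back=7, now=None):
    """Return only the papers published within the last `days_back` days."""
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=days_back)
    # Assumption: "published" looks like "2024-01-09T00:00:00+00:00".
    return [p for p in papers if datetime.fromisoformat(p["published"]) >= cutoff]
```

Raising days_back widens the window, so more papers survive each run and data/papers.json grows accordingly.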

arXiv categories:

[fetchers.arxiv]
categories = ["cs.CR", "cs.AI", "cs.LG", "cs.CL"]  # Customize categories

AI model:

[summarizer]
model = "qwen-plus"  # Options: qwen-turbo, qwen-plus, qwen-max
max_tokens = 1500     # For bilingual summaries

Rate limits:

[fetchers.arxiv]
delay = 3.0  # Delay between arXiv requests, in seconds

[summarizer]
rate_limit_delay = 1.0  # Delay between summarization calls, in seconds
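
The effect of these delay settings can be pictured with a small throttling loop. This is a generic sketch under stated assumptions, not the project's code: run_throttled is a hypothetical helper, and the injectable sleep parameter exists only so the pause can be observed without actually waiting.

```python
import time


def run_throttled(items, action, delay, sleep=time.sleep):
    """Call action(item) for each item, pausing `delay` seconds between calls."""
    results = []
    for index, item in enumerate(items):
        if index > 0:
            sleep(delay)  # no pause is needed before the first call
        results.append(action(item))
    return results
```

With a loop like this, N requests take at least (N - 1) × delay seconds, which is why raising the delay lengthens the workflow run in exchange for fewer rate-limit errors.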

See CONFIG_GUIDE.md for more detailed configuration options.

Change workflow schedule

Edit .github/workflows/fetch-papers.yml, line 5:

- cron: '0 0 * * *'  # Daily at 00:00 UTC

Use crontab.guru to generate different schedules.
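
GitHub Actions schedules use the five-field POSIX cron syntax, evaluated in UTC. As a reading aid, the sketch below labels each field of an expression; describe_cron is a hypothetical helper, and it only splits the string without validating the values.

```python
CRON_FIELDS = ("minute", "hour", "day_of_month", "month", "day_of_week")


def describe_cron(expression):
    """Map each of the five cron fields to its name."""
    parts = expression.split()
    if len(parts) != len(CRON_FIELDS):
        raise ValueError(f"expected {len(CRON_FIELDS)} fields, got {len(parts)}")
    return dict(zip(CRON_FIELDS, parts))


print(describe_cron("0 0 * * *"))
```

For the default schedule, minute and hour are both 0 and the remaining fields are wildcards, which is what "daily at 00:00 UTC" means.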

Manual Local Testing

Test the fetcher locally before deploying:

# Install dependencies
pip install -r requirements.txt

# Set API key
export DASHSCOPE_API_KEY="your-key-here"

# Run the script
python scripts/main.py

This creates data/papers.json, which you can inspect. Open index.html in a browser to view the results.

License

MIT