- HTTPS URLs: Changed from `http://` to `https://` for better security and reliability
- Better Proxy Services: Updated with working CORS proxies:
  - AllOrigins (reliable, fast)
  - CORSProxy.io (new, optimized for APIs)
  - CodeTabs (stable backup)
  - CORS.sh (additional fallback)
- Improved Logging: Better console logs to track which method works
- Longer Timeouts: Increased proxy timeout from 4s to 5s
- All Proxies Attempted: Removed the `.slice(0, 2)` limitation, so all 4 proxies are tried
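As a rough illustration of the proxy changes listed above, the sketch below builds one proxied request URL per service and walks the whole list instead of a `.slice(0, 2)` subset. `PROXIES`, `buildCdxUrl`, and `buildProxyUrls` are hypothetical names, not the project's actual code. Only the two prefixes that appear in this guide's curl examples are included, and URL-encoding the target is an assumption about what each proxy accepts.

```typescript
// Hypothetical proxy table; prefixes copied from the curl examples in this guide.
// CodeTabs and CORS.sh are left out because their URL formats are not shown here.
const PROXIES: { name: string; prefix: string }[] = [
  { name: "AllOrigins", prefix: "https://api.allorigins.win/raw?url=" },
  { name: "CORSProxy.io", prefix: "https://corsproxy.io/?" },
];

// Build the Wayback CDX query for a domain.
function buildCdxUrl(domain: string): string {
  const params = new URLSearchParams({
    url: `${domain}/*`,
    output: "json",
    fl: "original",
    collapse: "urlkey",
  });
  return `https://web.archive.org/cdx/search/cdx?${params.toString()}`;
}

// One proxied URL per service: every entry is used, with no .slice(0, 2) cut-off.
function buildProxyUrls(domain: string): { name: string; url: string }[] {
  const target = buildCdxUrl(domain);
  return PROXIES.map((p) => ({
    name: p.name,
    // Encode the target so its own "&" parameters are not read by the proxy.
    url: p.prefix + encodeURIComponent(target),
  }));
}
```

Encoding the target matters most for prefixes that end in `?url=`, where an unencoded `&output=json` would otherwise be parsed as a parameter of the proxy itself.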
- IP Blocking: Wayback Machine may block Vercel's server IPs
- Rate Limiting: Too many requests from the same IP
- Geographic Restrictions: Some regions may have limited access
- API Downtime: Wayback Machine service maintenance
- CORS Policies: Browser/server CORS restrictions
1. Try Direct Access (HTTPS)
├─ Success? → Return data
└─ Fail → Try Proxy 1
2. Try AllOrigins Proxy
├─ Success? → Return data
└─ Fail → Try Proxy 2
3. Try CORSProxy.io
├─ Success? → Return data
└─ Fail → Try Proxy 3
4. Try CodeTabs Proxy
├─ Success? → Return data
└─ Fail → Try Proxy 4
5. Try CORS.sh Proxy
├─ Success? → Return data
└─ Fail → Return error with debug info
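The flow above can be sketched as a simple sequential loop. This is a minimal sketch under assumptions, not the project's implementation: the `Source` type, `fetchWithFallback`, and the shape of the returned debug info are all hypothetical.

```typescript
// Hypothetical shape: each source (direct access or a proxy) fetches data for a target.
type Source = { name: string; fetchData: (target: string) => Promise<string> };

type FallbackResult =
  | { ok: true; source: string; data: string }
  | { ok: false; errors: { source: string; message: string }[] };

// Try each source in order; return the first success, or every collected error.
async function fetchWithFallback(target: string, sources: Source[]): Promise<FallbackResult> {
  const errors: { source: string; message: string }[] = [];
  for (const source of sources) {
    try {
      const data = await source.fetchData(target);
      return { ok: true, source: source.name, data };
    } catch (err) {
      // Keep the failure as debug info and move on to the next source.
      errors.push({
        source: source.name,
        message: err instanceof Error ? err.message : String(err),
      });
    }
  }
  return { ok: false, errors };
}
```

Passing the sources as an ordered array keeps the priority (direct first, then each proxy) in one place, and the collected `errors` list is what a final error response could expose as debug info.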
- Availability Check: 5 seconds (non-blocking)
- Direct CDX Call: 8 seconds
- Each Proxy: 5 seconds
- Total Worst Case: ~28 seconds (8s direct + 4 × 5s proxies; under 60s limit)
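One way to enforce per-request limits like these is a small helper built on `AbortController`. The sketch below assumes a hypothetical `withTimeout` helper; it is not taken from the project's code, and it only cancels work that actually honors the abort signal (as `fetch` does).

```typescript
// Run `work` with an AbortSignal that fires after `ms` milliseconds.
async function withTimeout<T>(
  work: (signal: AbortSignal) => Promise<T>,
  ms: number
): Promise<T> {
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), ms);
  try {
    return await work(controller.signal);
  } finally {
    // Always clear the timer so it cannot fire after the work has settled.
    clearTimeout(timer);
  }
}

// Example usage with the timeouts listed above (the URLs are placeholders):
//   await withTimeout((signal) => fetch(directUrl, { signal }), 8000);
//   await withTimeout((signal) => fetch(proxyUrl, { signal }), 5000);
```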
# Test the API endpoint
curl "http://localhost:5173/api/wayback?url=example.com&mode=urls"
# Check logs
# Look for "Trying proxy: [name]" messages
- Deploy your changes
- Check Vercel Function Logs:
  - Go to Vercel Dashboard
  - Select your project
  - Click "Logs" tab
  - Filter by `/api/wayback`
- Look for these log messages:
  Trying proxy: AllOrigins
  Trying proxy: CORSProxy.io
  Success with proxy: [name]
Some domains work better than others:
Usually Work Well:
- example.com (small, simple)
- github.com (popular, well-archived)
- wikipedia.org (heavily archived)
May Have Issues:
- Very new domains (< 1 year old)
- Domains with anti-bot protection
- Regional/geo-restricted sites
- Adult content sites (filtered by some proxies)
vercel logs --since 10m
Look for:
- Which proxies are being tried
- Which one succeeds/fails
- Timeout vs connection errors
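To make the timeout-versus-connection distinction visible in logs, failures can be classified before they are written out. The sketch below assumes a hypothetical `classifyFailure` helper. It relies on standard `fetch` behavior: aborted requests reject with an error named `AbortError` (or `TimeoutError` when `AbortSignal.timeout()` is used), while network-level failures reject with a `TypeError`.

```typescript
type FailureKind = "timeout" | "connection" | "other";

// Map a rejected fetch() error to a coarse category for logging.
function classifyFailure(err: unknown): FailureKind {
  if (err instanceof Error) {
    // Aborted requests: AbortController gives AbortError, AbortSignal.timeout() gives TimeoutError.
    if (err.name === "AbortError" || err.name === "TimeoutError") return "timeout";
    // fetch() rejects with a TypeError when the network request itself fails.
    if (err instanceof TypeError) return "connection";
  }
  return "other";
}
```

A log line such as `Proxy failed: AllOrigins (timeout)` would then show at a glance whether raising the timeout could help.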
Try accessing Wayback directly through a proxy:
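# Test AllOrigins (--data-urlencode keeps the target's own & parameters from being read by the proxy)
curl -G "https://api.allorigins.win/raw" --data-urlencode "url=https://web.archive.org/cdx/search/cdx?url=example.com/*&output=json&fl=original&collapse=urlkey"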
# Test CORSProxy.io
curl "https://corsproxy.io/?https://web.archive.org/cdx/search/cdx?url=example.com/*&output=json&fl=original&collapse=urlkey"
If you're being rate-limited:
Solution A: Add Delay
// In Dashboard.tsx, before calling the API
await new Promise(resolve => setTimeout(resolve, 2000)); // 2s delay
Solution B: Cache Results
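// Store results in localStorage
const cacheKey = `wayback_${domain}`;
const cached = localStorage.getItem(cacheKey);
if (cached) return JSON.parse(cached);
// After a successful fetch, save the data (here a hypothetical `results` variable) for next time:
// localStorage.setItem(cacheKey, JSON.stringify(results));
If proxies continue to fail, fetch directly from the browser: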
// This bypasses your API and calls Wayback directly
const response = await fetch(
  `https://web.archive.org/cdx/search/cdx?url=${domain}/*&output=json&fl=original&collapse=urlkey`,
  { mode: 'cors' }
);
Note: This may fail due to CORS, but it is worth trying.
Register for an API key at: https://archive.org/account/s3.php
Then modify the API to use authenticated requests.
Instead of full CDX queries, just check if snapshots exist:
const response = await fetch(
  `https://archive.org/wayback/available?url=${domain}`
);
This is more reliable but provides less data.
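For reference, the availability endpoint returns a small JSON object with an `archived_snapshots.closest` entry when a snapshot exists, and an empty `archived_snapshots` object when none does. The sketch below shows one way to read it; `AvailabilityResponse` and `closestSnapshotUrl` are hypothetical names, and only the fields used here are typed.

```typescript
// Subset of the availability API response used in this sketch.
interface AvailabilityResponse {
  archived_snapshots?: {
    closest?: { available: boolean; url: string; timestamp: string; status: string };
  };
}

// Return the closest snapshot URL, or null when nothing is archived.
function closestSnapshotUrl(body: AvailabilityResponse): string | null {
  const closest = body.archived_snapshots?.closest;
  return closest && closest.available ? closest.url : null;
}

// Illustrative sample payload, not live data.
const sample: AvailabilityResponse = {
  archived_snapshots: {
    closest: {
      available: true,
      url: "http://web.archive.org/web/20130919044612/http://example.com/",
      timestamp: "20130919044612",
      status: "200",
    },
  },
};
console.log(closestSnapshotUrl(sample));
```

In real use the argument would come from `await response.json()` on the request shown above.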
Some services offer JSONP support which bypasses CORS:
// Use a JSONP library or callback pattern
For heavy usage:
- Set up a separate background worker (Vercel Cron, etc.)
- Pre-fetch and cache Wayback data
- Serve from your database/cache
Add a health check endpoint:
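// api/health.ts
// PROXY_SERVICES is assumed to be the same list of { name, url } entries used in wayback.ts
export default async function handler(req, res) {
  const results = await Promise.allSettled(
    PROXY_SERVICES.map(proxy =>
      // fetch() has no `timeout` option, so abort through a signal instead
      fetch(`${proxy.url}https://example.com`, { signal: AbortSignal.timeout(3000) })
    )
  );
  return res.json({
    proxies: PROXY_SERVICES.map((p, i) => ({
      name: p.name,
      // "fulfilled" means a response arrived (even a 4xx/5xx); "rejected" means timeout or network error
      status: results[i].status
    }))
  });
}
Add analytics to track which proxies work most often: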
// In your wayback.ts, add counters
const stats = {
  direct: { success: 0, fail: 0 },
  allorigins: { success: 0, fail: 0 },
  // etc...
};
- Limit Result Size: Add `&limit=100` to CDX queries
- Use Collapse: Already using `collapse=urlkey` for deduplication
- Cache Results: Implement caching layer (Redis/Upstash)
- Parallel Requests: Fetch multiple domains concurrently
- Progressive Enhancement: Show partial results while loading
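As a starting point for the caching tip above, here is a minimal in-memory cache with expiry. It is a sketch under assumptions: `TtlCache` is a hypothetical class, and an in-memory `Map` stands in for Redis/Upstash, so entries are lost whenever the serverless function instance is recycled.

```typescript
// Minimal time-to-live cache; the clock is injectable so expiry can be tested.
class TtlCache<V> {
  private entries = new Map<string, { value: V; expiresAt: number }>();
  private ttlMs: number;
  private now: () => number;

  constructor(ttlMs: number, now: () => number = () => Date.now()) {
    this.ttlMs = ttlMs;
    this.now = now;
  }

  // Return the cached value, or undefined if it is missing or expired.
  get(key: string): V | undefined {
    const hit = this.entries.get(key);
    if (!hit) return undefined;
    if (this.now() >= hit.expiresAt) {
      this.entries.delete(key);
      return undefined;
    }
    return hit.value;
  }

  // Store a value that expires ttlMs milliseconds from now.
  set(key: string, value: V): void {
    this.entries.set(key, { value, expiresAt: this.now() + this.ttlMs });
  }
}

// Example: keep CDX results for one hour, keyed by domain.
const cdxCache = new TtlCache<string[]>(60 * 60 * 1000);
cdxCache.set("example.com", ["http://example.com/"]);
```

Checking the cache before calling Wayback also reduces the number of requests from the same IP, which helps with the rate-limiting cause described earlier.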
If issues persist:
- Check Wayback Machine status: https://archive.org/status
- Join Internet Archive forums: https://archive.org/about/contact.php
- Consider alternative services:
- Common Crawl (https://commoncrawl.org/)
- ArchiveBox (https://archivebox.io/)
- WebRecorder (https://webrecorder.net/)
✅ Using HTTPS for better security
✅ 4 different proxy services as fallbacks
✅ Better error messages with debug info
✅ Improved timeout handling
✅ Comprehensive logging
The API should now work more reliably! If a specific domain fails, the error message will include debug information to help diagnose the issue.