Add endpoint to export all active URLs as CSV or plain text #1190

Open
Mr0grog opened this issue Feb 5, 2025 · 0 comments

Mr0grog commented Feb 5, 2025

We keep having to do various operations involving exporting a list of active URLs. You can do this by repeatedly querying:

/api/v0/pages?active=true&chunk=<N>

…but you still have to iterate through every chunk and then extract the URL from each result.
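For reference, the current workaround looks roughly like the sketch below. It assumes chunks are numbered from 1, that each response is a JSON object with a `data` list of page records that each carry a `url` field, and that an empty `data` list marks the end; none of that is spelled out in this issue. The names `iter_active_urls` and `make_http_fetcher` are made up for illustration.

```python
import json
import urllib.parse
import urllib.request
from typing import Callable, Iterator


def make_http_fetcher(base_url: str) -> Callable[[int], dict]:
    """Build a function that fetches one chunk of active pages as parsed JSON.

    `base_url` is a placeholder for the API host, e.g. "https://example.org".
    """
    def fetch_chunk(chunk: int) -> dict:
        query = urllib.parse.urlencode({"active": "true", "chunk": chunk})
        with urllib.request.urlopen(f"{base_url}/api/v0/pages?{query}") as response:
            return json.load(response)

    return fetch_chunk


def iter_active_urls(fetch_chunk: Callable[[int], dict]) -> Iterator[str]:
    """Yield the URL of every active page, one chunk at a time."""
    chunk = 1  # Assumption: chunk numbering starts at 1.
    while True:
        payload = fetch_chunk(chunk)
        records = payload.get("data", [])  # Assumption: records live under "data".
        if not records:
            break  # Assumption: an empty chunk means there is nothing left.
        for record in records:
            yield record["url"]  # Assumption: each record has a "url" field.
        chunk += 1
```

Every consumer ends up rewriting this loop, which is the boilerplate a shortcut endpoint would remove.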

For the extremely common case of just listing active URLs for seeding crawlers, working with analysts, etc., it would be useful to have a quick shortcut endpoint that can just stream a CSV or newline-delimited text file:

/api/v0/active_urls.csv
/api/v0/active_urls.txt

Or maybe:

/api/v0/pages.csv
/api/v0/pages.txt

…but I’m worried these might be confusing vs. the normal /api/v0/pages[.json], since they wouldn't obey the same chunking rules and would return a much more stripped-down set of data (maybe the CSV would include the UUID? But otherwise these are just the current canonical URLs).
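To make the two output shapes concrete, here is a rough sketch of how the bodies could be generated incrementally instead of being built in memory. It is not tied to the project's actual server framework: it assumes pages arrive as `(uuid, url)` pairs, that the CSV variant includes the UUID column floated above, and `stream_txt` / `stream_csv` are hypothetical names.

```python
import csv
import io
from typing import Iterable, Iterator, Tuple

Page = Tuple[str, str]  # Assumption: (uuid, url) pairs from some page query.


def stream_txt(pages: Iterable[Page]) -> Iterator[str]:
    """Yield newline-delimited text: one canonical URL per line."""
    for _uuid, url in pages:
        yield url + "\n"


def stream_csv(pages: Iterable[Page]) -> Iterator[str]:
    """Yield CSV one row at a time, starting with a header row."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)  # Handles quoting of commas and quotes in URLs.

    def encode(row: Iterable[str]) -> str:
        # Reuse one buffer so each yielded string holds exactly one row.
        buffer.seek(0)
        buffer.truncate(0)
        writer.writerow(row)
        return buffer.getvalue()

    yield encode(["uuid", "url"])
    for uuid, url in pages:
        yield encode([uuid, url])
```

Both generators yield one line at a time, so a response can start flowing before the full page list has been read, which is also why these endpoints would not follow the normal chunking rules.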

Mr0grog added the [priority-★★☆], API (Changes to the public API), and enhancement labels Feb 5, 2025
Mr0grog moved this to Inbox in Web Monitoring Feb 17, 2025
Mr0grog moved this from Inbox to Backlog in Web Monitoring Feb 17, 2025