We keep needing to export a list of active URLs. You can do this today by repeatedly querying:
/api/v0/pages?active=true&chunk=<N>
…but you still have to iterate through each chunk and extract the URL from every result.
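For reference, the current workaround looks roughly like the sketch below. It is only an illustration: the `data` key, the `url` field on each record, chunk numbering starting at 1, and "an empty chunk means we're done" are all assumptions about the response shape, and `iter_active_urls` and `fetch_json` are hypothetical helper names, not part of the API.

```python
import json
from typing import Callable, Iterator
from urllib.parse import urlencode
from urllib.request import urlopen


def fetch_json(url: str) -> dict:
    """Fetch a URL and decode its JSON body."""
    with urlopen(url) as response:
        return json.load(response)


def iter_active_urls(
    base_url: str,
    fetch: Callable[[str], dict] = fetch_json,
    max_chunks: int = 10_000,
) -> Iterator[str]:
    """Yield the URL of every active page, one chunk at a time.

    Assumptions (not confirmed by the API): chunks are numbered from 1,
    results live under a "data" key, each record has a "url" field, and
    an empty chunk marks the end of the list.
    """
    for chunk in range(1, max_chunks + 1):
        query = urlencode({"active": "true", "chunk": chunk})
        payload = fetch(f"{base_url}/api/v0/pages?{query}")
        records = payload.get("data", [])
        if not records:
            return
        for record in records:
            yield record["url"]
```

Every consumer ends up rewriting some version of this loop, which is the overhead a shortcut endpoint would remove.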
For the extremely common case of just listing active URLs (seeding crawlers, working with analysts, etc.), it would be useful to have a shortcut endpoint that streams a CSV or newline-delimited text file:
/api/v0/active_urls.csv
/api/v0/active_urls.txt
Or maybe:
/api/v0/pages.csv
/api/v0/pages.txt
…but I’m worried these might be confusing vs. the normal /api/v0/pages[.json], since they wouldn't obey the same chunking rules and would return a much more stripped-down set of data (maybe the CSV would include the UUID, but otherwise these are just the current canonical URLs).
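To make the two output shapes concrete, here is a rough sketch of what the streamed bodies could look like. It is not a proposed implementation: `stream_urls_txt` and `stream_urls_csv` are hypothetical names, the `uuid` and `url` keys are assumed, and the `uuid,url` header row is just one option for the open question about including the UUID.

```python
import csv
import io
from typing import Iterable, Iterator, Mapping


def stream_urls_txt(pages: Iterable[Mapping[str, str]]) -> Iterator[str]:
    """Yield one URL per line (newline-delimited text)."""
    for page in pages:
        yield page["url"] + "\n"


def stream_urls_csv(pages: Iterable[Mapping[str, str]]) -> Iterator[str]:
    """Yield CSV rows one at a time: a header, then uuid,url per page.

    The csv module handles quoting, so URLs containing commas stay
    intact. Rows end in "\r\n", the csv module's default terminator.
    """
    buffer = io.StringIO()
    writer = csv.writer(buffer)

    def flush() -> str:
        # Hand back what was just written and reset the buffer so only
        # one row is ever held in memory.
        row = buffer.getvalue()
        buffer.seek(0)
        buffer.truncate(0)
        return row

    writer.writerow(["uuid", "url"])  # assumed column names
    yield flush()
    for page in pages:
        writer.writerow([page["uuid"], page["url"]])
        yield flush()
```

Because both functions are generators, the server never has to build the whole list in memory, which is also why these endpoints wouldn't need the chunking rules of the normal JSON endpoint.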