Add endpoint to export all active URLs as CSV or plain text #1190

Open
@Mr0grog

We keep having to do various operations involving exporting a list of active URLs. You can do this by repeatedly querying:

/api/v0/pages?active=true&chunk=<N>

…but you still have to iterate through every chunk and then extract the URL from each result.
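For reference, the current workaround looks roughly like the sketch below. The host name, the `{"data": [...]}` response shape, the `url` field, and the "stop when a chunk comes back empty" rule are all assumptions made for illustration, not confirmed details of the API. The `fetch` parameter is a hypothetical hook so the loop can be exercised without a network.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Placeholder host (assumption); substitute the real API root.
API_ROOT = "https://api.example.org"


def fetch_json(url):
    """Fetch a URL and decode its JSON body."""
    with urlopen(url) as response:
        return json.load(response)


def iter_active_urls(fetch=fetch_json, api_root=API_ROOT):
    """Yield the URL of every active page, walking the chunks in order.

    Assumes each response looks like {"data": [{"url": ...}, ...]} and
    that an empty "data" list marks the end of the results.
    """
    chunk = 1
    while True:
        query = urlencode({"active": "true", "chunk": chunk})
        body = fetch(f"{api_root}/api/v0/pages?{query}")
        pages = body.get("data", [])
        if not pages:
            break
        for page in pages:
            yield page["url"]
        chunk += 1
```

Every consumer ends up rewriting some version of this loop, which is the overhead a dedicated export endpoint would remove.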

For the extremely common case of just listing active URLs for seeding crawlers, working with analysts, etc., it would be useful to have a quick shortcut endpoint that can just stream a CSV or newline-delimited text file:

/api/v0/active_urls.csv
/api/v0/active_urls.txt

Or maybe:

/api/v0/pages.csv
/api/v0/pages.txt

…but I’m worried these might be confusing vs. the normal /api/v0/pages[.json], since they wouldn't obey the same chunking rules and would return a much more stripped-down set of data (maybe the CSV would include the UUID, but otherwise these are just the current canonical URLs).
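To make the two proposed formats concrete, here is a rough sketch of what the streamed output could look like. It is written in Python purely for illustration and is not tied to how the API itself is implemented. The function names, the `include_uuid` flag, and the `uuid`/`url` keys on each page record are hypothetical.

```python
import csv
import io


def format_csv_row(values):
    """Serialize one row with the csv module so quoting is handled for us."""
    buffer = io.StringIO()
    csv.writer(buffer).writerow(values)
    return buffer.getvalue()


def stream_active_urls(pages, fmt="txt", include_uuid=False):
    """Yield the export one line at a time so it never sits fully in memory.

    `pages` is any iterable of dicts with "uuid" and "url" keys
    (an assumed record shape).
    """
    if fmt == "txt":
        # Newline-delimited text: just the canonical URL on each line.
        for page in pages:
            yield page["url"] + "\n"
    elif fmt == "csv":
        columns = ["uuid", "url"] if include_uuid else ["url"]
        yield format_csv_row(columns)  # header row
        for page in pages:
            yield format_csv_row([page[column] for column in columns])
    else:
        raise ValueError(f"Unsupported format: {fmt!r}")
```

Because this is a generator, a web framework's streaming response could send each line as it is produced, with no chunking parameters involved. Note that the csv module terminates rows with `\r\n` by default.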
