Allow specific time ranges for get_memento #167

Open
Mr0grog opened this issue Feb 17, 2025 · 0 comments
Labels
enhancement New feature or request

@Mr0grog (Member) commented Feb 17, 2025

The WaybackClient.get_memento() method currently allows you to specify a target_window (default: 24 hours), which, when used with the exact=False argument, sets how far from the requested time a returned memento can be. That is, if you ask for a memento of a given URL at 2025-01-20T00:00:00Z with a target_window of 24 hours, you might get back a memento from as early as 2025-01-19T00:00:00Z or as late as 2025-01-21T00:00:00Z. This behavior is designed around the way the Wayback Machine works by default: if it can't play back the requested memento, or there is no memento at the requested time, it automatically redirects to the closest-in-time memento available.
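To make today's symmetric behavior concrete, here is a minimal sketch of the window check described above. The helper name in_symmetric_window is hypothetical and not part of the wayback API; it only assumes target_window is given in seconds, and treating the window edges as inclusive is also an assumption.

```python
from datetime import datetime, timedelta, timezone


def in_symmetric_window(requested: datetime, actual: datetime,
                        target_window: float = 24 * 60 * 60) -> bool:
    """Hypothetical helper (not part of wayback): is `actual` within
    `target_window` seconds of `requested`, in either direction?

    Edges are treated as inclusive here; that is an assumption.
    """
    return abs(actual - requested) <= timedelta(seconds=target_window)


requested = datetime(2025, 1, 20, tzinfo=timezone.utc)

# The two edges of the 24-hour example window are both accepted...
print(in_symmetric_window(requested, datetime(2025, 1, 19, tzinfo=timezone.utc)))  # True
print(in_symmetric_window(requested, datetime(2025, 1, 21, tzinfo=timezone.utc)))  # True
# ...but a memento one second past the later edge is rejected.
print(in_symmetric_window(requested, datetime(2025, 1, 21, 0, 0, 1, tzinfo=timezone.utc)))  # False
```

Note that nothing in this check cares which side of the requested time the memento falls on, which is exactly the limitation the rest of this issue is about.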

This can be pretty useful, but a lot of the time you really want to request something like “the latest memento within 24 hours before a given time” (or the same, but after the given time), rather than just anything within 24 hours of a given time. You can currently do that yourself with WaybackClient.search(), but it rapidly gets complicated when mementos of redirects are involved. (At EDGI, this need is especially common around changes in US Presidential administrations, when we are often comparing something from before a given time to something after. For example, NHTSA’s 2022 FARS data was recently altered, and if you want to download an archived copy to compare to the current copy, you need to be careful about where these automatic closest-in-time redirects take you: you might think you’re downloading the version from before the alteration, but wind up with one from after it.)
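The one-sided lookup described above can be sketched over a plain list of capture times, for example the timestamp values collected from WaybackClient.search() results (assuming each record exposes a timezone-aware datetime there). The names latest_before, earliest_after, and utc are hypothetical, and this sketch deliberately ignores the redirect complications mentioned above.

```python
from datetime import datetime, timedelta, timezone
from typing import Iterable, Optional


def latest_before(timestamps: Iterable[datetime], anchor: datetime,
                  window: timedelta = timedelta(hours=24)) -> Optional[datetime]:
    """Latest capture in the half-open range [anchor - window, anchor)."""
    # Excluding the anchor itself keeps a capture at exactly that moment
    # from counting as both "before" and "after".
    candidates = [t for t in timestamps if anchor - window <= t < anchor]
    return max(candidates, default=None)


def earliest_after(timestamps: Iterable[datetime], anchor: datetime,
                   window: timedelta = timedelta(hours=24)) -> Optional[datetime]:
    """Earliest capture in the closed range [anchor, anchor + window]."""
    candidates = [t for t in timestamps if anchor <= t <= anchor + window]
    return min(candidates, default=None)


def utc(*args: int) -> datetime:
    return datetime(*args, tzinfo=timezone.utc)


captures = [utc(2025, 1, 19, 12), utc(2025, 1, 20, 1)]
anchor = utc(2025, 1, 20)

# The capture closest to the anchor is *after* it (1 hour vs. 12 hours away),
# which is roughly what a closest-in-time redirect would hand back...
print(min(captures, key=lambda t: abs(t - anchor)))  # 2025-01-20 01:00:00+00:00
# ...while an explicitly one-sided lookup returns the capture from before it.
print(latest_before(captures, anchor))   # 2025-01-19 12:00:00+00:00
print(earliest_after(captures, anchor))  # 2025-01-20 01:00:00+00:00
```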

To solve this, it might be pretty useful to allow target_window to be some kind of time interval or range object with defined start and end points instead of just a number of seconds (or timedelta as in #55). This might be as simple as a tuple of two datetimes, more complicated like a tuple of (datetime, timedelta), or as fancy as a specialized time interval object (which would let you specify stuff like open- or closed-endedness).
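The “fancy” option could be as small as a frozen dataclass with a flag for each end. This is a hypothetical sketch (TimeInterval is not an existing wayback type) showing how open- vs. closed-endedness changes membership at the boundary.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class TimeInterval:
    """Hypothetical interval type with independently open/closed ends."""
    start: datetime
    end: datetime
    closed_start: bool = True
    closed_end: bool = True

    def __post_init__(self) -> None:
        if self.end < self.start:
            raise ValueError("end must not be before start")

    def __contains__(self, moment: datetime) -> bool:
        # A closed end includes its endpoint; an open end excludes it.
        after_start = moment >= self.start if self.closed_start else moment > self.start
        before_end = moment <= self.end if self.closed_end else moment < self.end
        return after_start and before_end


jan19 = datetime(2025, 1, 19, tzinfo=timezone.utc)
jan20 = datetime(2025, 1, 20, tzinfo=timezone.utc)

# "Within 24 hours before 2025-01-20, but not at that exact moment."
before = TimeInterval(jan19, jan20, closed_end=False)
print(jan19 in before)  # True
print(jan20 in before)  # False
```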

I don’t believe Python has any built-in representation for datetime intervals or ranges like this (sadly, range only works with integer-like types). In terms of prior art, Pandas’ Interval supports Timestamp objects, and Pendulum’s Interval also fits this situation but doesn’t support closed- vs. open-endedness. We probably don’t want to depend on either of these, since either would add a lot of weight. But maybe we should accept them as argument values?
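Accepting those objects doesn't require importing either library; duck typing on their attributes is enough. This is a sketch under stated assumptions: coerce_interval is a hypothetical name, pandas.Interval is recognized by its left, right, and closed attributes, Pendulum’s Interval is assumed to expose start and end, and since Pendulum carries no open/closed information both of its ends are treated as closed. A pandas Timestamp is a datetime subclass, so its endpoints can be used as ordinary datetimes.

```python
from datetime import datetime, timezone
from types import SimpleNamespace
from typing import Any, Tuple

# (start, end, closed_start, closed_end)
Bounds = Tuple[datetime, datetime, bool, bool]


def coerce_interval(value: Any) -> Bounds:
    """Hypothetical adapter: accept interval-like objects without importing
    pandas or pendulum, by checking for the attributes they expose."""
    # pandas.Interval-like: .left/.right plus .closed, which is one of
    # "left", "right", "both", or "neither".
    if all(hasattr(value, name) for name in ("left", "right", "closed")):
        return (value.left, value.right,
                value.closed in ("left", "both"),
                value.closed in ("right", "both"))
    # pendulum.Interval-like: .start/.end with no open/closed information,
    # so both ends are assumed to be closed.
    if hasattr(value, "start") and hasattr(value, "end"):
        return (value.start, value.end, True, True)
    raise TypeError(f"unsupported interval type: {type(value).__name__}")


jan19 = datetime(2025, 1, 19, tzinfo=timezone.utc)
jan20 = datetime(2025, 1, 20, tzinfo=timezone.utc)

# Stand-ins for the real objects, so this runs without either library installed.
pandas_like = SimpleNamespace(left=jan19, right=jan20, closed="left")
pendulum_like = SimpleNamespace(start=jan19, end=jan20)

print(coerce_interval(pandas_like)[2:])    # (True, False)
print(coerce_interval(pendulum_like)[2:])  # (True, True)
```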

It might be best to keep this simple by supporting the following:

  • A tuple of (datetime, datetime) describing a start and end.
  • A tuple of (datetime, timedelta) describing an anchor time and a delta from it.
  • A simple int (seconds) or timedelta that describes a window in both directions, as we do today. These would essentially be a shortcut for (time - target_window, time + target_window), where time is the requested memento time.
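The three forms listed above could all be normalized into a single (start, end) pair before any lookup happens. This is a minimal sketch: normalize_window and WindowSpec are hypothetical names, and treating a negative timedelta as reaching backward from the anchor is an assumption the proposal doesn't spell out.

```python
from datetime import datetime, timedelta, timezone
from typing import Tuple, Union

WindowSpec = Union[int, float, timedelta,
                   Tuple[datetime, datetime], Tuple[datetime, timedelta]]


def normalize_window(requested: datetime,
                     target_window: WindowSpec) -> Tuple[datetime, datetime]:
    """Hypothetical helper: turn any supported target_window form into
    an explicit (start, end) pair."""
    # Plain numbers are seconds, as target_window is interpreted today.
    if isinstance(target_window, (int, float)) and not isinstance(target_window, bool):
        target_window = timedelta(seconds=target_window)
    # A bare duration is symmetric around the requested time.
    if isinstance(target_window, timedelta):
        delta = abs(target_window)
        return requested - delta, requested + delta
    if isinstance(target_window, tuple) and len(target_window) == 2:
        first, second = target_window
        if isinstance(first, datetime) and isinstance(second, datetime):
            start, end = first, second
        elif isinstance(first, datetime) and isinstance(second, timedelta):
            # Assumption: a negative delta reaches backward from the anchor.
            start, end = sorted((first, first + second))
        else:
            raise TypeError("expected (datetime, datetime) or (datetime, timedelta)")
        if end < start:
            raise ValueError("window end is before its start")
        return start, end
    raise TypeError(f"unsupported target_window: {target_window!r}")


requested = datetime(2025, 1, 20, tzinfo=timezone.utc)

# Today's behavior: 24 hours either side of the requested time.
print(normalize_window(requested, 24 * 60 * 60))
# "Within 24 hours before the requested time."
print(normalize_window(requested, (requested, timedelta(hours=-24))))
```

Normalizing up front would let the rest of get_memento() work with a single representation regardless of which form the caller passed.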

That said, I think the internal implementation would have to change dramatically to support any of this, since the Wayback Machine doesn’t offer this kind of functionality itself. If a memento response redirects to a time outside the window, we’d need to call search() to find a valid memento to try instead. Today, we just check whether the redirect target is in the window and, if not, raise an exception.
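The fallback described above (use the redirect if it lands inside the window, otherwise search for something that does) might look roughly like this. It is a sketch, not the library's implementation: choose_memento_time, NoMementoInWindow, and the search callback (standing in for a call to WaybackClient.search()) are all hypothetical, and picking the in-window capture closest to the requested time is an assumed policy. A real version would also have to deal with mementos of redirects in the search results, which this does not attempt.

```python
from datetime import datetime, timedelta, timezone
from typing import Callable, Iterable, List


class NoMementoInWindow(Exception):
    """Hypothetical error: no capture falls inside the requested window."""


def choose_memento_time(
    requested: datetime,
    redirect_time: datetime,
    start: datetime,
    end: datetime,
    search: Callable[[datetime, datetime], Iterable[datetime]],
) -> datetime:
    # Fast path: the Wayback Machine's own closest-in-time redirect already
    # landed inside the window, so no extra request is needed.
    if start <= redirect_time <= end:
        return redirect_time
    # Otherwise, ask the search callback for captures in the window and pick
    # the one nearest the requested time. Re-checking the bounds guards
    # against a callback that returns extra results.
    candidates = [t for t in search(start, end) if start <= t <= end]
    if not candidates:
        raise NoMementoInWindow(f"no memento between {start} and {end}")
    return min(candidates, key=lambda t: abs(t - requested))


def utc(*args: int) -> datetime:
    return datetime(*args, tzinfo=timezone.utc)


# Stand-in for a CDX search; a real one would query the archive.
def fake_search(start: datetime, end: datetime) -> List[datetime]:
    return [utc(2025, 1, 19, 12), utc(2025, 1, 19, 23), utc(2025, 1, 20, 1)]


requested = utc(2025, 1, 20)
# Want: something from the 24 hours *before* the requested time...
start, end = requested - timedelta(hours=24), requested
# ...but the automatic redirect went to 01:00, an hour *after* it.
chosen = choose_memento_time(requested, utc(2025, 1, 20, 1), start, end, fake_search)
print(chosen)  # 2025-01-19 23:00:00+00:00
```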

Mr0grog added the enhancement label on Feb 17, 2025
Projects
Status: Backlog
Development

No branches or pull requests

1 participant