The WaybackClient.get_memento() method currently allows you to specify a target_window (default: 24 hours), which, when used with the exact=False argument, indicates how distant from the requested time a returned memento can be. That is, if you ask for a memento of a given URL at 2025-01-20T00:00:00Z with a target_window of 24 hours, you might get back a memento from as early as 2025-01-19T00:00:00Z or as late as 2025-01-21T00:00:00Z. This behavior is designed around the way the Wayback Machine functions by default (if it can’t play back the requested memento, or there is no memento at the requested time, it automatically redirects to the closest-in-time memento available).
This can be pretty useful, but a lot of the time you might really want to request something like “the latest memento within 24 hours before a given time” or the same, but after the given time, rather than just within 24 hours of a given time. You can currently do that yourself with WaybackClient.search(), but it rapidly gets complicated when mementos of redirects are involved. (At EDGI, this need is really common, especially around major US Presidential administration changes, when we are often comparing something from before a given time to something after. For example, NHTSA’s 2022 FARS data was recently altered, and if you want to download an archived copy to compare to the current copy, you need to be careful about where these automatic closest-in-time redirects take you: you might think you’re downloading the version before the alteration, but wind up with one from after it.)
To solve this, it might be pretty useful to allow target_window to be some kind of time interval or range object with defined start and end points instead of just a number of seconds (or timedelta as in #55). This might be as simple as a tuple of two datetimes, more complicated like a tuple of (datetime, timedelta), or as fancy as a specialized time interval object (which would let you specify stuff like open- or closed-endedness).
I don’t believe Python has any built-in representation for datetime intervals or ranges like this (sadly, range only works with integer-like types). In terms of prior art here, Pandas’ Interval supports Timestamp objects; Pendulum’s Interval also fits this situation, but doesn’t support closed- vs. open-endedness. We probably don’t want to depend on either of these, since they would add a lot of weight here. But maybe we should be compatible with them as argument values?
It might be best to keep this simple by supporting the following:
A tuple of (datetime, datetime) describing a start and end.
A tuple of (datetime, timedelta) describing an anchor time and a delta from it.
A simple int (seconds) or timedelta that describes a window in both directions as we do today. These would essentially be a shortcut for (requested time - target_window, requested time + target_window).
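As a rough sketch of how those three forms could be reduced to a single (start, end) pair internally: the helper name normalize_window and its requested parameter are hypothetical, not part of the current wayback API. Treating a negative timedelta as "before the anchor" is also just one possible convention.

```python
from datetime import datetime, timedelta, timezone


def normalize_window(requested, target_window):
    """Hypothetical helper: reduce any supported form to (start, end) datetimes."""
    if isinstance(target_window, tuple):
        if len(target_window) != 2:
            raise ValueError("target_window tuples must have exactly two items")
        first, second = target_window
        if isinstance(second, timedelta):
            # (anchor, delta): a negative delta means "the window before the anchor".
            start, end = sorted((first, first + second))
        else:
            # (start, end): used as given.
            start, end = first, second
    else:
        # int (seconds) or timedelta: symmetric around the requested time,
        # matching today's behavior.
        if isinstance(target_window, timedelta):
            delta = target_window
        else:
            delta = timedelta(seconds=target_window)
        start, end = requested - delta, requested + delta
    if start > end:
        raise ValueError("window start must not be after its end")
    return start, end


requested = datetime(2025, 1, 20, tzinfo=timezone.utc)
# "Within 24 hours before the requested time":
before_only = normalize_window(requested, (requested, timedelta(hours=-24)))
# Today's symmetric default of 24 hours in either direction:
symmetric = normalize_window(requested, 86400)
```

This leaves open- vs. closed-endedness undecided; a dedicated interval object would be needed to express that.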
That said, I think the internal implementation would have to change dramatically to support any of this, since this kind of functionality isn’t built-in. If a memento response redirects, we’d need to call search to find a valid memento to try if the default redirect went outside the window. Today, we just check to see if it’s in the window and, if not, raise an exception.
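The selection step of that fallback might look something like the sketch below. It assumes the capture times have already been pulled out of search results as plain datetimes (the network calls are left out). The function name pick_in_window is hypothetical, and treating both ends of the window as inclusive is an assumption.

```python
from datetime import datetime, timedelta, timezone


def pick_in_window(timestamps, requested, start, end):
    """Hypothetical helper: closest capture to `requested` within [start, end], or None."""
    in_window = [ts for ts in timestamps if start <= ts <= end]
    if not in_window:
        return None
    # Closest in time to the requested moment, but never outside the window.
    return min(in_window, key=lambda ts: abs(ts - requested))


requested = datetime(2025, 1, 20, tzinfo=timezone.utc)
captures = [
    requested - timedelta(hours=30),  # too old for a 24-hour "before" window
    requested - timedelta(hours=5),   # inside the window
    requested + timedelta(hours=1),   # nearest overall, but after the requested time
]
# With the window (requested - 24h, requested), the capture from after the
# requested time is skipped even though it is the closest overall.
chosen = pick_in_window(captures, requested, requested - timedelta(hours=24), requested)
```

If nothing falls in the window, the caller could raise the same kind of exception it does today.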