Data.gov archiving guidance? #36

chronick · 2025-01-30T16:54:01Z

This seems concerning: https://www.reddit.com/r/climate/comments/1idin45/the_us_governments_open_data_is_currently_being/

From the thread:

I just checked: it showed a steady, large increase in datasets until Jan 21, 2025, reaching 307,854 datasets http://web.archive.org/web/20250120135355/https://data.gov/
Now it has lost 2,290 datasets in 9 days!

Look at this huge decrease on Jan 21, between 03:04:19 and 15:15:42 http://web.archive.org/web/20250120135355/https://data.gov/ http://web.archive.org/web/20250121233247/https://data.gov/

It drops from 307,854 to 306,012 datasets!!! It's been decreasing every day, and today data.gov is at 305,564.
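The thread's numbers are internally consistent. Here is a minimal Python check that uses only the counts quoted above (they have not been independently verified). It splits the 2,290-dataset loss into the single Jan 21 drop and the slower decline that followed:

```python
# Counts quoted in the Reddit thread above; not independently verified here.
PEAK = 307_854          # Wayback snapshot before the Jan 21 drop
AFTER_JAN_21 = 306_012  # Wayback snapshot later on Jan 21
LATEST = 305_564        # count reported on the day of the post
DAYS = 9                # period stated in the thread


def summarize(peak: int, after_drop: int, latest: int, days: int) -> dict:
    """Split the total loss into the Jan 21 drop and the decline that followed."""
    total_loss = peak - latest
    return {
        "total_loss": total_loss,
        "jan_21_drop": peak - after_drop,
        "later_decline": after_drop - latest,
        "avg_per_day": round(total_loss / days, 1),
    }


if __name__ == "__main__":
    print(summarize(PEAK, AFTER_JAN_21, LATEST, DAYS))
    # {'total_loss': 2290, 'jan_21_drop': 1842, 'later_decline': 448, 'avg_per_day': 254.4}
```

Roughly 80% of the loss (1,842 of 2,290) falls in the one Jan 21 window. The remaining 448 are spread over the following days.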

Are data.gov datasets being covered by the EOT archive? I don't see any specific info about these.

ldko · 2025-01-30T22:50:44Z

I believe @jcushman has been working on archiving the datasets from data.gov. Some of them will also have been captured in the web crawling being done by the Internet Archive, but I don't know how complete that coverage is at this point.

jcushman · 2025-01-30T22:57:11Z

We posted a short blog post on this just now: https://lil.law.harvard.edu/blog/2025/01/30/preserving-public-u-s-federal-data/

Basically, we are routinely capturing the metadata of the data.gov index itself, as well as a copy of each URL it points to, and we're figuring out an affordable way to make that searchable and clonable for data science. There are likely still things falling through the gaps between the two efforts -- anything that needs a deep crawl but either isn't on the EOT list or isn't generically crawlable.
