This project provides a simple web application for viewing WACZ (Web Archive Collection Zipped) files using the ReplayWeb.page library. Its primary purpose is to host static web archives that serve as stable testing environments for browser-based AI agents, particularly those developed with the HUD SDK.
- Displays WACZ files directly in the browser.
- Uses clean URLs for accessing different archives (e.g., `/my-archive` loads `archives/my-archive.wacz`).
- Supports a `?page=<url-encoded-page-in-archive>` query parameter to open a specific page within an archive.
- Supports a `?debug=true` query parameter to show the full ReplayWeb.page UI for debugging.
- Archives can have a default `startPage` defined in `archives/archive_list.json`.
- Includes a GitHub Actions workflow for easy deployment to GitHub Pages, making archives web-accessible.
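To make the URL conventions above concrete, here is a minimal Python sketch of how a clean URL and its query parameters could map onto an archive file. It is an illustration only, not the project's actual routing code (the viewer itself runs on Node.js). The names `ViewerRequest` and `parse_viewer_url` are hypothetical, and the sketch assumes the viewer is served from the site root, as in local development.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath
from typing import Optional
from urllib.parse import parse_qs, urlsplit


@dataclass
class ViewerRequest:
    archive_file: str      # e.g. "archives/my-archive.wacz"
    page: Optional[str]    # decoded value of ?page=, or None if absent
    debug: bool            # True only when ?debug=true is present


def parse_viewer_url(url: str, archives_dir: str = "archives") -> ViewerRequest:
    """Map a clean viewer URL onto a WACZ path and its options (hypothetical helper)."""
    parts = urlsplit(url)
    name = parts.path.strip("/")
    # Assumption: exactly one path segment names the archive.
    if not name or "/" in name:
        raise ValueError(f"expected a single path segment, got {parts.path!r}")
    query = parse_qs(parts.query)  # parse_qs also percent-decodes the values
    page = query.get("page", [None])[0]
    debug = query.get("debug", ["false"])[0] == "true"
    archive_file = str(PurePosixPath(archives_dir) / f"{name}.wacz")
    return ViewerRequest(archive_file, page, debug)
```

For instance, `parse_viewer_url("/my-archive?debug=true")` yields a request for `archives/my-archive.wacz` with `debug` set and no explicit page.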
Web archives provide consistent, offline-first environments for testing web-based agents. For example, if you have an archive named my-test-site deployed here, it will be accessible at https://hud-evals.github.io/page-archives/my-test-site. Here's how you might use it in a hud.Task:
```python
from hud.task import Task

login_task = Task(
    prompt="Log into the website using username 'testuser' and password 'password123'.",
    gym="hud-browser",  # Or your relevant browser-based gym
    setup=(
        "goto", "https://hud-evals.github.io/page-archives/my-test-site"
    ),
    evaluate=(
        "page_contains", "Welcome, testuser!"
    ),
)

# You can then run this task with your agent:
# from hud import run_job, YourAgent
# await run_job(YourAgent, [login_task], "my-archived-site-login-test")
```

This allows you to create reliable test scenarios for your agents against specific, unchanging versions of web pages.
To create the .wacz files that this viewer uses, you can use the ArchiveWeb.page browser extension or desktop application. It allows you to interactively capture websites as you browse.
- Full Guide: For detailed instructions on creating archives, please refer to the official ArchiveWeb.page User Guide.
- Basic Steps with ArchiveWeb.page extension:
  - Install the ArchiveWeb.page extension (Chromium-based browsers).
  - Open the extension and create a new collection.
  - Start an archiving session.
  - Browse the web pages you want to capture.
  - Stop the session.
  - Download your collection. It will typically download as a `.wacz` file.
For automated, large-scale crawling, consider Browsertrix.
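Before adding a downloaded file to the viewer, a quick sanity check can catch truncated or mislabeled downloads. The sketch below relies on two facts about the WACZ format: a WACZ is a ZIP file, and it carries a `datapackage.json` at its top level. The function name `looks_like_wacz` is hypothetical, and the check is deliberately shallow; it does not validate the archive's contents.

```python
import zipfile


def looks_like_wacz(path: str) -> bool:
    """Return True if `path` is a ZIP file with a top-level datapackage.json."""
    if not zipfile.is_zipfile(path):
        return False
    with zipfile.ZipFile(path) as zf:
        # A WACZ package describes itself in datapackage.json at the ZIP root.
        return "datapackage.json" in zf.namelist()
```

Calling `looks_like_wacz("my-cool-site.wacz")` on a freshly downloaded collection should return `True`; a `False` result suggests the download is incomplete or is not a WACZ file.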
- Place WACZ Files:
  - Put your `.wacz` files into the `archives/` directory.
  - For example, if your archive is named `my-cool-site.wacz`, place it in `archives/my-cool-site.wacz`.
- Update `archives/archive_list.json`:
  - This file provides a list of your archives for the homepage and can define a default starting page for each.
  - Edit `archives/archive_list.json` and add an entry for each of your archives. The `name` field must match the WACZ filename without the `.wacz` extension.
  - Example `archives/archive_list.json` with two entries. The first sets the optional `startPage` (the URL of the start page within that WACZ); the second omits it, so the archive's default page is used:

    ```json
    {
      "archives": [
        {
          "name": "my-cool-site",
          "displayName": "My Cool Site Archive",
          "startPage": "https://my-cool-site.com/index.html"
        },
        {
          "name": "another-one",
          "displayName": "Another Great Archive"
        }
      ]
    }
    ```

  - The `displayName` is what appears in the list on the homepage.
  - The `startPage` is optional. If provided, accessing `/my-cool-site` will attempt to open this specific page from the archive. A `?page=` query parameter takes precedence over `startPage`; if neither is given, the archive's default page is used.
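Because each `name` must match a file in `archives/`, a small consistency check can catch typos before deployment. The following is a minimal sketch, not part of the project: the function name `check_archive_list` is hypothetical, and it assumes `archive_list.json` is strict JSON (no comments) with the top-level `archives` array shown in the example.

```python
import json
from pathlib import Path
from typing import List


def check_archive_list(archives_dir: str = "archives") -> List[str]:
    """Return human-readable problems found in archive_list.json (empty if none)."""
    root = Path(archives_dir)
    data = json.loads((root / "archive_list.json").read_text(encoding="utf-8"))
    problems: List[str] = []
    seen = set()
    for entry in data.get("archives", []):
        name = entry.get("name")
        if not name:
            problems.append(f"entry without a name: {entry!r}")
            continue
        if name in seen:
            problems.append(f"duplicate name: {name}")
        seen.add(name)
        # The name must match the WACZ filename without its extension.
        wacz = root / f"{name}.wacz"
        if not wacz.is_file():
            problems.append(f"missing file for '{name}': {wacz}")
    return problems
```

Running `check_archive_list()` from the repository root and printing the result gives a quick list of entries that point at missing or duplicated archives.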
- Prerequisites:
  - Node.js and npm installed.
- Clone the repository:

  ```bash
  git clone https://github.com/hud-evals/page-archives.git
  cd page-archives
  ```

- Install dependencies:

  ```bash
  npm install
  ```

- Run the development server:

  ```bash
  npm run dev
  ```

  This will start an Express.js server (usually at http://localhost:3000) that handles the clean URLs.
- Homepage (List of Archives): `http://localhost:3000/`
- Specific Archive (using its default or `startPage`): `http://localhost:3000/my-cool-site`
- Specific Page within an Archive: `http://localhost:3000/my-cool-site?page=https%3A%2F%2Fmy-cool-site.com%2Fspecific-article.html` (ensure the page URL is URL-encoded).
- Debug Mode (shows ReplayWeb.page UI): `http://localhost:3000/my-cool-site?debug=true`
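Hand-encoding the `page` parameter is error-prone, so it can help to build these URLs programmatically. The sketch below uses Python's standard `urllib.parse.urlencode`, which percent-encodes the page URL; the helper name `viewer_url` and its parameters are hypothetical and not part of this project.

```python
from typing import Optional
from urllib.parse import urlencode


def viewer_url(base: str, archive: str, page: Optional[str] = None, debug: bool = False) -> str:
    """Build a viewer URL for `archive`, optionally targeting a page or debug mode."""
    params = {}
    if page is not None:
        params["page"] = page      # urlencode percent-encodes ':' and '/' for us
    if debug:
        params["debug"] = "true"
    url = f"{base.rstrip('/')}/{archive}"
    return f"{url}?{urlencode(params)}" if params else url
```

For example, `viewer_url("http://localhost:3000", "my-cool-site", page="https://my-cool-site.com/specific-article.html")` reproduces the encoded URL in the list above, and the same helper can produce the address passed to a task's `goto` setup step.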
Enjoy creating and viewing your web archives for robust agent testing!