
Commit 439b24d

v0.3.0: speed, progress clarity, image extraction, file skip

- Skip HEAD when no min/max image size (faster downloads)
- Lower default delay 1.0→0.5s; more parallel workers (5 assets, 4 head)
- Clear progress: Found N, Downloading N assets, [i/N] per item
- Image extraction: preload links, background-image, path hints, lazy attrs
- File skip: canonical paths, skip already scraped resources
- Auto retry crawl cross-domain if same-domain empty
- GUI: Stop button, status parsing, last URL persisted
- Fix: tqdm unit spacing, multiprocessing warning

Co-authored-by: Cursor <cursoragent@cursor.com>

1 parent cacd8f0 commit 439b24d

9 files changed: 1040 additions & 286 deletions

.gitignore

Lines changed: 1 addition & 0 deletions

```diff
@@ -10,5 +10,6 @@ build/
 venv/
 env/
 output/
+output_*/
 *.log
 .DS_Store
```

CHANGELOG.md

Lines changed: 21 additions & 0 deletions

```diff
@@ -6,6 +6,27 @@ The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
 
 ## [Unreleased]
 
+## [0.3.0] - 2025-02-04
+
+### Added
+- Clear progress output: "Found: X PDFs, Y images", "→ Downloading N assets", "[i/N]" per item.
+- Image extraction: `link[rel=preload][as=image]`, CSS `background-image: url()`, extension-less paths (e.g. `/image/`, `/thumb/`), more lazy-load attrs (`data-zoom-src`, `data-hires`, etc.).
+- GUI: Stop button, status parsing for mapping/download progress, `[i/N]` display.
+- Skip HEAD when no `--min-image-size` / `--max-image-size` (faster image downloads).
+- Auto retry crawl with cross-domain if same-domain returns no results.
+- File skip: check canonical paths and skip already-scraped images/PDFs/text.
+- Last URL persisted on GUI relaunch.
+- Multiprocessing semaphore warning suppression; tqdm unit spacing fix.
+
+### Changed
+- Default delay: 1.0s → 0.5s.
+- Workers: `SAFE_ASSET_WORKERS` 3→5, `SAFE_HEAD_WORKERS` 2→4; parallel HEAD threshold 8→4.
+- Crawl follows links by default; "Follow links" checkbox clarified.
+- README: min-image-size tip; `output_*/` in .gitignore.
+
+### Removed
+- GUI progress bar and spinner (replaced by clearer status text).
+
 ## [0.2.0] - 2025-02-04
 
 - GUI (tkinter) with file-type selector, image size filter, and Open folder button.
```
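As a rough illustration of the image-extraction rules named in the changelog (preload links, CSS `background-image`, extra lazy-load attrs), a minimal regex-based sketch follows; the function name and the exact matching logic are hypothetical and the real parser may differ.

```python
import re

# Hypothetical sketch: the lazy-load attribute list mirrors the changelog
# (`data-zoom-src`, `data-hires`, etc.); the scraper's actual list may differ.
LAZY_ATTRS = ("src", "data-src", "data-lazy-src", "data-zoom-src", "data-hires")

def extract_image_urls(html: str) -> set[str]:
    urls = set()
    # <link rel="preload" as="image" href="...">
    for link in re.findall(r"<link\b[^>]*>", html, re.I):
        if 'rel="preload"' in link and 'as="image"' in link:
            m = re.search(r'href="([^"]+)"', link)
            if m:
                urls.add(m.group(1))
    # CSS background-image: url(...)
    urls.update(re.findall(r'background-image\s*:\s*url\(["\']?([^"\')]+)', html))
    # <img> with src or lazy-load attributes
    for img in re.findall(r"<img\b[^>]*>", html, re.I):
        for attr in LAZY_ATTRS:
            m = re.search(rf'{attr}="([^"]+)"', img)
            if m:
                urls.add(m.group(1))
    return urls
```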

README.md

Lines changed: 12 additions & 1 deletion

````diff
@@ -32,7 +32,7 @@ This installs the package in editable mode and registers the `scrape` and `scrap
 scrape --url https://example.com/page [--out-dir output] [--delay 1] [--crawl] [--max-depth 2] [--same-domain-only]
 ```
 
-Filter images by file size (uses HEAD `Content-Length`): `--min-image-size 50k` and/or `--max-image-size 5m` (suffixes `k`/`m` for KB/MB).
+Filter images by file size (uses HEAD `Content-Length`): `--min-image-size 50k` and/or `--max-image-size 5m` (suffixes `k`/`m` for KB/MB). Use a low or zero minimum to capture thumbnails; a high minimum (e.g. `1m`) skips smaller images.
 
 Or open the simple GUI:
 
````
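The size filter the README hunk describes can be sketched roughly as below; `parse_size` and `passes_size_filter` are illustrative names, not the project's actual API, and the sketch assumes binary units (`k` = 1024 bytes).

```python
import urllib.request

def parse_size(spec: str) -> int:
    """Parse a size with optional k/m suffix: '50k' -> 51200, '5m' -> 5242880."""
    spec = spec.strip().lower()
    if spec.endswith("k"):
        return int(float(spec[:-1]) * 1024)
    if spec.endswith("m"):
        return int(float(spec[:-1]) * 1024 * 1024)
    return int(spec)

def passes_size_filter(url: str, min_size=None, max_size=None) -> bool:
    # v0.3.0 behavior: with no bounds set, skip the HEAD request entirely.
    if min_size is None and max_size is None:
        return True
    req = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(req) as resp:
        length = int(resp.headers.get("Content-Length", 0))
    if min_size is not None and length < min_size:
        return False
    if max_size is not None and length > max_size:
        return False
    return True
```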
````diff
@@ -76,6 +76,17 @@ For **crawl** mode, the scraper auto-detects CPU count and caps parallel workers
 scrape --url https://example.com --crawl --workers 2
 ```
 
+### Iterations and auto timeout (single-page)
+
+On 403 or slow responses, the scraper retries automatically:
+
+- **Iterations:** Single-page runs retry up to `--max-iterations` (default 3). Each iteration uses a longer delay and timeout; if the first attempt gets a 403, the next iteration automatically uses the browser (`--js`) when Playwright is installed.
+- **Auto timeout:** The per-request timeout scales with each retry (30s → 60s → 120s, capped at 120s); override the base timeout in code if needed.
+
+```bash
+scrape --url https://strict.site/page --max-iterations 5
+```
+
 ## Building a standalone bundle
 
 To build a standalone folder with the CLI and GUI (no Python required on the target machine):
````
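The timeout scaling described in the README hunk (30s → 60s → 120s, capped at 120s) amounts to doubling per retry; a minimal sketch, with hypothetical names and assumed 1-based iteration numbering:

```python
# Illustrative constants matching the README's description, not the
# project's actual code.
BASE_TIMEOUT = 30   # seconds for the first attempt
MAX_TIMEOUT = 120   # hard cap

def timeout_for_iteration(iteration: int) -> int:
    """1 -> 30, 2 -> 60, 3 and beyond -> 120 (capped)."""
    return min(BASE_TIMEOUT * 2 ** (iteration - 1), MAX_TIMEOUT)
```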
