| 1 | +Purpose |
| 2 | +------- |
| 3 | +This file gives concise, actionable guidance for AI coding agents working on the `webinfo` Go module. |
| 4 | + |
| 5 | +**What this project does**: Extracts metadata (title, description, canonical, image, etc.) from web pages and provides utilities to fetch and save representative images. |
| 6 | + |
| 7 | +Quick entry points |
| 8 | +------------------ |
| 9 | +- **Primary package**: `webinfo` — key files: `fetch.go` (core `Fetch` function), `webinfo.go` (`Webinfo` struct and `DownloadImage`), `errs.go` (error sentinel values), `fetch_test.go` (behavioral tests). |
| 10 | +- **Go module**: `go 1.25` (see `go.mod`). |
| 11 | + |
| 12 | +Developer workflows |
| 13 | +------------------- |
| 14 | +- Run full CI/test workflow using the Taskfile (recommended if `task` is installed): |
| 15 | + - `task test` — runs `go mod verify`, `go test -shuffle on ./...`, `govulncheck`, and `golangci-lint-v2` as configured in `Taskfile.yml`. |
| 16 | +- Quick test: `go test ./...` (useful during fast iteration). |
| 17 | +- Prepare module: `go mod tidy -v -go=1.25` (mirrors `prepare` in `Taskfile.yml`). |
| 18 | + |
| 19 | +Project-specific conventions and patterns |
| 20 | +---------------------------------------- |
| 21 | +- Error handling: uses `github.com/goark/errs`. Prefer `errs.Wrap(err, errs.WithContext("key", val))` for context-rich errors, and use `errs.Join` in a `defer` to combine an error returned by `Close` with the function's own error. |
| 22 | +- HTTP fetching: uses `github.com/goark/fetch`. Typical pattern: |
| 23 | + - Parse URL with `fetch.URL(...)`. |
| 24 | + - Use `fetch.New(...).GetWithContext(ctx, parsed, fetch.WithRequestHeaderSet("User-Agent", ua))`. |
| 25 | +- Default User-Agent: `getUserAgent("")` returns a dummy UA string. Functions accept a `userAgent` param and fall back to this default when it is empty. |
| 26 | +- Encoding: `Fetch` peeks the first 1024 bytes and uses `charset.DetermineEncoding` and `encoding.GetEncoding(name)` to decode response bodies before HTML parsing — preserve this approach when touching parsing logic. |
| 27 | +- HTML parsing: `goquery` is used to select head elements and meta tags. Extraction precedence is explicit in `fetch.go` (title → `twitter:title`/`og:title`, description → `twitter:description`/`og:description`, image → `twitter:image`/`og:image`). Follow this precedence in code changes or tests. |
| 28 | +- Image download (`DownloadImage` in `webinfo.go`): |
| 29 | + - Determines the file extension by trying, in order: the URL path, the `Content-Type` header, content sniffing (up to 512 bytes), and finally the fallback `.img`. |
| 30 | + - If the URL has no filename, `temporary` is forced to `true` and `os.CreateTemp(destDir, "webinfo-image-*"+ext)` is used. |
| 31 | + - When sniffing bytes, the code prepends the read bytes back into the stream with `io.MultiReader` so the full image is written. |
| 32 | + |
| 33 | +Tests and examples |
| 34 | +------------------ |
| 35 | +- Tests use `net/http/httptest` for deterministic responses (encoding tests use `golang.org/x/text/encoding/japanese`). Inspect `fetch_test.go` for examples of: |
| 36 | + - Redirect handling and validation of `Location`. |
| 37 | + - Encoding tests for Shift_JIS and ISO-2022-JP. |
| 38 | + - Verifying `User-Agent` header usage. |
| 39 | +- Example usage patterns to follow when adding code or tests: |
| 40 | + - Fetch: `info, err := Fetch(ctx, "https://example.com", "")` — empty UA uses the default. |
| 41 | + - Download image: `outPath, err := w.DownloadImage(ctx, "images", true)` |
| 42 | + |
| 43 | +External dependencies & integration points |
| 44 | +---------------------------------------- |
| 45 | +- Key dependencies in `go.mod`: `github.com/goark/fetch`, `github.com/goark/errs`, `github.com/PuerkitoBio/goquery`, `golang.org/x/text` (encodings). |
| 46 | +- The `Taskfile.yml` runs additional tools: `govulncheck`, `golangci-lint-v2`, and (optionally) `nancy` via `depm` — keep CI tool invocations in sync when adding dependencies. |
| 47 | + |
| 48 | +When modifying public APIs |
| 49 | +------------------------- |
| 50 | +- Maintain existing error-wrapping conventions (`errs.Wrap`, `errs.WithContext`). |
| 51 | +- Preserve encoding detection behavior and the 1024-byte peek in `Fetch` unless a clear, tested performance reason exists. |
| 52 | +- Preserve `DownloadImage`'s extension-detection order and the behavior of `temporary` vs permanent files. |
| 53 | + |
| 54 | +Where to look next (high-value files) |
| 55 | +------------------------------------- |
| 56 | +- `fetch.go` — how pages are fetched, decoded and parsed. |
| 57 | +- `webinfo.go` — `Webinfo` type and `DownloadImage` implementation. |
| 58 | +- `fetch_test.go` — canonical tests and examples you should mirror for new behaviors. |
| 59 | +- `errs.go` and `go.mod` — error constants and dependency hints. |
| 60 | +- `Taskfile.yml` — canonical developer/test/lint workflow. |
| 61 | + |
| 62 | +If anything above is unclear or you want more examples (small patches, test templates, or a CI-safe refactor suggestion), tell me which area to expand and I will iterate. |