chore: agent-readiness pass + alerts.log writer + daemon→run rename by altaywtf · Pull Request #31 · uinaf/healthd

altaywtf · 2026-04-25T22:48:59Z

Summary

Session-long readiness pass on healthd. Three substantive changes plus cleanup: (1) wire up the alerts.log writer the TUI was already reading from but nothing was producing, (2) rename healthd daemon run → healthd run and internal/daemon → internal/loop (lifecycle is process-compose's job now, the old naming was misleading), (3) add a pre-push verifier hook and a per-package coverage gate so weak new packages can't slip in. Doc drift fixed across the board.

Changes

feat(alertlog) — new internal/alertlog package, single source of truth for the log path + format. loop.Run calls alertlog.Append on every transition regardless of notifier config; best-effort with stderr error logging. TUI's DefaultAlertsLogPath delegates so reader/writer can't drift.
refactor — internal/daemon → internal/loop, RunLoop → Run, cmd/daemon.go → cmd/run.go. Top-level healthd run replaces healthd daemon run. Breaking change — supervisor configs need updating.
chore(verify) — added per-package coverage gate (default 70%) with sanity check that fails loudly if go test output format ever shifts. Table-driven tests for the parser (9 cases). Status indicators standardized to ✓/✗.
chore — .git-hooks/pre-push running go run ./cmd/verify; .github/pull_request_template.md.
ci — deleted .github/workflows/e2e-host-macos.yml (was running tests against e2e/host/... which was deleted in 63a396a; would have failed every nightly run).
docs — README, AGENTS.md, ARCHITECTURE.md, operator skill all reconciled. Removed stale daemon install/status/logs/uninstall references, LaunchAgent paragraph, make install (no Makefile), docs/testing.md (deleted in 0669eb8).

Validation

go run ./cmd/verify passes locally — total 83.7%, all packages ≥ 70%
Pre-push hook tested directly (runs verifier, exits 0)
New behavior covered by tests — alertlog round-trips through TUI parser; daemon coverage test asserts [crit] failing line lands in alerts.log; per-package parser has 9 unit tests
Manual smoke: built binary, ran healthd run against a temp config with one passing + one failing check, confirmed alerts.log got a real RFC3339 line and healthd status rendered it back
CLI surface confirmed: healthd --help no longer lists daemon; healthd daemon run errors out with "unknown command"; healthd run exists at top level
Independent /verify + 4-persona /review pass (general, tests, silent-failures, cleanup) — verdict ship-it after parser hardening + emoji + Go-version fixes

Risks

Breaking change for anyone invoking healthd daemon run directly. Update process-compose / systemd / launchd configs to healthd run.
alerts.log write failures (disk full, EACCES) currently log to stderr and continue — proper escalation is a deferred follow-up.
TestRunAdditionalBranches is a real-clock test (45ms ticker window); the new alerts.log assertion increases work in that window. Hasn't flaked locally but worth watching in CI. Splitting + clock-injection is a deferred follow-up.

Linked Issues

None.

Note

Medium Risk
Medium risk due to a breaking CLI rename (healthd daemon run → healthd run) that requires supervisor config updates, plus new alerts.log writing and config validation that can change runtime behavior/fail previously-accepted configs.

Overview
This PR wires the runtime loop to persist transition events to ~/.local/state/healthd/alerts.log via a new internal/alertlog writer (shared with the TUI), and adds config validation to reject check name/group values that would break alerts parsing.

It renames the foreground daemon command from healthd daemon run to top-level healthd run and renames the underlying package/API from internal/daemon.RunLoop to internal/loop.Run, updating tests and docs accordingly.

Developer/CI hygiene is tightened by adding a pre-push hook and PR template, enhancing cmd/verify with a per-package coverage gate (and tests for its parser), and removing the obsolete macOS host e2e workflow.

^{Reviewed by Cursor Bugbot for commit 2b13992. Bugbot is set up for automated code reviews on this repo. Configure here.}

The workflow ran `go test ./e2e/host/...` against a path deleted in 63a396a. Every nightly cron run would fail. Drop it; the underlying daemon-host test scaffold is gone and re-adding it is out of scope.

- New .git-hooks/pre-push runs `go run ./cmd/verify` (fmt + lint + test + 80% coverage gate) before push. Activate per-clone with `git config core.hooksPath .git-hooks`. - AGENTS.md: bump documented Go version to match .tool-versions (1.26), promote `go run ./cmd/verify` as the canonical gate, drop the misleading `go vet` line, replace the fictional `make install` with the actual build command, document HEALTHD_CONFIG.

- `healthd daemon install` was removed in 63a396a (process-compose now manages the daemon), but the quickstart still listed it. - The macOS LaunchAgent PATH paragraph was tied to that removed install command. - `docs/testing.md` was deleted in 0669eb8 but the link remained. - Removed the optional host-e2e snippet whose target tests were deleted in 63a396a (matching CI workflow removal in fe95f2b).

The TUI's "Recent Alerts" panel reads ~/.local/state/healthd/alerts.log via internal/tui/alerts.go, but nothing in the codebase ever wrote that file — the writer half of the feature shipped in PR #29 was never implemented. This adds a new internal/alertlog package as the canonical writer: - DefaultPath() — single source of truth for the log location - Append() — atomic append, creates parent dir, sanitizes newlines - FormatLine() — produces lines that match alertLinePattern in the TUI The daemon's RunLoop calls alertlog.Append on every emitted transition event, regardless of whether notify backends are configured. Writes are best-effort: errors log to stderr and don't fail the loop. internal/tui/alerts.go now delegates DefaultAlertsLogPath to alertlog so reader and writer can never drift on the path. Daemon coverage test now sets HOME to a tempdir and asserts the expected `[crit] failing` line lands in alerts.log, giving us end-to-end coverage of the writer.

The healthd-operator skill referenced subcommands that were removed in 63a396a — daemon install/status/logs/uninstall no longer exist; only `daemon run` remains. Daemon lifecycle is now owned by an external supervisor (process-compose / systemd / launchd). - SKILL.md: rewrite Step 6 (Daemon install/status/logs) and Step 7 (Rollback) for the foreground+supervisor model; add Step 7 for alert history inspection. Drop launchd-PATH troubleshooting entry (no longer healthd's responsibility) and replace with a generic supervisor-PATH note. Add an alerts.log inspection tip now that the log is actually written. - openclaw-integration.md: same surgery on the OpenClaw flow.

The `daemon` command path was a vestige from when healthd managed its own LaunchAgent lifecycle. That code was removed in 63a396a — supervision is now external (process-compose / systemd / launchd). The remaining "daemon run" subcommand was just running a foreground loop, so the naming became misleading. - internal/daemon/ → internal/loop/; package daemon → package loop; RunLoop → Run; matching test name updates. - cmd/daemon.go → cmd/run.go; removed the parent `daemon` command; exposed `healthd run` at the top level. - ARCHITECTURE.md mermaid diagram and component table updated to use the loop package and `healthd run` invocation. - README, AGENTS.md, operator skill, and openclaw integration doc all updated to reference the new command path. Breaking change for anyone invoking `healthd daemon run` directly — update supervisor configs to `healthd run`.

The total-coverage gate at 80% lets weak new packages slip in. The recently-added alertlog package shipped at 77.8% — fine on the total, but a per-package check would have flagged it for review. Adds a per-package floor (default 70%) that fails the verifier when any tested package with statements drops below the threshold. Skips packages with no statements (e2e/cli, integration). Both thresholds are flag-tunable: --min-coverage and --min-package-coverage.

Adds .github/pull_request_template.md with the structure already practiced in recent PRs: Summary, Changes, Validation, Risks, Linked Issues. The Validation section nudges contributors and agents toward the canonical proof points (verify, pre-push hook, tests, smoke, docs).

The go.mod `go` directive is `1.24.0`, which is the language version contract. `.tool-versions` pins the dev toolchain to 1.26.0 (a newer toolchain compiling code that targets 1.24). AGENTS.md previously said "Go 1.26", conflating the two. Changed to "Go 1.24+" with a parenthetical pointing at .tool-versions for the toolchain pin.

Review surfaced two real risks in the per-package gate added in d2b488f: 1. parsePackageCoverage silently dropped non-matching lines, so a future change to `go test` output format would let the gate vacuously pass ("all packages ≥ 70.0%" while measuring nothing). 2. The function had zero unit tests, despite cmd/verify being excluded from its own coverage gate. - Added a "parsed > 0 when expected > 0" sanity check that fails the verifier loudly if no `ok ... coverage:` lines match. Lines for packages with no _test.go files (no `ok` prefix) are tolerated as expected partial misses. - Added table-driven tests covering normal, cached, no-statements, mixed, partial-match, zero-match, and trailing-whitespace cases. Also normalized status indicators from ❌/✅ to ✗/✓ to match the unicode style established in 7991d2b.

Cleanup reviewer flagged two speculative seams inherited from the old cmd/daemon.go that the rename to cmd/run.go preserved unnecessarily: - `var runLoop = loop.Run` was a stub-point with no test stubbing it (grep confirms zero callers besides the one inline use). Inlined back to `loop.Run(...)`. - `loadConfigForRun` returned the resolved config path, but the only caller discarded it. Dropped the path from the signature. Per project doctrine: extension points are earned by real use-cases, not speculation.

Copilot

Pull request overview

This PR updates healthd’s long-running execution model and operational tooling by introducing a canonical alerts.log writer, renaming the daemon loop command/package to reflect supervisor-owned lifecycle, and tightening verification via a per-package coverage gate.

Changes:

Add internal/alertlog and wire loop.Run to append an alerts transition line on every state change (independent of notifier config).
Rename healthd daemon run → healthd run and internal/daemon → internal/loop; update docs/operator skill accordingly.
Enhance cmd/verify with per-package coverage threshold parsing + tests; add pre-push hook and PR template; remove stale macOS host-e2e workflow.

Reviewed changes

Copilot reviewed 19 out of 19 changed files in this pull request and generated 3 comments.

Show a summary per file

File	Description
skills/healthd-operator/references/openclaw-integration.md	Updates operator flow to supervisor-based lifecycle + alerts.log inspection.
skills/healthd-operator/SKILL.md	Refreshes skill description/workflow/troubleshooting for `healthd run` + alerts history.
internal/tui/alerts.go	Delegates alerts log path to `internal/alertlog` to prevent reader/writer drift.
internal/loop/runtime.go	Renames run loop and appends alert transitions to alerts.log.
internal/loop/runtime_test.go	Updates tests to call `loop.Run` and new naming.
internal/loop/coverage_test.go	Extends branch-coverage test to assert alerts.log writes.
internal/alertlog/alertlog.go	Introduces canonical alerts.log path + format + append helper.
internal/alertlog/alertlog_test.go	Adds tests for alert log formatting, append behavior, and TUI parse round-trip.
docs/ARCHITECTURE.md	Aligns architecture diagram and component list with loop/alerts.log.
cmd/verify/main.go	Adds per-package coverage threshold gate and parsing of `go test` output.
cmd/verify/main_test.go	Adds table-driven tests for `parsePackageCoverage`.
cmd/run.go	Adds top-level `healthd run` Cobra command using `internal/loop`.
cmd/root.go	Registers `run` command and removes `daemon` command.
cmd/daemon.go	Deletes deprecated daemon command implementation.
README.md	Updates quickstart to use `healthd run`; removes stale daemon/host-e2e notes.
AGENTS.md	Updates contributor commands, hook enablement, and running instructions.
.github/workflows/e2e-host-macos.yml	Removes stale workflow for deleted host e2e tests.
.github/pull_request_template.md	Adds PR checklist aligned with new verify/hook workflow.
.git-hooks/pre-push	Adds pre-push hook to run verifier.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copilot · 2026-04-25T22:52:33Z

+func FormatLine(t time.Time, state, checkName, group, reason string) string {
+	return fmt.Sprintf(
+		"%s [%s] %s (%s) - %s",
+		t.UTC().Format(time.RFC3339),
+		state,
+		strings.TrimSpace(checkName),
+		strings.TrimSpace(group),
+		sanitizeReason(reason),
+	)


FormatLine claims to produce output that matches the TUI parser, but checkName and group are only TrimSpace’d. If either contains embedded newlines or the delimiter characters used by alertLinePattern (notably "(" in checkName or ")" in group), the resulting line becomes unparsable and the status UI will silently drop it. Consider sanitizing checkName/group (collapse CR/LF like reason, and/or replace/escape parentheses/brackets) or enforce an allowed character set during config validation so the writer truly guarantees parseable output.

Addressed in commit 2b13992 — chose parse-don't-validate at the config boundary instead of escaping at write time. New validateAlertSafeIdentifier rejects (, ), [, ], and embedded newlines in check.name and check.group at config load, with table-driven tests. The writer remains simple and the operator-facing log lines stay clean.

Copilot · 2026-04-25T22:52:34Z

+		return fmt.Errorf("create alerts log dir: %w", err)
+	}
+
+	file, err := os.OpenFile(path, os.O_APPEND|os.O_CREATE|os.O_WRONLY, 0o644)


alerts.log is created with mode 0644, which makes the log world-readable on multi-user systems. Since this file can contain check names/groups and failure reasons, it’s safer to default to 0600 (and rely on umask only as a further restriction).

Suggested change

file, err := os.OpenFile(path, os.O_APPEND|os.O_CREATE|os.O_WRONLY, 0o644)

file, err := os.OpenFile(path, os.O_APPEND|os.O_CREATE|os.O_WRONLY, 0o600)

Addressed in commit b52a281 — file mode is now 0o600.

Copilot · 2026-04-25T22:52:37Z

+	scanner := bufio.NewScanner(strings.NewReader(output))
+	for scanner.Scan() {
+		line := strings.TrimSpace(scanner.Text())
+		if matches := packageCoverageLine.FindStringSubmatch(line); len(matches) == 3 {
+			coverage, err := strconv.ParseFloat(matches[2], 64)
+			if err != nil {
+				return nil, fmt.Errorf("parse coverage for %s: %w", matches[1], err)
+			}
+			covered = append(covered, packageCoverage{Name: matches[1], Coverage: coverage})
+			continue
+		}
+		if packageNoStatementsLine.MatchString(line) {
+			noStatements++
+		}
+	}
+
+	if expected > 0 && len(covered)+noStatements == 0 {
+		return nil, fmt.Errorf("parsed coverage for 0 of %d packages; `go test` output format may have changed", expected)
+	}
+	return covered, nil


parsePackageCoverage doesn’t check scanner.Err() after the scan loop. If scanning fails (e.g., an unexpectedly long line exceeds the Scanner token limit), the verifier could silently treat it as “no matches” or miss packages. Add a scanner.Err() check and return a clear error so the coverage gate fails loudly on read/scan issues.

Addressed in commit fd1d22a — parsePackageCoverage now checks scanner.Err() after the loop and surfaces a "scan go test output" error.

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: c81dd2d128

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-04-25T22:53:13Z

+		"%s [%s] %s (%s) - %s",
+		t.UTC().Format(time.RFC3339),
+		state,
+		strings.TrimSpace(checkName),
+		strings.TrimSpace(group),


Escape parsed delimiters before writing alerts.log

FormatLine emits checkName and group verbatim into a line format that the TUI parser expects as ... ([group]) ... with checkName matched by ([^(]+) and group by ([^)]*). Because config validation allows arbitrary non-empty names/groups, a check name containing ( or a group containing ) will produce log lines that LoadRecentAlerts cannot parse, so transitions silently disappear from healthd status. Please escape/encode these fields (or widen the parser contract) before writing.

Useful? React with 👍 / 👎.

Addressed in commit 2b13992 — same fix as Copilot's parallel finding. Validation pushed to the config boundary (parse-don't-validate) rather than escape-at-write, so the rule is enforced once at load with a clear error and the writer stays trivial.

chatgpt-codex-connector · 2026-04-25T22:53:13Z

+	packageCoverageLine     = regexp.MustCompile(`^ok\s+(\S+)\s+\S+\s+coverage:\s+([0-9.]+)% of statements$`)
+	packageNoStatementsLine = regexp.MustCompile(`^ok\s+(\S+)\s+\S+\s+coverage:\s+\[no statements\]$`)


Parse non-ok coverage lines in package coverage gate

The per-package coverage parser only recognizes lines that start with ok, so lines like github.com/... coverage: 0.0% of statements (the no-_test.go case already covered in main_test.go) are dropped from pkgCovs. That means packages with executable code but no tests can bypass --min-package-coverage, which undermines the new per-package guardrail. Parse those bare coverage lines (or explicitly fail on them) so uncovered packages are enforced consistently.

Useful? React with 👍 / 👎.

Not actioning — the per-package gate is intentionally scoped to tested packages. The only bare-coverage line in this repo today is the root package main, which is a 4-line Cobra wrapper exercised transitively through cmd/... (currently 83% covered). Forcing the parser to enforce bare lines would force a contrived test for func main() (which can't be called without os.Exit) or a refactor into a run() int indirection — both worse than the current bypass for one wrapper package.

The real risk this would catch is "someone adds a new untested package with statements" — that's a code-review concern, not something the verifier should police.

Documented this design decision via the existing comment on parsePackageCoverage: "Packages with no _test.go files print a non-matching bare line with no "ok" prefix, so a partial match count is normal — only zero matches is a red flag."

Copilot review flagged that the log file was world-readable (0o644). Alert lines contain check names, groups, and failure reasons — operationally sensitive on multi-user systems. Tighten to owner-only read/write; rely on umask only as a further restriction.

Copilot review flagged that bufio.Scanner errors (e.g., line exceeding the default 64KiB token limit) would silently truncate the parsed output, potentially hiding packages from the per-package gate. Surface scanner errors so the verifier fails loudly on read/scan issues.

Copilot, Codex, and an internal silent-failures review all flagged the same gap: alertlog.FormatLine writes check names and groups verbatim into a line shape (`time [state] name (group) - reason`) that the TUI parser regex `^(\S+) \[([^\]]+)\] ([^(]+) $([^)]*)$ - (.*)$` cannot recover if the name contains '(' or the group contains ')'. Transitions would silently disappear from `healthd status`. Rather than escape at write time (which still leaves operator-facing log lines with surprise characters), parse-don't-validate at the config boundary: reject '(', ')', '[', ']', and embedded newlines in check names and groups. Surfaces a clear error at config load with the offending field and character. Tests: table-driven cases for each rejected character, plus a happy path with `disk-root` and `host services` (space allowed).

altaywtf added 11 commits April 26, 2026 01:25

ci: remove broken host-macos e2e workflow

fe95f2b

The workflow ran `go test ./e2e/host/...` against a path deleted in 63a396a. Every nightly cron run would fail. Drop it; the underlying daemon-host test scaffold is gone and re-adding it is out of scope.

Copilot AI review requested due to automatic review settings April 25, 2026 22:49

Copilot started reviewing on behalf of altaywtf April 25, 2026 22:49 View session

Copilot AI reviewed Apr 25, 2026

View reviewed changes

chatgpt-codex-connector Bot reviewed Apr 25, 2026

View reviewed changes

altaywtf added 3 commits April 26, 2026 10:19

altaywtf merged commit 4d7c067 into main Apr 26, 2026
2 checks passed

altaywtf deleted the chore/repo-improvements branch April 26, 2026 07:30

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

chore: agent-readiness pass + alerts.log writer + daemon→run rename#31

chore: agent-readiness pass + alerts.log writer + daemon→run rename#31
altaywtf merged 14 commits intomainfrom
chore/repo-improvements

altaywtf commented Apr 25, 2026 •

edited by cursor Bot

Loading

Uh oh!

Copilot AI left a comment

Uh oh!

Copilot AI Apr 25, 2026

Uh oh!

altaywtf Apr 26, 2026

Uh oh!

Copilot AI Apr 25, 2026

Uh oh!

altaywtf Apr 26, 2026

Uh oh!

Copilot AI Apr 25, 2026

Uh oh!

altaywtf Apr 26, 2026

Uh oh!

chatgpt-codex-connector Bot left a comment

Uh oh!

chatgpt-codex-connector Bot Apr 25, 2026

Uh oh!

altaywtf Apr 26, 2026

Uh oh!

chatgpt-codex-connector Bot Apr 25, 2026

Uh oh!

altaywtf Apr 26, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

	file, err := os.OpenFile(path, os.O_APPEND\|os.O_CREATE\|os.O_WRONLY, 0o644)
	file, err := os.OpenFile(path, os.O_APPEND\|os.O_CREATE\|os.O_WRONLY, 0o600)

		packageCoverageLine = regexp.MustCompile(`^ok\s+(\S+)\s+\S+\s+coverage:\s+([0-9.]+)% of statements$`)
		packageNoStatementsLine = regexp.MustCompile(`^ok\s+(\S+)\s+\S+\s+coverage:\s+\[no statements\]$`)

Conversation

altaywtf commented Apr 25, 2026 • edited by cursor Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Changes

Validation

Risks

Linked Issues

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Copilot AI Apr 25, 2026

Choose a reason for hiding this comment

Uh oh!

altaywtf Apr 26, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 25, 2026

Choose a reason for hiding this comment

Uh oh!

altaywtf Apr 26, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 25, 2026

Choose a reason for hiding this comment

Uh oh!

altaywtf Apr 26, 2026

Choose a reason for hiding this comment

Uh oh!

chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

chatgpt-codex-connector Bot Apr 25, 2026

Choose a reason for hiding this comment

Uh oh!

altaywtf Apr 26, 2026

Choose a reason for hiding this comment

Uh oh!

chatgpt-codex-connector Bot Apr 25, 2026

Choose a reason for hiding this comment

Uh oh!

altaywtf Apr 26, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

altaywtf commented Apr 25, 2026 •

edited by cursor Bot

Loading