Skip to content

chore: agent-readiness pass + alerts.log writer + daemon→run rename#31

Merged
altaywtf merged 14 commits intomainfrom
chore/repo-improvements
Apr 26, 2026
Merged

chore: agent-readiness pass + alerts.log writer + daemon→run rename#31
altaywtf merged 14 commits intomainfrom
chore/repo-improvements

Conversation

@altaywtf
Copy link
Copy Markdown
Member

@altaywtf altaywtf commented Apr 25, 2026

Summary

Session-long readiness pass on healthd. Three substantive changes plus cleanup: (1) wire up the alerts.log writer the TUI was already reading from but nothing was producing, (2) rename healthd daemon runhealthd run and internal/daemoninternal/loop (lifecycle is process-compose's job now, the old naming was misleading), (3) add a pre-push verifier hook and a per-package coverage gate so weak new packages can't slip in. Doc drift fixed across the board.

Changes

  • feat(alertlog) — new internal/alertlog package, single source of truth for the log path + format. loop.Run calls alertlog.Append on every transition regardless of notifier config; best-effort with stderr error logging. TUI's DefaultAlertsLogPath delegates so reader/writer can't drift.
  • refactorinternal/daemoninternal/loop, RunLoopRun, cmd/daemon.gocmd/run.go. Top-level healthd run replaces healthd daemon run. Breaking change — supervisor configs need updating.
  • chore(verify) — added per-package coverage gate (default 70%) with sanity check that fails loudly if go test output format ever shifts. Table-driven tests for the parser (9 cases). Status indicators standardized to ✓/✗.
  • chore.git-hooks/pre-push running go run ./cmd/verify; .github/pull_request_template.md.
  • ci — deleted .github/workflows/e2e-host-macos.yml (was running tests against e2e/host/... which was deleted in 63a396a; would have failed every nightly run).
  • docs — README, AGENTS.md, ARCHITECTURE.md, operator skill all reconciled. Removed stale daemon install/status/logs/uninstall references, LaunchAgent paragraph, make install (no Makefile), docs/testing.md (deleted in 0669eb8).

Validation

  • go run ./cmd/verify passes locally — total 83.7%, all packages ≥ 70%
  • Pre-push hook tested directly (runs verifier, exits 0)
  • New behavior covered by tests — alertlog round-trips through TUI parser; daemon coverage test asserts [crit] failing line lands in alerts.log; per-package parser has 9 unit tests
  • Manual smoke: built binary, ran healthd run against a temp config with one passing + one failing check, confirmed alerts.log got a real RFC3339 line and healthd status rendered it back
  • CLI surface confirmed: healthd --help no longer lists daemon; healthd daemon run errors out with "unknown command"; healthd run exists at top level
  • Independent /verify + 4-persona /review pass (general, tests, silent-failures, cleanup) — verdict ship-it after parser hardening + emoji + Go-version fixes

Risks

  • Breaking change for anyone invoking healthd daemon run directly. Update process-compose / systemd / launchd configs to healthd run.
  • alerts.log write failures (disk full, EACCES) currently log to stderr and continue — proper escalation is a deferred follow-up.
  • TestRunAdditionalBranches is a real-clock test (45ms ticker window); the new alerts.log assertion increases work in that window. Hasn't flaked locally but worth watching in CI. Splitting + clock-injection is a deferred follow-up.

Linked Issues

None.


Note

Medium Risk
Medium risk due to a breaking CLI rename (healthd daemon runhealthd run) that requires supervisor config updates, plus new alerts.log writing and config validation that can change runtime behavior/fail previously-accepted configs.

Overview
This PR wires the runtime loop to persist transition events to ~/.local/state/healthd/alerts.log via a new internal/alertlog writer (shared with the TUI), and adds config validation to reject check name/group values that would break alerts parsing.

It renames the foreground daemon command from healthd daemon run to top-level healthd run and renames the underlying package/API from internal/daemon.RunLoop to internal/loop.Run, updating tests and docs accordingly.

Developer/CI hygiene is tightened by adding a pre-push hook and PR template, enhancing cmd/verify with a per-package coverage gate (and tests for its parser), and removing the obsolete macOS host e2e workflow.

Reviewed by Cursor Bugbot for commit 2b13992. Bugbot is set up for automated code reviews on this repo. Configure here.

altaywtf added 11 commits April 26, 2026 01:25
The workflow ran `go test ./e2e/host/...` against a path deleted in 63a396a.
Every nightly cron run would fail. Drop it; the underlying daemon-host
test scaffold is gone and re-adding it is out of scope.
- New .git-hooks/pre-push runs `go run ./cmd/verify` (fmt + lint + test
  + 80% coverage gate) before push. Activate per-clone with
  `git config core.hooksPath .git-hooks`.
- AGENTS.md: bump documented Go version to match .tool-versions (1.26),
  promote `go run ./cmd/verify` as the canonical gate, drop the
  misleading `go vet` line, replace the fictional `make install` with
  the actual build command, document HEALTHD_CONFIG.
- `healthd daemon install` was removed in 63a396a (process-compose now
  manages the daemon), but the quickstart still listed it.
- The macOS LaunchAgent PATH paragraph was tied to that removed
  install command.
- `docs/testing.md` was deleted in 0669eb8 but the link remained.
- Removed the optional host-e2e snippet whose target tests were
  deleted in 63a396a (matching CI workflow removal in fe95f2b).
The TUI's "Recent Alerts" panel reads ~/.local/state/healthd/alerts.log
via internal/tui/alerts.go, but nothing in the codebase ever wrote that
file — the writer half of the feature shipped in PR #29 was never
implemented.

This adds a new internal/alertlog package as the canonical writer:
- DefaultPath() — single source of truth for the log location
- Append() — atomic append, creates parent dir, sanitizes newlines
- FormatLine() — produces lines that match alertLinePattern in the TUI

The daemon's RunLoop calls alertlog.Append on every emitted transition
event, regardless of whether notify backends are configured. Writes are
best-effort: errors log to stderr and don't fail the loop.

internal/tui/alerts.go now delegates DefaultAlertsLogPath to alertlog
so reader and writer can never drift on the path.

Daemon coverage test now sets HOME to a tempdir and asserts the
expected `[crit] failing` line lands in alerts.log, giving us
end-to-end coverage of the writer.
The healthd-operator skill referenced subcommands that were removed in
63a396a — daemon install/status/logs/uninstall no longer exist; only
`daemon run` remains. Daemon lifecycle is now owned by an external
supervisor (process-compose / systemd / launchd).

- SKILL.md: rewrite Step 6 (Daemon install/status/logs) and Step 7
  (Rollback) for the foreground+supervisor model; add Step 7 for alert
  history inspection. Drop launchd-PATH troubleshooting entry (no
  longer healthd's responsibility) and replace with a generic
  supervisor-PATH note. Add an alerts.log inspection tip now that the
  log is actually written.
- openclaw-integration.md: same surgery on the OpenClaw flow.
The `daemon` command path was a vestige from when healthd managed its
own LaunchAgent lifecycle. That code was removed in 63a396a — supervision
is now external (process-compose / systemd / launchd). The remaining
"daemon run" subcommand was just running a foreground loop, so the
naming became misleading.

- internal/daemon/ → internal/loop/; package daemon → package loop;
  RunLoop → Run; matching test name updates.
- cmd/daemon.go → cmd/run.go; removed the parent `daemon` command;
  exposed `healthd run` at the top level.
- ARCHITECTURE.md mermaid diagram and component table updated to use
  the loop package and `healthd run` invocation.
- README, AGENTS.md, operator skill, and openclaw integration doc all
  updated to reference the new command path.

Breaking change for anyone invoking `healthd daemon run` directly —
update supervisor configs to `healthd run`.
The total-coverage gate at 80% lets weak new packages slip in. The
recently-added alertlog package shipped at 77.8% — fine on the total,
but a per-package check would have flagged it for review.

Adds a per-package floor (default 70%) that fails the verifier when any
tested package with statements drops below the threshold. Skips
packages with no statements (e2e/cli, integration). Both thresholds
are flag-tunable: --min-coverage and --min-package-coverage.
Adds .github/pull_request_template.md with the structure already
practiced in recent PRs: Summary, Changes, Validation, Risks, Linked
Issues. The Validation section nudges contributors and agents toward
the canonical proof points (verify, pre-push hook, tests, smoke, docs).
The go.mod `go` directive is `1.24.0`, which is the language version
contract. `.tool-versions` pins the dev toolchain to 1.26.0 (a newer
toolchain compiling code that targets 1.24). AGENTS.md previously said
"Go 1.26", conflating the two. Changed to "Go 1.24+" with a parenthetical
pointing at .tool-versions for the toolchain pin.
Review surfaced two real risks in the per-package gate added in d2b488f:

1. parsePackageCoverage silently dropped non-matching lines, so a future
   change to `go test` output format would let the gate vacuously pass
   ("all packages ≥ 70.0%" while measuring nothing).
2. The function had zero unit tests, despite cmd/verify being excluded
   from its own coverage gate.

- Added a "parsed > 0 when expected > 0" sanity check that fails the
  verifier loudly if no `ok ... coverage:` lines match. Lines for
  packages with no _test.go files (no `ok` prefix) are tolerated as
  expected partial misses.
- Added table-driven tests covering normal, cached, no-statements,
  mixed, partial-match, zero-match, and trailing-whitespace cases.

Also normalized status indicators from ❌/✅ to ✗/✓ to match the
unicode style established in 7991d2b.
Cleanup reviewer flagged two speculative seams inherited from the old
cmd/daemon.go that the rename to cmd/run.go preserved unnecessarily:

- `var runLoop = loop.Run` was a stub-point with no test stubbing it
  (grep confirms zero callers besides the one inline use). Inlined
  back to `loop.Run(...)`.
- `loadConfigForRun` returned the resolved config path, but the only
  caller discarded it. Dropped the path from the signature.

Per project doctrine: extension points are earned by real use-cases,
not speculation.
Copilot AI review requested due to automatic review settings April 25, 2026 22:49
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR updates healthd’s long-running execution model and operational tooling by introducing a canonical alerts.log writer, renaming the daemon loop command/package to reflect supervisor-owned lifecycle, and tightening verification via a per-package coverage gate.

Changes:

  • Add internal/alertlog and wire loop.Run to append an alerts transition line on every state change (independent of notifier config).
  • Rename healthd daemon runhealthd run and internal/daemoninternal/loop; update docs/operator skill accordingly.
  • Enhance cmd/verify with per-package coverage threshold parsing + tests; add pre-push hook and PR template; remove stale macOS host-e2e workflow.

Reviewed changes

Copilot reviewed 19 out of 19 changed files in this pull request and generated 3 comments.

Show a summary per file
File Description
skills/healthd-operator/references/openclaw-integration.md Updates operator flow to supervisor-based lifecycle + alerts.log inspection.
skills/healthd-operator/SKILL.md Refreshes skill description/workflow/troubleshooting for healthd run + alerts history.
internal/tui/alerts.go Delegates alerts log path to internal/alertlog to prevent reader/writer drift.
internal/loop/runtime.go Renames run loop and appends alert transitions to alerts.log.
internal/loop/runtime_test.go Updates tests to call loop.Run and new naming.
internal/loop/coverage_test.go Extends branch-coverage test to assert alerts.log writes.
internal/alertlog/alertlog.go Introduces canonical alerts.log path + format + append helper.
internal/alertlog/alertlog_test.go Adds tests for alert log formatting, append behavior, and TUI parse round-trip.
docs/ARCHITECTURE.md Aligns architecture diagram and component list with loop/alerts.log.
cmd/verify/main.go Adds per-package coverage threshold gate and parsing of go test output.
cmd/verify/main_test.go Adds table-driven tests for parsePackageCoverage.
cmd/run.go Adds top-level healthd run Cobra command using internal/loop.
cmd/root.go Registers run command and removes daemon command.
cmd/daemon.go Deletes deprecated daemon command implementation.
README.md Updates quickstart to use healthd run; removes stale daemon/host-e2e notes.
AGENTS.md Updates contributor commands, hook enablement, and running instructions.
.github/workflows/e2e-host-macos.yml Removes stale workflow for deleted host e2e tests.
.github/pull_request_template.md Adds PR checklist aligned with new verify/hook workflow.
.git-hooks/pre-push Adds pre-push hook to run verifier.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +22 to +30
func FormatLine(t time.Time, state, checkName, group, reason string) string {
return fmt.Sprintf(
"%s [%s] %s (%s) - %s",
t.UTC().Format(time.RFC3339),
state,
strings.TrimSpace(checkName),
strings.TrimSpace(group),
sanitizeReason(reason),
)
Copy link

Copilot AI Apr 25, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FormatLine claims to produce output that matches the TUI parser, but checkName and group are only TrimSpace’d. If either contains embedded newlines or the delimiter characters used by alertLinePattern (notably "(" in checkName or ")" in group), the resulting line becomes unparsable and the status UI will silently drop it. Consider sanitizing checkName/group (collapse CR/LF like reason, and/or replace/escape parentheses/brackets) or enforce an allowed character set during config validation so the writer truly guarantees parseable output.

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed in commit 2b13992 — chose parse-don't-validate at the config boundary instead of escaping at write time. New validateAlertSafeIdentifier rejects (, ), [, ], and embedded newlines in check.name and check.group at config load, with table-driven tests. The writer remains simple and the operator-facing log lines stay clean.

Comment thread internal/alertlog/alertlog.go Outdated
return fmt.Errorf("create alerts log dir: %w", err)
}

file, err := os.OpenFile(path, os.O_APPEND|os.O_CREATE|os.O_WRONLY, 0o644)
Copy link

Copilot AI Apr 25, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

alerts.log is created with mode 0644, which makes the log world-readable on multi-user systems. Since this file can contain check names/groups and failure reasons, it’s safer to default to 0600 (and rely on umask only as a further restriction).

Suggested change
file, err := os.OpenFile(path, os.O_APPEND|os.O_CREATE|os.O_WRONLY, 0o644)
file, err := os.OpenFile(path, os.O_APPEND|os.O_CREATE|os.O_WRONLY, 0o600)

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed in commit b52a281 — file mode is now 0o600.

Comment thread cmd/verify/main.go
Comment on lines +205 to +224
scanner := bufio.NewScanner(strings.NewReader(output))
for scanner.Scan() {
line := strings.TrimSpace(scanner.Text())
if matches := packageCoverageLine.FindStringSubmatch(line); len(matches) == 3 {
coverage, err := strconv.ParseFloat(matches[2], 64)
if err != nil {
return nil, fmt.Errorf("parse coverage for %s: %w", matches[1], err)
}
covered = append(covered, packageCoverage{Name: matches[1], Coverage: coverage})
continue
}
if packageNoStatementsLine.MatchString(line) {
noStatements++
}
}

if expected > 0 && len(covered)+noStatements == 0 {
return nil, fmt.Errorf("parsed coverage for 0 of %d packages; `go test` output format may have changed", expected)
}
return covered, nil
Copy link

Copilot AI Apr 25, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

parsePackageCoverage doesn’t check scanner.Err() after the scan loop. If scanning fails (e.g., an unexpectedly long line exceeds the Scanner token limit), the verifier could silently treat it as “no matches” or miss packages. Add a scanner.Err() check and return a clear error so the coverage gate fails loudly on read/scan issues.

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed in commit fd1d22aparsePackageCoverage now checks scanner.Err() after the loop and surfaces a "scan go test output" error.

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: c81dd2d128

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +24 to +28
"%s [%s] %s (%s) - %s",
t.UTC().Format(time.RFC3339),
state,
strings.TrimSpace(checkName),
strings.TrimSpace(group),
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Escape parsed delimiters before writing alerts.log

FormatLine emits checkName and group verbatim into a line format that the TUI parser expects as ... ([group]) ... with checkName matched by ([^(]+) and group by ([^)]*). Because config validation allows arbitrary non-empty names/groups, a check name containing ( or a group containing ) will produce log lines that LoadRecentAlerts cannot parse, so transitions silently disappear from healthd status. Please escape/encode these fields (or widen the parser contract) before writing.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed in commit 2b13992 — same fix as Copilot's parallel finding. Validation pushed to the config boundary (parse-don't-validate) rather than escape-at-write, so the rule is enforced once at load with a clear error and the writer stays trivial.

Comment thread cmd/verify/main.go
Comment on lines +192 to +193
packageCoverageLine = regexp.MustCompile(`^ok\s+(\S+)\s+\S+\s+coverage:\s+([0-9.]+)% of statements$`)
packageNoStatementsLine = regexp.MustCompile(`^ok\s+(\S+)\s+\S+\s+coverage:\s+\[no statements\]$`)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Parse non-ok coverage lines in package coverage gate

The per-package coverage parser only recognizes lines that start with ok, so lines like github.com/... coverage: 0.0% of statements (the no-_test.go case already covered in main_test.go) are dropped from pkgCovs. That means packages with executable code but no tests can bypass --min-package-coverage, which undermines the new per-package guardrail. Parse those bare coverage lines (or explicitly fail on them) so uncovered packages are enforced consistently.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not actioning — the per-package gate is intentionally scoped to tested packages. The only bare-coverage line in this repo today is the root package main, which is a 4-line Cobra wrapper exercised transitively through cmd/... (currently 83% covered). Forcing the parser to enforce bare lines would force a contrived test for func main() (which can't be called without os.Exit) or a refactor into a run() int indirection — both worse than the current bypass for one wrapper package.

The real risk this would catch is "someone adds a new untested package with statements" — that's a code-review concern, not something the verifier should police.

Documented this design decision via the existing comment on parsePackageCoverage: "Packages with no _test.go files print a non-matching bare line with no "ok" prefix, so a partial match count is normal — only zero matches is a red flag."

Copilot review flagged that the log file was world-readable (0o644).
Alert lines contain check names, groups, and failure reasons —
operationally sensitive on multi-user systems. Tighten to owner-only
read/write; rely on umask only as a further restriction.
Copilot review flagged that bufio.Scanner errors (e.g., line exceeding
the default 64KiB token limit) would silently truncate the parsed
output, potentially hiding packages from the per-package gate. Surface
scanner errors so the verifier fails loudly on read/scan issues.
Copilot, Codex, and an internal silent-failures review all flagged the
same gap: alertlog.FormatLine writes check names and groups verbatim
into a line shape (`time [state] name (group) - reason`) that the TUI
parser regex `^(\S+) \[([^\]]+)\] ([^(]+) \(([^)]*)\) - (.*)$` cannot
recover if the name contains '(' or the group contains ')'. Transitions
would silently disappear from `healthd status`.

Rather than escape at write time (which still leaves operator-facing
log lines with surprise characters), parse-don't-validate at the
config boundary: reject '(', ')', '[', ']', and embedded newlines in
check names and groups. Surfaces a clear error at config load with
the offending field and character.

Tests: table-driven cases for each rejected character, plus a happy
path with `disk-root` and `host services` (space allowed).
@altaywtf altaywtf merged commit 4d7c067 into main Apr 26, 2026
2 checks passed
@altaywtf altaywtf deleted the chore/repo-improvements branch April 26, 2026 07:30
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants