…and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

## [Unreleased]

_Nothing yet._

## [0.9.0] - 2026-05-06

### Breaking

- **Phase 13.5 production LLM runtime cutover (port 8101 → 8102)** — `mlx_lm.server` on port 8101 has been replaced by `llama-server` (llama.cpp b9000+, OpenAI-compatible API) on port 8102, serving `mdemg-llm-v1.Q5_K_M.gguf`. After upgrading, installs with `LLM_ENDPOINT=http://127.0.0.1:8101/v1` (or `http://host.docker.internal:8101/v1` in a containerized `.env`) will fail mdemg's startup preflight. **Migration**: edit `.env` and change port `8101` to `8102` everywhere it appears. The `com.mdemg.llama-server.plist` LaunchAgent is auto-installed by the formula's `post_install` hook (`mdemg upgrade --docker-only`). **Rollback**: restore `~/Library/LaunchAgents/com.mdemg.mlx-server.plist.disabled-phase13_5`, set `LLM_ENDPOINT=http://127.0.0.1:8101/v1`, boot out llama-server, and bootstrap mlx-server; note that this reintroduces the ~14-min Metal-OOM crash cycle that motivated the cutover. Full rationale in CLAUDE.md under "Local LLM Runtime — llama.cpp llama-server (Phase 13.5 cutover)".
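  A minimal migration sketch, assuming a flat `.env` in the working directory (the pre-cutover value written below is illustrative, not your real config):

  ```shell
  # Hedged sketch: swap the old mlx_lm.server port (8101) for the llama-server
  # port (8102) in .env, keeping a rollback copy. Paths are assumptions.
  ENV_FILE=".env"
  printf 'LLM_ENDPOINT=http://127.0.0.1:8101/v1\n' > "$ENV_FILE"   # example legacy entry
  cp "$ENV_FILE" "$ENV_FILE.bak-phase13_5"                         # rollback copy
  sed 's|:8101/v1|:8102/v1|g' "$ENV_FILE.bak-phase13_5" > "$ENV_FILE"
  cat "$ENV_FILE"   # LLM_ENDPOINT now points at llama-server on 8102
  ```

  The backup file doubles as the rollback value if you need to boot mlx-server back up.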
- **`MLX_*` env vars deprecated in favor of `LLM_*`** — Phase 13.6 renamed the watchdog/preflight env-var family. Legacy `MLX_*` names continue to work but emit `WARN config: env var deprecated, please rename` at boot; the aliases may be removed one release cycle or more after this commit. Migration: rename `MLX_WATCHDOG_ENABLED` → `LLM_WATCHDOG_ENABLED`, `MLX_PROBE_INTERVAL_SEC` → `LLM_PROBE_INTERVAL_SEC`, `MLX_PROBE_TIMEOUT_SEC` → `LLM_PROBE_TIMEOUT_SEC`, `MLX_FAIL_FAST_ENABLED` → `LLM_FAIL_FAST_ENABLED`, and `MDEMG_ALLOW_NO_MLX` → `MDEMG_ALLOW_NO_LLM`. The internal Go package (`internal/mlxprobe/`) and the Prometheus metric prefix (`mdemg_mlx_*`) keep their names: the former is operator-invisible, the latter is dashboard-coupled.
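  The renames above can be applied in bulk; a hedged sketch, assuming a flat `.env` (the file contents written below are illustrative, and only the `sed` rules reflect the renames in this entry):

  ```shell
  # Hedged sketch: bulk-rename deprecated MLX_* vars to LLM_*, with the one
  # special case MDEMG_ALLOW_NO_MLX -> MDEMG_ALLOW_NO_LLM handled separately.
  ENV_FILE=".env"
  printf 'MLX_WATCHDOG_ENABLED=true\nMLX_PROBE_INTERVAL_SEC=30\nMDEMG_ALLOW_NO_MLX=false\n' > "$ENV_FILE"
  sed -e 's/^MLX_/LLM_/' \
      -e 's/^MDEMG_ALLOW_NO_MLX=/MDEMG_ALLOW_NO_LLM=/' \
      "$ENV_FILE" > "$ENV_FILE.new" && mv "$ENV_FILE.new" "$ENV_FILE"
  cat "$ENV_FILE"
  ```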

### Added

- **POST-FT-LORA-PHASE10.5: UBENCH framework promotion** (2026-05-06, commit `0389b49`) — Promotes `neural.benchmarks.run_benchmark` to a UxTS-pattern framework. New `docs/tests/ubench/{schema,specs,runners,contracts}/` tree with lint / contract / run modes; `mdemg.ubench.json` is pinned at config sha `76c97eb8` + holdout sha `b2004783` (17 specs / 108 rows / `min_rows_per_task=3`). New Makefile targets: `make test-ubench{,-lint,-contract,-run}`. Pytest contract entry point: `pytest docs/tests/ubench/contracts/`. Closes #215. Phase 10.5 sub-part 3 (the Phase 5 SFT re-pass on `guardrail.evaluate`, ~4-8h on MLX) remains operator-deferred (#216); the UBENCH contract is the framework-level mitigation while that stays open. Feature doc: `docs/features/ubench-framework.md`.
- **Claude Code GitHub App workflows** (2026-05-06, PRs #378 + #379) — `.github/workflows/claude.yml` handles `@claude` mentions in issues and PR comments; `.github/workflows/claude-code-review.yml` auto-reviews PRs on `opened`/`synchronize`/`ready_for_review`/`reopened`. Powered by `anthropics/claude-code-action@v1`, the `CLAUDE_CODE_OAUTH_TOKEN` repo secret, and Claude Code GitHub App installation on `reh3376/mdemg`.

### Changed

- **POST-FT-LORA-PHASE14.2.3: Per-category context-column weight (default-on)** (2026-05-06) — **120q full A/B PASSED merge gate** (mean +0.009, std -0.023, 11 improvements, **0 regressions**).
  - The Phase 14.2.2 forensic identified 3 categories where the column consistently regressed: `service_relationships` -0.043, `business_logic_constraints` -0.023, `relationship` -0.017. This sprint zero-weights the column for those 3 categories while keeping the default `RETRIEVAL_CONTEXT_COLUMN_WEIGHT=0.10` elsewhere, mirroring Phase 14.1's per-category sparse-gate dispatch.
  - New env knob `RETRIEVAL_CONTEXT_COLUMN_CATEGORY_WEIGHTS` (JSON map) ships with the 3-category zero-weight seed as its default; an operator-supplied JSON value REPLACES the seed rather than merging with it.
  - **Defaults flipped**: `CONTEXT_FINGERPRINT_ENABLED` `false→true`, `RETRIEVAL_CONTEXT_COLUMN_ENABLED` `false→true`. Operator opt-out: `RETRIEVAL_CONTEXT_COLUMN_ENABLED=false`.
  - Per-category breakdown post-flip: `architecture_structure` +0.030, `computed_value` +0.070, `data_flow_integration` +0.010, `business_logic_constraints` +0.005 (was -0.023, now neutral), `service_relationships` -0.010 (was -0.043). std dropped 0.072→0.049 (much tighter retrieval scores); min jumped 0.000→0.350 (zero-score rescues stick).
  - Post: [`phase_14_2_3_post.md`](docs/development/post-ft-lora/phase_14_2_3_post.md). OpenAI spend: ~$10 (one B_full; A_full reused from 14.2.2).
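  The knob's default seed, written out as an explicit operator setting; a hedged sketch (the `python3` call is only a local JSON validity check, not part of mdemg):

  ```shell
  # Hedged sketch: the 3-category zero-weight default seed, expressed as the
  # JSON map RETRIEVAL_CONTEXT_COLUMN_CATEGORY_WEIGHTS accepts. An operator
  # value REPLACES this seed (no merge), so re-list every override you want.
  export RETRIEVAL_CONTEXT_COLUMN_WEIGHT=0.10
  export RETRIEVAL_CONTEXT_COLUMN_CATEGORY_WEIGHTS='{"service_relationships":0.0,"business_logic_constraints":0.0,"relationship":0.0}'
  echo "$RETRIEVAL_CONTEXT_COLUMN_CATEGORY_WEIGHTS" | python3 -m json.tool   # sanity-check the JSON parses
  ```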