| Command | Purpose | Key Options |
|---------|---------|-------------|
| `sn mint` | Generate standard names from DD paths or facility signals via the LLM pipeline | `--source {dd,signals}`, `--ids`, `--domain`, `--facility`, `--cost-limit`, `--dry-run`, `--force`, `--skip-review`, `--reset-to` |
| `sn publish` | Export validated StandardName nodes to YAML catalog files | `--output-dir`, `--ids`, `--domain`, `--group-by {ids,domain,confidence}`, `--confidence-min`, `--catalog-dir`, `--create-pr` |
| `sn import` | Import reviewed YAML catalog entries back into the graph | `--catalog-dir` (required), `--tags`, `--dry-run`, `--check` |
| `sn status` | Show standard name statistics from the graph | (none) |
| `sn reset` | Reset standard names for re-processing | `--status` (required), `--to`, `--source`, `--ids`, `--include-accepted`, `--dry-run` |
| `sn clear` | Delete standard names from the graph (relationship-first safety model) | `--status`, `--all`, `--source`, `--ids`, `--include-accepted`, `--dry-run` |
| `sn benchmark` | Benchmark LLM models on standard name generation quality | `--models`, `--source`, `--reviewer-model` |

### Benchmark

`sn benchmark` uses the same prompt pipeline as `sn mint`, including the system/user message split
built by `build_compose_context()`. The output table includes a **Cache %** column showing each
model's prompt-cache hit rate; the caching is provider-side via OpenRouter, not something we
implement. Scoring is **5-dimensional** (accuracy, completeness, physics_correctness,
naming_convention, and overall), evaluated by a reviewer LLM against a gold reference set
(`benchmark_reference.py`, 52 entries across 8 IDSs). The calibration dataset
(`benchmark_calibration.yaml`) provides known-quality examples for checking reviewer consistency.
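The per-model aggregation behind the results table can be pictured roughly as follows. This is a minimal sketch, not the `sn benchmark` implementation: the `ReviewResult` shape, the `summarize()` helper, and the definition of Cache % as cached prompt tokens divided by total prompt tokens are all assumptions made for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass
from statistics import mean

# The five scoring dimensions named in the Benchmark section.
DIMENSIONS = ("accuracy", "completeness", "physics_correctness", "naming_convention", "overall")


@dataclass
class ReviewResult:
    """One reviewer verdict for one generated name (hypothetical shape)."""

    model: str
    scores: dict[str, float]  # dimension -> reviewer score
    prompt_tokens: int  # prompt tokens sent for this request
    cached_tokens: int  # prompt tokens the provider reports as cache hits


def summarize(results: list[ReviewResult]) -> dict[str, dict[str, float]]:
    """Per-model mean score for each dimension, plus an assumed 'cache_pct'."""
    by_model: dict[str, list[ReviewResult]] = {}
    for r in results:
        by_model.setdefault(r.model, []).append(r)

    summary: dict[str, dict[str, float]] = {}
    for model, rows in by_model.items():
        row = {dim: mean(r.scores[dim] for r in rows) for dim in DIMENSIONS}
        prompt = sum(r.prompt_tokens for r in rows)
        cached = sum(r.cached_tokens for r in rows)
        # Assumed definition: share of prompt tokens served from the provider cache.
        row["cache_pct"] = 100.0 * cached / prompt if prompt else 0.0
        summary[model] = row
    return summary
```

Because caching is provider-side, the sketch only reads token counts reported back by the provider; nothing here controls whether a prompt is cached.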

### StandardName Lifecycle

```
drafted → published → accepted
                    ↘ rejected
```

- **drafted**: Generated by `sn mint` (LLM pipeline)
- **published**: Exported by `sn publish` to the YAML catalog for human review
- **accepted**: Imported by `sn import` from the reviewed catalog (catalog-authoritative)
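The forward lifecycle can be expressed as a small transition table. This is a minimal sketch: `ReviewStatus`, `TRANSITIONS`, and `advance()` are hypothetical names, the `rejected` branch is assumed to leave from `published` (during human review), and backward moves made by `sn reset --to` are deliberately not modeled.

```python
from __future__ import annotations

from enum import Enum


class ReviewStatus(str, Enum):
    DRAFTED = "drafted"
    PUBLISHED = "published"
    ACCEPTED = "accepted"
    REJECTED = "rejected"


# Forward edges of the lifecycle diagram. The rejected branch is ASSUMED to
# leave from "published" (human review of the exported catalog).
TRANSITIONS: dict[ReviewStatus, frozenset[ReviewStatus]] = {
    ReviewStatus.DRAFTED: frozenset({ReviewStatus.PUBLISHED}),  # sn publish
    # accepted via sn import; rejected assumed to come from review
    ReviewStatus.PUBLISHED: frozenset({ReviewStatus.ACCEPTED, ReviewStatus.REJECTED}),
    ReviewStatus.ACCEPTED: frozenset(),  # terminal; catalog-authoritative
    ReviewStatus.REJECTED: frozenset(),  # terminal in this sketch
}


def advance(current: ReviewStatus, target: ReviewStatus) -> ReviewStatus:
    """Return target if current -> target is a lifecycle edge, else raise ValueError."""
    if target not in TRANSITIONS[current]:
        raise ValueError(f"illegal transition: {current.value} -> {target.value}")
    return target
```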

### Reset and Clear Semantics

**`sn reset`**: Prepares existing nodes for re-processing without deleting them. It clears the
transient fields (embedding, model, confidence, generated_at) and removes the HAS_STANDARD_NAME and
CANONICAL_UNITS relationships. Passing `--to <status>` also changes `review_status`; without `--to`,
the status is left unchanged.
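The reset rules can be made concrete against a toy in-memory graph instead of the real database. In this sketch the `names`/`edges` structures, the per-node `ids` property, and the `reset()` signature are all hypothetical. It follows the behavior described for `sn reset`: transient fields are cleared, HAS_STANDARD_NAME and CANONICAL_UNITS edges are dropped, the nodes survive, and `review_status` changes only when `to` is given. Accepted names are skipped unless `include_accepted` is set, mirroring the `--include-accepted` guard.

```python
from __future__ import annotations

# Hypothetical in-memory model of the graph, for illustration only:
#   names: standard-name id -> properties (review_status, ids, transient fields...)
#   edges: set of (start, relationship_type, end) tuples
TRANSIENT_FIELDS = ("embedding", "model", "confidence", "generated_at")
RESET_RELATIONSHIPS = {"HAS_STANDARD_NAME", "CANONICAL_UNITS"}


def reset(
    names: dict[str, dict],
    edges: set[tuple[str, str, str]],
    status: str,
    to: str | None = None,
    ids: str | None = None,
    include_accepted: bool = False,
) -> list[str]:
    """Reset names whose review_status == status; return the reset ids, sorted."""
    selected = sorted(
        name
        for name, props in names.items()
        if props.get("review_status") == status
        and (ids is None or props.get("ids") == ids)
        # Safety guard: accepted names are skipped unless explicitly included.
        and (include_accepted or props.get("review_status") != "accepted")
    )
    for name in selected:
        props = names[name]
        for key in TRANSIENT_FIELDS:
            props.pop(key, None)  # clear transient fields
        if to is not None:
            props["review_status"] = to  # optional --to; otherwise status is kept
    chosen = set(selected)
    # Drop HAS_STANDARD_NAME / CANONICAL_UNITS edges touching a reset node;
    # the nodes themselves are kept.
    edges.difference_update(
        {e for e in edges if e[1] in RESET_RELATIONSHIPS and (e[0] in chosen or e[2] in chosen)}
    )
    return selected
```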

**`sn clear`**: Deletes StandardName nodes using a relationship-first safety model. HAS_STANDARD_NAME
edges are removed before any node is deleted, and a scoped delete (for example with `--ids` or
`--source`) removes only the nodes it leaves orphaned, i.e. with no remaining HAS_STANDARD_NAME
edge. Requires either `--status <value>` or `--all`.
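The relationship-first model can be sketched on a hypothetical in-memory graph. Two assumptions are worth flagging: HAS_STANDARD_NAME edges are taken to run from a DD path to the name, and an IDS scope is approximated by matching the DD path prefix `<ids>/`. The `clear()` function and its `all_` parameter (named to avoid shadowing the `all` builtin) are illustrative only.

```python
from __future__ import annotations


def clear(
    names: dict[str, dict],
    edges: set[tuple[str, str, str]],
    status: str | None = None,
    all_: bool = False,
    ids: str | None = None,
    include_accepted: bool = False,
) -> list[str]:
    """Delete StandardName nodes relationship-first; return deleted ids, sorted."""
    if status is None and not all_:
        raise ValueError("clear requires either status or all_=True")

    targets = {
        name
        for name, props in names.items()
        if (status is None or props.get("review_status") == status)
        # Safety guard: accepted names are skipped unless explicitly included.
        and (include_accepted or props.get("review_status") != "accepted")
    }

    # Step 1: remove HAS_STANDARD_NAME edges pointing at targets. When scoped
    # to an IDS, only edges from DD paths under that IDS are removed.
    def in_scope(edge: tuple[str, str, str]) -> bool:
        start, rel, end = edge
        if rel != "HAS_STANDARD_NAME" or end not in targets:
            return False
        return ids is None or start.startswith(ids + "/")

    edges.difference_update({e for e in edges if in_scope(e)})

    # Step 2: delete only targets left orphaned (no remaining HAS_STANDARD_NAME edge).
    still_linked = {end for _, rel, end in edges if rel == "HAS_STANDARD_NAME"}
    deleted = sorted(targets - still_linked)
    for name in deleted:
        del names[name]
    # Detach any other edges (e.g. CANONICAL_UNITS) from the deleted nodes.
    gone = set(deleted)
    edges.difference_update({e for e in edges if e[0] in gone or e[2] in gone})
    return deleted
```

In this sketch, clearing drafted names scoped to `equilibrium` keeps a name that is still linked from a `summary` path, which is exactly what "only remove orphaned nodes" protects against.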

**Safety guard:** Both commands require `--include-accepted` to touch names with `review_status=accepted`.
Accepted names are catalog-authoritative and should rarely be deleted from the graph.

**`sn mint --reset-to`**: Runs `sn reset` before minting, scoped to the same `--ids`/`--source`
filter. Accepts `extracted` or `drafted` as the target status. Useful for a clean re-run on a
specific IDS without touching the rest of the graph.
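The ordering implied by `--reset-to` can be sketched as a small wrapper. `mint_with_reset()` and `ALLOWED_RESET_TARGETS` are hypothetical names, and the real reset and mint steps are passed in as callables because their signatures are not documented here; which statuses the implicit reset selects is likewise left to the injected `reset`.

```python
from __future__ import annotations

from collections.abc import Callable

# Target statuses documented for --reset-to.
ALLOWED_RESET_TARGETS = {"extracted", "drafted"}


def mint_with_reset(
    mint: Callable[..., list[str]],
    reset: Callable[..., list[str]],
    *,
    reset_to: str | None = None,
    ids: str | None = None,
    source: str | None = None,
) -> list[str]:
    """Optionally reset, then mint, using one shared ids/source scope."""
    if reset_to is not None:
        if reset_to not in ALLOWED_RESET_TARGETS:
            raise ValueError(f"--reset-to must be one of {sorted(ALLOWED_RESET_TARGETS)}")
        # Reset runs first and with exactly the same scope as the mint run.
        reset(to=reset_to, ids=ids, source=source)
    return mint(ids=ids, source=source)
```

Sharing one scope between both steps is what keeps the rest of the graph untouched: nothing outside the `ids`/`source` filter is reset.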

### Write Semantics

Two distinct write paths with different semantics: