llms.txt: publish KNOWN_AGENT_HOSTS as a two-way attribution contract #23

Merged: pierorocca merged 1 commit into main from feat/llms-txt-publish-attribution-map on Apr 18, 2026

@pierorocca
Collaborator

Summary

Publish the full hostname → brand name map (WC_AI_Syndication_UCP_Agent_Header::KNOWN_AGENT_HOSTS) as a Markdown table in llms.txt, under a new ### Attribution name mapping subsection inside the existing ## Attribution section.

This converts attribution from "merchant-side opaque policy" into "declared two-way contract."

Why

Before this, the canonicalization introduced in 1.6.7 was one-sided: the merchant saw Source: Gemini instead of Source: Gemini.google.com, but the agent had no way to know we were doing this or what brand name they'd be attributed under. They'd learn it only by completing a test checkout and inspecting utm_source on the resulting order.

After this:

  • Agents see, up front in the discovery document they already fetch, exactly what brand name will be applied to their orders.
  • Unmapped vendors have an explicit canonicalization path (GitHub issues link). An agent at mistral.ai can see their hostname isn't mapped → passthrough displays Source: mistral.ai → they can open an issue to request Source: Mistral.
  • Both fallback behaviors are documented: unknown-hostname passthrough and ucp_unknown for missing/malformed headers. Novel vendors don't have to guess.
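The three outcomes above (mapped brand, verbatim passthrough, `ucp_unknown`) can be sketched as a single lookup. This is only an illustration, not the plugin's canonicalizer: `SKETCH_AGENT_HOSTS` and `sketch_attribution_name()` are hypothetical names, the function assumes the profile URL has already been extracted from the `UCP-Agent` header, and lowercasing the hostname before lookup is an assumption.

```php
<?php
// Hypothetical subset of the hostname => brand map (not the real constant).
const SKETCH_AGENT_HOSTS = [
    'gemini.google.com' => 'Gemini',
    'claude.ai'         => 'Claude',
];

/**
 * Resolve an attribution name from a profile URL.
 * Assumes the URL was already pulled out of the UCP-Agent header.
 */
function sketch_attribution_name( ?string $profile_url ): string {
    // Missing header: fall back to the sentinel.
    if ( null === $profile_url || '' === trim( $profile_url ) ) {
        return 'ucp_unknown';
    }

    // Malformed value (no parseable host): same sentinel.
    $host = parse_url( trim( $profile_url ), PHP_URL_HOST );
    if ( ! is_string( $host ) || '' === $host ) {
        return 'ucp_unknown';
    }

    // Assumption: lookups are case-insensitive on the hostname.
    $host = strtolower( $host );

    // Mapped hostname => brand name; unmapped hostname passes through verbatim.
    return SKETCH_AGENT_HOSTS[ $host ] ?? $host;
}
```

Under these assumptions, a profile URL on `gemini.google.com` resolves to `Gemini`, one on `mistral.ai` passes through as `mistral.ai`, and a missing or unparseable value yields `ucp_unknown`.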

How

The table is rendered at generation time from the PHP constant:

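// Invert the hostname => brand map so each brand collects all of its hostnames.
$grouped = [];
foreach ( WC_AI_Syndication_UCP_Agent_Header::KNOWN_AGENT_HOSTS as $host => $brand ) {
    $grouped[ $brand ][] = $host;
}
// Sort rows alphabetically by brand name.
ksort( $grouped );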

Single source of truth → the published contract and the runtime canonicalizer cannot drift. Adding a vendor to the map automatically updates llms.txt on the next cache-miss regeneration.
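For readers who want to see the grouping carried through to the table rows, here is a minimal self-contained sketch. `EXAMPLE_AGENT_HOSTS` and `render_attribution_table()` are hypothetical stand-ins (the real generator lives in the llms.txt class and is not reproduced here), and the map contains only a few of the entries shown in the sample output.

```php
<?php
// Hypothetical stand-in for WC_AI_Syndication_UCP_Agent_Header::KNOWN_AGENT_HOSTS.
const EXAMPLE_AGENT_HOSTS = [
    'chatgpt.com'       => 'ChatGPT',
    'openai.com'        => 'ChatGPT',
    'claude.ai'         => 'Claude',
    'anthropic.com'     => 'Claude',
    'gemini.google.com' => 'Gemini',
];

/**
 * Render a hostname => brand map as a Markdown table, one row per brand.
 */
function render_attribution_table( array $hosts ): string {
    // Group hostnames under their brand, then sort rows by brand name.
    $grouped = [];
    foreach ( $hosts as $host => $brand ) {
        $grouped[ $brand ][] = $host;
    }
    ksort( $grouped );

    $lines = [
        '| Attribution name | Profile hostnames |',
        '|------------------|-------------------|',
    ];
    foreach ( $grouped as $brand => $brand_hosts ) {
        // Wrap each hostname in backticks and join aliases with ", ".
        $cells = array_map(
            static function ( string $host ): string {
                return '`' . $host . '`';
            },
            $brand_hosts
        );
        $lines[] = '| ' . $brand . ' | ' . implode( ', ', $cells ) . ' |';
    }

    return implode( "\n", $lines );
}

echo render_attribution_table( EXAMPLE_AGENT_HOSTS ), "\n";
```

Because the rows are derived from the map on every call, adding a hostname to the map is the only change needed for it to appear in the rendered table.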

Sample output

### Attribution name mapping

When your `UCP-Agent` profile URL hostname matches one of the entries below,
orders are attributed under the brand name shown... Unknown hostnames pass
through verbatim... Missing or malformed `UCP-Agent` headers are attributed
as `ucp_unknown`.

| Attribution name | Profile hostnames |
|------------------|-------------------|
| ChatGPT | `chatgpt.com`, `openai.com` |
| Claude | `claude.ai`, `anthropic.com` |
| Copilot | `copilot.microsoft.com`, `bing.com` |
| Gemini | `gemini.google.com`, `deepmind.google` |
...

Also: stale copy fix

Line 333 of class-wc-ai-syndication-llms-txt.php said "the server uses the hostname from that header as utm_source automatically" — correct pre-1.6.7 but stale after. Updated to describe the canonicalization step and point at the mapping table.

Test plan

  • 5 new regression tests in LlmsTxtTest.php (32 tests total, 101 assertions)
    • Section header + table Markdown structure
    • Every KNOWN_AGENT_HOSTS entry appears (auto-synchronized via foreach)
    • Aliased hostnames group on a single row with expected delimiter
    • Both fallback clauses documented
    • GitHub issues link present
  • Full PHPUnit: 379 tests / 1075 assertions
  • PHPCS clean
  • PHPStan clean
  • .pot regenerated
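The "every entry appears" check can be sketched without PHPUnit as a plain helper that loops over the map itself, which is why adding a vendor needs no test update. `find_missing_attribution_entries()` and the fixture data are hypothetical, and the backtick and pipe matching assumes the table layout shown in the sample output; the actual assertions in LlmsTxtTest.php may differ.

```php
<?php
/**
 * Return every hostname or brand from $hosts that is absent from $llms_txt.
 * An empty result means the published table covers the whole map.
 */
function find_missing_attribution_entries( array $hosts, string $llms_txt ): array {
    $missing = [];
    foreach ( $hosts as $host => $brand ) {
        // Hostnames are expected as backtick-wrapped cells.
        if ( false === strpos( $llms_txt, '`' . $host . '`' ) ) {
            $missing[] = $host;
        }
        // Brand names are expected as the first column of a row.
        if ( false === strpos( $llms_txt, '| ' . $brand . ' |' ) ) {
            $missing[] = $brand;
        }
    }
    // De-duplicate brands that were reported once per alias.
    return array_values( array_unique( $missing ) );
}

// Hypothetical fixture: the rendered table omits the openai.com alias.
$hosts   = [ 'chatgpt.com' => 'ChatGPT', 'openai.com' => 'ChatGPT', 'claude.ai' => 'Claude' ];
$doc     = "| ChatGPT | `chatgpt.com` |\n| Claude | `claude.ai` |";
$missing = find_missing_attribution_entries( $hosts, $doc );
```

In a PHPUnit test the same loop would typically end in an assertion that the returned list is empty.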

Ships as

Merged to main, not tagged. Queues with PR #21 (tooltip) and PR #22 (rate-limit move) behind the next release signal.

🤖 Generated with Claude Code

Add a ### Attribution name mapping subsection to the llms.txt
output that renders the full hostname → brand name map as a
Markdown table. Converts attribution from "merchant-side opaque
policy" into "declared two-way contract":

  - Agents building UCP integrations see, up front, what brand
    name their orders will be attributed under in the merchant's
    Orders list. No need to complete a test checkout and inspect
    utm_source to find out.

  - Unmapped vendors have an explicit path to request
    canonicalization (GitHub issues link). An agent at
    mistral.ai can see their hostname isn't mapped, see that
    the alternative would be "Source: mistral.ai" passthrough,
    and decide whether to open an issue to get "Source: Mistral"
    instead.

  - Single source of truth: the table is rendered at generation
    time from WC_AI_Syndication_UCP_Agent_Header::KNOWN_AGENT_HOSTS,
    so the published contract and the runtime canonicalizer
    cannot drift. Adding a vendor is still a single constant edit.

Also fixes stale copy introduced pre-1.6.7 that described the
server as "using the hostname from that header as utm_source
automatically" — it now correctly describes the canonicalization
step and points at the newly-published mapping table.

Grouping choice: one row per brand with hostnames comma-separated
(ChatGPT | chatgpt.com, openai.com) rather than one row per
hostname. More scannable for the ~14-entry map and makes multi-
hostname brands obvious at a glance.

Contract completeness: the section also documents the two
fallback paths — unknown hostname passes through verbatim,
missing/malformed UCP-Agent header yields the `ucp_unknown`
sentinel. Without these, novel vendors would have to guess.

Regression tests:
  - Section header + Markdown table structure present
  - Every KNOWN_AGENT_HOSTS key AND value appears in output
    (auto-synchronized via foreach over the constant, so adding
    a vendor to the map doesn't require a test update)
  - Aliased hostnames (chatgpt.com + openai.com → ChatGPT) group
    on a single row with the expected delimiter
  - Both fallback clauses documented (passthrough + ucp_unknown)
  - GitHub issues link present for vendor-addition requests

Quality gates: 379 PHPUnit tests / 1075 assertions (+5 from prior),
PHPCS clean, PHPStan clean, .pot regenerated.

Ships queued on main, no tag.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@pierorocca merged commit 56ee854 into main on Apr 18, 2026
7 checks passed
@pierorocca deleted the feat/llms-txt-publish-attribution-map branch on April 18, 2026, 17:51