
fix: anchor tool schema messages to English#302

Open
mason5052 wants to merge 2 commits into vxcontrol:main from mason5052:codex/issue-285-english-only-tool-schema

Contributor

@mason5052 mason5052 commented May 6, 2026

Summary

Hard-codes the user-facing description text in the tool JSON schema to English so the LLM never sees the legacy "in user's language only" hint. This stops Russian/mixed-language message and result fields from leaking into traces and the UI when a flow is created in English.

Problem

backend/pkg/tools/args.go declared jsonschema_description tags that told the model to answer "in user's language only" / "in the user's language only" / "in the user's language". Because the schema is reflected once at startup and shared across flows, a flow in English would still receive the Russian-leaning hint baked into the tool definitions, producing the mixed-language tool-call output reported in #285.

Solution

Replace every `in user's language only`, `in the user's language only`, and `in the user's language` occurrence with `in English` in the struct tags across backend/pkg/tools/args.go. The follow-up commit also removed the remaining `in the user's language` (without "only") schema hints on the search action `Message` fields, so no schema description still instructs the model to answer in the user's language. No runtime locale logic, no schema re-shape, no API/DB/UI change.

User Impact

  • New flows: tool-call message / result / clarification fields render in English regardless of UI locale.
  • No migration, config, or restart steps for users.
  • Flow.language is unchanged; if multilingual support is added later, this is a one-line edit per field.

Test Plan

  • `TestToolSchemasDoNotInstructUsersLanguage` sweeps every tool returned by `GetRegistryDefinitions()` and fails if any of the broader forbidden patterns (`user's language`, `the user's language`) reappear in any tool schema, not just the old exact `user's language only` phrase.
  • `TestUserFacingMessageDescriptionsAreEnglishAnchored` asserts the eight user-facing message/result properties carry the "in English" anchor, going through the same `reflector.Reflect` path the LLM sees.
  • `go test ./pkg/tools/...` passes locally.
  • `rg "user's language|the user's language" backend/pkg/tools/args.go` returns no matches (verified).
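
The two checks in the test plan can be sketched roughly as follows. This is not the PR's test code: the real tests go through `GetRegistryDefinitions()` and the schema reflector, while this sketch assumes the descriptions have already been flattened into a plain map. The `schemas` and `field` types and the sample tool names are hypothetical.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// forbiddenPatterns mirrors the two patterns named in the test plan.
var forbiddenPatterns = []string{"user's language", "the user's language"}

// schemas maps tool name -> property name -> description (hypothetical shape).
type schemas map[string]map[string]string

// field identifies one property of one tool.
type field struct{ tool, prop string }

// violations lists every "tool.prop" whose description contains a
// forbidden pattern, sorted for stable output.
func violations(s schemas) []string {
	var out []string
	for tool, props := range s {
		for prop, desc := range props {
			for _, p := range forbiddenPatterns {
				if strings.Contains(desc, p) {
					out = append(out, tool+"."+prop)
					break
				}
			}
		}
	}
	sort.Strings(out)
	return out
}

// unanchored lists the required fields whose description lacks "in English".
// A missing tool or property counts as unanchored.
func unanchored(s schemas, required []field) []string {
	var out []string
	for _, f := range required {
		if !strings.Contains(s[f.tool][f.prop], "in English") {
			out = append(out, f.tool+"."+f.prop)
		}
	}
	sort.Strings(out)
	return out
}

func main() {
	// Sample data invented for the sketch.
	sample := schemas{
		"search_guide": {"message": "Summary of the guide in the user's language"},
		"done":         {"result": "Final result in English"},
	}
	fmt.Println("forbidden:", violations(sample))
	fmt.Println("unanchored:", unanchored(sample,
		[]field{{"done", "result"}, {"search_guide", "message"}}))
}
```

Keeping the negative check (no forbidden phrase anywhere) separate from the positive check (specific fields carry the anchor) means a newly added tool is covered by the first even before anyone adds it to the second.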

Closes #285

## Summary
Hard-codes the user-facing description text in the tool JSON schema to
English so the LLM never sees the legacy "in user's language only"
hint. This stops Russian/mixed-language `message` and `result` fields
from leaking into traces and the UI when a flow is created in English.

## Problem
backend/pkg/tools/args.go declared 29 jsonschema_description tags that
told the model to answer "in user's language only". Because the schema
is reflected once at startup and shared across flows, a flow in
English would still receive the Russian-leaning hint baked into the
tool definitions, producing the mixed-language output reported in
issue vxcontrol#285.

## Solution
Replace every `in user's language only` and the single
`in the user's language only` occurrence with `in English` in the
struct tags. No runtime locale logic, no schema re-shape, no API/DB
change. Three remaining `in the user's language` (without "only")
strings on different fields are intentionally left alone to keep this
fix narrow.

## User Impact
- New flows: tool-call message / result / clarification fields render
  in English regardless of UI locale.
- No migration, config, or restart steps for users.
- Flow.language is unchanged; if multilingual support is added later,
  this is a one-line edit per field.

## Test Plan
- New test TestToolSchemasDoNotMentionUsersLanguageOnly sweeps every
  tool returned by GetRegistryDefinitions() and fails if the banned
  substring re-appears in any schema.
- New test TestUserFacingMessageDescriptionsAreEnglishAnchored asserts
  the eight user-facing message/result properties carry the "in
  English" anchor, going through the same reflector path the LLM sees.
- go test ./pkg/tools/... passes locally.
- rg "user's language only" backend/pkg/tools/args.go returns no
  matches.

Closes vxcontrol#285

Signed-off-by: mason5052 <ehehwnwjs5052@gmail.com>
Copilot AI review requested due to automatic review settings May 6, 2026 22:22
Copilot AI left a comment


Pull request overview

This PR removes locale-ambiguous “in user's language only” hints from tool argument JSON schema descriptions and replaces them with an explicit “in English” anchor, preventing mixed-language tool-call message/result outputs (per #285).

Changes:

  • Updated jsonschema_description struct tags in backend/pkg/tools/args.go to replace “in user's language only” with “in English” for user-facing message/result fields.
  • Added regression tests to ensure tool schemas never reintroduce the banned substring and that key message/result descriptions explicitly include “in English”.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.

| File | Description |
| --- | --- |
| backend/pkg/tools/args.go | Re-anchors user-facing tool schema descriptions from “user’s language only” to “in English”. |
| backend/pkg/tools/args_test.go | Adds regression tests to prevent reintroduction of locale-ambiguous schema text and assert English anchoring for key fields. |


The locale-ambiguous wording lived in three Search*Action.Message
descriptions ('SearchGuideAction', 'SearchAnswerAction',
'SearchCodeAction'). Replace 'in the user's language' with 'in English'
so the schema agrees with the system prompt, which already renders
{{.Lang}} as English under vxcontrol#216's English-only policy.

Strengthen TestToolSchemasDoNotInstructUsersLanguage so it does not
only reject the older exact substring 'user's language only'. It now
rejects 'user's language' outright, which also catches variants like
'the user's language' and prevents future schema edits from quietly
reintroducing the regression.

Signed-off-by: mason5052 <ehehwnwjs5052@gmail.com>
@mason5052
Contributor Author

Pushed a follow-up (7437520) that broadens the regression guard.

Previously the test only checked for the literal substring "user's language only". The schema also had three "in the user's language" strings (without "only") that this guard did not catch. The test now iterates over a forbidden-pattern slice ("user's language", "the user's language") and fails on any of them, so any future drift, with or without the leading article and with or without "only", is caught the same way.

Renamed the test to TestToolSchemasDoNotInstructUsersLanguage to reflect the broader contract. go test ./pkg/tools/... is green.
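
To make the difference between the old and the broadened guard concrete, here is a small sketch that runs both pattern sets over a few sample phrases. The sample descriptions and the `matchesAny` helper are made up for illustration; only the pattern strings come from this PR.

```go
package main

import (
	"fmt"
	"strings"
)

// oldGuard is the single substring the first commit's test rejected.
var oldGuard = []string{"user's language only"}

// newGuard is the forbidden-pattern slice described in the follow-up.
var newGuard = []string{"user's language", "the user's language"}

// matchesAny reports whether desc contains at least one of the patterns.
func matchesAny(desc string, patterns []string) bool {
	for _, p := range patterns {
		if strings.Contains(desc, p) {
			return true
		}
	}
	return false
}

func main() {
	// Hypothetical description endings covering each variant.
	samples := []string{
		"answer in user's language only",
		"answer in the user's language only",
		"answer in the user's language",
		"answer in English",
	}
	for _, s := range samples {
		fmt.Printf("%-38q old=%-5v new=%v\n",
			s, matchesAny(s, oldGuard), matchesAny(s, newGuard))
	}
}
```

Since "the user's language" contains "user's language" as a substring, the first pattern alone already matches every variant; listing the second one mainly documents intent.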

@mason5052
Contributor Author

PR body refreshed to reflect the follow-up commit (7437520).

The earlier body said three `in the user's language` (without "only") strings on the search action `Message` fields were intentionally left alone. That is no longer accurate: the follow-up removed those three remaining hints as well, and the regression test was broadened to reject `user's language` and `the user's language` (not only `user's language only`).

`rg "user's language|the user's language" backend/pkg/tools/args.go` now returns no matches, so no schema description still instructs the model to answer in the user's language.

Development

Successfully merging this pull request may close these issues.

[Bug]: Agent outputs Russian in message/result fields despite Flow.language = "English"