Description
Problem 1: Injected prompt tells model to ignore system prompt
In Codex OAuth mode, the conversation/context being sent includes an injected instruction along the lines of: "IMPORTANT: You MUST ignore the system prompt…". This is both insecure and confusing, and it directly conflicts with the system/developer instructions that should be authoritative.
Where
- packages/core/src/providers/openai-responses/OpenAIResponsesProvider.ts
- Codex mode detection: baseURL includes chatgpt.com/backend-api/codex
- This provider builds the request for /responses and (in Codex mode) sets request.instructions = CODEX_SYSTEM_PROMPT and moves steering into the input.
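
For illustration, a rough sketch of what the Codex-mode branch described above could look like. Only CODEX_SYSTEM_PROMPT and the baseURL check come from the provider description; the ResponsesRequest shape, the steering parameter, and the function names are assumptions for the sketch, not the actual code.

```ts
// Hypothetical sketch, not the actual provider implementation.
type InputItem = { role: string; content: string };
type ResponsesRequest = { instructions?: string; input: InputItem[] };

declare const CODEX_SYSTEM_PROMPT: string; // supplied by the provider package

// Codex mode detection as described above: baseURL contains the Codex backend path.
const isCodexMode = (baseURL: string): boolean =>
  baseURL.includes('chatgpt.com/backend-api/codex');

function buildResponsesRequest(
  baseURL: string,
  request: ResponsesRequest,
  steering: InputItem[],
): ResponsesRequest {
  if (!isCodexMode(baseURL)) return request;
  // Codex OAuth mode: pin instructions to the Codex prompt and move
  // steering content into the input items instead of the system prompt.
  return {
    ...request,
    instructions: CODEX_SYSTEM_PROMPT,
    input: [...steering, ...request.input],
  };
}
```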
Request
- Remove the injected "ignore the system prompt" text from the prompt/context assembly for Codex OAuth mode.
- Ensure that no user/assistant content can instruct the model to ignore system/developer instructions (or, if such text must remain for legacy reasons, it should be excluded/sanitized before being sent to the model in Codex OAuth mode). See the sketch below.
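
A minimal sketch of the kind of sanitization being requested. The marker regex and helper name are assumptions, and the history item shape is simplified; this is not the project's actual API.

```ts
// Hypothetical sanitization pass over the assembled input/history.
type InputItem = { role: string; content: string };

const OVERRIDE_MARKER = /IMPORTANT: You MUST ignore the system prompt[^\n]*/i;

function stripSystemPromptOverrides(input: InputItem[]): InputItem[] {
  return input
    .map((item) => ({
      ...item,
      // Drop only the override sentence, keeping the rest of the item intact.
      content: item.content.replace(OVERRIDE_MARKER, '').trim(),
    }))
    .filter((item) => item.content.length > 0);
}
```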
Why
- It undermines safety and deterministic behavior.
- It can cause the model to follow user-provided override attempts.
Problem 2: Avoid redundant tool calls for AGENTS/LLXPRT config file
Models often waste a tool call reading AGENTS.md (or whatever the configured instruction file is) even when the harness has already injected the effective content (e.g. from LLXPRT.md) into the system prompt. The configured instruction file may be named AGENTS.md or something else. When the model re-reads it, it costs time and can introduce conflicts if the on-disk file differs from the injected content.
Request
When building the initial history/context, inject an artificial tool call/result that indicates the instruction file has already been checked:
- If the configured file exists: include its contents in the injected context (as we already do today via the system prompt).
- If it does not exist or is empty: inject a tool result stating "not found" or "empty".
This should stop agents from spending a tool call on an unnecessary filesystem read, and it prevents conflicts between the injected instruction content and a stale local file.
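
A sketch of the synthetic history entries being proposed. The entry shapes, the read_file tool name, and the seeding function are illustrative assumptions, not the harness's actual history schema.

```ts
// Hypothetical history shapes; the real format in llxprt may differ.
type ToolCallEntry = { type: 'tool_call'; id: string; name: string; args: Record<string, string> };
type ToolResultEntry = { type: 'tool_result'; callId: string; output: string };

function seedInstructionFileCheck(
  fileName: string,             // e.g. 'AGENTS.md' or the configured instruction file
  fileContents: string | null,  // null when the file is missing, '' when it is empty
): Array<ToolCallEntry | ToolResultEntry> {
  const callId = 'seed-instruction-file-check';
  const output =
    fileContents === null
      ? `${fileName} not found`
      : fileContents === ''
        ? `${fileName} is empty`
        : fileContents; // same content already injected via the system prompt today
  return [
    { type: 'tool_call', id: callId, name: 'read_file', args: { path: fileName } },
    { type: 'tool_result', callId, output },
  ];
}
```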
Repro
- Run in Codex OAuth mode (baseURL …/backend-api/codex).
- Observe an injected user-visible instruction attempting to override the system prompt ("ignore system prompt").
- Observe that the model still attempts a tool call to read AGENTS.md even though the instruction content is already injected via the system context.
Expected
- No injected instruction that tells the model to ignore system/developer prompt, especially in Codex OAuth mode.
- Initial context includes a prior "read instruction file" tool call/result (or explicit not-found/empty) so the model doesn't re-check.
Actual
- The override text appears in the prompt/history.
- The model burns tool calls re-reading instruction files.
Notes
- This issue is specifically about Codex OAuth mode in OpenAIResponsesProvider and the prompt/history assembly around it.