Dev/steven/agent convo #56

steven10a · 2025-11-19T19:32:51Z

Correctly pass conversation history to guardrails when using Agents
Updated the Jailbreak (JB) system prompt with banned content (i.e., system prompt, internal details)
Updated tests
Updated async eval execution

Copilot

Pull Request Overview

This PR improves conversation history handling in guardrails, particularly for Agents, and updates the Jailbreak guardrail's system prompt. The key changes ensure that conversation-aware guardrails (like Jailbreak) receive proper conversation history from agent sessions, while also supporting mixed evaluation scenarios where both conversation-aware and non-conversation-aware guardrails are used together.

Fixes conversation history passing to guardrails in Agent contexts
Updates Jailbreak system prompt with explicit banned content categories
Refactors eval engine to handle mixed guardrail types (conversation-aware and non-conversation-aware)
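
The mixed-evaluation behaviour summarized above can be sketched roughly as follows. This is a minimal illustration, not the code in `async_engine.py`: the `Guardrail` dataclass, its `uses_conversation_history` flag, and `evaluate_mixed` are hypothetical names, and feeding the latest user message to non-conversation-aware guardrails is an assumption.

```python
import asyncio
from dataclasses import dataclass
from typing import Any, Awaitable, Callable


@dataclass
class Guardrail:
    """Hypothetical stand-in for a configured guardrail."""

    name: str
    uses_conversation_history: bool  # assumed flag name
    check: Callable[[Any], Awaitable[dict[str, Any]]]


async def evaluate_mixed(
    guardrails: list[Guardrail],
    conversation: list[dict[str, Any]],
) -> list[dict[str, Any]]:
    """Evaluate both kinds of guardrail on one sample and combine the results."""
    aware = [g for g in guardrails if g.uses_conversation_history]
    plain = [g for g in guardrails if not g.uses_conversation_history]

    # Assumption: non-conversation-aware guardrails only see the latest user text.
    latest_user_text = next(
        (m["content"] for m in reversed(conversation) if m.get("role") == "user"),
        "",
    )

    # Conversation-aware guardrails receive the full history.
    aware_results = await asyncio.gather(*(g.check(conversation) for g in aware))
    plain_results = await asyncio.gather(*(g.check(latest_user_text) for g in plain))

    # Combined list: conversation-aware results first, then the rest.
    return [*aware_results, *plain_results]
```

Evaluating the two groups separately keeps each guardrail's input shape unchanged, so existing non-conversation-aware checks need no modification.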

Reviewed Changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 3 comments.

Show a summary per file

File	Description
tests/unit/test_agents.py	Added tests verifying conversation history is properly passed to guardrails from agent sessions
tests/unit/evals/test_async_engine.py	Added test for mixed conversation-aware and non-conversation-aware guardrail evaluation
src/guardrails/evals/core/async_engine.py	Refactored to evaluate both types of guardrails separately and combine the results; renamed annotation function for clarity
src/guardrails/client.py	Simplified logic to always create conversation context when history is present, not just for specific guardrails
src/guardrails/checks/text/jailbreak.py	Added explicit banned content categories to system prompt
src/guardrails/agents.py	Updated to load and pass conversation history to all guardrails that need it
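
As a rough illustration of the `agents.py` row, the sketch below attaches conversation history only to the context of guardrails that need it. `GuardrailContext`, its fields, and `context_for` are hypothetical names chosen for this example; the actual context type in the repository may look different.

```python
from dataclasses import dataclass, replace
from typing import Any


@dataclass(frozen=True)
class GuardrailContext:
    """Hypothetical context object handed to a guardrail."""

    guardrail_llm: Any
    conversation_history: tuple = ()


def context_for(
    base: GuardrailContext,
    needs_history: bool,
    history: list[dict[str, Any]],
) -> GuardrailContext:
    """Return a context that carries history only when the guardrail needs it."""
    if not needs_history:
        # Guardrails that ignore history keep the original context untouched.
        return base
    # Copy the base context and attach a snapshot of the session history.
    return replace(base, conversation_history=tuple(history))
```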

src/guardrails/checks/text/jailbreak.py

src/guardrails/evals/core/async_engine.py

tests/unit/evals/test_async_engine.py

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

src/guardrails/agents.py

steven10a · 2025-11-19T19:56:44Z

@codex review

Copilot

Pull Request Overview

Copilot reviewed 6 out of 6 changed files in this pull request and generated no new comments.

Comments suppressed due to low confidence (1)

src/guardrails/evals/core/async_engine.py:93

The function documentation describes it as being for "prompt injection detection", but the function has been renamed to be more generic (_annotate_incremental_result) and is now used for all conversation-aware guardrails, not just prompt injection detection. The documentation should be updated to reflect this generalization.

Consider updating the docstring description to remove the outdated reference and make it clear this applies to any conversation-aware guardrail being evaluated incrementally.

def _annotate_incremental_result(
    result: Any,
    turn_index: int,
    message: dict[str, Any] | Any,
) -> None:
    """Annotate guardrail result with incremental evaluation metadata.

    Adds turn-by-turn context to results from conversation-aware guardrails
    being evaluated incrementally. This includes the turn index, role, and
    message that triggered the guardrail (if applicable).

    Args:
        result: GuardrailResult to annotate
        turn_index: Index of the conversation turn (0-based)
        message: Message object being evaluated (dict or object format)
    """
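
To make the suggested docstring concrete, here is a minimal sketch of what such an annotation step might do. It is not the body of `_annotate_incremental_result`: `FakeResult`, the `info` keys, and the reading of "(if applicable)" as "only when the guardrail fired" are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class FakeResult:
    """Hypothetical stand-in for GuardrailResult with a mutable info dict."""

    tripwire_triggered: bool = False
    info: dict[str, Any] = field(default_factory=dict)


def annotate_sketch(result: FakeResult, turn_index: int, message: Any) -> None:
    """Attach turn-by-turn metadata to a result, in place."""
    # Messages may arrive as plain dicts or as objects with attributes.
    if isinstance(message, dict):
        role = message.get("role")
        content = message.get("content")
    else:
        role = getattr(message, "role", None)
        content = getattr(message, "content", None)

    result.info["turn_index"] = turn_index  # 0-based turn index
    result.info["role"] = role
    # Assumption: the triggering message is recorded only when the guardrail fired.
    if result.tripwire_triggered:
        result.info["trigger_message"] = content
```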

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

src/guardrails/evals/core/async_engine.py

steven10a · 2025-11-19T20:06:19Z

@codex review

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

src/guardrails/agents.py

steven10a · 2025-11-19T20:29:12Z

@codex review

chatgpt-codex-connector · 2025-11-19T20:34:05Z

Codex Review: Didn't find any major issues. Nice work!

steven10a added 4 commits November 19, 2025 14:00

Proper conversation handling with Agents

1a61d10

Remove duplicated code

ad0bcc7

Extract user messages for non-convo aware evals

d8d9b99

Fix logic on which guardrails to eval

0443b91

Copilot AI review requested due to automatic review settings November 19, 2025 19:32

Copilot started reviewing on behalf of steven10a November 19, 2025 19:33

Copilot finished reviewing on behalf of steven10a November 19, 2025 19:34

Copilot AI reviewed Nov 19, 2025

src/guardrails/checks/text/jailbreak.py (resolved)

src/guardrails/evals/core/async_engine.py (outdated, resolved)

tests/unit/evals/test_async_engine.py (outdated, resolved)

chatgpt-codex-connector bot reviewed Nov 19, 2025

src/guardrails/agents.py (resolved)

Pass kwargs to Agent with context

3c4ad78

steven10a requested a review from Copilot November 19, 2025 19:56

Copilot started reviewing on behalf of steven10a November 19, 2025 19:57

Copilot finished reviewing on behalf of steven10a November 19, 2025 20:01

Copilot AI reviewed Nov 19, 2025

chatgpt-codex-connector bot reviewed Nov 19, 2025

src/guardrails/evals/core/async_engine.py (resolved)

Extract content parts in eval

d5ee6e7
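
The commit title suggests that message content can arrive either as a plain string or as a list of content parts. A minimal sketch of flattening such content to text follows; the helper name `extract_text` and the part shape (a dict with a `"text"` key) are assumptions, not taken from the commit itself.

```python
from typing import Any


def extract_text(content: Any) -> str:
    """Flatten message content (string or list of parts) into plain text."""
    if isinstance(content, str):
        return content
    if isinstance(content, list):
        texts: list[str] = []
        for part in content:
            if isinstance(part, str):
                texts.append(part)
            # Assumed part shape: {"type": ..., "text": "..."}; non-text parts are skipped.
            elif isinstance(part, dict) and isinstance(part.get("text"), str):
                texts.append(part["text"])
        return " ".join(texts)
    # Unknown content types yield an empty string rather than raising.
    return ""
```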

chatgpt-codex-connector bot reviewed Nov 19, 2025

src/guardrails/agents.py (outdated, resolved)

Only pass conv history to those that need it

4a251be

steven10a requested a review from gabor-openai November 19, 2025 20:40

gabor-openai approved these changes Nov 19, 2025

gabor-openai merged commit 06c1018 into main Nov 19, 2025
3 checks passed

gabor-openai deleted the dev/steven/agent_convo branch November 19, 2025 21:39

3 participants