
fix: Apply guardrails transformations to LLM inputs and bot outputs. #1297


Open · wants to merge 3 commits into develop from feature/fix-v2-transformations

Conversation


@lapinek lapinek commented Jul 18, 2025

Description

Currently, when input and output rails process (that is, transform or redact) user and bot messages, the processed versions are not used:

  • LLM flows use the original $event.final_transcript instead of the processed $user_message.
  • Bot utterances use the original $text instead of the processed $bot_message.

As a result, sensitive, unfiltered data can leak to the LLM and to users despite the guardrails.
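The intended data flow can be sketched in plain Python. This is a toy illustration of the pattern only; the function names are invented for this sketch and are not NeMo Guardrails internals:

```python
# Toy illustration of the intended guardrails data flow; all names here
# are invented for the sketch and are not NeMo Guardrails internals.

def input_rail(user_message: str) -> str:
    # Transformation applied by the input rail (mirrors the repro config below).
    return f"{user_message}, Dick"

def output_rail(bot_message: str) -> str:
    # Transformation applied by the output rail.
    return f"{bot_message}, and Harry"

def fake_llm(prompt: str) -> str:
    # Stand-in for the LLM: simply echoes its input.
    return prompt

def respond(raw_transcript: str) -> str:
    processed = input_rail(raw_transcript)   # the fix: feed the processed
    bot_reply = fake_llm(processed)          # $user_message to the LLM, not
    return output_rail(bot_reply)            # the raw $event.final_transcript

print(respond("Tom"))  # -> Tom, Dick, and Harry
```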

To reproduce, run the following config against the current develop branch (requires OPENAI_API_KEY):

```yaml
# config.yml

colang_version: "2.x"

models:
  - type: main
    engine: openai
    model: gpt-4o-mini
```

```colang
# main

import core
import llm
import guardrails

flow input rails $input_text
    global $user_message
    $user_message = "{$input_text}, Dick"

flow output rails $output_text
    global $bot_message
    $bot_message = "{$output_text}, and Harry"

flow main
    activate llm continuation
```

```shell
poetry run nemoguardrails chat --config /path/to/config
> Echo this: Tom
```

Expected output:

Tom, Dick, and Harry

Actual output:

Tom

With the patch provided in this PR, the output should be:

Tom, Dick., and Harry

Related Issue(s)

Mentions

@schuellc-nvidia, @drazvan - since you’ve contributed most to the affected files, your review would be much appreciated!

Checklist

  • I've read the CONTRIBUTING guidelines.
  • I've updated the documentation if applicable.
  • I've added tests if applicable.
  • @mentions of the person or team responsible for reviewing proposed changes.

@lapinek force-pushed the feature/fix-v2-transformations branch from c99c416 to 9e9d8f6 on July 19, 2025 01:30
lapinek added 2 commits July 19, 2025 15:02
- Use processed $user_message instead of raw $event.final_transcript in LLM inputs.
- Use processed $bot_message instead of raw $text in bot outputs.
- Prevent sensitive or unfiltered data from reaching the LLM or users by correctly applying input/output rails transformations.

Signed-off-by: Konstantin Lapine <[email protected]>
…cessed message handling in Colang 2

Signed-off-by: Konstantin Lapine <[email protected]>
@lapinek force-pushed the feature/fix-v2-transformations branch from 9e9d8f6 to 97871a5 on July 19, 2025 23:06
@lapinek (Author) commented Jul 21, 2025

In addition to the provided example and test, you can observe the unfixed behavior in the following branches (we’re planning to open PRs for these as well):

@schuellc-nvidia (Collaborator) commented Jul 22, 2025

Thank you @lapinek, will take a look!
@Pouyanpi, can you also take a look at this?


Documentation preview

https://nvidia.github.io/NeMo-Guardrails/review/pr-1297

@codecov-commenter

Codecov Report

All modified and coverable lines are covered by tests ✅

Please upload report for BASE (develop@0d6fa42). Report is 2 commits behind head on develop.

Additional details and impacted files
```
@@            Coverage Diff             @@
##             develop    #1297   +/-   ##
==========================================
  Coverage           ?   70.13%
==========================================
  Files              ?      161
  Lines              ?    16037
  Branches           ?        0
==========================================
  Hits               ?    11248
  Misses             ?     4789
  Partials           ?        0
```
| Flag | Coverage Δ |
| --- | --- |
| python | 70.13% <ø> (?) |

Flags with carried forward coverage won't be shown.


@schuellc-nvidia (Collaborator) left a comment

Changes look good to me!

Ideally, we would also have a test involving the flows `generating user intent for unhandled user utterance` and `continuation on unhandled user utterance`, which are affected by this change.

@lapinek (Author) commented Jul 22, 2025

> Changes look good to me!
>
> Ideally, we would also have a test that involves the flows generating user intent for unhandled user utterance and continuation on unhandled user utterance that are affected by it.

@schuellc-nvidia, thank you for the review!

Would it be OK if we add these tests in a follow-up? I'd like to take a closer look at the dialog flows first to ensure meaningful coverage. I believe the already included test verifies that the rails transformations are applied. We're relying on this behavior in upcoming PRs.

Update:

@Pouyanpi Pouyanpi added this to the v0.16.0 milestone Aug 1, 2025
@schuellc-nvidia (Collaborator) commented

> Would it be OK if we add these tests in a follow-up? I'd like to take a closer look at the dialog flows first to ensure meaningful coverage. I believe the already included test verifies that the rails transformations are applied. We're relying on this behavior in upcoming PRs.

Yes, that's fine with me.
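The follow-up tests discussed here might look something like the following toy sketch, which uses a stubbed LLM and an invented redaction rail (pure Python; it does not use the real NeMo Guardrails test utilities) to assert that processed text, not the raw transcript, reaches the LLM and the user:

```python
import re

# Hypothetical redaction rail, standing in for a real input/output rail.
def redact_emails(text: str) -> str:
    return re.sub(r"\S+@\S+", "[REDACTED]", text)

def run_turn(raw_input, llm, seen_by_llm):
    processed = redact_emails(raw_input)  # input rail transforms the message
    seen_by_llm.append(processed)         # record exactly what the LLM receives
    reply = llm(processed)
    return redact_emails(reply)           # output rail transforms the reply

def test_llm_never_sees_raw_email():
    seen = []
    reply = run_turn("contact me at alice@example.com", lambda p: p, seen)
    assert "alice@example.com" not in seen[0]   # LLM saw redacted input
    assert "alice@example.com" not in reply     # user saw redacted output

test_llm_never_sees_raw_email()
```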

@Pouyanpi (Collaborator) left a comment

Thank you @lapinek, this is a critical security fix with a solid implementation 👍🏻 It is ready to merge 🚀

Would you please GPG-sign your commits, following the contributing guidelines?
