
fix: Apply guardrails transformations to LLM inputs and bot outputs. #1297


Open · wants to merge 3 commits into develop from feature/fix-v2-transformations

Conversation


@lapinek lapinek commented Jul 18, 2025

Description

Currently, when input and output rails process (that is, transform or redact) user and bot messages, the processed versions are not used:

  • LLM flows use the original $event.final_transcript instead of the processed $user_message.
  • Bot utterances use the original $text instead of the processed $bot_message.

As a result, sensitive, unfiltered data can leak to the LLM and to users despite the guardrails.
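The intended data flow can be sketched in plain Python. This is a toy illustration of the pattern only; the function names are invented for this sketch and are not NeMo Guardrails internals:

```python
# Toy illustration of the intended guardrails data flow; all names here
# are invented for the sketch and are not NeMo Guardrails internals.

def input_rail(user_message: str) -> str:
    # Transformation applied by the input rail (mirrors the repro config below).
    return f"{user_message}, Dick"

def output_rail(bot_message: str) -> str:
    # Transformation applied by the output rail.
    return f"{bot_message}, and Harry"

def fake_llm(prompt: str) -> str:
    # Stand-in for the LLM: simply echoes its input.
    return prompt

def respond(raw_transcript: str) -> str:
    processed = input_rail(raw_transcript)   # the fix: feed the processed
    bot_reply = fake_llm(processed)          # $user_message to the LLM, not
    return output_rail(bot_reply)            # the raw $event.final_transcript

print(respond("Tom"))  # -> Tom, Dick, and Harry
```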

To reproduce, run the following config against the current develop branch (requires OPENAI_API_KEY):

```yaml
# config.yml

colang_version: "2.x"

models:
  - type: main
    engine: openai
    model: gpt-4o-mini
```

```colang
# main

import core
import llm
import guardrails

flow input rails $input_text
    global $user_message
    $user_message = "{$input_text}, Dick"

flow output rails $output_text
    global $bot_message
    $bot_message = "{$output_text}, and Harry"

flow main
    activate llm continuation
```

```shell
poetry run nemoguardrails chat --config /path/to/config
> Echo this: Tom
```

Expected output:

Tom, Dick, and Harry

Actual output:

Tom

With the patch provided in this PR, the output should be:

Tom, Dick., and Harry

Related Issue(s)

Mentions

@schuellc-nvidia, @drazvan - since you’ve contributed most to the affected files, your review would be much appreciated!

Checklist

  • I've read the CONTRIBUTING guidelines.
  • I've updated the documentation if applicable.
  • I've added tests if applicable.
  • @mentions of the person or team responsible for reviewing proposed changes.

@lapinek force-pushed the feature/fix-v2-transformations branch from c99c416 to 9e9d8f6 on July 19, 2025 01:30
lapinek added 2 commits July 19, 2025 15:02
- Use processed $user_message instead of raw $event.final_transcript in LLM inputs.
- Use processed $bot_message instead of raw $text in bot outputs.
- Prevent sensitive or unfiltered data from reaching the LLM or users by correctly applying input/output rails transformations.

Signed-off-by: Konstantin Lapine <[email protected]>
…cessed message handling in Colang 2

Signed-off-by: Konstantin Lapine <[email protected]>
@lapinek force-pushed the feature/fix-v2-transformations branch from 9e9d8f6 to 97871a5 on July 19, 2025 23:06
@lapinek (Author) commented Jul 21, 2025

In addition to the provided example and test, you can observe the unfixed behavior in the following branches (we’re planning to open PRs for these as well):

@schuellc-nvidia (Collaborator) commented Jul 22, 2025

Thank you @lapinek, will take a look!
@Pouyanpi, can you also take a look at this?


Documentation preview

https://nvidia.github.io/NeMo-Guardrails/review/pr-1297

@codecov-commenter

Codecov Report

All modified and coverable lines are covered by tests ✅

Please upload report for BASE (develop@0d6fa42). Report is 2 commits behind head on develop.

Additional details and impacted files
```
@@            Coverage Diff             @@
##             develop    #1297   +/-   ##
==========================================
  Coverage           ?   70.13%
==========================================
  Files              ?      161
  Lines              ?    16037
  Branches           ?        0
==========================================
  Hits               ?    11248
  Misses             ?     4789
  Partials           ?        0
```
| Flag | Coverage Δ |
| --- | --- |
| python | 70.13% <ø> (?) |

Flags with carried forward coverage won't be shown.


@schuellc-nvidia (Collaborator) left a comment

Changes look good to me!

Ideally, we would also have a test involving the flows `generating user intent for unhandled user utterance` and `continuation on unhandled user utterance`, which are affected by this change.

@lapinek (Author) commented Jul 22, 2025

> Changes look good to me!
>
> Ideally, we would also have a test that involves the flows generating user intent for unhandled user utterance and continuation on unhandled user utterance that are affected by it.

@schuellc-nvidia, thank you for the review!

Would it be OK if we add these tests in a follow-up? I'd like to take a closer look at the dialog flows first to ensure meaningful coverage. I believe the already included test verifies that the rails transformations are applied. We're relying on this behavior in upcoming PRs.

Update:

@Pouyanpi Pouyanpi added this to the v0.16.0 milestone Aug 1, 2025
@schuellc-nvidia (Collaborator) commented

> Would it be OK if we add these tests in a follow-up? I'd like to take a closer look at the dialog flows first to ensure meaningful coverage. I believe the already included test verifies that the rails transformations are applied. We're relying on this behavior in upcoming PRs.

Yes, that's fine with me.
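The follow-up tests discussed here might look something like the following toy sketch, which uses a stubbed LLM and an invented redaction rail (pure Python; it does not use the real NeMo Guardrails test utilities) to assert that processed text, not the raw transcript, reaches the LLM and the user:

```python
import re

# Hypothetical redaction rail, standing in for a real input/output rail.
def redact_emails(text: str) -> str:
    return re.sub(r"\S+@\S+", "[REDACTED]", text)

def run_turn(raw_input, llm, seen_by_llm):
    processed = redact_emails(raw_input)  # input rail transforms the message
    seen_by_llm.append(processed)         # record exactly what the LLM receives
    reply = llm(processed)
    return redact_emails(reply)           # output rail transforms the reply

def test_llm_never_sees_raw_email():
    seen = []
    reply = run_turn("contact me at alice@example.com", lambda p: p, seen)
    assert "alice@example.com" not in seen[0]   # LLM saw redacted input
    assert "alice@example.com" not in reply     # user saw redacted output

test_llm_never_sees_raw_email()
```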

@Pouyanpi (Collaborator) left a comment

Thank you @lapinek, this is a critical security fix with a solid implementation 👍🏻 It is ready to merge 🚀

Would you please GPG-sign your commits, following the contributing guidelines?
