-
Notifications
You must be signed in to change notification settings - Fork 523
fix: Apply guardrails transformations to LLM inputs and bot outputs. #1297
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: develop
Are you sure you want to change the base?
fix: Apply guardrails transformations to LLM inputs and bot outputs. #1297
Conversation
c99c416
to
9e9d8f6
Compare
- Use processed $user_message instead of raw $event.final_transcript in LLM inputs. - Use processed $bot_message instead of raw $text in bot outputs. - Prevent sensitive or unfiltered data from reaching the LLM or users by correctly applying input/output rails transformations. Signed-off-by: Konstantin Lapine <[email protected]>
…cessed message handling in Colang 2 Signed-off-by: Konstantin Lapine <[email protected]>
9e9d8f6
to
97871a5
Compare
Signed-off-by: Konstantin Lapine <[email protected]>
In addition to the provided example and test, you can observe the unfixed behavior in the following branches (we’re planning to open PRs for these as well): |
Documentation preview |
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## develop #1297 +/- ##
==========================================
Coverage ? 70.13%
==========================================
Files ? 161
Lines ? 16037
Branches ? 0
==========================================
Hits ? 11248
Misses ? 4789
Partials ? 0
Flags with carried forward coverage won't be shown. Click here to find out more. 🚀 New features to boost your workflow:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changes look good to me!
Ideally, we would also have a test that involves the flows generating user intent for unhandled user utterance
and continuation on unhandled user utterance
that are affected by it.
@schuellc-nvidia, thank you for the review! Would it be OK if we add these tests in a follow-up? I'd like to take a closer look at the dialog flows first to ensure meaningful coverage. I believe the already included test verifies that the rails transformations are applied. We're relying on this behavior in upcoming PRs. Update: |
Yes, that's fine with me. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @lapinek , this is a critical security fix with solid implementation 👍🏻 It is ready to merge 🚀
Would you please gpg sign your commits following contributing guidelines?
Description
Currently, when input and output rails process (that is, transform or redact) user and bot messages, the processed versions are not used:
To reproduce, you can run this config against the current
develop
(requiresOPENAI_API_KEY
):poetry run nemoguardrails chat --config /path/to/config > Echo this: Tom
Expected output:
Actual output:
With patch provided in this PR, the output should be:
Related Issue(s)
Mentions
@schuellc-nvidia, @drazvan - since you’ve contributed most to the affected files, your review would be much appreciated!
Checklist