feat: add reasoning guardrail connector #1565
Conversation
Greptile Summary: This PR adds a new Reasoning Guardrail Connector for the `nvidia/Nemotron-Content-Safety-Reasoning-4B` model. Key changes:
| Filename | Overview |
|---|---|
| `nemoguardrails/llm/output_parsers.py` | Added two new output parsers (`nemotron_reasoning_parse_prompt_safety` and `nemotron_reasoning_parse_response_safety`) with helper functions for parsing Nemotron Content Safety Reasoning model outputs |
| `nemoguardrails/llm/taskmanager.py` | Registered the new output parsers and exposed the `config.rails.config` object in the template rendering context to enable access to configuration settings |
| `nemoguardrails/rails/llm/config.py` | Added a `ReasoningConfig` model to support a reasoning-mode toggle for content safety models |
| `examples/configs/content_safety_reasoning/prompts.yml` | Prompt templates for content safety checks; uses the incorrect variable `reasoning_mode` instead of `config.content_safety.reasoning.enabled` |
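The parsers named above are not shown in this excerpt. A minimal sketch, assuming only the behavior described in the review (strip `<think>` reasoning blocks, extract the harm verdict, return `[True]` or `[False]`); the function name here is hypothetical and the real implementation in `output_parsers.py` may differ:

```python
import re

def parse_response_safety(output: str) -> list:
    """Hypothetical sketch: strip <think>...</think> reasoning blocks and
    extract the verdict from a line like 'Response Harm: harmful'.
    Returns [True] if flagged harmful, else [False]."""
    # Remove any chain-of-thought reasoning emitted by the model.
    cleaned = re.sub(r"<think>.*?</think>", "", output, flags=re.DOTALL)
    # Try 'unharmful' before 'harmful', since 'harmful' is a substring of it.
    match = re.search(r"harm:\s*(unharmful|harmful)", cleaned, flags=re.IGNORECASE)
    return [match is not None and match.group(1).lower() == "harmful"]
```

A usage example, mirroring the model outputs quoted in the sequence diagram: `parse_response_safety("<think>...</think>\nResponse Harm: unharmful")` yields `[False]`.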
Sequence Diagram

```mermaid
sequenceDiagram
    participant User
    participant LLMRails
    participant TaskManager
    participant ContentSafetyModel as Nemotron-4B
    participant MainLLM
    User->>LLMRails: generate(message)
    Note over LLMRails,TaskManager: Input Rail Flow
    LLMRails->>TaskManager: render_task_prompt("content_safety_check_input")
    TaskManager->>TaskManager: Apply template with config.content_safety.reasoning.enabled
    TaskManager-->>LLMRails: Prompt with /think or /no_think
    LLMRails->>ContentSafetyModel: Check input safety
    ContentSafetyModel-->>LLMRails: "Prompt harm: harmful/unharmful"
    LLMRails->>TaskManager: parse_task_output(output)
    TaskManager->>TaskManager: nemotron_reasoning_parse_prompt_safety()
    TaskManager->>TaskManager: Strip <think> tags, extract harm value
    TaskManager-->>LLMRails: [True] or [False]
    alt Input is safe
        LLMRails->>MainLLM: Generate response
        MainLLM-->>LLMRails: Response text
        Note over LLMRails,TaskManager: Output Rail Flow
        LLMRails->>TaskManager: render_task_prompt("content_safety_check_output")
        TaskManager-->>LLMRails: Prompt with /think or /no_think
        LLMRails->>ContentSafetyModel: Check output safety
        ContentSafetyModel-->>LLMRails: "Response Harm: harmful/unharmful"
        LLMRails->>TaskManager: parse_task_output(output)
        TaskManager->>TaskManager: nemotron_reasoning_parse_response_safety()
        TaskManager-->>LLMRails: [True] or [False]
        alt Output is safe
            LLMRails-->>User: Response
        else Output is unsafe
            LLMRails-->>User: "I'm sorry, I can't respond to that."
        end
    else Input is unsafe
        LLMRails-->>User: "I'm sorry, I can't respond to that."
    end
```
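The branching at the end of the diagram amounts to a small gating step. A sketch under the assumption that the parser output is the `[True]`/`[False]` list shown above (the function name is hypothetical; the refusal text is the one in the diagram):

```python
REFUSAL = "I'm sorry, I can't respond to that."

def apply_output_rail(response: str, safety_result: list) -> str:
    """Gate the main LLM's response on the parsed safety verdict:
    [True] means the content was flagged harmful."""
    return REFUSAL if safety_result == [True] else response
```

For example, `apply_output_rail("Sure, here you go.", [False])` passes the response through, while `[True]` replaces it with the refusal message.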
9 files reviewed, 2 comments
Codecov Report ❌ Patch coverage is
Force-pushed from e36d9fc to 4c134db
Signed-off-by: Makesh Sreedhar <[email protected]>
Include the `reasoning_enabled` flag from the content safety config in the context passed to both the input and output content safety check prompts. This enables prompt templates to conditionally adjust behavior based on the reasoning setting. Also removes the redundant config exposure in LLMTaskManager's default context.
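The behavior this commit describes, toggling the Nemotron `/think` and `/no_think` control tokens via a `reasoning_enabled` flag in the render context, can be sketched with plain string formatting. The function name and prompt wording below are illustrative, not the actual `prompts.yml` templates:

```python
def render_safety_prompt(user_input: str, reasoning_enabled: bool) -> str:
    """Hypothetical sketch: prepend the reasoning control token based on
    the reasoning_enabled flag passed into the template context."""
    mode = "/think" if reasoning_enabled else "/no_think"
    return f"{mode}\nCheck the following user message for safety: {user_input}"

# First line of the rendered prompt selects the model's reasoning mode:
print(render_safety_prompt("Hello", reasoning_enabled=True).splitlines()[0])
# → /think
```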
Force-pushed from 4c134db to 921f1d7
Thanks a lot @makeshn , this is great 🚀
I just made a minor change based on my comment above and added tests. Let me know if you see any issues, we can revert/change.
Please look at the tests to ensure the assertions are aligned with your expectations. As I cannot run it myself, could you please run it and pass some inputs to confirm everything is still working. Thanks 👍🏻
trebedea left a comment
Looks good, some suggestions.
docs/getting-started/tutorials/nemotron-content-safety-reasoning-deployment.md
@makeshn you're right about streamers.py (I'm going to revert this commit). @tgasser-nv you can reproduce the type checking bug on develop.
This reverts commit e916dd4.
Signed-off-by: Makesh Sreedhar <[email protected]>
Description
This PR adds a new Reasoning Guardrail Connector for the nvidia/Nemotron-Content-Safety-Reasoning-4B model.
Key Changes:
Checklist