feat: add reasoning guardrail connector #1565
Conversation
Greptile Summary: This PR adds a new Reasoning Guardrail Connector for the `nvidia/Nemotron-Content-Safety-Reasoning-4B` model. Key changes:
| Filename | Overview |
|---|---|
| `nemoguardrails/llm/output_parsers.py` | Added two new output parsers (`nemotron_reasoning_parse_prompt_safety` and `nemotron_reasoning_parse_response_safety`) with helper functions for parsing Nemotron Content Safety Reasoning model outputs |
| `nemoguardrails/llm/taskmanager.py` | Registered the new output parsers and exposed the `config.rails.config` object in the template rendering context to enable access to configuration settings |
| `nemoguardrails/rails/llm/config.py` | Added a `ReasoningConfig` model to support a reasoning-mode toggle for content safety models |
| `examples/configs/content_safety_reasoning/prompts.yml` | Prompt templates for content safety checks; uses the incorrect variable `reasoning_mode` instead of `config.content_safety.reasoning.enabled` |
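The parsers named above are not shown in this excerpt. A minimal sketch, assuming only the behavior described in the review (strip `<think>` reasoning blocks, extract the harm verdict, return `[True]` or `[False]`); the function name here is hypothetical and the real implementation in `output_parsers.py` may differ:

```python
import re

def parse_response_safety(output: str) -> list:
    """Hypothetical sketch: strip <think>...</think> reasoning blocks and
    extract the verdict from a line like 'Response Harm: harmful'.
    Returns [True] if flagged harmful, else [False]."""
    # Remove any chain-of-thought reasoning emitted by the model.
    cleaned = re.sub(r"<think>.*?</think>", "", output, flags=re.DOTALL)
    # Try 'unharmful' before 'harmful', since 'harmful' is a substring of it.
    match = re.search(r"harm:\s*(unharmful|harmful)", cleaned, flags=re.IGNORECASE)
    return [match is not None and match.group(1).lower() == "harmful"]
```

A usage example, mirroring the model outputs quoted in the sequence diagram: `parse_response_safety("<think>...</think>\nResponse Harm: unharmful")` yields `[False]`.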
Sequence Diagram

```mermaid
sequenceDiagram
    participant User
    participant LLMRails
    participant TaskManager
    participant ContentSafetyModel as Nemotron-4B
    participant MainLLM
    User->>LLMRails: generate(message)
    Note over LLMRails,TaskManager: Input Rail Flow
    LLMRails->>TaskManager: render_task_prompt("content_safety_check_input")
    TaskManager->>TaskManager: Apply template with config.content_safety.reasoning.enabled
    TaskManager-->>LLMRails: Prompt with /think or /no_think
    LLMRails->>ContentSafetyModel: Check input safety
    ContentSafetyModel-->>LLMRails: "Prompt harm: harmful/unharmful"
    LLMRails->>TaskManager: parse_task_output(output)
    TaskManager->>TaskManager: nemotron_reasoning_parse_prompt_safety()
    TaskManager->>TaskManager: Strip <think> tags, extract harm value
    TaskManager-->>LLMRails: [True] or [False]
    alt Input is safe
        LLMRails->>MainLLM: Generate response
        MainLLM-->>LLMRails: Response text
        Note over LLMRails,TaskManager: Output Rail Flow
        LLMRails->>TaskManager: render_task_prompt("content_safety_check_output")
        TaskManager-->>LLMRails: Prompt with /think or /no_think
        LLMRails->>ContentSafetyModel: Check output safety
        ContentSafetyModel-->>LLMRails: "Response Harm: harmful/unharmful"
        LLMRails->>TaskManager: parse_task_output(output)
        TaskManager->>TaskManager: nemotron_reasoning_parse_response_safety()
        TaskManager-->>LLMRails: [True] or [False]
        alt Output is safe
            LLMRails-->>User: Response
        else Output is unsafe
            LLMRails-->>User: "I'm sorry, I can't respond to that."
        end
    else Input is unsafe
        LLMRails-->>User: "I'm sorry, I can't respond to that."
    end
```
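The branching at the end of the diagram amounts to a small gating step. A sketch under the assumption that the parser output is the `[True]`/`[False]` list shown above (the function name is hypothetical; the refusal text is the one in the diagram):

```python
REFUSAL = "I'm sorry, I can't respond to that."

def apply_output_rail(response: str, safety_result: list) -> str:
    """Gate the main LLM's response on the parsed safety verdict:
    [True] means the content was flagged harmful."""
    return REFUSAL if safety_result == [True] else response
```

For example, `apply_output_rail("Sure, here you go.", [False])` passes the response through, while `[True]` replaces it with the refusal message.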
9 files reviewed, 2 comments
Codecov Report ❌ Patch coverage is
Force-pushed from e36d9fc to 4c134db
Signed-off-by: Makesh Sreedhar <[email protected]>
Include the `reasoning_enabled` flag from the content safety config in the context passed to both the input and output content safety check prompts. This enables prompt templates to conditionally adjust behavior based on the reasoning setting. Also removes the redundant config exposure in LLMTaskManager's default context.
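The behavior this commit describes, toggling the Nemotron `/think` and `/no_think` control tokens via a `reasoning_enabled` flag in the render context, can be sketched with plain string formatting. The function name and prompt wording below are illustrative, not the actual `prompts.yml` templates:

```python
def render_safety_prompt(user_input: str, reasoning_enabled: bool) -> str:
    """Hypothetical sketch: prepend the reasoning control token based on
    the reasoning_enabled flag passed into the template context."""
    mode = "/think" if reasoning_enabled else "/no_think"
    return f"{mode}\nCheck the following user message for safety: {user_input}"

# First line of the rendered prompt selects the model's reasoning mode:
print(render_safety_prompt("Hello", reasoning_enabled=True).splitlines()[0])
# → /think
```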
Force-pushed from 4c134db to 921f1d7
Thanks a lot @makeshn , this is great 🚀
I just made a minor change based on my comment above and added tests. Let me know if you see any issues, we can revert/change.
Please look at the tests to ensure the assertions are aligned with your expectations. As I cannot run it myself, could you please run it and pass some inputs to confirm everything is still working. Thanks 👍🏻
trebedea left a comment
Looks good, some suggestions.
docs/getting-started/tutorials/nemotron-content-safety-reasoning-deployment.md
@makeshn you're right about streamers.py (I'm going to revert this commit). @tgasser-nv you can reproduce the type checking bug on develop.
This reverts commit e916dd4.
Signed-off-by: Makesh Sreedhar <[email protected]>
Description
This PR adds a new Reasoning Guardrail Connector for the nvidia/Nemotron-Content-Safety-Reasoning-4B model.
Key Changes:
Checklist