feat(detectors): add violence detector by nuthalapativarun · Pull Request #1865 · NVIDIA/garak

nuthalapativarun · 2026-06-13T22:20:37Z

Summary

This adds a new detectors.violence.ViolentSpeech detector that flags model
output containing keyword/phrase indicators of violent content (threats,
incitement to violence, glorification of violent acts), following the same
StringDetector pattern used by detectors.lmrc and detectors.unsafe_content.
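The detector follows garak's StringDetector pattern, but the PR text does not show that base class's internals. The sketch below is therefore a standalone illustration of the general idea only: each model output gets a score of 1.0 if any listed phrase appears in it (case-insensitively) and 0.0 otherwise, which matches the 1.0/0.0 behaviour described under Verification. The class name `KeywordDetector`, its `detect()` signature, the `None` handling and the phrase list are all hypothetical. The sketch does not import or reproduce garak's actual API.

```python
from typing import Iterable, List, Optional


class KeywordDetector:
    """HYPOTHETICAL stand-in for a StringDetector-style check (not garak's class).

    Scores each output 1.0 if any configured substring occurs in it, else 0.0.
    """

    def __init__(self, substrings: Iterable[str], case_sensitive: bool = False):
        self.case_sensitive = case_sensitive
        # Normalise the phrase list once so matching is case-insensitive by default.
        self.substrings = [s if case_sensitive else s.lower() for s in substrings]

    def detect(self, outputs: List[Optional[str]]) -> List[Optional[float]]:
        results: List[Optional[float]] = []
        for output in outputs:
            if output is None:
                # Assumption: a missing output is left unscored rather than marked benign.
                results.append(None)
                continue
            text = output if self.case_sensitive else output.lower()
            hit = any(s in text for s in self.substrings)
            results.append(1.0 if hit else 0.0)
        return results


# Hypothetical phrase list, for illustration only (not the PR's list).
violent_speech = KeywordDetector(["i will kill you", "burn it down", "shoot them all"])
print(violent_speech.detect(["I will kill you if you come back.", "Have a nice day.", None]))
```

This prints `[1.0, 0.0, None]`. Matching is plain substring containment, so the score depends only on whether a listed string occurs, not on how it is used.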

Closes #87

AI assistance disclosure

This PR was drafted with AI assistance (Claude). I reviewed the detector's
substring list, tags, and docstrings, and wrote/verified the accompanying
tests locally. The keyword list and tagging conventions were checked against
existing content-safety detectors (lmrc.py, unsafe_content.py,
exploitation.py) for style consistency.

Verification

Supporting configuration (e.g. a generator configuration file): n/a, new detector with no config requirements
garak -t <target_type> -n <model_name>: n/a, detector-only change
Run the tests and ensure they pass (python -m pytest tests/detectors -q -k violence): 7 passed
Verify the thing does what it should: new tests load the plugin and assert detect() returns 1.0 for output containing violent keywords
Verify the thing does not do what it should not: tests assert detect() returns 0.0 for benign output
Document the thing and how it works: added docs/source/detectors/violence.rst and linked it in index_detectors.rst; class docstrings describe the detection approach

Signed-off-by: Varun Nuthalapati <nuthalapativarun@gmail.com>

leondz · 2026-06-22T21:55:56Z

Thanks for this. I think we need a more grounded, general approach to determining whether or not an utterance shows violence. A keyword approach like this is decontextualised, but accurate determinations about hate speech tend to require context (e.g. https://aclanthology.org/2021.acl-long.247/). Can we take a deeper approach than a keyword-based one? Perhaps one using an open-weights model?
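The review's point about decontextualised matching can be made concrete with a toy example. The keyword list below is hypothetical and much cruder than the PR's phrase list. It shows how bare substring matching produces false positives, such as a shell command, an idiom, or "skill", which contains "kill". It also produces false negatives, such as a threat that uses none of the listed words. Neither kind of error can be fixed without looking at context.

```python
# Toy illustration of decontextualised substring matching.
# HYPOTHETICAL keyword list; not the list used in PR #1865.
VIOLENT_KEYWORDS = ["kill", "shoot", "stab"]


def keyword_score(text: str) -> float:
    """Return 1.0 if any keyword occurs as a substring (case-insensitive), else 0.0."""
    lowered = text.lower()
    return 1.0 if any(k in lowered for k in VIOLENT_KEYWORDS) else 0.0


examples = [
    ("I will stab him tonight.", "threat"),
    ("Run `kill -9 1234` to stop the hung process.", "benign: shell command"),
    ("The photographer waited hours to shoot the sunrise.", "benign: photography"),
    ("That comedian absolutely killed it last night.", "benign: idiom"),
    ("This task requires real skill.", "benign: 'skill' contains 'kill'"),
    ("I'm going to make sure he never wakes up.", "threat with no keyword"),
]

for text, label in examples:
    print(f"{keyword_score(text):.1f}  {label}: {text}")
```

Every benign line above scores 1.0 and the keyword-free threat scores 0.0. This is the behaviour that a context-aware classifier, such as the open-weights model the review suggests, would be expected to avoid.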

feat(detectors): add violence detector (NVIDIA#87) · commit e946361

Signed-off-by: Varun Nuthalapati <nuthalapativarun@gmail.com>

leondz self-assigned this Jun 15, 2026

nuthalapativarun wants to merge 1 commit into NVIDIA:main from nuthalapativarun:feat/87-violence-detector
