Can current open source guardrails defend against internal agentic risks?

During the summer of 2025, one of the Builder's in Residence wanted to see if current open source guardrails could defend against specific internal agentic risks. This is because the risk surface of agents is greater than that of LLMs. In other words, we don't just have to worry about user inputs and model outputs. The communication between different components within agents contains risks as well.

The below describes how to run our experiments and analyze our data.

Environment Setup

We recommend using uv to setup your environment. Run the following command:

uv sync
source .venv/bin/activate

Running our experiments

For all of our experiments, we used an H100 GPU. However, a high powered MacBook Pro or other equivalent machine should be fine. To run our experiments, run jupyter lab on the command line (after activating the environment) and access the following notebooks:

IPIA_Experiments.ipynb
Function_Calling_Experiments.ipynb

Those will allow you rerun our experiments and obtain the raw results.

Analyzing our data

If you would rather read through our data and results, we recommend using our notebooks to get the metrics we produce in our blog post . To do so, run jupyter lab on the command line and access the following notebooks:

IPIA_Analysis.ipynb
Function_Calling_Analysis.ipynb

Name		Name	Last commit message	Last commit date
Latest commit History 11 Commits
data		data
.DS_Store		.DS_Store
.gitignore		.gitignore
.python-version		.python-version
Function_Calling_Analysis.ipynb		Function_Calling_Analysis.ipynb
Function_Calling_Experiment.ipynb		Function_Calling_Experiment.ipynb
IPIA_Analysis.ipynb		IPIA_Analysis.ipynb
IPIA_Experiments.ipynb		IPIA_Experiments.ipynb
LICENSE		LICENSE
README.md		README.md
pyproject.toml		pyproject.toml
utils.py		utils.py
uv.lock		uv.lock

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Uh oh!

Uh oh!

Repository files navigation

Can current open source guardrails defend against internal agentic risks?

Environment Setup

Running our experiments

Analyzing our data

About

Uh oh!

Releases

Packages

Uh oh!

Languages

Uh oh!

License

Uh oh!

mozilla-ai/bir_guardrail_experiments

Folders and files

Latest commit

History

Repository files navigation

Can current open source guardrails defend against internal agentic risks?

Environment Setup

Running our experiments

Analyzing our data

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Languages

Packages