RobustCBRN Eval

Toolkit to robustify CBRN MCQA benchmarks: consensus/shortcut detection, verified cloze variants, statistical bias battery; deterministic and fail‑graceful.

Requires Python 3.10+. Licensed under MIT.

What it is

A practical toolkit for evaluating AI models on CBRN (Chemical, Biological, Radiological, Nuclear) multiple‑choice QA benchmarks and for making those evaluations more robust. It implements choices‑only consensus screens, verified cloze scoring, and a heuristics battery (position bias, longest‑answer), with reproducible, fail‑graceful execution.
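
To make the heuristics battery concrete, here is a minimal sketch of two of the checks it names: a longest‑answer baseline and a position‑bias tally. It is an illustration under stated assumptions, not the toolkit's actual implementation. The `Item` record and the `longest_answer_accuracy` and `position_counts` helpers are hypothetical names, and the toy items are non‑CBRN placeholders.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Item:
    """Hypothetical MCQA record: the answer choices and the correct index."""
    choices: tuple[str, ...]
    answer: int


def longest_answer_accuracy(items: list[Item]) -> float:
    """Accuracy of a baseline that always picks the longest choice.

    Ties are broken by taking the first choice of maximal length.
    """
    if not items:
        return 0.0
    hits = 0
    for item in items:
        lengths = [len(choice) for choice in item.choices]
        guess = lengths.index(max(lengths))
        hits += guess == item.answer
    return hits / len(items)


def position_counts(items: list[Item]) -> Counter:
    """Count how often the correct answer sits at each choice position."""
    return Counter(item.answer for item in items)


# Toy, non-sensitive items used only for illustration.
ITEMS = [
    Item(("red", "a much longer answer than the rest", "blue"), 1),
    Item(("short", "tiny", "the longest option here"), 2),
    Item(("the longest option again", "no", "yes"), 1),
    Item(("cat", "dog", "a considerably longer choice"), 2),
]

print(longest_answer_accuracy(ITEMS))  # 3 of 4 items are solved by length alone
print(position_counts(ITEMS))
```

A longest‑answer accuracy well above chance (1 divided by the number of choices), or a strongly skewed position tally, suggests that part of the benchmark can be answered without domain knowledge.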

Quick Start

Install uv:
- curl -LsSf https://astral.sh/uv/install.sh | sh
Create venv and install deps:
- uv venv && uv pip install -r requirements.txt
Run a sample:
- make setup && make sample
Full pipeline (recommended):
- make pipeline

More pipeline options are documented in scripts/PIPELINE_README.md.

Docs

Overview and rationale: overview.md
Getting started and CLI usage: docs/getting-started/usage.md
Architecture: docs/architecture/architecture.md
Security & release policy: docs/safety/security-considerations.md
Results/report templates: docs/results/
Full docs index: docs/README.md

Development

Lint: .venv/bin/ruff check robustcbrn tests
Tests: .venv/bin/pytest -q
Pre‑commit hooks:
- pip install pre-commit
- pre-commit install (or bash scripts/install-hooks.sh)

Notes

Cross‑platform pipeline (Windows/macOS/Linux) with robust error handling.
Public artifacts are sanitized (no raw questions/choices or per‑item exploit labels). See scripts/validate_release.sh.
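
As a rough illustration of the sanitization rule above, the sketch below drops fields that must not be published and reports any that remain. It is not a description of what scripts/validate_release.sh does. The field names `question`, `choices`, and `exploit_label` and both helper functions are hypothetical, since the artifact schema is not shown here.

```python
# Hypothetical field names; the real artifact schema may differ.
FORBIDDEN_KEYS = frozenset({"question", "choices", "exploit_label"})


def sanitize_record(record: dict) -> dict:
    """Return a copy of the record without fields that must stay private."""
    return {k: v for k, v in record.items() if k not in FORBIDDEN_KEYS}


def find_violations(records: list[dict]) -> list[tuple[int, str]]:
    """List (row index, key) pairs where a forbidden field is still present."""
    return [
        (index, key)
        for index, record in enumerate(records)
        for key in sorted(record.keys() & FORBIDDEN_KEYS)
    ]


# Placeholder records for illustration only.
raw = [
    {"id": "q1", "question": "placeholder", "choices": ["a", "b"], "flagged": True},
    {"id": "q2", "exploit_label": "longest", "flagged": False},
]
public = [sanitize_record(r) for r in raw]

print(find_violations(raw))
print(find_violations(public))
```

Running a check like `find_violations` as a final gate, rather than trusting the sanitizing step alone, catches records that reached the output through another code path.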

Release Checklist: docs/safety/release-checklist.md

Contributing

We welcome contributions! Please see our Contributing Guidelines (CONTRIBUTING.md) for details on:

- Code of conduct
- Development workflow
- Commit message conventions
- Pull request process
- Testing requirements

License

This project is licensed under the MIT License - see the LICENSE file for details.

Citation

If you use this toolkit in your research, please cite:

@software{robustcbrn-eval,
  title = {RobustCBRN Eval: Toolkit for Robustifying CBRN AI Benchmarks},
  author = {[Authors]},
  year = {2024},
  url = {https://github.com/apart-research/robustcbrn-eval}
}

Safety & Release Policy

See docs/safety/security-considerations.md for anonymization and public artifact rules.
