PrivacyBench

This repository benchmarks the performance of generative AI models on legal and privacy-related tasks, with the aim of enabling appropriate and effective use of AI to assist with legal and compliance matters.

Purpose of PrivacyBench

Existing industry benchmarks provide strong indications of linguistic understanding but are not specifically tuned to measure performance on legal, privacy, and compliance tasks.

Proposed Solution

Develop a testing method for benchmarking performance in personal data redaction.
Measure and report on LLM performance.
Identify, in particular, LLMs that can be deployed locally and efficiently for maximum privacy and security at the lowest cost.
Encourage the community development of better tools through benchmarking.
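
As an illustration of what a testing method for personal data redaction might measure, the sketch below scores one model's redactions against a gold set using precision, recall, and F1. This is a minimal sketch under stated assumptions, not the scoring used by this repository: the `RedactionScore` type, the `score_redaction` function, the exact-string matching rule, and the sample data are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RedactionScore:
    precision: float
    recall: float
    f1: float


def score_redaction(expected: set[str], redacted: set[str]) -> RedactionScore:
    """Compare the strings a model redacted against a gold set of personal data.

    Assumption: items match only on exact string equality.
    """
    true_positives = len(expected & redacted)
    precision = true_positives / len(redacted) if redacted else 0.0
    recall = true_positives / len(expected) if expected else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return RedactionScore(precision, recall, f1)


# Hypothetical example: gold labels versus one model's output.
gold = {"Jane Doe", "jane@example.com", "555-0100"}
model_output = {"Jane Doe", "jane@example.com", "Acme Corp"}
score = score_redaction(gold, model_output)
print(score)
```

Recall matters most when a missed item means leaked personal data, while precision captures over-redaction that makes a document less useful; reporting both keeps that trade-off visible.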

Call for contributions

This repository is open-sourced under the MIT license, and the code and testing process are free to use with appropriate attribution (subject to third-party licenses).

Specific Tasks

The first use case selected for benchmarking is personal data detection and redaction; see that task in this repository for additional details.
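
To make the task concrete, here is a toy rule-based redactor of the kind an LLM might be compared against. It is a hedged sketch only: the `PATTERNS` table, the `redact` function, the placeholder labels, and the sample sentence are hypothetical and are not taken from this repository's task definition.

```python
import re

# Illustrative patterns only; real personal data detection needs far broader coverage.
PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "PHONE": re.compile(r"\b\d{3}-\d{3}-\d{4}\b"),
}


def redact(text: str) -> str:
    """Replace each pattern match with a bracketed placeholder label."""
    for label, pattern in PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text


sample = "Contact Jane at jane.doe@example.com or 555-010-0199."
print(redact(sample))
```

The name "Jane" is left untouched here, which shows why pattern matching alone falls short for personal data and why model-based detection is worth benchmarking.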

Trademark

PrivacyBench is a trademark of Alex J. Wall.

About the models referenced in result tables

The benchmark code, questions, methodology, and per-question result rows in this repository are MIT-licensed (see LICENSE).
Specific models appearing in those result tables are owned by their respective publishers and licensed separately. The caiioo-research/* models published by Six Cailloux, LLC (the company that maintains this benchmark) are proprietary and not currently available for download; they appear in result tables as reference points, not as openly redistributable artifacts.
See MODELS.md for the full list of models referenced, where each can be obtained, what license each falls under, and which are proprietary.
