This repository benchmarks the performance of generative AI models on legal and privacy-related tasks, to support appropriate and effective use of AI in legal and compliance matters.
Purpose of PrivacyBench
- Existing industry benchmarks, such as the
, provide strong indications of linguistic understanding but are not specifically tuned to measure legal, privacy, and compliance tasks.
- Existing benchmarks are not adequately specific to legal, privacy and compliance tasks.
Proposed Solution
- Develop a testing method for benchmarking performance in personal data redaction.
- Measure and report on LLM performance.
- Identify, in particular, LLMs that can be deployed locally and efficiently for maximum privacy and security at the lowest cost.
- Encourage the community development of better tools through benchmarking.
Call for contributions
- This repository is open source under the MIT license. The code and testing process are free to use with appropriate attribution (subject to third-party licenses).
Specific Tasks
- The first use case selected to be benchmarked and tested is personal data detection and redaction; please see this task in this repo for additional details.
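
As a rough illustration of how a personal data detection task could be scored, the sketch below compares the spans a model flagged against a hand-labelled reference and reports precision, recall, and F1. It is not the scoring method used in this repository: the `Span` type, the `score_redaction` function, the labels, and the exact-match criterion are all assumptions made for the example.

```python
from typing import Dict, Iterable, NamedTuple


class Span(NamedTuple):
    """A labelled character range in the source text (hypothetical type)."""
    start: int  # inclusive character offset
    end: int    # exclusive character offset
    label: str  # e.g. "NAME" or "EMAIL" (illustrative labels)


def score_redaction(expected: Iterable[Span], predicted: Iterable[Span]) -> Dict[str, float]:
    """Score predicted spans against expected spans using exact matching.

    Assumption: a prediction counts only if start, end, and label all match.
    A real benchmark might also credit partial overlaps.
    """
    expected_set, predicted_set = set(expected), set(predicted)
    true_positives = len(expected_set & predicted_set)
    precision = true_positives / len(predicted_set) if predicted_set else 0.0
    recall = true_positives / len(expected_set) if expected_set else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


text = "Contact Jane Doe at jane@example.com."
expected = [Span(8, 16, "NAME"), Span(20, 36, "EMAIL")]
# The model found the name, missed the email, and wrongly flagged "Contact".
predicted = [Span(8, 16, "NAME"), Span(0, 7, "NAME")]

result = score_redaction(expected, predicted)
print(result)  # {'precision': 0.5, 'recall': 0.5, 'f1': 0.5}
```

For redaction, recall is usually the number to watch, since a missed item means personal data leaks through, while a false positive only over-redacts.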
Trademark
- PrivacyBench is a trademark of Alex J. Wall.
About the models referenced in result tables
- The benchmark code, questions, methodology, and per-question result rows in this repository are MIT-licensed (see LICENSE).
- Specific models appearing in those result tables are owned by their respective publishers and licensed separately. The caiioo-research/* models published by Six Cailloux, LLC (the company that maintains this benchmark) are proprietary and not currently available for download; they appear in result tables as reference points, not as openly redistributable artifacts.
- See MODELS.md for the full list of models referenced, where each can be obtained, what license each falls under, and which are proprietary.