Train a smaller LLM or a BERT-style model for the PII detection pipeline #250
simonaszilinskas started this conversation in Ideas.
The current anonymisation / PII pipeline lives in `utils/topics_pii.py`. Today we rely on LLM-as-a-judge to decide whether a conversation contains PII. `conversations_raw` keeps everything, including PII. This design is intentional and conservative, but it has limitations.
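As a rough sketch of the judge step (the schema and function names here are assumptions for illustration, not the actual `utils/topics_pii.py` code), the structured output can be validated strictly so a malformed LLM response surfaces as an error instead of silently passing a conversation through:

```python
import json

def parse_judge_verdict(raw: str) -> bool:
    """Parse the structured-output JSON returned by the LLM judge.

    Assumed shape for illustration: {"contains_pii": true/false}.
    Anything else raises ValueError, so malformed responses fail loudly
    rather than being treated as "no PII".
    """
    data = json.loads(raw)
    verdict = data.get("contains_pii")
    if not isinstance(verdict, bool):
        raise ValueError(f"unexpected judge output: {raw!r}")
    return verdict
```

Failing loudly matters here because the pipeline is deliberately conservative: an unparseable verdict should block, not skip, anonymisation.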
What we tried before (and why it didn’t work)
We explored PII replacement, not just detection, but abandoned it:
- **Google Data Loss Prevention (DLP):** too many false positives for our data.
- **Presidio / spaCy-based detection:** too many false positives and false negatives.
- **LLM-generated redacted conversations:** the LLM kept injecting extra content (explanations, formatting, meta text). This might be salvageable by enforcing fenced output, but it was not robust enough at the time.
Because of this, we settled on binary classification only: PII or not, based on an LLM with structured outputs.
Problems with the current approach
- Cost
- The model is overkill for a binary yes/no decision
- False negatives exist
Goal of this issue
Improve the current pipeline while keeping it simple and conservative:
Proposed direction
Either fine-tune a smaller LLM or train a BERT-style classifier for the binary PII decision.
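Whichever option wins, keeping the detector behind one small interface would let the current LLM judge and a future smaller model be swapped without touching the rest of the pipeline. A minimal sketch, with hypothetical names (this is not the actual `utils/topics_pii.py` API):

```python
from typing import Protocol

class PIIDetector(Protocol):
    """Anything that answers the binary question: does this text contain PII?"""
    def contains_pii(self, text: str) -> bool: ...

class KeywordStub:
    """Trivial stand-in used only to demonstrate the contract. Real
    implementations would wrap the current LLM judge and, later, a
    fine-tuned BERT-style classifier."""
    def contains_pii(self, text: str) -> bool:
        # Naive email heuristic, purely for illustration.
        return "@" in text

def filter_conversations(detector: PIIDetector, conversations: list[str]) -> list[str]:
    """Keep only the conversations the detector judges PII-free."""
    return [c for c in conversations if not detector.contains_pii(c)]
```

This would also make it easy to run the old and new detectors side by side on the same conversations to measure agreement before switching over.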