This repository was archived by the owner on Jul 23, 2025. It is now read-only.

Commit e8fe1d1

Add PII redaction
1 parent 3e9b1be commit e8fe1d1

7 files changed: 65 additions, 35 deletions

docs/about/changelog.md

Lines changed: 5 additions & 0 deletions
@@ -13,6 +13,11 @@ Major features and changes are noted here. To review all updates, see the
 
 Related: [Upgrade CodeGate](../how-to/install.md#upgrade-codegate)
 
+- **PII redaction** - 10 Feb, 2025\
+Starting with v0.1.18, CodeGate now redacts personally identifiable
+information (PII) found in LLM prompts and context. See the
+[feature page](../features/secrets-encryption.md) to learn more.
+
 - **Model muxing** - 7 Feb, 2025\
 With CodeGate v0.1.17 you can use the new `/v1/mux` endpoint to configure
 model selection based on your workspace! Learn more in the

docs/features/dependency-risk.md

Lines changed: 1 addition & 2 deletions
@@ -1,15 +1,14 @@
 ---
 title: Dependency risk awareness
 description: Protection from malicious or vulnerable dependencies
-sidebar_position: 20
 ---
 
 ## What's the risk?
 
 The large language models (LLMs) that drive AI coding assistants are incredibly
 costly and time-consuming to train. That's why each one has a "knowledge cutoff
 date" which is often months or even years in the past. For example, GPT-4o's
-training cutoff was October 2023\.
+training cutoff was October 2023.
 
 But the open source software ecosystem moves quickly, and so do malicious actors
 seeking to exploit the software supply chain. LLMs often suggest outdated,

docs/features/muxing.md

Lines changed: 0 additions & 1 deletion
@@ -1,7 +1,6 @@
 ---
 title: Model muxing
 description: Configure a per-workspace LLM
-sidebar_position: 35
 ---
 
 ## Overview

docs/features/secrets-encryption.md

Lines changed: 55 additions & 27 deletions
@@ -1,62 +1,90 @@
 ---
-title: Secrets encryption
+title: Secrets encryption and PII redaction
 description: Keep your secrets a secret
-sidebar_position: 10
 ---
 
 ## What's the risk?
 
-As you interact with an AI coding assistant, sensitive data like passwords and
-access tokens can be unintentionally exposed to third-party providers through
-the code snippets and files you share as context. These secrets may become part
-of the training data used to improve the AI model and potentially be exposed to
-other users.
+As you interact with an AI coding assistant, sensitive data like passwords,
+access tokens, and even personally identifiable information (PII) can be
+unintentionally exposed to third-party providers through the code and files you
+share as context. Besides the privacy and regulatory implications of exposing
+this information, it may become part of the AI model's training data and
+potentially be exposed to future users.
 
 ## How CodeGate helps
 
 CodeGate helps you protect sensitive information from being accidentally exposed
 to AI models and third-party AI provider systems by redacting detected secrets
-from your prompts using encryption.
+and PII found in your prompts.
 
 ## How it works
 
-CodeGate automatically scans all prompts for secrets such as:
+CodeGate automatically scans all prompts for secrets and PII. This happens
+transparently without requiring a specific prompt. Without interrupting your
+development flow, CodeGate protects your data by encrypting secrets and
+anonymizing PII. These changes are made before the prompt is sent to the LLM and
+are restored when the result is returned to your machine.
 
-- API keys and tokens
-- Private keys and certificates
-- Database credentials
-- SSH keys
-- Cloud provider credentials
-
-This scan happens transparently without requiring a specific prompt.
+When a secret or PII is detected, CodeGate adds a message to the LLM's output
+and an alert is recorded in the [dashboard](../how-to/dashboard.md) (PII alerts
+in the dashboard are coming soon).
 
 :::info
 
 Since CodeGate runs locally, your secrets never leave your system unprotected.
 
 :::
 
-CodeGate transparently encrypts secrets before sending the prompt to the LLM.
-This way, CodeGate protects your sensitive data without blocking your
-development flow. This is performed on the fly using AES256-GCM encryption with
-a temporary per-session key that is securely erased from memory after the
-response is delivered to your plugin.
-
 ```mermaid
 sequenceDiagram
 participant Client as AI coding<br>assistant
 participant CodeGate as CodeGate<br>(local)
 participant LLM as AI model<br>(remote)
 
-Client ->> CodeGate: Prompt with<br>plaintext secrets
+Client ->> CodeGate: Prompt with<br>plaintext secrets/PII
 activate CodeGate
-CodeGate ->> LLM: Prompt with<br>encrypted secrets
+CodeGate ->> LLM: Prompt with<br>redacted secrets/PII
 deactivate CodeGate
 activate LLM
-note right of LLM: LLM only sees<br>encrypted values
-LLM -->> CodeGate: Response with<br>encrypted secrets
+note right of LLM: LLM only sees<br>redacted values
+LLM -->> CodeGate: Response with<br>redacted data
 deactivate LLM
 activate CodeGate
-CodeGate -->> Client: Response with<br>plaintext secrets
+CodeGate -->> Client: Response with<br>original data
 deactivate CodeGate
 ```
+
+### Secrets encryption
+
+CodeGate uses pattern matching to detect secrets such as:
+
+- API keys and tokens
+- Private keys and certificates
+- Database credentials
+- SSH keys
+- Cloud provider credentials
+- ...and more - see the
+[signatures file](https://github.com/stacklok/codegate/blob/main/signatures.yaml)
+in the project repo
+
+CodeGate transparently encrypts secrets before sending the prompt to the LLM.
+This is performed on the fly using AES256-GCM encryption with a temporary
+per-session key. When the LLM returns a response, CodeGate decrypts the secret
+before delivering it to your coding assistant, then securely erases the
+temporary key from memory.
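
Review note: the pattern-matching detection step described in this section could look roughly like the sketch below. It is an illustration only, not CodeGate's implementation. The two regular expressions and the `find_secrets` helper are hypothetical and are not taken from `signatures.yaml`, and the AES-256-GCM encryption step is left out.

```python
import re

# Hypothetical signatures for illustration only; CodeGate's real patterns
# live in signatures.yaml and are not reproduced here.
SIGNATURES = {
    "AWS access key ID": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "PEM private key header": re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),
}


def find_secrets(text: str) -> list[tuple[str, str]]:
    """Return (signature name, matched text) for every signature hit in text."""
    hits = []
    for name, pattern in SIGNATURES.items():
        for match in pattern.finditer(text):
            hits.append((name, match.group(0)))
    return hits


# AWS's published documentation example key, not a real credential.
sample = 'aws_key = "AKIAIOSFODNN7EXAMPLE"'
hits = find_secrets(sample)
```

In a flow like the one this section describes, each hit would then be encrypted in place before the prompt leaves the machine.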
+
+### PII redaction
+
+CodeGate scans for common types of PII like:
+
+- Email addresses
+- Phone numbers
+- Government identification numbers
+- Credit card numbers
+- Bank accounts and crypto wallet IDs
+
+CodeGate anonymizes PII by replacing each string with a unique identifier before
+sending the prompt to the LLM. This way, CodeGate protects your sensitive data
+without blocking your development flow. When the LLM returns a response,
+CodeGate matches up the identifier and replaces it with the original value.
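
Review note: the replace-then-restore round trip described for PII could be sketched as follows. This is a minimal illustration under stated assumptions, not CodeGate's actual code. The email regex, the `<pii-...>` placeholder format, and the `redact` and `restore` helpers are all hypothetical.

```python
import re
import uuid

# Hypothetical email pattern for illustration; real PII detection covers
# more types and is likely more sophisticated than one regex.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def redact(prompt: str) -> tuple[str, dict[str, str]]:
    """Replace each email address with a unique placeholder.

    Returns the redacted prompt and a placeholder-to-original mapping.
    """
    mapping: dict[str, str] = {}

    def _swap(match: re.Match) -> str:
        placeholder = f"<pii-{uuid.uuid4().hex}>"  # assumed placeholder format
        mapping[placeholder] = match.group(0)
        return placeholder

    return EMAIL_RE.sub(_swap, prompt), mapping


def restore(response: str, mapping: dict[str, str]) -> str:
    """Put the original values back wherever a placeholder appears."""
    for placeholder, original in mapping.items():
        response = response.replace(placeholder, original)
    return response


prompt = "Email alice@example.com about the outage."
redacted, mapping = redact(prompt)

# Simulate an LLM reply that echoes the placeholder it was given.
reply = f"Drafted a note to {next(iter(mapping))}."
restored = restore(reply, mapping)
```

The mapping stays on the local machine, so the remote model only ever sees the placeholder.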

docs/features/security-reviews.md

Lines changed: 0 additions & 1 deletion
@@ -1,7 +1,6 @@
 ---
 title: Security reviews
 description: Enhanced secure coding guidance
-sidebar_position: 30
 ---
 
 ## What's the risk?

docs/features/workspaces.mdx

Lines changed: 0 additions & 1 deletion
@@ -1,7 +1,6 @@
 ---
 title: Workspaces
 description: Organize and customize your project environments
-sidebar_position: 40
 ---
 
 import useBaseUrl from '@docusaurus/useBaseUrl';

docs/index.md

Lines changed: 4 additions & 3 deletions
@@ -36,8 +36,9 @@ sequenceDiagram
 CodeGate includes several key features for privacy, security, and coding
 efficiency, including:
 
-- [Secrets encryption](./features/secrets-encryption.md) to protect your
-sensitive credentials
+- [Secrets encryption and PII redaction](./features/secrets-encryption.md) to
+protect your sensitive credentials and anonymize personally identifiable
+information
 - [Dependency risk awareness](./features/dependency-risk.md) to update the LLM's
 knowledge of malicious or deprecated open source packages
 - [Model muxing](./features/muxing.md) to quickly select the best LLM
@@ -101,7 +102,7 @@ Review the [installation instructions](./how-to/install.md).
 
 Learn more about CodeGate's features:
 
-- [Secrets encryption](./features/secrets-encryption.md)
+- [Secrets and PII redaction](./features/secrets-encryption.md)
 - [Dependency risk awareness](./features/dependency-risk.md)
 - [Security reviews](./features/security-reviews.md)
 - [Workspaces](./features/workspaces.mdx)
