chore: add RFC for advanced data privacy and response obfuscation #1645

Draft
wants to merge 4 commits into
base: main
192 changes: 192 additions & 0 deletions rfc/data-privacy/v1.md
@@ -0,0 +1,192 @@
---
title: "RFC: Advanced Data Privacy - Configurable response obfuscation"
author: Jens Neuse
---

# Problem

In organizations where data privacy is paramount,
it's often necessary to obfuscate certain fields in the response to allow developers to debug issues without exposing sensitive information,
or to export data through APIs for data science or analytics purposes,
all in compliance with data privacy regulations.

Furthermore, it's important that obfuscation policies are stable,
ensuring that the data is always obfuscated in the same way,
allowing developers to rely on the obfuscation behavior.

# Solution

We propose a new feature that obfuscates fields in responses depending on the user's role or the environment.
The configuration should be done at the Router level as Subgraph developers should not be the single point of control for data privacy.
The security team should be able to define the rules for obfuscation and the Router should enforce them.

To achieve this, we propose adding a new entry to the Router configuration file that defines obfuscation policies.
Each policy is defined by two expressions using expr-lang,
one expression to activate the policy,
and one expression to actually obfuscate the field.

The activation expression should be evaluated once for each client request,
and if it evaluates to true, the obfuscation expression should be evaluated for each leaf field in the response.

## Example Configuration

For obfuscation policies to work,
we need to define a context object that contains all the information necessary to evaluate the expressions.

Context Object:

```json
{
  "fieldName": "email",
  "fieldType": "String",
  "parentType": "User",
  "value": "\"[email protected]\"",
  "valueDataType": "string"
}
```

Side note: the "activate" expression has access to the request context object, not the field context object defined above.
The request context object is documented here: https://cosmo-docs.wundergraph.com/router/configuration/template-expressions

Example Policy:

```yaml
data_privacy:
  obfuscation:
    expressions:
      - name: "email"
        expression: "repeat(\"*\",len(split(value,\"@\")[0])) + \"@\" + split(value,\"@\")[1]"
    policies:
      - name: "Data Scientist Obfuscation"
        activate: "'data-scientist' in request.auth.roles"
        obfuscate: "parentType == 'User' && fieldName == 'email' ? email(value) : value"
      - name: "Developer Obfuscation"
        activate: "'developer' in request.auth.roles"
        obfuscate: "parentType == 'User' && fieldName == 'email' ? email(value) : value"
```

In this example, we define an expression (`email`) that obfuscates an email value by replacing the local part with asterisks.
We then define two policies, one for data scientists and one for developers, that both apply it to the `email` field of `User`.
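
To make the `email` expression concrete, here is a rough Python equivalent of what it computes. It only illustrates the intended result: the function name `mask_email` and the sample address are hypothetical, and the Router would evaluate the expr-lang expression itself rather than code like this. Like the expression, the sketch assumes the value contains an `@`; how values without one should be handled is left open here.

```python
def mask_email(value: str) -> str:
    """Mask the local part of an email address and keep the domain."""
    parts = value.split("@")
    # Same steps as the expression:
    # repeat("*", len(split(value, "@")[0])) + "@" + split(value, "@")[1]
    return "*" * len(parts[0]) + "@" + parts[1]


# The same input always yields the same output, so the masking is stable.
print(mask_email("jane@example.com"))  # prints "****@example.com"
```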

Additional use cases, and example policies that address them, follow.

### Implementing a default obfuscation policy that applies to all fields

We can define a "default" expression that returns the following value for each valueDataType:

- boolean: false
- number: 0
- string: "*" repeated as many times as the length of the value

```yaml
data_privacy:
  obfuscation:
    expressions:
      - name: "default"
        expression: "valueDataType == 'boolean' ? false : valueDataType == 'number' ? 0 : repeat(\"*\",len(value))"
    policies:
      - name: "Default Obfuscation"
        activate: "true"
        obfuscate: "default(value)"
```
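
The branching in the `default` expression can be read as the following Python sketch. The function name `default_obfuscation` is hypothetical and the code is illustrative only. It covers the three value types listed above. Null values are not covered, because their treatment is not specified here.

```python
from typing import Union

Scalar = Union[bool, int, float, str]


def default_obfuscation(value: Scalar, value_data_type: str) -> Scalar:
    """Return the fallback replacement for a value, keyed on valueDataType."""
    if value_data_type == "boolean":
        return False
    if value_data_type == "number":
        return 0
    # Every other type falls through to the string branch, as in the expression:
    # "*" repeated once per character of the original value.
    return "*" * len(value)


print(default_obfuscation("secret", "string"))  # prints "******"
```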

You can also combine this with other rules so that the default obfuscation applies only when no more specific rule in the policy matches.

```yaml
data_privacy:
  obfuscation:
    expressions:
      - name: "default"
        expression: "valueDataType == 'boolean' ? false : valueDataType == 'number' ? 0 : repeat(\"*\",len(value))"
      - name: "email"
        expression: "repeat(\"*\",len(split(value,\"@\")[0])) + \"@\" + split(value,\"@\")[1]"
    policies:
      - name: "Data Scientist Obfuscation"
        activate: "'data-scientist' in request.auth.roles"
        obfuscate: "parentType == 'User' && fieldName == 'email' ? email(value) : default(value)"
```

### Disabling obfuscation for a specific scalar type

Let's say we've got a scalar "ISODate" that should never be obfuscated.
We can define an "exclude" expression that the policy checks first; when it matches, the value is returned as is, before any other expression is evaluated.

```yaml
data_privacy:
  obfuscation:
    expressions:
      - name: "default"
        expression: "valueDataType == 'boolean' ? false : valueDataType == 'number' ? 0 : repeat(\"*\",len(value))"
      - name: "email"
        expression: "repeat(\"*\",len(split(value,\"@\")[0])) + \"@\" + split(value,\"@\")[1]"
      - name: "exclude"
        expression: "fieldType == 'ISODate' ? true : false"
    policies:
      - name: "Data Scientist Obfuscation"
        activate: "'data-scientist' in request.auth.roles"
        obfuscate: >
          exclude() ? value :
          parentType == 'User' && fieldName == 'email' ? email(value) :
          default(value)
```

### Custom obfuscation for a specific scalar

If we'd like to obfuscate any 'ISODate' field as, for example, `'2024-**-**'`, masking the month and day but keeping the year,
we can define the expression as follows:

```yaml
data_privacy:
  obfuscation:
    expressions:
      - name: "default"
        expression: "valueDataType == 'boolean' ? false : valueDataType == 'number' ? 0 : repeat(\"*\",len(value))"
      - name: "isoDate"
        expression: >-
          string(date(value,"2006-01-02 15:04:05").Year()) + "-**-**"
      - name: "email"
        expression: "repeat(\"*\",len(split(value,\"@\")[0])) + \"@\" + split(value,\"@\")[1]"
    policies:
      - name: "Data Scientist Obfuscation"
        activate: "'data-scientist' in request.auth.roles"
        obfuscate: >
          fieldType == 'ISODate' ? isoDate(value) :
          parentType == 'User' && fieldName == 'email' ? email(value) :
          default(value)
```
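
As a rough illustration of the `isoDate` expression, the following Python sketch parses the value and keeps only the year. The function name `mask_iso_date` and the sample value are hypothetical. The sketch also assumes that ISODate values carry a time component matching the layout used in the expression, which the scalar definition may or may not guarantee.

```python
from datetime import datetime

# Python equivalent of the Go reference layout "2006-01-02 15:04:05".
LAYOUT = "%Y-%m-%d %H:%M:%S"


def mask_iso_date(value: str) -> str:
    """Keep the year of a date value and mask the month and day."""
    year = datetime.strptime(value, LAYOUT).year
    # The year is an integer, so it is converted to text before concatenation.
    return str(year) + "-**-**"


print(mask_iso_date("2024-03-15 10:30:00"))  # prints "2024-**-**"
```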

### Telemetry: Log Warning when "fieldType" doesn't exist in the GraphQL Schema

To avoid misconfiguration, we can enable logging of policies that reference field types that don't exist in the GraphQL Schema.

```yaml
data_privacy:
  obfuscation:
    log_undefined_field_types: true
```

In this case, the Router will parse the expressions on startup and inspect each expression AST to determine whether any referenced field types don't exist in the GraphQL Schema.
If it finds any, it will log a warning with the name of the policy and the field type that doesn't exist.

Example log message in JSON format:

```json
{
  "level": "warn",
  "message": "Field type 'ISODate' referenced in policy 'Data Scientist Obfuscation' doesn't exist in the GraphQL Schema",
  "timestamp": "2024-01-01T12:00:00Z"
}
```
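
The startup check could look roughly like the following Python sketch. It is a simplification under stated assumptions: a regular expression stands in for the expression AST walk described above, and only the `obfuscate` expression of each policy is scanned, not the named expressions it calls. `find_undefined_field_types`, the policy dictionaries, and the set of schema type names are hypothetical stand-ins, not Router APIs.

```python
import re

# Matches comparisons such as: fieldType == 'ISODate' or fieldType == "ISODate"
FIELD_TYPE_REF = re.compile(r"""fieldType\s*==\s*(['"])(\w+)\1""")


def find_undefined_field_types(policies, schema_types):
    """Return one warning message per unknown field type referenced by a policy."""
    warnings = []
    for policy in policies:
        # findall yields (quote, type_name) tuples; only the type name matters.
        for _, type_name in FIELD_TYPE_REF.findall(policy["obfuscate"]):
            if type_name not in schema_types:
                warnings.append(
                    f"Field type '{type_name}' referenced in policy "
                    f"'{policy['name']}' doesn't exist in the GraphQL Schema"
                )
    return warnings


policies = [
    {
        "name": "Data Scientist Obfuscation",
        "obfuscate": "fieldType == 'ISODate' ? isoDate(value) : default(value)",
    }
]
# The schema below has no ISODate scalar, so one warning is produced.
for message in find_undefined_field_types(policies, {"String", "Int", "Boolean"}):
    print(message)
```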

# Implementation

The "activate" expression will be evaluated once per request and should have access to the request context object.
If the expression evaluates to true, the "obfuscate" expression will be evaluated for each field in the response.
If the activation expression evaluates to false, the "obfuscate" expression will be skipped.

The "obfuscate" expression will be evaluated for leaf fields only.
If the field is a scalar, the expression will be evaluated with the context object defined above.

We can implement this functionality in graphql-go-tools by adding a "leaf node resolver" interface which the Router can implement.
Based on the "activate" result, the interface can be a no-op.

If there are no policies defined, the interface will be nil, and the Router will not evaluate any expressions.
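
The selection logic described above can be sketched as follows. This is written in Python for brevity and is not the graphql-go-tools interface. `Policy`, `build_leaf_resolver`, and `no_op` are hypothetical names, the expressions are modelled as plain callables, and the choice to use the first policy whose activation matches is an assumption, since the behavior for several matching policies is not defined here.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Optional

Field = Dict[str, Any]  # fieldName, fieldType, parentType, value, valueDataType
LeafResolver = Callable[[Field], Any]


@dataclass
class Policy:
    name: str
    activate: Callable[[Dict[str, Any]], bool]  # evaluated once per request
    obfuscate: Callable[[Field], Any]           # evaluated once per leaf field


def no_op(field: Field) -> Any:
    """Resolver used when no policy is active: return the value unchanged."""
    return field["value"]


def build_leaf_resolver(policies: List[Policy], request: Dict[str, Any]) -> Optional[LeafResolver]:
    """Pick the leaf resolver for one request."""
    if not policies:
        return None  # no policies defined: nothing is evaluated at all
    for policy in policies:
        if policy.activate(request):
            # Assumption: the first policy whose activation matches wins.
            return policy.obfuscate
    return no_op


policy = Policy(
    name="Data Scientist Obfuscation",
    activate=lambda req: "data-scientist" in req["auth"]["roles"],
    obfuscate=lambda f: "***" if f["fieldName"] == "email" else f["value"],
)
request = {"auth": {"roles": ["data-scientist"]}}
resolver = build_leaf_resolver([policy], request)
print(resolver({"fieldName": "email", "value": "jane@example.com"}))  # prints "***"
```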