wundergraph · jensneuse · Mar 3, 2025 · Mar 4, 2025 · Mar 12, 2025 · Mar 12, 2025
diff --git a/rfc/data-privacy/v1.md b/rfc/data-privacy/v1.md
@@ -0,0 +1,192 @@
+---
+title: "RFC: Advanced Data Privacy - Configurable Response Obfuscation"
+author: Jens Neuse
+---
+
+# Problem
+
+In organizations where data privacy is paramount,
+it's often necessary to obfuscate certain fields in the response to allow developers to debug issues without exposing sensitive information,
+or to export data through APIs for data science or analytics purposes,
+all in compliance with data privacy regulations.
+
+Furthermore, it's important that obfuscation policies are stable,
+ensuring that the data is always obfuscated in the same way,
+allowing developers to rely on the obfuscation behavior.
+
+# Solution
+
+We propose a new feature that obfuscates fields in responses depending on the user's role or the environment.
+The configuration should be done at the Router level, as Subgraph developers should not be the single point of control for data privacy.
+The security team should be able to define the rules for obfuscation and the Router should enforce them.
+
+To achieve this, we propose adding a new entry to the Router configuration file for defining obfuscation policies.
+Each policy is defined by two expressions using expr-lang:
+one expression to activate the policy,
+and one expression to actually obfuscate the field.
+
+The activation expression should be evaluated once for each client request.
+If it evaluates to true, the obfuscation expression should be evaluated for each leaf field in the response.
+
+## Example Configuration
+
+For obfuscation policies to work,
+we need to define a context object that contains all the information necessary to evaluate the expressions.
+
+Context Object:
+
+```json
+{
+  "fieldName": "email",
+  "fieldType": "String",
+  "parentType": "User",
+  "value": "\"user@example.com\"",
+  "valueDataType": "string"
+}
+```
+
+Side note: the "activate" expression has access to the request context object, not the field context object defined above.
+The request context object is documented here: https://cosmo-docs.wundergraph.com/router/configuration/template-expressions
+
+Example Policy:
+
+```yaml
+data_privacy:
+    obfuscation:
+        expressions:
+          - name: "email"
+            expression: "repeat(\"*\",len(split(value,\"@\")[0])) + \"@\" + split(value,\"@\")[1]"
+        policies:
+            - name: "Data Scientist Obfuscation"
+              activate: "'data-scientist' in request.auth.roles"
+              obfuscate: "parentType == 'User' && fieldName == 'email' ? email(value) : value"
+            - name: "Developer Obfuscation"
+              activate: "'developer' in request.auth.roles"
+              obfuscate: "parentType == 'User' && fieldName == 'email' ? email(value) : value"
+```
+
+In this example, we define an expression (email) that obfuscates the email field by replacing the local part with asterisks.
+We then define two policies, one for data scientists and one for developers.
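+
+The "email" expression above indexes `split(value,"@")[1]`, which presumably fails at evaluation time for a value that contains no "@".
+A more defensive variant could look like the sketch below. It assumes that the `contains` operator of expr-lang is available in the obfuscation environment and that falling back to full masking is acceptable; neither is specified by this RFC.
+
+```yaml
+data_privacy:
+    obfuscation:
+        expressions:
+          # Hypothetical hardened variant of the "email" expression:
+          # mask only the local part when an "@" is present, otherwise mask everything.
+          - name: "email"
+            expression: >-
+              value contains "@"
+              ? repeat("*", len(split(value, "@")[0])) + "@" + split(value, "@")[1]
+              : repeat("*", len(value))
+```
+
+With this variant, a value such as "jane.doe@example.com" would become "********@example.com", while a value without an "@" would be masked entirely.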
+
+Additional use cases, with example policies that address them, follow.
+
+### Implementing a default obfuscation policy that applies to all fields
+
+We can define a "default" expression that returns a fixed placeholder for each valueDataType:
+- boolean: false
+- number: 0
+- string: "*" repeated as many times as the length of the value
+
+```yaml
+data_privacy:
+    obfuscation:
+        expressions:
+          - name: "default"
+            expression: "valueDataType == 'boolean' ? false : valueDataType == 'number' ? 0 : repeat(\"*\",len(value))"
+        policies:
+            - name: "Default Obfuscation"
+              activate: "true"
+              obfuscate: "default(value)"
+```
+
+You can also combine this with other expressions so that the default obfuscation applies only when no more specific rule matches.
+
+```yaml
+data_privacy:
+    obfuscation:
+        expressions:
+          - name: "default"
+            expression: "valueDataType == 'boolean' ? false : valueDataType == 'number' ? 0 : repeat(\"*\",len(value))"
+          - name: "email"
+            expression: "repeat(\"*\",len(split(value,\"@\")[0])) + \"@\" + split(value,\"@\")[1]"
+        policies:
+            - name: "Data Scientist Obfuscation"
+              activate: "'data-scientist' in request.auth.roles"
+              obfuscate: "parentType == 'User' && fieldName == 'email' ? email(value) : default(value)"
+```
+
+### Disabling obfuscation for a specific scalar type
+
+Let's say we've got a scalar "ISODate" that should never be obfuscated.
+We can define an "exclude" expression that is checked before any other expression; when it matches, the policy returns the value as is.
+
+```yaml
+data_privacy:
+  obfuscation:
+    expressions:
+      - name: "default"
+        expression: "valueDataType == 'boolean' ? false : valueDataType == 'number' ? 0 : repeat(\"*\",len(value))"
+      - name: "email"
+        expression: "repeat(\"*\",len(split(value,\"@\")[0])) + \"@\" + split(value,\"@\")[1]"
+      - name: "exclude"
+        expression: "fieldType == 'ISODate' ? true : false"
+    policies:
+      - name: "Data Scientist Obfuscation"
+        activate: "'data-scientist' in request.auth.roles"
+        obfuscate: >
+          exclude() ? value :
+          parentType == 'User' && fieldName == 'email' ? email(value) :
+          default(value)
+```
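+
+The same pattern extends to several scalars. As a sketch, assuming a second scalar named "DateTime" exists in the schema (a hypothetical name used only for illustration), the "exclude" expression can use the `in` operator of expr-lang:
+
+```yaml
+data_privacy:
+  obfuscation:
+    expressions:
+      # "DateTime" is a hypothetical scalar; list the scalars that must never be obfuscated.
+      - name: "exclude"
+        expression: "fieldType in ['ISODate', 'DateTime']"
+```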
+
+### Custom obfuscation for a specific scalar
+
+If we'd like to obfuscate any 'ISODate' field as `'2024-**-**'`, obfuscating the month and day, but keeping the year,
+we can define the expression as follows:
+
+```yaml
+data_privacy:
+  obfuscation:
+    expressions:
+      - name: "default"
+        expression: "valueDataType == 'boolean' ? false : valueDataType == 'number' ? 0 : repeat(\"*\",len(value))"
+      - name: "isoDate"
+        expression: >-
+          string(date(value,"2006-01-02 15:04:05").Year()) + "-**-**"
+      - name: "email"
+        expression: "repeat(\"*\",len(split(value,\"@\")[0])) + \"@\" + split(value,\"@\")[1]"
+    policies:
+      - name: "Data Scientist Obfuscation"
+        activate: "'data-scientist' in request.auth.roles"
+        obfuscate: >
+          fieldType == 'ISODate' ? isoDate(value) :
+          parentType == 'User' && fieldName == 'email' ? email(value) :
+          default(value)
+```
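+
+The `date` call above only succeeds if the scalar is serialized in the given layout ("2006-01-02 15:04:05").
+If "ISODate" values always start with a four-digit year followed by "-", which is an assumption about the scalar and not something this RFC specifies, a purely string-based sketch avoids parsing altogether:
+
+```yaml
+data_privacy:
+  obfuscation:
+    expressions:
+      # Hypothetical alternative: keep everything before the first "-" (the year)
+      # and mask month and day, without depending on a date layout.
+      - name: "isoDate"
+        expression: >-
+          split(value, "-")[0] + "-**-**"
+```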
+
+### Telemetry: Log a warning when a "fieldType" doesn't exist in the GraphQL Schema
+
+To avoid misconfiguration, we can enable logging of policies that reference field types that don't exist in the GraphQL Schema.
+
+```yaml
+data_privacy:
+  obfuscation:
+    log_undefined_field_types: true
+```
+
+In this case, the Router will parse the expressions on startup and inspect the expression AST to determine whether any referenced fieldTypes don't exist in the GraphQL Schema.
+If it finds any, it will log a warning with the name of the policy and the fieldType that doesn't exist.
+
+Example log message in JSON format:
+
+```json
+{
+  "level": "warn",
+  "message": "Field type 'ISODate' referenced in policy 'Data Scientist Obfuscation' doesn't exist in the GraphQL Schema",
+  "timestamp": "2024-01-01T12:00:00Z"
+}
+```
+
+# Implementation
+
+The "activate" expression will be evaluated once per request and should have access to the request context object.
+If the expression evaluates to true, the "obfuscate" expression will be evaluated for each field in the response.
+If the activation expression evaluates to false, the "obfuscate" expression will be skipped.
+
+The "obfuscate" expression will be evaluated for leaf fields only.
+If the field is a scalar, the expression will be evaluated with the context object defined above.
+
+We can implement this functionality in graphql-go-tools by adding a "leaf node resolver" interface which the Router can implement.
+Depending on the "activate" result, the implementation can be a no-op for the request.
+
+If there are no policies defined, the interface value will be nil, and the Router will not evaluate any expressions.