chore: add RFC for advanced data privacy and response obfuscation #1645

jensneuse · 2025-03-03T13:00:07Z

No description provided.

maxkomarychev

This looks very promising!

I left a few comments in the code.

Here's one more: the obfuscate field is very powerful and very generic. It can contain conditions, which is great, but my concern is that on a large enough schema those blocks will eventually contain a ton of if/else and ternary operators to enable rules based on type/field/scalar/scalar field etc.

I believe it would be nice to be able to trigger the rule using YAML. IMO it will significantly improve readability and make it less error-prone. It'll also be easier to do code generation. Can we have, in addition to everything else, a field (an object) in YAML that specifies the "target" of the obfuscation block using data available in the context object?

This:

data_privacy:
    obfuscation:
        expressions:
          - name: "email"
            expression: "repeat(\"*\",len(split(value,\"@\")[0])) + \"@\" + split(value,\"@\")[1]"
        policies:
            - name: "Data Scientist Obfuscation"
              activate: "'data-scientist' in request.auth.roles"
              obfuscate: "typeName == 'User' && fieldName == 'email' ? email(value) : value"
            - name: "Developer Obfuscation"
              activate: "'developer' in request.auth.roles"
              obfuscate: "typeName == 'User' && fieldName == 'email' ? email(value) : value"

could become:

data_privacy:
    obfuscation:
        expressions:
          - name: "email"
            expression: "repeat(\"*\",len(split(value,\"@\")[0])) + \"@\" + split(value,\"@\")[1]"
        policies:
            - name: "Data Scientist Obfuscation"
              activate: "'data-scientist' in request.auth.roles"
              target:
                typeName: User
                fieldName: email
              obfuscate: email(value)
            - name: "Developer Obfuscation"
              activate: "'developer' in request.auth.roles"
              target:
                typeName: User
                fieldName: email
              obfuscate: email(value)

thanks!
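
To make the two ideas in this comment concrete, here is a minimal Python sketch. It is not the router's implementation (the RFC evaluates expr-lang expressions); `mask_email` simply mirrors the `email` expression above, and `matches_target` is a hypothetical helper that compares a `target` object with the context keys `typeName` and `fieldName` used in the examples.

```python
def mask_email(value: str) -> str:
    """Mirror of the `email` expression: star out the local part, keep the domain."""
    parts = value.split("@")
    return "*" * len(parts[0]) + "@" + parts[1]


def matches_target(target: dict, ctx: dict) -> bool:
    """Hypothetical matcher: every key in `target` must equal the same key in the context."""
    return all(ctx.get(key) == expected for key, expected in target.items())


ctx = {"typeName": "User", "fieldName": "email", "value": "jane@example.com"}
target = {"typeName": "User", "fieldName": "email"}

if matches_target(target, ctx):
    print(mask_email(ctx["value"]))  # prints ****@example.com
```

With a declarative `target`, the obfuscate expression no longer needs the `? ... : value` fallback, because non-matching fields never reach it.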

rfc/data-privacy/v1.md

Co-authored-by: Max Komarychev <[email protected]>

jensneuse · 2025-03-04T08:35:30Z

I was initially trying to keep the configuration as simple as possible, but I was also thinking about something similar to what you're describing. Here's an alternative approach to your idea. Let me know what you think about it.

data_privacy:
    obfuscation:
        expressions:
          - name: "email"
            expression: "repeat(\"*\",len(split(value,\"@\")[0])) + \"@\" + split(value,\"@\")[1]"
        policies:
            - name: "Data Scientist Obfuscation"
              activate: "'data-scientist' in request.auth.roles"
              filter: "typeName == 'User' && fieldName == 'email'"
              obfuscate: email(value)
            - name: "Developer Obfuscation"
              activate: "'developer' in request.auth.roles"
              target:
                typeName: User
                fieldName: email
              obfuscate: email(value)

This achieves the same functionality, but it's more flexible, e.g. we can also filter by scalar name.
That said, as the obfuscate field is an expression itself, we could also simplify by folding the condition into it, although it's less reusable then.

data_privacy:
    obfuscation:
        expressions:
          - name: "email"
            expression: "repeat(\"*\",len(split(value,\"@\")[0])) + \"@\" + split(value,\"@\")[1]"
        policies:
            - name: "Data Scientist Obfuscation"
              activate: "'data-scientist' in request.auth.roles"
              obfuscate: "typeName == 'User' && fieldName == 'email' ? email(value) : value"
            - name: "Developer Obfuscation"
              activate: "'developer' in request.auth.roles"
              target:
                typeName: User
                fieldName: email
              obfuscate: email(value)
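
A rough Python sketch of the `filter` idea, assuming the filter is a boolean predicate over the same context the obfuscate expression sees. `apply_policy` and the `scalarName` context key are hypothetical names used only for illustration; the real policies would be expr-lang strings.

```python
def mask_email(value: str) -> str:
    """Mirror of the `email` expression from the config above."""
    parts = value.split("@")
    return "*" * len(parts[0]) + "@" + parts[1]


def apply_policy(filter_fn, obfuscate, ctx):
    """Run `obfuscate` only when `filter_fn` selects the field; otherwise pass the value through."""
    return obfuscate(ctx["value"]) if filter_fn(ctx) else ctx["value"]


# filter: "typeName == 'User' && fieldName == 'email'"
def by_field(ctx):
    return ctx["typeName"] == "User" and ctx["fieldName"] == "email"


# Hypothetical scalar-based filter; the context key `scalarName` is an assumption.
def by_scalar(ctx):
    return ctx.get("scalarName") == "Email"


ctx = {"typeName": "User", "fieldName": "email", "scalarName": "Email", "value": "jane@example.com"}
print(apply_policy(by_field, mask_email, ctx))   # prints ****@example.com
print(apply_policy(by_scalar, mask_email, ctx))  # prints ****@example.com
```

Because the filter is a free-form expression, the same policy shape covers type, field, and scalar matching without new YAML keys.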

jensneuse · 2025-03-04T08:37:09Z

@maxkomarychev You mentioned code generation. Can you explain what the relationship would be between obfuscation and code generation? What are you looking to generate?

maxkomarychev · 2025-03-04T09:31:21Z

This is more readable. There is one clear place that selects the field for obfuscation, and the obfuscate block is simpler too because it doesn't need a ternary operation with a fallback to "value".

maxkomarychev · 2025-03-04T09:34:25Z

If we can allow simpler expressions in obfuscate, it may be easier to generate the entire config. If we wanted to maintain custom policies for 50+ fields, we could keep our own data structure outside of the config and translate it into expressions more easily.
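
A sketch of what such code generation could look like, assuming the `target` shape proposed earlier in this thread. The `RULES` table, `to_policy`, and `render` are hypothetical; the YAML is rendered by hand to keep the example free of third-party dependencies.

```python
# Hypothetical in-house rule table kept outside the router config:
# (role, type name, field name, transformer name)
RULES = [
    ("data-scientist", "User", "email", "email"),
    ("developer", "User", "email", "email"),
]


def to_policy(role, type_name, field_name, transformer):
    """Translate one table row into a policy using the proposed `target` object."""
    return {
        "name": f"{role} {type_name}.{field_name}",
        "activate": f"'{role}' in request.auth.roles",
        "target": {"typeName": type_name, "fieldName": field_name},
        "obfuscate": f"{transformer}(value)",
    }


def render(policies):
    """Render the policies as YAML text."""
    lines = ["policies:"]
    for p in policies:
        lines.append(f'  - name: "{p["name"]}"')
        lines.append(f'    activate: "{p["activate"]}"')
        lines.append("    target:")
        lines.append(f'      typeName: {p["target"]["typeName"]}')
        lines.append(f'      fieldName: {p["target"]["fieldName"]}')
        lines.append(f'    obfuscate: {p["obfuscate"]}')
    return "\n".join(lines)


policies = [to_policy(*rule) for rule in RULES]
print(render(policies))
```

The generator never has to build a ternary expression string, which is the part that becomes error-prone once the table grows.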

maxkomarychev · 2025-03-04T19:59:44Z

@jensneuse One more thing: we'd like to be able to filter fields by interface. Can the context include the information needed to check whether a field exists as a result of implementing an interface? E.g.:

interface Hello {
  world: String!
}

type One implements Hello {
  world: String!
}
type Two {
  world: String
}

I'd like to be able to target only the instances of world that exist because of Hello.

rfc/data-privacy/v1.md

Introduces a comprehensive RFC for configurable response obfuscation in the Cosmo Router. Key features include:

- Flexible data privacy policies with role-based and type-based targeting
- Configurable transformers for obfuscating sensitive fields
- Support for dynamic activation rules using expr-lang
- Detailed configuration options for handling undefined fields
- Comprehensive schema validation and performance considerations

maxkomarychev · 2025-03-12T10:53:22Z

rfc/data-privacy/v2.md

+```yaml
+targets:
+  # Match a specific field on a specific type
+  - type: "User"


Will this work for interfaces?

jensneuse: When we resolve a response, we always resolve a concrete type, not an interface. Consequently, you need to define targets for all types implementing an interface. Or to put it another way, you can ignore interfaces.
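
Since targets only ever see concrete types, one option is to expand an interface into per-type targets before writing the config. A minimal sketch, assuming you can obtain an interface-to-implementers map from your schema; `IMPLEMENTATIONS`, `expand_interface_target`, and the `transform` key are hypothetical and not part of the RFC.

```python
# Hypothetical schema metadata: interface name -> concrete types implementing it.
# `Two` also has a `world` field but does not implement `Hello`, so it is absent.
IMPLEMENTATIONS = {"Hello": ["One"]}


def expand_interface_target(interface: str, field: str, transform: str) -> list:
    """Produce one concrete-type target per type that implements `interface`."""
    return [
        {"type": type_name, "field": field, "transform": transform}
        for type_name in IMPLEMENTATIONS.get(interface, [])
    ]


print(expand_interface_target("Hello", "world", "redact"))
```

This keeps the router's matching logic interface-free while still letting the config author think in terms of interfaces.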

maxkomarychev · 2025-03-12T10:56:43Z

rfc/data-privacy/v2.md

+          date(value, "2006-01-02 15:04:05").Year() + "-**-**"
+    
+    # Obfuscation policies
+    policies:


Are policies supposed to contain mutually exclusive conditions in their activate blocks? Can multiple policies be applied one after another?

jensneuse: Multiple policies can be active, yes, but if one active policy matches a field, we will not evaluate further. Is this in line with your expectations or are you thinking of having multiple policies active?

maxkomarychev: We don't need multiple policies to be active. "First policy wins" is good enough 👍
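
The "first policy wins" behaviour agreed on here can be sketched as follows. The dict layout and the `redact` transformer are hypothetical stand-ins for the YAML policies and expr-lang expressions in the RFC.

```python
def mask_email(value: str) -> str:
    parts = value.split("@")
    return "*" * len(parts[0]) + "@" + parts[1]


def redact(value: str) -> str:
    return "[redacted]"


def is_user_email(field: dict) -> bool:
    return field["parentType"] == "User" and field["fieldName"] == "email"


# Policies are evaluated in declaration order.
POLICIES = [
    {"role": "data-scientist", "matches": is_user_email, "transform": mask_email},
    {"role": "developer", "matches": is_user_email, "transform": redact},
]


def obfuscate_field(roles, field):
    for policy in POLICIES:
        if policy["role"] in roles and policy["matches"](field):
            # First active policy that matches wins; later policies are not evaluated.
            return policy["transform"](field["value"])
    return field["value"]  # no active policy matched: value is unchanged


field = {"parentType": "User", "fieldName": "email", "value": "jane@example.com"}
print(obfuscate_field(["data-scientist", "developer"], field))  # prints ****@example.com
print(obfuscate_field(["developer"], field))                    # prints [redacted]
```

With both roles present, only the first policy's transformer runs, so declaration order becomes part of the policy semantics.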

maxkomarychev · 2025-03-12T11:02:55Z

rfc/data-privacy/v2.md

+      validate_transformers: true # Validate transformer return types on startup
+
+    # Reusable transformers
+    transformers:


Will transformers be invoked for null values?

jensneuse: Is there a use case for obfuscating null? I'd say that we don't obfuscate null and just ignore null values.

maxkomarychev: Works for me.
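
A small sketch of the null handling agreed on in this thread, assuming it is implemented as a guard in front of every transformer. `apply_transformer` and `mask_all` are hypothetical names.

```python
from typing import Callable, Optional


def apply_transformer(transformer: Callable[[str], str], value: Optional[str]) -> Optional[str]:
    """Leave null values untouched; call the transformer only for real values."""
    if value is None:
        return None
    return transformer(value)


def mask_all(value: str) -> str:
    return "*" * len(value)


print(apply_transformer(mask_all, "secret"))  # prints ******
print(apply_transformer(mask_all, None))      # prints None
```

Transformer authors then never need to handle a null input themselves.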

maxkomarychev · 2025-03-12T11:03:08Z

rfc/data-privacy/v2.md

+  "fieldName": "email",
+  "fieldType": "String",
+  "parentType": "User",
+  "value": "[email protected]",


Can this be null?

jensneuse: Scalar fields can be null, yes.

maxkomarychev: In the answer above you said we don't deal with nulls 🤔

maxkomarychev · 2025-03-12T11:03:36Z

rfc/data-privacy/v2.md

+{
+  "fieldName": "email",
+  "fieldType": "String",
+  "parentType": "User",


Can we expect interface names here?

jensneuse: No, the parent will always be a concrete type.

maxkomarychev · 2025-03-12T11:05:04Z

rfc/data-privacy/v2.md

+
+        # For complex conditions not easily expressed in YAML
+        # Custom rule is only evaluated if no target rule matches
+        custom: |


Is custom optional? Is it correct to assume that fields not matched by targets will go unchanged if custom is missing?

jensneuse: It's optional, yes. If custom is missing and no target matches, fields will be unchanged. However, it's possible to define a default transformer if you don't have a custom function.
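
One possible reading of this precedence, sketched in Python: targets first, then the custom rule, then a default transformer, and otherwise the value is left alone. The `resolve` function and the `transform` key are hypothetical; the context keys `parentType`, `fieldName`, and `value` follow the example context in the RFC.

```python
def resolve(field, targets, custom=None, default_transform=None):
    # 1. Target rules are checked first.
    for target in targets:
        if target["type"] == field["parentType"] and target["field"] == field["fieldName"]:
            return target["transform"](field["value"])
    # 2. The custom rule is only evaluated if no target rule matched.
    if custom is not None:
        return custom(field)
    # 3. Without a custom rule, an optional default transformer applies.
    if default_transform is not None:
        return default_transform(field["value"])
    # 4. Otherwise the field is returned unchanged.
    return field["value"]


targets = [{"type": "User", "field": "email", "transform": lambda v: "***"}]
email = {"parentType": "User", "fieldName": "email", "value": "jane@example.com"}
name = {"parentType": "User", "fieldName": "name", "value": "Jane"}

print(resolve(email, targets))                                        # prints ***
print(resolve(name, targets))                                         # prints Jane
print(resolve(name, targets, default_transform=lambda v: "[hidden]"))  # prints [hidden]
```

How a default transformer interacts with a custom rule that is present is not settled in this thread; the sketch lets the custom rule take over entirely.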

maxkomarychev

Love the v2. custom, default_transform, and targets are great!

Could you please clarify whether the matchers will be called for null values and whether we can use interfaces in targets?

Noroth · 2025-03-12T11:32:59Z

rfc/data-privacy/v2.md

+
+### Target-Based Field Selection
+
+The `targets` field provides a readable, YAML-based way to specify which fields should be obfuscated and how:


Would it make sense to change field to fields in case you want to apply the same obfuscation function to multiple fields in one type? That would save a bit of redundant configuration.

jensneuse: We can. In fact, we can allow "fields" to be a selection set, so you can just define multiple space-delimited fields.

Noroth: Why space-delimited and not a YAML array?

jensneuse: A YAML array is probably better 👍
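
A sketch of how a loader might accept both shapes, assuming a target carries either a single `field` or a `fields` array. `expand_target` is a hypothetical helper; the final key names are up to the RFC.

```python
def expand_target(target: dict) -> list:
    """Normalize a target with either `field` or `fields` into one entry per field."""
    fields = target["fields"] if "fields" in target else [target["field"]]
    return [{"type": target["type"], "field": name} for name in fields]


print(expand_target({"type": "User", "fields": ["email", "phone"]}))
print(expand_target({"type": "User", "field": "email"}))
```

Normalizing up front means the matching code only ever deals with one type/field pair at a time.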

chore: add RFC for advanced data privacy and response obfuscation

5eb42b9

maxkomarychev reviewed Mar 4, 2025

Update rfc/data-privacy/v1.md

5b7da94

Co-authored-by: Max Komarychev <[email protected]>

SkArchon reviewed Mar 5, 2025

maxkomarychev reviewed Mar 12, 2025

Noroth reviewed Mar 12, 2025
