Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

chore: add RFC for advanced data privacy and response obfuscation #1645

Draft
wants to merge 3 commits into
base: main
Choose a base branch
from

Conversation

jensneuse
Copy link
Member

No description provided.

Copy link
Contributor

@maxkomarychev maxkomarychev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks very promising!

I left a few comments in the code.

Here's one more more: the obfuscate is very powerful and very generic. It can contain conditions which is great but my concern is that on a large enough schema those blocks will eventually contain a ton of if/else and ternary operators to enable rules based on type/field/scalar/scalar field etc.

I believe it would be nice to have ability to trigger the rule using yaml. IMO it will significantly improve readability and make it less error prone. It'll also be easier to do codegeneration. Can we have in addition to everything else a field (an object) in yaml which can be used to specify "target" of the obfuscation block using data available in the context object?

this

data_privacy:
    obfuscation:
        expressions:
          - name: "email",
            expression: "repeat(\"*\",len(split(value,\"@\")[0])) + \"@\" + split(value,\"@\")[1]"
        policies:
            - name: "Data Scientist Obfuscation"
              activate: "'data-scientist' in request.auth.roles"
              obfuscate: "typeName == 'User' && fieldName == 'email' ? email(value) : value"
            - name: "Developer Obfuscation"
              activate: "'developer' in request.auth.roles"
              obfuscate: "typeName == 'User' && fieldName == 'email' ? email(value) : value"

could become

data_privacy:
    obfuscation:
        expressions:
          - name: "email",
            expression: "repeat(\"*\",len(split(value,\"@\")[0])) + \"@\" + split(value,\"@\")[1]"
        policies:
            - name: "Data Scientist Obfuscation"
              activate: "'data-scientist' in request.auth.roles"
              target:
                typeName: User
                fieldName: email
              obfuscate: email(value)
            - name: "Developer Obfuscation"
              activate: "'developer' in request.auth.roles"
              target:
                typeName: User
                fieldName: email
              obfuscate: email(value)

thanks!

Co-authored-by: Max Komarychev <[email protected]>
@jensneuse
Copy link
Member Author

This looks very promising!

I left a few comments in the code.

Here's one more more: the obfuscate is very powerful and very generic. It can contain conditions which is great but my concern is that on a large enough schema those blocks will eventually contain a ton of if/else and ternary operators to enable rules based on type/field/scalar/scalar field etc.

I believe it would be nice to have ability to trigger the rule using yaml. IMO it will significantly improve readability and make it less error prone. It'll also be easier to do codegeneration. Can we have in addition to everything else a field (an object) in yaml which can be used to specify "target" of the obfuscation block using data available in the context object?

this

data_privacy:
    obfuscation:
        expressions:
          - name: "email",
            expression: "repeat(\"*\",len(split(value,\"@\")[0])) + \"@\" + split(value,\"@\")[1]"
        policies:
            - name: "Data Scientist Obfuscation"
              activate: "'data-scientist' in request.auth.roles"
              obfuscate: "typeName == 'User' && fieldName == 'email' ? email(value) : value"
            - name: "Developer Obfuscation"
              activate: "'developer' in request.auth.roles"
              obfuscate: "typeName == 'User' && fieldName == 'email' ? email(value) : value"

could become

data_privacy:
    obfuscation:
        expressions:
          - name: "email",
            expression: "repeat(\"*\",len(split(value,\"@\")[0])) + \"@\" + split(value,\"@\")[1]"
        policies:
            - name: "Data Scientist Obfuscation"
              activate: "'data-scientist' in request.auth.roles"
              target:
                typeName: User
                fieldName: email
              obfuscate: email(value)
            - name: "Developer Obfuscation"
              activate: "'developer' in request.auth.roles"
              target:
                typeName: User
                fieldName: email
              obfuscate: email(value)

thanks!

I was initially trying to keep the complexity of the configuration as simple as possible. I was also thinking about something similar to what you're describing. Here's an alternative approach to your idea. Let me know what you think about it.

data_privacy:
    obfuscation:
        expressions:
          - name: "email",
            expression: "repeat(\"*\",len(split(value,\"@\")[0])) + \"@\" + split(value,\"@\")[1]"
        policies:
            - name: "Data Scientist Obfuscation"
              activate: "'data-scientist' in request.auth.roles"
              filter: "typeName == 'User' && fieldName == 'email'"
              obfuscate: email(value)
            - name: "Developer Obfuscation"
              activate: "'developer' in request.auth.roles"
              target:
                typeName: User
                fieldName: email
              obfuscate: email(value)

This achieves the same functionality, but it's more flexible, e.g. we can also filter by scalar name.
That said, as the obfuscate field is an expression itself, we could also simplify, although it's less re-usable then.

data_privacy:
    obfuscation:
        expressions:
          - name: "email",
            expression: "repeat(\"*\",len(split(value,\"@\")[0])) + \"@\" + split(value,\"@\")[1]"
        policies:
            - name: "Data Scientist Obfuscation"
              activate: "'data-scientist' in request.auth.roles"
              obfuscate: "typeName == 'User' && fieldName == 'email' ? email(value) : value"
            - name: "Developer Obfuscation"
              activate: "'developer' in request.auth.roles"
              target:
                typeName: User
                fieldName: email
              obfuscate: email(value)

@jensneuse
Copy link
Member Author

I believe it would be nice to have ability to trigger the rule using yaml. IMO it will significantly improve readability and make it less error prone. It'll also be easier to do codegeneration. Can we have in addition to everything else a field (an object) in yaml which can be used to specify "target" of the obfuscation block using data available in the context object?

@maxkomarychev You mentioned code generation. Can you explain what the relationship would be between obfuscation and code generation? What are you looking to generate?

@maxkomarychev
Copy link
Contributor

maxkomarychev commented Mar 4, 2025

This looks very promising!
I left a few comments in the code.
Here's one more more: the obfuscate is very powerful and very generic. It can contain conditions which is great but my concern is that on a large enough schema those blocks will eventually contain a ton of if/else and ternary operators to enable rules based on type/field/scalar/scalar field etc.
I believe it would be nice to have ability to trigger the rule using yaml. IMO it will significantly improve readability and make it less error prone. It'll also be easier to do codegeneration. Can we have in addition to everything else a field (an object) in yaml which can be used to specify "target" of the obfuscation block using data available in the context object?
this

data_privacy:
    obfuscation:
        expressions:
          - name: "email",
            expression: "repeat(\"*\",len(split(value,\"@\")[0])) + \"@\" + split(value,\"@\")[1]"
        policies:
            - name: "Data Scientist Obfuscation"
              activate: "'data-scientist' in request.auth.roles"
              obfuscate: "typeName == 'User' && fieldName == 'email' ? email(value) : value"
            - name: "Developer Obfuscation"
              activate: "'developer' in request.auth.roles"
              obfuscate: "typeName == 'User' && fieldName == 'email' ? email(value) : value"

could become

data_privacy:
    obfuscation:
        expressions:
          - name: "email",
            expression: "repeat(\"*\",len(split(value,\"@\")[0])) + \"@\" + split(value,\"@\")[1]"
        policies:
            - name: "Data Scientist Obfuscation"
              activate: "'data-scientist' in request.auth.roles"
              target:
                typeName: User
                fieldName: email
              obfuscate: email(value)
            - name: "Developer Obfuscation"
              activate: "'developer' in request.auth.roles"
              target:
                typeName: User
                fieldName: email
              obfuscate: email(value)

thanks!

I was initially trying to keep the complexity of the configuration as simple as possible. I was also thinking about something similar to what you're describing. Here's an alternative approach to your idea. Let me know what you think about it.

data_privacy:
    obfuscation:
        expressions:
          - name: "email",
            expression: "repeat(\"*\",len(split(value,\"@\")[0])) + \"@\" + split(value,\"@\")[1]"
        policies:
            - name: "Data Scientist Obfuscation"
              activate: "'data-scientist' in request.auth.roles"
              filter: "typeName == 'User' && fieldName == 'email'"
              obfuscate: email(value)
            - name: "Developer Obfuscation"
              activate: "'developer' in request.auth.roles"
              target:
                typeName: User
                fieldName: email
              obfuscate: email(value)

This achieves the same functionality, but it's more flexible, e.g. we can also filter by scalar name. That said, as the obfuscate field is an expression itself, we could also simplify, although it's less re-usable then.

data_privacy:
    obfuscation:
        expressions:
          - name: "email",
            expression: "repeat(\"*\",len(split(value,\"@\")[0])) + \"@\" + split(value,\"@\")[1]"
        policies:
            - name: "Data Scientist Obfuscation"
              activate: "'data-scientist' in request.auth.roles"
              obfuscate: "typeName == 'User' && fieldName == 'email' ? email(value) : value"
            - name: "Developer Obfuscation"
              activate: "'developer' in request.auth.roles"
              target:
                typeName: User
                fieldName: email
              obfuscate: email(value)

This is more readable. There is a clear place which selects the field for obfuscation and obfuscation block is simpler too because it doesn't need ternary operation with fallback to "value".

@maxkomarychev
Copy link
Contributor

I believe it would be nice to have ability to trigger the rule using yaml. IMO it will significantly improve readability and make it less error prone. It'll also be easier to do codegeneration. Can we have in addition to everything else a field (an object) in yaml which can be used to specify "target" of the obfuscation block using data available in the context object?

@maxkomarychev You mentioned code generation. Can you explain what the relationship would be between obfuscation and code generation? What are you looking to generate?

If we can allow having simpler expressions in obfuscate then it may be easier to codegenerate entire config. In case if we'd want to maintain custom policies for 50+ fields we could maintain own datastructure outside of the config which then can be translated into expressions easier.

@maxkomarychev
Copy link
Contributor

@jensneuse One more thing: we'd like to be able to filter a field using the interface. Can we have the information to check if a field comes as a result of implementing an interface? e.g.

interface Hello {
  world: String!
}

type One implements Hello {
  world: String
}
type Two {
  world: String
}

I'd like to be able to target only instances of world which exist because of Hello

Introduces a comprehensive RFC for configurable response obfuscation in the Cosmo Router. Key features include:

- Flexible data privacy policies with role-based and type-based targeting
- Configurable transformers for obfuscating sensitive fields
- Support for dynamic activation rules using expr-lang
- Detailed configuration options for handling undefined fields
- Comprehensive schema validation and performance considerations
```yaml
targets:
# Match a specific field on a specific type
- type: "User"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will this work for interfaces?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When we resolve a response, we always resolve a concrete type, not an interface. Consequently, you need to define all types implementing an interface. Or to put it another way, you can ignore interfaces.

date(value, "2006-01-02 15:04:05").Year() + "-**-**"
# Obfuscation policies
policies:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

are policies supposed to contain mutually exclusive conditions in their activate blocks? can multiple policies be applied one after another?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Multiple policies can be active, yes, but if one active policy matches a field, we will not evaluate further. Is this in line with your expectations or are you thinking of having multiple policies active?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we don't need multiple policies to be active. "first policy wins" is good enough 👍

validate_transformers: true # Validate transformer return types on startup

# Reusable transformers
transformers:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will transformers be invoked for null values?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a use case of obfuscating null? I'd say that we don't obfuscate null and just ignore null values.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

works for me

"fieldName": "email",
"fieldType": "String",
"parentType": "User",
"value": "[email protected]",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can this be null?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

scalar fields can be null, yes

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in the answer above you said we don't deal with nulls 🤔

{
"fieldName": "email",
"fieldType": "String",
"parentType": "User",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we expect interface names here?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no, the parent will always be a concrete type


# For complex conditions not easily expressed in YAML
# Custom rule is only evaluated if no target rule matches
custom: |
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is custom optional? is it correct to assume fields not matched by targets will go unchanged if custom is missing?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's optional, yes
if custom is missing and no target matches, fields will be unchanged, yes
however, it's possible to define a default transformer if you don't have a custom function

Copy link
Contributor

@maxkomarychev maxkomarychev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Love the v2. custom and default_transform, targets are great!

Could you clarify please if the matchers will be called for null values and if we can use interfaces in targets?


### Target-Based Field Selection

The `targets` field provides a readable, YAML-based way to specify which fields should be obfuscated and how:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it make sense to change field to fields in case you want to apply the same obfuscation function to multiple fields in one type? That might omit a bit of redundant configuration

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we can, in fact, we can allow "fields" to be a selection set, so you can just define multiple space delimited fields

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why space delimited and not a yaml array?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yaml array is probably better 👍

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants