Skip to content

Conversation

@TheFrozenFire
Copy link
Contributor

Closes #807

Need to add a test to check this behaviour. I've run the whole test suite locally, and nothing breaks - but there's no coverage on the new behaviour.

@yuroitaki yuroitaki requested review from sinui0 and yuroitaki April 24, 2025 05:03
Copy link
Member

@sinui0 sinui0 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need some sort of limit on how many values we will commit to by default.

Maybe it would also make sense to only commit to string and object values in an array by default. Redacting individual numbers in an array seems less useful.

@TheFrozenFire
Copy link
Contributor Author

Maybe it would also make sense to only commit to string and object values in an array by default. Redacting individual numbers in an array seems less useful.

I can think of some scenarios where committing to numbers might be useful. If an array is a list of group IDs your user is a member of, you might want to disclose that you're a member of one group, without disclosing all groups you're a member of.

Or, if the endpoint you're notarizing is outputting a table of data, the individual arrays may represent rows with columns. Some of the columns values might be numbers, and you may wish to reveal only that column.

@TheFrozenFire
Copy link
Contributor Author

TheFrozenFire commented Apr 24, 2025

We need some sort of limit on how many values we will commit to by default.

Should I be trying to tackle that here? Having some sort of defense against a malicious server sending a parser bomb is sensible, but it's not actually an array-specific thing. The adversary could send a deeply nested array (e.g. [[[[[[[[[[[[[[[[[[[[[[[[[[[[[[]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]), or a deeply nested object ({"a":{"a":{ ....).

It seems like it would be preferable to have the builder maintain a counter, and raise an exception when you blow out some limit.

@sinui0
Copy link
Member

sinui0 commented Apr 24, 2025

If the response payload isn't malicious but nonetheless contains a huge array it could cause problems. Though on second thought the storage multiplier for commitment data is worst case only ~64x for encoding commitments. Given we're likely not going to be committing more than low MB it should be fine.

@TheFrozenFire
Copy link
Contributor Author

From discord: TranscriptCommitConfig should be updated to query if it contains a commitment to a specific range. Then we need a bunch of tests asserting that it produces the expected config

That seems like a sensible test, but also, that would create annoyingly fragile tests. I think the tests should instead focus on enforcing backwards compatibility.

For instance, in my first pass, I accidentally broke the existing behaviour which commits to without_values(), which would break existing code which reveals without_values().

I output the commits that the current implementation produces, and here they are, ordered by start index. A valid test may be to check that the whole value, the without_values() and the individual elements are committed. However, I'm not 100% sure what a good test for the separators would be.

[1,2,3,4,5,6,"7",false,3.14]
[]
1
,
2
,
3
,
4
,
5
,
6
,"
7
",
false
,
3.14

@TheFrozenFire
Copy link
Contributor Author

If the response payload isn't malicious but nonetheless contains a huge array it could cause problems. Though on second thought the storage multiplier for commitment data is worst case only ~64x for encoding commitments. Given we're likely not going to be committing more than low MB it should be fine.

IMO, that would be a strong example of where complicating the default implementation for an edge case isn't sensible. If someone encounters that case, they could implement a custom committer which skips particular paths, for instance.

@sinui0
Copy link
Member

sinui0 commented Apr 24, 2025

Not sure what the difference is between what I initially said and what you just suggested. But yes, we can add tests which at the very least let us know when the behavior has been modified.

spansy provides a visitor trait which can be used to implement a behavioral test such as "all types of values are committed to individually": https://github.com/tlsnotary/tlsn-utils/blob/dev/spansy/src/json/visit.rs

Yeah, dealing with separators or other tokens more generally is harder. When I finally get a time to tackle spansy again I was thinking of having it span the tokens as well.

@sinui0 sinui0 changed the base branch from main to dev April 24, 2025 22:15
@sinui0
Copy link
Member

sinui0 commented Apr 24, 2025

Need to run rustfmt

@sinui0
Copy link
Member

sinui0 commented Apr 24, 2025

We can move circuit breaker behavior out of scope

@TheFrozenFire
Copy link
Contributor Author

I think I've implemented the needed tests now. As a matter of interest, I realized that the visitor produces a value which has no span content, and no ranges. I don't think this is an issue that I introduced, and I see no cause for it. I skip that in my commit checker, because it only happens once, and it's an issue out of scope.

@TheFrozenFire
Copy link
Contributor Author

Weird that the formatting checks are insisting that the new json fixture macro calls need to be formatted differently from the http ones.

&self,
direction: Direction,
range: &dyn ToRangeSet<usize>,
kind: Option<TranscriptCommitmentKind>,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Optional arguments are typically avoided because you can't at a glance know what is being omitted at the call site (None tells you nothing about the type). Instead it would be typical to add a contains_with_kind function which accepts this argument.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I updated it to now look for any commit kind with the specified direction and ranges, and added the _with_kind for looking for specific kinds.

Change its logic to look for any commit type with direction, index.
Add contains_with_kind.
Tidying.
.any(|(_, kind)| matches!(kind, TranscriptCommitmentKind::Encoding))
}

/// Returns whether the builder has a commitment for the given direction and
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
/// Returns whether the builder has a commitment for the given direction and
/// Returns `true` if the configuration contains the provided ranges.

Comment on lines +88 to +89
/// Returns whether the builder has a commitment for the given direction and
/// range, with the given kind.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
/// Returns whether the builder has a commitment for the given direction and
/// range, with the given kind.
/// Returns `true` if the configuration contains the provided ranges with the given commitment kind.

Comment on lines +83 to +85
self.commits
.iter()
.any(|((d, i), _)| *d == direction && *i == idx)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This being O(N) isn't great, but ok for now.

};
use tlsn_data_fixtures::json as fixtures;

#[rstest]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
#[rstest]
// Asserts the behavior that every value in the transcript is committed individually.
#[rstest]

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

JSON committer should commit array values individually

2 participants