Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cip: Batched Anchor Data Structure #74

Merged
merged 5 commits into from
Dec 10, 2020
Merged

cip: Batched Anchor Data Structure #74

merged 5 commits into from
Dec 10, 2020

Conversation

oed
Copy link
Member

@oed oed commented Nov 12, 2020

PR for CIP-69.
Discussion #69

@oed oed requested review from stbrody and simonovic86 November 12, 2020 14:10
@oed oed self-assigned this Nov 12, 2020
```js
const { BloomFilter } = require('bloom-filters')
const filterEntries = [...] // An array of strings
const errorRate = 0.04 // 4 % error rate
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why .04? Was that chosen arbitrarily or based on some calculations about the size of the resulting filter?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Kind of arbitrarily. Just took it out of the readme of the bloom-filters package.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah I see. I guess I had been thinking this was supposed to be an example of how someone might consume the information stored in the bloomFilter field of the TreeMetadata to reconstitute the bloom filter and use it, but I guess this is more like example code of how we might build the bloom filter before persisting it in the TreeMetadata. In that case, is this example even really relevant/useful to include in this document?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i don't think it hurts to include?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only risk is confusion/distraction from the main focus of how to use/consume this structure. Probably not a big deal either way though


type TreeMetaData struct {
numEntries Int
bloomType String
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if we want more than just a string for specifying the bloomType here. In order to consume the bloom filter users will have to know the arguments used to construct it (either size of the bloom filter and number of hash functions, or target number of items and false positive rate). To provide that to them we'll either need a public mapping of the bloomType string to the arguments used to construct it (so that we could change the bloomType string if we ever wanted to change the arguments we're using to construct the bloom filter, even if the underlying bloom filter library we're using doesn't change) or we need to encode those arguments here. I think putting them here probably makes more sense

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually we have to put it here, because there's no way to know how many entries are in the bloom filter otherwise. Different anchor merkle trees could wind up with a different number of filter entries, and if we're generating the bloom filter based on the exact known number of entries, then that's going to change the size of the filter.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The library we intend on using will export to a format which includes this information: https://github.com/Callidon/bloom-filters#export-and-import
So the info will be in the bloomFilter property.

2. `schema` - if multiple documents have the same topic, sort by the *schema*
3. `DID` - if multiple documents have the same topic and schema, extract and sorty by the DID from the `kid` in the signed record
3. `controllers` - if multiple documents have the same topic and schema, sort by the first controller, then subsequent ones
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For this to work right, we'll have to enforce that the controllers array in records is always stored sorted, regardless of the order the user specifies them in. Should we make a story for this now so we don't forget? Is there an epic for supporting multiple controllers we could put it in?

Copy link
Member Author

@oed oed Nov 18, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess that's true, unless you already know the order from a previous version of the document. I don't think there is an epic yet, but feel free to create one!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

```js
const { BloomFilter } = require('bloom-filters')
const filterEntries = [...] // An array of strings
const errorRate = 0.04 // 4 % error rate
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah I see. I guess I had been thinking this was supposed to be an example of how someone might consume the information stored in the bloomFilter field of the TreeMetadata to reconstitute the bloom filter and use it, but I guess this is more like example code of how we might build the bloom filter before persisting it in the TreeMetadata. In that case, is this example even really relevant/useful to include in this document?

@oed
Copy link
Member Author

oed commented Nov 18, 2020

@stbrody Pushed some updates here.

@stbrody
Copy link
Contributor

stbrody commented Nov 18, 2020

LGTM!

type TreeMetadata struct {
numEntries Int
bloomType String
bloomFilter {String:Any}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can't seem to make a mult-line suggestion in github but how about instead of separate top-level bloomType and bloomFilter fields we have a single top-level bloomFilter field whose value is an object with a type and a data field? So then we can add more tree metadata in the future and it's not mixed-in with bloom filter specific data

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure.

@oed oed merged commit 33414a6 into master Dec 10, 2020
@oed oed deleted the cip/batched-anchors branch February 16, 2021 08:50
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants