-
Notifications
You must be signed in to change notification settings - Fork 23
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
cip: Batched Anchor Data Structure #74
Conversation
```js | ||
const { BloomFilter } = require('bloom-filters') | ||
const filterEntries = [...] // An array of strings | ||
const errorRate = 0.04 // 4 % error rate |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why .04? Was that chosen arbitrarily or based on some calculations about the size of the resulting filter?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Kind of arbitrarily. Just took it out of the readme of the bloom-filters
package.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah I see. I guess I had been thinking this was supposed to be an example of how someone might consume the information stored in the bloomFilter
field of the TreeMetadata
to reconstitute the bloom filter and use it, but I guess this is more like example code of how we might build the bloom filter before persisting it in the TreeMetadata. In that case, is this example even really relevant/useful to include in this document?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i don't think it hurts to include?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Only risk is confusion/distraction from the main focus of how to use/consume this structure. Probably not a big deal either way though
|
||
type TreeMetaData struct { | ||
numEntries Int | ||
bloomType String |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if we want more than just a string for specifying the bloomType here. In order to consume the bloom filter users will have to know the arguments used to construct it (either size of the bloom filter and number of hash functions, or target number of items and false positive rate). To provide that to them we'll either need a public mapping of the bloomType
string to the arguments used to construct it (so that we could change the bloomType
string if we ever wanted to change the arguments we're using to construct the bloom filter, even if the underlying bloom filter library we're using doesn't change) or we need to encode those arguments here. I think putting them here probably makes more sense
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually we have to put it here, because there's no way to know how many entries are in the bloom filter otherwise. Different anchor merkle trees could wind up with a different number of filter entries, and if we're generating the bloom filter based on the exact known number of entries, then that's going to change the size of the filter.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The library we intend on using will export to a format which includes this information: https://github.com/Callidon/bloom-filters#export-and-import
So the info will be in the bloomFilter
property.
2. `schema` - if multiple documents have the same topic, sort by the *schema* | ||
3. `DID` - if multiple documents have the same topic and schema, extract and sorty by the DID from the `kid` in the signed record | ||
3. `controllers` - if multiple documents have the same topic and schema, sort by the first controller, then subsequent ones |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For this to work right, we'll have to enforce that the controllers
array in records is always stored sorted, regardless of the order the user specifies them in. Should we make a story for this now so we don't forget? Is there an epic for supporting multiple controllers we could put it in?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess that's true, unless you already know the order from a previous version of the document. I don't think there is an epic yet, but feel free to create one!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
```js | ||
const { BloomFilter } = require('bloom-filters') | ||
const filterEntries = [...] // An array of strings | ||
const errorRate = 0.04 // 4 % error rate |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah I see. I guess I had been thinking this was supposed to be an example of how someone might consume the information stored in the bloomFilter
field of the TreeMetadata
to reconstitute the bloom filter and use it, but I guess this is more like example code of how we might build the bloom filter before persisting it in the TreeMetadata. In that case, is this example even really relevant/useful to include in this document?
@stbrody Pushed some updates here. |
LGTM! |
type TreeMetadata struct { | ||
numEntries Int | ||
bloomType String | ||
bloomFilter {String:Any} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can't seem to make a mult-line suggestion in github but how about instead of separate top-level bloomType
and bloomFilter
fields we have a single top-level bloomFilter
field whose value is an object with a type
and a data
field? So then we can add more tree metadata in the future and it's not mixed-in with bloom filter specific data
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure.
PR for CIP-69.
Discussion #69