fix(CIP-145): updates from forum discussion #149
base: main
CIPs/cip-145.md
Outdated
Using multi-prev Data Events allows us to reduce the number of uncovered events and converge the stream so that there is
only a single uncovered event, without any data abandoned on pruned branches. The stream's converged/diverged state can
be determined by looking at the `prev` fields of all the Data Events for that stream.
1. If a stream is in a diverged state (see events `A` and `B` in fig. 5), we consider branches that contain dominant
Why would time events get pruned?
Consider Fig. 6:
- Data B expires at time `3<n<4`
- Data C has a time event at `4`
- If Time 3 is pruned, then it's unclear why Data B should remain valid
Yes, good question. @AaronGoldman and I had a good discussion about this scenario.
In this example, `Data C` can only be created after `Data B` has been validated to either be within the expiration timeout or to already have a valid Time Event.
We can assume that the author of `Data C` validated `Data B` because `Data C` follows `Data B`.
Of course, if `Time 3` for `Data B` is available before `Data C` is created, the CID of `Time 3` will be included in `Data C`'s multi-prev.
To your point, @m0ar, including `Time 1`'s descendants allows multi-prev to cover existing history (even if pruned) in the stream state, whereas today, pruned events are lost from the stream state.
> We can assume that the author of `Data C` validated `Data B` because `Data C` follows `Data B`.

Can we really make this assumption? In theory the author of `Data C` could "import" `Data B` into the log even though it's invalid, e.g. `Time 3` doesn't exist or is too late.
> > We can assume that the author of `Data C` validated `Data B` because `Data C` follows `Data B`.
>
> Can we really make this assumption? In theory the author of `Data C` could "import" `Data B` into the log even though it's invalid, e.g. `Time 3` doesn't exist or is too late.

Events like `Time 3` are definitely worth pinning, and potentially worth including in the multi-prev of subsequent Data Events.
We feel that the validity of `Data C` does not require proof of validity of `Data A` or `Data B`. However, establishing the latter's time bounds does require recording `Time 2` and `Time 3`. `Time 2` is already part of `Data C`'s history. Including `Time 3` in a future multi-prev would ensure that it also becomes part of the stream history.
> We feel that the validity of `Data C` does not require proof of validity of `Data A` or `Data B`. However, establishing the latter's time bounds does require recording `Time 2` and `Time 3`. `Time 2` is already part of `Data C`'s history. Including `Time 3` in a future multi-prev would ensure that it also becomes part of the stream history.

Agree that `Data C`'s validity is not predicated on the validity of `A` or `B`. However, the validity of `Data B` is predicated on `Time 3` being available. Therefore, it doesn't seem right to say that we can prune `Time 3`.
> > We feel that the validity of `Data C` does not require proof of validity of `Data A` or `Data B`. However, establishing the latter's time bounds does require recording `Time 2` and `Time 3`. `Time 2` is already part of `Data C`'s history. Including `Time 3` in a future multi-prev would ensure that it also becomes part of the stream history.
>
> Agree that `Data C`'s validity is not predicated on the validity of `A` or `B`. However, the validity of `Data B` is predicated on `Time 3` being available. Therefore, it doesn't seem right to say that we can prune `Time 3`.
Yes, what would you think about including `Time 3` in the multi-prev of the next event to be added to the stream? That's what we meant to say here:

> Events like `Time 3` are definitely worth pinning, and potentially worth including in the multi-prev of subsequent Data Events.

What if a new event `Data D` (occurring after `Time 4`) had a `prev` of `[Time 4, Time 3]`? Even though `Time 3` will not take precedence during tip selection, it will always remain part of the stream history.
Or, let's say, `Time 3` didn't show up until we already had `Time 4` -> `Data D` -> `Time 5` -> `Data E` -> `Time 6`. Then `Data F` (occurring after `Time 6`) would have a `prev` of `[Time 6, Time 3]`. `Data B` would remain in an "unverified" state until `Time 3` was discovered.
Tracking `Data B`'s validity this way would be a little more complicated than the usual flow, but always possible. Moreover, now all events related to the stream would be part of the DAG, which is great.
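
To make the second scenario concrete, here is a minimal sketch of how a `prev` list could be assembled when a Time Event shows up late. The function name `build_prev`, its parameters, and the use of plain strings in place of real CIDs are all assumptions made for illustration; none of this is taken from the CIP text or an existing implementation.

```python
def build_prev(current_tip, known_time_events, covered):
    """Return the prev list for a new Data Event (hypothetical helper).

    current_tip: label of the event the new Data Event follows.
    known_time_events: Time Events the author currently knows about.
    covered: events that are already part of the stream history.
    The tip goes first; known Time Events that are not yet covered follow,
    so they become part of the DAG without taking precedence.
    """
    late = [
        cid for cid in known_time_events
        if cid not in covered and cid != current_tip
    ]
    return [current_tip] + late


# Time 3 arrives only after Time 4 -> Data D -> Time 5 -> Data E -> Time 6.
covered = {"Time 4", "Data D", "Time 5", "Data E", "Time 6"}
data_f_prev = build_prev("Time 6", ["Time 6", "Time 3"], covered)
print(data_f_prev)  # ['Time 6', 'Time 3']
```

Putting the tip first is only a convention chosen for this sketch; the thread leaves the ordering of `prev` entries open.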
Yes, this is what I mean! Just wanted to be clear that `Time 3` can't be pruned if we expect the creator of `Data D` to include it in `prev`.
@smrz2001 since it seems that we are in agreement, maybe you can update the language to be clear that the Time Event doesn't get pruned?
CIPs/cip-145.md
Outdated
3. The Data Event that is covered by the earliest Time Event wins (see event `A` in fig. 5).
4. If two Data Events share the earliest timestamp, then the branch of the Data Event with the lower CID wins.
What does it mean for a Data Event to "win" in the context of multiple prev?
We updated the wording. A Data Event "winning" here meant that that Data Event would be marked the tip by the protocol. An application would be able to make use of this information to create a merge Data Event, though other candidate CIDs would also be present in the `prev` field.
Ok I don't understand why we need to distinguish between winning and non-winning tips? Why not just call them all tips and put them all in the `prev` field? I don't follow why the protocol needs to care about "winning"?
e.g. The protocol just gives the application a list of tips. It's up to the application to decide what is "winning" and what's not. It's also up to the application to choose the order of tips in its `prev` array when it's being constructed.
I get where you're going, but I'm not sure how well interoperability would work if two projects have different ideas on tip consensus 🤔
Or by application, do you mean the Ceramic node? As in, the stream type implementation would decide how to resolve conflicts? I think that would make sense if so, as there may be other valid interpretations of this depending on the stream type.
Yes, when I say protocol here I'm referring to the event streaming protocol. A stream type handler is an application.
Makes full-on sense then 👌
> Ok I don't understand why we need to distinguish between winning and non-winning tips? Why not just call them all tips and put them all in the `prev` field? I don't follow why the protocol needs to care about "winning"?
>
> e.g. The protocol just gives the application a list of tips. It's up to the application to decide what is "winning" and what's not. It's also up to the application to choose the order of tips in its `prev` array when it's being constructed.

To answer your question, @oed, there are two reasons for this:
- It is simpler for applications to just be given a tip per some default, predictable algorithm. They can choose to override this order, but don't have to.
- There is always some eventually consistent state of a diverged stream, even if the controller never comes back to create the Merge Event, because the default precedence rules can be used to determine the tip.
Wouldn't the application need to be given all tips anyway? In order to include them all in the `prev` array? Are we simply talking about the ordering of the CIDs in the returned array here?
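
For reference while reading this exchange, the set of candidate tips can be derived from the `prev` fields alone. The sketch below assumes that an "uncovered" event is one that no other event lists in its `prev`, and it represents the stream as a plain dict from an event label to its `prev` list. Both the representation and the helper names are hypothetical.

```python
def uncovered_events(prev_by_cid):
    """Return the events that no other event references in its prev."""
    referenced = {p for prevs in prev_by_cid.values() for p in prevs}
    return sorted(cid for cid in prev_by_cid if cid not in referenced)


def is_converged(prev_by_cid):
    """A stream counts as converged here when exactly one event is uncovered."""
    return len(uncovered_events(prev_by_cid)) == 1


# Two Data Events fork from the same Init event: the stream is diverged.
diverged = {"Init": [], "Data A": ["Init"], "Data B": ["Init"]}
print(uncovered_events(diverged))  # ['Data A', 'Data B']

# A merge event listing both branches in prev converges the stream again.
merged = {**diverged, "Data C": ["Data A", "Data B"]}
print(uncovered_events(merged))  # ['Data C']
```

Under this reading the application always receives every candidate tip; the default rules would only affect which one is listed first.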
CIPs/cip-145.md
Outdated
3. This branch later forks into additional branches for `Data E`, `Time 4`, and `Data F`. Based on rules (3) and (4),
the branches for `Data E` and `Data F` are the only branches considered for tip selection.
[Image: Alt text]
4. Based on rule (5), since there is a Time Event for `Data E` but not for `Data F`, only the branch for `Data E` is
I think this should be rule 4.
Suggested change:
- 4. Based on rule (5), since there is a Time Event for `Data E` but not for `Data F`, only the branch for `Data E` is
+ 4. Based on rule (4), since there is a Time Event for `Data E` but not for `Data F`, only the branch for `Data E` is
Can we end up in the case where there is a yet-unknown, earlier anchor for the other branch in transit?
Good catch, yes, it should be rule 4. It is time that made the decision because a Data Event without a Time Event is as if it occurred at time infinity.

> Can we end up in the case where there is a yet-unknown, earlier anchor for the other branch in transit?

Yes, that's possible. If there was an as-yet-unknown `Time 5.5` corresponding to `Data F`, then `Time 5.5` becomes the tip.
While a lot less likely in the absence of a malicious CAS attempting a late-publishing attack, it is also possible, for example, for a `Time 1.5` covering `Data B` to be discovered late. This would rewind the state of the stream, marking `Time 5` the tip.
Having said that, this spec provides a way for the application to resolve such a situation without data loss. A user can decide whether to override the default tip with a new event, while keeping the stream history intact.
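
The default precedence discussed in this thread (earliest covering Time Event first, lower CID on a tie, no Time Event treated as time infinity) can be sketched as a single sort key. The function `select_tip`, the string labels standing in for CIDs, the lexicographic comparison used for "lower CID", and the timestamps are all illustrative assumptions rather than values from the CIP or its figures.

```python
import math


def select_tip(candidates, anchor_time):
    """Pick the default tip among competing Data Events (hypothetical helper).

    candidates: labels of the competing Data Events.
    anchor_time: maps a label to the timestamp of its covering Time Event;
    a missing entry means no Time Event is known yet, i.e. time infinity.
    """
    return min(
        candidates,
        key=lambda cid: (anchor_time.get(cid, math.inf), cid),
    )


candidates = ["data-e", "data-f"]
anchors = {"data-e": 50}

# Only data-e is anchored, so it precedes the un-anchored data-f.
tip_before = select_tip(candidates, anchors)

# A Time Event for data-f with an earlier timestamp is discovered late,
# and re-running the same rule moves the default tip.
anchors["data-f"] = 40
tip_after = select_tip(candidates, anchors)

print(tip_before, tip_after)  # data-e data-f
```

Because the rule is a pure function of the known events, every node that has seen the same events arrives at the same default tip, and an application can still override it by writing a new event.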
CIPs/cip-145.md
Outdated
be determined by looking at the `prev` fields of all the Data Events for that stream.
1. If a stream is in a diverged state, each uncovered event is a candidate tip.
2. Branches that do not contain dominant Data Events cannot be the tip.
3. For branches that contain dominant Data Events, consider the earliest Data Event after a fork point.
Would be nice to clarify that it's not just this event that will be included in the new tip, but its corresponding branch. I like `first` more than `earliest`, because the latter made me think of anchor time instead of ordering.
Suggested change:
- 3. For branches that contain dominant Data Events, consider the earliest Data Event after a fork point.
+ 3. For branches that contain dominant Data Events, consider the first Data Event after a fork point when electing a new tip branch.
Yes, we actually did mean to refer to anchor time here. This helps keep the language consistent with other places.
We hope this earlier clarification of a `fork point` helps explain what we mean:
* A `fork point` for a branch is the earliest event on that branch that is not on another branch.
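
As a rough illustration of that definition, the sketch below models each branch as a list of event labels ordered from the start of the stream to the branch's uncovered event. That list representation and the name `fork_point` are assumptions made for this example only.

```python
def fork_point(branch, other_branches):
    """Return the first event on `branch` that appears on no other branch.

    Returns None if every event on the branch is shared with another branch.
    """
    shared = set().union(*other_branches)
    for cid in branch:
        if cid not in shared:
            return cid
    return None


branch_a = ["Init", "Data A", "Time 1"]
branch_b = ["Init", "Data B"]

print(fork_point(branch_a, [branch_b]))  # Data A
print(fork_point(branch_b, [branch_a]))  # Data B
```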
CIPs/cip-145.md
Outdated
1. If a stream is in a diverged state, each uncovered event is a candidate tip.
2. Branches that do not contain dominant Data Events cannot be the tip.
3. For branches that contain dominant Data Events, consider the earliest Data Event after a fork point.
4. The branch with the earliest Data Event becomes the tip.
It might be worth mentioning the case for step 4 in the example below, where one candidate branch is anchored and one isn't. Is the anchored one earlier by definition?
Yes, the anchored one is earlier by definition. A Data Event without a Time Event is as if it occurred at time infinity.
We'll update the rule to state this.
CIPs/cip-145.md
Outdated
Using multi-prev Data Events allows us to reduce the number of uncovered events and converge the stream so that there is
only a single uncovered event, without any data abandoned on pruned branches. The stream's converged/diverged state can
be determined by looking at the `prev` fields of all the Data Events for that stream.
1. If a stream is in a diverged state (see events `A` and `B` in fig. 5), we consider branches that contain dominant
In general, these are the most important features from my perspective:
The reason is that a historical shuffle should not prevent resolution of a once-valid commit, which we should be able to implement in the client using these two properties. Otherwise, there is an avenue for abusing late publish as an undo button. We would like to be able to rely heavily on deterministic resolution of state, and this is OK as long as the commits are preserved and communicated regardless of a consensus change. If I understand this CIP correctly, I think the scope of a late publishing attack would be equated to just adding a new commit on the tip, which is anyway possible to do for the controller. What I'm not sure about though is what the merge of a divergent branch means for the state of the stream, but I'm not sure if this is relevant in this context.
The two big advantages of this CIP are:
Updates based on this forum discussion.
cc @m0ar