spec: revisit the "trust" relation between consensus logic and driver #383

@cason

In Brussels, some people asked us why we decided to split the consensus logic into two components: the consensus state machine and the so-called driver.

In my mind, the idea of the driver is to collect single events received from the network (e.g., single votes) and deliver them to consensus when they become a complex event, composed of multiple single events (e.g., votes from a quorum of validators).
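
A minimal sketch of this aggregation role, for illustration only. The names `Prevote`, `VoteKeeper` and the `("Polka", round, value_id)` tuple are hypothetical and are not taken from the Quint model or the Rust code; equal voting power per validator is assumed.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class Prevote:
    validator: str
    round: int
    value_id: str


@dataclass
class VoteKeeper:
    """Hypothetical driver-side collector (equal voting power assumed)."""
    validators: frozenset
    # (round, value_id) -> validators that prevoted for it
    seen: dict = field(default_factory=dict)
    # (round, value_id) pairs for which the complex event was already emitted
    emitted: set = field(default_factory=set)

    def quorum(self) -> int:
        # Strictly more than 2/3 of the validators.
        return 2 * len(self.validators) // 3 + 1

    def add(self, vote: Prevote) -> Optional[tuple]:
        """Record a single vote; return the complex event once, when a quorum forms."""
        if vote.validator not in self.validators:
            return None
        key = (vote.round, vote.value_id)
        self.seen.setdefault(key, set()).add(vote.validator)
        if len(self.seen[key]) >= self.quorum() and key not in self.emitted:
            self.emitted.add(key)
            return ("Polka", vote.round, vote.value_id)
        return None
```

With four validators, the first two Prevotes for the same value id return `None`, and the third returns the single complex event; consensus never sees the individual votes.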

But as the modelling and implementation have evolved, I noticed that the role of the consensus component has become less and less relevant, because the "complex events" have become more and more complex. For instance, if we see a Polka on the driver, we don't deliver it to consensus if we don't have the associated full value, because otherwise it would not trigger a state transition (it might schedule a timeout, true, but this is not my point here).

My point here is that the events provided by the driver to the consensus component have become very opaque. It is as if the consensus module (blindly?) trusts that we have a Polka for v at round r, without having any access to the actual set of Prevote messages that produced the Polka for id(v), or even to the proof that the full value v was received and validated.

And this somehow worries me. It is not that I want a new Comet with 2893748 levels of checking for the same information (do we really have a quorum? is this really a valid value?), but I think that the consensus module should be less lenient toward possible bugs in the driver implementation. By the way, my rationale here is based on the Quint modelling, not on the actual Rust code.
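
To make "less lenient" concrete, here is one possible shape, sketched under assumptions: the event carries the set of voters as evidence, and consensus runs a single cheap check before applying it. `PolkaEvent` and `check_polka` are hypothetical names, equal voting power is assumed, and signature verification is deliberately left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PolkaEvent:
    round: int
    value_id: str
    # Evidence: validators whose Prevotes form the Polka (hypothetical field).
    voters: frozenset


def check_polka(event: PolkaEvent, validators: frozenset) -> bool:
    """Cheap sanity check consensus could run before applying the event."""
    quorum = 2 * len(validators) // 3 + 1  # strictly more than 2/3
    # Every voter must be a known validator, and there must be a quorum of them.
    return event.voters <= validators and len(event.voters) >= quorum
```

This is a single extra check rather than a re-validation of every vote, so a driver bug that emits a Polka without a quorum would be caught at the boundary.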


I brought this discussion to @josef-widder, who pointed me to something really relevant in this separation of concerns. The consensus pseudo-code contains multiple upon clauses that might become true at the same time, and one of the roles of the driver is to prioritize events. For instance, if we can decide a value, why bother the consensus module with information from previous rounds or round steps? This is a very good point.
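
The prioritization role can be sketched as follows. The event names and the `RANK` table are hypothetical; the only ordering taken from the discussion above is that a decision comes before anything else. Preferring higher rounds among events of the same kind is an assumption of this sketch.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Pending:
    kind: str   # "Decide", "Polka" or "Proposal" (hypothetical event names)
    round: int


# Hypothetical ranking: larger number means higher priority.
RANK = {"Decide": 2, "Polka": 1, "Proposal": 0}


def next_event(pending: list) -> Optional[Pending]:
    """Pick the single event the driver would deliver to consensus next."""
    if not pending:
        return None
    # Compare by kind first, then prefer the higher round (an assumption).
    return max(pending, key=lambda e: (RANK[e.kind], e.round))
```

A different driver could use a different `RANK` table, which is exactly the situation where the consensus logic would have to cope with event orders that the current driver never produces.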

But from this perspective, we might have to consider multiple possible implementations of the driver, in terms of how it handles priorities. And I wonder whether the consensus module has enough logic and information to work properly in case the driver implementation is not optimal or, even worse, is wrong by design or has a bug.

So my general question is: shouldn't we be able to test the consensus logic by providing it with events that our current design of the driver will not deliver to it (at a given state)? Should the consensus logic store some partial information that by itself does not produce a state transition (e.g., a Polka for id(v)), and be able to use this partial information to produce a state transition later (e.g., if afterwards it receives an event indicating that v was received and is valid)?
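
A minimal sketch of what storing partial information could look like, so that the two events can arrive in either order. `ConsensusState`, its methods and the string outputs are hypothetical, and the transition rule is heavily simplified (no locking, no timeouts); it is not the rule from the Quint model.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ConsensusState:
    round: int = 0
    step: str = "prevote"
    # Partial information; neither set triggers a transition on its own.
    polkas: set = field(default_factory=set)        # (round, value_id) with a Polka
    valid_values: set = field(default_factory=set)  # (round, value_id) received and validated

    def on_polka(self, rnd: int, value_id: str) -> Optional[str]:
        self.polkas.add((rnd, value_id))
        return self._try_precommit(rnd, value_id)

    def on_valid_value(self, rnd: int, value_id: str) -> Optional[str]:
        self.valid_values.add((rnd, value_id))
        return self._try_precommit(rnd, value_id)

    def _try_precommit(self, rnd: int, value_id: str) -> Optional[str]:
        key = (rnd, value_id)
        ready = key in self.polkas and key in self.valid_values
        # Simplified rule: act only in the current round, at the prevote step.
        if ready and rnd == self.round and self.step == "prevote":
            self.step = "precommit"
            return f"precommit({value_id})"
        return None
```

With this shape, a test can feed `on_polka` before `on_valid_value` (an order the current driver design would not produce) and still observe the expected transition once both pieces are present.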

This separation of concerns worries me somewhat when I see how complex the logic of the driver is (again, having the Quint models in mind) and how simple (can I say naïve, or even dumb?) the consensus logic is. More specifically, it appears that the driver produces all the events that lead to a pseudo-code upon clause being processed, and the consensus logic just... applies the complex event, with basically no checks, and produces the state transition.

I hope I am wrong in this rationale and simply don't have enough knowledge of the model and implementation. If so, we can just close this issue. Otherwise, we should consider that the driver may have issues and make sure that consensus is able to identify them.

Labels: spec (Related to specifications)
