Replace networking stack with etcd-based approach #1720

noonio · 2024-10-25T10:14:10Z

Why

We saw in the recent spike #1591 that Raft (via etcd) provides benefits over our present approach: it allows for more instability in the client connections, and more transparency in what is happening.

We would like a resilient and reliable networking layer, and this seems like a viable approach.

What

- Use learnings from the spike to integrate Raft via etcd
- Rely on the etcd cluster binary
  - Do not use the etcdctl binary to send client requests; use a Haskell (client) library instead
- The PR "Example of the simple dropout example on-top of the raft branch" #1786 must run successfully on top of this work (note that it worked on the spike and fails on present master)
- An ADR is created to supersede previous decisions and document the current path (relevant: https://hydra.family/head-protocol/adr/6, https://hydra.family/head-protocol/adr/17, and https://hydra.family/head-protocol/adr/27)
- The (updated) network stack is documented in https://hydra.family/head-protocol/docs/dev/architecture/networking

How


locallycompact · 2025-02-10T11:40:18Z

What is the difference between our use and cardano-node's use of ouroboros that causes us to seek out this alternate solution? Why can we not improve things there?

noonio · 2025-02-10T12:24:48Z

I think, indeed, we could use cardano-style networking, but that would again be wildly upping the difficulty of the task.

What we have right now was, I suppose, basically just the bare minimum to get it working. This task improves on that somewhat with some more deliverability features, and we know it fixes a problem we care about, i.e. #1786.

I think in some sense our whole approach to Hydra, and its networking, could be different and could've focused on direct re-use of the actual cardano-node binary; but given where we are, this is a fine incremental improvement, in my view.

locallycompact · 2025-02-10T12:27:26Z

I have created this task, assuming we want to address the use of the CLI with a recommended approach #1843

ch1bo · 2025-02-12T12:37:37Z

> What is the difference between our use and cardano-node's use of ouroboros that causes us to seek out this alternate solution? Why can we not improve things there?

@locallycompact We only (mis-)use the same framework to actually send out CBOR-encoded messages over TCP in a fully connected topology. On top of this, we tried (and failed?) to implement reliable broadcast primitives. The cardano network works differently: it consists of pull-based, stateful protocols that can deal with network failures by design, i.e. clients need to ask for and pull data from (uniform) upstream peers anyway. See the spike #1591 and this monthly report for more details on why using etcd with its Raft consensus is an alternative to the reliable broadcast we tried to achieve.
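To illustrate the difference described above, here is a toy in-memory sketch. It is not Hydra, ouroboros, or etcd code: `ReplicatedLog`, `PushPeer`, `PullPeer`, and the message names are hypothetical stand-ins. It only shows why a peer that pulls from an ordered log by revision can recover from a disconnect, while a peer that relies on fire-and-forget push loses whatever was sent while it was offline.

```python
from dataclasses import dataclass, field


@dataclass
class ReplicatedLog:
    """Hypothetical stand-in for the ordered log a Raft cluster maintains."""
    entries: list = field(default_factory=list)

    def append(self, msg: str) -> int:
        self.entries.append(msg)
        return len(self.entries)  # 1-based revision of the new entry

    def read_from(self, revision: int) -> list:
        """Return every entry appended after the given revision."""
        return self.entries[revision:]


@dataclass
class PushPeer:
    """Receives messages only while it happens to be online."""
    online: bool = True
    received: list = field(default_factory=list)

    def deliver(self, msg: str) -> None:
        if self.online:
            self.received.append(msg)  # otherwise the message is simply lost


@dataclass
class PullPeer:
    """Remembers the last revision it saw and asks for everything after it."""
    last_seen: int = 0
    received: list = field(default_factory=list)

    def catch_up(self, log: ReplicatedLog) -> None:
        new = log.read_from(self.last_seen)
        self.received.extend(new)
        self.last_seen += len(new)


def run_scenario():
    """Send three messages; both peers are disconnected during the second."""
    messages = ["msg-1", "msg-2", "msg-3"]
    log, push_peer, pull_peer = ReplicatedLog(), PushPeer(), PullPeer()
    for i, msg in enumerate(messages):
        connected = i != 1
        push_peer.online = connected
        push_peer.deliver(msg)       # push: dropped while offline
        log.append(msg)              # log: always recorded in order
        if connected:
            pull_peer.catch_up(log)  # pull: fetches whatever it missed
    return push_peer.received, pull_peer.received
```

In this sketch the push-based peer ends up with a gap where `msg-2` should be, while the pull-based peer sees all three messages in order once it reconnects.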

locallycompact · 2025-02-14T10:43:20Z

I have sketched out both calls to put and to watch using grapesy in the grapesy-etcd repo:

https://github.com/cardano-scaling/grapesy-etcd/blob/bc3517caea2a87f1715e186b9a96d2c6c5779c3a/grapesy-etcd/app/Client.hs#L22-L35

I also integrated the put command into hydra on a branch, where it replaces the call to etcdctl:

63d5260

Replacing the call to etcdctl watch is a little more involved, but it suggests removing the JSON response logic and replacing it with the optics provided by proto-lens.
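For readers unfamiliar with what a put looks like without the etcdctl binary, here is a minimal sketch. Note that it swaps in a different route from the one used in this thread: the work above talks gRPC via grapesy, whereas this builds a request for etcd's JSON gateway, which in recent etcd releases exposes `POST /v3/kv/put` and expects base64-encoded keys and values. The function name `etcd_put_request` and the default endpoint are assumptions for illustration, and nothing is sent over the network.

```python
import base64
import json


def etcd_put_request(key: bytes, value: bytes,
                     endpoint: str = "http://localhost:2379"):
    """Build the (url, body) pair for a put against etcd's JSON gateway.

    The endpoint default is an assumption; adjust it to the actual cluster.
    """
    body = json.dumps({
        # The gateway expects both fields as base64 strings.
        "key": base64.b64encode(key).decode("ascii"),
        "value": base64.b64encode(value).decode("ascii"),
    })
    return f"{endpoint}/v3/kv/put", body
```

The returned pair can be handed to any HTTP client as a POST request. A gRPC client such as the grapesy-based one sketched above avoids the base64 and JSON round trip entirely, which is in line with the suggestion to drop the JSON response logic.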

ch1bo · 2025-02-17T08:24:11Z

Re-open as not done yet

This is a change I encountered when rebasing `raft-network` for #1720. It was useful back in the spike and would also have been valuable in the `hydra-doom` use case. It adds a `--offline-head-seed` argument to offline mode and makes the "simulation" opened head deterministic across multiple instances, so that the nodes can and do talk to each other and consequently sign snapshots.

---

* [x] CHANGELOG updated
* [x] Documentation updated
* [x] Haddocks updated
* [x] No new TODOs introduced

locallycompact · 2025-02-18T13:32:43Z

I will provide a "simple" interface to grapesy-etcd cardano-scaling/grapesy-etcd#3

noonio added the 💭 idea An idea or feature request label Oct 25, 2024

github-project-automation bot added this to ☕ Hydra Team Work Oct 25, 2024

github-project-automation bot moved this to In Progress 🕐 in ☕ Hydra Team Work Oct 25, 2024

noonio added this to 🚢 Hydra Head Roadmap Oct 25, 2024

noonio added the amber ⚠️ Medium complexity or partly unclear feature label Oct 25, 2024

noonio moved this from In Progress 🕐 to Later in ☕ Hydra Team Work Oct 25, 2024

noonio added the 💬 feature A feature on our roadmap label Oct 25, 2024

ch1bo mentioned this issue Jan 15, 2025

Heads stuck in a state without being able to progress snapshots #1773

Closed

noonio mentioned this issue Jan 20, 2025

Investigate resiliance of hydra when one node goes offline and a tx is submitted #1792

Closed

noonio assigned ch1bo Jan 31, 2025

noonio moved this from Later to Todo 📋 in ☕ Hydra Team Work Jan 31, 2025

This was referenced Feb 3, 2025

Repro for node-offline issue #1780

Closed

Example of the simple dropout example on-top of the raft branch #1786

Closed

noonio linked a pull request Feb 4, 2025 that will close this issue

Example of the simple dropout example on-top of the raft branch #1786

Closed

noonio removed the amber ⚠️ Medium complexity or partly unclear feature label Feb 4, 2025

ch1bo added green 💚 Low complexity or well understood feature and removed 💭 idea An idea or feature request labels Feb 4, 2025

ch1bo mentioned this issue Feb 12, 2025

Multi party, networked offline heads #1851

Merged

4 tasks

ch1bo moved this from Todo 📋 to In progress 🕐 in ☕ Hydra Team Work Feb 12, 2025

ch1bo linked a pull request Feb 12, 2025 that will close this issue

Write an ADR about network properties and etcd #1852

Merged

ch1bo closed this as completed in #1852 Feb 14, 2025

github-project-automation bot moved this to Done in 🚢 Hydra Head Roadmap Feb 14, 2025

github-project-automation bot moved this from In progress 🕐 to Done ✔ in ☕ Hydra Team Work Feb 14, 2025

ch1bo reopened this Feb 17, 2025

github-project-automation bot moved this from Done ✔ to Triage 🏥 in ☕ Hydra Team Work Feb 17, 2025

ch1bo linked a pull request Feb 17, 2025 that will close this issue

WIP: Replace network with etcd #1854

Draft

4 tasks

noonio moved this from Triage 🏥 to Todo 📋 in ☕ Hydra Team Work Feb 18, 2025

ch1bo moved this from Todo 📋 to In progress 🕐 in ☕ Hydra Team Work Feb 18, 2025

ch1bo removed the status in 🚢 Hydra Head Roadmap Feb 18, 2025

noonio assigned locallycompact Feb 18, 2025
