Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feat(e2e): support multiple aggregators in the e2e tests #2378

Open
wants to merge 24 commits into
base: main
Choose a base branch
from

Conversation

jpraynaud
Copy link
Member

@jpraynaud jpraynaud commented Mar 18, 2025

Content

This PR includes the adaptation of the e2e tests to support multiple aggregators:

  • The relay has been updated to support both Passthrough (messages are sent to the configured aggregator endpoint) and P2P (messages are sent to the P2P network) modes for both the signer registration and signature registration. The configuration options have been updated in that sense
  • The end to end test configuration has evolved:
    • number_of_aggregators and number_of_signers are specified instead of number_of_pool_nodes
    • use_p2pmode has been replaced by more appropriate use_relays
    • relay_signer_registration_mode and relay-signature_registration_mode have been added (used with the use_relays option)
  • RunOnly mode of the e2e test has been adapted to support concurrently multiple aggregators
  • Spec mode of the e2e test has been adapted to support concurrently multiple aggregators
  • Slave registration of the aggregator has been fixed as it was not targeting the correct verification keys. The associated integration test has been rewritten for finer testing of evolving Mithril stake distribution for each epoch.

Pre-submit checklist

  • Branch
    • Tests are provided (if possible)
    • Crates versions are updated (if relevant)
    • CHANGELOG file is updated (if relevant)
    • Commit sequence broadly makes sense
    • Key commits have useful messages
  • PR
    • All check jobs of the CI have succeeded
    • Self-reviewed the diff
    • Useful pull request description
    • Reviewer requested
  • Documentation
    • No new TODOs introduced

Issue(s)

Closes #2361

@jpraynaud jpraynaud self-assigned this Mar 18, 2025
Copy link

github-actions bot commented Mar 18, 2025

Test Results

    3 files  ±0     57 suites  ±0   11m 34s ⏱️ -1s
1 774 tests +2  1 774 ✅ +2  0 💤 ±0  0 ❌ ±0 
2 172 runs  +2  2 172 ✅ +2  0 💤 ±0  0 ❌ ±0 

Results for commit 42b4d01. ± Comparison against base commit 81e90f7.

♻️ This comment has been updated with latest results.

@jpraynaud jpraynaud force-pushed the jpraynaud/2361-e2e-test-slave-aggregator branch from 99077bb to 9ad40fc Compare March 18, 2025 16:50
@jpraynaud jpraynaud temporarily deployed to testing-preview March 18, 2025 17:01 — with GitHub Actions Inactive
@jpraynaud jpraynaud force-pushed the jpraynaud/2361-e2e-test-slave-aggregator branch 5 times, most recently from cf5d195 to 7c6a300 Compare March 19, 2025 17:47
@jpraynaud jpraynaud temporarily deployed to testing-preview March 19, 2025 17:58 — with GitHub Actions Inactive
@jpraynaud jpraynaud force-pushed the jpraynaud/2361-e2e-test-slave-aggregator branch 3 times, most recently from 2ff1fe1 to e24e007 Compare March 20, 2025 17:49
First aggregator is 'master', and others (if any) are 'slave' to the
'master'.
…ggregators

Better P2P relays topology and fix log files collisions.
By providing information about the targeted aggregator in logs and errors.
Which could prevent signature from signers even with loose protocol parameters.
Which can be 'Passthrough' or 'P2P'.
@jpraynaud jpraynaud force-pushed the jpraynaud/2361-e2e-test-slave-aggregator branch 2 times, most recently from de78895 to 42b4d01 Compare March 20, 2025 18:34
@jpraynaud jpraynaud marked this pull request as ready for review March 20, 2025 18:35
Copy link
Collaborator

@sfauvel sfauvel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM
Just some remarks

@@ -940,7 +940,7 @@ mod tests {
runner
.expect_inform_new_epoch()
.with(predicate::eq(new_time_point_clone.clone().epoch))
.once()
.times(2)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why we need to change number of calls ?
Modification on the state_machine seems to concern only the slave mode.
Does it mean we are running a slave ?
Test name say that it is a master: "idle_new_epoch_detected_and_master_has_transitioned_to_epoch"

Epoch(*index),
EpochFixtures {
registering: &fixtures[&Epoch(*index)],
next_signing: if *index >= 1 {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can use checked_sub to avoid the 'if'.
index.checked_sub(1).and_then(|e| fixtures.get(&Epoch(e)));
or
index.checked_sub(1).map(Epoch).and_then(|e| fixtures.get(&e));

I'm not sure it's more readable but we centralized the epoch computation instead of making it twice (in comparison and in substraction)

match signer_relay_mode {
SignerRelayMode::P2P => {
repeater.set_message(register_signer_message.clone()).await;
match tx.send(register_signer_message) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The match on tx.send seems to be a technical stuff (expect perhaps the StatusCode returned on sucess) and we may extract it into a function.
It'll simplify the function and help to understand faster what it does.
The new function can be reuse for register_signatures_handler

debug!(logger, "Serve HTTP route /register-signatures"; "signer_relay_mode" => ?signer_relay_mode, "register_signature_message" => #?register_signature_message);

match signer_relay_mode {
SignerRelayMode::P2P => match tx.send(register_signature_message) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Extract the match tx.send into a function

))),
}
}
SignerRelayMode::Passthrough => {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is no test in this file about this code.
Are tests useless here?
Is it tested elsewhere ?

use super::*;

#[test]
fn has_master_aggregator_succeeds() {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand what it's tested with this name.

let config = MithrilInfrastructureConfig {
use_relays: true,
relay_signer_registration_mode: SignerRelayMode::Passthrough,
number_of_aggregators: 2,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We may add the case of Passthrough with only 1 aggregator.
It may be not a valid configuration but in that case, why there is code to handle it ?

Copy link
Collaborator

@dlachaume dlachaume left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

"At least two pool nodes are required (one for the aggregator, one for at least one \
signer), number given: {s}",
))
fn validate(&self) -> StdResult<()> {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps it would be better to use clap directly to handle argument validation?

@@ -103,11 +108,19 @@ pub struct Args {
#[clap(long)]
run_only: bool,

/// Enable P2P network mode
/// Use Mithril relays
#[clap(long)]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
#[clap(long)]
#[clap(long, requires_if("number_of_aggregators > 1", "use_relays"))]

#[clap(long, default_value_t = 3, value_parser = has_at_least_two_pool_nodes)]
number_of_pool_nodes: u8,
/// Number of aggregators
#[clap(long, default_value_t = 1)]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
#[clap(long, default_value_t = 1)]
#[clap(long, default_value_t = 1, value_parser = clap::value_parser!(u8).range(1..))]

number_of_aggregators: u8,

/// Number of signers
#[clap(long, default_value_t = 2)]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
#[clap(long, default_value_t = 2)]
#[clap(long, default_value_t = 1, value_parser = clap::value_parser!(u8).range(1..))]

@@ -193,13 +210,18 @@ async fn main_exec() -> StdResult<()> {
fs::create_dir(&path).expect("Artifacts dir creation failure");
path
};
let store_dir = {
let path = work_dir.join("stores");
fs::create_dir(&path).expect("Stores dir creation failure");
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This error handling approach seems to prevent capturing the underlying reason for the error. Would it be possible to use anyhow with context instead and preserve the stack trace?

Suggested change
fs::create_dir(&path).expect("Stores dir creation failure");
fs::create_dir(&path).with_context(|| "Failed to create stores directory")?;

let path = work_dir.join("stores");
fs::create_dir(&path).expect("Stores dir creation failure");
path
};

let mut app = App::new();
let mut app_stopper = AppStopper::new(&app);
let mut join_set = JoinSet::new();
with_gracefull_shutdown(&mut join_set);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Typo not introduced with this PR.

Suggested change
with_gracefull_shutdown(&mut join_set);
with_graceful_shutdown(&mut join_set);

@@ -38,7 +38,7 @@ impl Epoch {
pub const CARDANO_STAKE_DISTRIBUTION_SNAPSHOT_OFFSET: u64 = 2;

/// The epoch offset used to retrieve the epoch at which a signer has registered to the master aggregator.
pub const SIGNER_MASTER_SYNCHRONIZATION_OFFSET: u64 = 1;
pub const SIGNER_MASTER_SYNCHRONIZATION_OFFSET: u64 = 0;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the declaration of this constant still useful with a value of 0? Same question for the offset_to_master_synchronization_epoch function that use it.

@@ -35,6 +35,7 @@ impl<M: Clone + Debug + Sync + Send + 'static> MessageRepeater<M> {
}

/// Set the message to repeat
#[allow(dead_code)]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this attribute is not required.

Comment on lines +96 to +97
let number_of_aggregators = config.number_of_aggregators as usize;
let number_of_signers = config.number_of_signers as usize;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wouldn't it be easier to declare the clap arguments and types in the MithrilInfrastructureConfig structure as usize to avoid conversion?

// This should be removed when the aggregator is able to synchronize its certificate chain from another aggregator
if !aggregator.is_first() {
tokio::time::sleep(std::time::Duration::from_millis(
5 * aggregator.mithril_run_interval() as u64,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe extracting the hardcoded value 5 into a named constant would improve readability and maintability?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

E2e tests adaptation for multiple aggregators
3 participants