-
Notifications
You must be signed in to change notification settings - Fork 41
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Feat(e2e): support multiple aggregators in the e2e tests #2378
base: main
Are you sure you want to change the base?
Conversation
99077bb
to
9ad40fc
Compare
cf5d195
to
7c6a300
Compare
2ff1fe1
to
e24e007
Compare
First aggregator is 'master', and others (if any) are 'slave' to the 'master'.
…ggregators Better P2P relays topology and fix log files collisions.
By providing information about the targeted aggregator in logs and errors.
Which could prevent signature from signers even with loose protocol parameters.
Which can be 'Passthrough' or 'P2P'.
As master/slave signer registration is only one of the configurations to be tested.
Until we can fix the source of flakiness.
de78895
to
42b4d01
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Just some remarks
@@ -940,7 +940,7 @@ mod tests { | |||
runner | |||
.expect_inform_new_epoch() | |||
.with(predicate::eq(new_time_point_clone.clone().epoch)) | |||
.once() | |||
.times(2) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why we need to change number of calls ?
Modification on the state_machine seems to concern only the slave mode.
Does it mean we are running a slave ?
Test name say that it is a master: "idle_new_epoch_detected_and_master_has_transitioned_to_epoch"
Epoch(*index), | ||
EpochFixtures { | ||
registering: &fixtures[&Epoch(*index)], | ||
next_signing: if *index >= 1 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can use checked_sub to avoid the 'if'.
index.checked_sub(1).and_then(|e| fixtures.get(&Epoch(e)));
or
index.checked_sub(1).map(Epoch).and_then(|e| fixtures.get(&e));
I'm not sure it's more readable but we centralized the epoch computation instead of making it twice (in comparison and in substraction)
match signer_relay_mode { | ||
SignerRelayMode::P2P => { | ||
repeater.set_message(register_signer_message.clone()).await; | ||
match tx.send(register_signer_message) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The match on tx.send seems to be a technical stuff (expect perhaps the StatusCode returned on sucess) and we may extract it into a function.
It'll simplify the function and help to understand faster what it does.
The new function can be reuse for register_signatures_handler
debug!(logger, "Serve HTTP route /register-signatures"; "signer_relay_mode" => ?signer_relay_mode, "register_signature_message" => #?register_signature_message); | ||
|
||
match signer_relay_mode { | ||
SignerRelayMode::P2P => match tx.send(register_signature_message) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Extract the match tx.send
into a function
))), | ||
} | ||
} | ||
SignerRelayMode::Passthrough => { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is no test in this file about this code.
Are tests useless here?
Is it tested elsewhere ?
use super::*; | ||
|
||
#[test] | ||
fn has_master_aggregator_succeeds() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand what it's tested with this name.
let config = MithrilInfrastructureConfig { | ||
use_relays: true, | ||
relay_signer_registration_mode: SignerRelayMode::Passthrough, | ||
number_of_aggregators: 2, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We may add the case of Passthrough with only 1 aggregator.
It may be not a valid configuration but in that case, why there is code to handle it ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
"At least two pool nodes are required (one for the aggregator, one for at least one \ | ||
signer), number given: {s}", | ||
)) | ||
fn validate(&self) -> StdResult<()> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps it would be better to use clap directly to handle argument validation?
@@ -103,11 +108,19 @@ pub struct Args { | |||
#[clap(long)] | |||
run_only: bool, | |||
|
|||
/// Enable P2P network mode | |||
/// Use Mithril relays | |||
#[clap(long)] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
#[clap(long)] | |
#[clap(long, requires_if("number_of_aggregators > 1", "use_relays"))] |
#[clap(long, default_value_t = 3, value_parser = has_at_least_two_pool_nodes)] | ||
number_of_pool_nodes: u8, | ||
/// Number of aggregators | ||
#[clap(long, default_value_t = 1)] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
#[clap(long, default_value_t = 1)] | |
#[clap(long, default_value_t = 1, value_parser = clap::value_parser!(u8).range(1..))] |
number_of_aggregators: u8, | ||
|
||
/// Number of signers | ||
#[clap(long, default_value_t = 2)] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
#[clap(long, default_value_t = 2)] | |
#[clap(long, default_value_t = 1, value_parser = clap::value_parser!(u8).range(1..))] |
@@ -193,13 +210,18 @@ async fn main_exec() -> StdResult<()> { | |||
fs::create_dir(&path).expect("Artifacts dir creation failure"); | |||
path | |||
}; | |||
let store_dir = { | |||
let path = work_dir.join("stores"); | |||
fs::create_dir(&path).expect("Stores dir creation failure"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This error handling approach seems to prevent capturing the underlying reason for the error. Would it be possible to use anyhow with context instead and preserve the stack trace?
fs::create_dir(&path).expect("Stores dir creation failure"); | |
fs::create_dir(&path).with_context(|| "Failed to create stores directory")?; |
let path = work_dir.join("stores"); | ||
fs::create_dir(&path).expect("Stores dir creation failure"); | ||
path | ||
}; | ||
|
||
let mut app = App::new(); | ||
let mut app_stopper = AppStopper::new(&app); | ||
let mut join_set = JoinSet::new(); | ||
with_gracefull_shutdown(&mut join_set); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Typo not introduced with this PR.
with_gracefull_shutdown(&mut join_set); | |
with_graceful_shutdown(&mut join_set); |
@@ -38,7 +38,7 @@ impl Epoch { | |||
pub const CARDANO_STAKE_DISTRIBUTION_SNAPSHOT_OFFSET: u64 = 2; | |||
|
|||
/// The epoch offset used to retrieve the epoch at which a signer has registered to the master aggregator. | |||
pub const SIGNER_MASTER_SYNCHRONIZATION_OFFSET: u64 = 1; | |||
pub const SIGNER_MASTER_SYNCHRONIZATION_OFFSET: u64 = 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the declaration of this constant still useful with a value of 0
? Same question for the offset_to_master_synchronization_epoch
function that use it.
@@ -35,6 +35,7 @@ impl<M: Clone + Debug + Sync + Send + 'static> MessageRepeater<M> { | |||
} | |||
|
|||
/// Set the message to repeat | |||
#[allow(dead_code)] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this attribute is not required.
let number_of_aggregators = config.number_of_aggregators as usize; | ||
let number_of_signers = config.number_of_signers as usize; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wouldn't it be easier to declare the clap
arguments and types in the MithrilInfrastructureConfig
structure as usize
to avoid conversion?
// This should be removed when the aggregator is able to synchronize its certificate chain from another aggregator | ||
if !aggregator.is_first() { | ||
tokio::time::sleep(std::time::Duration::from_millis( | ||
5 * aggregator.mithril_run_interval() as u64, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe extracting the hardcoded value 5
into a named constant would improve readability and maintability?
Content
This PR includes the adaptation of the e2e tests to support multiple aggregators:
Passthrough
(messages are sent to the configured aggregator endpoint) andP2P
(messages are sent to the P2P network) modes for both the signer registration and signature registration. The configuration options have been updated in that sensenumber_of_aggregators
andnumber_of_signers
are specified instead ofnumber_of_pool_nodes
use_p2pmode
has been replaced by more appropriateuse_relays
relay_signer_registration_mode
andrelay-signature_registration_mode
have been added (used with theuse_relays
option)RunOnly
mode of the e2e test has been adapted to support concurrently multiple aggregatorsSpec
mode of the e2e test has been adapted to support concurrently multiple aggregatorsPre-submit checklist
Issue(s)
Closes #2361