-
-
Notifications
You must be signed in to change notification settings - Fork 3.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
tests: Improve p2p tx propagation functional test #9762
base: master
Are you sure you want to change the base?
Conversation
for daemon in [daemon2, daemon3]: | ||
# Due to Dandelion++, the network propagates transactions with a | ||
# random delay, so poll for the transaction with a timeout | ||
timeout = 16 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@vtnerd Do you know the expected maximum propagation delay in this scenario, based on the D++ paper? That value would act as an approximate lower bound for the timeout, which could then be further padded to reflect physical realities of data transmission and processing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The delay should usually trigger the inbound fluff delay where it's using a poisson distribution, simply mimicking what Bitcoin was doing for the same situation. 95% of the values are in the 3-7.3 second range. A delay of 16 is basically nearly impossibl, so this should be acceptable.
As an explanation of why not exponential here - one of the nodes will have no outbound peers. I don't recall the paper stating how to handle the situation, so I decided to make it immediately fluff, as it would be an edge case. A fluff does randomized poisson delays, similar to what Bitcoin was doing at the time. I don't recall the d++ specifying how to handle fluff precisely either, maybe I need to revisit that paper.
I guess you were convinced the failure was primarily the sleep timeout? That was my assessment, as it seemed like an obvious issue. |
eb9cb97
to
6d147d1
Compare
I think the low, fixed timeout generates quite a few false positive failures, consistent with the observed behavior of this test in CI. If any actual transaction propagation errors exist, I would expect they do not owe to recent connection management commits. I just modified the test to better differentiate propagation to 0-2 daemons, which might make it a bit more useful. |
6d147d1
to
216523d
Compare
Reduce the likelihood of false positive failures in the p2p transaction propagation functional test by waiting up to a maximum timeout for a transaction to propagate, rather than using a fixed timeout, to reflect the random delay of Dandelion++ transaction propagation. This strategy also speeds test execution in cases where propagation occurs faster than the previously expected fixed delay.
216523d
to
f918d48
Compare
Test failure is a known-flaky unit test ( |
@vtnerd I made one minor change to show how many daemons see the transactions, but I think this will clear up the p2p test issue. |
An example failure in a recent CI build (without this change): https://github.com/monero-project/monero/actions/runs/13163403906/job/36738631593?pr=9771. Every failure I've seen on CI (and I've seen quite a few, at this point) is the same behavior, it's not the RPC refusing the connection, or generating a garbage response, it's simply the transaction not appearing in the pool by design; the failures are successes and the test has a wrong methodology. |
Reduce the likelihood of false positive failures in the p2p transaction propagation functional test by waiting up to a maximum timeout for a transaction to propagate, rather than using a fixed timeout, to reflect the random delay of Dandelion++ transaction propagation. This strategy also speeds test execution in cases where propagation occurs faster than the previously expected fixed delay.