tests: Improve p2p tx propagation functional test #9762

Open
wants to merge 1 commit into
base: master

iamamyth commented Feb 1, 2025

Reduce the likelihood of false positive failures in the p2p transaction propagation functional test by waiting up to a maximum timeout for a transaction to propagate, rather than using a fixed timeout, to reflect the random delay of Dandelion++ transaction propagation. This strategy also speeds test execution in cases where propagation occurs faster than the previously expected fixed delay.
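A minimal sketch of the poll-until-deadline strategy described above, written in Python like the functional test itself. `wait_for` and its parameters are hypothetical and not part of the test framework's API. The clock and sleep functions are injectable so the loop can be exercised without real waiting.

```python
import time


def wait_for(predicate, timeout, interval=0.5,
             clock=time.monotonic, sleep=time.sleep):
    """Poll predicate() until it returns True or `timeout` seconds pass.

    Returns True as soon as the predicate holds and False once the deadline
    expires, so a fast propagation ends the wait early instead of always
    paying a fixed delay.
    """
    deadline = clock() + timeout
    while True:
        if predicate():
            return True
        remaining = deadline - clock()
        if remaining <= 0:
            return False
        # Never sleep past the deadline.
        sleep(min(interval, remaining))


# Usage sketch; `tx_in_pool` is a hypothetical helper that would query a
# daemon's transaction pool for the given txid:
#   for daemon in [daemon2, daemon3]:
#       assert wait_for(lambda: tx_in_pool(daemon, txid), timeout=16)
```

The timeout acts only as an upper bound. A test run where the transaction shows up after one second waits about one second, not the full timeout.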

for daemon in [daemon2, daemon3]:
    # Due to Dandelion++, the network propagates transactions with a
    # random delay, so poll for the transaction with a timeout
    timeout = 16
Author

@vtnerd Do you know the expected maximum propagation delay in this scenario, based on the D++ paper? That value would act as an approximate lower bound for the timeout, which could then be further padded to reflect physical realities of data transmission and processing.

Contributor

This scenario should usually trigger the inbound fluff delay, which uses a Poisson distribution, simply mimicking what Bitcoin was doing in the same situation. 95% of the values fall in the 3-7.3 second range, so a delay of 16 seconds is nearly impossible, and this timeout should be acceptable.
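To sanity-check the quoted 95% range, here is a small calculation in Python. The parameters are assumptions, not read from the Monero source: a Poisson-distributed delay with a mean of 20 steps of 0.25 s (a 5 s average), chosen because it reproduces a roughly 3-7.3 s range. The script computes the central 95% interval and the probability that a single delay reaches the test's 16 s timeout.

```python
import math

# Assumed parameters (hypothetical, not taken from the Monero source):
# delay = Poisson(MEAN_STEPS) steps of STEP_SECONDS, i.e. a 5 s average.
STEP_SECONDS = 0.25
MEAN_STEPS = 20
TIMEOUT_SECONDS = 16


def poisson_pmf(lam, k_max):
    """Return [P(X=0), ..., P(X=k_max)] for X ~ Poisson(lam).

    Uses the recurrence P(k) = P(k-1) * lam / k to avoid huge factorials.
    """
    pmf = [math.exp(-lam)]
    for k in range(1, k_max + 1):
        pmf.append(pmf[-1] * lam / k)
    return pmf


def quantile(pmf, q):
    """Smallest k with P(X <= k) >= q."""
    total = 0.0
    for k, p in enumerate(pmf):
        total += p
        if total >= q:
            return k
    raise ValueError("k_max too small for the requested quantile")


pmf = poisson_pmf(MEAN_STEPS, 200)
low = quantile(pmf, 0.025) * STEP_SECONDS
high = quantile(pmf, 0.975) * STEP_SECONDS

# Sum the upper tail directly; 1 - CDF would lose all precision here.
timeout_steps = int(TIMEOUT_SECONDS / STEP_SECONDS)
p_timeout = sum(pmf[timeout_steps:])

print(f"central 95% of delays: {low:.2f}-{high:.2f} s")
print(f"P(delay >= {TIMEOUT_SECONDS} s) = {p_timeout:.1e}")
```

With these assumed parameters the central interval comes out at 3.00-7.25 s. The probability of a single delay reaching 16 s is far below one in a trillion. This supports treating 16 s as a generous bound for this step.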

As for why the delay isn't exponential here: one of the nodes will have no outbound peers. I don't recall the paper stating how to handle that situation, so I decided to make it fluff immediately, since it would be an edge case. A fluff uses randomized Poisson delays, similar to what Bitcoin was doing at the time. I don't recall D++ specifying precisely how to handle fluff either; maybe I need to revisit that paper.
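A toy Python model of the edge case described above. Everything here is illustrative rather than Monero's implementation. `route_transaction` is a hypothetical name. The delay parameters (Poisson with a mean of 20 steps of 0.25 s) are assumed. The stem branch is simplified to picking a random outbound peer, whereas Dandelion++ proper selects stem relays per epoch. The model shows the control flow: with no outbound peers, the node switches to fluff right away, and the fluff itself is randomly delayed.

```python
import math
import random

# Assumed (hypothetical) fluff delay: Poisson(20) steps of 0.25 s.
STEP_SECONDS = 0.25
MEAN_STEPS = 20


def poisson_sample(lam, rng):
    """Draw one value from Poisson(lam) with Knuth's multiplication method."""
    limit = math.exp(-lam)
    k = 0
    p = rng.random()
    while p > limit:
        k += 1
        p *= rng.random()
    return k


def route_transaction(outbound_peers, rng):
    """Toy relay decision for a freshly received transaction.

    With no outbound peers there is nowhere to stem, so the node fluffs
    (broadcasts) after a random Poisson delay. Otherwise it stems to a
    random outbound peer (a simplification of per-epoch stem selection).
    """
    if not outbound_peers:
        return ("fluff", poisson_sample(MEAN_STEPS, rng) * STEP_SECONDS)
    return ("stem", rng.choice(outbound_peers))


rng = random.Random(9762)  # seeded for reproducibility
print(route_transaction([], rng))                    # a ("fluff", seconds) pair
print(route_transaction(["peer-a", "peer-b"], rng))  # a ("stem", peer) pair
```

Knuth's method is used only because Python's standard `random` module has no Poisson sampler. It is adequate for small means like this one.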

vtnerd (Contributor) commented Feb 2, 2025

I guess you were convinced the failure was primarily due to the sleep timeout? That was my assessment, as it seemed like an obvious issue.

iamamyth force-pushed the tests-p2p-tx-propagation branch from eb9cb97 to 6d147d1 on February 2, 2025 06:26
iamamyth (Author) commented Feb 2, 2025

I think the low, fixed timeout generates quite a few false positive failures, consistent with the observed behavior of this test in CI. If any actual transaction propagation errors exist, I would not expect them to stem from the recent connection management commits. I just modified the test to better distinguish whether the transaction propagated to 0, 1, or 2 daemons, which might make it a bit more useful.
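Telling apart propagation to 0, 1, or 2 daemons can be sketched as a single deadline shared by all daemons that records which ones have seen the transaction. This is a hypothetical Python sketch, not the actual test code. `poll_propagation` and `has_tx` are illustrative names.

```python
import time


def poll_propagation(daemons, has_tx, timeout, interval=0.5,
                     clock=time.monotonic, sleep=time.sleep):
    """Poll each daemon until all report the tx or `timeout` seconds pass.

    Returns the set of daemons that saw the transaction, so a failure can
    report whether it reached none, some, or all of them.
    """
    seen = set()
    deadline = clock() + timeout
    while True:
        for daemon in daemons:
            if daemon not in seen and has_tx(daemon):
                seen.add(daemon)
        if len(seen) == len(daemons):
            return seen
        remaining = deadline - clock()
        if remaining <= 0:
            return seen
        sleep(min(interval, remaining))


# Usage sketch; `has_tx` is a hypothetical callback that would check a
# daemon's transaction pool for the txid:
#   seen = poll_propagation([daemon2, daemon3], has_tx, timeout=16)
#   assert len(seen) == 2, "tx reached %d of 2 daemons" % len(seen)
```

In this sketch, sharing one deadline across daemons keeps the total wait bounded by a single timeout. The returned set makes the failure message say how far the transaction got.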

iamamyth force-pushed the tests-p2p-tx-propagation branch from 6d147d1 to 216523d on February 2, 2025 08:29
iamamyth force-pushed the tests-p2p-tx-propagation branch from 216523d to f918d48 on February 2, 2025 19:29
iamamyth (Author) commented Feb 2, 2025

The test failure is in a known-flaky unit test (node_server.race_condition), unrelated to this PR.

iamamyth (Author) commented Feb 5, 2025

@vtnerd I made one minor change to show how many daemons see the transactions, but I think this will clear up the p2p test issue.

iamamyth (Author) commented Feb 6, 2025

An example failure in a recent CI build (without this change): https://github.com/monero-project/monero/actions/runs/13163403906/job/36738631593?pr=9771.

Every failure I've seen on CI (and I've seen quite a few at this point) shows the same behavior. It's not the RPC refusing the connection or returning a garbage response; it's simply the transaction not appearing in the pool yet, which is expected by design. These failures are really successes, and the test's methodology is wrong.


3 participants