Description
Describe the bug
Hi, I ran into an inconsistency in the results: the run-time shown in the terminal output differs from the total_time stored in .pymon.
To Reproduce
import time

import pytest


def wrapper_function(foo: int) -> None:
    # Benchmark target: deliberately slow list building.
    accumulator = []
    for _ in range(1000):
        accumulator = accumulator + [_]


class TestClass:
    @staticmethod
    def setup_stuff(foo: int) -> int:
        # Simulates an expensive setup phase (e.g. loading a huge file).
        if foo == 1:
            time.sleep(10)
        return foo

    @pytest.mark.parametrize("foo", [1])
    @pytest.mark.parametrize("rounds, iterations", [(1, 1), (1, 10), (10, 1), (10, 10)])
    def test(self, benchmark, foo: int, rounds: int, iterations: int) -> None:
        TestClass.setup_stuff(foo)
        benchmark.pedantic(
            wrapper_function,
            kwargs=dict(foo=foo),
            iterations=iterations,
            rounds=rounds,
            warmup_rounds=0,
        )
Executed with pytest mwe.py -v; the output is:
============================= test session starts ==============================
platform linux -- Python 3.12.8, pytest-8.3.3, pluggy-1.5.0 -- /home/.....
cachedir: .pytest_cache
benchmark: 5.1.0 (defaults: timer=time.perf_counter disable_gc=False min_rounds=5 min_time=0.000005 max_time=1.0 calibration_precision=10 warmup=False warmup_iterations=100000)
rootdir: /home/azureuser/projects/bigplan-algorithm
plugins: monitor-1.6.1, benchmark-5.1.0, xdist-3.6.1
collecting ... collected 4 items
mwe.py::TestClass::test[1-1-1] PASSED [ 25%]
mwe.py::TestClass::test[1-10-1] PASSED [ 50%]
mwe.py::TestClass::test[10-1-1] PASSED [ 75%]
mwe.py::TestClass::test[10-10-1] PASSED [100%]
------------------------------------------------------------------------------------------ benchmark: 4 tests -----------------------------------------------------------------------------------------
Name (time in us) Min Max Mean StdDev Median IQR Outliers OPS Rounds Iterations
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test[10-1-1] 849.1300 (1.0) 1,331.9500 (1.45) 905.0879 (1.05) 150.1471 (inf) 858.7840 (1.0) 14.8370 (inf) 1;1 1,104.8651 (0.96) 10 1
test[10-10-1] 852.9680 (1.00) 937.6166 (1.02) 865.4228 (1.0) 25.4941 (inf) 858.9587 (1.00) 5.1887 (inf) 1;1 1,155.5045 (1.0) 10 10
test[1-10-1] 919.9207 (1.08) 919.9207 (1.0) 919.9207 (1.06) 0.0000 (1.0) 919.9207 (1.07) 0.0000 (1.0) 0;0 1,087.0502 (0.94) 1 10
test[1-1-1] 1,395.1670 (1.64) 1,395.1670 (1.52) 1,395.1670 (1.61) 0.0000 (1.0) 1,395.1670 (1.62) 0.0000 (1.0) 0;0 716.7601 (0.62) 1 1
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Legend:
Outliers: 1 Standard Deviation from Mean; 1.5 IQR (InterQuartile Range) from 1st Quartile and 3rd Quartile.
OPS: Operations Per Second, computed as 1 / Mean
============================== 4 passed in 40.21s ==============================
Expected behavior
Now, when I open the .pymon file and look up that test, I end up with four runs, each taking a little over 10 seconds, i.e.
item_variant       total_time
test[1, 1, 1]      10.005931615829468
test[1, 10, 1]     10.01454782485962
test[10, 1, 1]     10.013768434524536
test[10, 10, 1]    10.092158317565918
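For reference, here is roughly how I read those values out of the database. This is a minimal sketch and assumes that pytest-monitor stores its results in a table named TEST_METRICS with ITEM_VARIANT and TOTAL_TIME columns; the exact schema may differ between versions.

import sqlite3

# Minimal sketch: read per-test total_time from the .pymon SQLite database.
# Assumption: a TEST_METRICS table with ITEM_VARIANT and TOTAL_TIME columns.
with sqlite3.connect(".pymon") as conn:
    rows = conn.execute(
        "SELECT ITEM_VARIANT, TOTAL_TIME FROM TEST_METRICS ORDER BY TOTAL_TIME"
    ).fetchall()

for item_variant, total_time in rows:
    print(item_variant, total_time)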
I wonder why the database stores the whole run-time of the test method (for the given parameter input). pytest-benchmark measures every single round and iteration and reports the expected time of one benchmark.pedantic(...) call. However, total_time also incorporates the setup time, i.e. TestClass::setup_stuff.
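A quick back-of-the-envelope check with the numbers above supports this: total_time is roughly the 10-second sleep in setup_stuff plus rounds * iterations times the mean per-call time that pytest-benchmark reports.

# Back-of-the-envelope check for test[10-10-1], using the figures reported above.
setup_time = 10.0                  # time.sleep(10) in TestClass.setup_stuff
rounds, iterations = 10, 10
mean_call = 865.4228e-6            # mean per call from the pytest-benchmark table (seconds)

estimate = setup_time + rounds * iterations * mean_call
print(estimate)                    # ~10.087 s, close to the 10.092 s stored in .pymon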
If this is a feature, can I somehow get the run-time per benchmark.pedantic(...) call? Sure, I could divide total_time by rounds * iterations, but that would not be correct, as the setup time would be divided along with it.
While this is a trivial MWE with a trivial example, my real use case deals with much bigger data. Imagine that I pass the path to a huge file and load that file in the setup phase of the test method. Passing an already loaded file directly via parametrization/fixtures is possible in theory; however, I would then need to load all the huge files into memory, keep them around as parameters, and only then use them in the test. (Besides the memory/cost requirements, this could trigger background memory pressure, e.g. swapping.)
Desktop (please complete the following information):
OS: Linux
Python: 3.12.8
pytest: 8.3.3
pytest-monitor: 1.6.1
pytest-benchmark: 5.1.0
pytest-xdist: 3.6.1
Additional context
If this is a feature and the behavior is expected, please point me to the relevant section of the documentation.