Add --output-warmup-metrics flag to cpu userbenchmark scripts #2604

Conversation

murste01 (Contributor)
Adds a new `--output-warmup-metrics` flag that includes warmup metrics in benchmark result JSON files. This allows us to analyse warmup iterations and decide how many are enough.
@murste01 (Contributor, Author)

Examples with and without the new flag:

Without `--output-warmup-metrics`:

Example command:

$ python3 run_benchmark.py cpu -t eval -m llama --precision fp32 --nwarmup 10 --niter 10 -o results
Running benchmark: /home/murste01/miniforge3/envs/pytorch/bin/python3 /home/murste01/git/benchmark/userbenchmark/cpu/run_config.py -m llama -d cpu -t eval --precision fp32 --metrics latencies --nwarmup 10 --niter 10 -o results
Running TorchBenchModelConfig(name='llama', test='eval', device='cpu', batch_size=None, extra_args=['--precision', 'fp32'], extra_env=None, output_dir=None, skip=False) ... [Done]

Output tree:

...
metrics-20250331150609.json
results/
`-- llama-eval
    `-- metrics-515.json
...

metrics-20250331150609.json contents:

$ cat metrics-20250331150609.json
{
    "name": "cpu",
    "environ": {
        "pytorch_git_version": "2236df1770800ffea5697b11b0bb0d910b2e59e1"
    },
    "metrics": {
        "llama-eval_latency": 114.53452100000001
    }
}

results/llama-eval/metrics-515.json contents:

$ cat results/llama-eval/metrics-515.json
{
    "name": "cpu",
    "environ": {
        "pytorch_git_version": "2236df1770800ffea5697b11b0bb0d910b2e59e1"
    },
    "metrics": {
        "latency": 114.53452100000001
    }
}

With `--output-warmup-metrics`:

Example command:

$ python3 run_benchmark.py cpu -t eval -m llama --precision fp32 --nwarmup 10 --niter 10 -o results --output-warmup-metrics
Running benchmark: /home/murste01/miniforge3/envs/pytorch/bin/python3 /home/murste01/git/benchmark/userbenchmark/cpu/run_config.py -m llama -d cpu -t eval --precision fp32 --output-warmup-metrics --metrics latencies --nwarmup 10 --niter 10 -o results
Running TorchBenchModelConfig(name='llama', test='eval', device='cpu', batch_size=None, extra_args=['--precision', 'fp32'], extra_env=None, output_dir=None, skip=False) ... [Done]

Output tree:

metrics-20250331150656.json
results/
`-- llama-eval
    `-- metrics-710.json

metrics-20250331150656.json contents:

$ cat metrics-20250331150656.json
{
    "name": "cpu",
    "environ": {
        "pytorch_git_version": "2236df1770800ffea5697b11b0bb0d910b2e59e1"
    },
    "metrics": {
        "llama-eval_warmup_latency": 109.78462999999999,
        "llama-eval_latency": 108.5176915
    }
}

results/llama-eval/metrics-710.json contents:

$ cat results/llama-eval/metrics-710.json
{
    "name": "cpu",
    "environ": {
        "pytorch_git_version": "2236df1770800ffea5697b11b0bb0d910b2e59e1"
    },
    "metrics": {
        "warmup_latency": 109.78462999999999,
        "latency": 108.5176915
    }
}
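If it helps, here is a small sketch of how the two numbers could be compared after a run (the path and key names are taken from the example output above; the numeric suffix differs per run, and the units are whatever the harness reports):

```python
import json

# Per-benchmark metrics file from the example above; the suffix differs per run.
with open("results/llama-eval/metrics-710.json") as f:
    metrics = json.load(f)["metrics"]

warmup = metrics["warmup_latency"]   # aggregate latency reported for the warmup iterations
steady = metrics["latency"]          # aggregate latency reported for the measured iterations

print(f"warmup latency:   {warmup:.4f}")
print(f"measured latency: {steady:.4f}")
print(f"relative difference: {100 * (warmup - steady) / steady:+.2f}%")
```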

@murste01 (Contributor, Author)

cc: @FindHao

Thanks in advance!

@@ -42,13 +44,10 @@ def maybe_synchronize(device: str):

 def get_latencies(
     func, device: str, nwarmup=WARMUP_ROUNDS, num_iter=BENCHMARK_ITERS
-) -> List[float]:
+) -> Tuple[List[float], List[float]]:
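For reference, a rough sketch of what `get_latencies` could look like with this return type (illustrative only, not the actual patch; `maybe_synchronize`, `WARMUP_ROUNDS` and `BENCHMARK_ITERS` are stubbed here and assumed from the hunk context):

```python
from typing import Callable, List, Tuple
import time

WARMUP_ROUNDS = 10       # assumed defaults, mirroring the diff context
BENCHMARK_ITERS = 10

def maybe_synchronize(device: str) -> None:
    # Stand-in for the harness helper of the same name; a no-op on CPU.
    pass

def get_latencies(
    func: Callable[[], None], device: str, nwarmup=WARMUP_ROUNDS, num_iter=BENCHMARK_ITERS
) -> Tuple[List[float], List[float]]:
    """Return (warmup_latencies, benchmark_latencies) in milliseconds."""
    def timed_runs(n: int) -> List[float]:
        latencies = []
        for _ in range(n):
            maybe_synchronize(device)
            t0 = time.perf_counter()
            func()
            maybe_synchronize(device)
            latencies.append((time.perf_counter() - t0) * 1e3)
        return latencies

    warmup_latencies = timed_runs(nwarmup)
    benchmark_latencies = timed_runs(num_iter)
    return warmup_latencies, benchmark_latencies
```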
@FindHao (Member) commented Mar 31, 2025
I'm concerned that this PR introduces too significant a change to the core APIs, not only for this line. As an alternative, can you consider adding an option to skip the warmup phase and use the actual run results as the 'warmup' results?

@murste01 (Contributor, Author) commented Apr 8, 2025

I've decided to drop this change in favour of using `--output-iter-metrics` + `--nwarmup 0` + `--niter $N` and removing the first $M runs externally.

Thanks @FindHao.
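Roughly, that external trimming step would look something like this (the per-iteration file name and JSON layout below are hypothetical; adapt to whatever `--output-iter-metrics` actually writes):

```python
import json
import statistics

M = 10  # number of leading iterations to treat as warmup and discard

# Hypothetical layout: a JSON file containing a list of per-iteration latencies.
with open("results/llama-eval/metrics-per-iter.json") as f:
    per_iter = json.load(f)["metrics"]["iter_latencies"]

steady_state = per_iter[M:]
print(f"dropped {M} warmup iterations, {len(steady_state)} remain")
print(f"mean latency: {statistics.mean(steady_state):.4f}")
```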

@murste01 closed this Apr 8, 2025