Add --output-warmup-metrics flag to cpu userbenchmark scripts #2604

Conversation

murste01 (Contributor)
Adds a new `--output-warmup-metrics` flag that includes warmup metrics in benchmark result JSON files. This allows us to analyse warmup iterations and decide how many are enough.
@murste01 (Contributor, Author)

Examples with and without the new flag:

Without `--output-warmup-metrics`:

Example command:

$ python3 run_benchmark.py cpu -t eval -m llama --precision fp32 --nwarmup 10 --niter 10 -o results
Running benchmark: /home/murste01/miniforge3/envs/pytorch/bin/python3 /home/murste01/git/benchmark/userbenchmark/cpu/run_config.py -m llama -d cpu -t eval --precision fp32 --metrics latencies --nwarmup 10 --niter 10 -o results
Running TorchBenchModelConfig(name='llama', test='eval', device='cpu', batch_size=None, extra_args=['--precision', 'fp32'], extra_env=None, output_dir=None, skip=False) ... [Done]

Output tree:

...
metrics-20250331150609.json
results/
`-- llama-eval
    `-- metrics-515.json
...

metrics-20250331150609.json contents:

$ cat metrics-20250331150609.json
{
    "name": "cpu",
    "environ": {
        "pytorch_git_version": "2236df1770800ffea5697b11b0bb0d910b2e59e1"
    },
    "metrics": {
        "llama-eval_latency": 114.53452100000001
    }
}

results/llama-eval/metrics-515.json contents:

$ cat results/llama-eval/metrics-515.json
{
    "name": "cpu",
    "environ": {
        "pytorch_git_version": "2236df1770800ffea5697b11b0bb0d910b2e59e1"
    },
    "metrics": {
        "latency": 114.53452100000001
    }
}

With `--output-warmup-metrics`:

Example command:

$ python3 run_benchmark.py cpu -t eval -m llama --precision fp32 --nwarmup 10 --niter 10 -o results --output-warmup-metrics
Running benchmark: /home/murste01/miniforge3/envs/pytorch/bin/python3 /home/murste01/git/benchmark/userbenchmark/cpu/run_config.py -m llama -d cpu -t eval --precision fp32 --output-warmup-metrics --metrics latencies --nwarmup 10 --niter 10 -o results
Running TorchBenchModelConfig(name='llama', test='eval', device='cpu', batch_size=None, extra_args=['--precision', 'fp32'], extra_env=None, output_dir=None, skip=False) ... [Done]

Output tree:

metrics-20250331150656.json
results/
`-- llama-eval
    `-- metrics-710.json

metrics-20250331150656.json contents:

$ cat metrics-20250331150656.json
{
    "name": "cpu",
    "environ": {
        "pytorch_git_version": "2236df1770800ffea5697b11b0bb0d910b2e59e1"
    },
    "metrics": {
        "llama-eval_warmup_latency": 109.78462999999999,
        "llama-eval_latency": 108.5176915
    }
}

results/llama-eval/metrics-710.json contents:

$ cat results/llama-eval/metrics-710.json
{
    "name": "cpu",
    "environ": {
        "pytorch_git_version": "2236df1770800ffea5697b11b0bb0d910b2e59e1"
    },
    "metrics": {
        "warmup_latency": 109.78462999999999,
        "latency": 108.5176915
    }
}
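If it helps, here is a small sketch of how the two numbers could be compared after a run (the path and key names are taken from the example output above; the numeric suffix differs per run, and the units are whatever the harness reports):

```python
import json

# Per-benchmark metrics file from the example above; the suffix differs per run.
with open("results/llama-eval/metrics-710.json") as f:
    metrics = json.load(f)["metrics"]

warmup = metrics["warmup_latency"]   # aggregate latency reported for the warmup iterations
steady = metrics["latency"]          # aggregate latency reported for the measured iterations

print(f"warmup latency:   {warmup:.4f}")
print(f"measured latency: {steady:.4f}")
print(f"relative difference: {100 * (warmup - steady) / steady:+.2f}%")
```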

@murste01 (Contributor, Author)

cc: @FindHao

Thanks in advance!

@@ -42,13 +44,10 @@ def maybe_synchronize(device: str):

 def get_latencies(
     func, device: str, nwarmup=WARMUP_ROUNDS, num_iter=BENCHMARK_ITERS
-) -> List[float]:
+) -> Tuple[List[float], List[float]]:
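For reference, a rough sketch of what `get_latencies` could look like with this return type (illustrative only, not the actual patch; `maybe_synchronize`, `WARMUP_ROUNDS` and `BENCHMARK_ITERS` are stubbed here and assumed from the hunk context):

```python
from typing import Callable, List, Tuple
import time

WARMUP_ROUNDS = 10       # assumed defaults, mirroring the diff context
BENCHMARK_ITERS = 10

def maybe_synchronize(device: str) -> None:
    # Stand-in for the harness helper of the same name; a no-op on CPU.
    pass

def get_latencies(
    func: Callable[[], None], device: str, nwarmup=WARMUP_ROUNDS, num_iter=BENCHMARK_ITERS
) -> Tuple[List[float], List[float]]:
    """Return (warmup_latencies, benchmark_latencies) in milliseconds."""
    def timed_runs(n: int) -> List[float]:
        latencies = []
        for _ in range(n):
            maybe_synchronize(device)
            t0 = time.perf_counter()
            func()
            maybe_synchronize(device)
            latencies.append((time.perf_counter() - t0) * 1e3)
        return latencies

    warmup_latencies = timed_runs(nwarmup)
    benchmark_latencies = timed_runs(num_iter)
    return warmup_latencies, benchmark_latencies
```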
@FindHao (Member) commented Mar 31, 2025
I'm concerned that this PR introduces too significant a change to the core APIs, not only for this line. As an alternative, can you consider adding an option to skip the warmup phase and use the actual run results as the 'warmup' results?

@murste01 (Contributor, Author) commented Apr 8, 2025

I've decided to drop this change in favour of using `--output-iter-metrics` + `--nwarmup 0` + `--niter $N` and removing the first $M runs externally.

Thanks @FindHao.
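Roughly, that external trimming step would look something like this (the per-iteration file name and JSON layout below are hypothetical; adapt to whatever `--output-iter-metrics` actually writes):

```python
import json
import statistics

M = 10  # number of leading iterations to treat as warmup and discard

# Hypothetical layout: a JSON file containing a list of per-iteration latencies.
with open("results/llama-eval/metrics-per-iter.json") as f:
    per_iter = json.load(f)["metrics"]["iter_latencies"]

steady_state = per_iter[M:]
print(f"dropped {M} warmup iterations, {len(steady_state)} remain")
print(f"mean latency: {statistics.mean(steady_state):.4f}")
```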

@murste01 closed this Apr 8, 2025