benchdiff is opinionated, which is great, but that also rules out some sensible uses. For example, there is currently no way to specify -test.cpu=x.
Rather than introducing one-offs for everything, we should add a general way to pass flags through to the test binary, including bespoke flags the binary supports that benchdiff can't even know about.
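
One possible shape for this, as a rough sketch and not a description of how benchdiff works today: treat everything after a `--` separator as opaque and append it verbatim to the test binary's arguments. The function names `splitPassthrough` and `buildTestArgs` are hypothetical, and the base arguments shown are only an assumption about what benchdiff might pass.

```go
package main

// splitPassthrough splits args at the first "--" separator.
// Everything before it is for the tool itself; everything after it
// is forwarded untouched. (Hypothetical helper, not existing benchdiff code.)
func splitPassthrough(args []string) (own, passthrough []string) {
	for i, a := range args {
		if a == "--" {
			return args[:i], args[i+1:]
		}
	}
	// No separator: nothing to forward.
	return args, nil
}

// buildTestArgs assembles the argument list for the compiled test binary.
// The base flags here are an assumption for illustration; the forwarded
// flags are appended last and are never parsed or validated by the tool.
func buildTestArgs(benchRegex string, passthrough []string) []string {
	args := []string{"-test.run=^$", "-test.bench=" + benchRegex}
	return append(args, passthrough...)
}
```

With something like this, an invocation along the lines of `benchdiff ./pkg -- -test.cpu=4 -my-custom-flag` would hand both trailing flags to the binary as-is. Since the tool never inspects them, flags it has no knowledge of work the same way as standard ones.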