Remove the commutation cache from the commutation checker #15988

Open
mtreinish wants to merge 1 commit into Qiskit:main from mtreinish:cache-no-more
@mtreinish
Member

Summary

An early performance optimization we made in the commutation analysis pass in #3878 was to cache the commutation relations between gates. Back then commutation was only checked via the matrix multiplication method: we would compose a pair of gates' unitary matrices forwards and backwards and determine whether the product of those forward and backward compositions was an identity. This was a fairly costly operation at the time for various reasons, and the cache enabled a large speedup by avoiding unnecessary repeated computation. However, in the more than 6 years since then the code base has evolved substantially, and matrix multiplication is no longer the default way to determine commutation. We now have a precomputed library of commuting gates and special handling for cases where we can tell very easily whether gates commute. Additionally, all of this code has been ported to Rust, so the matrix multiplication based approach is not nearly as expensive (although it's not free either).
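For readers unfamiliar with the matrix multiplication method described above, here is a minimal pure-Python sketch. It assumes both gates act on the same qubits and are given as unitary matrices, and it reads "product of the forward and backward compositions" as the forward composition times the adjoint of the backward one. The helper names (`matmul`, `dagger`, `commute_by_matmul`) are illustrative and are not Qiskit APIs.

```python
# Sketch only: a pure-Python stand-in for a matrix-based commutation check.
# None of these helper names come from Qiskit.

def matmul(a, b):
    """Multiply two square matrices given as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def dagger(a):
    """Return the conjugate transpose of a square matrix."""
    n = len(a)
    return [[a[j][i].conjugate() for j in range(n)] for i in range(n)]


def commute_by_matmul(a, b, atol=1e-8):
    """Return True if (a @ b) @ (b @ a)^dagger is the identity within atol."""
    forward = matmul(a, b)
    backward = matmul(b, a)
    product = matmul(forward, dagger(backward))
    n = len(a)
    return all(
        abs(product[i][j] - (1 if i == j else 0)) <= atol
        for i in range(n)
        for j in range(n)
    )


X = [[0, 1], [1, 0]]
Z = [[1, 0], [0, -1]]
S = [[1, 0], [0, 1j]]

print(commute_by_matmul(Z, S))  # both diagonal, so they commute
print(commute_by_matmul(X, Z))  # X and Z anticommute, so this is False
```

Each check costs several dense matrix products, which is why repeating it for the same gate pair was worth caching when this was the only code path.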

In this new world the cache is actually doing more harm than good, because maintaining the cache adds extra lookups and hashing which, while constant time, are not free. We've reached a point where the complexity of the cache is no longer worth it, so this commit removes the commutation cache. One caveat is that some aspects of this internal cache have leaked into the public API. Specifically, the `CommutationChecker` class is public, and it includes a documented init argument `cache_max_entries` as well as two public methods, `clear_cached_commutations` and `num_cached_entries`. I believe the original intent of these methods was either to debug that the cache logic was correct (as they were used in tests) or to let the user manage the cache size manually if they so wished. The `cache_max_entries` argument was used to bound the total memory size of the cache, to avoid using too much memory in certain applications. Since these are part of the public documented API, we can't remove them without violating our stability guidelines. So instead this commit makes them no-ops, or in the case of `num_cached_entries` makes it always return 0 (since there are no longer any cached entries). These are all marked as deprecated in this PR so they can be removed in 3.0.
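The shape of that compatibility shim can be sketched as follows. This is a hypothetical stand-in class, not the code in this PR: the class name `CheckerSketch`, the `None` default, and the warning messages are assumptions, and the standard-library `warnings` module is used here in place of whatever deprecation helpers the real change relies on.

```python
import warnings


class CheckerSketch:
    """Hypothetical stand-in showing deprecated no-op cache APIs."""

    def __init__(self, cache_max_entries=None):
        # The argument is still accepted so existing callers keep working,
        # but it no longer configures anything.
        if cache_max_entries is not None:
            warnings.warn(
                "cache_max_entries is deprecated and has no effect",
                DeprecationWarning,
                stacklevel=2,
            )

    def clear_cached_commutations(self):
        """Deprecated no-op: there is no cache left to clear."""
        warnings.warn(
            "clear_cached_commutations is deprecated and does nothing",
            DeprecationWarning,
            stacklevel=2,
        )

    def num_cached_entries(self):
        """Deprecated: always returns 0 because nothing is cached."""
        warnings.warn(
            "num_cached_entries is deprecated and always returns 0",
            DeprecationWarning,
            stacklevel=2,
        )
        return 0
```

Existing callers keep running unchanged and only see a `DeprecationWarning` until the methods are removed.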

In practice there is a minimal performance difference from this change, which means we don't need the extra code anymore, although a small speedup may be seen in some very specific benchmarks (those dominated by commutation checking). However, there is one case where running without a cache can be slower: when there are a large number of gate pairs involving a gate whose commutation we can't determine without the matrix multiplication (i.e. it's not a known Pauli, a rotation gate, or in the library) and the same pairs are repeated multiple times. In practice this doesn't come up very frequently, because in a preset pass manager's workflow we have typically lowered to all 1q and 2q gates, and that lowering involves standard gates the checker knows how to work with. The only edge case is a circuit with a large number of custom 1q or 2q gates that have matrix definitions (which is not common). But the asv benchmark for commutation analysis will likely show a roughly 5x slowdown with this commit. That benchmark highlights this edge case because it runs the pass on a random circuit with gates up to 3 qubits wide, which involves many repeated checks via matrix multiplication, something that does not normally come up in practice. Additionally, unlike in #3878, the 5x regression being flagged is only on the scale of tens of ms. Back in 2020 we could only have dreamed of the CommutationAnalysis pass executing in < 100ms in asv, let alone a world where a 5x regression flagged in asv would be so quick. The entire pass was at least 2 orders of magnitude slower at the point we introduced the cache.

mtreinish added the `Changelog: Deprecated` label on Apr 9, 2026
mtreinish requested a review from a team as a code owner on Apr 9, 2026, 22:25
mtreinish added the `Rust`, `mod: transpiler`, and `mod: circuit` labels on Apr 9, 2026
@qiskit-bot
Collaborator

One or more of the following people are relevant to this code:

  • @Qiskit/terra-core

@Cryoris
Collaborator

Cryoris commented Apr 10, 2026

I'm in big favor of removing the custom hash logic we've implemented here too. But given that asv will flag a regression, do you have other benchmarks (I'm assuming you ran benchpress for this?) to support that we can drop the cache?

This method will always return 0 because there is no longer an
internal cache.
"""
return 0
Collaborator


So this is a backward compatible API change, since the commutation checker does give the same results (it just might take longer), but we are dropping functionality users might've relied upon. I think it would be nice to point this out more explicitly in the release notes, what do you think?

Member Author


I can reword the release note to make it more explicit. Do you want me to add another note to call it out more?

Functionally the caching only did anything in a very specific case: you had a pair of StandardGate instances that had all float parameters (which includes no params), at least one of the gates wasn't accounted for in the library or by another mechanism, and you were running the same checks on these gates repeatedly. Outside of this specific case we never cached anything. So I'm not sure how people could be relying on it in practice.

Collaborator


Rewording is good enough, just to point out users would have to do their own caching if they relied on that 👍🏻 I also don't think it's a commonly used thing, but the past has shown that every minuscule feature is used somehow by someone 😄

@mtreinish
Member Author

> I'm in big favor of removing the custom hash logic we've implemented here too. But given that asv will flag a regression, do you have other benchmarks (I'm assuming you ran benchpress for this?) to support that we can drop the cache?

Asv only flags a regression on the CommutationAnalysis micro-benchmark, which I'd argue is more of a benchmark bug than a real regression. I can dig into which gates the random_circuit() call is using for that benchmark, so we can try to augment the library or other handling to check them without relying on the matmul path.

We also don't run the standalone pass anymore, only CommutativeCancellation and hopefully CommutativeOptimization in 2.5. The transpiler benchmarks don't show any significant difference with this PR. I didn't run a full benchpress run because it didn't seem necessary for a change this small.

Here is the asv output of the benchmarks I ran:

Benchmarks that have stayed the same:

| Change   | Before [f23f0bcf] <nalgebra-weyls-are-everywhere^2>   | After [9eb86350]    | Ratio   | Benchmark (Parameter)                                                                                           |
|----------|-------------------------------------------------------|---------------------|---------|-----------------------------------------------------------------------------------------------------------------|
|          | 0                                                     | 0                   | n/a     | utility_scale.UtilityScaleBenchmarks.track_bvlike_depth('cx')                                                   |
|          | 0                                                     | 0                   | n/a     | utility_scale.UtilityScaleBenchmarks.track_bvlike_depth('cz')                                                   |
|          | 0                                                     | 0                   | n/a     | utility_scale.UtilityScaleBenchmarks.track_bvlike_depth('ecr')                                                  |
|          | 12.6±0.2ms                                            | 13.0±0.3ms          | 1.03    | transpiler_levels.TranspilerLevelBenchmarks.time_quantum_volume_transpile_50_x_20(0)                            |
|          | 185±0.7ms                                             | 188±1ms             | 1.02    | passes.MultipleBasisPassBenchmarks.time_optimize_1q_commutation(20, 1024, ['rx', 'ry', 'rz', 'r', 'rxx', 'id']) |
|          | 124±0.6ms                                             | 125±0.5ms           | 1.01    | passes.MultipleBasisPassBenchmarks.time_optimize_1q_commutation(14, 1024, ['rz', 'x', 'sx', 'cx', 'id'])        |
|          | 29.3±0.2ms                                            | 29.4±0.5ms          | 1.01    | transpiler_levels.TranspilerLevelBenchmarks.time_schedule_qv_14_x_14(1)                                         |
|          | 14.4±0.09ms                                           | 14.6±0.05ms         | 1.01    | transpiler_levels.TranspilerLevelBenchmarks.time_transpile_from_large_qasm_backend_with_prop(0)                 |
|          | 19.1±0.1ms                                            | 19.2±0.05ms         | 1.01    | transpiler_levels.TranspilerLevelBenchmarks.time_transpile_from_large_qasm_backend_with_prop(1)                 |
|          | 3.79±0.04ms                                           | 3.81±0.02ms         | 1.01    | utility_scale.UtilityScaleBenchmarks.time_parse_qaoa_n100('cz')                                                 |
|          | 3.76±0.03ms                                           | 3.80±0.02ms         | 1.01    | utility_scale.UtilityScaleBenchmarks.time_parse_qaoa_n100('ecr')                                                |
|          | 116±0.7ms                                             | 116±0.7ms           | 1.00    | passes.MultipleBasisPassBenchmarks.time_optimize_1q_commutation(14, 1024, ['rx', 'ry', 'rz', 'r', 'rxx', 'id']) |
|          | 110±0.3ms                                             | 109±0.4ms           | 1.00    | passes.MultipleBasisPassBenchmarks.time_optimize_1q_commutation(14, 1024, ['u', 'cx', 'id'])                    |
|          | 200±0.5ms                                             | 199±1ms             | 1.00    | passes.MultipleBasisPassBenchmarks.time_optimize_1q_commutation(20, 1024, ['rz', 'x', 'sx', 'cx', 'id'])        |
|          | 175±0.8ms                                             | 175±2ms             | 1.00    | passes.MultipleBasisPassBenchmarks.time_optimize_1q_commutation(20, 1024, ['u', 'cx', 'id'])                    |
|          | 38.5±0.2ms                                            | 38.7±0.1ms          | 1.00    | passes.MultipleBasisPassBenchmarks.time_optimize_1q_commutation(5, 1024, ['rx', 'ry', 'rz', 'r', 'rxx', 'id'])  |
|          | 42.8±0.2ms                                            | 42.9±0.2ms          | 1.00    | passes.MultipleBasisPassBenchmarks.time_optimize_1q_commutation(5, 1024, ['rz', 'x', 'sx', 'cx', 'id'])         |
|          | 36.2±0.2ms                                            | 36.1±0.1ms          | 1.00    | passes.MultipleBasisPassBenchmarks.time_optimize_1q_commutation(5, 1024, ['u', 'cx', 'id'])                     |
|          | 31.1±0.1ms                                            | 31.1±0.1ms          | 1.00    | transpiler_levels.TranspilerLevelBenchmarks.time_quantum_volume_transpile_50_x_20(1)                            |
|          | 143±0.4ms                                             | 143±0.2ms           | 1.00    | transpiler_levels.TranspilerLevelBenchmarks.time_quantum_volume_transpile_50_x_20(3)                            |
|          | 34.2±0.2ms                                            | 34.4±0.7ms          | 1.00    | transpiler_levels.TranspilerLevelBenchmarks.time_schedule_qv_14_x_14(0)                                         |
|          | 5.62±0.03ms                                           | 5.65±0.03ms         | 1.00    | transpiler_levels.TranspilerLevelBenchmarks.time_transpile_from_large_qasm(1)                                   |
|          | 5.44±0.01ms                                           | 5.44±0.01ms         | 1.00    | transpiler_levels.TranspilerLevelBenchmarks.time_transpile_from_large_qasm(3)                                   |
|          | 4.00±0.01ms                                           | 4.02±0.03ms         | 1.00    | transpiler_levels.TranspilerLevelBenchmarks.time_transpile_from_large_qasm_backend_with_prop(2)                 |
|          | 1429                                                  | 1429                | 1.00    | transpiler_levels.TranspilerLevelBenchmarks.track_depth_quantum_volume_transpile_50_x_20(0)                     |
|          | 1316                                                  | 1316                | 1.00    | transpiler_levels.TranspilerLevelBenchmarks.track_depth_quantum_volume_transpile_50_x_20(1)                     |
|          | 1174                                                  | 1174                | 1.00    | transpiler_levels.TranspilerLevelBenchmarks.track_depth_quantum_volume_transpile_50_x_20(2)                     |
|          | 1213                                                  | 1213                | 1.00    | transpiler_levels.TranspilerLevelBenchmarks.track_depth_quantum_volume_transpile_50_x_20(3)                     |
|          | 2705                                                  | 2705                | 1.00    | transpiler_levels.TranspilerLevelBenchmarks.track_depth_transpile_from_large_qasm(0)                            |
|          | 2005                                                  | 2005                | 1.00    | transpiler_levels.TranspilerLevelBenchmarks.track_depth_transpile_from_large_qasm(1)                            |
|          | 7                                                     | 7                   | 1.00    | transpiler_levels.TranspilerLevelBenchmarks.track_depth_transpile_from_large_qasm(2)                            |
|          | 7                                                     | 7                   | 1.00    | transpiler_levels.TranspilerLevelBenchmarks.track_depth_transpile_from_large_qasm(3)                            |
|          | 11117                                                 | 11117               | 1.00    | transpiler_levels.TranspilerLevelBenchmarks.track_depth_transpile_from_large_qasm_backend_with_prop(0)          |
|          | 5015                                                  | 5015                | 1.00    | transpiler_levels.TranspilerLevelBenchmarks.track_depth_transpile_from_large_qasm_backend_with_prop(1)          |
|          | 16                                                    | 16                  | 1.00    | transpiler_levels.TranspilerLevelBenchmarks.track_depth_transpile_from_large_qasm_backend_with_prop(2)          |
|          | 16                                                    | 16                  | 1.00    | transpiler_levels.TranspilerLevelBenchmarks.track_depth_transpile_from_large_qasm_backend_with_prop(3)          |
|          | 1035                                                  | 1035                | 1.00    | transpiler_levels.TranspilerLevelBenchmarks.track_depth_transpile_qv_14_x_14(0)                                 |
|          | 817                                                   | 817                 | 1.00    | transpiler_levels.TranspilerLevelBenchmarks.track_depth_transpile_qv_14_x_14(1)                                 |
|          | 615                                                   | 615                 | 1.00    | transpiler_levels.TranspilerLevelBenchmarks.track_depth_transpile_qv_14_x_14(2)                                 |
|          | 634                                                   | 634                 | 1.00    | transpiler_levels.TranspilerLevelBenchmarks.track_depth_transpile_qv_14_x_14(3)                                 |
|          | 36.4±0.1ms                                            | 36.3±0.3ms          | 1.00    | utility_scale.UtilityScaleBenchmarks.time_bv_100('cx')                                                          |
|          | 33.5±0.2ms                                            | 33.5±0.3ms          | 1.00    | utility_scale.UtilityScaleBenchmarks.time_bv_100('cz')                                                          |
|          | 11.6±0.1ms                                            | 11.6±0.2ms          | 1.00    | utility_scale.UtilityScaleBenchmarks.time_circSU2('cx')                                                         |
|          | 13.4±0.2ms                                            | 13.4±0.1ms          | 1.00    | utility_scale.UtilityScaleBenchmarks.time_circSU2('ecr')                                                        |
|          | 3.08±0.07s                                            | 3.06±0.1s           | 1.00    | utility_scale.UtilityScaleBenchmarks.time_circSU2_89('cz')                                                      |
|          | 3.02±0.09s                                            | 3.01±0.05s          | 1.00    | utility_scale.UtilityScaleBenchmarks.time_circSU2_89('ecr')                                                     |
|          | 223±0.5ms                                             | 223±1ms             | 1.00    | utility_scale.UtilityScaleBenchmarks.time_parse_hwb12('cx')                                                     |
|          | 225±0.7ms                                             | 224±0.7ms           | 1.00    | utility_scale.UtilityScaleBenchmarks.time_parse_hwb12('cz')                                                     |
|          | 3.80±0.04ms                                           | 3.81±0.03ms         | 1.00    | utility_scale.UtilityScaleBenchmarks.time_parse_qaoa_n100('cx')                                                 |
|          | 41.2±0.5ms                                            | 41.1±0.2ms          | 1.00    | utility_scale.UtilityScaleBenchmarks.time_parse_qft_n100('cz')                                                  |
|          | 41.3±0.5ms                                            | 41.4±0.3ms          | 1.00    | utility_scale.UtilityScaleBenchmarks.time_parse_qft_n100('ecr')                                                 |
|          | 13.3±0.1ms                                            | 13.3±0.2ms          | 1.00    | utility_scale.UtilityScaleBenchmarks.time_parse_square_heisenberg_n100('cx')                                    |
|          | 13.3±0.1ms                                            | 13.3±0.2ms          | 1.00    | utility_scale.UtilityScaleBenchmarks.time_parse_square_heisenberg_n100('cz')                                    |
|          | 379±2ms                                               | 378±1ms             | 1.00    | utility_scale.UtilityScaleBenchmarks.time_qv('cx')                                                              |
|          | 395±2ms                                               | 393±0.7ms           | 1.00    | utility_scale.UtilityScaleBenchmarks.time_qv('ecr')                                                             |
|          | 400                                                   | 400                 | 1.00    | utility_scale.UtilityScaleBenchmarks.track_bv_100_depth('cx')                                                   |
|          | 400                                                   | 400                 | 1.00    | utility_scale.UtilityScaleBenchmarks.track_bv_100_depth('cz')                                                   |
|          | 400                                                   | 400                 | 1.00    | utility_scale.UtilityScaleBenchmarks.track_bv_100_depth('ecr')                                                  |
|          | 1312                                                  | 1312                | 1.00    | utility_scale.UtilityScaleBenchmarks.track_circSU2_89_depth('cx')                                               |
|          | 1313                                                  | 1313                | 1.00    | utility_scale.UtilityScaleBenchmarks.track_circSU2_89_depth('cz')                                               |
|          | 1313                                                  | 1313                | 1.00    | utility_scale.UtilityScaleBenchmarks.track_circSU2_89_depth('ecr')                                              |
|          | 300                                                   | 300                 | 1.00    | utility_scale.UtilityScaleBenchmarks.track_circSU2_depth('cx')                                                  |
|          | 300                                                   | 300                 | 1.00    | utility_scale.UtilityScaleBenchmarks.track_circSU2_depth('cz')                                                  |
|          | 300                                                   | 300                 | 1.00    | utility_scale.UtilityScaleBenchmarks.track_circSU2_depth('ecr')                                                 |
|          | 367549                                                | 367549              | 1.00    | utility_scale.UtilityScaleBenchmarks.track_hwb12_depth('cx')                                                    |
|          | 387065                                                | 387065              | 1.00    | utility_scale.UtilityScaleBenchmarks.track_hwb12_depth('cz')                                                    |
|          | 391069                                                | 391069              | 1.00    | utility_scale.UtilityScaleBenchmarks.track_hwb12_depth('ecr')                                                   |
|          | 1590                                                  | 1590                | 1.00    | utility_scale.UtilityScaleBenchmarks.track_qaoa_depth('cx')                                                     |
|          | 1603                                                  | 1603                | 1.00    | utility_scale.UtilityScaleBenchmarks.track_qaoa_depth('cz')                                                     |
|          | 1603                                                  | 1603                | 1.00    | utility_scale.UtilityScaleBenchmarks.track_qaoa_depth('ecr')                                                    |
|          | 2692                                                  | 2692                | 1.00    | utility_scale.UtilityScaleBenchmarks.track_qft_depth('cx')                                                      |
|          | 2744                                                  | 2744                | 1.00    | utility_scale.UtilityScaleBenchmarks.track_qft_depth('cz')                                                      |
|          | 2744                                                  | 2744                | 1.00    | utility_scale.UtilityScaleBenchmarks.track_qft_depth('ecr')                                                     |
|          | 2571                                                  | 2571                | 1.00    | utility_scale.UtilityScaleBenchmarks.track_qv_depth('cx')                                                       |
|          | 2571                                                  | 2571                | 1.00    | utility_scale.UtilityScaleBenchmarks.track_qv_depth('cz')                                                       |
|          | 2571                                                  | 2571                | 1.00    | utility_scale.UtilityScaleBenchmarks.track_qv_depth('ecr')                                                      |
|          | 480                                                   | 480                 | 1.00    | utility_scale.UtilityScaleBenchmarks.track_square_heisenberg_depth('cx')                                        |
|          | 480                                                   | 480                 | 1.00    | utility_scale.UtilityScaleBenchmarks.track_square_heisenberg_depth('cz')                                        |
|          | 480                                                   | 480                 | 1.00    | utility_scale.UtilityScaleBenchmarks.track_square_heisenberg_depth('ecr')                                       |
|          | 121±0.7ms                                             | 120±0.2ms           | 0.99    | transpiler_levels.TranspilerLevelBenchmarks.time_quantum_volume_transpile_50_x_20(2)                            |
|          | 3.63±0.01ms                                           | 3.60±0.01ms         | 0.99    | transpiler_levels.TranspilerLevelBenchmarks.time_transpile_from_large_qasm(0)                                   |
|          | 5.25±0.02ms                                           | 5.23±0.02ms         | 0.99    | transpiler_levels.TranspilerLevelBenchmarks.time_transpile_from_large_qasm(2)                                   |
|          | 4.18±0.02ms                                           | 4.15±0.03ms         | 0.99    | transpiler_levels.TranspilerLevelBenchmarks.time_transpile_from_large_qasm_backend_with_prop(3)                 |
|          | 18.7±0.3ms                                            | 18.6±0.2ms          | 0.99    | transpiler_levels.TranspilerLevelBenchmarks.time_transpile_qv_14_x_14(2)                                        |
|          | 33.6±0.1ms                                            | 33.2±0.2ms          | 0.99    | utility_scale.UtilityScaleBenchmarks.time_bv_100('ecr')                                                         |
|          | 13.1±0.1ms                                            | 13.0±0.08ms         | 0.99    | utility_scale.UtilityScaleBenchmarks.time_circSU2('cz')                                                         |
|          | 15.3±0.02s                                            | 15.1±0.01s          | 0.99    | utility_scale.UtilityScaleBenchmarks.time_hwb12('cx')                                                           |
|          | 20.6±0.02s                                            | 20.4±0.02s          | 0.99    | utility_scale.UtilityScaleBenchmarks.time_hwb12('cz')                                                           |
|          | 19.5±0.04s                                            | 19.4±0.01s          | 0.99    | utility_scale.UtilityScaleBenchmarks.time_hwb12('ecr')                                                          |
|          | 224±2ms                                               | 223±0.4ms           | 0.99    | utility_scale.UtilityScaleBenchmarks.time_parse_hwb12('ecr')                                                    |
|          | 41.6±0.6ms                                            | 41.0±0.1ms          | 0.99    | utility_scale.UtilityScaleBenchmarks.time_parse_qft_n100('cx')                                                  |
|          | 13.4±0.07ms                                           | 13.3±0.1ms          | 0.99    | utility_scale.UtilityScaleBenchmarks.time_parse_square_heisenberg_n100('ecr')                                   |
|          | 151±0.9ms                                             | 149±0.4ms           | 0.99    | utility_scale.UtilityScaleBenchmarks.time_qaoa('cx')                                                            |
|          | 178±0.6ms                                             | 177±0.8ms           | 0.99    | utility_scale.UtilityScaleBenchmarks.time_qaoa('cz')                                                            |
|          | 170±0.9ms                                             | 169±0.4ms           | 0.99    | utility_scale.UtilityScaleBenchmarks.time_qaoa('ecr')                                                           |
|          | 301±0.7ms                                             | 296±0.8ms           | 0.99    | utility_scale.UtilityScaleBenchmarks.time_qft('ecr')                                                            |
|          | 397±1ms                                               | 392±0.8ms           | 0.99    | utility_scale.UtilityScaleBenchmarks.time_qv('cz')                                                              |
|          | 50.2±0.2ms                                            | 49.6±0.2ms          | 0.99    | utility_scale.UtilityScaleBenchmarks.time_square_heisenberg('cz')                                               |
|          | 47.9±0.2ms                                            | 47.6±0.2ms          | 0.99    | utility_scale.UtilityScaleBenchmarks.time_square_heisenberg('ecr')                                              |
|          | 9.29±0.08ms                                           | 9.14±0.07ms         | 0.98    | transpiler_levels.TranspilerLevelBenchmarks.time_transpile_qv_14_x_14(1)                                        |
|          | 21.3±0.1ms                                            | 20.8±0.2ms          | 0.98    | transpiler_levels.TranspilerLevelBenchmarks.time_transpile_qv_14_x_14(3)                                        |
|          | 3.13±0.07s                                            | 3.07±0.1s           | 0.98    | utility_scale.UtilityScaleBenchmarks.time_circSU2_89('cx')                                                      |
|          | 250±1ms                                               | 245±0.4ms           | 0.98    | utility_scale.UtilityScaleBenchmarks.time_qft('cx')                                                             |
|          | 295±0.7ms                                             | 290±0.6ms           | 0.98    | utility_scale.UtilityScaleBenchmarks.time_qft('cz')                                                             |
|          | 44.8±0.4ms                                            | 43.9±0.06ms         | 0.98    | utility_scale.UtilityScaleBenchmarks.time_square_heisenberg('cx')                                               |
|          | 3.14±0.02ms                                           | 3.04±0.03ms         | 0.97    | utility_scale.UtilityScaleBenchmarks.time_bvlike('cx')                                                          |
|          | 3.12±0.02ms                                           | 3.02±0.01ms         | 0.97    | utility_scale.UtilityScaleBenchmarks.time_bvlike('ecr')                                                         |
|          | 3.13±0.03ms                                           | 3.01±0.02ms         | 0.96    | utility_scale.UtilityScaleBenchmarks.time_bvlike('cz')                                                          |
|          | 6.40±0.3ms                                            | 6.07±0.01ms         | 0.95    | transpiler_levels.TranspilerLevelBenchmarks.time_transpile_qv_14_x_14(0)                                        |

Benchmarks that have got worse:

| Change   | Before [f23f0bcf] <nalgebra-weyls-are-everywhere^2>   | After [9eb86350]    |   Ratio | Benchmark (Parameter)                                     |
|----------|-------------------------------------------------------|---------------------|---------|-----------------------------------------------------------|
| +        | 2.99±0.01ms                                           | 16.8±0.03ms         |    5.62 | passes.PassBenchmarks.time_commutation_analysis(5, 1024)  |
| +        | 9.48±0.03ms                                           | 52.2±0.06ms         |    5.51 | passes.PassBenchmarks.time_commutation_analysis(14, 1024) |
| +        | 14.2±0.07ms                                           | 76.8±0.09ms         |    5.41 | passes.PassBenchmarks.time_commutation_analysis(20, 1024) |

SOME BENCHMARKS HAVE CHANGED SIGNIFICANTLY.
PERFORMANCE DECREASED.

What I forgot to put in the commit message is that this PR is also important for enabling subsequent optimizations. Specifically, the cache is the reason the commutation checker currently has to be mutable everywhere it is used, which prevents parallelizing the commutation analysis unless the checker is copied for every worker.
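To illustrate the mutability point, here is a minimal sketch that uses hypothetical toy types (`Gate`, `CachedChecker`, `StatelessChecker`, and the helper `count_commuting_parallel` are all made up for this example and are not Qiskit's actual Rust code). A checker that fills a cache on a miss needs `&mut self`, so it cannot be shared between threads without a lock or a per-thread copy. A checker with no cache only needs `&self`, so a single instance can be borrowed by every worker.

```rust
use std::collections::HashMap;
use std::thread;

/// Hypothetical stand-in for a gate; not Qiskit's real gate type.
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
enum Gate {
    X,
    Z,
    Rz,
}

/// Toy commutation rule (an assumption for this sketch): a gate commutes
/// with itself, and the two diagonal gates Z and Rz commute with each other.
fn gates_commute(a: Gate, b: Gate) -> bool {
    a == b || matches!((a, b), (Gate::Z, Gate::Rz) | (Gate::Rz, Gate::Z))
}

/// Checker with a cache: a lookup may insert, so it needs `&mut self`.
struct CachedChecker {
    cache: HashMap<(Gate, Gate), bool>,
}

impl CachedChecker {
    fn new() -> Self {
        CachedChecker { cache: HashMap::new() }
    }

    fn commute(&mut self, a: Gate, b: Gate) -> bool {
        // A cache miss mutates the map, which forces exclusive access.
        *self.cache.entry((a, b)).or_insert_with(|| gates_commute(a, b))
    }
}

/// Checker without a cache: `&self` is enough, so it can be shared.
struct StatelessChecker;

impl StatelessChecker {
    fn commute(&self, a: Gate, b: Gate) -> bool {
        gates_commute(a, b)
    }
}

/// Counts commuting pairs using scoped threads that all borrow one checker.
fn count_commuting_parallel(
    checker: &StatelessChecker,
    pairs: &[(Gate, Gate)],
    workers: usize,
) -> usize {
    let workers = workers.max(1);
    // Ceiling division, clamped to 1 because `chunks(0)` panics.
    let chunk_len = ((pairs.len() + workers - 1) / workers).max(1);
    thread::scope(|s| {
        let handles: Vec<_> = pairs
            .chunks(chunk_len)
            .map(|chunk| {
                s.spawn(move || {
                    chunk
                        .iter()
                        .filter(|&&(a, b)| checker.commute(a, b))
                        .count()
                })
            })
            .collect();
        handles
            .into_iter()
            .map(|h| h.join().unwrap())
            .sum::<usize>()
    })
}
```

Passing a `&mut CachedChecker` into more than one of those spawned closures would be rejected by the borrow checker, which is the constraint described above. Sharing it would need something like a `Mutex` or one clone per thread.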

@Cryoris
Collaborator

Cryoris commented Apr 10, 2026

Ok thanks! CommutationAnalysis is another thing we might be able to take the axe to -- or does the parallelized approach rely on precomputing the commutations?


Labels

Changelog: Deprecated (Add a "Deprecated" entry in the GitHub Release changelog.)
mod: circuit (Related to the core of the `QuantumCircuit` class or the circuit library)
mod: transpiler (Issues and PRs related to Transpiler)
Rust (This PR or issue is related to Rust code in the repository)
