Remove the commutation cache from the commutation checker #15988
mtreinish wants to merge 1 commit into Qiskit:main
Conversation
An early performance optimization we made in the commutation analysis pass in Qiskit#3878 was to cache the commutation relations between gates. Back then, commutation was only checked via the matrix multiplication method, where we would compose a pair of gates' unitary matrices forwards and backwards and determine whether the product of those forward and backward compositions was an identity. This was a fairly costly operation at the time for various reasons, and the cache enabled a large speedup by avoiding unnecessary repeated computation. However, in the more than six years since then the code base has evolved substantially, including no longer relying on matrix-multiplication-based commutation determination as the default. We now have a precomputed library of commuting gates, as well as special handling for cases where we can determine very cheaply whether gates commute. Additionally, all of this code has been ported to Rust, so the matrix multiplication approach is not nearly as expensive (although it's not free either). In this new world the cache is actually doing more harm than good, because maintaining it adds extra lookups and hashing which, while constant time, are not free. We've reached a point where the complexity of the cache is no longer worth it, so this commit removes the commutation cache.

One caveat is that some aspects of this internal cache have leaked into the public API. Specifically, the CommutationChecker class is public, and it includes a documented init argument `cache_max_entries` as well as two public methods, `clear_cached_commutations` and `num_cached_entries`. I believe the original intent of these methods was either to verify that the cache logic was correct (they were used in tests) or to let users manage the cache size manually if they wished. The `cache_max_entries` argument bounded the total memory used by the cache, to avoid consuming too much memory in certain applications. Since these are part of the public documented API, we can't remove them without violating our stability guidelines. Instead, this commit makes them no-ops; `num_cached_entries` now always returns 0 (since there are no longer any cached entries). All of them are deprecated in this PR to mark them for removal in 3.0.

In practice this change makes minimal performance difference, which means we no longer need the extra code, although some very specific benchmarks (those dominated by commutation checking) may see a small speedup. There is one case where running without a cache can be slower: when there are many repeated pairs of the same gates, involving a gate whose commutation we can only determine via matrix multiplication (i.e. it's not a known Pauli, a rotation gate, or in the library). In practice this doesn't come up very frequently, because in a preset pass manager's workflow we have typically already lowered to 1q and 2q standard gates that the checker knows how to work with. The only edge case is a circuit with a large number of custom 1q or 2q gates that have matrix definitions (which is not common). The asv benchmark for commutation analysis, however, will likely show a roughly 5x slowdown with this commit. That benchmark is hitting exactly this edge case because it runs the pass on a random circuit with gates up to 3 qubits in width, which involves multiple repeated checks via matrix multiplication that do not normally come up in practice. Additionally, unlike in Qiskit#3878, the regression being flagged, while roughly 5x, is on the scale of tens of milliseconds; back in 2020 we could only have dreamed of the CommutationAnalysis pass executing in under 100ms in asv, let alone a world where a 5x regression flagged by asv would still be so quick. The entire pass was at least two orders of magnitude slower at the point we introduced the cache.
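The matrix-multiplication check described above can be sketched with NumPy. This is a minimal illustration of the idea (two gates commute when their forward and backward compositions agree, i.e. AB(BA)⁻¹ = I); it ignores qubit-overlap handling and global-phase subtleties, and the function name and tolerance are illustrative, not Qiskit's actual implementation:

```python
import numpy as np

def commute_via_matmul(a: np.ndarray, b: np.ndarray, tol: float = 1e-8) -> bool:
    """Return True if a @ b equals b @ a, i.e. a*b*(b*a)^-1 is the identity."""
    forward = a @ b    # compose the pair one way...
    backward = b @ a   # ...and the other way
    return np.allclose(forward, backward, atol=tol)

# Pauli matrices as a quick sanity check
z = np.array([[1, 0], [0, -1]], dtype=complex)
x = np.array([[0, 1], [1, 0]], dtype=complex)
print(commute_via_matmul(z, z))  # True: any gate commutes with itself
print(commute_via_matmul(z, x))  # False: Z and X anticommute
```

The precomputed library and the special-case handling mentioned above exist precisely to avoid paying for these matrix products in the common cases.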
I'm strongly in favor of removing the custom hash logic we've implemented here too. But given that asv will flag a regression, do you have other benchmarks (I'm assuming you ran benchpress for this?) to support that we can drop the cache?
```python
        This method will always return 0 because there is no longer an
        internal cache.
        """
        return 0
```
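The no-op pattern in the diff above can be sketched as follows. This is a hedged illustration of keeping a cache-related method as a deprecated no-op for API stability; the warning text and class body here are simplified stand-ins, not Qiskit's exact implementation:

```python
import warnings

class CommutationChecker:
    """Simplified stand-in for the public class; only the no-op is shown."""

    def num_cached_entries(self) -> int:
        """Always returns 0 because there is no longer an internal cache."""
        warnings.warn(
            "num_cached_entries is deprecated and will be removed in 3.0; "
            "the internal commutation cache no longer exists.",
            DeprecationWarning,
            stacklevel=2,
        )
        return 0

checker = CommutationChecker()
with warnings.catch_warnings():
    warnings.simplefilter("ignore", DeprecationWarning)
    print(checker.num_cached_entries())  # prints 0
```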
So this is a backward-compatible API change, since the commutation checker still gives the same results, it just might take longer, but we are dropping functionality users might have relied upon. I think it would be nice to point this out more explicitly in the release notes, what do you think?
I can reword the release note to make it more explicit. Do you want me to add another note to call it out more?
Functionally, the caching only did anything in a very specific case: you had a pair of StandardGates whose parameters were all floats (which includes having no parameters), at least one of which we didn't account for in the library or another mechanism, and you were then repeating the same checks with these gates over and over. Outside of this specific case we never cached anything, so I'm not sure how people could be relying on it in practice.
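The narrow eligibility condition described in this comment can be sketched as a predicate. This is a hypothetical illustration of the gating logic, not the checker's real data model; the dict-based gate representation and function names are invented for the example:

```python
def _hashable_standard(gate: dict) -> bool:
    """A gate could be cached only if it was a standard gate with all-float
    (hence hashable) parameters; no parameters counts as all-float."""
    return gate.get("standard", False) and all(
        isinstance(p, float) for p in gate.get("params", [])
    )

def cache_eligible(gate_a: dict, gate_b: dict, resolved_by_library: bool) -> bool:
    """The old cache only engaged when the library (or another fast path)
    could not resolve the pair and both gates were cache-key friendly."""
    if resolved_by_library:
        return False
    return _hashable_standard(gate_a) and _hashable_standard(gate_b)

rzz = {"standard": True, "params": [0.5]}
custom = {"standard": False, "params": []}
print(cache_eligible(rzz, rzz, resolved_by_library=False))    # True
print(cache_eligible(rzz, custom, resolved_by_library=False)) # False
print(cache_eligible(rzz, rzz, resolved_by_library=True))     # False
```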
Rewording is good enough, just to point out users would have to do their own caching if they relied on that 👍🏻 I also don't think it's a commonly used thing, but the past has shown that every minuscule feature is used somehow by someone 😄
Asv only flags a regression on the … benchmark. We also don't run the standalone pass anymore, only … Here is the asv output of the benchmarks I ran: …

What I forgot to put in the commit message is that this PR is actually important for enabling subsequent optimizations. Specifically, the caching is why the commutation checker needs to be mutable everywhere it is used right now, which prevents parallelism in the commutation analysis (without copying it everywhere).
Ok, thanks!