Performance: cache ClassDef.ancestors() transitive walk#3048
Open
Pierre-Sassoulas wants to merge 5 commits into
Open
Performance: cache ClassDef.ancestors() transitive walk#3048Pierre-Sassoulas wants to merge 5 commits into
Pierre-Sassoulas wants to merge 5 commits into
Conversation
4160de2 to
eba3439
Compare
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## codspeed-wizard-1774989768270 #3048 +/- ##
=================================================================
+ Coverage 93.56% 93.60% +0.03%
=================================================================
Files 92 92
Lines 11345 11397 +52
=================================================================
+ Hits 10615 10668 +53
+ Misses 730 729 -1
Flags with carried forward coverage won't be shown. Click here to find out more.
🚀 New features to boost your workflow:
|
eba3439 to
d7a22a7
Compare
Merging this PR will improve performance by 9.83%
|
| Mode | Benchmark | BASE |
HEAD |
Efficiency | |
|---|---|---|---|---|---|
| ⚡ | Simulation | test_bench_endtoend_walk_infer_black |
14.9 s | 13.5 s | +10.2% |
| ⚡ | Simulation | test_bench_endtoend_walk_infer_flask |
9.6 s | 8.7 s | +9.79% |
| ⚡ | Simulation | test_bench_endtoend_parse_flask |
2.2 s | 2 s | +9.51% |
Tip
Curious why this is faster? Comment @codspeedbot explain why this is faster on this PR, or directly use the CodSpeed MCP with your agent.
Comparing perf/cache-classdef-ancestors (4f1ff78) with main (c50a1f4)
2d391a6 to
2ef6d50
Compare
d7a22a7 to
2718c3e
Compare
Pierre-Sassoulas
added a commit
that referenced
this pull request
May 24, 2026
…) bypass Add targeted regression coverage for the four perf commits in #3048 so the corner-case branches don't silently regress. scoped_nodes.py — ClassDef.ancestors() cache (TestAncestorsCaching): * cache hit reuses the materialized tuple across calls * recurs=False bypasses the cache entirely * cyclic class hierarchy (string.Template = A) unwinds via the _COMPUTING_ANCESTORS sentinel without infinite recursion * exception during the walk clears the sentinel so the cache is not poisoned (covers the except BaseException cleanup path) scoped_nodes.py — ClassDef._find_metaclass() cache (TestFindMetaclassCaching): * cached result reused on second no-context call * None result is cached so the MRO walk runs once * explicit context= argument bypasses the cache * re-entry through the _COMPUTING_METACLASS sentinel returns None (covers the cycle-break branch) brain_builtin_inference.py — single Call dispatcher (TestBuiltinDispatcher): * known builtin Name dispatches (bool, dict.fromkeys via Attribute) * unknown Name, non-dict Attribute, dynamic call target are skipped * re.Pattern = type(...) / re.Match = type(...) still excluded so brain_re keeps owning their inference * register_builtin_transform populates _BUILTIN_INFERENCE_FUNCS context.py — InferenceContext.clone() bypass (new tests/test_context.py): * path is an independent deep copy; mutations don't leak back * lookupname is reset to None on the clone * _nodes_inferred is shared (mutable counter preserved across the clone family — required for the max_inferred budget) * callcontext / boundnode / extra_context propagated by identity * constraints is shallow-copied * clone() bypasses __init__ (tracked via temporary patch) * end-to-end inference still resolves through a cloned context Closes the two PR #3048 coverage gaps reported by Codecov (scoped_nodes.py:2218-2220 and :2757).
2fb0a52 to
7e482f9
Compare
The recursive walk in ``ancestors(recurs=True)`` re-resolved shared base classes on every call, amplifying cost on deep MRO chains. Cache the materialized tuple as a ``cached_property`` so each ClassDef pays for its ancestors once, and the cache dies with the instance when the manager drops the AST. ``context`` is intentionally not part of the key — the result is path-independent and the walk's own ``yielded`` set handles cycle prevention. Measured on pandas/core/frame.py (interleaved A/B, n=4): baseline 21.34s ± 0.18 -> patched 20.48s ± 0.13 (-4.0%) Cache hit rate on the same run: 98% (66k hits / 1.2k misses on 1k distinct ClassDefs). Larger speedups expected on codebases with deeper MROs (SQLAlchemy/Pydantic-shaped projects). Closes #1115
The metaclass lookup walked the full MRO on every call, with each recursive call traversing the same ancestor chain. Cache the per-node result on the instance so recursive entries short-circuit. Cycle protection in cyclic class hierarchies is preserved via a ``_COMPUTING_METACLASS`` sentinel that re-entry treats as the cycle break the ``seen`` set provides in the slow path. Measured on synthetic FastAPI/SQLAlchemy/Pydantic target (5.94x ``ancestors`` recursion factor — matches rgant's #1115 profile shape): baseline 10.37s ± 0.09 -> patched 10.17s ± 0.06 (-1.9%) Reduces ``_find_metaclass`` body executions from ~40k to ~650 on the same run (cache hit rate >98%). Pandas (no metaclass-heavy code) shows no measurable change, as expected. References #1115
The transform visitor walks every node in every parsed module and, for each ``Call`` node, ran all 19 ``_builtin_filter_predicate`` partials in sequence — each repeating the same ``isinstance(node.func, Name)`` and ``node.func.name`` lookups. With ~21k Call nodes per pandas/frame run, that's 375k+ predicate calls duplicating identical work. Replace the 19 list entries with one dispatcher that does the ``isinstance``/name lookup once and routes to the right inference function via dict lookup. ``register_builtin_transform`` keeps the same signature so external callers (if any) are unaffected. Measured wall (interleaved A/B, n=3-4): pandas/frame.py: 20.10s +/- 0.21 -> 19.51s +/- 0.07 (-3.0%) deep-nested target: 9.82s +/- 0.10 -> 9.55s +/- 0.10 (-2.8%) References #1115
``clone()`` is called ~85k times per pandas/frame.py pylint run. The existing implementation went through ``InferenceContext()`` whose ``__init__`` runs conditional defaults for fields the clone immediately overwrites. With ``__slots__`` defined, ``__new__`` + direct slot assignment is strictly cheaper. Measured wall (interleaved A/B, n=4 on pandas/frame.py): 19.77s +/- 0.16 -> 19.58s +/- 0.12 (-0.93%) References #1115
…) bypass Add targeted regression coverage for the four perf commits in #3048 so the corner-case branches don't silently regress. scoped_nodes.py — ClassDef.ancestors() cache (TestAncestorsCaching): * cache hit reuses the materialized tuple across calls * recurs=False bypasses the cache entirely * cyclic class hierarchy (string.Template = A) unwinds via the _COMPUTING_ANCESTORS sentinel without infinite recursion * exception during the walk clears the sentinel so the cache is not poisoned (covers the except BaseException cleanup path) scoped_nodes.py — ClassDef._find_metaclass() cache (TestFindMetaclassCaching): * cached result reused on second no-context call * None result is cached so the MRO walk runs once * explicit context= argument bypasses the cache * re-entry through the _COMPUTING_METACLASS sentinel returns None (covers the cycle-break branch) brain_builtin_inference.py — single Call dispatcher (TestBuiltinDispatcher): * known builtin Name dispatches (bool, dict.fromkeys via Attribute) * unknown Name, non-dict Attribute, dynamic call target are skipped * re.Pattern = type(...) / re.Match = type(...) still excluded so brain_re keeps owning their inference * register_builtin_transform populates _BUILTIN_INFERENCE_FUNCS context.py — InferenceContext.clone() bypass (new tests/test_context.py): * path is an independent deep copy; mutations don't leak back * lookupname is reset to None on the clone * _nodes_inferred is shared (mutable counter preserved across the clone family — required for the max_inferred budget) * callcontext / boundnode / extra_context propagated by identity * constraints is shallow-copied * clone() bypasses __init__ (tracked via temporary patch) * end-to-end inference still resolves through a cloned context Closes the two PR #3048 coverage gaps reported by Codecov (scoped_nodes.py:2218-2220 and :2757).
ea6b8c5 to
4f1ff78
Compare
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Type of Changes
Description
The recursive walk in
ancestors(recurs=True)re-resolved shared base classes on every call, amplifying cost on deep MRO chains. Cache the materialized tuple as acached_propertyso each ClassDef pays for its ancestors once, and the cache dies with the instance when the manager drops the AST.contextis intentionally not part of the key: the result is path-independent and the walk's ownyieldedset handles cycle prevention.Measured on pandas/core/frame.py (interleaved A/B, n=4):
baseline 21.34s ± 0.18 -> patched 20.48s ± 0.13 (-4.0%)
Cache hit rate on the same run: 98% (66k hits / 1.2k misses on 1k distinct ClassDefs). Larger speedups expected on codebases with deeper MROs (SQLAlchemy/Pydantic-shaped projects).
Closes #1115