Performance: cache ClassDef.ancestors() transitive walk #3048

Open
Pierre-Sassoulas wants to merge 5 commits into main from perf/cache-classdef-ancestors

Conversation

@Pierre-Sassoulas

Member

Type of Changes

Type
✨ New feature

Description

The recursive walk in ancestors(recurs=True) re-resolved shared base classes on every call, amplifying cost on deep MRO chains. Cache the materialized tuple as a cached_property so each ClassDef pays for its ancestors once, and the cache dies with the instance when the manager drops the AST.

context is intentionally not part of the key: the result is path-independent and the walk's own yielded set handles cycle prevention.
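
A minimal sketch of the caching idea described above, under stated assumptions: `ToyClass`, its `bases` attribute, and the `walks` counter are hypothetical stand-ins, not astroid's `ClassDef` or the code in this PR. It only illustrates how a `cached_property` holding a materialized tuple lets shared bases be resolved once per instance.

```python
from functools import cached_property


class ToyClass:
    """Hypothetical stand-in for a class node; not astroid's ClassDef."""

    walks = 0  # counts how often the transitive walk body actually runs

    def __init__(self, name, bases=()):
        self.name = name
        self.bases = tuple(bases)

    @cached_property
    def _ancestors(self):
        # Runs at most once per instance; the tuple lives in the instance
        # __dict__ and is freed together with the instance.
        ToyClass.walks += 1
        yielded = {self}  # prevents yielding the same ancestor twice
        result = []
        for base in self.bases:
            for node in (base, *base._ancestors):
                if node not in yielded:
                    yielded.add(node)
                    result.append(node)
        return tuple(result)

    def ancestors(self, recurs=True):
        if not recurs:
            # Direct bases only: no cache involved.
            return iter(self.bases)
        return iter(self._ancestors)


# Diamond hierarchy: Leaf -> (Left, Right) -> Base
base = ToyClass("Base")
left = ToyClass("Left", [base])
right = ToyClass("Right", [base])
leaf = ToyClass("Leaf", [left, right])

first = [node.name for node in leaf.ancestors()]
second = [node.name for node in leaf.ancestors()]  # served from the cache
```

In this toy diamond the walk body runs once per class, and the second `ancestors()` call does not run it at all.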

Measured on pandas/core/frame.py (interleaved A/B, n=4):
baseline 21.34s ± 0.18 -> patched 20.48s ± 0.13 (-4.0%)

Cache hit rate on the same run: 98% (66k hits / 1.2k misses on 1k distinct ClassDefs). Larger speedups expected on codebases with deeper MROs (SQLAlchemy/Pydantic-shaped projects).

Closes #1115

@Pierre-Sassoulas Pierre-Sassoulas added this to the 4.2.0 milestone May 10, 2026
@Pierre-Sassoulas Pierre-Sassoulas force-pushed the perf/cache-classdef-ancestors branch 4 times, most recently from 4160de2 to eba3439 Compare May 18, 2026 20:09
@codecov

codecov Bot commented May 18, 2026


Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 93.60%. Comparing base (2ef6d50) to head (ea6b8c5).

Additional details and impacted files


@@                        Coverage Diff                        @@
##           codspeed-wizard-1774989768270    #3048      +/-   ##
=================================================================
+ Coverage                          93.56%   93.60%   +0.03%     
=================================================================
  Files                                 92       92              
  Lines                              11345    11397      +52     
=================================================================
+ Hits                               10615    10668      +53     
+ Misses                               730      729       -1     
| Flag | Coverage Δ |
|---|---|
| linux | 93.47% <100.00%> (+0.03%) ⬆️ |
| pypy | 93.60% <100.00%> (+0.03%) ⬆️ |
| windows | 93.57% <100.00%> (+0.03%) ⬆️ |


| Files with missing lines | Coverage Δ |
|---|---|
| astroid/brain/brain_builtin_inference.py | 92.87% <100.00%> (+0.18%) ⬆️ |
| astroid/context.py | 98.76% <100.00%> (-1.24%) ⬇️ |
| astroid/nodes/scoped_nodes/scoped_nodes.py | 93.63% <100.00%> (+0.35%) ⬆️ |

... and 5 files with indirect coverage changes


@Pierre-Sassoulas Pierre-Sassoulas force-pushed the perf/cache-classdef-ancestors branch from eba3439 to d7a22a7 Compare May 24, 2026 12:06
@Pierre-Sassoulas Pierre-Sassoulas changed the base branch from main to codspeed-wizard-1774989768270 May 24, 2026 12:06
@codspeed-hq

codspeed-hq Bot commented May 24, 2026


Merging this PR will improve performance by 9.83%

⚠️ Different runtime environments detected

Some benchmarks with significant performance changes were compared across different runtime environments,
which may affect the accuracy of the results.


⚡ 3 improved benchmarks

Performance Changes

| Mode | Benchmark | BASE | HEAD | Efficiency |
|---|---|---|---|---|
| Simulation | test_bench_endtoend_walk_infer_black | 14.9 s | 13.5 s | +10.2% |
| Simulation | test_bench_endtoend_walk_infer_flask | 9.6 s | 8.7 s | +9.79% |
| Simulation | test_bench_endtoend_parse_flask | 2.2 s | 2 s | +9.51% |



Comparing perf/cache-classdef-ancestors (4f1ff78) with main (c50a1f4)


@Pierre-Sassoulas Pierre-Sassoulas force-pushed the codspeed-wizard-1774989768270 branch 2 times, most recently from 2d391a6 to 2ef6d50 Compare May 24, 2026 13:23
@Pierre-Sassoulas Pierre-Sassoulas force-pushed the perf/cache-classdef-ancestors branch from d7a22a7 to 2718c3e Compare May 24, 2026 14:34
Pierre-Sassoulas added a commit that referenced this pull request May 24, 2026
…) bypass

Add targeted regression coverage for the four perf commits in #3048
so the corner-case branches don't silently regress.

scoped_nodes.py — ClassDef.ancestors() cache (TestAncestorsCaching):
  * cache hit reuses the materialized tuple across calls
  * recurs=False bypasses the cache entirely
  * cyclic class hierarchy (string.Template = A) unwinds via the
    _COMPUTING_ANCESTORS sentinel without infinite recursion
  * exception during the walk clears the sentinel so the cache is
    not poisoned (covers the except BaseException cleanup path)

scoped_nodes.py — ClassDef._find_metaclass() cache (TestFindMetaclassCaching):
  * cached result reused on second no-context call
  * None result is cached so the MRO walk runs once
  * explicit context= argument bypasses the cache
  * re-entry through the _COMPUTING_METACLASS sentinel returns None
    (covers the cycle-break branch)

brain_builtin_inference.py — single Call dispatcher (TestBuiltinDispatcher):
  * known builtin Name dispatches (bool, dict.fromkeys via Attribute)
  * unknown Name, non-dict Attribute, dynamic call target are skipped
  * re.Pattern = type(...) / re.Match = type(...) still excluded so
    brain_re keeps owning their inference
  * register_builtin_transform populates _BUILTIN_INFERENCE_FUNCS

context.py — InferenceContext.clone() bypass (new tests/test_context.py):
  * path is an independent deep copy; mutations don't leak back
  * lookupname is reset to None on the clone
  * _nodes_inferred is shared (mutable counter preserved across the
    clone family — required for the max_inferred budget)
  * callcontext / boundnode / extra_context propagated by identity
  * constraints is shallow-copied
  * clone() bypasses __init__ (tracked via temporary patch)
  * end-to-end inference still resolves through a cloned context

Closes the two PR #3048 coverage gaps reported by Codecov
(scoped_nodes.py:2218-2220 and :2757).
@Pierre-Sassoulas Pierre-Sassoulas force-pushed the codspeed-wizard-1774989768270 branch 2 times, most recently from 2fb0a52 to 7e482f9 Compare June 6, 2026 05:51
Base automatically changed from codspeed-wizard-1774989768270 to main June 6, 2026 21:41
The metaclass lookup walked the full MRO on every call, with each
recursive call traversing the same ancestor chain. Cache the per-node
result on the instance so recursive entries short-circuit. Cycle
protection in cyclic class hierarchies is preserved via a
``_COMPUTING_METACLASS`` sentinel: re-entry on a node whose lookup is
still in progress is treated as a cycle break, the role the ``seen``
set plays in the slow path.
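
The sentinel pattern described above can be sketched roughly as follows. This is an illustration under assumptions, not the patch itself: `ToyClass`, `_UNSET`, `_COMPUTING`, and `find_metaclass` are hypothetical names, and clearing the sentinel on exception is borrowed from the behaviour the test notes describe for the ancestors cache.

```python
_UNSET = object()      # hypothetical marker: nothing cached yet
_COMPUTING = object()  # hypothetical marker: lookup currently in progress


class ToyClass:
    """Hypothetical stand-in for a class node; not astroid's ClassDef."""

    def __init__(self, name, metaclass=None):
        self.name = name
        self.declared_metaclass = metaclass
        self.bases = []
        self._metaclass_cache = _UNSET

    def find_metaclass(self):
        cached = self._metaclass_cache
        if cached is _COMPUTING:
            # Re-entered while our own lookup is running: break the cycle.
            return None
        if cached is not _UNSET:
            return cached  # may legitimately be a cached None
        self._metaclass_cache = _COMPUTING
        try:
            result = self.declared_metaclass
            if result is None:
                for base in self.bases:
                    result = base.find_metaclass()
                    if result is not None:
                        break
        except BaseException:
            # Do not leave the sentinel behind, or later calls would
            # mistake it for a cycle.
            self._metaclass_cache = _UNSET
            raise
        self._metaclass_cache = result
        return result


# Plain inheritance: the metaclass is found through the base.
base = ToyClass("Base", metaclass="Meta")
child = ToyClass("Child")
child.bases.append(base)
found = child.find_metaclass()

# Cyclic hierarchy: A inherits from B and B from A.
a, b = ToyClass("A"), ToyClass("B")
a.bases.append(b)
b.bases.append(a)
cyclic = a.find_metaclass()
```

One caveat of this naive version: a node whose result was computed while an ancestor was still in progress caches that possibly partial answer. Whether the real implementation caches in that situation is not stated here.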

Measured on synthetic FastAPI/SQLAlchemy/Pydantic target (5.94x
``ancestors`` recursion factor — matches rgant's #1115 profile shape):

  baseline 10.37s ± 0.09  ->  patched 10.17s ± 0.06   (-1.9%)

Reduces ``_find_metaclass`` body executions from ~40k to ~650 on the
same run (cache hit rate >98%). Pandas (no metaclass-heavy code)
shows no measurable change, as expected.

References #1115
The transform visitor walks every node in every parsed module and, for
each ``Call`` node, ran all 19 ``_builtin_filter_predicate`` partials
in sequence — each repeating the same ``isinstance(node.func, Name)``
and ``node.func.name`` lookups. With ~21k Call nodes per pandas/frame
run, that's 375k+ predicate calls duplicating identical work.

Replace the 19 list entries with one dispatcher that does the
``isinstance``/name lookup once and routes to the right inference
function via dict lookup. ``register_builtin_transform`` keeps the
same signature so external callers (if any) are unaffected.

Measured wall (interleaved A/B, n=3-4):
  pandas/frame.py:    20.10s +/- 0.21  ->  19.51s +/- 0.07   (-3.0%)
  deep-nested target:  9.82s +/- 0.10  ->   9.55s +/- 0.10   (-2.8%)

References #1115
``clone()`` is called ~85k times per pandas/frame.py pylint run. The
existing implementation went through ``InferenceContext()`` whose
``__init__`` runs conditional defaults for fields the clone immediately
overwrites. With ``__slots__`` defined, ``__new__`` + direct slot
assignment is strictly cheaper.
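
The `__new__` plus slot-assignment approach can be pictured with a toy class. `ToyContext` and its three slots are assumptions made for the example and do not reproduce `InferenceContext`. The `init_calls` counter exists only to show that `clone()` never enters `__init__`.

```python
class ToyContext:
    """Hypothetical stand-in for a slotted context object."""

    __slots__ = ("path", "lookupname", "callcontext")
    init_calls = 0  # class-level counter, not a slot

    def __init__(self, path=None):
        ToyContext.init_calls += 1
        # Conditional defaults like these are what a clone would pay for
        # and then immediately overwrite.
        self.path = path if path is not None else set()
        self.lookupname = None
        self.callcontext = None

    def clone(self):
        # Allocate without running __init__, then fill each slot directly.
        new = ToyContext.__new__(ToyContext)
        new.path = set(self.path)            # independent copy
        new.lookupname = None                # reset on the clone
        new.callcontext = self.callcontext   # propagated by identity
        return new


original = ToyContext({("node", "name")})
original.lookupname = "name"
original.callcontext = object()
copy = original.clone()
copy.path.add(("other", "name"))  # does not leak back into original.path
```

Every slot must be assigned in `clone()`; reading a slot that was never set raises `AttributeError`, so a field added to `__slots__` later has to be added here too.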

Measured wall (interleaved A/B, n=4 on pandas/frame.py):
  19.77s +/- 0.16  ->  19.58s +/- 0.12   (-0.93%)

References #1115
@Pierre-Sassoulas Pierre-Sassoulas force-pushed the perf/cache-classdef-ancestors branch from ea6b8c5 to 4f1ff78 Compare June 6, 2026 21:50

Development

Successfully merging this pull request may close these issues.

Astroid calls to ancestors are uncached and slow for templates and generics in ClassDef.ancestors
