Add CodSpeed CI with end-to-end primer benchmarks (flask + black) and a cold start test using initial imports by Pierre-Sassoulas · Pull Request #3079 · pylint-dev/astroid

Pierre-Sassoulas · 2026-05-23T21:06:53Z

Adds CodSpeed CI to astroid with three end-to-end benchmarks for black and flask, alongside a cold start test to exercise the import time, which is important because we often lint single files in parallel inside pre-commit.

Supersedes #3022.

Results can be seen in #3080 (and also on the currently open performance branches: #3069, #3046, #3048). For the lazy loading of brains, the result is not visible because the threshold is at a 2% change (already close to noise). We won't see performance changes in brains as we don't parse library-specific code in this benchmark.

codecov · 2026-05-23T21:12:38Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 93.60%. Comparing base (926f4c9) to head (7e482f9).

Additional details and impacted files

@@           Coverage Diff           @@
##             main    #3079   +/-   ##
=======================================
  Coverage   93.60%   93.60%           
=======================================
  Files          92       92           
  Lines       11364    11364           
=======================================
  Hits        10637    10637           
  Misses        727      727

Flag	Coverage Δ
linux	`93.47% <ø> (ø)`
pypy	`93.60% <ø> (ø)`
windows	`93.57% <ø> (ø)`


Replaces the single Flask benchmark with three primer projects of progressively increasing size, all pinned to the SHAs used by pylint's nightly primer run:

- flask (small, ~50 .py files in src/flask): kept, with both parse-only (isolates rebuilder cost) and full pylint-shaped traversal
- black (medium, ~250 files across src/black + src/blackd + src/blib2to3): classic AST handling, deep class hierarchies
- pandas/core (large): heavy brain_dataclasses / brain_numpy / brain_namedtuple_enum usage; stresses the brain plugins more than any other target

The microbenchmarks landed in #3079 show >40% StdDev on CodSpeed CI because they complete in microseconds. The end-to-end benches each take seconds to minutes, so per-run noise becomes a small fraction of the measurement.

The fixture uses 'git clone --no-checkout --filter=blob:none --no-tags --single-branch' plus 'sparse-checkout' so the pandas clone doesn't fetch the whole monorepo, just the pandas/core subtree.

Refactor: project metadata moved into a _Project NamedTuple keyed by short name, so adding more primer targets is a one-block addition.

Local smoke run (one iteration each, no --codspeed): 4 passed in 2:05.

Refs #2014. Stacked on #3079.
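The sparse clone and the `_Project` table described in this commit message could look roughly like the sketch below. Only the git flags come from the message; the field names, the `clone_commands` helper, and the placeholder URL and SHA are assumptions for illustration, and the final checkout assumes the pinned SHA is reachable from the cloned branch.

```python
from typing import NamedTuple


class _Project(NamedTuple):
    """Metadata for one primer target (field names are hypothetical)."""

    url: str
    sha: str
    source_dirs: tuple[str, ...]


def clone_commands(project: _Project, dest: str) -> list[list[str]]:
    """Build the git invocations for a blobless, sparse checkout at a pinned SHA."""
    return [
        # Fetch commits and trees only; file contents are downloaded on demand.
        ["git", "clone", "--no-checkout", "--filter=blob:none",
         "--no-tags", "--single-branch", project.url, dest],
        # Restrict the working tree to the declared source subdirectories.
        ["git", "-C", dest, "sparse-checkout", "set", *project.source_dirs],
        # Check out the pinned commit; only blobs under the sparse paths are fetched.
        ["git", "-C", dest, "checkout", project.sha],
    ]


# Placeholder entry: the real URLs and SHAs live in the benchmark fixture.
PROJECTS = {
    "example": _Project(
        url="https://example.invalid/org/repo.git",
        sha="0" * 40,
        source_dirs=("src/pkg",),
    ),
}
```

Each command list can be passed to `subprocess.run(cmd, check=True)`. Keeping the metadata in one mapping is what makes adding a target a one-block change.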

Adds ~5-10 µs of busywork (sum(range(1000))) at the very top of every NodeNG.infer() call. Since infer() is called O(1M) times during a pylint-shaped traversal, this adds several seconds to each end-to-end benchmark. The microbenchmarks should also show a clear regression. Purpose: verify that CodSpeed CI reports the perf delta vs the baseline (the merged-ahead state of #3079) and renders it correctly on the PR. This is an experiment, not a real change. The branch will be discarded after the comparison is observed. Refs #2014 (the profile-first feasibility study).
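The arithmetic behind "several seconds" is per-call cost times call count: 5-10 µs over roughly one million `infer()` calls is about 5-10 s. A minimal sketch for checking that estimate on a given machine follows; the helper names are hypothetical and the call count is the order of magnitude quoted in the commit message, not a measured value.

```python
import timeit

# Order of magnitude quoted in the commit message, not a measurement.
CALLS_PER_TRAVERSAL = 1_000_000


def busywork() -> int:
    """The deliberate overhead injected at the top of infer()."""
    return sum(range(1000))


def estimate_added_seconds(calls: int = CALLS_PER_TRAVERSAL, samples: int = 10_000) -> float:
    """Estimate the total overhead of running busywork() once per call."""
    per_call = timeit.timeit(busywork, number=samples) / samples
    return per_call * calls


if __name__ == "__main__":
    print(f"estimated overhead: {estimate_added_seconds():.1f} s per traversal")
```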

codspeed-hq · 2026-05-24T10:40:39Z

Congrats! CodSpeed is installed 🎉

🆕 3 new benchmarks were detected.

You will start to see performance impacts in the reports once the benchmarks are run from your default branch.

Detected benchmarks

test_bench_endtoend_parse_flask (Simulation): 2.2 s
test_bench_endtoend_walk_infer_black (Simulation): 14.9 s
test_bench_endtoend_walk_infer_flask (Simulation): 9.6 s

… MERGE Eagerly imports nine heavyweight stdlib modules (pydoc, multiprocessing, xmlrpc.server, xml.dom.minidom, unittest.mock, email.mime.multipart, wsgiref.simple_server, http.server, concurrent.futures) at the top of astroid/__init__.py. Local measurement: `import astroid` goes from ~76 ms to ~198 ms (+120 ms per cold start). The cold-lint bench (test_bench_endtoend_cold_lint) shells out `python -m pylint` per iteration and re-pays import cost every run, so it should be the loudest reporter; the in-process benches import once at module load and will only see this on the first iteration. Purpose: verify that CodSpeed's walltime workflow (cold-lint) reports the startup delta vs the baseline (#3079) and that it shows up distinctly from the in-process `infer()` regression added in 19f5c5b. This is an experiment, not a real change. The branch will be discarded after the comparison is observed. Refs #2014 (the profile-first feasibility study).
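A cold-start figure like the ~76 ms versus ~198 ms quoted above can be approximated by timing a fresh interpreter per import, which is also why a benchmark that shells out pays the cost on every iteration. The sketch below is an assumption about how one might measure it, not the benchmark's actual code; `cold_import_seconds` and `import_overhead_seconds` are hypothetical helpers.

```python
import subprocess
import sys
import time


def cold_import_seconds(module: str, repeats: int = 3) -> float:
    """Best-of-N wall time for a fresh interpreter to import `module`."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        # A new process has an empty import cache, so the import is paid in full.
        subprocess.run([sys.executable, "-c", f"import {module}"], check=True)
        timings.append(time.perf_counter() - start)
    return min(timings)


def import_overhead_seconds(module: str) -> float:
    """Import cost of `module` beyond bare interpreter startup."""
    # `sys` is always loaded at startup, so importing it approximates the baseline.
    baseline = cold_import_seconds("sys")
    return cold_import_seconds(module) - baseline
```

For a per-module breakdown, CPython's `python -X importtime -c "import astroid"` prints the cumulative import time of each module to stderr.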

cdce8p

Left some comments. Similar to the pylint primer, this isn't really my area of expertise, so might be good if someone else could take a look at it as well.

cdce8p · 2026-05-24T20:19:37Z

+permissions:
+  contents: read
+  id-token: write


Is id-token necessary for CodSpeedHQ/action? If that's the case, could you move it to the job itself?

cdce8p · 2026-05-24T20:21:00Z

+      - name: Set up Python 3.13
+        id: python
+        uses: actions/setup-python@v6.2.0
+        with:
+          python-version: "3.13"
+          check-latest: true


Any particular reason to use 3.13 over 3.14? Saw that we recently switched the default in pylint, guess we'll do the same for astroid as well soon.

cdce8p · 2026-05-24T20:22:24Z

+          # astroid with the version under test. pylint is invoked as a
+          # subprocess by the cold-start benchmark.
+          pip install pylint pytest-codspeed
+          pip install -e .


Suggested change

-          pip install -e .
+          pip install .

I believe editable installs might even be unnecessary here. Yes, we use them for the other workflows as well, however the astroid files aren't modified in any way after the install.

cdce8p · 2026-05-24T20:24:44Z

+
+permissions:
+  contents: read
+  id-token: write


Similar to earlier, move this to the job itself.

cdce8p · 2026-05-24T20:24:55Z

+      - name: Set up Python 3.13
+        id: python
+        uses: actions/setup-python@v6.2.0
+        with:
+          python-version: "3.13"
+          check-latest: true


cdce8p · 2026-05-24T20:25:10Z

+        run: |
+          python -m pip install -U pip
+          pip install pytest-codspeed
+          pip install -e .


Suggested change

-          pip install -e .
+          pip install .

cdce8p · 2026-05-24T20:25:37Z

+is set in ``pyproject.toml``, even a warning would abort collection).
+"""
+
+from __future__ import annotations


No real need for that here.

cdce8p · 2026-05-24T20:27:15Z

+import importlib.util
+
+if importlib.util.find_spec("pytest_codspeed") is None:
+    collect_ignore_glob = ["test_bench_*.py"]


Guess that will work, though in other test files we usually use a try ... except ImportError guard for it.

    try:
        import pytest_codspeed
    except ImportError:
        collect_ignore_glob = ["test_bench_*.py"]
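For reference, the two guard idioms discussed here agree in the simple cases. The sketch below wraps each in a hypothetical helper to show the difference: `find_spec` only locates the module, while the try/except form actually imports it and therefore also catches an install that is present but fails to import.

```python
import importlib.util


def has_module_find_spec(name: str) -> bool:
    """Check availability of a top-level module without importing it."""
    return importlib.util.find_spec(name) is not None


def has_module_try_import(name: str) -> bool:
    """Check availability by importing the module and catching ImportError."""
    try:
        __import__(name)
    except ImportError:
        return False
    return True
```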

cdce8p · 2026-05-24T20:32:06Z

+- ``test_bench_endtoend_cold_lint`` — shells out
+  ``python -m pylint <minimal module>`` per iteration. ~98 % of wall
+  time is startup (Python import, pylint init, ``import astroid``,
+  brain-plugin registration); only ~2 % is actual linting work on the
+  one-line target. This captures cold-start cost that the in-process
+  benches below miss: within a single pytest session ``astroid`` is
+  imported once at module load and stays in ``sys.modules``, so
+  optimizations like deferring brain-plugin registration are invisible
+  to those benches.


I worry that we might end up optimizing the wrong thing. Yes, startup time is important as well, however with lazy imports being available in 3.15 there seems to be a focus across the ecosystem to use them everywhere. IMO we should be cautious with that when the time comes, especially for modules which get imported anyway.

That's a bit different for brains, etc. which aren't actually needed for some / most projects. Though this might only be a small fraction of the total import time.

We did some lazy imports in https://github.com/pylint-dev/astroid/pull/3062/changes. I think those are useful, and they would have been detected because logging is very seldom used (only if there's a C-level warning during parsing).
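The kind of targeted lazy import being discussed is a plain function-local import, which works on every supported Python version. A minimal sketch of the pattern follows; `format_debug` and `is_loaded` are hypothetical names and this is not astroid's code.

```python
import sys


def format_debug(obj: object) -> str:
    """Rare debug path: pay for the import only when this function runs."""
    import pprint  # function-local import, deferred until the first call

    return pprint.pformat(obj)


def is_loaded(name: str) -> bool:
    """Report whether `name` has been imported anywhere in this process."""
    return name in sys.modules
```

A cold-start benchmark can see this difference because each fresh process either pays for the import or does not. An in-process benchmark cannot, since the module stays in `sys.modules` after the first import.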

Pierre-Sassoulas · 2026-05-24T21:32:53Z

Thank you for the review, Marc! Appreciate it.

…t, bench scope)

Review feedback from cdce8p on #3079:

- Move 'id-token: write' from workflow-level permissions down to the jobs.benchmarks.permissions block in both CodSpeed workflows so the token is only minted for the job that actually uploads to CodSpeed.
- Bump setup-python to 3.14 (matches pylint's new default; astroid will follow).
- Drop the editable install in both workflows: nothing modifies the source tree after install, so 'pip install .' is enough. Updated the inline comment in the walltime workflow accordingly.
- conftest.py: switch from importlib.util.find_spec to the try/except ImportError idiom used elsewhere in the test tree, and drop the unused 'from __future__ import annotations' (no annotations in this file).
- test_bench_endtoend.py: add a scope note on the cold-start bench. The point is to *protect* targeted lazy imports for rare-path modules (brain plugins not used by every project; debug-only stdlib like pprint / logging deferred in #3062). Those function-local / TYPE_CHECKING imports work on every supported Python and do not depend on PEP 810 lazy imports landing in 3.15. The bench is explicitly *not* an argument for lazifying modules astroid imports unconditionally.

DanielNoord

I don't have much experience with CodSpeed, but most of it looks fine to me I guess?

DanielNoord · 2026-06-03T19:50:37Z

+# For details: https://github.com/pylint-dev/astroid/blob/main/LICENSE
+# Copyright (c) https://github.com/pylint-dev/astroid/blob/main/CONTRIBUTORS.txt
+
+"""End-to-end benchmarks: cold import + parse + walk + infer real projects.


A lot of this docstring will get outdated as soon as we start changing stuff. Is it really necessary to document this in this much detail?

DanielNoord · 2026-06-03T19:51:12Z

+    benchmark(_pylint_one_file, minimal_module)
+
+
+# -- Flask: small, parse + walk_infer (parse isolates rebuilder cost). --


Inline comments like this also tend to get out of date.

DanielNoord · 2026-06-03T19:51:22Z

+# -- Flask: small, parse + walk_infer (parse isolates rebuilder cost). --
+
+
+def test_bench_endtoend_parse_flask(benchmark, flask_files: list[Path]) -> None:


No types for benchmark?

DanielNoord · 2026-06-03T19:52:08Z

+    timeout-minutes: 15
+    permissions:
+      contents: read
+      id-token: write


Why is this necessary?

DanielNoord · 2026-06-03T19:52:18Z

+    runs-on: ubuntu-latest
+    permissions:
+      contents: read
+      id-token: write


Why is this necessary?

Might not be necessary in a public repo. I removed it, let's see how it goes.

Looks like it worked. I fixed your other remarks in be68e14.

Three benchmarks based on pylint's primer corpus, with SHAs pinned to match what pylint sees nightly:

- test_bench_endtoend_parse_flask (rebuilder hot path, ~50 .py)
- test_bench_endtoend_walk_infer_flask (pylint-shaped traversal)
- test_bench_endtoend_walk_infer_black (medium scale, ~250 .py)

The benchmarks call astroid's API directly (ast_from_file + nodes_of_class + safe_infer on Call/Attribute/Name) so they mirror what a pylint checker does without depending on pylint.

The fixture uses 'git clone --no-checkout --filter=blob:none --no-tags --single-branch' plus 'sparse-checkout' so only the declared source subdirs are fetched.

Workflow uses 'mode: simulation' so the CodSpeed dashboard provides per-function attribution; 'CodSpeedHQ/action' is allowlisted in repo Actions settings.

conftest.py uses 'collect_ignore_glob' to skip the directory cleanly when pytest-codspeed is not installed locally.

Drops the 19 microbenchmarks the CodSpeed wizard generated: at microsecond scale they showed >40 % StdDev on CI, too noisy for regression detection, and their coverage is subsumed by the end-to-end benches.

Local smoke (one iter each, no --codspeed): 3 passed in 11.6s.

Supersedes #3022. Refs #2014.
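The noise argument for dropping the microbenchmarks can be illustrated with relative standard deviation. The timings below are hypothetical, chosen only to show the effect of the same absolute jitter on a microsecond-scale and a second-scale benchmark; real CI noise is not a fixed absolute amount, so treat this as a sketch of the reasoning rather than a model of CodSpeed.

```python
import statistics


def relative_stddev(samples: list[float]) -> float:
    """Sample standard deviation as a fraction of the mean."""
    return statistics.stdev(samples) / statistics.mean(samples)


# Hypothetical timings in seconds: the same +/- 2 microseconds of jitter
# applied to a 5 microsecond benchmark and to a 10 second benchmark.
JITTER = 2e-6
micro = [5e-6 - JITTER, 5e-6, 5e-6 + JITTER]
end_to_end = [10.0 - JITTER, 10.0, 10.0 + JITTER]

if __name__ == "__main__":
    print(f"micro:      {relative_stddev(micro):.1%}")
    print(f"end-to-end: {relative_stddev(end_to_end):.6%}")
```

Under these assumed numbers the microbenchmark's relative noise is far above a 2% reporting threshold while the end-to-end benchmark's is far below it.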

Per Daniel's review on #3079:

- Trim the module docstring to what stays true (what the file benches + primer-corpus link); the removed prose (startup percentages, PEP 810, StdDev figures) would rot as the benches change.
- Drop the `# -- section --` divider comments; the test names already say what each group does.
- Annotate the `benchmark` fixture as `BenchmarkFixture`, imported under TYPE_CHECKING since pytest-codspeed is an optional dependency.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
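The TYPE_CHECKING pattern mentioned for the `benchmark` annotation might look like the sketch below. The import path `pytest_codspeed.BenchmarkFixture` is an assumption here, and the function and helper names are hypothetical; in a real test module the function would carry a `test_` prefix so pytest collects it.

```python
from typing import TYPE_CHECKING

if TYPE_CHECKING:
    # Seen only by type checkers, so the optional dependency is not needed
    # at runtime. The import path is an assumption.
    from pytest_codspeed import BenchmarkFixture


def _work(values: list[int]) -> int:
    """Stand-in for the code under measurement."""
    return sum(values)


def bench_sum_example(benchmark: "BenchmarkFixture") -> None:
    # The fixture is called with the function to measure and its arguments.
    # The quoted annotation is never evaluated at runtime.
    benchmark(_work, [1, 2, 3])
```

The annotation is written as a string because the name only exists for type checkers; `from __future__ import annotations` would achieve the same effect for the whole module.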

astroid is a public repo, so CodSpeed can upload results without authentication; the id-token: write permission (for OIDC auth) is not required. Removing it also lets the redundant job-level permissions block go, since the top-level `contents: read` already covers the job. Try without it per Daniel's "why is this necessary?" review on #3079; if uploads fail, re-add `id-token: write` at the job level. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Pierre-Sassoulas marked this pull request as draft May 23, 2026 21:10

Pierre-Sassoulas marked this pull request as ready for review May 23, 2026 21:18

Pierre-Sassoulas marked this pull request as draft May 23, 2026 21:18

Pierre-Sassoulas added Enhancement ✨ Improvement to a component topic-performance Maintenance Discussion or action around maintaining astroid or the dev workflow labels May 23, 2026

Pierre-Sassoulas mentioned this pull request May 24, 2026

TEST: deliberate regression to verify CodSpeed comparison (do not merge) #3080

Closed

Pierre-Sassoulas changed the title ~~Add CodSpeed performance benchmarks and CI workflow~~ Add CodSpeed CI: microbenchmarks + end-to-end primer benchmarks May 24, 2026

Pierre-Sassoulas force-pushed the codspeed-wizard-1774989768270 branch from 9986de3 to 28184b1 Compare May 24, 2026 10:17

Pierre-Sassoulas changed the title ~~Add CodSpeed CI: microbenchmarks + end-to-end primer benchmarks~~ Add CodSpeed CI with end-to-end primer benchmarks (flask + black) May 24, 2026

Pierre-Sassoulas force-pushed the codspeed-wizard-1774989768270 branch from 28184b1 to f8520dd Compare May 24, 2026 10:50

Pierre-Sassoulas force-pushed the codspeed-wizard-1774989768270 branch 2 times, most recently from 2d391a6 to 2ef6d50 Compare May 24, 2026 13:23

Pierre-Sassoulas changed the title ~~Add CodSpeed CI with end-to-end primer benchmarks (flask + black)~~ Add CodSpeed CI with end-to-end primer benchmarks (flask + black) and a cold start test using initial imports May 24, 2026

Pierre-Sassoulas marked this pull request as ready for review May 24, 2026 15:19

Pierre-Sassoulas mentioned this pull request May 24, 2026

Add benchmark comparison workflow with PR comments pylint-dev/pylint#10893

Draft

Pierre-Sassoulas requested a review from cdce8p May 24, 2026 19:48

cdce8p reviewed May 24, 2026

Pierre-Sassoulas requested a review from DanielNoord May 31, 2026 20:09

DanielNoord reviewed Jun 3, 2026

Pierre-Sassoulas added 2 commits June 6, 2026 01:26

Pierre-Sassoulas force-pushed the codspeed-wizard-1774989768270 branch from 49bb76b to 2fb0a52 Compare June 6, 2026 05:48

Pierre-Sassoulas added 2 commits June 6, 2026 07:51

Pierre-Sassoulas force-pushed the codspeed-wizard-1774989768270 branch from 2fb0a52 to 7e482f9 Compare June 6, 2026 05:51

Pierre-Sassoulas requested a review from DanielNoord June 6, 2026 08:04

DanielNoord approved these changes Jun 6, 2026

Pierre-Sassoulas merged commit c50a1f4 into main Jun 6, 2026
39 checks passed

Pierre-Sassoulas deleted the codspeed-wizard-1774989768270 branch June 6, 2026 21:41

Pierre-Sassoulas restored the codspeed-wizard-1774989768270 branch June 6, 2026 21:42

Pierre-Sassoulas deleted the codspeed-wizard-1774989768270 branch June 6, 2026 21:44
