Skip to content

Add CodSpeed CI with end-to-end primer benchmarks (flask + black) and a cold start test using initial imports#3079

Merged
Pierre-Sassoulas merged 4 commits into
mainfrom
codspeed-wizard-1774989768270
Jun 6, 2026
Merged

Add CodSpeed CI with end-to-end primer benchmarks (flask + black) and a cold start test using initial imports#3079
Pierre-Sassoulas merged 4 commits into
mainfrom
codspeed-wizard-1774989768270

Conversation

@Pierre-Sassoulas

@Pierre-Sassoulas Pierre-Sassoulas commented May 23, 2026

Copy link
Copy Markdown
Member

Adds CodSpeed CI to astroid with three end-to-end benchmarks for black and flask, alongside a cold start test to exercice the import time which is important because we often lint single file in parallel inside pre-commit.

Supersedes #3022.

Result can be seen in #3080 (but also on currently opened performance branches : #3069, #3046, #3048). For the lazy loading of brains, the result is not visible because the threshold is at 2% change (already close to noise). We won't see performance change in brains as we don't parse lib specific code in this benchmark.

@Pierre-Sassoulas Pierre-Sassoulas marked this pull request as draft May 23, 2026 21:10
@codecov

codecov Bot commented May 23, 2026

Copy link
Copy Markdown

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 93.60%. Comparing base (926f4c9) to head (7e482f9).

Additional details and impacted files

Impacted file tree graph

@@           Coverage Diff           @@
##             main    #3079   +/-   ##
=======================================
  Coverage   93.60%   93.60%           
=======================================
  Files          92       92           
  Lines       11364    11364           
=======================================
  Hits        10637    10637           
  Misses        727      727           
Flag Coverage Δ
linux 93.47% <ø> (ø)
pypy 93.60% <ø> (ø)
windows 93.57% <ø> (ø)

Flags with carried forward coverage won't be shown. Click here to find out more.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@Pierre-Sassoulas Pierre-Sassoulas marked this pull request as ready for review May 23, 2026 21:18
@Pierre-Sassoulas Pierre-Sassoulas marked this pull request as draft May 23, 2026 21:18
@Pierre-Sassoulas Pierre-Sassoulas added Enhancement ✨ Improvement to a component topic-performance Maintenance Discussion or action around maintaining astroid or the dev workflow labels May 23, 2026
Pierre-Sassoulas added a commit that referenced this pull request May 24, 2026
Replaces the single Flask benchmark with three primer projects of
progressively increasing size, all pinned to the SHAs used by pylint's
nightly primer run:

- flask (small, ~50 .py files in src/flask) — kept, with both
  parse-only (isolates rebuilder cost) and full pylint-shaped traversal
- black (medium, ~250 files across src/black + src/blackd +
  src/blib2to3) — classic AST handling, deep class hierarchies
- pandas/core (large) — heavy brain_dataclasses / brain_numpy /
  brain_namedtuple_enum usage; stresses the brain plugins more than
  any other target

The microbenchmarks landed in #3079 show >40% StdDev on CodSpeed CI
because they complete in microseconds. The end-to-end benches each
take seconds to minutes, so per-run noise becomes a small fraction of
the measurement.

The fixture uses 'git clone --no-checkout --filter=blob:none --no-tags
--single-branch' plus 'sparse-checkout' so the pandas clone doesn't
fetch the whole monorepo — just the pandas/core subtree.

Refactor: project metadata moved into a _Project NamedTuple keyed by
short name, so adding more primer targets is a one-block addition.

Local smoke run (one iteration each, no --codspeed): 4 passed in 2:05.

Refs #2014. Stacked on #3079.
@Pierre-Sassoulas Pierre-Sassoulas changed the title Add CodSpeed performance benchmarks and CI workflow Add CodSpeed CI: microbenchmarks + end-to-end primer benchmarks May 24, 2026
Pierre-Sassoulas added a commit that referenced this pull request May 24, 2026
Adds ~5-10 µs of busywork (sum(range(1000))) at the very top of every
NodeNG.infer() call. Since infer() is called O(1M) times during a
pylint-shaped traversal, this adds several seconds to each end-to-end
benchmark. The microbenchmarks should also show a clear regression.

Purpose: verify that CodSpeed CI reports the perf delta vs the
baseline (the merged-ahead state of #3079) and renders it correctly
on the PR. This is an experiment, not a real change. The branch will
be discarded after the comparison is observed.

Refs #2014 (the profile-first feasibility study).
@Pierre-Sassoulas Pierre-Sassoulas force-pushed the codspeed-wizard-1774989768270 branch from 9986de3 to 28184b1 Compare May 24, 2026 10:17
Pierre-Sassoulas added a commit that referenced this pull request May 24, 2026
Adds ~5-10 µs of busywork (sum(range(1000))) at the very top of every
NodeNG.infer() call. Since infer() is called O(1M) times during a
pylint-shaped traversal, this adds several seconds to each end-to-end
benchmark. The microbenchmarks should also show a clear regression.

Purpose: verify that CodSpeed CI reports the perf delta vs the
baseline (the merged-ahead state of #3079) and renders it correctly
on the PR. This is an experiment, not a real change. The branch will
be discarded after the comparison is observed.

Refs #2014 (the profile-first feasibility study).
@Pierre-Sassoulas Pierre-Sassoulas changed the title Add CodSpeed CI: microbenchmarks + end-to-end primer benchmarks Add CodSpeed CI with end-to-end primer benchmarks (flask + black) May 24, 2026
@codspeed-hq

codspeed-hq Bot commented May 24, 2026

Copy link
Copy Markdown

Congrats! CodSpeed is installed 🎉

🆕 3 new benchmarks were detected.

You will start to see performance impacts in the reports once the benchmarks are run from your default branch.

Detected benchmarks


Open in CodSpeed

@Pierre-Sassoulas Pierre-Sassoulas force-pushed the codspeed-wizard-1774989768270 branch from 28184b1 to f8520dd Compare May 24, 2026 10:50
Pierre-Sassoulas added a commit that referenced this pull request May 24, 2026
Adds ~5-10 µs of busywork (sum(range(1000))) at the very top of every
NodeNG.infer() call. Since infer() is called O(1M) times during a
pylint-shaped traversal, this adds several seconds to each end-to-end
benchmark. The microbenchmarks should also show a clear regression.

Purpose: verify that CodSpeed CI reports the perf delta vs the
baseline (the merged-ahead state of #3079) and renders it correctly
on the PR. This is an experiment, not a real change. The branch will
be discarded after the comparison is observed.

Refs #2014 (the profile-first feasibility study).
@Pierre-Sassoulas Pierre-Sassoulas force-pushed the codspeed-wizard-1774989768270 branch 2 times, most recently from 2d391a6 to 2ef6d50 Compare May 24, 2026 13:23
@Pierre-Sassoulas Pierre-Sassoulas changed the title Add CodSpeed CI with end-to-end primer benchmarks (flask + black) Add CodSpeed CI with end-to-end primer benchmarks (flask + black) and a cold start test using initial imports May 24, 2026
@Pierre-Sassoulas Pierre-Sassoulas marked this pull request as ready for review May 24, 2026 15:19
Pierre-Sassoulas added a commit that referenced this pull request May 24, 2026
Adds ~5-10 µs of busywork (sum(range(1000))) at the very top of every
NodeNG.infer() call. Since infer() is called O(1M) times during a
pylint-shaped traversal, this adds several seconds to each end-to-end
benchmark. The microbenchmarks should also show a clear regression.

Purpose: verify that CodSpeed CI reports the perf delta vs the
baseline (the merged-ahead state of #3079) and renders it correctly
on the PR. This is an experiment, not a real change. The branch will
be discarded after the comparison is observed.

Refs #2014 (the profile-first feasibility study).
Pierre-Sassoulas added a commit that referenced this pull request May 24, 2026
… MERGE

Eagerly imports nine heavyweight stdlib modules (pydoc, multiprocessing,
xmlrpc.server, xml.dom.minidom, unittest.mock, email.mime.multipart,
wsgiref.simple_server, http.server, concurrent.futures) at the top of
astroid/__init__.py.

Local measurement: `import astroid` goes from ~76 ms to ~198 ms (+120 ms
per cold start). The cold-lint bench (test_bench_endtoend_cold_lint)
shells out `python -m pylint` per iteration and re-pays import cost
every run, so it should be the loudest reporter; the in-process benches
import once at module load and will only see this on the first iteration.

Purpose: verify that CodSpeed's walltime workflow (cold-lint) reports
the startup delta vs the baseline (#3079) and that it shows up
distinctly from the in-process `infer()` regression added in 19f5c5b.
This is an experiment, not a real change. The branch will be discarded
after the comparison is observed.

Refs #2014 (the profile-first feasibility study).
@Pierre-Sassoulas Pierre-Sassoulas requested a review from cdce8p May 24, 2026 19:48

@cdce8p cdce8p left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left some comments. Similar to the pylint primer, this isn't really my area of expertise, so might be good if someone else could take a look at it as well.

Comment on lines +26 to +28
permissions:
contents: read
id-token: write

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is id-token necessary for CodSpeedHQ/action? If that's the case, could you move it to the job itself?

Comment on lines +39 to +44
- name: Set up Python 3.13
id: python
uses: actions/setup-python@v6.2.0
with:
python-version: "3.13"
check-latest: true

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any particular reason to use 3.13 over 3.14? Saw that we recently switch the default in pylint, guess we'll do the same for astroid as well soon.

# astroid with the version under test. pylint is invoked as a
# subprocess by the cold-start benchmark.
pip install pylint pytest-codspeed
pip install -e .

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
pip install -e .
pip install .

I believe editable installs might even be unnecessary here. Yes, we use them for the other workflows as well, however the astroid files aren't modified in any way after the install.

Comment thread .github/workflows/codspeed.yaml Outdated

permissions:
contents: read
id-token: write

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similar to earlier, move this to the job itself.

Comment thread .github/workflows/codspeed.yaml Outdated
Comment on lines +39 to +44
- name: Set up Python 3.13
id: python
uses: actions/setup-python@v6.2.0
with:
python-version: "3.13"
check-latest: true

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

3.14?

Comment thread .github/workflows/codspeed.yaml Outdated
run: |
python -m pip install -U pip
pip install pytest-codspeed
pip install -e .

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
pip install -e .
pip install .

Comment thread tests/benchmarks/conftest.py Outdated
is set in ``pyproject.toml``, even a warning would abort collection).
"""

from __future__ import annotations

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No really need for that here.

Comment thread tests/benchmarks/conftest.py Outdated
Comment on lines +17 to +20
import importlib.util

if importlib.util.find_spec("pytest_codspeed") is None:
collect_ignore_glob = ["test_bench_*.py"]

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Guess that will work, though in other test files we usually use a try ... except ImportError guard for it.

try:
    import pytest_codespeed
except ImportError:
    collect_ignore_glob = ["test_bench_*.py"]

Comment thread tests/benchmarks/test_bench_endtoend.py Outdated
Comment on lines +11 to +19
- ``test_bench_endtoend_cold_lint`` — shells out
``python -m pylint <minimal module>`` per iteration. ~98 % of wall
time is startup (Python import, pylint init, ``import astroid``,
brain-plugin registration); only ~2 % is actual linting work on the
one-line target. This captures cold-start cost that the in-process
benches below miss: within a single pytest session ``astroid`` is
imported once at module load and stays in ``sys.modules``, so
optimizations like deferring brain-plugin registration are invisible
to those benches.

@cdce8p cdce8p May 24, 2026

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I worry that we might end up optimizing the wrong thing. Yes, startup time is important as well, however with lazy imports begin available in 3.15 there seems to be a focus across the ecosystem to use them everywhere. IMO we should be cautious with that when the time comes, especially for modules which get imported anyway.

That's a bit different for brains, etc. which aren't actually needed for some / most projects. Though this might only be a small fraction of the total import time.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We did some lazy import in https://github.com/pylint-dev/astroid/pull/3062/changes, I think those one are useful, and would have been detected because the logging is very seldom used (only if there's C level warning during parsing).

@Pierre-Sassoulas

Copy link
Copy Markdown
Member Author

Thank you for the review Marc ! Appreciate it.

Pierre-Sassoulas added a commit that referenced this pull request May 24, 2026
…t, bench scope)

Review feedback from cdce8p on #3079:

- Move 'id-token: write' from workflow-level permissions down to the
  jobs.benchmarks.permissions block in both CodSpeed workflows so the
  token is only minted for the job that actually uploads to CodSpeed.
- Bump setup-python to 3.14 (matches pylint's new default; astroid
  will follow).
- Drop the editable install in both workflows: nothing modifies the
  source tree after install, so 'pip install .' is enough. Updated
  the inline comment in the walltime workflow accordingly.
- conftest.py: switch from importlib.util.find_spec to the
  try/except ImportError idiom used elsewhere in the test tree, and
  drop the unused 'from __future__ import annotations' (no
  annotations in this file).
- test_bench_endtoend.py: add a scope note on the cold-start bench.
  The point is to *protect* targeted lazy imports for rare-path
  modules (brain plugins not used by every project; debug-only
  stdlib like pprint / logging deferred in #3062). Those function-
  local / TYPE_CHECKING imports work on every supported Python and
  do not depend on PEP 810 lazy imports landing in 3.15. The bench
  is explicitly *not* an argument for lazifying modules astroid
  imports unconditionally.

@DanielNoord DanielNoord left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't have much experience with CodSpeed, but most of it looks fine to me I guess?

Comment thread tests/benchmarks/test_bench_endtoend.py Outdated
# For details: https://github.com/pylint-dev/astroid/blob/main/LICENSE
# Copyright (c) https://github.com/pylint-dev/astroid/blob/main/CONTRIBUTORS.txt

"""End-to-end benchmarks: cold import + parse + walk + infer real projects.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A lot of this docstring will get outdated as soon as we start changing stuff. Is it really necessary to document this in this much detail?

Comment thread tests/benchmarks/test_bench_endtoend.py Outdated
benchmark(_pylint_one_file, minimal_module)


# -- Flask: small, parse + walk_infer (parse isolates rebuilder cost). --

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In line comments like this also tend to get out of date

Comment thread tests/benchmarks/test_bench_endtoend.py Outdated
# -- Flask: small, parse + walk_infer (parse isolates rebuilder cost). --


def test_bench_endtoend_parse_flask(benchmark, flask_files: list[Path]) -> None:

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No types for benchmark?

timeout-minutes: 15
permissions:
contents: read
id-token: write

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this necessary?

Comment thread .github/workflows/codspeed.yaml Outdated
runs-on: ubuntu-latest
permissions:
contents: read
id-token: write

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this necessary?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might not be in a public repo, I removed, let's see how it goes

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Look like it worked. I fixed your other remarks in be68e14

Three benchmarks based on pylint's primer corpus, with SHAs pinned to
match what pylint sees nightly:

- test_bench_endtoend_parse_flask        (rebuilder hot path, ~50 .py)
- test_bench_endtoend_walk_infer_flask   (pylint-shaped traversal)
- test_bench_endtoend_walk_infer_black   (medium scale, ~250 .py)

The benchmarks call astroid's API directly (ast_from_file +
nodes_of_class + safe_infer on Call/Attribute/Name) so they mirror
what a pylint checker does without depending on pylint.

The fixture uses 'git clone --no-checkout --filter=blob:none --no-tags
--single-branch' plus 'sparse-checkout' so only the declared source
subdirs are fetched.

Workflow uses 'mode: simulation' so the CodSpeed dashboard provides
per-function attribution; 'CodSpeedHQ/action' is allowlisted in repo
Actions settings.

conftest.py uses 'collect_ignore_glob' to skip the directory cleanly
when pytest-codspeed is not installed locally.

Drops the 19 microbenchmarks the CodSpeed wizard generated: at
microsecond scale they showed >40 % StdDev on CI, too noisy for
regression detection, and their coverage is subsumed by the
end-to-end benches.

Local smoke (one iter each, no --codspeed): 3 passed in 11.6s.

Supersedes #3022. Refs #2014.
…t, bench scope)

Review feedback from cdce8p on #3079:

- Move 'id-token: write' from workflow-level permissions down to the
  jobs.benchmarks.permissions block in both CodSpeed workflows so the
  token is only minted for the job that actually uploads to CodSpeed.
- Bump setup-python to 3.14 (matches pylint's new default; astroid
  will follow).
- Drop the editable install in both workflows: nothing modifies the
  source tree after install, so 'pip install .' is enough. Updated
  the inline comment in the walltime workflow accordingly.
- conftest.py: switch from importlib.util.find_spec to the
  try/except ImportError idiom used elsewhere in the test tree, and
  drop the unused 'from __future__ import annotations' (no
  annotations in this file).
- test_bench_endtoend.py: add a scope note on the cold-start bench.
  The point is to *protect* targeted lazy imports for rare-path
  modules (brain plugins not used by every project; debug-only
  stdlib like pprint / logging deferred in #3062). Those function-
  local / TYPE_CHECKING imports work on every supported Python and
  do not depend on PEP 810 lazy imports landing in 3.15. The bench
  is explicitly *not* an argument for lazifying modules astroid
  imports unconditionally.
Pierre-Sassoulas added a commit that referenced this pull request Jun 6, 2026
Per Daniel's review on #3079:
- Trim the module docstring to what stays true (what the file benches
  + primer-corpus link); the removed prose (startup percentages, PEP
  810, StdDev figures) would rot as the benches change.
- Drop the `# -- section --` divider comments; the test names already
  say what each group does.
- Annotate the `benchmark` fixture as `BenchmarkFixture`, imported
  under TYPE_CHECKING since pytest-codspeed is an optional dependency.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Pierre-Sassoulas added a commit that referenced this pull request Jun 6, 2026
astroid is a public repo, so CodSpeed can upload results without
authentication; the id-token: write permission (for OIDC auth) is not
required. Removing it also lets the redundant job-level permissions
block go, since the top-level `contents: read` already covers the job.

Try without it per Daniel's "why is this necessary?" review on #3079;
if uploads fail, re-add `id-token: write` at the job level.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@Pierre-Sassoulas Pierre-Sassoulas force-pushed the codspeed-wizard-1774989768270 branch from 49bb76b to 2fb0a52 Compare June 6, 2026 05:48
Per Daniel's review on #3079:
- Trim the module docstring to what stays true (what the file benches
  + primer-corpus link); the removed prose (startup percentages, PEP
  810, StdDev figures) would rot as the benches change.
- Drop the `# -- section --` divider comments; the test names already
  say what each group does.
- Annotate the `benchmark` fixture as `BenchmarkFixture`, imported
  under TYPE_CHECKING since pytest-codspeed is an optional dependency.
astroid is a public repo, so CodSpeed can upload results without
authentication; the id-token: write permission (for OIDC auth) is not
required. Removing it also lets the redundant job-level permissions
block go, since the top-level `contents: read` already covers the job.

Try without it per Daniel's "why is this necessary?" review on #3079;
if uploads fail, re-add `id-token: write` at the job level.
@Pierre-Sassoulas Pierre-Sassoulas force-pushed the codspeed-wizard-1774989768270 branch from 2fb0a52 to 7e482f9 Compare June 6, 2026 05:51
@Pierre-Sassoulas Pierre-Sassoulas merged commit c50a1f4 into main Jun 6, 2026
39 checks passed
@Pierre-Sassoulas Pierre-Sassoulas deleted the codspeed-wizard-1774989768270 branch June 6, 2026 21:41
@Pierre-Sassoulas Pierre-Sassoulas restored the codspeed-wizard-1774989768270 branch June 6, 2026 21:42
@Pierre-Sassoulas Pierre-Sassoulas deleted the codspeed-wizard-1774989768270 branch June 6, 2026 21:44
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Enhancement ✨ Improvement to a component Maintenance Discussion or action around maintaining astroid or the dev workflow topic-performance

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants