@pieterh-oai
Contributor

@pieterh-oai pieterh-oai commented Nov 12, 2025

Summary

Note

This is a stacked PR; the two base commits are PRs #21413 and #21412. The "[ruff][ext-lint] set up linter runtime" commit is the actual delta for this PR.

This PR brings in the PyO3 dependencies and sets up per-Rayon-thread, per-distinct-config environments to actually run external linters. The build requires --features ext-lint; without that it does not include the PyO3 dependency.

The AST that gets ‘projected’ to the individual external linters is still quite minimal, but it gets us an end-to-end example (a G004-like “don’t log eagerly interpolated strings” rule).

There are a bunch of constraints here that restrict what we can do:

  • PyO3 doesn’t have support for subinterpreters (or multiprocessing), which would be an easy way to ensure isolation between Rayon threads.
  • If we run a single interpreter, we can access it from multiple Rust threads, but parallelism is limited by having to acquire and release the GIL, and we end up serializing the Python execution across worker threads. (In addition, the overhead of locking and unlocking is significant when running short spurts of Python code, which is exactly our workload if we run individual rules on individual AST nodes.) This is so slow as to be somewhat pointless.

The easiest way to get to a working state is to build PyO3 >= 0.23 against a CPython build with freethreading support. In that case, it’s well-defined for multiple Rust threads to call into the same interpreter and execute code, so long as each OS thread is ‘attached.’

The main downside is that we have limited isolation; we do some module name mangling, but a misbehaving rule could modify Python-side global state that would affect other rules. On the plus side, performance for this implementation is (surprisingly) good.
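To illustrate what module name mangling buys and what it does not, here is a minimal Python-side sketch. The naming scheme, the `config_id` parameter, and both helper functions are assumptions made for illustration; the PR's actual mangling happens in the Rust runtime and may look quite different.

```python
import hashlib
import importlib.util
import sys
import tempfile
from pathlib import Path


def mangled_module_name(rule_path: Path, config_id: str) -> str:
    """Derive a per-config module name (hypothetical scheme, not the PR's)."""
    key = f"{rule_path.resolve()}::{config_id}".encode()
    digest = hashlib.sha256(key).hexdigest()[:12]
    return f"_ext_lint_{rule_path.stem}_{digest}"


def load_rule_module(rule_path: Path, config_id: str):
    """Load a rule file under its mangled name so each config gets its own module object."""
    name = mangled_module_name(rule_path, config_id)
    spec = importlib.util.spec_from_file_location(name, rule_path)
    module = importlib.util.module_from_spec(spec)
    sys.modules[name] = module
    spec.loader.exec_module(module)
    return module


with tempfile.TemporaryDirectory() as tmp:
    rule_file = Path(tmp) / "logging_linter.py"
    rule_file.write_text("SEEN = []\n")

    # Same file, two configs: two distinct module objects with separate globals.
    mod_a = load_rule_module(rule_file, "config-a")
    mod_b = load_rule_module(rule_file, "config-b")
    mod_a.SEEN.append("x")
    print(mod_a.__name__ != mod_b.__name__, mod_b.SEEN)
```

Module-level state such as `SEEN` stays separate per mangled name, but anything process-wide (`sys.modules`, `builtins`, monkeypatched standard-library modules) remains shared, which is the limited isolation described above.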

Python interface: For now, we send over a very barebones node object, and require external rules to implement check_stmt or check_expr that take a node and a context object as input. The goal was to get to a somewhat-minimal running example; this interface will need more work.
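As a rough sketch of what a rule written against this interface could look like: only the `check_expr(node, context)` entry point comes from the description above. `FakeNode`, `FakeContext`, their fields, and the `report` method are stand-ins invented here so the example runs on its own; the real node and context objects will differ.

```python
from dataclasses import dataclass, field


@dataclass
class FakeNode:
    """Stand-in for the barebones node object; every field here is an assumption."""
    kind: str
    callee: str = ""
    arg_kinds: tuple = ()
    line: int = 0


@dataclass
class FakeContext:
    """Stand-in for the context object; `report` is a hypothetical method."""
    reports: list = field(default_factory=list)

    def report(self, node, message):
        self.reports.append((node.line, message))


LOGGING_METHODS = {"debug", "info", "warning", "error", "exception", "critical"}


def check_expr(node, ctx):
    """G004-like check: flag logging calls whose first argument is an f-string."""
    if node.kind != "Call":
        return
    receiver, _, method = node.callee.rpartition(".")
    if not receiver or method not in LOGGING_METHODS:
        return
    if node.arg_kinds and node.arg_kinds[0] == "FString":
        ctx.report(node, "logging call uses an eagerly interpolated f-string")


ctx = FakeContext()
check_expr(FakeNode("Call", "logger.info", ("FString",), line=3), ctx)
check_expr(FakeNode("Call", "logger.info", ("StringLiteral", "Name"), line=4), ctx)
print(ctx.reports)
```

Only the first call is reported; the second passes a plain format string plus arguments, which is the lazy form the rule wants to encourage.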

Deployment: this requires building against a freethreading CPython (3.13 or 3.14, configured with --disable-gil). To avoid making that a prerequisite for every Ruff build, the ext-lint feature is disabled by default. I added some scripts for doing a CPython 3.13 --disable-gil build; see the test plan.
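A quick way to confirm that the interpreter being linked against is actually a freethreading build is to ask Python itself. This is a small standalone check, not part of the PR; the two helper names are made up, while `Py_GIL_DISABLED` and `sys._is_gil_enabled()` are the CPython 3.13+ mechanisms.

```python
import sys
import sysconfig


def is_free_threaded_build() -> bool:
    """True if this interpreter was compiled with --disable-gil."""
    return bool(sysconfig.get_config_var("Py_GIL_DISABLED"))


def is_gil_enabled_now() -> bool:
    """True if the GIL is active at runtime; interpreters before 3.13 always have one."""
    probe = getattr(sys, "_is_gil_enabled", None)  # added in CPython 3.13
    return True if probe is None else probe()


print(f"free-threaded build: {is_free_threaded_build()}")
print(f"GIL enabled at runtime: {is_gil_enabled_now()}")
```

The two answers can differ: a freethreading build may still run with the GIL enabled, for example when `PYTHON_GIL=1` is set or when an extension module that does not declare free-threading support is imported.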

Test Plan

# manually check Cargo.lock changes
git checkout HEAD^ Cargo.lock
cargo check
# (spot check Cargo.lock for updated version pins)

# test the regular/default build
RUFF_UPDATE_SCHEMA=1 cargo test --workspace --exclude ty
cargo build -p ruff
uvx pre-commit run --all-files --show-diff-on-failure

# if necessary, install and build Python 3.13 --disable-gil
# (from Ruff repo root)
./scripts/nogil/build_python_3_13_nogil.sh ~/cpython_3_13_nogil
source ./scripts/nogil/nogil_env.sh ~/cpython_3_13_nogil

# build and test with feature enabled; then run on some larger repo
cargo build -p ruff --features ext-lint
cargo test --workspace --features ext-lint --exclude ty

cargo build -p ruff --features ext-lint --release
cd large_monorepo
$RUFF_ROOT/target/release/ruff check --select RUFF300 --select-external RLI001 --no-cache .

# ecosystem delta
uvx --from ./python/ruff-ecosystem ruff-ecosystem check "../ruff_baseline/target/debug/ruff" "./target/debug/ruff"

@astral-sh-bot

astral-sh-bot bot commented Nov 12, 2025

Diagnostic diff on typing conformance tests

No changes detected when running ty on typing conformance tests ✅

@astral-sh-bot

astral-sh-bot bot commented Nov 12, 2025

mypy_primer results

No ecosystem changes detected ✅

No memory usage changes detected ✅

}
})
}
#[cfg(not(Py_GIL_DISABLED))]
Contributor Author

This is a holdover from trying to support both GIL and non-GIL builds. The current state is that only non-GIL really works, so I will clean this up.

@astral-sh-bot

astral-sh-bot bot commented Nov 12, 2025

ruff-ecosystem results

Linter (stable)

✅ ecosystem check detected no linter changes.

Linter (preview)

✅ ecosystem check detected no linter changes.

Formatter (stable)

✅ ecosystem check detected no format changes.

Formatter (preview)

✅ ecosystem check detected no format changes.

SOURCE_DIR="Python-${PYTHON_VERSION}"
DOWNLOAD_URL="https://www.python.org/ftp/python/${PYTHON_VERSION}/${TARBALL}"

echo "Downloading ${DOWNLOAD_URL}..."
Member

Can't we use pbs instead of building Python manually?

Contributor Author

That'd probably make more sense. Does PBS have configs that include freethreading as a flag already?

@MichaReiser
Member

MichaReiser commented Nov 13, 2025

Thank you, this is great. Thanks for taking the time to hack on a prototype. I haven't read the code in detail, so forgive me if any of my feedback or questions should be obvious from the code.

On the plus side, performance for this implementation is (surprisingly) good.

Do you have any numbers you can share? How does the performance of this branch compare to:

  • Ruff without external linters
  • This Ruff version but without any external linters enabled
  • This Ruff version with one external linter enabled

My biggest concerns (other than performance) are:

  • The plugin API, specifically the AST API and how to expose semantic information. We are considering a fundamental rewrite of the AST to make Ruff's parser (and traversal) much faster. Exposing Ruff's AST transparently will likely make this much harder or prevent us from doing this refactor at all. There are also some structural changes that we considered making to our AST that would simplify some handling within Ruff (e.g. making a suite its own AST node rather than just a Vec<Stmt>). Again, a public API might prevent us from making those changes in the future. Do you have a sense of how expensive it would be to expose a normalized AST (e.g. a Python ast.parse-compatible AST)? I don't consider this to be strictly blocking. E.g., we could version our plugin API, but exposing plugins now would certainly increase the upfront cost of any such changes (to the point where they simply become infeasible).
  • I believe the current implementation only passes the current AST node without allowing any form of traversal. I think it's essential for plugins to freely traverse the AST (at least downwards)
  • The entire: How to integrate plugins into our settings/CLI is a big open question to me. How do we allow plugin-specific options?
  • I think there's some appetite within Astral to compare different plugin systems. E.g., how do Rust, WASM, Python, and Starlark plugin systems compare in performance, expressiveness, etc.? I think we can only answer this after building a set of prototypes.
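For reference on the first two points, this is roughly what a G004-like rule looks like when written against the standard library's `ast` module, which is the shape an `ast.parse`-compatible plugin AST would take. It is a standalone sketch that uses only CPython's own parser and says nothing about how Ruff would expose its AST; the function name and sample source are made up.

```python
import ast

SOURCE = '''
import logging
logger = logging.getLogger(__name__)

def handler(user):
    logger.info(f"user {user} logged in")
    logger.info("user %s logged in", user)
'''

LOGGING_METHODS = {"debug", "info", "warning", "error", "exception", "critical"}


def find_eager_fstring_logs(source: str) -> list[int]:
    """Return line numbers of logging calls whose first argument is an f-string."""
    tree = ast.parse(source)
    lines = []
    # ast.walk visits every descendant, i.e. free downward traversal from the root.
    for node in ast.walk(tree):
        if not isinstance(node, ast.Call):
            continue
        func = node.func
        if not (isinstance(func, ast.Attribute) and func.attr in LOGGING_METHODS):
            continue
        if node.args and isinstance(node.args[0], ast.JoinedStr):
            lines.append(node.lineno)
    return sorted(lines)


print(find_eager_fstring_logs(SOURCE))  # → [6]
```

The rule never sees parser-specific node types, only `ast.Call`, `ast.Attribute`, and `ast.JoinedStr`, which is what would decouple plugins from internal AST refactors, at the cost of a conversion step.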

@pieterh-oai
Contributor Author

pieterh-oai commented Nov 13, 2025

@MichaReiser Thanks for the initial look! I'll keep cleaning this up a little, since it (this PR in particular) is a bit rougher than the other two.

Performance stuff

I can post some flamegraphs that compare running G004 (with --features ext-lint and without) and logging_linter.py on the same codebase, along with some kinda-anonymized stats. I can also just post up some builds for Mac+Linux so that you can experiment.

AST and general API stuff

The plugin API, specifically the AST API and how to expose semantic information.
[and]
only passes the current AST node

I can put up more code to show what this might look like, but it'll take me a few days, most likely. What I have cooking currently is:

  • Add additional metadata to ast.toml regarding (1) which node types to convert to PyO3+Python classes, and (2) any specific configuration on how to do the conversion (e.g. this field should be eagerly populated; this field is populated lazily only through an attribute access).
  • Extend generate.py to generate the .pyi stubs and the PyO3-build-only intermediate types, as well as the Rustland AST -> Pythonland AST conversion. This is tricky but fairly mechanical once it works for ~20 node types.
  • Most field accesses can probably be 'lazy', but we will need to do some profiling to see how expensive each call from Python back into Rust actually ends up being.
  • Any node type that isn't allowlisted is converted to a RawNode class (which is similar to the current Node in that it just has some basic fields and no ability to traverse parents/children).

...but let me put up the code once it's semi-presentable.
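Until that code is up, the lazy-field and RawNode ideas from the list above can be pictured with a pure-Python toy. Everything here is hypothetical: the class shapes, the `loaders` mapping, and the zero-argument callables that stand in for a call from Python back into Rust.

```python
class RawNode:
    """Fallback for node types that are not allowlisted: basic fields, no traversal."""

    __slots__ = ("kind", "start", "end")

    def __init__(self, kind, start, end):
        self.kind = kind
        self.start = start
        self.end = end


class LazyNode:
    """Allowlisted node whose fields are fetched on first attribute access."""

    def __init__(self, kind, loaders):
        self.kind = kind
        # Field name -> zero-arg callable; a stand-in for crossing back into Rust.
        self._loaders = loaders

    def __getattr__(self, name):
        # Only reached when normal lookup fails, i.e. the field is not cached yet.
        loaders = self.__dict__.get("_loaders", {})
        if name not in loaders:
            raise AttributeError(name)
        value = loaders[name]()
        setattr(self, name, value)  # cache: later accesses are plain attribute lookups
        return value


calls = []


def load_args():
    calls.append("args")  # count how often the simulated boundary is crossed
    return [RawNode("FString", 12, 40)]


node = LazyNode("Call", {"args": load_args})
first = node.args
second = node.args
print(len(calls), first is second)
```

The loader runs once even though `args` is read twice, which is the property that profiling would need to confirm is worth the extra indirection compared with eagerly populated fields.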

+1 on versioning the API. I actually had a note to add that to the TOML (linters specify which version of the API they were built against).

Integration/Distribution

The entire: How to integrate plugins into our settings/CLI is a big open question to me. How do we allow plugin-specific options?

I don't think I know enough about the whole ecosystem to be helpful here. At minimum, doing --all-features for certain CI steps may become challenging if there are gated builds with different deps.

Aside from that, from my/our point of view, we really want each rule to have some tests, so ideally Ruff would have specific options for that.

Other languages

I think there's some appetite within Astral to compare different plugin systems.

I think that makes sense. We can chat about what are the best options to try. I have code lying around from when I tried RHAI (1 month ago) and RustPython (a few weeks ago).

@pieterh-oai pieterh-oai force-pushed the pieterh/ext-lint-3-interpreter branch from cea3810 to 57f73c9 Compare November 19, 2025 20:20
@pieterh-oai
Contributor Author

Rebase; some cleanup:

  • Kept build.rs but got rid of RUFF_PYTHON_HOME and just rely on PyO3 to find a CPython to use
  • Some fixes to nogil_env.sh to fail more cleanly
  • Removed alias for call-callee-regex
