Process superclass methods before subclass methods in semanal #18723

Open
wants to merge 2 commits into
base: master

ilevkivskyi
Member

Fixes #7162

See also discussion in #18674 for another situation when this causes problems (deferrals). In general this problem is probably quite rare, but it bugs me, so I decided to go ahead with a simple and explicit (even though a bit ugly) solution.

@ilevkivskyi
Member Author

Hm, for some reason tests didn't start, I will try closing and re-opening.

@ilevkivskyi ilevkivskyi reopened this Feb 22, 2025

@ilevkivskyi
Member Author

Oh well, it looks like this PR causes mypyc compiled mypy to segfault when running some tests.

@ilevkivskyi
Member Author

And as I guessed the error happens in one of those Bogus things, more precisely in CPyDef_semanal___SemanticAnalyzer___qualified_name (in PyUnicode_Concat one of the args is NULL or something).

@ilevkivskyi
Member Author

Actually it is much more tricky than that: fullname etc. are no longer Bogus; empty strings are used instead. Looking at gdb, it seems that two totally valid strings are passed to PyUnicode_Concat, but it segfaults. Maybe something is wrong with refcounting? I will try to dig a bit more with a Python debug build.

        return -1
    if right_info in left_info.mro:
        return 1
    return 0
Collaborator

This seems to change the order of processing targets, even if derived classes are always after base classes (i.e. current ordering is already fine). I suspect that this will break the current SCC ordering algorithm, which we probably rely on in a bunch of places, and it could explain why things are failing. I think we must mostly follow the SCC ordering or we will have a bunch of weird regressions and generally a bad time.

Here's one potential way to fix this so this only changes the order when necessary:

  • Create a linear list of targets, similar to what you currently have.
  • Collect a set of all TypeInfos in the targets (e.g. all active_type values).
  • Iterate over the targets, and keep track of which TypeInfos we've processed by removing them from the TypeInfo set created in the previous step. If we encounter a TypeInfo which has some MRO item that is still in the set of TypeInfos, move that target to a separate list (deferreds) instead of processing it now.
  • After having iterated all targets, iterate over the deferred items.

The above approach could possibly be made even better by processing deferred nodes immediately after all the MRO entries have been processed, instead of waiting for all targets to be processed.

This has the benefit of not changing the processing order if it's already correct, and if it's incorrect, only the impacted targets will get rescheduled. This also could be a bit faster, since we perform a linear scan instead of a sort.
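The steps above could be sketched roughly as follows. This is only an illustration, not mypy's code: `Info` is a hypothetical single-inheritance stand-in for TypeInfo, targets are simplified to `(name, info)` pairs, and the per-class counter and the repeated passes over the deferred list are assumptions added so that chains of deferred classes (C after B after A) also come out in order.

```python
from collections import Counter


class Info:
    """Hypothetical stand-in for a TypeInfo (single inheritance only)."""

    def __init__(self, name, base=None):
        self.name = name
        # MRO is the class itself followed by the MRO of its base, if any.
        self.mro = [self] + (base.mro if base is not None else [])


def order_targets(targets):
    """Reorder (name, info) targets so base-class targets come first.

    Targets whose class has an ancestor with unprocessed targets are
    deferred; everything else keeps its original relative order.
    """
    # Number of not-yet-processed targets per class.
    pending = Counter(info for _, info in targets if info is not None)
    ordered = []
    queue = list(targets)
    while queue:
        deferred = []
        for target in queue:
            _, info = target
            if info is not None and any(pending[b] > 0 for b in info.mro[1:]):
                # Some ancestor still has unprocessed targets: postpone.
                deferred.append(target)
                continue
            ordered.append(target)
            if info is not None:
                pending[info] -= 1
        if len(deferred) == len(queue):
            # No progress (should not happen with an acyclic MRO); bail out.
            ordered.extend(deferred)
            break
        queue = deferred
    return ordered


A = Info("A")
B = Info("B", A)
C = Info("C", B)
example = [("C.f", C), ("mod.func", None), ("B.f", B), ("A.f", A), ("A.g", A)]
print([name for name, _ in order_targets(example)])
# ['mod.func', 'A.f', 'A.g', 'B.f', 'C.f']
```

If the input is already ordered with base classes first, nothing is deferred and the list comes back unchanged, which is the property the suggestion is after.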

Member Author

@JukkaL

This seems to change the order of processing targets, even if derived classes are always after base classes (i.e. current ordering is already fine)

I don't think I am following. Can you give an example of when this happens? I actually did a diff on the full target list for mypy self check (including stdlib), and it is tiny; only a few things that actually matter were changed (e.g. a couple of visitors in mypy.types vs mypy.type_visitor).

Even then, how can the order of processing method bodies be so important? (All the top levels, including ClassDefs, are already processed at this point.)

Collaborator

Ah I think I misunderstood how the ordering works. So it's probably fine. Changing the ordering of methods "shouldn't" change much, but it's just a very scary change that could trigger some pre-existing bugs or limitations. But if this only changes ordering very slightly, it should be fine.

Can you also manually test this when you import torch and numpy? At least torch has a massive import cycle which should be a good test case.

@JukkaL
Collaborator

JukkaL commented Feb 24, 2025

Maybe something is wrong with refcounting? I will try to dig a bit more with Python debug build.

Using a debug build is a good idea. Reference counting has been quite stable for a long time, but it's possible that something is still misbehaving.

Contributor

According to mypy_primer, this change doesn't affect type check results on a corpus of open source code. ✅

@ilevkivskyi
Member Author

@JukkaL

Using a debug build is a good idea. Reference counting has been quite stable for a long time, but it's possible that something is still misbehaving.

It looks like something is wrong with unpacking of tuples. Replacing unpacking with indexing fixes the segfaults (see last commit). I still don't have any small repro, but looking at this comment

# Special-case multiple assignments like 'x, y = expr' to reduce refcount ops.

it seems to me this may be caused by #16022
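For illustration, here is a minimal sketch of the two spellings being compared: a comparator that unpacks its tuple arguments into throwaway `_` targets, and one that reads the needed element by index. The `(name, info)` tuple layout, the `Info` class, and all function names are hypothetical and are not mypy's actual code. In plain, uncompiled Python both forms behave the same; the segfault discussed here was only seen in the mypyc-compiled build.

```python
from functools import cmp_to_key


class Info:
    """Hypothetical stand-in for a TypeInfo (single inheritance only)."""

    def __init__(self, name, base=None):
        self.name = name
        self.mro = [self] + (base.mro if base is not None else [])


def compare_infos(left_info, right_info):
    """Return -1 if left is a superclass of right, 1 if the reverse, else 0."""
    if left_info is None or right_info is None or left_info is right_info:
        return 0
    if left_info in right_info.mro:
        return -1
    if right_info in left_info.mro:
        return 1
    return 0


def cmp_with_unpacking(left, right):
    # Tuple unpacking with an unused target (the form that hit the bug).
    _, left_info = left
    _, right_info = right
    return compare_infos(left_info, right_info)


def cmp_with_indexing(left, right):
    # Same logic, but reading the element by index (the workaround).
    return compare_infos(left[1], right[1])


A = Info("A")
B = Info("B", A)
pair = [("B.f", B), ("A.f", A)]
by_unpacking = sorted(pair, key=cmp_to_key(cmp_with_unpacking))
by_indexing = sorted(pair, key=cmp_to_key(cmp_with_indexing))
print([name for name, _ in by_indexing])
```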

@ilevkivskyi
Member Author

@JukkaL somewhat weird test to reproduce the segfault

[case testTupleUnpackingInCallback]
def f(x: tuple[str, int], y: tuple[str, int]) -> int:
    _, xi = x
    _, yi = y
    return 0

[file driver.py]
from native import f
from functools import cmp_to_key

xs = [("x" * i, i) for i in range(100)]
assert sorted(xs, key=cmp_to_key(f))[-1][1] == 99

@ilevkivskyi
Member Author

Another test case (a bit less sketchy) shows that the problem appears if one of the unpacking targets is unused in the function:

[case testTupleUnpackingInCallback]
def f(x: tuple[str, int], y: tuple[str, int]) -> int:
    a, xi = x
    _, yi = y
    if a == "":
        return 0
    return 0

[file driver.py]
from native import f
from functools import cmp_to_key

xs = [("x" * i, i) for i in range(100)]
xs = sorted(xs, key=cmp_to_key(f))
print(xs[1])
print(xs[2])

@ilevkivskyi
Member Author

OK, sorry for spamming, last message until I (or you) fix this: finally, a self-contained repro for the segfault

[case testTupleUnpackingInCallback]
def f(x: tuple[str, int]) -> int:
    a, xi = x
    return 0

[file driver.py]
from native import f

xs = [("x" * i, i) for i in range(100)]
xs = [x for x in xs if f(x) == 0]
print(xs[1])
print(xs[2])

@ilevkivskyi
Member Author

@JukkaL unless I am missing some other edge case, I think #18732 should fix it.

Successfully merging this pull request may close these issues.

Processing order of methods affects inferred attribute types
2 participants