-
Notifications
You must be signed in to change notification settings - Fork 1.6k
[ty] replace FxHash{Map, Set} with footgun-mitigated APIs
#21686
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Diagnostic diff on typing conformance testsNo changes detected when running ty on typing conformance tests ✅ |
|
|
006a794 to
a1eb5bd
Compare
|
|
||
| struct AllMembers<'db> { | ||
| members: FxHashSet<Member<'db>>, | ||
| members: FxIndexSet<Member<'db>>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's probably better to use FxIndexSet here.
| @@ -32,7 +31,7 @@ pub(super) fn check_class<'db>(context: &InferContext<'db, '_>, class: ClassLite | |||
| } | |||
|
|
|||
| let class_specialized = class.identity_specialization(db); | |||
| let own_class_members: FxHashSet<_> = | |||
| let own_class_members: FxIndexSet<_> = | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should probably use FxIndexSet here.
|
|
||
| let mut typevars = FxHashSet::default(); | ||
| // We should use `FxIndexSet` here since `BoundTypeVarInstance::{valid, required}_specializations` is query-dependent. | ||
| let mut typevars = FxIndexSet::default(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should probably use FxIndexSet here.
The rest of this module seems okay to use unstable iterators, unless I'm overlooking something.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you say more what query-dependent means?
Looking at the loop below, it doesn't seem to depend on ordering as it returns true only if all typevars satisfy the constraints and there's no state between the type var checking, as far as I can tell
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
{valid, required}_specialization uses lazy_{bound, constraints} internally. I meant that the order in which these queries are called is non-deterministic.
crates/ty_project/src/files.rs
Outdated
| #[derive(Debug, get_size2::GetSize)] | ||
| struct IndexedInner { | ||
| files: FxHashSet<File>, | ||
| files: FxIndexSet<File>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should probably use FxIndexSet here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we should because there's nothing guaranteeing that we insert files in a deterministic order. See my other comments.
| pub(crate) enum InferableTypeVars<'a, 'db> { | ||
| None, | ||
| One(&'a FxHashSet<BoundTypeVarIdentity<'db>>), | ||
| One(&'a FxIndexSet<BoundTypeVarIdentity<'db>>), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should probably use FxIndexSet here.
|
Unfortunately, it appears that non-determinism remains due to factors entirely different from those considered in this PR. Anyway, I'm marking this PR as ready for review. |
|
I like the approach, but I don't think using an IndexMap is the solution. Iterating over two The issue we see with fix point is that it's no longer guaranteed that the elements are inserted in the same order. However, we'll have the exact same issue when using I'm not sure what the solution here is, other than applying some sort of sorting (somewhere?). |
|
There's also one flaky diagnostic. What I suspect is that our convergence functions are now sensitive to which query is the outer-most cycle or some query that bails early as soon as it sees the first |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I haven't reviewed all the changes but I don't think the StableKey assumption is safe.
I do think it makes sense to have a hash map wrapper and this is also what rustc does https://github.com/rust-lang/rust/blob/ae90dcf0207c57c3034f00b07048d63f8b2363c8/compiler/rustc_data_structures/src/stable_map.rs#L45
Another solution (maybe less invasive?) could be to initialize the hashser function with a random state (instead of 0) in debug builds, so that iteration order is guaranteed to be different across runs (or have a feature flag that we can turn on)
crates/ty_project/src/files.rs
Outdated
| #[derive(Debug, get_size2::GetSize)] | ||
| struct IndexedInner { | ||
| files: FxHashSet<File>, | ||
| files: FxIndexSet<File>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we should because there's nothing guaranteeing that we insert files in a deterministic order. See my other comments.
|
re: randomizing layouts: if you want to go down that route I suggest cribbing from rustc's |
1f605d3 to
1966a1c
Compare
1966a1c to
47570b9
Compare
MichaReiser
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I'm leaning towards changing the hash maps in separate PRs and more explicitly talk about why the changes are necessary. I'm not convinced that it's necessary to use FxIndexMap in many cases.
|
|
||
| let mut typevars = FxHashSet::default(); | ||
| // We should use `FxIndexSet` here since `BoundTypeVarInstance::{valid, required}_specializations` is query-dependent. | ||
| let mut typevars = FxIndexSet::default(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you say more what query-dependent means?
Looking at the loop below, it doesn't seem to depend on ordering as it returns true only if all typevars satisfy the constraints and there's no state between the type var checking, as far as I can tell
| /// List all members of a given type: anything that would be valid when accessed | ||
| /// as an attribute on an object of the given type. | ||
| pub fn all_members<'db>(db: &'db dyn Db, ty: Type<'db>) -> FxHashSet<Member<'db>> { | ||
| pub fn all_members<'db>(db: &'db dyn Db, ty: Type<'db>) -> FxIndexSet<Member<'db>> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are we using this anywhere outside the LSP?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems to be used in ty_python_semantic as well (https://github.com/mtshiba/ruff/blob/stable-iteration/crates/ty_python_semantic/src/types/function.rs#L1464-L1467). In that case, however, an unstable iterator is not a problem.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
that is just for internal use so that we can test the function itself in our mdtests; it's not a public-facing part of ty_python_semantic
0b6f880 to
64267bf
Compare
FxHash{Map, Set} as query inputFxHash{Map, Set} with footgun-mitigated APIs
Summary
from astral-sh/ty#1670
If a data structure that depends on salsa IDs is used as the key for an
FxHashMapor the value of anFxHashSet, the output order of the iterator will be unstable. If a query depends on this order, the results of fixed-point iteration will be unstable.In this case,
Fx{Index, Order}{Map, Set}should be used, but it seems that this is not being done thoroughly.Simply replacing all
FxHash{Map, Set}withFx{Index, Order}{Map, Set}would solve the problem, but it is generally believed that the former has slightly better performance if we are only using insertion and retrieval without using a set/map as an iterator.Therefore, this PR proposes a compromise.
That is, replace all
FxHash{Map, Set}used within ty with wrapper structs that does not implement(Into)Iterator, and instead define methods likeunstable_iterto make users of these structs aware of whether iteration operations are safe.After performing this refactoring, I discovered some suspicious parts. I hope this fix will help to resolve the issue.
Test Plan