detect repeated STL container lookups #123376
@llvm/issue-subscribers-clang-tidy Author: Oliver Stöneberg (firewave)
@kazutakahirata has been fixing a lot of duplicated lookups with STL containers in recent months - like
c531255
94e9813
I do not know how these were found but it would be great if clang-tidy could detect these. See the history for way more examples: https://github.com/llvm/llvm-project/commits?author=kazutakahirata. |
I think this is a job for the static analyzer. It can keep track of the last lookup for each map, and on lookup check if the saved value equals the new value. I wrote something similar in Infer for a different purpose (removing some false positives). I actually had started writing a CSA |
Yeah, automated discovery would be nice. I'm using There are quite a few variations, but here are some common ones:
"then" and "else" may be flipped:
Checking for membership in advance:
One challenge is to examine the side effects between the membership check and insertion:
Here, |
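The snippets that accompanied the variations above did not survive in this copy of the thread. As a rough illustration only (not the commenter's original examples), the sketch below shows two of the shapes described: a `find` followed by `operator[]`, and a membership check in advance of an insertion, each next to a single-lookup rewrite. The function names are hypothetical, and C++17 is assumed for `try_emplace`.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>

// Repeated lookup: find() and operator[] each search the map for Key.
int bumpTwice(std::map<std::string, int> &Counts, const std::string &Key) {
  if (Counts.find(Key) == Counts.end()) // lookup 1
    Counts[Key] = 1;                    // lookup 2 ("then" branch)
  else
    Counts[Key] += 1;                   // lookup 2 ("else" branch)
  return Counts[Key];                   // lookup 3
}

// Single lookup: try_emplace() inserts 0 only if Key is absent and returns
// an iterator to the element either way.
int bumpOnce(std::map<std::string, int> &Counts, const std::string &Key) {
  auto It = Counts.try_emplace(Key, 0).first;
  return ++It->second;
}

// Membership check in advance, then insertion: two lookups.
bool addTwice(std::set<int> &Seen, int V) {
  if (Seen.count(V)) // lookup 1
    return false;
  // If code placed here mutated Seen, folding the two lookups into one
  // would no longer be equivalent.
  Seen.insert(V);    // lookup 2
  return true;
}

// insert() already reports whether the element was new: one lookup.
bool addOnce(std::set<int> &Seen, int V) { return Seen.insert(V).second; }
```

Both versions of each pair produce the same container contents; the rewrite is only safe when nothing between the original two lookups touches the container.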
This is why I think inter-procedural analysis is necessary. Plus all the logic regarding methods invalidating references would probably be easier to implement. |
Uh, that should be possible to do, but it is really hard to achieve in general. That said, I hacked together a checker that should work in general, although I don't reason about the number of elements in a container. My idea was to look for set/map "lookups" while tracking who could access and mutate the container between the "lookups". If the container was mutated or escaped, I conservatively assume that the second "lookup" makes sense and do not warn. You can find my alpha checker at my branch. I don't think I'll have the time to ever come back to this checker, so anyone willing to push it forward and publish it as a PR is more than welcome. Have you used weggli for coming up with a query to find refactoring opportunities? |
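The heuristic described in this comment can be sketched as a toy model. This is not the checker from the branch: it assumes a function body has already been flattened into a list of events, and the `Event` type, the `Kind` enum, and `findRedundantLookups` are all hypothetical names introduced for illustration. Keys are compared by their spelling, and any mutation or escape of a container makes the model forget earlier lookups on it.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Toy model only: one entry per relevant statement in a function body.
enum class Kind { Lookup, Mutate, Escape };

struct Event {
  Kind K;
  std::string Container; // name of the container variable
  std::string Key;       // spelled key expression (used for Lookup only)
  int Line;
};

// Returns (first line, second line) for every lookup that repeats an earlier
// lookup of the same key with no mutation or escape in between.
std::vector<std::pair<int, int>>
findRedundantLookups(const std::vector<Event> &Events) {
  using ContainerAndKey = std::pair<std::string, std::string>;
  std::map<ContainerAndKey, int> LastLookup; // line of the last lookup
  std::vector<std::pair<int, int>> Reports;

  for (const Event &E : Events) {
    if (E.K == Kind::Lookup) {
      ContainerAndKey CK{E.Container, E.Key};
      auto [It, Inserted] = LastLookup.try_emplace(CK, E.Line);
      if (!Inserted) {
        Reports.emplace_back(It->second, E.Line);
        It->second = E.Line;
      }
      continue;
    }
    // Mutation or escape: conservatively drop everything known about
    // this container, so a later lookup is not reported.
    for (auto It = LastLookup.begin(); It != LastLookup.end();) {
      if (It->first.first == E.Container)
        It = LastLookup.erase(It);
      else
        ++It;
    }
  }
  return Reports;
}
```

A real checker would also have to decide what counts as a mutation or an escape, which is the hard part the comment refers to.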
Absolutely, this is the biggest challenge. It is very easy to run into iterator invalidation or similar errors when the map ends up being mutated between the lookups. On the other hand, if Clang ever adopts something akin to a borrow checker and that information is available to tidy or the static analyzer, this question will be trivial to answer for lifetime-annotated code. This direction is not really relevant now, but I wanted to mention it in case it happens in the next couple of years and someone takes a stab at this problem after that. |
Now that I think about it, in my prototype I should have restricted the checker to only consider double lookups if they are present in the same function body (i.e., the same LocationContext). |
This is what I was playing around with a couple months ago: MapDoubleLookup.cpp. It was specialized to maps and handling the different methods one-by-one. I might get back to it but not in the immediate future and take things from / build upon your draft.
That's one thing I wanted to do but couldn't figure out how (first time touching CSA). |
@kazutakahirata I was wondering if the query logic could possibly "just" be encapsulated in code to get this somehow started. And then suddenly this appeared: #123734 (I am always amazed by such coincidences). |
No, I've never heard of it, but I just tried it. It's super fast! I'll keep it in my tool box. Thanks! To find repeated map lookups, however, I do care about the type of |
I'm happy to share my clang-query script below, but it just finds
|
I've just pushed a commit restricting the issues only for a single stack frame. So we shouldn't have redundant lookup reports from different stack frames. |
I gave a bit of a polish to my
diff --git a/clang/lib/StaticAnalyzer/Core/ExplodedGraph.cpp b/clang/lib/StaticAnalyzer/Core/ExplodedGraph.cpp
--- a/clang/lib/StaticAnalyzer/Core/ExplodedGraph.cpp
+++ b/clang/lib/StaticAnalyzer/Core/ExplodedGraph.cpp
@@ -488,15 +488,17 @@ ExplodedGraph::trim(ArrayRef<const NodeTy *> Sinks,
   while (!WL2.empty()) {
     const ExplodedNode *N = WL2.pop_back_val();
+    auto [Place, Inserted] = Pass2.try_emplace(N);
+
     // Skip this node if we have already processed it.
-    if (Pass2.contains(N))
+    if (!Inserted)
       continue;
     // Create the corresponding node in the new graph and record the mapping
     // from the old node to the new node.
     ExplodedNode *NewN = G->createUncachedNode(N->getLocation(), N->State,
                                                N->getID(), N->isSink());
-    Pass2[N] = NewN;
+    Place->second = NewN;
     // Also record the reverse mapping from the new node to the old node.
     if (InverseMap) (*InverseMap)[NewN] = N;
It takes ages to run it on my personal computer, so anyone interested could give it a try.
Given this outcome so far, I'd be interested in running this on the whole clang/llvm. |
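For readers without an LLVM checkout, the rewrite in the diff can be reproduced with standard containers. The sketch below is a hypothetical stand-in, not the LLVM code: it uses `std::unordered_map` instead of the map type in `ExplodedGraph.cpp`, plain `int` IDs instead of nodes, and an invented function name `renumber`. It assumes C++17.

```cpp
#include <cassert>
#include <unordered_map>
#include <vector>

// Map each distinct old ID to a freshly numbered new ID, processing every
// ID at most once even if it appears on the worklist several times.
std::unordered_map<int, int> renumber(std::vector<int> Worklist) {
  std::unordered_map<int, int> OldToNew;
  int NextID = 0;
  while (!Worklist.empty()) {
    int Old = Worklist.back();
    Worklist.pop_back();

    // One lookup: reserves a value-initialized slot if Old was not present.
    auto [Place, Inserted] = OldToNew.try_emplace(Old);

    // Skip this ID if we have already processed it.
    if (!Inserted)
      continue;

    // Fill the reserved slot through the iterator; no second lookup.
    Place->second = NextID++;
  }
  return OldToNew;
}
```

The pattern relies on nothing inserting into the map between `try_emplace` and the write through `Place`, since an insertion may invalidate the iterator.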
It would probably be interesting to run it on a version of the code before Kazu started with his cleanups. I think they started around September 3 2024.
Depending on the code CSA can be really slow. For Cppcheck it takes about 2.5x the time the clang-tidy checks need (and the latter might still have one or two hot spots as well). |
I found this using my experimental checker present at: https://github.com/steakhal/llvm-project/tree/bb/add-redundant-lookup-checker The idea for looking for redundant container lookups was inspired by #123376 If there is interest, I could think of upstreaming this alpha checker. (For the StaticAnalyzer sources it was the only TP, and I had no FPs from the checker btw.)
Thanks for the suggestion, I was also thinking about it but you convinced me to actually do it. I reverted the following 19 commits, and then created a small Then, I ran my checker using CodeChecker, as I explained earlier, and looked at the findings:
Overall, I'm content with the checker. |
I've pushed my final changes to the checker. Now, it stringifies the lookup "key" to see if it's any different across the two lookups. Not pretty, but it gets the job done for now. I also reworked the reporting to emit only two notes, one for each of the two lookups it finds. I'm convinced that the Static Analyzer is not the right tool for catching these bugs, as in my implementation I gradually relaxed basically everything that would tie it to the symbolic execution engine. Consequently, it would likely be easier to implement the same using a well-crafted AST-based checker in clang-tidy. That would also solve the performance problems by not paying for what you don't use - unlike a CSA checker, where you need to do the full-blown exploded graph exploration only to discard the benefits it would bring, as my checker does. So, all in all, I'm happy with my checker (it finds TPs without any FPs), but it's really heavyweight, and this should be done in clang-tidy. |
Frankly, I couldn't hold myself back. :D In theory, this could have FPs due to control-flow or mutation to the container between the lookups, but in practice that doesn't seem to happen too often once I looked at some of the reports my checker produced. I ignore "lookups" within macros, for example I did the validation again on the same set of commits, and my checker could find all of them and more! I pushed my code to the same branch. |
Wow! Thank you so much for all of your efforts into this! Yes, I'm interested in the clang-tidy check. Dramatically narrowing down the scope is very useful in general (that is, all the functions with at least two lookups). I'm not worried too much about false positives. While I haven't tried your code, I am wondering if it's helpful to be able to specify the maximum distance between two lookups in terms of line numbers (say, at most 10 lines apart or something). Let me take some time to catch up on this issue. Thanks again! |
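The line-distance idea raised here could be applied as a post-filter over reported pairs. The sketch below is purely illustrative: `LookupPair`, `keepNearby`, and the `MaxDistance` parameter are hypothetical names, and nothing in the thread says the check has such an option.

```cpp
#include <cassert>
#include <cstdlib>
#include <vector>

// A reported pair of lookups, identified by their source lines.
struct LookupPair {
  int FirstLine;
  int SecondLine;
};

// Keep only pairs whose lookups are at most MaxDistance lines apart.
std::vector<LookupPair> keepNearby(const std::vector<LookupPair> &Pairs,
                                   int MaxDistance) {
  std::vector<LookupPair> Out;
  for (const LookupPair &P : Pairs)
    if (std::abs(P.SecondLine - P.FirstLine) <= MaxDistance)
      Out.push_back(P);
  return Out;
}
```

With a limit of 10, a pair at lines 10 and 12 is kept while a pair at lines 10 and 40 is dropped.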
I think it might make sense to publish this as a PR to get more eyes on it and gather more feedback. |
I think it would be a good idea to add the following pattern, which I observed in the code base at my work:
or
The same applies to map-like containers with values. |
Posted the PR at #125420. Follow the technical discussion there about the check. |