-
Notifications
You must be signed in to change notification settings - Fork 1.8k
C++: Switch to the shared Guards library #20485
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR switches the C++ CodeQL library from its own guards library to the shared guards library that was introduced previously. The main purpose is to reduce C++-specific QL code maintenance and enable C++ to use the shared "nullness" library. This change also improves query results on existing queries, particularly reducing false positives in the cpp/missing-check-scanf
query.
Key changes include:
- Replacing the old C++-specific guards implementation with the shared guards library
- Updating guard value types from
AbstractValue
toGuardValue
- Making guards extend
Element
instead ofExpr
to support parameters as guards - Improving the guards logic to provide better query results
Reviewed Changes
Copilot reviewed 18 out of 18 changed files in this pull request and generated 2 comments.
Show a summary per file
File | Description |
---|---|
GuardsEnsure.expected | Updated test expectations showing improved guard inference with more comprehensive results |
GuardsControl.ql | Changed parameter type from AbstractValue to GuardValue |
GuardsControl.expected | Expanded test results with additional guard control relationships |
GuardsCompare.ql | Updated parameter type from AbstractValue to GuardValue |
GuardsCompare.expected | Extended comparison results with more comprehensive guard comparisons |
Guards.ql | Removed simple guard selection query |
Guards.expected | Removed corresponding test expectations |
tests.ql | Updated parameter types to use GuardValue |
TOCTOUFilesystemRace.ql | Fixed guard child access by casting to Expr |
SSLResultConflation.ql | Fixed guard child access by casting to Expr |
SemanticExprSpecific.qll | Updated method call from controlsEdge to controlsBranchEdge |
Instruction.qll | Added getAnInput() method to BinaryInstruction |
EdgeKind.qll | Enhanced switch edge handling with better value range support |
SsaImpl.qll | Major updates to SSA implementation with improved guard integration |
IRGuards.qll | Complete rewrite using shared guards library with expanded functionality |
RangeAnalysis.qll | Updated method call to use controlsBranchEdge |
} | ||
|
||
private import semmle.code.cpp.dataflow.new.DataFlow::DataFlow as DataFlow | ||
private import semmle.code.cpp.ir.dataflow.internal.DataFlowPrivate as Private |
Check warning
Code scanning / CodeQL
Names only differing by case Warning
…nIndirectUse'. This will solve a non-monotonic recursion issue later.
…rk after instantiating the shared guards library.
c99ab52
to
6fe3e83
Compare
… a guard condition.
No. Parameters can't be Guards. But |
Oh. I guess there's a subtle thing for C/C++ here then: In the IR world a parameter is an Is that ... bad? |
@aschackmull thanks for all the review comments so far! I think I've fixed all of them, and DCA still looks fine. Do you have any other comments before I hand it over to the C team? |
* Holds if this expression is a C/C++ specific constant value such as | ||
* a GCC case range. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This qldoc needs a tweak, as we've purged case ranges from constants.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! Fixed in ca53a8e
No, I don't think so (apart from the two minor qldoc comments), so please pass it on. I've only reviewed the parts that integrate directly with the shared library, so the C-specific bits and the tests still need a proper review. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had a look, skipping over 840097f. This looks reasonable to me.
I was not really able to follow e22d665. I'm not sure if @aschackmull looked at that commit?
Regarding DCA:
- Did you look at all the alert changes. In particular, why do we gain an alert in one place?
- neovim seems quite a bit slower, do we understand why?
One small question below.
| test.cpp:177:10:177:10 | Load: i | test.cpp:175:23:175:23 | ValueNumberBound | 1 | false | CompareLT: ... < ... | test.cpp:176:7:176:11 | test.cpp:176:7:176:11 | | ||
| test.cpp:179:10:179:10 | Load: i | test.cpp:175:23:175:23 | ValueNumberBound | 0 | true | CompareLT: ... < ... | test.cpp:176:7:176:11 | test.cpp:176:7:176:11 | | ||
| test.cpp:183:10:183:10 | Load: i | test.cpp:175:23:175:23 | ValueNumberBound | -1 | true | CompareLT: ... < ... | test.cpp:182:9:182:13 | test.cpp:182:9:182:13 | | ||
| test.cpp:185:10:185:10 | Load: i | test.cpp:175:23:175:23 | ValueNumberBound | 0 | true | CompareLT: ... < ... | test.cpp:176:7:176:11 | test.cpp:176:7:176:11 | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems to be the only place where we lose a result, why's that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well spotted. Since this tests irreducible control-flow there's probably something going wrong with back-edge detection inside the shared library. I'll take a look
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Turns out this wasn't related to irreducible control-flow at all. There's a slight difference in which basic block is picked as directly controlling another block when we have empty blocks. This happens when there are empty blocks (or a block containing only a goto) just like we have in this test:
// (0)
if (x < i) {
// (1)
} else {
// (2)
goto inLoop;
}
// (3)
The old library concluded that the edge from (1)
to (3)
was controlled by x < i
, whereas the new library only concludes that the edge from (0)
to (1)
is controlled by x < i
.
The new range analysis library handles this perfectly well (as seen in the test I added in 353ee8b), but the old experimental one apparently depends on this behavior.
I don't think this is worth digging more into if that's okay with you.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this is worth digging more into if that's okay with you.
That's fine!
It looks reasonable to me. The purpose of that commit is to connect the Guards library to BarrierGuards such that data flow sanitizers hidden in a wrapping function now are recognized. I believe this is verified with a newly added qltest. |
Co-authored-by: Anders Schack-Mulligen <aschackmull@users.noreply.github.com>
Yep, we gain a new TP on
Thanks for pointing that out. This seems to be from these two antijoins (which ultimately come from the same piece of QL):
the relevant part of the evaluator log is here:
So it seems like the forex here |
I can't seem to find a proper join order that doesn't lead to a join between the sets private predicate foo(SsaDefinition v, int n) {
n =
strictcount(Expr e, GuardValue k, Expr other, GuardValue otherval |
possibleValue(v, false, e, k) and
not possibleValue(v, true, e, k) and
possibleValue(v, _, other, otherval)
) and
n > 1
} (that's the stupid lexer in neovim showing up again 😂) @aschackmull do you have any ideas? Otherwise, I think we just have to live with it for now. (If you want to repro it you can run |
…s in the non-experimental "new" range analysis.
So I think I have a partial fix - it doesn't exactly solve the problem, but it does cut down on one aspect of the cartesian join. Since we're doing a forall over all other expressions to check whether they have disjoint values, then we can split this into a case for those other expressions with the same value and those with different values. The former can be expressed as a
|
Hold on! This predicate is built on top of |
Even the first fix improves things quite a lot:
I'll create a local branch containing this PR and #20569 and do another DCA run to see if this fixes things 🤞 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
In fine with this getting merged. I do observe that this is another case where we do see longer analysis times, and we've been creeping up over the last few months for various reasons. Here zeek-spicy is close to 10%. I wonder if there's something we can do to regain at least some performance.
Totally agreed. I'll take a look at zeek-spicy as well once this is merged 👍 |
… Notice that this leaves a missing result because the guard condition logic is now better.
… Notice that this leaves a missing result because the guard condition logic is now interprocedural.
…guards C++: Fix queries I forgot after merging github/codeql#20485.
This PR switches C/C++ away from our own guards library and over to the shared guards library which was introduced in #19573.
The main motivation for this is two-fold: It reduces the amount of C/C++ specific QL code that needs to be maintained 1. It also enables C/C++ to instantiate the newly shared "nullness" library which was introduced in #20367. This should hopefully open the doors to better null dereference / use-after-free / double free queries.
On top of that, the improvements to the internal guards logic also means much better query results on existing queries. In particular, the
cpp/missing-check-scanf
query has lost a tons of FPs 🎉.Commit-by-commit review extremely recommended. It's a large PR, but I actually think I managed to make it fairly reviewable 🤞.
This PR does bring a syntactic breaking changes which was just decided a couple of weeks ago would be okay for CodeQL going forward. The breaking change is that the AST wrapper around the IR-based guards library now extends
Element
instead ofExpr
. This is because the shared library also infers thatParameter
s (which are notExpr
s in the AST) can be guards if the parameter determines a condition.In fact, whereas the old library only made the condition (and certain subexpressions inside the condition) were
Guard
s, the new library makes every expression aGuard
, but only some of thoseGuard
s actually controls conditions. This is a fundamental difference that's always been between the C/C++ guards library and the Java/C# guards library, and we're finally getting rid of that difference which brings some more language consistency 🎉.Footnotes
Although, there is still lots of maintenance required because C/C++ still relies on our own implication of
ensuresEq
andensuresLt
. Getting rid of those requires switching C/C++ fully over to the shared range analysis library. ↩