Port SVF to LLVM 21 #1813
Codecov Report ❌ Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## master #1813 +/- ##
==========================================
+ Coverage 64.07% 64.19% +0.12%
==========================================
Files 249 249
Lines 24901 25037 +136
Branches 4712 4744 +32
==========================================
+ Hits 15955 16073 +118
- Misses 8946 8964 +18
Thanks for the contribution. Could you also add a GitHub Action to trigger the build and tests?
One important CI concern I want to flag about the Test-Suite:
However, if so, the situation is:
@bjjwwang, could you help take a look at this pull request?
Test-Suite PR SVF-tools/Test-Suite#184 has been updated and should now show CI directly. Relevant commits on the Test-Suite PR branch:
Could you please verify this SVF PR against the updated Test-Suite PR branch?
If the Test-Suite bitcode files have been updated to LLVM 21, could we use SVF's original CI workflows, with no need for a separate llvm-21.yml?
Make the normal SVF GitHub Actions path check out Test-Suite commit 91519b72, the merge commit for Test-Suite PR SVF-tools#185, before running build.sh. That keeps SVF PR SVF-tools#1813 tied to the exact upstream Test-Suite state that switched CI bitcode generation to LLVM 21.1.0, instead of depending on whatever happens to be at Test-Suite master later.
This PR has been updated to support the latest upstream Test-Suite on LLVM 21, and the branch has been revalidated end to end.
Please review the updated LLVM 21 compatibility changes and the analysis-side fixes. The branch is ready for another pass.
Thanks for the updates. The external API changes look quite big. Could you please explain why those changes are necessary and, if possible, simplify the current pull request, as it touches 18 files?
Thanks for the review, @yuleisui. Agreed, the diff is too broad. The current PR mixes:
I'll restructure it as follows over the next few days:
What remains will be the real LLVM 21 surface: StringRef::equals removal, the CloneFunctionInto parent-attachment fix, the byte-offset rewrites in getBaseValueForExtArg / addComplexConsForExt / computeGepOffset (required because LLVM 21 emits opaque-pointer i8-byte GEPs), and the build/CI bumps.
}
else
{
    entryFunctions.push_back(fun);
This step introduces functions that have external callers, meaning functions that should not be treated as analysis entries.
The table funToWTO contains:
- All non-recursive functions: functions that do not belong to any recursive SCC in the call graph.
- Recursive functions that have at least one external caller: functions inside a recursive SCC that are invoked from outside that SCC. They serve as entry points into recursive components and therefore require their own WTO construction.
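To make the distinction concrete, here is a small C++ sketch of the membership rule described above. It is a toy model, not SVF's PreAnalysis code: functions are plain integer ids, the call graph is a hypothetical adjacency list, SCCs come from Tarjan's algorithm, and the helper names (computeSccs, functionsWithOwnWto) are invented for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <functional>
#include <set>
#include <vector>

// Hypothetical toy call graph: callees[f] lists the functions that f calls.
using CallGraph = std::vector<std::vector<int>>;

// Tarjan's algorithm: returns the SCC id of every function.
std::vector<int> computeSccs(const CallGraph& g) {
    const int n = static_cast<int>(g.size());
    std::vector<int> index(n, -1), low(n, 0), scc(n, -1), stack;
    std::vector<bool> onStack(n, false);
    int nextIndex = 0, nextScc = 0;
    std::function<void(int)> visit = [&](int v) {
        index[v] = low[v] = nextIndex++;
        stack.push_back(v);
        onStack[v] = true;
        for (int w : g[v]) {
            if (index[w] == -1) {
                visit(w);
                low[v] = std::min(low[v], low[w]);
            } else if (onStack[w]) {
                low[v] = std::min(low[v], index[w]);
            }
        }
        if (low[v] == index[v]) {  // v is the root of an SCC: pop its members
            int w;
            do {
                w = stack.back();
                stack.pop_back();
                onStack[w] = false;
                scc[w] = nextScc;
            } while (w != v);
            ++nextScc;
        }
    };
    for (int v = 0; v < n; ++v)
        if (index[v] == -1) visit(v);
    return scc;
}

// Functions that would get their own WTO: non-recursive functions, plus
// recursive ones that are called from outside their own SCC.
std::set<int> functionsWithOwnWto(const CallGraph& g) {
    const int n = static_cast<int>(g.size());
    const std::vector<int> scc = computeSccs(g);
    std::vector<int> sccSize(n, 0);
    for (int f = 0; f < n; ++f) ++sccSize[scc[f]];

    std::vector<bool> recursive(n, false), externalCaller(n, false);
    for (int caller = 0; caller < n; ++caller) {
        for (int callee : g[caller]) {
            if (callee == caller) recursive[callee] = true;  // self-recursion
            if (scc[callee] != scc[caller]) externalCaller[callee] = true;
        }
    }
    std::set<int> result;
    for (int f = 0; f < n; ++f) {
        if (sccSize[scc[f]] > 1) recursive[f] = true;
        if (!recursive[f] || externalCaller[f]) result.insert(f);
    }
    return result;
}
```

In the example graph `{{1}, {2}, {1, 3}, {}, {}}` (function 0 calls 1, and functions 1 and 2 call each other), the result is `{0, 1, 3, 4}`: function 1 is kept because it is called from outside its SCC, which is how a function with callers can end up looking like an analysis entry when this table is iterated.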
You're right. Iterating preAnalysis->getFuncToWTO() picks up SCC-entry recursive functions (which have callers from outside their SCC) and treats them as analysis entries, which is wrong for whole-program AE.
I've reworked this in the latest revision: collectProgEntryFuns now walks the call graph directly and only collects functions with empty in-edges (no callers at all), placing main() at the front of the deque. That matches the program-entry semantics the function name implies, with no SCC contamination.
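The revised entry rule (collect only functions with no callers, with main() first) can be sketched as follows. This is a hypothetical stand-alone model with string function names and a std::map call graph; it does not use SVF's call-graph classes, and programEntries is an illustrative name rather than the actual collectProgEntryFuns.

```cpp
#include <cassert>
#include <deque>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical toy call graph: function name -> names of the functions it calls.
using CallGraph = std::map<std::string, std::vector<std::string>>;

// Program entries = functions with no incoming call edges; main() goes first.
std::deque<std::string> programEntries(const CallGraph& g) {
    std::set<std::string> called;  // every function that appears as a callee
    for (const auto& entry : g)
        for (const std::string& callee : entry.second)
            called.insert(callee);

    std::deque<std::string> entries;
    for (const auto& entry : g) {
        const std::string& fun = entry.first;
        if (called.count(fun) != 0) continue;  // has at least one caller
        if (fun == "main")
            entries.push_front(fun);
        else
            entries.push_back(fun);
    }
    return entries;
}
```

Under this rule a function that only calls itself still has an in-edge, so it is not reported as an entry; for a graph where main calls helper, helper calls leaf, and init and a self-recursive rec are otherwise uncalled, the result is `main, init`.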
Thanks for the review. Just a heads-up: this file is no longer in the PR diff after the simplification.
The earlier collectProgEntryFuns rework turned out to be addressing the wrong layer. The real failure on safe_ptr_array_access was that LLVM 21 emits the first element of a pointer-array initialiser as a direct opaque-pointer store (store ptr %v, ptr %arr, no GEP), so SVF was writing the array base while later loop GEPs read field object base_0 under -ff-eq-base=false. That's now fixed at the SVFIRBuilder layer in 46e9a345 ("Handle LLVM 21 opaque-pointer array accesses in SVFIRBuilder") via a synthetic field-zero GEP value when the access matches an inferred [K x ptr] base. AE entry-point semantics are untouched in this PR.
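The mismatch and the field-zero fix can be illustrated with a toy model. It assumes field objects are named `<base>_<index>` (mirroring base_0 above) and reduces the points-to state to a map from object name to stored values; PtrOperand, accessedObject and storeThenLoad are hypothetical names, not SVFIRBuilder API.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>

// Hypothetical pointer operand of a load/store, reduced to what matters here.
struct PtrOperand {
    std::string base;             // underlying object, e.g. "arr"
    bool isGep = false;           // operand is already a GEP
    long long gepIndex = 0;       // constant element index when isGep
    bool baseIsPtrArray = false;  // base inferred as [K x ptr]
};

// Object an access lands on; field objects are named "<base>_<index>".
std::string accessedObject(const PtrOperand& op, bool accessIsPtr, bool fieldZeroFix) {
    if (op.isGep)
        return op.base + "_" + std::to_string(op.gepIndex);
    if (fieldZeroFix && op.baseIsPtrArray && accessIsPtr)
        return op.base + "_0";  // treat `store ptr %v, ptr %arr` as arr[0]
    return op.base;             // plain base object
}

// Store `value` through `dst`, then load a pointer through `src`.
std::set<std::string> storeThenLoad(const PtrOperand& dst, const PtrOperand& src,
                                    const std::string& value, bool fieldZeroFix) {
    std::map<std::string, std::set<std::string>> mem;  // object -> stored values
    mem[accessedObject(dst, true, fieldZeroFix)].insert(value);
    return mem[accessedObject(src, true, fieldZeroFix)];
}
```

Without the synthetic field-zero step, the direct store writes `arr` while the GEP-based load reads `arr_0`, so the load sees an empty set; with it, both accesses meet on `arr_0`.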
Your funToWTO observation still stands as a separate concern in AbstractInterpretation.cpp and would be worth a follow-up issue/PR on master — outside the LLVM 21 port scope.
Update Dockerfile, build.sh, and setup.sh to point at LLVM 21.1.0 prebuilt artifacts. The dyn_lib/RTTI default path on Ubuntu (x86_64 and aarch64) now resolves to the 21.1.0 RTTI tarballs; the source-only fallback URL is also retained for non-Ubuntu hosts.
Adapt the svf-llvm bridge to LLVM 21's API surface while keeping existing semantics intact. Touches only build glue, headers, and mechanical API call sites; no analysis behavior changes.
* BasicTypes.h / LLVMUtil.h: include the headers LLVM 21 no longer pulls in transitively, and update signatures whose argument or return types changed.
* LLVMModule.cpp: replace removed StringRef::equals with operator==, switch DataLayout(Module*) construction to Module::getDataLayout(), and detach the CloneFunctionInto destination function before cloning (LLVM 21 requires the destination to be parentless or share the source's parent module). The cloned function is reattached to the app module via getFunctionList().push_back() afterwards. The UnifyFunctionExitNodesPass include path moved on LLVM 17+.
* LLVMUtil.cpp / svf-ex.cpp: minor signature alignment and an llvm_shutdown guard.
* svf-llvm/CMakeLists.txt: bump the supported LLVM major to 21 and add a scoped -Wno-maybe-uninitialized for a known GCC false positive in LLVMModule.cpp.
LLVM 21's opaque-pointer IR loses the destination element type at the GEP/load/store level, so pointer-array initialisation and access are emitted in three shapes that pre-21 IR never exhibited. Each shape needs explicit modelling in SVFIRBuilder:
1. One-index pointer-typed GEPs into an inferred [K x ptr] base. visitGetElementPtrInst now emits a copy edge for the constant-zero case (`gep ptr, ptr %arr, i64 0`) and falls through to the normal array-element path for non-zero indices.
2. Byte-offset GEPs into globals. computeGepOffset walks the StructLayout of the inferred object type plus the DataLayout stride to recover the pointed-to field for accesses that LLVM 21 collapses to flat i8/byte offsets.
3. Direct loads/stores through the array base with no GEP at all. Under -model-arrays=true, LLVM 21 emits the first element of an array initialiser as `store ptr %v, ptr %arr` (no GEP for index zero). A new helper synthesises a field-zero GEP value when the pointer operand is an inferred [K x ptr], the access type matches the element pointer type, and the operand is not already a GEP, so the access lands on field object base_0 instead of the base object.
A guarded memcpy-derived base-recovery fallback is also added for the canonical funptr-nested-struct shape, where LLVM 21 lowers a nested struct copy to a byte-layout memcpy and the loaded function pointer would otherwise read an empty points-to set. The fallback only fires when the loaded pointer comes from an alloca whose only relevant initialiser is a memcpy in the same basic block, the copy covers the loaded field, the destination is the alloca, the length is a constant, and there is no intervening write between the memcpy and the load. Anything more complex falls back to the ordinary loaded value.
The CallBase iteration also guards arg_size() before indexing argument operands.
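Shape 2 above (recovering a field from a flat byte offset) amounts to walking the object's layout. The sketch below uses a minimal hand-written type description instead of LLVM's StructLayout/DataLayout, assumes arrays are modelled element by element (as with -model-arrays=true), and flattens fields in declaration order; Ty, flatFieldAt and the helper constructors are hypothetical and only mirror the idea, not SVF's computeGepOffset.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <utility>
#include <vector>

using u64 = std::uint64_t;

// Minimal stand-in for a laid-out type (not LLVM's Type or StructLayout).
struct Ty {
    enum Kind { Scalar, Struct, Array } kind = Scalar;
    u64 size = 0;                             // allocation size in bytes
    std::vector<u64> offsets;                 // Struct: byte offset of each member
    std::vector<std::shared_ptr<Ty>> fields;  // Struct: member types
    std::shared_ptr<Ty> elem;                 // Array: element type
    u64 count = 0;                            // Array: number of elements
};
using TyPtr = std::shared_ptr<Ty>;

TyPtr scalarOf(u64 size) {
    auto t = std::make_shared<Ty>();
    t->size = size;
    return t;
}

TyPtr arrayOf(TyPtr elem, u64 count) {
    auto t = std::make_shared<Ty>();
    t->kind = Ty::Array;
    t->size = elem->size * count;
    t->elem = std::move(elem);
    t->count = count;
    return t;
}

TyPtr structOf(std::vector<u64> offsets, std::vector<TyPtr> fields, u64 size) {
    auto t = std::make_shared<Ty>();
    t->kind = Ty::Struct;
    t->size = size;
    t->offsets = std::move(offsets);
    t->fields = std::move(fields);
    return t;
}

// Number of flattened leaf fields in t (arrays counted element by element).
u64 numLeaves(const Ty& t) {
    if (t.kind == Ty::Scalar) return 1;
    if (t.kind == Ty::Array) return t.count * numLeaves(*t.elem);
    u64 n = 0;
    for (const TyPtr& f : t.fields) n += numLeaves(*f);
    return n;
}

// Flattened index of the leaf field starting exactly at byte `off`, or -1.
long long flatFieldAt(const Ty& t, u64 off) {
    if (t.kind == Ty::Scalar) return off == 0 ? 0 : -1;
    if (t.kind == Ty::Array) {
        const u64 stride = t.elem->size;  // element stride in bytes
        assert(stride > 0);
        const u64 i = off / stride;
        if (i >= t.count) return -1;
        const long long inner = flatFieldAt(*t.elem, off - i * stride);
        return inner < 0 ? -1 : static_cast<long long>(i * numLeaves(*t.elem)) + inner;
    }
    u64 before = 0;  // leaves contributed by earlier members
    for (std::size_t k = 0; k < t.fields.size(); ++k) {
        const Ty& f = *t.fields[k];
        const u64 start = t.offsets[k];
        if (off >= start && off < start + f.size) {
            const long long inner = flatFieldAt(f, off - start);
            return inner < 0 ? -1 : static_cast<long long>(before) + inner;
        }
        before += numLeaves(f);
    }
    return -1;  // padding or past the end
}
```

For a 64-bit `{ ptr, i32, [2 x ptr] }` with members at bytes 0, 8 and 16 (32 bytes total), offset 24 resolves to flattened field 3 (the second array element), while offset 12 falls in padding and yields -1.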
memcpy/memmove under LLVM 21 opaque pointers no longer carries a destination element type, so the prior addComplexConsForExt logic that walked source/destination by element index falls back to a single base-to-base copy and drops field-level constraints. This commit re-derives the field constraints by walking the byte layout of the inferred source and destination object types using the module DataLayout. For each byte offset within the copy length the routine resolves the source field, the destination field, and emits the corresponding copy edge. Aggregate types are recursed into via StructLayout; arrays use the element stride. When the inferred type is a single pointer or simple scalar the routine collapses to the original single-edge behaviour, so non-aggregate memcpy patterns are unchanged.
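The byte-layout walk can be sketched on already-flattened layouts: each side is given as the sorted byte offsets of its leaf fields, and a copy edge is emitted for every source field inside the copy length that starts at the same offset as a destination field. The flattened-offset input, the names, the index-pair edges (instead of SVF copy constraints), and the rule that a single-leaf side collapses to one base-to-base edge are all assumptions for illustration, not SVF's addComplexConsForExt.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// A type flattened to the sorted byte offsets of its leaf fields (hypothetical
// input; a real implementation would derive these from the module DataLayout).
using FlatLayout = std::vector<std::uint64_t>;
using CopyEdge = std::pair<std::size_t, std::size_t>;  // (src field, dst field)

// Field-level copy edges for memcpy(dst, src, len): byte i of src lands on
// byte i of dst, so a source field feeds the destination field at the same offset.
std::vector<CopyEdge> memcpyFieldEdges(const FlatLayout& src, const FlatLayout& dst,
                                       std::uint64_t len) {
    assert(!src.empty() && !dst.empty());
    // Assumption: a single-leaf side (pointer or scalar) keeps one base-to-base edge.
    if (src.size() == 1 || dst.size() == 1) return {CopyEdge{0, 0}};
    std::vector<CopyEdge> edges;
    std::size_t j = 0;
    for (std::size_t i = 0; i < src.size() && src[i] < len; ++i) {
        while (j < dst.size() && dst[j] < src[i]) ++j;  // both layouts are sorted
        if (j < dst.size() && dst[j] == src[i]) edges.emplace_back(i, j);
    }
    return edges;
}
```

Copying 24 bytes from a source with leaves at 0, 8 and 16 into a destination whose leaves start at 0, 4, 8 and 16 yields edges (0,0), (1,2) and (2,3); source fields with no destination field at the same offset are simply skipped in this sketch.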
0e11974 to 34153b6
Closing this PR (collateral damage from the rename); PR #1815 is open.
Summary
This PR ports SVF to LLVM 21 and restores full test-suite parity on a clean LLVM 21 host.
The changes are intentionally scoped to SVF:
- `svf-llvm` for LLVM 21 API/build compatibility
- `-read-ander` deserialization, so analysis-state restoration no longer crashes on malformed or absent node records
What changed
LLVM 21 compatibility in `svf-llvm`
- `StringRef::equals()` calls with `operator==`
- `DataLayout(Module*)` usage to `Module::getDataLayout()`
- `llvm_shutdown()` in `svf-ex`
- `LLVMSvf*` libraries when building standalone SVF against an LLVM tree that already contains an in-tree SVF integration
- `LLVMModule.cpp` cleanup for the LLVM 21 path
Robust `-read-ander` restoration
Validation
Validated on the clean LLVM 21 host/build:
- 1528 / 1528 passed
Commits
- b8beb7d7 Update svf-llvm for LLVM 21
- 6ccf4b43 Harden read-ander result restoration
Notes
This PR is limited to the LLVM 21 port and test-suite correctness. The larger extapi or LTO import redesign can build on the helper seams introduced here.