Port SVF to LLVM 21 #1815
Update Dockerfile, build.sh, and setup.sh to point at LLVM 21.1.0 prebuilt artifacts. The dyn_lib/RTTI default path on Ubuntu (x86_64 and aarch64) now resolves to the 21.1.0 RTTI tarballs; the source-only fallback URL is retained for non-Ubuntu hosts.
Adapt the svf-llvm bridge to LLVM 21's API surface while keeping existing semantics intact. Touches only build glue, headers, and mechanical API call sites; no analysis behavior changes.

* BasicTypes.h / LLVMUtil.h: include the headers LLVM 21 no longer pulls in transitively, and update signatures whose argument or return types changed.
* LLVMModule.cpp: replace the removed StringRef::equals with operator==, switch DataLayout(Module*) construction to Module::getDataLayout(), and detach the CloneFunctionInto destination function before cloning (LLVM 21 requires the destination to be parentless or share the source's parent module). The cloned function is reattached to the app module via getFunctionList().push_back() afterwards. The UnifyFunctionExitNodesPass include path moved on LLVM 17+.
* LLVMUtil.cpp / svf-ex.cpp: minor signature alignment and an llvm_shutdown guard.
* svf-llvm/CMakeLists.txt: bump the supported LLVM major to 21 and add a scoped -Wno-maybe-uninitialized for a known GCC false positive in LLVMModule.cpp.
LLVM 21's opaque-pointer IR loses the destination element type at the GEP/load/store level, so pointer-array initialisation and access are emitted in three shapes that pre-21 IR never exhibited. Each shape needs explicit modelling in SVFIRBuilder:

1. One-index pointer-typed GEPs into an inferred [K x ptr] base. visitGetElementPtrInst now emits a copy edge for the constant-zero case (gep ptr, ptr %arr, i64 0) and falls through to the normal array-element path for non-zero indices.
2. Byte-offset GEPs into globals. computeGepOffset walks the StructLayout of the inferred object type plus the DataLayout stride to recover the pointed-to field for accesses that LLVM 21 collapses to flat i8/byte offsets.
3. Direct loads/stores through the array base with no GEP at all. Under -model-arrays=true, LLVM 21 emits the first element of an array initialiser as `store ptr %v, ptr %arr` (no GEP for index zero). A new helper synthesises a field-zero GEP value when the pointer operand is an inferred [K x ptr], the access type matches the element pointer type, and the operand is not already a GEP, so the access lands on field object base_0 instead of the base object.

A guarded memcpy-derived base-recovery fallback is also added for the canonical funptr-nested-struct shape, where LLVM 21 lowers a nested struct copy to a byte-layout memcpy and the loaded function pointer would otherwise read an empty points-to set. The fallback only fires when all of the following hold:

* the loaded pointer comes from an alloca whose only relevant initialiser is a memcpy in the same basic block,
* the copy covers the loaded field,
* the destination is the alloca,
* the length is a constant, and
* there is no intervening write between the memcpy and the load.

Anything more complex falls back to the ordinary loaded value. Separately, the CallBase iteration now checks arg_size() before indexing argument operands.
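To make the byte-offset case (shape 2) concrete, here is a minimal sketch of mapping a flat byte offset back to a flattened field index. It assumes a deliberately simplified type model: `Ty`, `leafCount`, `fieldAtOffset`, and `kNoField` are hypothetical names invented for illustration, and the real computeGepOffset works on LLVM's StructLayout and DataLayout rather than on anything like this.

```cpp
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical, simplified type model (not SVF's or LLVM's classes).
struct Ty {
    enum Kind { Scalar, Struct, Array };
    Kind kind;
    std::uint64_t size;  // total size in bytes
    // Struct only: (byte offset, field type) pairs in ascending offset order.
    std::vector<std::pair<std::uint64_t, const Ty*>> fields;
    const Ty* elem;      // Array only: element type
    std::uint64_t count; // Array only: number of elements
};

constexpr std::int64_t kNoField = -1; // offset hits padding or the middle of a leaf

// Number of flattened leaf fields contained in a type.
std::uint64_t leafCount(const Ty& t) {
    if (t.kind == Ty::Scalar) return 1;
    if (t.kind == Ty::Array) return t.count * leafCount(*t.elem);
    std::uint64_t n = 0;
    for (const auto& f : t.fields) n += leafCount(*f.second);
    return n;
}

// Resolve a byte offset inside `t` to a flattened leaf index, or kNoField.
std::int64_t fieldAtOffset(const Ty& t, std::uint64_t off) {
    if (off >= t.size) return kNoField;
    if (t.kind == Ty::Scalar) return off == 0 ? 0 : kNoField;
    if (t.kind == Ty::Array) {
        // Assumption: the array stride equals the element size.
        const std::uint64_t stride = t.elem->size;
        if (stride == 0) return kNoField;
        const std::int64_t inner = fieldAtOffset(*t.elem, off % stride);
        if (inner == kNoField) return kNoField;
        return static_cast<std::int64_t>((off / stride) * leafCount(*t.elem)) + inner;
    }
    // Struct: find the field whose byte range contains the offset, then recurse.
    std::uint64_t base = 0; // leaves contributed by the fields skipped so far
    for (const auto& f : t.fields) {
        if (off >= f.first && off < f.first + f.second->size) {
            const std::int64_t inner = fieldAtOffset(*f.second, off - f.first);
            if (inner == kNoField) return kNoField;
            return static_cast<std::int64_t>(base) + inner;
        }
        base += leafCount(*f.second);
    }
    return kNoField; // offset falls into padding between fields
}
```

For a layout resembling `struct { i32 a; ptr p; ptr arr[2]; }` with fields at byte offsets 0, 8, and 16, offset 8 resolves to leaf 1 and offset 24 to leaf 3, while offset 4 (padding) resolves to no field. Returning a sentinel for padding and mid-field offsets is a choice made for this sketch; how such offsets are handled in the actual patch is not stated here.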
memcpy/memmove under LLVM 21 opaque pointers no longer carries a destination element type, so the prior addComplexConsForExt logic that walked source/destination by element index falls back to a single base-to-base copy and drops field-level constraints. This commit re-derives the field constraints by walking the byte layout of the inferred source and destination object types using the module DataLayout. For each byte offset within the copy length the routine resolves the source field, the destination field, and emits the corresponding copy edge. Aggregate types are recursed into via StructLayout; arrays use the element stride. When the inferred type is a single pointer or simple scalar the routine collapses to the original single-edge behaviour, so non-aggregate memcpy patterns are unchanged.
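The pairing step can be illustrated with a small sketch. It assumes both inferred object types have already been flattened into leaf lists sorted by byte offset; `Leaf`, `CopyEdge`, and `memcpyEdges` are hypothetical names for illustration, and matching leaves purely by equal start offset is an assumption of this sketch rather than a description of the code in SVFIRExtAPI.

```cpp
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical flattened layout entry: one leaf field of an inferred object type.
struct Leaf {
    std::uint64_t offset; // byte offset from the start of the object
    std::uint64_t size;   // size in bytes
    unsigned field;       // flattened field index
};

using CopyEdge = std::pair<unsigned, unsigned>; // (source field, destination field)

// Pair source and destination leaves that start at the same byte offset and
// lie entirely within the first `len` bytes of the copy.
// Both leaf lists are assumed to be sorted by offset and non-overlapping.
std::vector<CopyEdge> memcpyEdges(const std::vector<Leaf>& src,
                                  const std::vector<Leaf>& dst,
                                  std::uint64_t len) {
    std::vector<CopyEdge> edges;
    // Non-aggregate on either side: keep the single base-to-base edge.
    if (src.size() <= 1 || dst.size() <= 1) {
        edges.emplace_back(0u, 0u);
        return edges;
    }
    std::size_t i = 0, j = 0;
    while (i < src.size() && j < dst.size()) {
        const Leaf& s = src[i];
        const Leaf& d = dst[j];
        // Stop once either side reaches a leaf that is not fully copied.
        if (s.offset + s.size > len || d.offset + d.size > len) break;
        if (s.offset == d.offset) {
            edges.emplace_back(s.field, d.field);
            ++i;
            ++j;
        } else if (s.offset < d.offset) {
            ++i; // no destination leaf starts at this source offset
        } else {
            ++j; // no source leaf starts at this destination offset
        }
    }
    return edges;
}
```

With a source of two 8-byte leaves and a destination laid out as 4, 4, and 8 bytes, a 16-byte copy yields the edges (0, 0) and (1, 2): the destination leaf at offset 4 has no source counterpart and is skipped. A single-leaf layout on either side collapses to one edge, mirroring the unchanged non-aggregate behaviour described above.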
We could use bjjwwang for now and later it could change the rtti under SVF-3.3.

--- /data1/wjw/SVF-Big/results/wpa_llvm18_vs_llvm21/zstd.llvm18.stats 2026-04-30 15:39:57.469848646 +1000
--- /data1/wjw/SVF-Big/results/wpa_llvm18_vs_llvm21/sqlite.llvm18.stats 2026-04-30 15:39:20.906705257 +1000
--- /data1/wjw/SVF-Big/results/wpa_llvm18_vs_llvm21/lua.llvm18.stats 2026-04-30 15:40:27.850247449 +1000
--- /data1/wjw/SVF-Big/results/wpa_llvm18_vs_llvm21/curl.llvm18.stats 2026-04-30 15:40:12.931701860 +1000

Summary
Ports SVF to LLVM 21 and restores full Test-Suite parity on a clean LLVM 21 host. Supersedes the now-closed #1813 with a simplified diff (18 → 4 commits, 18 → 12 files, +759/-247 → +333/-37).

Commits
What's in each
* 7908f205: Toolchain bump. Dockerfile / build.sh / setup.sh point at LLVM 21.1.0 prebuilts. (See note below on the prebuilt URL host.)
* eefcc703: svf-llvm API port. Mechanical only: `StringRef::equals` → `==`, `DataLayout(Module*)` → `Module::getDataLayout()`, `CloneFunctionInto` parent-attachment fix (build the clone detached, then attach via `getFunctionList().push_back()`; LLVM 21 requires the destination to be parentless or share the source's parent module while cloning), header includes that LLVM 21 no longer pulls in transitively, and a scoped `-Wno-maybe-uninitialized` for a known GCC false positive in `LLVMModule.cpp`. No analysis behavior changes.
* 46e9a345: Opaque-pointer array accesses in SVFIRBuilder. LLVM 21 emits pointer-array initialisation/access in three shapes that pre-21 IR never produced:
  1. one-index pointer-typed GEPs into an inferred `[K x ptr]` base (`gep ptr, ptr %arr, i64 0` collapses to a copy edge),
  2. byte-offset GEPs into globals,
  3. direct loads/stores through the array base (`store ptr %v, ptr %arr` for element zero: a field-zero GEP value is synthesised when the pointer operand is an inferred `[K x ptr]` and the access type matches the pointer element type).

  Also includes a guarded memcpy-derived base-recovery fallback for the canonical `funptr-nested-struct` shape: it only fires when the loaded pointer comes from an alloca whose only relevant initialiser is a memcpy in the same basic block, the copy covers the loaded field, the destination is the alloca, the length is constant, and there is no intervening write between the memcpy and the load.
* 34153b67: Byte-layout memcpy in SVFIRExtAPI. Opaque-pointer memcpy/memmove walks source/destination by byte offset using the inferred object type plus DataLayout, re-deriving the field-level constraints that pre-21 element-typed memcpy carried implicitly.

Notes
Supersedes #1813. The branch was originally named `llvm21-llvmmodule-lto-safety` because an earlier draft included LTO-related helper extraction; that work was dropped per @yuleisui's review, and the branch is now named `llvm21-port-slim` to reflect the actual scope.