Port SVF to LLVM 21 #1813

Closed
cjsrxzdyzds wants to merge 4 commits into SVF-tools:master from cjsrxzdyzds:llvm21-llvmmodule-lto-safety

Conversation

@cjsrxzdyzds
Contributor

@cjsrxzdyzds cjsrxzdyzds commented Apr 21, 2026

Summary

This PR ports SVF to LLVM 21 and restores full test-suite parity on a clean LLVM 21 host.

The changes are intentionally scoped to SVF:

  • update svf-llvm for LLVM 21 API/build compatibility
  • harden -read-ander deserialization so analysis-state restoration no longer crashes on malformed or absent node records
  • keep the current extapi import path working under LLVM 21 while factoring the logic into dedicated helpers for later LTO-safety follow-up

What changed

LLVM 21 compatibility in svf-llvm

  • replace removed StringRef::equals() calls with operator==
  • migrate DataLayout(Module*) usage to Module::getDataLayout()
  • guard LLVM APIs and types that are no longer available in newer releases
  • guard llvm_shutdown() in svf-ex
  • avoid accidental linkage against in-tree LLVMSvf* libraries when building standalone SVF against an LLVM tree that already contains an in-tree SVF integration

LLVMModule.cpp cleanup for the LLVM 21 path

  • split extapi import and cloning logic into dedicated helpers
  • preserve current behavior while making the import path easier to replace in the planned LLVM 21 LTO-safety follow-up

Robust -read-ander restoration

  • skip empty or malformed serialized records
  • skip missing base or var nodes where appropriate during restore
  • preserve valid restored points-to targets required by later GEP-object restoration
  • eliminate the crash path previously hit by standalone read/write regression tests
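
The skip-instead-of-crash behaviour described above can be sketched as follows. This is a minimal illustration, not SVF's actual reader: the record syntax `<var> -> { <obj> ... }`, the names `parseId`, `parseRecord` and `restore`, and the `knownNodes` set are all assumptions made for the example.

```cpp
#include <cctype>
#include <cstdint>
#include <istream>
#include <map>
#include <optional>
#include <set>
#include <sstream>
#include <string>
#include <utility>

using NodeID = std::uint32_t;
using PointsTo = std::set<NodeID>;

// Accept only short, all-digit tokens so std::stoul cannot throw.
bool parseId(const std::string& tok, NodeID& out)
{
    if (tok.empty() || tok.size() > 9)
        return false;
    for (unsigned char c : tok)
        if (!std::isdigit(c))
            return false;
    out = static_cast<NodeID>(std::stoul(tok));
    return true;
}

// Parse one record of the hypothetical form "<var> -> { <obj> <obj> ... }".
// Empty or malformed lines yield std::nullopt instead of aborting.
std::optional<std::pair<NodeID, PointsTo>> parseRecord(const std::string& line)
{
    std::istringstream in(line);
    std::string varTok, arrow, brace, tok;
    NodeID var = 0;
    if (!(in >> varTok >> arrow >> brace) || !parseId(varTok, var) ||
        arrow != "->" || brace != "{")
        return std::nullopt;

    PointsTo pts;
    while (in >> tok)
    {
        if (tok == "}")
            return std::make_pair(var, pts);
        NodeID obj = 0;
        if (!parseId(tok, obj))
            return std::nullopt;
        pts.insert(obj);
    }
    return std::nullopt; // closing brace never seen
}

// Restore points-to sets, skipping bad records and nodes that do not exist.
std::map<NodeID, PointsTo> restore(std::istream& in, const std::set<NodeID>& knownNodes)
{
    std::map<NodeID, PointsTo> result;
    std::string line;
    while (std::getline(in, line))
    {
        auto rec = parseRecord(line);
        if (!rec || !knownNodes.count(rec->first))
            continue; // malformed record or missing var node
        for (NodeID obj : rec->second)
            if (knownNodes.count(obj))
                result[rec->first].insert(obj); // keep only valid targets
    }
    return result;
}
```

Feeding `restore` a stream that mixes valid records, blank lines and truncated records returns only the valid entries, which is the property the bullets above describe.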

Validation

Validated on the clean LLVM 21 host/build:

  • build: passed
  • Test-Suite: 1528 / 1528 passed

Commits

  • b8beb7d7 Update svf-llvm for LLVM 21
  • 6ccf4b43 Harden read-ander result restoration

Notes

This PR is limited to the LLVM 21 port and test-suite correctness. The larger extapi or LTO import redesign can build on the helper seams introduced here.

@codecov

codecov Bot commented Apr 21, 2026

Codecov Report

❌ Patch coverage is 92.30769% with 12 lines in your changes missing coverage. Please review.
✅ Project coverage is 64.19%. Comparing base (08657ad) to head (34153b6).

Files with missing lines          Patch %   Lines
svf-llvm/lib/LLVMModule.cpp       44.44%    5 Missing ⚠️
svf-llvm/lib/SVFIRExtAPI.cpp      94.02%    4 Missing ⚠️
svf-llvm/lib/SVFIRBuilder.cpp     96.20%    3 Missing ⚠️
Additional details and impacted files

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #1813      +/-   ##
==========================================
+ Coverage   64.07%   64.19%   +0.12%     
==========================================
  Files         249      249              
  Lines       24901    25037     +136     
  Branches     4712     4744      +32     
==========================================
+ Hits        15955    16073     +118     
- Misses       8946     8964      +18     
Files with missing lines Coverage Δ
svf-llvm/include/SVF-LLVM/LLVMUtil.h 76.27% <100.00%> (ø)
svf-llvm/include/SVF-LLVM/SVFIRBuilder.h 89.81% <ø> (ø)
svf-llvm/lib/LLVMUtil.cpp 72.45% <ø> (-2.65%) ⬇️
svf-llvm/tools/Example/svf-ex.cpp 96.77% <ø> (-0.06%) ⬇️
svf-llvm/lib/SVFIRBuilder.cpp 88.93% <96.20%> (+0.76%) ⬆️
svf-llvm/lib/SVFIRExtAPI.cpp 87.76% <94.02%> (+2.64%) ⬆️
svf-llvm/lib/LLVMModule.cpp 76.11% <44.44%> (+0.03%) ⬆️

... and 10 files with indirect coverage changes


@cjsrxzdyzds cjsrxzdyzds changed the title from "Port standalone SVF to LLVM 21" to "Port SVF to LLVM 21" Apr 21, 2026
@cjsrxzdyzds cjsrxzdyzds marked this pull request as ready for review April 21, 2026 05:16
@yuleisui
Collaborator

Thanks for the contribution. Could you also add a github action to trigger the build and tests?

@cjsrxzdyzds
Contributor Author

One important CI concern I want to flag about the Test-Suite:

SVF#1813 and SVF-tools/Test-Suite#184 are a matched cross-repo update for the LLVM 21 port. The Test-Suite PR updates the committed test fixtures (test_cases_bc/*.bc) to LLVM-21-compatible IR. That is why I temporarily pinned the LLVM 21 workflow in the SVF PR to the Test-Suite PR branch: otherwise Actions would test the LLVM 21 SVF changes against stale default-branch fixtures.

However, if Test-Suite#184 is merged directly into SVF-tools/Test-Suite master right now, the older SVF CI jobs will likely start failing. The non-LLVM-21 workflows still clone the Test-Suite default branch but do not run with LLVM 21, and some of the updated .bc fixtures now contain LLVM-21-only textual IR syntax (for example inrange in constant GEPs), which older LLVM parsers reject.

So the situation is:
  - the Test-Suite changes are needed for correct LLVM 21 validation,
  - but merging them into the shared default fixture set may break the legacy CI jobs unless the workflows are version-isolated first.

@yuleisui
Collaborator

@bjjwwang could you help take a look at this pull request?

@cjsrxzdyzds
Contributor Author

Test-Suite PR @SVF-tools/Test-Suite#184 has been updated and should now show CI directly.

relevant commits on the Test-Suite PR branch:

  • 659e2991: switch CI bitcode generation to LLVM 21.1.0
  • 58f95cbd: run Test-Suite CI on pull requests

Could you please verify this SVF PR against the updated Test-Suite PR branch?

@yuleisui
Collaborator

> Test-Suite PR @SVF-tools/Test-Suite#184 has been updated and should now show CI directly.
>
> relevant commits on the Test-Suite PR branch:
>
>   • 659e2991: switch CI bitcode generation to LLVM 21.1.0
>   • 58f95cbd: run Test-Suite CI on pull requests
>
> could you please verify this SVF PR against the updated Test-Suite PR branch

If the Test-Suite bitcode files have been updated to LLVM 21, could we use SVF's original CI workflows, with no need for a separate llvm-21.yml?

cjsrxzdyzds added a commit to cjsrxzdyzds/SVF that referenced this pull request Apr 26, 2026
Make the normal SVF GitHub Actions path check out Test-Suite commit 91519b72, the merge commit for Test-Suite PR SVF-tools#185, before running build.sh.

That keeps SVF PR SVF-tools#1813 tied to the exact upstream Test-Suite state that switched CI bitcode generation to LLVM 21.1.0, instead of depending on whatever happens to be at Test-Suite master later.
@cjsrxzdyzds
Contributor Author

cjsrxzdyzds commented Apr 27, 2026

This PR has been updated to support the latest upstream Test-Suite on LLVM 21, and the branch has been revalidated end to end.

What changed at a high level:

  • updated the local validation target to the latest upstream Test-Suite
  • fixed LLVM 21 text-IR compatibility issues in the loader/normalization path
  • fixed extapi cloning compatibility for the newer LLVM path
  • fixed pointer-array / AE regressions exposed by the newer suite
  • fixed CFL virtual-call handling for the affected C++ cases
  • fixed the AE multi-entry regression for recursive library-style entry SCCs
  • removed the std::regex-based IR normalization so GCC 13 no longer fails on a libstdc++ false positive with -Werror

Validation:

  • full LLVM 21 suite passed locally in Release-build-llvm21
  • result: 1959/1959 passed, 0 failed

Please review the updated LLVM 21 compatibility changes and the analysis-side fixes. The branch is ready for another pass.

@yuleisui
Collaborator

Thanks for the updates.

The external API changes look quite big. Could you please explain why those changes are necessary and, if possible, simplify the current pull request (it involves 18 files)?

@cjsrxzdyzds
Contributor Author

Thanks for the review @yuleisui — agreed the diff is too broad. The current PR mixes:
(a) the actual LLVM 21 API/IR-shape adjustments,
(b) an LLVMModule::buildFunToFunMap helper extraction I had staged for a follow-up LTO-safety PR,
and (c) AE/CFL/read-ander correctness fixes that the new Test-Suite fixtures exposed but which are independent of the LLVM 21 port itself.

I'll restructure as follows over the next few days:

  • Drop the buildFunToFunMap extraction and the LLVMModuleSet ownership-mode rename — both preserve current behavior and were prep for the LTO follow-up. They'll land with that follow-up instead.
  • Drop the parseIRFileCompat text-IR normalization fallback now that Test-Suite Fix dockerfile #185 ships LLVM 21-native bitcode.
  • Move the AE entry-point / AbstractStateManager / CFL / read-ander / extapi-path fixes into a separate, independently reviewable PR.
  • Squash the CI commits and the three pointer-array commits.

What remains will be the real LLVM 21 surface: StringRef::equals removal, the CloneFunctionInto parent-attachment fix, the byte-offset rewrites in getBaseValueForExtArg / addComplexConsForExt / computeGepOffset (required because LLVM 21 emits opaque-pointer i8-byte GEPs), and the build/CI bumps.
I'll re-validate against the full Test-Suite at every step and ping you when the slimmed branch is ready.

@cjsrxzdyzds cjsrxzdyzds marked this pull request as draft April 28, 2026 05:00
}
else
{
entryFunctions.push_back(fun);
Contributor


This step introduces functions that have external callers, meaning functions that should not be treated as analysis entries.

The table funToWTO contains:

  1. All non-recursive functions: functions that do not belong to any recursive SCC in the call graph.

  2. Recursive functions that have at least one external caller: These are functions inside a recursive SCC but are invoked from outside that SCC. They serve as entry points into recursive components and therefore require their own WTO construction.
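
The two categories listed above can be sketched over a toy call graph. This is an illustration under stated assumptions, not SVF's pre-analysis: SCC membership is taken as already computed, and `sccOf`, `recursiveSccs`, `calls` and `functionsNeedingWTO` are hypothetical names.

```cpp
#include <map>
#include <set>
#include <string>
#include <utility>
#include <vector>

using Edge = std::pair<std::string, std::string>; // (caller, callee)

// Select the functions that get their own WTO according to the description:
//   1. every function outside a recursive SCC, and
//   2. every function inside a recursive SCC that is called from outside it.
std::set<std::string> functionsNeedingWTO(const std::map<std::string, int>& sccOf,
                                          const std::set<int>& recursiveSccs,
                                          const std::vector<Edge>& calls)
{
    std::set<std::string> result;
    for (const auto& [fun, scc] : sccOf)
        if (!recursiveSccs.count(scc))
            result.insert(fun); // category 1: non-recursive

    for (const auto& [caller, callee] : calls)
    {
        const int calleeScc = sccOf.at(callee);
        if (recursiveSccs.count(calleeScc) && sccOf.at(caller) != calleeScc)
            result.insert(callee); // category 2: entry into a recursive SCC
    }
    return result;
}
```

In a graph where `main` calls `a`, and `a` and `b` call each other, the result contains `main` and `a` but not `b`. Here `a` needs a WTO yet still has a caller, which is why membership in this table does not by itself make a function an analysis entry.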

Contributor Author


You're right. Iterating preAnalysis->getFuncToWTO() picks up SCC-entry recursive functions (which have external callers from outside their SCC) and treats them as analysis entries, which is wrong for whole-program AE.
I've reworked this in the latest revision: collectProgEntryFuns now walks the call graph directly and only collects functions with empty in-edges (no callers at all), prioritizing main() at the front of the deque. That matches the program-entry semantics the function name implies, with no SCC contamination.
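
A minimal sketch of that selection rule, assuming a toy call graph in which every function appears as a key (possibly with no callees). The `CallGraph` alias and `collectEntryFunctions` are hypothetical stand-ins, not the code in the PR.

```cpp
#include <deque>
#include <map>
#include <set>
#include <string>
#include <vector>

using CallGraph = std::map<std::string, std::vector<std::string>>; // caller -> callees

// Collect functions with no incoming call edges; main goes to the front.
std::deque<std::string> collectEntryFunctions(const CallGraph& cg)
{
    std::set<std::string> hasCaller;
    for (const auto& [caller, callees] : cg)
        for (const auto& callee : callees)
            hasCaller.insert(callee); // a self-call also counts as a caller

    std::deque<std::string> entries;
    for (const auto& [fun, callees] : cg)
    {
        if (hasCaller.count(fun))
            continue; // has at least one caller, so not a program entry
        if (fun == "main")
            entries.push_front(fun);
        else
            entries.push_back(fun);
    }
    return entries;
}
```

Functions inside a recursive SCC always have a caller, so they can never be selected by this rule, even when they are called from outside their SCC.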

Contributor Author


Thanks for the review. Just a heads-up: this file is no longer in the PR diff after the simplification.

The earlier collectProgEntryFuns rework turned out to be addressing the wrong layer. The real failure on safe_ptr_array_access was that LLVM 21 emits the first element of a pointer-array initialiser as a direct opaque-pointer store (store ptr %v, ptr %arr, no GEP), so SVF was writing the array base while later loop GEPs read field object base_0 under -ff-eq-base=false. That's now fixed at the SVFIRBuilder layer in 46e9a345 ("Handle LLVM 21 opaque-pointer array accesses in SVFIRBuilder") via a synthetic field-zero GEP value when the access matches an inferred [K x ptr] base. AE entry-point semantics are untouched in this PR.
Your funToWTO observation still stands as a separate concern in AbstractInterpretation.cpp and would be worth a follow-up issue/PR on master — outside the LLVM 21 port scope.
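
The mismatch between the direct store and the later loop GEPs can be shown with a toy model. This is only a sketch of the three conditions named above; `MemAccess`, `needsSyntheticFieldZero` and `accessLocation` are hypothetical names, and the real fix operates on LLVM values inside SVFIRBuilder.

```cpp
#include <string>
#include <utility>

// Toy description of a load or store (all fields are illustrative).
struct MemAccess
{
    std::string object;   // abstract object behind the pointer operand
    bool baseIsPtrArray;  // operand inferred as [K x ptr]
    bool accessTypeIsPtr; // accessed type is the element pointer type
    bool operandIsGep;    // operand is already a GEP
};

constexpr int kBaseObject = -1;               // marks "the base object itself"
using Location = std::pair<std::string, int>; // (object, field index)

// True when the access should be redirected to field zero.
bool needsSyntheticFieldZero(const MemAccess& a)
{
    return a.baseIsPtrArray && a.accessTypeIsPtr && !a.operandIsGep;
}

// Location written or read by an access whose operand is not a GEP.
Location accessLocation(const MemAccess& a)
{
    return {a.object, needsSyntheticFieldZero(a) ? 0 : kBaseObject};
}
```

With the redirect, `store ptr %v, ptr %arr` resolves to `("arr", 0)`, the same location an index-zero GEP would reach, so the loop loads see the stored value even when the base object and field zero are kept distinct.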

Update Dockerfile, build.sh, and setup.sh to point at LLVM 21.1.0
prebuilt artifacts. The dyn_lib/RTTI default path on Ubuntu (x86_64
and aarch64) now resolves to the 21.1.0 RTTI tarballs; the source-
only fallback URL is also retained for non-Ubuntu hosts.

Adapt the svf-llvm bridge to LLVM 21's API surface while keeping
existing semantics intact. Touches only build glue, headers, and
mechanical API call sites; no analysis behavior changes.

* BasicTypes.h / LLVMUtil.h: include the headers LLVM 21 no longer
  pulls in transitively, and update signatures whose argument or
  return types changed.
* LLVMModule.cpp: replace removed StringRef::equals with operator==,
  switch DataLayout(Module*) construction to Module::getDataLayout(),
  and detach the CloneFunctionInto destination function before cloning
  (LLVM 21 requires the destination to be parentless or share the
  source's parent module). The cloned function is reattached to the
  app module via getFunctionList().push_back() afterwards. The
  UnifyFunctionExitNodesPass include path moved on LLVM 17+.
* LLVMUtil.cpp / svf-ex.cpp: minor signature alignment and an
  llvm_shutdown guard.
* svf-llvm/CMakeLists.txt: bump the supported LLVM major to 21 and
  add a scoped -Wno-maybe-uninitialized for a known GCC false
  positive in LLVMModule.cpp.

LLVM 21's opaque-pointer IR loses the destination element type at
the GEP/load/store level, so pointer-array initialisation and access
are emitted in three shapes that pre-21 IR never exhibited. Each
shape needs explicit modelling in SVFIRBuilder:

1. One-index pointer-typed GEPs into an inferred [K x ptr] base.
   visitGetElementPtrInst now emits a copy edge for the constant-zero
   case (gep ptr, ptr %arr, i64 0) and falls through to the normal
   array-element path for non-zero indices.

2. Byte-offset GEPs into globals. computeGepOffset walks the
   StructLayout of the inferred object type plus the DataLayout
   stride to recover the pointed field for accesses that LLVM 21
   collapses to flat i8/byte offsets.

3. Direct loads/stores through the array base with no GEP at all.
   Under -model-arrays=true, LLVM 21 emits the first element of an
   array initialiser as `store ptr %v, ptr %arr` (no GEP for index
   zero). A new helper synthesises a field-zero GEP value when the
   pointer operand is an inferred [K x ptr], the access type matches
   the element pointer type, and the operand is not already a GEP,
   so the access lands on field object base_0 instead of the base
   object.
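
The byte-offset recovery in item 2 can be sketched with a hand-written layout table. This is an illustration only: `FieldLayout`, `fieldContaining` and `fieldForArrayAccess` are hypothetical, and the table stands in for what LLVM's StructLayout and DataLayout would supply.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

struct FieldLayout
{
    std::uint64_t offset; // byte offset of the field within the struct
    std::uint64_t size;   // size of the field in bytes
};

// Index of the field whose byte range contains byteOffset.
// Returns std::nullopt for padding bytes and out-of-range offsets.
std::optional<std::size_t> fieldContaining(const std::vector<FieldLayout>& fields,
                                           std::uint64_t byteOffset)
{
    for (std::size_t i = 0; i < fields.size(); ++i)
        if (byteOffset >= fields[i].offset &&
            byteOffset < fields[i].offset + fields[i].size)
            return i;
    return std::nullopt;
}

// For an array of structs, reduce the flat offset by the element stride first.
std::optional<std::size_t> fieldForArrayAccess(const std::vector<FieldLayout>& fields,
                                               std::uint64_t stride,
                                               std::uint64_t byteOffset)
{
    if (stride == 0)
        return std::nullopt;
    return fieldContaining(fields, byteOffset % stride);
}
```

For a struct laid out as `{i32, ptr, ptr}` with offsets 0, 8 and 16 and a stride of 24, a flat offset of 40 lands in the second array element at offset 16, which is field 2.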

A guarded memcpy-derived base-recovery fallback is also added for
the canonical funptr-nested-struct shape, where LLVM 21 lowers a
nested struct copy to a byte-layout memcpy and the loaded function
pointer would otherwise read an empty points-to set. The fallback
only fires when the loaded pointer comes from an alloca whose only
relevant initialiser is a memcpy in the same basic block, the copy
covers the loaded field, the destination is the alloca, the length
is a constant, and there is no intervening write between the memcpy
and the load. The CallBase iteration also guards arg_size() before
indexing argument operands. Anything more complex falls back to the
ordinary loaded value.

memcpy/memmove under LLVM 21 opaque pointers no longer carries a
destination element type, so the prior addComplexConsForExt logic
that walked source/destination by element index falls back to a
single base-to-base copy and drops field-level constraints.

This commit re-derives the field constraints by walking the byte
layout of the inferred source and destination object types using
the module DataLayout. For each byte offset within the copy length
the routine resolves the source field, the destination field, and
emits the corresponding copy edge. Aggregate types are recursed
into via StructLayout; arrays use the element stride.

When the inferred type is a single pointer or simple scalar the
routine collapses to the original single-edge behaviour, so
non-aggregate memcpy patterns are unchanged.
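
A simplified sketch of the field pairing described in this commit message. It assumes the aggregate layouts have already been flattened into (offset, size) lists and only pairs fields that match exactly in offset and size; `Field`, `CopyEdge` and `fieldCopyEdges` are hypothetical names, not the routine in the PR.

```cpp
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

struct Field
{
    std::uint64_t offset; // byte offset within the flattened object
    std::uint64_t size;   // size in bytes
};

using CopyEdge = std::pair<std::size_t, std::size_t>; // (src field, dst field)

// Emit one copy edge per source field that lies fully inside the copied
// range and has a destination field at the same offset with the same size.
std::vector<CopyEdge> fieldCopyEdges(const std::vector<Field>& src,
                                     const std::vector<Field>& dst,
                                     std::uint64_t length)
{
    std::vector<CopyEdge> edges;
    for (std::size_t s = 0; s < src.size(); ++s)
    {
        if (src[s].offset + src[s].size > length)
            continue; // field not fully covered by the copy length
        for (std::size_t d = 0; d < dst.size(); ++d)
        {
            if (dst[d].offset == src[s].offset && dst[d].size == src[s].size)
            {
                edges.emplace_back(s, d);
                break;
            }
        }
    }
    return edges;
}
```

A single-field layout on both sides yields exactly one edge, which corresponds to the collapse to single-edge behaviour for scalar and single-pointer copies.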
@cjsrxzdyzds cjsrxzdyzds force-pushed the llvm21-llvmmodule-lto-safety branch from 0e11974 to 34153b6 Compare April 29, 2026 20:21
@cjsrxzdyzds cjsrxzdyzds deleted the llvm21-llvmmodule-lto-safety branch April 29, 2026 20:54
@cjsrxzdyzds cjsrxzdyzds mentioned this pull request Apr 29, 2026
@cjsrxzdyzds
Contributor Author

cjsrxzdyzds commented Apr 29, 2026

Closing this PR (collateral damage from the rename); PR #1815 is open.
@JoelYYoung, flagging that AbstractInterpretation.cpp is no longer in the PR diff.


3 participants