[WIP] Relative VTables for Rust#144973

Draft
PiJoules wants to merge 1 commit into rust-lang:main from PiJoules:WIP-relative-vtables
@PiJoules
Contributor

@PiJoules PiJoules commented Aug 5, 2025

This is a WIP patch for implementing rust-lang/compiler-team#903. It adds a new unstable flag -Zexperimental-relative-rust-abi-vtables that makes vtables PIC-friendly. This is only supported for LLVM codegen and not supported for other backends.

Early feedback on this is welcome. I'm not sure my implementation is the best approach, since most of the actual vtable emission happens during LLVM codegen. That is, to MIR the vtable still looks like a normal table of pointers and byte arrays, and I only make the vtables relative at the codegen level.
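
To illustrate what "relative" means here, the following is a minimal sketch that models a loaded binary as a byte image and looks up a function through a table of 32-bit offsets. The `load_relative` helper mirrors the documented semantics of LLVM's `llvm.load.relative` intrinsic (load an i32 at `ptr + offset`, then add it to `ptr`). Everything else (`Image`, `VTABLE_OFF`, `FN_OFFS`, the two-slot layout) is hypothetical and is not taken from this patch; the drop/size/align header of a real Rust vtable is left out.

```rust
/// Hypothetical model of a binary image mapped at `base`.
struct Image {
    base: u64,
    bytes: Vec<u8>,
}

impl Image {
    /// Read a little-endian i32 stored at the simulated address `addr`.
    fn read_i32(&self, addr: u64) -> i32 {
        let off = (addr - self.base) as usize;
        let mut buf = [0u8; 4];
        buf.copy_from_slice(&self.bytes[off..off + 4]);
        i32::from_le_bytes(buf)
    }
}

/// Image-relative positions of the table and of two functions (made up for this sketch).
const VTABLE_OFF: u64 = 0x40;
const FN_OFFS: [u64; 2] = [0x100, 0x180];

/// Models `llvm.load.relative`: load an i32 at `ptr + offset` and add it to `ptr`.
fn load_relative(img: &Image, ptr: u64, offset: u64) -> u64 {
    let rel = img.read_i32(ptr + offset) as i64;
    ptr.wrapping_add_signed(rel)
}

/// Relative table: each 4-byte slot holds `function - table`, independent of `base`.
fn build_relative_image(base: u64) -> Image {
    let mut bytes = vec![0u8; 0x200];
    for (i, f) in FN_OFFS.iter().enumerate() {
        let rel = (*f as i64 - VTABLE_OFF as i64) as i32;
        let at = VTABLE_OFF as usize + 4 * i;
        bytes[at..at + 4].copy_from_slice(&rel.to_le_bytes());
    }
    Image { base, bytes }
}

/// Absolute table: each 8-byte slot holds `base + function`, so the loader has to patch it.
fn build_absolute_image(base: u64) -> Image {
    let mut bytes = vec![0u8; 0x200];
    for (i, f) in FN_OFFS.iter().enumerate() {
        let abs = base + *f;
        let at = VTABLE_OFF as usize + 8 * i;
        bytes[at..at + 8].copy_from_slice(&abs.to_le_bytes());
    }
    Image { base, bytes }
}

fn main() {
    let a = build_relative_image(0x1000_0000);
    let b = build_relative_image(0x7f00_0000_0000);
    // The relative table is byte-identical at any load address.
    assert_eq!(a.bytes, b.bytes);
    // The lookup still resolves to the right function at each base.
    assert_eq!(load_relative(&a, a.base + VTABLE_OFF, 4), a.base + FN_OFFS[1]);
    assert_eq!(load_relative(&b, b.base + VTABLE_OFF, 0), b.base + FN_OFFS[0]);
    // The absolute table differs per load address, which is what needs relocations.
    let c = build_absolute_image(0x1000_0000);
    let d = build_absolute_image(0x7f00_0000_0000);
    assert_ne!(c.bytes, d.bytes);
}
```

Because the relative table's bytes do not depend on the load address, it needs no load-time patching, which is the property the "PIC-friendly" description refers to.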

Locally, I can build the stage 1 compiler and runtimes with relative vtables, but I couldn't figure out how to tell the build system to build only stage 1 binaries with this flag, so I work around this by unconditionally enabling relative vtables in rustc. I think the end goal is one of two things: something akin to multilibs in Clang, where the compiler chooses which runtimes to use based on compilation flags, or binding this ABI to the target so that it is part of the default ABI for that target (just like how relative vtables are the default for Fuchsia in C++ with Clang). I think the latter is what target modifiers do (#136966).

Action Items:

  • I'm still experimenting with building Fuchsia with this to confirm it works end to end, and I still need to take some measurements to see whether this is worth pursuing.
  • More work is needed to ensure the correct relative intrinsics are emitted with CFI and LTO. Right now I'm experimenting on a normal build.

@rustbot rustbot added A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. labels Aug 5, 2025
@PiJoules PiJoules force-pushed the WIP-relative-vtables branch from 5217fd7 to 0ace3e7 Compare August 5, 2025 22:31
@bjorn3
Member

bjorn3 commented Aug 6, 2025

I wonder how hard it would be to store true 32-bit pointers in the const-eval allocation for the vtable. That would avoid all the hacks elsewhere around the size mismatch between const eval and runtime.

@PiJoules PiJoules force-pushed the WIP-relative-vtables branch from 0ace3e7 to d58809f Compare August 7, 2025 21:48
@bors
Collaborator

bors commented Oct 8, 2025

☔ The latest upstream changes (presumably #147475) made this pull request unmergeable. Please resolve the merge conflicts.

@PiJoules PiJoules force-pushed the WIP-relative-vtables branch from d58809f to 6ff8b5f Compare October 30, 2025 22:36
@rust-log-analyzer
Collaborator

The job tidy failed!

Possible cause of the failure (guessed by this bot):
All checks passed!
checking python file formatting
27 files already formatted
checking C++ file formatting
/checkout/compiler/rustc_llvm/llvm-wrapper/RustWrapper.cpp:600:64: error: code should be clang-formatted [-Wclang-format-violations]
extern "C" LLVMValueRef LLVMBuildLoadRelative(LLVMBuilderRef B, LLVMValueRef Ptr,
                                                               ^
/checkout/compiler/rustc_llvm/llvm-wrapper/RustWrapper.cpp:603:44: error: code should be clang-formatted [-Wclang-format-violations]
  Value *call = unwrap(B)->CreateIntrinsic(
                                           ^
/checkout/compiler/rustc_llvm/llvm-wrapper/RustWrapper.cpp:604:43: error: code should be clang-formatted [-Wclang-format-violations]
      Intrinsic::load_relative, {Int32Ty}, {unwrap(Ptr), unwrap(ByteOffset)});
                                          ^

clang-format linting failed! Printing diff suggestions:
--- /checkout/compiler/rustc_llvm/llvm-wrapper/RustWrapper.cpp (actual)
+++ /checkout/compiler/rustc_llvm/llvm-wrapper/RustWrapper.cpp (formatted)
@@ -596,13 +596,14 @@
     I->setHasAllowReassoc(true);
   }
 }
 
-extern "C" LLVMValueRef LLVMBuildLoadRelative(LLVMBuilderRef B, LLVMValueRef Ptr,
+extern "C" LLVMValueRef LLVMBuildLoadRelative(LLVMBuilderRef B,
+                                              LLVMValueRef Ptr,
                                               LLVMValueRef ByteOffset) {
   Type *Int32Ty = Type::getInt32Ty(unwrap(B)->getContext());
-  Value *call = unwrap(B)->CreateIntrinsic(
-      Intrinsic::load_relative, {Int32Ty}, {unwrap(Ptr), unwrap(ByteOffset)});
+  Value *call = unwrap(B)->CreateIntrinsic(Intrinsic::load_relative, {Int32Ty},
+                                           {unwrap(Ptr), unwrap(ByteOffset)});
   return wrap(call);
 }
 
 extern "C" uint64_t LLVMRustGetArrayNumElements(LLVMTypeRef Ty) {

rerun tidy with `--extra-checks=cpp:fmt --bless` to reformat C++ code
tidy [extra_checks]: checks with external tool 'clang-format' failed
tidy [extra_checks]: FAIL
tidy: The following check failed: extra_checks
Bootstrap failed while executing `test src/tools/tidy tidyselftest --extra-checks=py,cpp,js,spellcheck`
Command `/checkout/obj/build/x86_64-unknown-linux-gnu/stage1-tools-bin/rust-tidy /checkout /checkout/obj/build/x86_64-unknown-linux-gnu/stage0/bin/cargo /checkout/obj/build 4 /node/bin/npm --extra-checks=py,cpp,js,spellcheck` failed with exit code 1
Created at: src/bootstrap/src/core/build_steps/tool.rs:1549:23
Executed at: src/bootstrap/src/core/build_steps/test.rs:1280:29

Command has failed. Rerun with -v to see more details.
Build completed unsuccessfully in 0:01:18
  local time: Thu Oct 30 22:41:38 UTC 2025
  network time: Thu, 30 Oct 2025 22:41:38 GMT
##[error]Process completed with exit code 1.
##[group]Run echo "disk usage:"

@rust-bors
Contributor

rust-bors bot commented Feb 21, 2026

☔ The latest upstream changes (presumably #152934) made this pull request unmergeable. Please resolve the merge conflicts.

@oxalica
Contributor

oxalica commented Mar 27, 2026

I'm testing this patch on a few of my own crates, including some vtable-heavy ones. It reduces binary size by 1% to 5%, mainly by cutting down dynamic relocations (.rela.dyn).

However, I hit some segfaults at runtime due to the vtable layout mismatch between const-eval and runtime (as mentioned above). A minimal reproduction:

fn main() {
    const X: &dyn std::fmt::Display = &42i32; // const-eval emits a vtable of absolute fn pointers
    println!("{X}"); // the runtime call assumes the vtable is relative, oops
}
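
The mismatch can be sketched with a small model: a table written in the absolute convention (8-byte addresses) is read back with the relative convention (a 4-byte offset added to the table address). This is a hedged illustration only; the addresses, the helper names, and the two-slot layout are hypothetical and do not reflect the real vtable layout or the code in this patch.

```rust
/// Relative convention: read an i32 at `offset` in the table and add it to the table address.
fn load_relative(table: &[u8], table_addr: u64, offset: usize) -> u64 {
    let mut buf = [0u8; 4];
    buf.copy_from_slice(&table[offset..offset + 4]);
    table_addr.wrapping_add_signed(i32::from_le_bytes(buf) as i64)
}

/// Table as const-eval would lay it out in this model: absolute 8-byte addresses.
fn absolute_table(fns: &[u64]) -> Vec<u8> {
    fns.iter().flat_map(|f| f.to_le_bytes()).collect()
}

/// Table as the relative-vtable codegen would lay it out in this model: 4-byte offsets.
fn relative_table(table_addr: u64, fns: &[u64]) -> Vec<u8> {
    fns.iter()
        .flat_map(|f| ((*f as i64 - table_addr as i64) as i32).to_le_bytes())
        .collect()
}

fn main() {
    // Hypothetical addresses for the table and two methods.
    let table_addr = 0x5000u64;
    let fns = [0x5100u64, 0x5200u64];

    // Matching producer and consumer: the lookup finds the method.
    let rel = relative_table(table_addr, &fns);
    assert_eq!(load_relative(&rel, table_addr, 0), fns[0]);
    assert_eq!(load_relative(&rel, table_addr, 4), fns[1]);

    // Mismatch: an absolute table read with the relative convention.
    let abs = absolute_table(&fns);
    let slot0 = load_relative(&abs, table_addr, 0);
    let slot1 = load_relative(&abs, table_addr, 4);
    // Neither result is a method address, so calling through it would jump somewhere invalid.
    assert_ne!(slot0, fns[0]);
    assert_ne!(slot1, fns[1]);
    println!("slot 0 -> {slot0:#x}, slot 1 -> {slot1:#x}");
}
```

In this model the second lookup reads the upper half of the first absolute pointer, so both the slot stride and the interpretation of the value are wrong, which is consistent with a crash on the first dyn call.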

I also got an odd compile error with no further information when compiling rust-analyzer 2026-03-23. I'm not sure whether it is also caused by the layout mismatch.

error: failed to parse bitcode for LTO module: Invalid cast (Producer: 'LLVM22.1.0-rust-1.95.0-nightly' Reader: 'LLVM 22.1.0-rust-1.95.0-nightly')

error: could not compile `hir-def` (lib) due to 1 previous error

@PiJoules
Contributor Author

My colleague Erick has a more up-to-date version of this at main...erickt:rust:relative-vtables, which should include support for building the runtimes and (hopefully) has fixes for the merge conflicts I didn't have time to address here, so you might have more luck trying that out. (Fair warning: some of the updates there were vibe-coded, but they do seem to get the rustc and runtime tests passing, and we can build a bunch of downstream Rust projects with it.) I'll eventually come back and clean this PR up, but we're still trying to collect some numbers on the side.

It could be that we missed a few cases, though. If there are any runtime assumptions about the vtable ABI, those will need to change as well. The same goes for const-eval, which I don't remember tackling in my initial patch.

@erickt
Contributor

erickt commented Mar 28, 2026

@oxalica - thanks for trying it out! Which crates are you testing it with? As @PiJoules said, we've got this patch passing the Rust test suite and working with servo, tokio, ripgrep, and chrome, and it is showing savings of between 0.25% and 4%-ish. I just need to get some performance numbers before resuming talks with the compiler team. I'd be happy to see if I can reproduce the segfaults.

@oxalica
Contributor

oxalica commented Mar 28, 2026

Thanks both of you for the work!
@erickt

@oxalica - thanks for trying it out! Which crates are you testing it with?

A public one is palc, which uses &dyn heavily for field dispatching; this patch gives a 3-4% size reduction in benchmarks that only do parsing. I also want to test rust-analyzer because of its extensive use of &dyn Database, but I got the compiler error above.

As @PiJoules said, we've got this patch passing the Rust test suite, and working with servo, tokio, ripgrep, and chrome, and also showing between 0.25 to 4%-ish savings.
I'd be happy to see if I can reproduce the segfaults.

I'm testing this PR rebased onto 99246f4, which is the latest non-conflicting commit, so it may be a bit out of date. If there are more updates that fix the merge conflicts, it would be good to push them to this PR to make testing easier.

The crash happens in reqwest::get("https://[..]").await.unwrap(), during rustls initialization. I minimized it to the code snippet above, which shows a mismatch between the vtable built at const-eval time and the dyn method call at runtime.

I just need to get some performance numbers before resuming talks with the compiler team.

To me the runtime cost of vtable calls does not matter much. A vtable call is already assumed to be slow because it cannot be inlined and is prone to branch misprediction, so one more add instruction costing under a cycle is negligible. My main goal is to reduce code size and startup cost. Absolute relocations increase the work done during dynamic linking before main and reduce memory sharing (more data ends up in the process-private .data.rel.ro instead of the system-wide shared .rodata). Performance tools do not usually measure these costs.
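
As a rough back-of-the-envelope sketch of where the size difference comes from, the arithmetic below compares the on-disk cost of one function slot under each convention on a 64-bit ELF target that uses RELA relocations. It assumes one 24-byte `Elf64_Rela` entry per absolute slot (no packed RELR relocations) and that each relative offset is resolved at link time, so it needs no dynamic relocation. The slot count is hypothetical, and the vtable header fields are ignored.

```rust
/// Size of an absolute function pointer slot on a 64-bit target.
const ABS_SLOT_BYTES: usize = 8;
/// Size of one Elf64_Rela entry: r_offset, r_info, r_addend, 8 bytes each.
const RELA_ENTRY_BYTES: usize = 24;
/// Size of a relative slot: one 32-bit offset.
const REL_SLOT_BYTES: usize = 4;

/// On-disk bytes for `slots` absolute entries, assuming one dynamic relocation per entry.
fn absolute_cost(slots: usize) -> usize {
    slots * (ABS_SLOT_BYTES + RELA_ENTRY_BYTES)
}

/// On-disk bytes for `slots` relative entries, assuming no dynamic relocations are needed.
fn relative_cost(slots: usize) -> usize {
    slots * REL_SLOT_BYTES
}

fn main() {
    // Hypothetical number of function slots across all vtables in a binary.
    let slots = 1_000;
    let abs = absolute_cost(slots);
    let rel = relative_cost(slots);
    assert_eq!(abs, 32_000);
    assert_eq!(rel, 4_000);
    println!("absolute: {abs} bytes, relative: {rel} bytes, saved: {} bytes", abs - rel);
}
```

Under these assumptions most of the saving comes from the relocation entries and not from the narrower slots, which matches the observation that the reduction shows up mainly in .rela.dyn.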
