Apply RemoveNoopLandingPads post-monomorphization #143208
Conversation
@bors2 try @rust-timer queue
Remove no-op cleanups as post-mono MIR opt

On cargo this cuts ~5% of the LLVM IR lines we generate (measured with -Cno-prepopulate-passes). Opening to assess performance.
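For illustration, here is a minimal sketch (my own example, not code from the PR) of the kind of cleanup this targets: in generic MIR, a call's unwind path has to keep a landing pad that drops a still-live value of type `T`, and only after monomorphization (say `T = u32`, which has no drop glue) does that landing pad become a provable no-op.

```rust
// Hypothetical example (not taken from the PR) of a cleanup that is only
// known to be a no-op after monomorphization.
fn consume<T>(value: T) {
    // While `value` is live, this call needs an unwind edge to a cleanup
    // block that drops `value`, because a generic `T` might implement Drop.
    may_panic();
    // `value` is dropped here on the normal path.
    drop(value);
}

fn may_panic() {
    // Stand-in for any call that can unwind.
    println!("doing work that could panic");
}

fn main() {
    // For `T = u32` the cleanup drop has no drop glue, so the landing pad is
    // a no-op, but that is only visible in the monomorphized MIR. This is
    // why running RemoveNoopLandingPads post-mono can remove pads the
    // generic-MIR pass had to keep.
    consume(0u32);
}
```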
Force-pushed from e0423d5 to 676eed3
Finished benchmarking commit (e3a4b05): comparison URL.

Overall result: ✅ improvements - no action needed

Benchmarking this pull request means it may be perf-sensitive – we'll automatically label it not fit for rolling up. You can override this, but we strongly advise not to, due to possible changes in compiler perf.

@bors rollup=never

Instruction count
Our most reliable metric. Used to determine the overall result above. However, even this metric can be noisy.

Max RSS (memory usage)
Results (primary -2.6%, secondary -1.3%)
A less reliable metric. May be of interest, but not used to determine the overall result above.

Cycles
Results (primary -2.4%, secondary -2.8%)
A less reliable metric. May be of interest, but not used to determine the overall result above.

Binary size
Results (primary -0.6%, secondary -0.8%)
A less reliable metric. May be of interest, but not used to determine the overall result above.

Bootstrap: 461.485s -> 461.37s (-0.02%)
@bors r+
Is there an opportunity to reuse code from RemoveNoopLandingPads?
Given that comments and variable names don't line up, seems better to
Force-pushed from 676eed3 to c04a255
@bors2 try @rust-timer queue

This is essentially a rewrite with @cjgillot's excellent suggestion to reuse the analysis in RemoveNoopLandingPads. Also adjusted some of the naming per @RalfJung's comments -- I'm not perfectly happy with the names; some of this is sort of quasi-true (e.g., "reachable" is only accurate if you apply the right filters while you traverse the MIR, since we currently can't edit it in-place).
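To make that concrete, below is a heavily simplified model of the no-op-landing-pad analysis being reused here. The toy Stmt/Terminator/Block types and the fixpoint loop are my own invention for illustration; they are not rustc's MIR types or the pass's actual code.

```rust
// Simplified model of a "no-op landing pad" analysis on a toy CFG.
enum Stmt {
    Nop,        // statements the analysis can ignore
    SideEffect, // anything that must be kept (calls, drops with glue, ...)
}

enum Terminator {
    Goto(usize), // unconditional jump to another block
    Resume,      // continue unwinding to the caller
    Other,       // any terminator that does real work
}

struct Block {
    stmts: Vec<Stmt>,
    terminator: Terminator,
}

/// Returns, for each block, whether it is a no-op landing pad: it does
/// nothing observable and eventually just resumes unwinding.
fn nop_landing_pads(blocks: &[Block]) -> Vec<bool> {
    let mut nop = vec![false; blocks.len()];
    // Fixpoint: a block is a no-op pad if all its statements are no-ops and
    // its terminator either resumes or jumps to another no-op pad.
    let mut changed = true;
    while changed {
        changed = false;
        for (i, block) in blocks.iter().enumerate() {
            if nop[i] {
                continue;
            }
            let stmts_ok = block.stmts.iter().all(|s| matches!(s, Stmt::Nop));
            let term_ok = match block.terminator {
                Terminator::Resume => true,
                Terminator::Goto(target) => nop[target],
                Terminator::Other => false,
            };
            if stmts_ok && term_ok {
                nop[i] = true;
                changed = true;
            }
        }
    }
    nop
}

fn main() {
    // Block 0 ends in a "real" terminator; blocks 1 and 2 do nothing and
    // eventually resume, so they are no-op landing pads and the unwind edges
    // pointing at them could be redirected or dropped.
    let blocks = vec![
        Block { stmts: vec![Stmt::SideEffect], terminator: Terminator::Other },
        Block { stmts: vec![Stmt::Nop], terminator: Terminator::Goto(2) },
        Block { stmts: vec![], terminator: Terminator::Resume },
    ];
    println!("{:?}", nop_landing_pads(&blocks)); // prints [false, true, true]
}
```

Roughly speaking, once a block is classified this way, the pass rewrites the unwind edges that point at it so codegen never has to emit the corresponding landing pad.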
This speeds up LLVM and improves codegen overall. As an example, for cargo this cuts ~5% of the LLVM IR lines we generate (measured with -Cno-prepopulate-passes).
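As a side note, one rough way to reproduce that kind of measurement is sketched below; `--emit=llvm-ir` and `-C no-prepopulate-passes` are real rustc flags, but the cargo invocation, debug profile, and target-directory layout here are assumptions of mine, not details from the PR.

```rust
// Rough measurement sketch (mine, not from the PR): emit LLVM IR for a crate
// without LLVM's default pass pipeline and count the lines of the .ll files.
use std::process::Command;

fn main() -> std::io::Result<()> {
    // Directory of the crate to measure, passed on the command line.
    let crate_dir = std::env::args().nth(1).expect("usage: count-ir <crate dir>");

    // `--emit=llvm-ir` writes .ll files; `-C no-prepopulate-passes` keeps LLVM
    // from running its usual pass pipeline, so we see what rustc itself emits.
    let status = Command::new("cargo")
        .args(["rustc", "--", "--emit=llvm-ir", "-Cno-prepopulate-passes"])
        .current_dir(&crate_dir)
        .status()?;
    assert!(status.success(), "cargo rustc failed");

    // Sum the line counts of the generated .ll files (debug profile assumed).
    let mut total = 0usize;
    for entry in std::fs::read_dir(format!("{crate_dir}/target/debug/deps"))? {
        let path = entry?.path();
        if path.extension().is_some_and(|ext| ext == "ll") {
            total += std::fs::read_to_string(&path)?.lines().count();
        }
    }
    println!("total LLVM IR lines: {total}");
    Ok(())
}
```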
Force-pushed from c04a255 to ec26dde
Finished benchmarking commit (d12ed3f): comparison URL.

Overall result: ✅ improvements - no action needed

Benchmarking this pull request means it may be perf-sensitive – we'll automatically label it not fit for rolling up. You can override this, but we strongly advise not to, due to possible changes in compiler perf.

@bors rollup=never

Instruction count
Our most reliable metric. Used to determine the overall result above. However, even this metric can be noisy.

Max RSS (memory usage)
Results (primary -2.0%, secondary 2.4%)
A less reliable metric. May be of interest, but not used to determine the overall result above.

Cycles
Results (primary -2.6%, secondary -4.8%)
A less reliable metric. May be of interest, but not used to determine the overall result above.

Binary size
Results (primary -0.7%, secondary -1.1%)
A less reliable metric. May be of interest, but not used to determine the overall result above.

Bootstrap: 461.735s -> 459.425s (-0.50%)
Awesome wins! I wonder why it only works for debug builds?
It's not clear to me. The LLVM IR we emit (before LLVM's own passes run) is just as improved in opt and debug builds alike, so my best guess is that LLVM's optimization passes (SimplifyCFG, if I'm reading the code right?) clean up the IR we generate quickly enough that it doesn't make an impact in opt builds, while in debug mode those passes either run later or don't run at all. I think SimplifyCFG doesn't run at all in debug.

I won't copy-paste the extremely big opt-level=3 pipeline, but it has simplify-cfg pretty early.

So my guess is that in opt builds LLVM is able to partially clean this up early on, and as a result we don't see much benefit from doing this there. But there's also no loss, so I'm inclined to keep it: it doesn't hurt based on our benchmarks, and there are a few opt binary size wins (https://perf.rust-lang.org/compare.html?start=6677875279b560442a07a08d5119b4cd6b3c5593&end=d12ed3fc0cb645af2b945d13048aba82f574ee91&stat=size%3Alinked_artifact&debug=false).