Apply RemoveNoopLandingPads post-monomorphization #143208
Conversation
@bors2 try @rust-timer queue
Remove no-op cleanups as post-mono MIR opt

On cargo this cuts ~5% of the LLVM IR lines we generate (measured with -Cno-prepopulate-passes). Opening to assess performance.
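For illustration, here is a minimal sketch (my own example, not code from the PR) of the kind of cleanup this targets: in generic MIR, a call's unwind path has to keep a landing pad that drops a still-live value of type `T`, and only after monomorphization (say `T = u32`, which has no drop glue) does that landing pad become a provable no-op.

```rust
// Hypothetical example (not taken from the PR) of a cleanup that is only
// known to be a no-op after monomorphization.
fn consume<T>(value: T) {
    // While `value` is live, this call needs an unwind edge to a cleanup
    // block that drops `value`, because a generic `T` might implement Drop.
    may_panic();
    // `value` is dropped here on the normal path.
    drop(value);
}

fn may_panic() {
    // Stand-in for any call that can unwind.
    println!("doing work that could panic");
}

fn main() {
    // For `T = u32` the cleanup drop has no drop glue, so the landing pad is
    // a no-op, but that is only visible in the monomorphized MIR. This is
    // why running RemoveNoopLandingPads post-mono can remove pads the
    // generic-MIR pass had to keep.
    consume(0u32);
}
```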
Force-pushed from e0423d5 to 676eed3
Finished benchmarking commit (e3a4b05): comparison URL.

Overall result: ✅ improvements - no action needed

Benchmarking this pull request means it may be perf-sensitive – we'll automatically label it not fit for rolling up. You can override this, but we strongly advise not to, due to possible changes in compiler perf.

@bors rollup=never

Instruction count
Our most reliable metric. Used to determine the overall result above. However, even this metric can be noisy.

Max RSS (memory usage)
Results (primary -2.6%, secondary -1.3%)
A less reliable metric. May be of interest, but not used to determine the overall result above.

Cycles
Results (primary -2.4%, secondary -2.8%)
A less reliable metric. May be of interest, but not used to determine the overall result above.

Binary size
Results (primary -0.6%, secondary -0.8%)
A less reliable metric. May be of interest, but not used to determine the overall result above.

Bootstrap: 461.485s -> 461.37s (-0.02%)
@bors r+
Is there an opportunity to reuse code from RemoveNoopLandingPads?
Given that comments and variable names don't line up, seems better to
Force-pushed from 676eed3 to c04a255
@bors2 try @rust-timer queue

This is essentially a rewrite with @cjgillot's excellent suggestion to reuse the analysis in RemoveNoopLandingPads. Also adjusted some of the naming per @RalfJung's comments -- I'm not perfectly happy with the names; some of this is sort of quasi-true (e.g., "reachable" is only accurate if you apply the right filters while you traverse the MIR, since we currently can't edit it in-place).
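To make that concrete, below is a heavily simplified model of the no-op-landing-pad analysis being reused here. The toy Stmt/Terminator/Block types and the fixpoint loop are my own invention for illustration; they are not rustc's MIR types or the pass's actual code.

```rust
// Simplified model of a "no-op landing pad" analysis on a toy CFG.
enum Stmt {
    Nop,        // statements the analysis can ignore
    SideEffect, // anything that must be kept (calls, drops with glue, ...)
}

enum Terminator {
    Goto(usize), // unconditional jump to another block
    Resume,      // continue unwinding to the caller
    Other,       // any terminator that does real work
}

struct Block {
    stmts: Vec<Stmt>,
    terminator: Terminator,
}

/// Returns, for each block, whether it is a no-op landing pad: it does
/// nothing observable and eventually just resumes unwinding.
fn nop_landing_pads(blocks: &[Block]) -> Vec<bool> {
    let mut nop = vec![false; blocks.len()];
    // Fixpoint: a block is a no-op pad if all its statements are no-ops and
    // its terminator either resumes or jumps to another no-op pad.
    let mut changed = true;
    while changed {
        changed = false;
        for (i, block) in blocks.iter().enumerate() {
            if nop[i] {
                continue;
            }
            let stmts_ok = block.stmts.iter().all(|s| matches!(s, Stmt::Nop));
            let term_ok = match block.terminator {
                Terminator::Resume => true,
                Terminator::Goto(target) => nop[target],
                Terminator::Other => false,
            };
            if stmts_ok && term_ok {
                nop[i] = true;
                changed = true;
            }
        }
    }
    nop
}

fn main() {
    // Block 0 ends in a "real" terminator; blocks 1 and 2 do nothing and
    // eventually resume, so they are no-op landing pads and the unwind edges
    // pointing at them could be redirected or dropped.
    let blocks = vec![
        Block { stmts: vec![Stmt::SideEffect], terminator: Terminator::Other },
        Block { stmts: vec![Stmt::Nop], terminator: Terminator::Goto(2) },
        Block { stmts: vec![], terminator: Terminator::Resume },
    ];
    println!("{:?}", nop_landing_pads(&blocks)); // prints [false, true, true]
}
```

Roughly speaking, once a block is classified this way, the pass rewrites the unwind edges that point at it so codegen never has to emit the corresponding landing pad.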
This speeds up LLVM and improves codegen overall. As an example, for cargo this cuts ~5% of the LLVM IR lines we generate (measured with -Cno-prepopulate-passes).
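As a side note, one rough way to reproduce that kind of measurement is sketched below; `--emit=llvm-ir` and `-C no-prepopulate-passes` are real rustc flags, but the cargo invocation, debug profile, and target-directory layout here are assumptions of mine, not details from the PR.

```rust
// Rough measurement sketch (mine, not from the PR): emit LLVM IR for a crate
// without LLVM's default pass pipeline and count the lines of the .ll files.
use std::process::Command;

fn main() -> std::io::Result<()> {
    // Directory of the crate to measure, passed on the command line.
    let crate_dir = std::env::args().nth(1).expect("usage: count-ir <crate dir>");

    // `--emit=llvm-ir` writes .ll files; `-C no-prepopulate-passes` keeps LLVM
    // from running its usual pass pipeline, so we see what rustc itself emits.
    let status = Command::new("cargo")
        .args(["rustc", "--", "--emit=llvm-ir", "-Cno-prepopulate-passes"])
        .current_dir(&crate_dir)
        .status()?;
    assert!(status.success(), "cargo rustc failed");

    // Sum the line counts of the generated .ll files (debug profile assumed).
    let mut total = 0usize;
    for entry in std::fs::read_dir(format!("{crate_dir}/target/debug/deps"))? {
        let path = entry?.path();
        if path.extension().is_some_and(|ext| ext == "ll") {
            total += std::fs::read_to_string(&path)?.lines().count();
        }
    }
    println!("total LLVM IR lines: {total}");
    Ok(())
}
```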
Force-pushed from c04a255 to ec26dde
Finished benchmarking commit (d12ed3f): comparison URL.

Overall result: ✅ improvements - no action needed

Benchmarking this pull request means it may be perf-sensitive – we'll automatically label it not fit for rolling up. You can override this, but we strongly advise not to, due to possible changes in compiler perf.

@bors rollup=never

Instruction count
Our most reliable metric. Used to determine the overall result above. However, even this metric can be noisy.

Max RSS (memory usage)
Results (primary -2.0%, secondary 2.4%)
A less reliable metric. May be of interest, but not used to determine the overall result above.

Cycles
Results (primary -2.6%, secondary -4.8%)
A less reliable metric. May be of interest, but not used to determine the overall result above.

Binary size
Results (primary -0.7%, secondary -1.1%)
A less reliable metric. May be of interest, but not used to determine the overall result above.

Bootstrap: 461.735s -> 459.425s (-0.50%)
Awesome wins! I wonder why it only works for debug builds?
It's not clear to me. The LLVM IR we emit (before LLVM's own passes run) is just as improved in opt and debug builds alike, so my best guess is that LLVM's optimization passes (SimplifyCFG, if I'm reading the code right?) clean up the IR we generate quickly enough that it doesn't make an impact in opt builds, while in debug mode those passes either run later or don't run at all. I think SimplifyCFG doesn't run at all in debug.

I won't copy-paste the extremely big opt-level=3 pipeline, but it has simplify-cfg pretty early.

So my guess is that in opt builds LLVM is able to partially clean this up early on, and as a result we don't see much benefit from doing this there. But there's also no loss, so I'm inclined to keep it: it doesn't hurt based on our benchmarks, and there are a few opt binary size wins (https://perf.rust-lang.org/compare.html?start=6677875279b560442a07a08d5119b4cd6b3c5593&end=d12ed3fc0cb645af2b945d13048aba82f574ee91&stat=size%3Alinked_artifact&debug=false).