Commit f090992

Keno and aviatesk authored
Take purity modeling seriously (#43852)
* Implement new effect system

* TLDR

Before:
```
julia> let b = Expr(:block, (:(y += sin($x)) for x in randn(1000))...)
           @eval function f_sin_perf()
               y = 0.0
               $b
               y
           end
       end
f_sin_perf (generic function with 1 method)

julia> @time @code_typed f_sin_perf()
 15.707267 seconds (25.95 M allocations: 1.491 GiB, 3.30% gc time)
[lots of junk]
```

After:
```
julia> @time @code_typed f_sin_perf()
  0.016818 seconds (187.35 k allocations: 7.901 MiB, 99.73% compilation time)
CodeInfo(
1 ─ return 27.639138714768546
) => Float64
```

so roughly a 1000x improvement in compile time performance for const-prop heavy functions.

There are also run time improvements for functions that have patterns like:
```
function some_function_to_big_to_be_inlined_but_pure(x)
    ....
end

function foo(x)
    some_function_to_big_to_be_inlined_but_pure(x)
    return x
end
```

The inliner will now be able to see that some_function_to_big_to_be_inlined_but_pure is effect free, even without inlining it, and just delete it, improving runtime performance (if some_function_to_big_to_be_inlined_but_pure is small enough to be inlined, there is a small compile time throughput win from being able to delete it without inlining, but that's a smaller gain than the compile time gain above).

* Motivation / Overview

There are two motivations for this work. The first is the above mentioned improvement in compiler performance for const-prop heavy functions. This comes up a fair bit in various Modeling & Simulation codes we have, where Julia code is often auto-generated from some combination of parameterized model codes and data. This ends up creating enormous functions with significant need for constant propagation (~50k statements with ~20k constant calls are not uncommon). Our current compiler was designed for people occasionally throwing a `sqrt(2)` or something in a function, not 20k of them, so performance is quite bad.

The second motivation is to have finer grained control over our purity modeling. We have `@Base.pure`, but that has somewhat nebulous semantics and is quite a big hammer that is not appropriate in most situations.

These may seem like orthogonal concerns at first, but they are not. The compile time issues fundamentally stem from us running constant propagation in inference's abstract interpreter. However, for simple, pure functions, that is entirely unnecessary, because we generally have a super-fast, JIT-compiled version of that function just lying around. The issue is that we currently do not generally know when it is legal to run the JIT-compiled version of the function and when we need to abstractly interpret it. However, if the compiler were able to figure out an appropriate notion of purity, it could start doing that (which is what it does now for `@Base.pure` functions).

This PR adds that kind of notion of purity, converges it along with type information during inference and then makes use of it to speed up evaluation of constant propagation (where it is legal to do so), as well as improving the inliner.

* The new purity notions

The new purity model consists of four different kinds of flags per code instance. For builtins and intrinsics the existing effect free and nothrow models are re-used. There is also a new macro `@Base.assume_effects` available, which can set the purity base case for methods or `:foreigncall`s. Here is the docstring for that macro, which also explains the semantics of the new purity flags:

```
    @assume_effects setting... ex
    @assume_effects(setting..., ex)

`@assume_effects` overrides the compiler's effect modeling for the given method.
`ex` must be a method definition.

WARNING: Improper use of this macro causes undefined behavior (including crashes,
incorrect answers, or other hard to track bugs). Use with care and only if absolutely
required.

In general, each `setting` value makes an assertion about the behavior of the
function, without requiring the compiler to prove that this behavior is indeed
true. These assertions are made for all world ages. It is thus advisable to limit
the use of generic functions that may later be extended to invalidate the
assumption (which would cause undefined behavior).

The following `settings` are supported.

** `:idempotent`

The `:idempotent` setting asserts that for egal inputs:
- The manner of termination (return value, exception, non-termination) will always be the same.
- If the method returns, the results will always be egal.

Note: This in particular implies that the return value of the method must be
immutable. Multiple allocations of mutable objects (even with identical
contents) are not egal.

Note: The idempotency assertion is made world-age wise. More formally, write
fₐ for the evaluation of `f` in world-age `a`, then we require:

    ∀ a, x, y: x === y → fₐ(x) === fₐ(y)

However, for two world ages `a, b` s.t. `a != b`, we may have `fₐ(x) !== f_b(x)`
(writing f_b for the evaluation of `f` in world-age `b`).

Note: A further implication is that idempotent functions may not make their
return value dependent on the state of the heap or any other global state
that is not constant for a given world age.

Note: The idempotency includes all legal rewrites performed by the optimizer.
For example, floating-point fastmath operations are not considered idempotent,
because the optimizer may rewrite them causing the output to not be idempotent,
even for the same world age (e.g. because one ran in the interpreter, while
the other was optimized).

** `:effect_free`

The `:effect_free` setting asserts that the method is free of externally semantically
visible side effects. The following is an incomplete list of externally semantically
visible side effects:

- Changing the value of a global variable.
- Mutating the heap (e.g. an array or mutable value), except as noted below
- Changing the method table (e.g. through calls to eval)
- File/Network/etc. I/O
- Task switching

However, the following are explicitly not semantically visible, even if they
may be observable:

- Memory allocations (both mutable and immutable)
- Elapsed time
- Garbage collection
- Heap mutations of objects whose lifetime does not exceed the method (i.e.
  were allocated in the method and do not escape).
- The returned value (which is externally visible, but not a side effect)

The rule of thumb here is that an externally visible side effect is anything
that would affect the execution of the remainder of the program if the function
were not executed.

Note: The effect free assertion is made both for the method itself and any code
that is executed by the method. Keep in mind that the assertion must be
valid for all world ages and limit use of this assertion accordingly.

** `:nothrow`

The `:nothrow` setting asserts that this method does not terminate abnormally
(i.e. will either always return a value or never return).

Note: It is permissible for :nothrow annotated methods to make use of exception
handling internally as long as the exception is not rethrown out of the
method itself.

Note: MethodErrors and similar exceptions count as abnormal termination.

** `:terminates_globally`

The `:terminates_globally` setting asserts that this method will eventually terminate
(either normally or abnormally), i.e. does not infinite loop.

Note: The compiler will consider this a strong indication that the method will
terminate relatively *quickly* and may (if otherwise legal), call this method
at compile time. I.e. it is a bad idea to annotate this setting on a method
that *technically*, but not *practically*, terminates.

Note: The `terminates_globally` assertion covers any other methods called by
the annotated method.

** `:terminates_locally`

The `:terminates_locally` setting is like `:terminates_globally`, except that it only
applies to syntactic control flow *within* the annotated method. It is thus
a much weaker (and thus safer) assertion that allows for the possibility of
non-termination if the method calls some other method that does not terminate.

Note: `terminates_globally` implies `terminates_locally`.

** `:total`

This `setting` combines the following other assertions:
- `:idempotent`
- `:effect_free`
- `:nothrow`
- `:terminates_globally`
and is a convenient shortcut.

Note: `@assume_effects :total` is similar to `@Base.pure` with the primary
distinction that the idempotency requirement applies world-age wise rather
than globally as described above. However, in particular, a method annotated
`@Base.pure` is always total.
```

* Changes to data structures

- Each CodeInstance gains two sets of four flags corresponding to the notions above (except terminates_locally, which is just a type inference flag). One set of flags tracks IPO-valid information (as determined by inference), the other set of flags tracks optimizer-valid information (as determined after optimization). Otherwise they have identical semantics.
- Method and CodeInfo each gain 5 bit flags corresponding 1:1 to the purity notions defined above. No separate distinction is made between IPO valid and optimizer valid flags here. We might in the future want such a distinction, but I'm hoping to get away without it for now, since the IPO-vs-optimizer distinction is a bit subtle and I don't really want to expose that to the user.
- `:foreigncall` gains an extra argument (after `cconv`) to describe the effects of the call.

* Algorithm

Relatively straightforward.
- Every call or builtin accumulates its effect information into the current frame.
- Finding an effect (throw/global side effect/non-idempotent, etc.) taints the entire frame. Idempotency is technically a dataflow property, but that is not modeled here and any non-idempotent intrinsic will taint the idempotency flag, even if it does not contribute to the return value. I don't think that's a huge problem in practice, because currently we only use idempotency if effect-free is also set and in effect-free functions you'd generally expect every statement to contribute to the return value.
- Any backedge taints the termination effect, as does any recursion
- Unknown statements (generic calls, things I haven't gotten around to) taint all effects

* Make INV_2PI a tuple

Without this, the compiler cannot assume that the range reduction is idempotent to make use of the new fast constprop code path. In the future this could potentially be an ImmutableArray, but since this is relatively small, a tuple is probably fine.

* Evaluate :total function in the proper world

* Finish effects implementation for ccall

* Add missing `esc`

* Actually make use of terminates_locally override

* Mark ^(x::Float64, n::Integer) as locally terminating

* Shove effects into calling convention field

* Make inbounds taint consistency

Inbounds and `--check-bounds=no` basically make the assertion: If this is dynamically reached during execution then the index will be inbounds. However, effects on a function are a stronger statement. In particular, for *any* input values (not just the dynamically reached ones), the effects need to hold. This is in particular true, because inference can run functions that are dynamically dead, e.g.

```
if unknown_bool_return() # false at runtime, but inference doesn't know
    x = sin(1.0)
end
```

Inference will (and we want it to) run the `sin(1.0)` even though it is not dynamically reached.

For the moment, make any use of `--check-bounds=no` or `@inbounds` taint the consistency effect, which is semantically meaningful and prevents inference from running the function. In the future, we may want more precise tracking of inbounds that would let us recover some precision here.

* Allow constprop to refine effects

* Properly taint unknown call in apply

* Add NEWS and doc anchor

* Correct effect modeling for arraysize

* Address Shuhei's review

* Fix regression on inference time benchmark

The issue wasn't actually the changes here, they just added additional error paths which bridged inference into the Base printing code, which as usual takes a fairly long time to infer. Add some judicious barriers and nospecialize statements to bring inference time back down.

* refine docstrings of `@assume_effects`

This commit tries to render the docstring of `@assume_effects` within Documenter.jl-generated HTML:
- render bullet points
- codify the names of settings
- use math syntax
- use note admonitions

* improve effect analysis on allocation

Improves `:nothrow` assertion for mutable allocations. Also adds missing `IR_FLAG_EFFECT_FREE` flagging for non-inlined callees in `handle_single_case!` so that we can do more dead code elimination.

* address some reviews

* Address Jameson's review feedback

* Fix tests - address rebase issues

Co-authored-by: Shuhei Kadowaki <aviatesk@gmail.com>
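
The accumulation rule from the Algorithm notes in the commit message can be illustrated with a toy model. This is only a sketch under stated assumptions: `ToyEffects`, `merge_effects`, and `frame_effects` are hypothetical names, and the flags are plain `Bool`s rather than the tristate values the compiler uses. It shows the one idea that every statement's effects are folded into the frame, so a single tainted flag taints the whole frame and an unknown call taints everything.

```julia
# Hypothetical, simplified stand-in for the compiler's effect summary.
# Each flag is `true` when the property is known to hold.
struct ToyEffects
    idempotent::Bool
    effect_free::Bool
    nothrow::Bool
    terminates::Bool
end

# Best case: nothing has tainted the frame yet.
const TOY_TOTAL = ToyEffects(true, true, true, true)
# Worst case: an unknown statement (e.g. a generic call) taints every flag.
const TOY_UNKNOWN = ToyEffects(false, false, false, false)

# A flag survives a merge only if both sides still have it.
merge_effects(a::ToyEffects, b::ToyEffects) = ToyEffects(
    a.idempotent & b.idempotent,
    a.effect_free & b.effect_free,
    a.nothrow & b.nothrow,
    a.terminates & b.terminates,
)

# Fold the effects of all statements into the frame, starting from the best case.
frame_effects(stmts) = foldl(merge_effects, stmts; init = TOY_TOTAL)

# A statement that may throw but is otherwise well behaved.
may_throw = ToyEffects(true, true, false, true)

mixed = frame_effects([TOY_TOTAL, may_throw])
tainted = frame_effects([may_throw, TOY_UNKNOWN])
```

Here `mixed` loses only the `nothrow` flag, while `tainted` loses all four, matching the "unknown statements taint all effects" rule.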
1 parent 76fa182 commit f090992

49 files changed: +1143 -186 lines changed

NEWS.md

Lines changed: 1 addition & 0 deletions
@@ -57,6 +57,7 @@ Compiler/Runtime improvements
 * Abstract callsite can now be inlined or statically resolved as far as the callsite has a single
   matching method ([#43113]).
 * Builtin function are now a bit more like generic functions, and can be enumerated with `methods` ([#43865]).
+* Inference now tracks various effects such as sideeffectful-ness and nothrow-ness on a per-specialization basis. Code heavily dependent on constant propagation should see significant compile-time performance improvements and certain cases (e.g. calls to uninlinable functions that are nevertheless effect free) should see runtime performance improvements. Effects may be overwritten manually with the `@Base.assume_effects` macro. (#43852).

 Command-line option changes
 ---------------------------
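
The NEWS entry says effects may be overridden manually with `@Base.assume_effects`. A minimal usage sketch follows, assuming a Julia version that ships the macro (1.8 or later). The function `fact` is an illustrative example, not part of this commit. It uses `:terminates_locally`, the weakest of the termination assertions described in the commit message, to promise that the `while` loop finishes.

```julia
# Assert that the loop inside this method terminates. The compiler does not
# verify this claim, so it must actually be true for every input.
Base.@assume_effects :terminates_locally function fact(x)
    res = 1
    0 ≤ x < 20 || error("bad fact")  # the method may still throw
    while x > 1
        res *= x
        x -= 1
    end
    return res
end

result = fact(5)
```

The annotation only asserts termination of the syntactic control flow in `fact`. It says nothing about `:nothrow`, which is why the explicit `error` call remains legal here.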

base/abstractarray.jl

Lines changed: 10 additions & 4 deletions
@@ -1238,10 +1238,16 @@ function unsafe_getindex(A::AbstractArray, I...)
     r
 end

+struct CanonicalIndexError
+    func::String
+    type::Any
+    CanonicalIndexError(func::String, @nospecialize(type)) = new(func, type)
+end
+
 error_if_canonical_getindex(::IndexLinear, A::AbstractArray, ::Int) =
-    error("getindex not defined for ", typeof(A))
+    throw(CanonicalIndexError("getindex", typeof(A)))
 error_if_canonical_getindex(::IndexCartesian, A::AbstractArray{T,N}, ::Vararg{Int,N}) where {T,N} =
-    error("getindex not defined for ", typeof(A))
+    throw(CanonicalIndexError("getindex", typeof(A)))
 error_if_canonical_getindex(::IndexStyle, ::AbstractArray, ::Any...) = nothing

 ## Internal definitions
@@ -1333,9 +1339,9 @@ function unsafe_setindex!(A::AbstractArray, v, I...)
 end

 error_if_canonical_setindex(::IndexLinear, A::AbstractArray, ::Int) =
-    error("setindex! not defined for ", typeof(A))
+    throw(CanonicalIndexError("setindex!", typeof(A)))
 error_if_canonical_setindex(::IndexCartesian, A::AbstractArray{T,N}, ::Vararg{Int,N}) where {T,N} =
-    error("setindex! not defined for ", typeof(A))
+    throw(CanonicalIndexError("setindex!", typeof(A)))
 error_if_canonical_setindex(::IndexStyle, ::AbstractArray, ::Any...) = nothing

 ## Internal definitions
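
The hunks above replace `error(...)` calls, which build their message eagerly, with a throw of a small struct. The commit message ties this kind of change to keeping inference out of Base's printing code on error paths. The pattern can be sketched in user code as follows; `ToyIndexError` and `toy_getindex` are hypothetical names, and the `showerror` method is an assumption about how such an error might be displayed, not something shown in this diff.

```julia
# Hypothetical error type: stores the raw facts and defers formatting.
struct ToyIndexError <: Exception
    func::String
    type::Any
    # @nospecialize keeps the constructor from specializing on each type passed in.
    ToyIndexError(func::String, @nospecialize(type)) = new(func, type)
end

# The message is only built when the error is actually displayed.
function Base.showerror(io::IO, err::ToyIndexError)
    print(io, "ToyIndexError: ", err.func, " not defined for ", err.type)
end

# The throwing site stays tiny: no string building, just a struct allocation.
toy_getindex(A) = throw(ToyIndexError("getindex", typeof(A)))

msg = try
    toy_getindex([1, 2, 3])
catch err
    sprint(showerror, err)
end
```

The design choice is that the hot path never touches string formatting; only the handler that prints the error pays for it.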

base/array.jl

Lines changed: 1 addition & 1 deletion
@@ -213,7 +213,7 @@ function bitsunionsize(u::Union)
 end

 length(a::Array) = arraylen(a)
-elsize(::Type{<:Array{T}}) where {T} = aligned_sizeof(T)
+elsize(@nospecialize _::Type{A}) where {T,A<:Array{T}} = aligned_sizeof(T)
 sizeof(a::Array) = Core.sizeof(a)

 function isassigned(a::Array, i::Int...)

base/boot.jl

Lines changed: 4 additions & 3 deletions
@@ -418,9 +418,10 @@ eval(Core, :(LineInfoNode(mod::Module, @nospecialize(method), file::Symbol, line
     $(Expr(:new, :LineInfoNode, :mod, :method, :file, :line, :inlined_at))))
 eval(Core, :(CodeInstance(mi::MethodInstance, @nospecialize(rettype), @nospecialize(inferred_const),
                           @nospecialize(inferred), const_flags::Int32,
-                          min_world::UInt, max_world::UInt, relocatability::UInt8) =
-    ccall(:jl_new_codeinst, Ref{CodeInstance}, (Any, Any, Any, Any, Int32, UInt, UInt, UInt8),
-        mi, rettype, inferred_const, inferred, const_flags, min_world, max_world, relocatability)))
+                          min_world::UInt, max_world::UInt, ipo_effects::UInt8, effects::UInt8,
+                          relocatability::UInt8) =
+    ccall(:jl_new_codeinst, Ref{CodeInstance}, (Any, Any, Any, Any, Int32, UInt, UInt, UInt8, UInt8, UInt8),
+        mi, rettype, inferred_const, inferred, const_flags, min_world, max_world, ipo_effects, effects, relocatability)))
 eval(Core, :(Const(@nospecialize(v)) = $(Expr(:new, :Const, :v))))
 eval(Core, :(PartialStruct(@nospecialize(typ), fields::Array{Any, 1}) = $(Expr(:new, :PartialStruct, :typ, :fields))))
 eval(Core, :(PartialOpaque(@nospecialize(typ), @nospecialize(env), isva::Bool, parent::MethodInstance, source::Method) = $(Expr(:new, :PartialOpaque, :typ, :env, :isva, :parent, :source))))

base/c.jl

Lines changed: 4 additions & 0 deletions
@@ -733,3 +733,7 @@ name, if desired `"libglib-2.0".g_uri_escape_string(...`
 macro ccall(expr)
     return ccall_macro_lower(:ccall, ccall_macro_parse(expr)...)
 end
+
+macro ccall_effects(effects, expr)
+    return ccall_macro_lower((:ccall, effects), ccall_macro_parse(expr)...)
+end

base/compiler/abstractinterpretation.jl

Lines changed: 220 additions & 38 deletions
Large diffs are not rendered by default.

base/compiler/inferencestate.jl

Lines changed: 26 additions & 0 deletions
@@ -59,6 +59,9 @@ mutable struct InferenceState
     inferred::Bool
     dont_work_on_me::Bool

+    # Inferred purity flags
+    ipo_effects::Effects
+
     # The place to look up methods while working on this function.
     # In particular, we cache method lookup results for the same function to
     # fast path repeated queries.
@@ -113,6 +116,16 @@ mutable struct InferenceState
         valid_worlds = WorldRange(src.min_world,
             src.max_world == typemax(UInt) ? get_world_counter() : src.max_world)

+        # TODO: Currently, any :inbounds declaration taints consistency,
+        #       because we cannot be guaranteed whether or not boundschecks
+        #       will be eliminated and if they are, we cannot be guaranteed
+        #       that no undefined behavior will occur (the effects assumptions
+        #       are stronger than the inbounds assumptions, since the latter
+        #       requires dynamic reachability, while the former is global).
+        inbounds = inbounds_option()
+        inbounds_taints_consistency = !(inbounds === :on || (inbounds === :default && !any_inbounds(code)))
+        consistent = inbounds_taints_consistency ? TRISTATE_UNKNOWN : ALWAYS_TRUE
+
         @assert cache === :no || cache === :local || cache === :global
         frame = new(
             params, result, linfo,
@@ -126,13 +139,26 @@ mutable struct InferenceState
             Vector{InferenceState}(), # callers_in_cycle
             #=parent=#nothing,
             cache === :global, false, false,
+            Effects(consistent, ALWAYS_TRUE, ALWAYS_TRUE, ALWAYS_TRUE,
+                    inbounds_taints_consistency),
             CachedMethodTable(method_table(interp)),
             interp)
         result.result = frame
         cache !== :no && push!(get_inference_cache(interp), result)
         return frame
     end
 end
+Effects(state::InferenceState) = state.ipo_effects
+
+function any_inbounds(code::Vector{Any})
+    for i=1:length(code)
+        stmt = code[i]
+        if isa(stmt, Expr) && stmt.head === :inbounds
+            return true
+        end
+    end
+    return false
+end

 function compute_trycatch(code::Vector{Any}, ip::BitSet)
     # The goal initially is to record the frame like this for the state at exit:
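
The taint decision added to the `InferenceState` constructor can be tried outside the compiler with a standalone sketch. The names `toy_any_inbounds` and `toy_taints_consistency` are hypothetical, and the bounds-check option is passed in as a plain `Symbol` instead of being read from the session. The sketch also assumes that `:on` means bounds checks are forced on, `:off` means `--check-bounds=no`, and `:default` means neither was requested, which is a reading of the diff rather than something it states.

```julia
# Scan a statement list for `Expr(:inbounds, ...)` markers,
# mirroring the `any_inbounds` helper in the diff.
function toy_any_inbounds(code::Vector{Any})
    for stmt in code
        if stmt isa Expr && stmt.head === :inbounds
            return true
        end
    end
    return false
end

# Consistency stays untainted only when bounds checks are forced on,
# or when the default is in effect and the code has no :inbounds marker.
toy_taints_consistency(option::Symbol, code::Vector{Any}) =
    !(option === :on || (option === :default && !toy_any_inbounds(code)))

# Hand-built statement lists standing in for lowered code.
plain = Any[:(x = 1), :(return x)]
marked = Any[Expr(:inbounds, true), :(y = a[i]), Expr(:inbounds, :pop)]

results = (
    default_plain = toy_taints_consistency(:default, plain),
    default_marked = toy_taints_consistency(:default, marked),
    on_marked = toy_taints_consistency(:on, marked),
    off_plain = toy_taints_consistency(:off, plain),
)
```

Note that with the `:off` option even code without any `:inbounds` marker is tainted, since every bounds check may be removed.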

base/compiler/optimize.jl

Lines changed: 9 additions & 32 deletions
@@ -149,16 +149,6 @@ const IR_FLAG_THROW_BLOCK = 0x01 << 3
 # thus be both pure and effect free.
 const IR_FLAG_EFFECT_FREE = 0x01 << 4

-# known to be always effect-free (in particular nothrow)
-const _PURE_BUILTINS = Any[tuple, svec, ===, typeof, nfields]
-
-# known to be effect-free if the are nothrow
-const _PURE_OR_ERROR_BUILTINS = [
-    fieldtype, apply_type, isa, UnionAll,
-    getfield, arrayref, const_arrayref, arraysize, isdefined, Core.sizeof,
-    Core.kwfunc, Core.ifelse, Core._typevar, (<:),
-]
-
 const TOP_TUPLE = GlobalRef(Core, :tuple)

 #########
@@ -225,7 +215,7 @@ function stmt_effect_free(@nospecialize(stmt), @nospecialize(rt), src::Union{IRC
                 M, s = argextype(args[2], src), argextype(args[3], src)
                 return get_binding_type_effect_free(M, s)
             end
-            contains_is(_PURE_OR_ERROR_BUILTINS, f) || return false
+            contains_is(_EFFECT_FREE_BUILTINS, f) || return false
             rt === Bottom && return false
             return _builtin_nothrow(f, Any[argextype(args[i], src) for i = 2:length(args)], rt)
         elseif head === :new
@@ -297,12 +287,14 @@ function alloc_array_ndims(name::Symbol)
     return nothing
 end

+const FOREIGNCALL_ARG_START = 6
+
 function alloc_array_no_throw(args::Vector{Any}, ndims::Int, src::Union{IRCode,IncrementalCompact})
-    length(args) ≥ ndims+6 || return false
-    atype = instanceof_tfunc(argextype(args[6], src))[1]
+    length(args) ≥ ndims+FOREIGNCALL_ARG_START || return false
+    atype = instanceof_tfunc(argextype(args[FOREIGNCALL_ARG_START], src))[1]
     dims = Csize_t[]
     for i in 1:ndims
-        dim = argextype(args[i+6], src)
+        dim = argextype(args[i+FOREIGNCALL_ARG_START], src)
         isa(dim, Const) || return false
         dimval = dim.val
         isa(dimval, Int) || return false
@@ -312,9 +304,9 @@ function alloc_array_no_throw(args::Vector{Any}, ndims::Int, src::Union{IRCode,I
 end

 function new_array_no_throw(args::Vector{Any}, src::Union{IRCode,IncrementalCompact})
-    length(args) ≥ 7 || return false
-    atype = instanceof_tfunc(argextype(args[6], src))[1]
-    dims = argextype(args[7], src)
+    length(args) ≥ FOREIGNCALL_ARG_START+1 || return false
+    atype = instanceof_tfunc(argextype(args[FOREIGNCALL_ARG_START], src))[1]
+    dims = argextype(args[FOREIGNCALL_ARG_START+1], src)
     isa(dims, Const) || return dims === Tuple{}
     dimsval = dims.val
     isa(dimsval, Tuple{Vararg{Int}}) || return false
@@ -621,21 +613,6 @@ function slot2reg(ir::IRCode, ci::CodeInfo, sv::OptimizationState)
     return ir
 end

-# whether `f` is pure for inference
-function is_pure_intrinsic_infer(f::IntrinsicFunction)
-    return !(f === Intrinsics.pointerref || # this one is volatile
-             f === Intrinsics.pointerset || # this one is never effect-free
-             f === Intrinsics.llvmcall ||   # this one is never effect-free
-             f === Intrinsics.arraylen ||   # this one is volatile
-             f === Intrinsics.sqrt_llvm_fast ||  # this one may differ at runtime (by a few ulps)
-             f === Intrinsics.have_fma ||  # this one depends on the runtime environment
-             f === Intrinsics.cglobal)  # cglobal lookup answer changes at runtime
-end
-
-# whether `f` is effect free if nothrow
-intrinsic_effect_free_if_nothrow(f) = f === Intrinsics.pointerref ||
-    f === Intrinsics.have_fma || is_pure_intrinsic_infer(f)
-
 ## Computing the cost of a function body

 # saturating sum (inputs are nonnegative), prevents overflow with typemax(Int) below
