diff --git a/reports/Overload-Resolution.md b/reports/Overload-Resolution.md new file mode 100644 index 00000000..7857b986 --- /dev/null +++ b/reports/Overload-Resolution.md @@ -0,0 +1,252 @@ +# Overload Resolution + +HLSL supports argument dependent lookup of function overload candidates, and an +algorithm to determine the best match from a set of possible candidates. + +## DXC's Approach + +The algorithm used in DXC is different for resolving user-defined functions than +for HLSL standard library functions which are compiler intrinsics rather than +true functions. + +For true-functions, when processing a call expression, DXC generates a list of +all functions with the specified name. Any functions with an incorrect number of +parameters for the provided argument list are discarded as invalid. For each +remaining function, a conversion sequence is generated from each argument to its +corresponding parameter. The conversion sequences are scored, and the sum of all +conversion sequences for a given overload are compared against the sums of the +conversion sequences for other overloads. If one overload has the lowest score, +it is the best match. + +The scoring based solution works based on assigning certain conversions as being +an order of magnitude worse than other conversions. For example, the score for a +float to integer conversion is orders of magnitude greater than the score for a +float to float-vector extension, so an overload candidate with two +float to float-vector extension conversion sequences will be chosen over an +overload candidate with one float to integer conversion. + +```hlsl +bool fn(int, float) { + return true; +} + +bool fn(float2, float2) { + return false; +} + +export bool call() { + float F = 1; + return fn(F, F); // resolves to fn(float2,float2) +} +``` +[Compiler Explorer](https://godbolt.org/z/KnKePeqob) + +The second part of DXC's approach is how DXC selects truncated overload sets for +standard library functions. DXC processes overload candidates for intrisncs +outside the overload sets, and only puts one matching overload candidate in +the set (note the [function returning after encountering a +match](https://github.com/microsoft/DirectXShaderCompiler/blob/main/tools/clang/lib/Sema/SemaHLSL.cpp#L4828)). +This prevents ambiguous cases, where ambiguity would exist in standalone +functions. In the example below, `WaveActiveMax` is a standard library function +with both `float` and `int` overloads (among others). Both should be valid in +the overload set, however only the `float` overload is added to the set. + +```hlsl +export bool callW() { + bool B = true; + return WaveActiveMax(B); // resolves float. +} + +bool fn(int) { + return true; +} + +bool fn(float) { + return false; +} + +export bool call() { + bool B = true; + return fn(B); // Ambiguous +} +``` +[Compiler Explorer](https://godbolt.org/z/ccTM63GGs) + +### Problems With DXC's Approach + +First and foremost the inconsistency between standard library functions and +user-defined functions is a source of confusion and bugs. + +DXC's custom overload resolution also does not handle cases that can arise in +HLSL like constant implicit objects in member functions, non-member overloaded +operators, and user-defined conversion functions. + +Additionally, an algorithm that depends on scores is subject to computing +limitations like numeric precision and overflow. As such the algorithm cannot +solve some case when a large number of parameters are involved. + +## Specified Behavior: Clang Approach + +In the draft language specification and in Clang's implementation, HLSL adopts +C++'s best match algorithm and candidate selection algorithms. HLSL extends the +set of conversion ranks defined by C++ to included vector and matrix dimension +and element conversions. The conversions are ranked in accordance with how DXC +assigned scores. With this approach for a given overload set, any overload that +Clang resolves will match the overload that DXC resolves. + +Clang will not always resolve an overload in cases that DXC would, this is true +of user-defined functions, but even more true when the difference in standard +library function resolution is taken into account. + +### Problems with the Specified Behavior + +The specified behavior reports ambiguity in more cases than DXC. While in some +cases this is a benefit, and adapting existing code is generally simple, it +may impose a significant burden for some adopters. + +## Experiment! + +A clear impediment to decision making is insufficient data about the impact of +these changes on user code. Compiling proprietary HLSL sources give us +visibility into how these decisions may impact users. + +The codebase used to test is comprised of 188 base shaders which are +compiled to 4757 unique shader variants making heavy use of the C preprocessor +to differentiate shader variations. The shaders contain cumulatively over 400000 +lines of HLSL. Important caveat: these sources may not be representative of a +majority of HLSL users as these shaders do not come from video games. + +Migration of these shaders to build with Clang with updates to both the shader +source and Clang's defined HLSL intrinsics header took about half a day of +concerted effort. Expectation is that would be reduced for software developers +due to mitigations implemented in Clang, although the degree of reduction +depends on the mitigation strategy. + +This effort identified three classes of overload failures: + +* Missing overloads in Clang that DXC supports. +* Overloads that DXC implicitly resolves to reasonable results. +* Overloads that DXC implicitly resolves to potentially bad results. + +### Missing Overloads + +An example of a missing overload is the unsigned integer overloads for `abs`. +The [official +docs](https://learn.microsoft.com/en-us/windows/win32/direct3dhlsl/dx-graphics-hlsl-abs) +define support for `abs` overloads for `float` and `int`, however in practice +DXC supports valid lowering for all 16, 32 and 64 bit arithmetic types signed +and unsigned. The unsigned integer case is a no-op. + +```hlsl +export uint call(uint I) { + return abs(I); // no-op +} +``` +[Compiler Explorer](https://godbolt.org/z/MsrEceM4E ) + +### Implicitly resolved; reasonable results + +In DXC, math operations which are only supported for floating point types where +DXC will resolve the type as `float` when integer types are provided. Some +examples of such operations are `rcp`, `exp`, `pow` and `sqrt`. None of these +operations really make sense on integers, so it is unlikely that the user +expected integer lower. + +One important note here is that these operations produce implicit conversions, +which DXC may not warn about: + +```hlsl +export float call(int I) { + return sqrt(I); // calls sqrt(float) +} +``` +[Compiler Explorer](https://godbolt.org/z/cf8W7455s) + +Another good example of reasonable results are cases like `max`, `min`, +`select`, and `clamp` when applied to mixes of vector and scalar operands: + +```hlsl +export int4 call(int4 I, int Min, int Max) { + return clamp(I, Min, Max); // calls clamp(int4, int4, int4) +} +``` +[Compiler Explorer](https://godbolt.org/z/9jTe1Pq4x) + +### Implicitly resolved; potentially bad results + +Some math operations where DXC matches intrinsics come with significant +performance impacts. For example consider the `lerp` intrinsic which only has +floating point implementations. This case also emits no user diagnostics, and +produces _a lot_ of implicit conversions: + +```hlsl +export float call(int I, int J, int K) { + return lerp(I, J, K); // calls lerp(float, float, float) +} +``` +[Compiler Explorer](https://godbolt.org/z/aWET9WTEc) + +## Possible Solutions + +Any solution should involve adding missing overloads that DXC lowers to valid +and efficient IR. The absence of those overloads should be viewed as a bug in +Clang. The following possible solutions should be viewed as applying only to +cases where DXC resolves overloads through implicit conversion of arguments to +cases that Clang identifies as ambiguous. + +### DXC-like scoring algorithm + +One potential "full" mitigation strategy is to align Clang with DXC's best-match +resolution algorithm. This could be done in a less-invasive method than DXC's +implementation however it has significant drawbacks. Since this is unlikely to +be the correct path for the language long-term it would be a significant +investment for a short-lived solution. + +### Make all builtins argument independent templates + +Another possible "full" mitigation strategy is to use C++ templates to more +flexibly design intrinsic declarations. This would likely involve declaring +intrinsics as open-ended templates where the return type and argument types +could be resolved independently, and relying on builtin function validation and +codegen to generate legalized IR. + +This is the worst possible solution both in terms of the additional accrued +technical debt in both semantic analysis and codegen, and because it would +dramatically reduce cases where Clang could emit useful diagnostics for implicit +conversions. + +This option is included only because it is technically feasible, not because it +is a good idea. + +### Recommended Approach: Careful Curation of Overloads + +A more measured partial mitigation is to add additional overloads to builtin +functions to resolve ambiguity in some but not all cases. Following this model +overloads are grouped into three categories: + +* Overloads that can be lowered to efficient IR (e.g. `select`, `clamp`, `min` + and `max` for mixed vectors and scalars). +* Overloads that can be applied predictably to reasonable IR with conversion + diagnostics (e.g. `rcp`, `exp`, `pow` and `sqrt`). +* Overloads that resolve to significantly inferior IR (e.g. `lerp`, `step`). + +This proposal suggests that: +* The first category of overloads should be added to HLSL as official language + overloads. +* The second category of overloads should be added as HLSL 202x features to be + removed in HLSL 202y, and Clang should diagnose the implicit conversions and + warn about deprecation in 202x pending removal in 202y. +* The third category of overloads should be left out from Clang and produce + errors when migrating from DXC to Clang. + +As the recommended approach the following issues have been filed: +* [[HLSL] max and min overloads with mixed scalar and vector arguments](https://github.com/llvm/llvm-project/issues/128231) +* [[HLSL] clamp overloads with mixed scalar vector operands](https://github.com/llvm/llvm-project/issues/128230) +* [[HLSL] Compatibility overloads for integer operations](https://github.com/llvm/llvm-project/issues/128229) +* [[HLSL] hlsl202x double compatability overloads](https://github.com/llvm/llvm-project/issues/128228) +* [[HLSL] Add bool compatibility overloads](https://github.com/llvm/llvm-project/issues/92941) +* [[HLSL] select not resolving correctly for complex cases](https://github.com/llvm/llvm-project/issues/126570) + +Should the missing overloads for the third class of methods prove to be a +significant burden for adoption they could be added under an additional opt-in +compatibility mode.