-
Couldn't load subscription status.
- Fork 51
Report on impact of overload resolution behavior changes #408
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,252 @@ | ||
| # Overload Resolution | ||
|
|
||
| HLSL supports argument dependent lookup of function overload candidates, and an | ||
| algorithm to determine the best match from a set of possible candidates. | ||
|
|
||
| ## DXC's Approach | ||
|
|
||
| The algorithm used in DXC is different for resolving user-defined functions than | ||
| for HLSL standard library functions which are compiler intrinsics rather than | ||
| true functions. | ||
|
|
||
| For true-functions, when processing a call expression, DXC generates a list of | ||
| all functions with the specified name. Any functions with an incorrect number of | ||
| parameters for the provided argument list are discarded as invalid. For each | ||
| remaining function, a conversion sequence is generated from each argument to its | ||
| corresponding parameter. The conversion sequences are scored, and the sum of all | ||
| conversion sequences for a given overload are compared against the sums of the | ||
| conversion sequences for other overloads. If one overload has the lowest score, | ||
| it is the best match. | ||
|
|
||
| The scoring based solution works based on assigning certain conversions as being | ||
| an order of magnitude worse than other conversions. For example, the score for a | ||
| float to integer conversion is orders of magnitude greater than the score for a | ||
| float to float-vector extension, so an overload candidate with two | ||
| float to float-vector extension conversion sequences will be chosen over an | ||
| overload candidate with one float to integer conversion. | ||
|
|
||
| ```hlsl | ||
| bool fn(int, float) { | ||
| return true; | ||
| } | ||
|
|
||
| bool fn(float2, float2) { | ||
| return false; | ||
| } | ||
|
|
||
| export bool call() { | ||
| float F = 1; | ||
| return fn(F, F); // resolves to fn(float2,float2) | ||
| } | ||
| ``` | ||
| [Compiler Explorer](https://godbolt.org/z/KnKePeqob) | ||
|
|
||
| The second part of DXC's approach is how DXC selects truncated overload sets for | ||
| standard library functions. DXC processes overload candidates for intrisncs | ||
| outside the overload sets, and only puts one matching overload candidate in | ||
| the set (note the [function returning after encountering a | ||
| match](https://github.com/microsoft/DirectXShaderCompiler/blob/main/tools/clang/lib/Sema/SemaHLSL.cpp#L4828)). | ||
| This prevents ambiguous cases, where ambiguity would exist in standalone | ||
| functions. In the example below, `WaveActiveMax` is a standard library function | ||
| with both `float` and `int` overloads (among others). Both should be valid in | ||
| the overload set, however only the `float` overload is added to the set. | ||
|
|
||
| ```hlsl | ||
| export bool callW() { | ||
| bool B = true; | ||
| return WaveActiveMax(B); // resolves float. | ||
| } | ||
|
|
||
| bool fn(int) { | ||
| return true; | ||
| } | ||
|
|
||
| bool fn(float) { | ||
| return false; | ||
| } | ||
|
|
||
| export bool call() { | ||
| bool B = true; | ||
| return fn(B); // Ambiguous | ||
| } | ||
| ``` | ||
| [Compiler Explorer](https://godbolt.org/z/ccTM63GGs) | ||
|
|
||
| ### Problems With DXC's Approach | ||
|
|
||
| First and foremost the inconsistency between standard library functions and | ||
| user-defined functions is a source of confusion and bugs. | ||
|
|
||
| DXC's custom overload resolution also does not handle cases that can arise in | ||
| HLSL like constant implicit objects in member functions, non-member overloaded | ||
|
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. HLSL, like |
||
| operators, and user-defined conversion functions. | ||
|
|
||
| Additionally, an algorithm that depends on scores is subject to computing | ||
| limitations like numeric precision and overflow. As such the algorithm cannot | ||
| solve some case when a large number of parameters are involved. | ||
|
|
||
| ## Specified Behavior: Clang Approach | ||
|
|
||
| In the draft language specification and in Clang's implementation, HLSL adopts | ||
| C++'s best match algorithm and candidate selection algorithms. HLSL extends the | ||
| set of conversion ranks defined by C++ to included vector and matrix dimension | ||
| and element conversions. The conversions are ranked in accordance with how DXC | ||
| assigned scores. With this approach for a given overload set, any overload that | ||
| Clang resolves will match the overload that DXC resolves. | ||
|
|
||
| Clang will not always resolve an overload in cases that DXC would, this is true | ||
| of user-defined functions, but even more true when the difference in standard | ||
| library function resolution is taken into account. | ||
|
|
||
| ### Problems with the Specified Behavior | ||
|
|
||
| The specified behavior reports ambiguity in more cases than DXC. While in some | ||
| cases this is a benefit, and adapting existing code is generally simple, it | ||
| may impose a significant burden for some adopters. | ||
|
|
||
| ## Experiment! | ||
|
|
||
| A clear impediment to decision making is insufficient data about the impact of | ||
| these changes on user code. Compiling proprietary HLSL sources give us | ||
| visibility into how these decisions may impact users. | ||
|
|
||
| The codebase used to test is comprised of 188 base shaders which are | ||
| compiled to 4757 unique shader variants making heavy use of the C preprocessor | ||
| to differentiate shader variations. The shaders contain cumulatively over 400000 | ||
| lines of HLSL. Important caveat: these sources may not be representative of a | ||
| majority of HLSL users as these shaders do not come from video games. | ||
|
|
||
| Migration of these shaders to build with Clang with updates to both the shader | ||
| source and Clang's defined HLSL intrinsics header took about half a day of | ||
| concerted effort. Expectation is that would be reduced for software developers | ||
| due to mitigations implemented in Clang, although the degree of reduction | ||
| depends on the mitigation strategy. | ||
|
|
||
| This effort identified three classes of overload failures: | ||
|
|
||
| * Missing overloads in Clang that DXC supports. | ||
| * Overloads that DXC implicitly resolves to reasonable results. | ||
| * Overloads that DXC implicitly resolves to potentially bad results. | ||
|
|
||
| ### Missing Overloads | ||
|
|
||
| An example of a missing overload is the unsigned integer overloads for `abs`. | ||
| The [official | ||
| docs](https://learn.microsoft.com/en-us/windows/win32/direct3dhlsl/dx-graphics-hlsl-abs) | ||
| define support for `abs` overloads for `float` and `int`, however in practice | ||
| DXC supports valid lowering for all 16, 32 and 64 bit arithmetic types signed | ||
| and unsigned. The unsigned integer case is a no-op. | ||
|
|
||
| ```hlsl | ||
| export uint call(uint I) { | ||
| return abs(I); // no-op | ||
| } | ||
| ``` | ||
| [Compiler Explorer](https://godbolt.org/z/MsrEceM4E ) | ||
|
|
||
| ### Implicitly resolved; reasonable results | ||
|
|
||
| In DXC, math operations which are only supported for floating point types where | ||
| DXC will resolve the type as `float` when integer types are provided. Some | ||
| examples of such operations are `rcp`, `exp`, `pow` and `sqrt`. None of these | ||
| operations really make sense on integers, so it is unlikely that the user | ||
| expected integer lower. | ||
|
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I don't know what 'expected integer lower' means so I'm unsure if this is what you meant to type. |
||
|
|
||
| One important note here is that these operations produce implicit conversions, | ||
| which DXC may not warn about: | ||
|
|
||
| ```hlsl | ||
| export float call(int I) { | ||
| return sqrt(I); // calls sqrt(float) | ||
| } | ||
| ``` | ||
| [Compiler Explorer](https://godbolt.org/z/cf8W7455s) | ||
|
|
||
| Another good example of reasonable results are cases like `max`, `min`, | ||
| `select`, and `clamp` when applied to mixes of vector and scalar operands: | ||
|
|
||
| ```hlsl | ||
| export int4 call(int4 I, int Min, int Max) { | ||
| return clamp(I, Min, Max); // calls clamp(int4, int4, int4) | ||
| } | ||
| ``` | ||
| [Compiler Explorer](https://godbolt.org/z/9jTe1Pq4x) | ||
|
|
||
| ### Implicitly resolved; potentially bad results | ||
|
|
||
| Some math operations where DXC matches intrinsics come with significant | ||
| performance impacts. For example consider the `lerp` intrinsic which only has | ||
|
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. what are the performance impacts? |
||
| floating point implementations. This case also emits no user diagnostics, and | ||
| produces _a lot_ of implicit conversions: | ||
|
|
||
| ```hlsl | ||
| export float call(int I, int J, int K) { | ||
| return lerp(I, J, K); // calls lerp(float, float, float) | ||
| } | ||
| ``` | ||
| [Compiler Explorer](https://godbolt.org/z/aWET9WTEc) | ||
|
|
||
| ## Possible Solutions | ||
|
|
||
| Any solution should involve adding missing overloads that DXC lowers to valid | ||
| and efficient IR. The absence of those overloads should be viewed as a bug in | ||
| Clang. The following possible solutions should be viewed as applying only to | ||
| cases where DXC resolves overloads through implicit conversion of arguments to | ||
|
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This sentence is very confusing to me. |
||
| cases that Clang identifies as ambiguous. | ||
|
|
||
| ### DXC-like scoring algorithm | ||
|
|
||
| One potential "full" mitigation strategy is to align Clang with DXC's best-match | ||
| resolution algorithm. This could be done in a less-invasive method than DXC's | ||
| implementation however it has significant drawbacks. Since this is unlikely to | ||
| be the correct path for the language long-term it would be a significant | ||
| investment for a short-lived solution. | ||
|
|
||
| ### Make all builtins argument independent templates | ||
|
|
||
| Another possible "full" mitigation strategy is to use C++ templates to more | ||
| flexibly design intrinsic declarations. This would likely involve declaring | ||
| intrinsics as open-ended templates where the return type and argument types | ||
| could be resolved independently, and relying on builtin function validation and | ||
| codegen to generate legalized IR. | ||
|
|
||
| This is the worst possible solution both in terms of the additional accrued | ||
| technical debt in both semantic analysis and codegen, and because it would | ||
| dramatically reduce cases where Clang could emit useful diagnostics for implicit | ||
| conversions. | ||
|
|
||
| This option is included only because it is technically feasible, not because it | ||
| is a good idea. | ||
|
|
||
| ### Recommended Approach: Careful Curation of Overloads | ||
|
|
||
| A more measured partial mitigation is to add additional overloads to builtin | ||
| functions to resolve ambiguity in some but not all cases. Following this model | ||
| overloads are grouped into three categories: | ||
|
|
||
| * Overloads that can be lowered to efficient IR (e.g. `select`, `clamp`, `min` | ||
| and `max` for mixed vectors and scalars). | ||
| * Overloads that can be applied predictably to reasonable IR with conversion | ||
| diagnostics (e.g. `rcp`, `exp`, `pow` and `sqrt`). | ||
| * Overloads that resolve to significantly inferior IR (e.g. `lerp`, `step`). | ||
|
|
||
| This proposal suggests that: | ||
| * The first category of overloads should be added to HLSL as official language | ||
| overloads. | ||
| * The second category of overloads should be added as HLSL 202x features to be | ||
| removed in HLSL 202y, and Clang should diagnose the implicit conversions and | ||
| warn about deprecation in 202x pending removal in 202y. | ||
| * The third category of overloads should be left out from Clang and produce | ||
| errors when migrating from DXC to Clang. | ||
|
|
||
| As the recommended approach the following issues have been filed: | ||
|
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. approach, the folllowing |
||
| * [[HLSL] max and min overloads with mixed scalar and vector arguments](https://github.com/llvm/llvm-project/issues/128231) | ||
| * [[HLSL] clamp overloads with mixed scalar vector operands](https://github.com/llvm/llvm-project/issues/128230) | ||
| * [[HLSL] Compatibility overloads for integer operations](https://github.com/llvm/llvm-project/issues/128229) | ||
| * [[HLSL] hlsl202x double compatability overloads](https://github.com/llvm/llvm-project/issues/128228) | ||
| * [[HLSL] Add bool compatibility overloads](https://github.com/llvm/llvm-project/issues/92941) | ||
| * [[HLSL] select not resolving correctly for complex cases](https://github.com/llvm/llvm-project/issues/126570) | ||
|
|
||
| Should the missing overloads for the third class of methods prove to be a | ||
| significant burden for adoption they could be added under an additional opt-in | ||
| compatibility mode. | ||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How does it pick which one to add?