Skip to content
Open
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
252 changes: 252 additions & 0 deletions reports/Overload-Resolution.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,252 @@
# Overload Resolution

HLSL supports argument dependent lookup of function overload candidates, and an
algorithm to determine the best match from a set of possible candidates.

## DXC's Approach

The algorithm used in DXC is different for resolving user-defined functions than
for HLSL standard library functions which are compiler intrinsics rather than
true functions.

For true-functions, when processing a call expression, DXC generates a list of
all functions with the specified name. Any functions with an incorrect number of
parameters for the provided argument list are discarded as invalid. For each
remaining function, a conversion sequence is generated from each argument to its
corresponding parameter. The conversion sequences are scored, and the sum of all
conversion sequences for a given overload are compared against the sums of the
conversion sequences for other overloads. If one overload has the lowest score,
it is the best match.

The scoring based solution works based on assigning certain conversions as being
an order of magnitude worse than other conversions. For example, the score for a
float to integer conversion is orders of magnitude greater than the score for a
float to float-vector extension, so an overload candidate with two
float to float-vector extension conversion sequences will be chosen over an
overload candidate with one float to integer conversion.

```hlsl
bool fn(int, float) {
return true;
}

bool fn(float2, float2) {
return false;
}

export bool call() {
float F = 1;
return fn(F, F); // resolves to fn(float2,float2)
}
```
[Compiler Explorer](https://godbolt.org/z/KnKePeqob)

The second part of DXC's approach is how DXC selects truncated overload sets for
standard library functions. DXC processes overload candidates for intrisncs
outside the overload sets, and only puts one matching overload candidate in
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How does it pick which one to add?

the set (note the [function returning after encountering a
match](https://github.com/microsoft/DirectXShaderCompiler/blob/main/tools/clang/lib/Sema/SemaHLSL.cpp#L4828)).
This prevents ambiguous cases, where ambiguity would exist in standalone
functions. In the example below, `WaveActiveMax` is a standard library function
with both `float` and `int` overloads (among others). Both should be valid in
the overload set, however only the `float` overload is added to the set.

```hlsl
export bool callW() {
bool B = true;
return WaveActiveMax(B); // resolves float.
}

bool fn(int) {
return true;
}

bool fn(float) {
return false;
}

export bool call() {
bool B = true;
return fn(B); // Ambiguous
}
```
[Compiler Explorer](https://godbolt.org/z/ccTM63GGs)

### Problems With DXC's Approach

First and foremost the inconsistency between standard library functions and
user-defined functions is a source of confusion and bugs.

DXC's custom overload resolution also does not handle cases that can arise in
HLSL like constant implicit objects in member functions, non-member overloaded
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

HLSL, like
*i think a comma is needed here.

operators, and user-defined conversion functions.

Additionally, an algorithm that depends on scores is subject to computing
limitations like numeric precision and overflow. As such the algorithm cannot
solve some case when a large number of parameters are involved.

## Specified Behavior: Clang Approach

In the draft language specification and in Clang's implementation, HLSL adopts
C++'s best match algorithm and candidate selection algorithms. HLSL extends the
set of conversion ranks defined by C++ to included vector and matrix dimension
and element conversions. The conversions are ranked in accordance with how DXC
assigned scores. With this approach for a given overload set, any overload that
Clang resolves will match the overload that DXC resolves.

Clang will not always resolve an overload in cases that DXC would, this is true
of user-defined functions, but even more true when the difference in standard
library function resolution is taken into account.

### Problems with the Specified Behavior

The specified behavior reports ambiguity in more cases than DXC. While in some
cases this is a benefit, and adapting existing code is generally simple, it
may impose a significant burden for some adopters.

## Experiment!

A clear impediment to decision making is insufficient data about the impact of
these changes on user code. Compiling proprietary HLSL sources give us
visibility into how these decisions may impact users.

The codebase used to test is comprised of 188 base shaders which are
compiled to 4757 unique shader variants making heavy use of the C preprocessor
to differentiate shader variations. The shaders contain cumulatively over 400000
lines of HLSL. Important caveat: these sources may not be representative of a
majority of HLSL users as these shaders do not come from video games.

Migration of these shaders to build with Clang with updates to both the shader
source and Clang's defined HLSL intrinsics header took about half a day of
concerted effort. Expectation is that would be reduced for software developers
due to mitigations implemented in Clang, although the degree of reduction
depends on the mitigation strategy.

This effort identified three classes of overload failures:

* Missing overloads in Clang that DXC supports.
* Overloads that DXC implicitly resolves to reasonable results.
* Overloads that DXC implicitly resolves to potentially bad results.

### Missing Overloads

An example of a missing overload is the unsigned integer overloads for `abs`.
The [official
docs](https://learn.microsoft.com/en-us/windows/win32/direct3dhlsl/dx-graphics-hlsl-abs)
define support for `abs` overloads for `float` and `int`, however in practice
DXC supports valid lowering for all 16, 32 and 64 bit arithmetic types signed
and unsigned. The unsigned integer case is a no-op.

```hlsl
export uint call(uint I) {
return abs(I); // no-op
}
```
[Compiler Explorer](https://godbolt.org/z/MsrEceM4E )

### Implicitly resolved; reasonable results

In DXC, math operations which are only supported for floating point types where
DXC will resolve the type as `float` when integer types are provided. Some
examples of such operations are `rcp`, `exp`, `pow` and `sqrt`. None of these
operations really make sense on integers, so it is unlikely that the user
expected integer lower.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know what 'expected integer lower' means so I'm unsure if this is what you meant to type.


One important note here is that these operations produce implicit conversions,
which DXC may not warn about:

```hlsl
export float call(int I) {
return sqrt(I); // calls sqrt(float)
}
```
[Compiler Explorer](https://godbolt.org/z/cf8W7455s)

Another good example of reasonable results are cases like `max`, `min`,
`select`, and `clamp` when applied to mixes of vector and scalar operands:

```hlsl
export int4 call(int4 I, int Min, int Max) {
return clamp(I, Min, Max); // calls clamp(int4, int4, int4)
}
```
[Compiler Explorer](https://godbolt.org/z/9jTe1Pq4x)

### Implicitly resolved; potentially bad results

Some math operations where DXC matches intrinsics come with significant
performance impacts. For example consider the `lerp` intrinsic which only has
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what are the performance impacts?

floating point implementations. This case also emits no user diagnostics, and
produces _a lot_ of implicit conversions:

```hlsl
export float call(int I, int J, int K) {
return lerp(I, J, K); // calls lerp(float, float, float)
}
```
[Compiler Explorer](https://godbolt.org/z/aWET9WTEc)

## Possible Solutions

Any solution should involve adding missing overloads that DXC lowers to valid
and efficient IR. The absence of those overloads should be viewed as a bug in
Clang. The following possible solutions should be viewed as applying only to
cases where DXC resolves overloads through implicit conversion of arguments to
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This sentence is very confusing to me.

cases that Clang identifies as ambiguous.

### DXC-like scoring algorithm

One potential "full" mitigation strategy is to align Clang with DXC's best-match
resolution algorithm. This could be done in a less-invasive method than DXC's
implementation however it has significant drawbacks. Since this is unlikely to
be the correct path for the language long-term it would be a significant
investment for a short-lived solution.

### Make all builtins argument independent templates

Another possible "full" mitigation strategy is to use C++ templates to more
flexibly design intrinsic declarations. This would likely involve declaring
intrinsics as open-ended templates where the return type and argument types
could be resolved independently, and relying on builtin function validation and
codegen to generate legalized IR.

This is the worst possible solution both in terms of the additional accrued
technical debt in both semantic analysis and codegen, and because it would
dramatically reduce cases where Clang could emit useful diagnostics for implicit
conversions.

This option is included only because it is technically feasible, not because it
is a good idea.

### Recommended Approach: Careful Curation of Overloads

A more measured partial mitigation is to add additional overloads to builtin
functions to resolve ambiguity in some but not all cases. Following this model
overloads are grouped into three categories:

* Overloads that can be lowered to efficient IR (e.g. `select`, `clamp`, `min`
and `max` for mixed vectors and scalars).
* Overloads that can be applied predictably to reasonable IR with conversion
diagnostics (e.g. `rcp`, `exp`, `pow` and `sqrt`).
* Overloads that resolve to significantly inferior IR (e.g. `lerp`, `step`).

This proposal suggests that:
* The first category of overloads should be added to HLSL as official language
overloads.
* The second category of overloads should be added as HLSL 202x features to be
removed in HLSL 202y, and Clang should diagnose the implicit conversions and
warn about deprecation in 202x pending removal in 202y.
* The third category of overloads should be left out from Clang and produce
errors when migrating from DXC to Clang.

As the recommended approach the following issues have been filed:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

approach, the folllowing

* [[HLSL] max and min overloads with mixed scalar and vector arguments](https://github.com/llvm/llvm-project/issues/128231)
* [[HLSL] clamp overloads with mixed scalar vector operands](https://github.com/llvm/llvm-project/issues/128230)
* [[HLSL] Compatibility overloads for integer operations](https://github.com/llvm/llvm-project/issues/128229)
* [[HLSL] hlsl202x double compatability overloads](https://github.com/llvm/llvm-project/issues/128228)
* [[HLSL] Add bool compatibility overloads](https://github.com/llvm/llvm-project/issues/92941)
* [[HLSL] select not resolving correctly for complex cases](https://github.com/llvm/llvm-project/issues/126570)

Should the missing overloads for the third class of methods prove to be a
significant burden for adoption they could be added under an additional opt-in
compatibility mode.