From bc196b6c7742a0005a8799c323019fa47ab12f24 Mon Sep 17 00:00:00 2001
From: Zoltan Kis
Date: Wed, 27 Aug 2025 22:58:16 +0300
Subject: [PATCH 1/5] Update explainer with new proposal for simple accelerator mapping

Signed-off-by: Zoltan Kis
---
 device-selection-explainer.md | 48 +++++++++++++++++++++++++++++------
 1 file changed, 40 insertions(+), 8 deletions(-)

diff --git a/device-selection-explainer.md b/device-selection-explainer.md
index 50475561..2a9b2f01 100644
--- a/device-selection-explainer.md
+++ b/device-selection-explainer.md
@@ -63,7 +63,9 @@ Possible means:
 - identify hints/constraints that require a feedback (error) if not supported, for instance "avoid CPU fallback" or "need low power and low latency acceleration".

 ### 3. Post-compile query of inference details
-**Requirement**: query a compiled graph for details on how may it be run (subject to being overridden by the platform).
+**Requirement**:
+- Query a compiled graph for details on how it may be run (subject to being overridden by the platform).
+- Query if CPU fallback is active for a context.

 This is being discussed in [Get devices used for a graph after graph compilation #836](https://github.com/webmachinelearning/webnn/issues/836) and being explored in PR [#854 (define graph.devices)](https://github.com/webmachinelearning/webnn/pull/854).

@@ -73,7 +75,7 @@ Initially, the proposal was to obtain the list/combination of devices usable for

 Design decisions may take the following into account:

-1. Allow the underlying platform to hint to, or ultimately choose the preferred compute device(s).
+1. Allow the underlying platform to ultimately choose the appropriate compute device(s).

 2. Allow scripts to express hints/options when creating contexts, such as preference for low power consumption, high performance (throughput), low latency, stable sustained performance, accuracy, etc.

 4. Allow selection from available GPU devices, for instance, by allowing specification of an explicit [GPUDevice](https://gpuweb.github.io/gpuweb/#gpudevice) obtained from available [GPUAdapters](https://gpuweb.github.io/gpuweb/#gpuadapter) using [WebGPU](https://gpuweb.github.io/gpuweb) mechanisms via [GPURequestAdapterOptions](https://gpuweb.github.io/gpuweb/#dictdef-gpurequestadapteroptions), such as feature level or power preference.

-5. Allow selection from available various AI accelerators, including NPUs or a combination of accelerators. This may happen using a (to-be-specified) algorithmic mapping from context options. Or, allow web apps to hint a preferred fallback order for the given context, for instance, `["npu", "cpu"]`, meaning that implementations should try executing the graph on an NPU as much as possible and try to avoid the GPU. The `"cpu"` option could even be omitted, as it could be the default fallback device; therefore, specifying `"npu"` alone would mean the same. However, this can become complex with all possible device variations, so we must specify and standardize the supported fallback orders. (Related to discussions in Issue #815).
+5. Allow selection from various available AI accelerators, including NPUs, GPUs, or a combination of accelerators. This may happen using a (to-be-specified) algorithmic mapping from context options. Or, allow web apps to hint a preferred fallback order for the given context, or fallbacks to avoid (if that is supported). (Related to discussions in Issue #815).

-6. Allow enumeration of [OpSupportLimits](https://webmachinelearning.github.io/webnn/#api-mlcontext-opsupportlimits-dictionary) before creating a context so that web apps can select the best device that would work with the intended model. This needs more developer input and examples. (Related to discussions in Issue #815).
+6. Add a context creation option/hint for expressing the app's preference for being simply ["accelerated"](https://github.com/webmachinelearning/webnn/issues/815#issuecomment-2658627753), meaning NPU, GPU or both.

-7. As a corollary to 6, allow creating a context using options for [OpSupportLimits](https://webmachinelearning.github.io/webnn/#api-mlcontext-opsupportlimits-dictionary). (Related to discussions in Issue #815).
+7. Allow enumeration of [OpSupportLimits](https://webmachinelearning.github.io/webnn/#api-mlcontext-opsupportlimits-dictionary) before creating a context so that web apps can select the best device that would work with the intended model. This needs more developer input and examples. (Related to discussions in Issue #815).
+
+8. As a corollary to 7, allow creating a context using options for [OpSupportLimits](https://webmachinelearning.github.io/webnn/#api-mlcontext-opsupportlimits-dictionary). (Related to discussions in Issue #815).
+
+9. Expose a context property (or event) to tell whether CPU fallback is active (or likely active) for the context.

 ## Scenarios, examples, design discussion

@@ -102,6 +108,24 @@ context = await navigator.ml.createContext({powerPreference: 'low-power'});

 // create a context that will likely map to GPU
 context = await navigator.ml.createContext({powerPreference: 'high-performance'});

+// create a context that should use massively parallel processing (e.g. GPU or NPU)
+context = await navigator.ml.createContext({mpp: true});
+if (context.mpp === "probably") {
+  // the context will mostly use MPP (GPU or NPU), but CPU fallback may happen
+} else if (context.mpp === "maybe") {
+  // MPP is supported by the platform, but it cannot be guaranteed at the moment
+} else if (context.mpp === "no") {
+  // the platform indicates it likely cannot provide MPP accelerators such as an NPU or GPU
+}
+
+// // create a context that should preferably use NPU
+context = await navigator.ml.createContext({mpp: true, powerPreference: 'low-power'});
+if (context.mpp === "no") {
+  // NPU is likely not available, and since GPU needs high power, it is not used
+} else if (context.mpp === "probably") {
+  // NPU is likely used -- further requirements could be set by opSupportLimitsPerDevice
+}
+
 // enumerate devices and limits (as allowed by policy/implementation)
 // and select one of them to create a context
 const limitsMap = await navigator.ml.opSupportLimitsPerDevice();

@@ -122,7 +146,7 @@ const context = await navigator.ml.createContext({ fallback: ['npu', 'cpu'] });

 ## Open questions

-- WebGPU provides a way to select a GPU device via [GPUAdapter](https://gpuweb.github.io/gpuweb/#gpuadapter). Should WebNN expose a similar adapter API for NPUs?
+- WebGPU provides a way to select a GPU device via [GPUAdapter](https://gpuweb.github.io/gpuweb/#gpuadapter). Should WebNN expose a similar adapter API for NPUs? The current take is not to expose explicit adapters.

 - How should WebNN extend the context options? What exactly is best to pass as context options? Operator support limits? Supported features, similar to [GPUSupportedFeatures](https://gpuweb.github.io/gpuweb/#gpusupportedfeatures)? Others?
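+
+A hypothetical sketch of what the last question could mean in practice; the `limits` option name, its semantics, and the `'npu'` key are made-up illustrations, not part of any proposal:
+
+```js
+// Sketch (assumed API): ask for a context that satisfies the op support
+// limits reported for a given device by opSupportLimitsPerDevice().
+const limitsMap = await navigator.ml.opSupportLimitsPerDevice();
+const context = await navigator.ml.createContext({ limits: limitsMap['npu'] });
+```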
@@ -164,7 +188,7 @@ A WebNN application may have specific device preferences for model execution. Th

 * *Description*: The application developer hints that the model execution should contribute as little as possible to the overall system power draw. This is a broader consideration than just the model's own efficiency, potentially influencing scheduling and resource allocation across the system. The implementation may choose any device ("where JS and Wasm execute," "where WebGL and WebGPU programs execute," or "other") that best achieves this goal.

-## Minimum Viable Solution
+## Minimum Viable Solution (MVS, completed)

 Based on the discussion above, the best starting point was a simple solution that can be extended and refined later. A first contribution could include the following changes:
 - Remove `MLDeviceType` (see [CRD 20250131](https://www.w3.org/TR/2025/CRD-webnn-20250131/#enumdef-mldevicetype)) as an explicit [context option](https://webmachinelearning.github.io/webnn/#dictdef-mlcontextoptions).

@@ -179,7 +203,7 @@ Besides, the following topics have been discussed:
 - Document the valid use cases for requesting a certain device type or combination of devices, and under what error conditions. Currently, after these changes, there remains explicit support for a GPU-only context when an `MLContext` is created from a `GPUDevice` in `createContext()`.
 - Discuss option #3 from [Considered alternatives](#considered-alternatives).

-## Next Phase Device Selection Solution
+## Next discussion phase after MVS

 In [Remove MLDeviceType #809](https://github.com/webmachinelearning/webnn/pull/809), this [comment](https://github.com/webmachinelearning/webnn/pull/809#discussion_r1936856070) raised a new use case:

@@ -210,6 +234,14 @@ Given the discussion in Issue #815 ([comment](https://github.com/webmachinelearn
 - If yes, then in some cases (e.g., CoreML), the model needs to be dispatched before knowing for sure whether it can be executed on the GPU. For that, a new API is needed, as discussed in [Get devices used for a graph after graph compilation #836](https://github.com/webmachinelearning/webnn/issues/836) and being explored in PR [#854 (define graph.devices)](https://github.com/webmachinelearning/webnn/pull/854).

 Based on the answer, the developer may choose an option other than WebNN. Besides that, the feature permits gathering data on typical graph allocations (note: fingerprintable), which might help the specification work on the device selection API.

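+A sketch of how such a post-compile query might look, assuming the `graph.devices` attribute explored in PR [#854](https://github.com/webmachinelearning/webnn/pull/854); `builder` and `output` stand for a typical graph-building flow, and none of the names are final:
+
+```js
+// Sketch (assumed API from PR #854): inspect the devices actually used
+// for a graph after compilation, e.g. ['gpu'] or ['npu', 'cpu'].
+const graph = await builder.build({ output });
+if (graph.devices.includes('cpu')) {
+  // CPU fallback happened; the developer may choose an option other than WebNN
+}
+```
+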
+## Simple accelerator mapping solution
+
+The following [proposal](https://github.com/webmachinelearning/webnn/issues/815#issuecomment-3198261369) gained support for a simple accelerator mapping solution (before using the previously discussed fine-grained constraints):
+- Expose a context property (or event) to tell whether CPU fallback is active (or likely active).
+- Add a context creation option/hint for expressing the app's preference for NPU and/or GPU accelerated processing.
+As alternatives to the term `"accelerated"`, ["massively parallel processing" (MPP)](https://en.wikipedia.org/wiki/Massively_parallel), or ["highly parallel"](https://link.springer.com/chapter/10.1007/978-1-4613-2249-8_23) (which can be considered a superset of MPP) could be used. This could be exposed as a context property (`"mpp"` or `"supportMPP"`) with possible values (following [this guidance](https://github.com/webmachinelearning/webnn/issues/815#issuecomment-2980364545)): `"no"` (or empty string, for likely no support for either GPU or NPU), `"maybe"` (e.g. fully controlled by the underlying platform which makes a best effort for MPP, but CPU fallback may occur), `"probably"` (e.g. controlled by the underlying platform which reports best effort for MPP, with CPU fallback being unlikely).
+
+
 ## History

 Previous discussion covered the following main topics:

From 21add8691d76fa3aa5184c3e594cf60854934cf4 Mon Sep 17 00:00:00 2001
From: Zoltan Kis
Date: Mon, 1 Sep 2025 16:41:34 +0300
Subject: [PATCH 2/5] Update device selection explainer with feedback from the WG call

Signed-off-by: Zoltan Kis
---
 device-selection-explainer.md | 29 +++++++++++++++--------------
 1 file changed, 15 insertions(+), 14 deletions(-)

diff --git a/device-selection-explainer.md b/device-selection-explainer.md
index 2a9b2f01..9c5c0292 100644
--- a/device-selection-explainer.md
+++ b/device-selection-explainer.md
@@ -108,21 +108,21 @@ context = await navigator.ml.createContext({powerPreference: 'low-power'});

 // create a context that will likely map to GPU
 context = await navigator.ml.createContext({powerPreference: 'high-performance'});

-// create a context that should use massively parallel processing (e.g. GPU or NPU)
-context = await navigator.ml.createContext({mpp: true});
-if (context.mpp === "probably") {
-  // the context will mostly use MPP (GPU or NPU), but CPU fallback may happen
-} else if (context.mpp === "maybe") {
-  // MPP is supported by the platform, but it cannot be guaranteed at the moment
-} else if (context.mpp === "no") {
-  // the platform indicates it likely cannot provide MPP accelerators such as an NPU or GPU
+// create a context that should use massively parallel processing (e.g. GPU/NPU)
+context = await navigator.ml.createContext({accelerated: true});
+if (context.accelerated === "probably") {
+  // the context will mostly use GPU/NPU, but CPU fallback may happen
+} else if (context.accelerated === "best-effort") {
+  // NPU/GPU is supported by the platform, but it cannot be guaranteed
+} else if (context.accelerated === "no") {
+  // the platform indicates it likely cannot provide an NPU or GPU
 }

-// // create a context that should preferably use NPU
-context = await navigator.ml.createContext({mpp: true, powerPreference: 'low-power'});
-if (context.mpp === "no") {
+// create a context that should preferably use NPU
+context = await navigator.ml.createContext({accelerated: true, powerPreference: 'low-power'});
+if (context.accelerated === "no") {
   // NPU is likely not available, and since GPU needs high power, it is not used
-} else if (context.mpp === "probably") {
+} else if (context.accelerated === "probably") {
   // NPU is likely used -- further requirements could be set by opSupportLimitsPerDevice
 }
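+
+A sketch of how an app might combine these hints at this stage of the proposal, cascading from an NPU-leaning request to any accelerator; the cascade itself is an illustration, not a specified behavior:
+
+```js
+// Sketch: prefer a low-power accelerator (likely NPU), then any accelerator.
+let context = await navigator.ml.createContext({accelerated: true, powerPreference: 'low-power'});
+if (context.accelerated === "no") {
+  context = await navigator.ml.createContext({accelerated: true});
+}
+```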
@@ -238,8 +238,9 @@ The following [proposal](https://github.com/webmachinelearning/webnn/issues/815#
 - Expose a context property (or event) to tell whether CPU fallback is active (or likely active).
-- Add a context creation option/hint for expressing the app's preference for NPU and/or GPU accelerated processing.
-As alternatives to the term `"accelerated"`, ["massively parallel processing" (MPP)](https://en.wikipedia.org/wiki/Massively_parallel), or ["highly parallel"](https://link.springer.com/chapter/10.1007/978-1-4613-2249-8_23) (which can be considered a superset of MPP) could be used. This could be exposed as a context property (`"mpp"` or `"supportMPP"`) with possible values (following [this guidance](https://github.com/webmachinelearning/webnn/issues/815#issuecomment-2980364545)): `"no"` (or empty string, for likely no support for either GPU or NPU), `"maybe"` (e.g. fully controlled by the underlying platform which makes a best effort for MPP, but CPU fallback may occur), `"probably"` (e.g. controlled by the underlying platform which reports best effort for MPP, with CPU fallback being unlikely).
+- Add a context creation option/hint (e.g. `accelerated: true`) for expressing the app's preference for NPU and/or GPU accelerated ["massively parallel"](https://en.wikipedia.org/wiki/Massively_parallel) processing (MPP).
+  - **Note**. This context option makes sense if an error is returned when the implementation overrides the option. Otherwise, if instead of returning an error a silent fallback is implemented (which seems the more generic behaviour), then applications could query the following proposed property on the context (albeit after context creation). If implementations could detect a CPU fallback, then they could also return an error. Whether to expose an error in this case is to be discussed, as it would allow detecting the lack of massively parallel acceleration _before_ creating a context.
+- Add a context property named `"accelerated"` with possible values (following [this guidance](https://github.com/webmachinelearning/webnn/issues/815#issuecomment-2980364545)): `"no"` (or empty string, for likely no support for either GPU or NPU), `"best-effort"` (e.g. fully controlled by the underlying platform which makes a best effort for MPP, but CPU fallback may occur), `"probably"` (e.g. controlled by the underlying platform which reports best effort for MPP, with CPU fallback being unlikely).

 ## History

From 3a61a15076b1e3d51c21341d400c7b3fe25bdde7 Mon Sep 17 00:00:00 2001
From: Zoltan Kis
Date: Wed, 24 Sep 2025 12:50:19 +0200
Subject: [PATCH 3/5] Use simple boolean accelerated property and simple policy

Signed-off-by: Zoltan Kis
---
 device-selection-explainer.md | 31 ++++++++++++++++++++++---------
 1 file changed, 22 insertions(+), 9 deletions(-)

diff --git a/device-selection-explainer.md b/device-selection-explainer.md
index 9c5c0292..e8564169 100644
--- a/device-selection-explainer.md
+++ b/device-selection-explainer.md
@@ -110,20 +110,18 @@ context = await navigator.ml.createContext({powerPreference: 'high-performance'}

 // create a context that should use massively parallel processing (e.g. GPU/NPU)
 context = await navigator.ml.createContext({accelerated: true});
-if (context.accelerated === "probably") {
+if (context.accelerated) {
   // the context will mostly use GPU/NPU, but CPU fallback may happen
-} else if (context.accelerated === "best-effort") {
-  // NPU/GPU is supported by the platform, but it cannot be guaranteed
-} else if (context.accelerated === "no") {
-  // the platform indicates it likely cannot provide an NPU or GPU
+} else {
+  // the platform indicates it likely cannot provide an NPU or GPU, so try something else
 }

 // create a context that should preferably use NPU
 context = await navigator.ml.createContext({accelerated: true, powerPreference: 'low-power'});
-if (context.accelerated === "no") {
-  // NPU is likely not available, and since GPU needs high power, it is not used
-} else if (context.accelerated === "probably") {
+if (context.accelerated) {
   // NPU is likely used -- further requirements could be set by opSupportLimitsPerDevice
+} else {
+  // NPU is likely not available, and since GPU needs high power, it is not used
 }

 // enumerate devices and limits (as allowed by policy/implementation)

@@ -240,8 +238,23 @@ The following [proposal](https://github.com/webmachinelearning/webnn/issues/815#
 - Expose a context property (or event) to tell whether CPU fallback is active (or likely active).
 - Add a context creation option/hint (e.g. `accelerated: true`) for expressing the app's preference for NPU and/or GPU accelerated ["massively parallel"](https://en.wikipedia.org/wiki/Massively_parallel) processing (MPP).
   - **Note**. This context option makes sense if an error is returned when the implementation overrides the option. Otherwise, if instead of returning an error a silent fallback is implemented (which seems the more generic behaviour), then applications could query the following proposed property on the context (albeit after context creation). If implementations could detect a CPU fallback, then they could also return an error. Whether to expose an error in this case is to be discussed, as it would allow detecting the lack of massively parallel acceleration _before_ creating a context.
-- Add a context property named `"accelerated"` with possible values (following [this guidance](https://github.com/webmachinelearning/webnn/issues/815#issuecomment-2980364545)): `"no"` (or empty string, for likely no support for either GPU or NPU), `"best-effort"` (e.g. fully controlled by the underlying platform which makes a best effort for MPP, but CPU fallback may occur), `"probably"` (e.g. controlled by the underlying platform which reports best effort for MPP, with CPU fallback being unlikely).
+- Add a context property named `"accelerated"` with possible values: `false` (for likely no support for either GPU or NPU), and `true` (e.g. fully controlled by the underlying platform which makes a best effort for MPP, yet CPU fallback may occur).
+
+The following changes are proposed:
+
+```js
+partial dictionary MLContextOptions {
+  boolean accelerated = true;
+};
+
+partial interface MLContext {
+  boolean cpuFallbackActive;
+};
+```
+The behavior of [createContext()](https://webmachinelearning.github.io/webnn/#dom-ml-createcontext) is proposed to follow this policy:
+- return an error [in step 4](https://webmachinelearning.github.io/webnn/#create-a-context) if the context option `accelerated` has been set to `true`, but the platform cannot provide massively parallel processing at all,
+- and set the `accelerated` property to `false` when the platform could in principle provide massively parallel processing, which may or may not be available at the moment.
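+
+A sketch of what this policy would mean for callers, assuming the error surfaces as a rejected promise (how the error is exposed is not yet specified):
+
+```js
+// Sketch: under this policy, requesting acceleration may fail outright.
+try {
+  const context = await navigator.ml.createContext({ accelerated: true });
+  if (!context.accelerated) {
+    // MPP is possible in principle, but not available at the moment
+  }
+} catch (error) {
+  // the platform cannot provide massively parallel processing at all
+}
+```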

 ## History

From 2955ec775b46a0ae0c27016f90a6c195c0a2aec1 Mon Sep 17 00:00:00 2001
From: Zoltan Kis
Date: Wed, 8 Oct 2025 10:13:51 +0300
Subject: [PATCH 4/5] Finalize the simple accelerator selection proposal

Signed-off-by: Zoltan Kis
---
 device-selection-explainer.md | 14 +++++++++-----
 1 file changed, 9 insertions(+), 5 deletions(-)

diff --git a/device-selection-explainer.md b/device-selection-explainer.md
index e8564169..e0144b92 100644
--- a/device-selection-explainer.md
+++ b/device-selection-explainer.md
@@ -237,10 +237,11 @@ Based on the answer, the developer may choose an option other than WebNN. Beside
 The following [proposal](https://github.com/webmachinelearning/webnn/issues/815#issuecomment-3198261369) gained support for a simple accelerator mapping solution (before using the previously discussed fine-grained constraints):
 - Expose a context property (or event) to tell whether CPU fallback is active (or likely active).
 - Add a context creation option/hint (e.g. `accelerated: true`) for expressing the app's preference for NPU and/or GPU accelerated ["massively parallel"](https://en.wikipedia.org/wiki/Massively_parallel) processing (MPP).
-  - **Note**. This context option makes sense if an error is returned when the implementation overrides the option. Otherwise, if instead of returning an error a silent fallback is implemented (which seems the more generic behaviour), then applications could query the following proposed property on the context (albeit after context creation). If implementations could detect a CPU fallback, then they could also return an error. Whether to expose an error in this case is to be discussed, as it would allow detecting the lack of massively parallel acceleration _before_ creating a context.
+Note that in [certain use cases](https://www.w3.org/2025/09/25-webmachinelearning-minutes.html) applications might prefer CPU inference; therefore, specifying `accelerated: false` has legitimate use cases as well.
 - Add a context property named `"accelerated"` with possible values: `false` (for likely no support for either GPU or NPU), and `true` (e.g. fully controlled by the underlying platform which makes a best effort for MPP, yet CPU fallback may occur).
+- Add a context property named `"cpuFallbackActive"` that may be polled for detecting CPU fallbacks. In the future, depending on developer feedback, this may be turned into an event.

-The following changes are proposed:
+The following Web IDL changes are proposed:

 ```js
 partial dictionary MLContextOptions {
   boolean accelerated = true;
 };

 partial interface MLContext {
-  boolean cpuFallbackActive;
+  readonly attribute boolean accelerated;
+  readonly attribute boolean cpuFallbackActive;
 };
 ```
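+
+A usage sketch for this shape of the API; the timer-based polling and its interval are arbitrary illustrations:
+
+```js
+// Sketch: poll cpuFallbackActive to detect CPU fallback during a session.
+const context = await navigator.ml.createContext({ accelerated: true });
+if (context.accelerated) {
+  setInterval(() => {
+    if (context.cpuFallbackActive) {
+      // e.g. reduce the workload, or inform the user about degraded performance
+    }
+  }, 1000);
+}
+```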
 The behavior of [createContext()](https://webmachinelearning.github.io/webnn/#dom-ml-createcontext) is proposed to follow this policy:
-- return an error [in step 4](https://webmachinelearning.github.io/webnn/#create-a-context) if the context option `accelerated` has been set to `true`, but the platform cannot provide massively parallel processing at all,
-- and set the `accelerated` property to `false` when the platform could in principle provide massively parallel processing, which may or may not be available at the moment.
+- Set the `accelerated` property to `false` when the platform could in principle provide massively parallel processing, which may or may not be available at the moment. Applications may poll this property, together with `MLContext::cpuFallbackActive`.
+
+In the future, more policy options could be considered, for instance:
+- Return an error [in step 4](https://webmachinelearning.github.io/webnn/#create-a-context) if the context option `accelerated` has been set to `true`, but the platform cannot provide massively parallel processing at all.

 ## History

From 35ffc057dde85ba3f7a1f2f27764ee67798ac3cc Mon Sep 17 00:00:00 2001
From: Zoltan Kis
Date: Thu, 30 Oct 2025 21:03:24 +0100
Subject: [PATCH 5/5] Remove cpuFallbackActive for now

Signed-off-by: Zoltan Kis
---
 device-selection-explainer.md | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/device-selection-explainer.md b/device-selection-explainer.md
index e0144b92..7c15df55 100644
--- a/device-selection-explainer.md
+++ b/device-selection-explainer.md
@@ -239,7 +239,6 @@ The following [proposal](https://github.com/webmachinelearning/webnn/issues/815#
 - Add a context creation option/hint (e.g. `accelerated: true`) for expressing the app's preference for NPU and/or GPU accelerated ["massively parallel"](https://en.wikipedia.org/wiki/Massively_parallel) processing (MPP).
 Note that in [certain use cases](https://www.w3.org/2025/09/25-webmachinelearning-minutes.html) applications might prefer CPU inference; therefore, specifying `accelerated: false` has legitimate use cases as well.
 - Add a context property named `"accelerated"` with possible values: `false` (for likely no support for either GPU or NPU), and `true` (e.g. fully controlled by the underlying platform which makes a best effort for MPP, yet CPU fallback may occur).
-- Add a context property named `"cpuFallbackActive"` that may be polled for detecting CPU fallbacks. In the future, depending on developer feedback, this may be turned into an event.

 The following Web IDL changes are proposed:

 ```js
 partial dictionary MLContextOptions {
   boolean accelerated = true;
 };

 partial interface MLContext {
   readonly attribute boolean accelerated;
-  readonly attribute boolean cpuFallbackActive;
 };
 ```
 The behavior of [createContext()](https://webmachinelearning.github.io/webnn/#dom-ml-createcontext) is proposed to follow this policy:
-- Set the `accelerated` property to `false` when the platform could in principle provide massively parallel processing, which may or may not be available at the moment. Applications may poll this property, together with `MLContext::cpuFallbackActive`.
+- Set the `accelerated` property to `false` when the platform could in principle provide massively parallel processing, which may or may not be available at the moment. Applications may poll this property.

 In the future, more policy options could be considered, for instance:
 - Return an error [in step 4](https://webmachinelearning.github.io/webnn/#create-a-context) if the context option `accelerated` has been set to `true`, but the platform cannot provide massively parallel processing at all.
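+
+With `cpuFallbackActive` removed, a minimal usage sketch of the remaining surface could look like this:
+
+```js
+// Sketch: the remaining API is a boolean option plus a boolean property.
+const context = await navigator.ml.createContext({ accelerated: true });
+if (!context.accelerated) {
+  // acceleration is likely unavailable; consider a CPU-oriented code path
+}
+```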