Add a simple accelerator selection mechanism. #895
Conversation
Signed-off-by: Zoltan Kis <zoltan.kis@intel.com>
…d the poll CPU fallback status steps. Invoke it from graph.dispatch(). Signed-off-by: Zoltan Kis <zoltan.kis@intel.com>
@zolkis thank you for formalizing the group's current thinking into this PR! @huningxin @RafaelCintron, this spec PR is on the WebML WG Teleconference – 23 October 2025 agenda. Reviews, comments, and questions in this PR ahead of the meeting are appreciated. @handellm to check we remain aligned with Google Meet requirements. FYI @mtavenrath, who expressed interest in this space.
Seems good!
Co-authored-by: Reilly Grant <reillyeon@users.noreply.github.com>
1. Enqueue the following steps to |graph|.{{MLGraph/[[context]]}}.{{MLContext/[[timeline]]}}:
    1. Run these steps, but [=/abort when=] [=this=] [=MLContext/is lost=]:
-       1. Issue a compute request to |graph|.{{MLGraph/[[implementation]]}} given |inputs| and |outputs|.
+       1. Issue a compute request to |graph|.{{MLGraph/[[implementation]]}} given |inputs| and |outputs|, as well as |graph|.{{MLGraph/[[context]]}}.{{MLContext/[[powerPreference]]}} and |graph|.{{MLGraph/[[context]]}}.{{MLContext/[[accelerated]]}}.
I suppose the powerPreference and accelerated options should be used by the build steps rather than by dispatch?
These steps were meant for the dispatch phase, when the actual accelerators are selected.
If the underlying accelerators cannot be changed during dispatch, then yes, these could go into the build steps, as static preparation.
However, if dynamic selection is supported, they should also be included in the dispatch steps, which are the final decision point in dynamic execution.
I guess for now we could just move them to the build phase.
So here we have two options:
- If dynamic execution is not supported, include the CPU fallback checking in the build steps.
- Otherwise, include the CPU fallback checking in the graph dispatch steps.
Can we make such a distinction in the spec at this stage, or do we assume static execution for now and include the checks only in the build steps? (Later on that could be revisited.) We could track this in an issue, outside of this PR.
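To make the two options concrete, here is a rough usage sketch (not normative): it assumes the `accelerated` and `cpuFallbackActive` attributes proposed in this PR, and `namedOutputs`, `inputs` and `outputs` are placeholders created elsewhere:

```js
// Sketch only; assumes the attributes proposed in this PR.
const context = await navigator.ml.createContext({ accelerated: true });
const builder = new MLGraphBuilder(context);
// ... graph construction elided ...
const graph = await builder.build(namedOutputs);

// Option 1 (static execution): the fallback status is settled at build
// time, so it can be read once, right after build() resolves.
if (context.cpuFallbackActive) {
  // choose another inference path before dispatching any work
}

// Option 2 (dynamic execution): the status may change per dispatch, so
// it has to be re-checked after each dispatch() call.
context.dispatch(graph, inputs, outputs);
if (context.cpuFallbackActive) {
  // this particular inference may have (partly) run on the CPU
}
```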
index.bs
1. Run these steps, but [=/abort when=] [=this=] [=MLContext/is lost=]:
-   1. Issue a compute request to |graph|.{{MLGraph/[[implementation]]}} given |inputs| and |outputs|.
+   1. Issue a compute request to |graph|.{{MLGraph/[[implementation]]}} given |inputs| and |outputs|, as well as |graph|.{{MLGraph/[[context]]}}.{{MLContext/[[powerPreference]]}} and |graph|.{{MLGraph/[[context]]}}.{{MLContext/[[accelerated]]}}.
+   1. Run the steps to [=poll CPU fallback status=] for |graph|.{{MLGraph/[[context]]}}.
This step seems to be unnecessary, because the cpuFallbackActive getter already runs it?
Right, these would only be needed if there were an event (discussed earlier; we agreed that polling is enough for now).
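In other words, reading the attribute is itself the poll. A minimal sketch of the resulting usage pattern, assuming the attributes from this PR:

```js
// No separate poll API: the cpuFallbackActive getter runs the
// "poll CPU fallback status" steps itself when the attribute is read.
context.dispatch(graph, inputs, outputs);
const fellBack = context.cpuFallbackActive; // polling happens here
```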
index.bs
</summary>
1. If [=this=].{{MLContext/[[accelerated]]}} is `false`, then:
    1. Set [=this=].{{MLContext/[[cpuFallbackActive]]}} to `true` and return.
1. If the underlying execution device is available, then:
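For illustration only, a rough JavaScript transliteration of the visible part of these steps; `deviceAvailable()` is a made-up helper standing in for the "underlying execution device is available" check, and the body of that branch is truncated in the diff above:

```js
// Hypothetical transliteration, not implementation text; plain properties
// stand in for the [[accelerated]] and [[cpuFallbackActive]] internal slots.
function pollCpuFallbackStatus(context) {
  if (!context.accelerated) {
    context.cpuFallbackActive = true; // not accelerated: CPU by definition
    return;
  }
  if (deviceAvailable(context)) {
    // ... the remaining steps are truncated in the diff above ...
  }
}
```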
Is it worth adding a definition for "underlying execution device"?
They are mentioned in the device selection section, though no formal definition is given.
If we wanted to give one, it's important to stress that it is not a single device, but the final, possibly heterogeneous execution plan that maps specific parts of the model graph to the best available combination of accelerators at the exact moment of inference.
During the build phase, we should not select a device, but define preferences (e.g. a prioritized list of execution providers/delegates), which the runtime / underlying platform uses for the actual decisions.
This reopens the discussion on the relationship between the context and the underlying execution device(s). In light of past discussions, I think we should not refer to a single device here.
In general, we should bind the context not to a device, but to the execution preferences (the prioritized list of execution providers) mentioned above. Then a separate concept (internal slot) would be the actual execution plan at the moment of inference. The text formulation should allow for anything from a single device per context to heterogeneous sub-graph execution on different devices.
I think we could track that in a separate issue. In this PR, I have just removed the text "currently only the {{MLPowerPreference}} option" in line 751, and used the term from the device selection section in this algorithm.
For this PR, I will modify the text so that it is compatible with the explanation above.
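Purely illustrative (not proposed API), one way to picture the two concepts, i.e. the prioritized preferences the context is bound to versus the concrete plan resolved at inference time:

```js
// Illustrative data shapes only -- names and values are made up.
// 1) What the context would be bound to at creation time:
const executionPreferences = ['npu', 'gpu', 'cpu']; // prioritized providers

// 2) What the platform resolves at the moment of inference, possibly
//    mapping sub-graphs to different devices (heterogeneous execution):
const resolvedExecutionPlan = [
  { subgraph: 'convolution layers', device: 'npu' },
  { subgraph: 'custom postprocessing', device: 'cpu' },
];
```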
The {{MLContext}}'s processing type (CPU or massively parallel processing).
: <dfn>\[[cpuFallbackActive]]</dfn> of type {{boolean}}.
::
    The {{MLContext}}'s CPU fallback status, i.e. whether execution has fallen back to the CPU.
AFAIK, the major native ML runtimes, including Core ML, Windows ML (ONNX Runtime) and TFLite, enable CPU fallback by default. Some runtimes, e.g. ONNX Runtime, allow developers to disable CPU fallback explicitly through the session option disable_cpu_ep_fallback. Without CPU fallback, model compilation may fail if the accelerator cannot execute all ops. The Chromium prototype has a switch for that, but only for debugging purposes. What are the other cases where a WebNN implementation may set this to false?
Setting the CPU fallback option to false is for when the application wants to have an (error) indication if massively parallel execution is not guaranteed with high chance (not an exact thing, but among many contradicting options, it's good enough). The use case is laid out in issue #815; see e.g. this comment and the following discussion.
(Feel free to suggest other solutions.)
EDIT (w.r.t. where to check for CPU fallback): this use case would prefer an early warning of CPU fallback likelihood (to be able to choose another inference path), so the checks indeed make more sense in the build steps.
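A sketch of that early-warning flow, assuming the check lands in the build steps (`builder` and `namedOutputs` are placeholders created elsewhere):

```js
// Assumes cpuFallbackActive is already settled once build() resolves.
const context = await navigator.ml.createContext({ accelerated: true });
const graph = await builder.build(namedOutputs); // graph construction elided
if (context.cpuFallbackActive) {
  // early warning: execution would likely fall back to the CPU, so the
  // application can choose another inference path (e.g. a lighter model
  // or a different backend) before dispatching any work
}
```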
application wants to have an (error) indication if massively parallel execution is not guaranteed with high chance
How could an application indicate that? Should MLContextOptions add another property, something like boolean cpuFallback, defaulting to true? An application could then set contextOptions.cpuFallback to false for this use case.
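For concreteness, a sketch of what such an option could look like; it is hypothetical, and per the reply below this approach was discarded:

```js
// Hypothetical option -- NOT part of the spec (discarded, see the reply
// below): an explicit way to opt out of CPU fallback.
const context = await navigator.ml.createContext({
  accelerated: true,
  cpuFallback: false, // would request failure instead of silent fallback
});
```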
That was discussed in earlier calls (in the explainer-related discussions): exposing a context option for setting CPU fallback to false hits some constraints, and the use case could be accomplished with the accelerated option, hence that approach was discarded.
In #884 there is a code example for this use case:
```js
// create a context that should use massively parallel processing (e.g. GPU/NPU)
let context = await navigator.ml.createContext({accelerated: true});
if (context.accelerated) {
  // the context will mostly use GPU/NPU, but CPU fallback may happen
} else {
  // the platform tells it likely cannot provide NPU or GPU, so try something else
}

// create a context that should preferably use the NPU
context = await navigator.ml.createContext({accelerated: true, powerPreference: 'low-power'});
if (context.accelerated) {
  // NPU is likely used -- further requirements could be set by opSupportLimitsPerDevice
} else {
  // NPU is likely not available, and since the GPU needs high power, it is not used
}
```
Thanks for the code example. I understand an implementation should preferably use the GPU/NPU if the accelerated option is set to true. However, as I shared, CPU fallback is enabled by default in the major native ML runtimes. It's not clear to me how an implementation can tell that an application wants to disable the CPU fallback.
could be accomplished with the accelerated option
Do you mean the implementation should disable CPU fallback if the accelerated option is set to true? Then how could an application indicate that it is fine with CPU fallback while preferring GPU/NPU execution?
It's not clear to me how an implementation can tell that an application wants to disable the CPU fallback.
No, that would not be supported (it cannot be guaranteed). The use case is not to say "disable CPU fallback", but to say "I prefer massively parallel processing". So we don't expose disabling CPU fallback as a context option.
But applications can find out if CPU fallback is being used, so the original Meet use case is covered by this, and by the context option for accelerated processing.
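A sketch of that monitoring pattern, assuming the attributes from this PR (the interval is arbitrary):

```js
// Sketch: since there is no event, a Meet-style application would poll
// the fallback status during a session.
const monitor = setInterval(() => {
  if (context.cpuFallbackActive) {
    // e.g. reduce effect quality or switch to another backend, then stop
    clearInterval(monitor);
  }
}, 5000);
```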
…he steps checking CPU fallback Signed-off-by: Zoltan Kis <zoltan.kis@intel.com>
Fixes #815
As explained in #884, add context options and attributes/internal slots that can be used for conveying application hints w.r.t. the preferred acceleration type (CPU or massively parallel processing, i.e. NPU or GPU).
This is a minimal change; we might want to further refine the algorithms w.r.t. context power preferences and acceleration options (currently not addressed). That could be done in this PR, or in a separate follow-up PR.