Add sycl_khr_free_function_commands extension #922

Open
slawekptak wants to merge 77 commits into KhronosGroup:main from slawekptak:khr_free_function_commands_new

@slawekptak

This is a follow-up PR to #644, originally created by John Pennycook. All future work related to that PR will continue here. A new PR was needed in order to transfer ownership of the PR.

This extension provides an alternative mechanism for submitting commands to a device via free functions that require developers to opt in to the creation of event objects.

It also proposes alternative names for several commands (e.g., launch) and simplifies some concepts (e.g., by removing the need for the nd_range class).
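
To illustrate the opt-in idea only, here is a minimal host-only sketch in plain C++. It is a toy model, not the extension's API: `toy::queue`, `toy::event`, and `launch_with_event` are hypothetical names invented for this example, and the signature given to `launch` is an assumption (only the name `launch` comes from the description above). The point is that a member-function submission always produces an event, while the free function returns `void` unless the caller explicitly asks for an event.

```cpp
#include <cstddef>
#include <functional>
#include <vector>

namespace toy {

// Hypothetical stand-in for an event object; this is NOT sycl::event.
struct event { bool complete = false; };

// Hypothetical stand-in for a queue that runs work eagerly on the host.
struct queue {
    std::size_t events_created = 0;

    // Member-function style: every submission creates and returns an event.
    event submit(const std::function<void()>& work) {
        work();
        ++events_created;
        return event{true};
    }
};

// Free-function style (assumed signature): returns void, so no event is created.
inline void launch(queue&, std::size_t n,
                   const std::function<void(std::size_t)>& kernel) {
    for (std::size_t i = 0; i < n; ++i) kernel(i);
}

// Hypothetical opt-in variant for callers that do want an event.
inline event launch_with_event(queue& q, std::size_t n,
                               const std::function<void(std::size_t)>& kernel) {
    launch(q, n, kernel);
    ++q.events_created;
    return event{true};
}

} // namespace toy

// Performs three submissions and reports how many events were created.
inline std::size_t demo_event_count() {
    toy::queue q;
    std::vector<int> data(8, 1);
    q.submit([&] { data[0] = 2; });  // always creates an event
    toy::launch(q, data.size(), [&](std::size_t i) { data[i] += 1; });  // no event
    toy::event e = toy::launch_with_event(
        q, data.size(), [&](std::size_t i) { data[i] *= 2; });  // opted in
    return e.complete ? q.events_created : 0;
}
```

In this toy model, only two of the three submissions create an event; the plain `launch` call creates none.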

Pennycook and others added 30 commits October 17, 2024 15:00
Previous "0 or more" wording only made sense when reductions could be
optionally provided to functions like parallel_for; now that there are
dedicated *_reduce functions, at least one reduction is required.

"is" is more consistent with ISO C++ wording.
Co-authored-by: Greg Lueck <gregory.m.lueck@intel.com>

There is no need to constrain T here because T must be device-copyable in order
to construct the accessor passed as an argument.

Renaming sycl::nd_item is not a necessary part of the API redesign for
submitting work, so it should be moved to its own extension.

This will also give us more time to consider the design and naming of any
proposed replacement(s), including how they should interact with new
functionality proposed in other KHRs.

There are currently no backends that define interop for reductions,
so we can remove these functions for now. If we decide later that
these functions are necessary, we can release a revision of the KHR.
Co-authored-by: Andrey Alekseenko <al42and@gmail.com>

@tomdeakin
Contributor

The WG discussed this, and feel we need a solution for local memory in this KHR.

@PeterTh
PeterTh commented Oct 16, 2025

Regarding local memory: to me, it seems like the least invasive strategy (as in, it doesn't depend on many other changes) that fits with the current specification of this extension would be using requirements for local accessors - since it's a natural fit with how non-local accessors are proposed to be handled. A future extension for e.g. static work group memory could then make that superfluous where it applies.

gmlueck added a commit to gmlueck/llvm that referenced this pull request Oct 29, 2025
Revamp the proposed specification to provide convenience APIs that are
similar to CUDA's `cudaEventRecord` and `cudaStreamWaitEvent` because
this is the immediate request from our customer.

I think we do still want to add a `record_event` property, but I think
we could add that separately as part of the KHR being proposed in
KhronosGroup/SYCL-Docs#922, or as a separate oneapi extension based on
that KHR.
@TApplencourt
Contributor

Agree with @PeterTh. I would like to keep the changes in this PR "minimal" so we can merge it, and then we can discuss new features. I want to avoid feature creep. This PR is immensely useful as is, so there is no need to do everything in one go :)

The function names for memory operations now follow
the "enqueue_*" pattern, to indicate that these operations
are added to the queue and not executed immediately.
- Changed the return type of the functions to void (signal_event
should be used to track completion).
- Added the signal_event, wait_event and wait_events structs
to be used with the requirements object.
- Added the following functions: make_event, enqueue_signal_event,
enqueue_wait_event, enqueue_wait_events, enqueue_barrier.
- Removed the following functions: command_barrier, event_barrier.
- Updated the code example.
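
As a rough illustration of the "enqueue, don't execute" behavior described in this commit message, here is a host-only toy model in plain C++. It is a sketch under stated assumptions, not the KHR's API: the names `make_event` and `enqueue_signal_event` are taken from the list above, but their signatures here are guesses, and `toy::queue`, `toy::event_state`, and `enqueue_copy` are hypothetical helpers invented for the example.

```cpp
#include <deque>
#include <functional>
#include <memory>
#include <utility>
#include <vector>

namespace toy {

// Hypothetical event state; this is NOT sycl::event.
struct event_state { bool signaled = false; };
using event = std::shared_ptr<event_state>;

// Hypothetical in-order queue that defers all work until wait() is called.
struct queue {
    std::deque<std::function<void()>> pending;

    // Runs every pending command in submission order.
    void wait() {
        while (!pending.empty()) {
            auto cmd = std::move(pending.front());
            pending.pop_front();
            cmd();
        }
    }
};

// Assumed signature: creates an unsignaled event.
inline event make_event() { return std::make_shared<event_state>(); }

// Hypothetical memory operation: returns void and only records the copy.
inline void enqueue_copy(queue& q, const std::vector<int>& src,
                         std::vector<int>& dst) {
    q.pending.push_back([&src, &dst] { dst = src; });
}

// Assumed signature: the event is signaled once earlier commands have run.
inline void enqueue_signal_event(queue& q, event e) {
    q.pending.push_back([e] { e->signaled = true; });
}

} // namespace toy

// Checks that nothing runs at enqueue time and everything runs after wait().
inline bool demo_deferred() {
    toy::queue q;
    std::vector<int> src{1, 2, 3}, dst;
    toy::event done = toy::make_event();
    toy::enqueue_copy(q, src, dst);
    toy::enqueue_signal_event(q, done);
    bool before = !done->signaled && dst.empty();  // nothing has executed yet
    q.wait();
    bool after = done->signaled && dst == src;     // copy ran, then the signal
    return before && after;
}
```

Because the enqueue functions return `void`, completion is tracked through the separately enqueued signal rather than through a return value, which mirrors the intent stated in the commit message.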
slawekptak and others added 6 commits February 5, 2026 08:57
Co-authored-by: Greg Lueck <gregory.m.lueck@intel.com>
Co-authored-by: Greg Lueck <gregory.m.lueck@intel.com>
Co-authored-by: Greg Lueck <gregory.m.lueck@intel.com>
Co-authored-by: Greg Lueck <gregory.m.lueck@intel.com>

@keryell
Member

keryell left a comment

Thanks!

@abagusetty

Thanks a bunch for the efforts to push this to the finish line.

As I recall, prefetch_host is used sparingly at the moment, but the few apps that do use it (qmcpack, exachem, etc.) use it at production scale.

My two cents are in favor of free-function standardization of this API, just to reduce the cycles over non-free

@gmlueck gmlueck mentioned this pull request Feb 25, 2026