From 66c8acea1bc460d132786e2d8504321df818765a Mon Sep 17 00:00:00 2001 From: Greg Lueck Date: Tue, 7 Oct 2025 17:29:54 -0400 Subject: [PATCH 1/5] [SYCL][Doc] Add spec to record an event Add a proposed extension specification which allows an application to reuse the same event object in multiple command submissions, rather than creating a new event for each submission. --- .../sycl_ext_oneapi_record_event.asciidoc | 213 ++++++++++++++++++ 1 file changed, 213 insertions(+) create mode 100644 sycl/doc/extensions/proposed/sycl_ext_oneapi_record_event.asciidoc diff --git a/sycl/doc/extensions/proposed/sycl_ext_oneapi_record_event.asciidoc b/sycl/doc/extensions/proposed/sycl_ext_oneapi_record_event.asciidoc new file mode 100644 index 0000000000000..5b79d3ad4c880 --- /dev/null +++ b/sycl/doc/extensions/proposed/sycl_ext_oneapi_record_event.asciidoc @@ -0,0 +1,213 @@ += sycl_ext_oneapi_record_event + +:source-highlighter: coderay +:coderay-linenums-mode: table + +// This section needs to be after the document title. +:doctype: book +:toc2: +:toc: left +:encoding: utf-8 +:lang: en +:dpcpp: pass:[DPC++] +:endnote: —{nbsp}end{nbsp}note + +// Set the default source code type in this document to C++, +// for syntax highlighting purposes. This is needed because +// docbook uses c++ and html5 uses cpp. +:language: {basebackend@docbook:c++:cpp} + + +== Notice + +[%hardbreaks] +Copyright (C) 2025 Intel Corporation. All rights reserved. + +Khronos(R) is a registered trademark and SYCL(TM) and SPIR(TM) are trademarks +of The Khronos Group Inc. OpenCL(TM) is a trademark of Apple Inc. used by +permission by Khronos. + + +== Contact + +To report problems with this extension, please open a new issue at: + +https://github.com/intel/llvm/issues + + +== Dependencies + +This extension is written against the SYCL 2020 revision 10 specification. +All references below to the "core SYCL specification" or to section numbers in +the SYCL specification refer to that revision. + +This extension also depends on the following other SYCL extensions: + +* link:../experimental/sycl_ext_oneapi_enqueue_functions.asciidoc[ + sycl_ext_oneapi_enqueue_functions] + + +== Status + +This is a proposed extension specification, intended to gather community +feedback. +Interfaces defined in this specification may not be implemented yet or may be in +a preliminary state. +The specification itself may also change in incompatible ways before it is +finalized. +*Shipping software products should not rely on APIs defined in this +specification.* + + +== Overview + +This extension adds the ability to reuse the same `event` object in multiple +command submissions, rather than creating a new event for each submission. +This pattern may perform better on some implementations because fewer event +objects need to be created and destroyed. +The pattern may also be more familiar to users porting CUDA code to SYCL. + + +== Specification + +=== Feature test macro + +This extension provides a feature-test macro as described in the core SYCL +specification. An implementation supporting this extension must predefine the +macro `SYCL_EXT_ONEAPI_RECORD_EVENT` to one of the values defined in the table +below. Applications can test for the existence of this macro to determine if +the implementation supports this feature, or applications can test the macro's +value to determine which of the extension's features the implementation +supports. + +[%header,cols="1,5"] +|=== +|Value +|Description + +|1 +|The APIs of this experimental extension are not versioned, so the + feature-test macro always has this value. +|=== + +=== New kernel launch property + +This extension adds a new kernel launch property: + +[source,c++] +---- +namespace sycl::ext::oneapi::experimental { + +struct record_event { + record_event(event* evt); (1) +}; +using record_event_key = record_event; + +} // namespace sycl::ext::oneapi::experimental +---- + +This property may be passed as a launch property to the following command +submission functions from +link:../experimental/sycl_ext_oneapi_enqueue_functions.asciidoc[ +sycl_ext_oneapi_enqueue_functions]: + +* The `submit` overload that takes parameters of type `queue` and `Properties`. +* The `single_task` overloads that take parameters of type `queue` and + `Properties`. +* The `parallel_for` overloads that take parameters of type `queue` and + `launch_config`. +* The `nd_launch` overloads that take parameters of type `queue` and + `launch_config`. +* The `memcpy` overload that takes parameters of type `queue` and `Properties`. +* The `copy` overload that takes parameters of type `queue` and `Properties`. +* The `memset` overload that takes parameters of type `queue` and `Properties`. +* The `fill` overload that takes parameters of type `queue` and `Properties`. +* The `prefetch` overload that takes parameters of type `queue` and + `Properties`. +* The `mem_advise` overload that takes parameters of type `queue` and + `Properties`. +* The `barrier` overload that takes parameters of type `queue` and `Properties`. +* The `partial_barrier` overload that takes parameters of type `queue` and + `Properties`. +* The `execute_graph` overload that takes parameters of type `queue` and + `Properties`. + +_Effects (1)_: Constructs a `record_event` property with a pointer to an `event` +object. +When `evt` is not null, the following happens. +The status of the event is disassociated with any previously submitted command, +and its status is reset to `info::event_command_status::submitted`. +For the `submit` function, this happens when the command group function returns +back to `submit`. +The event is then associated with the newly submitted command. +Assuming the event remains associated with this command, the event's status +changes according to the execution status of that command. +When `evt` is null, the property has no effect on the command submission. + +_Remarks:_ + +* If a recorded event is used as a command dependency for some other command + _C2_ (e.g. via `handler::depends_on`), the dependency is captured at the point + when _C2_ is submitted. + The dependency does _not_ change if the event is subsequently overwritten via + `record_event`. + +* If another host thread is blocked in a call to `event::wait` when that same + event is associated with a new command via `record_event`, it is unspecified + whether the call to `event:wait` unblocks. + + +== Example + +[source,c++] +---- +#include +namespace syclex = sycl::ext::oneapi::experimental; + +int main() { + sycl::queue q1; + sycl::queue q2; + sycl::event e; + sycl::range r{GLOBAL}; + + // Launch a command and record an event which tracks its completion. + syclex::launch_config cfg{r, syclex::record_event{&e}}; + syclex::parallel_for(q1, cfg, [=](sycl::item<> it) { /* ... */ }); + + // Launch another command which depends on that event and also + // record completion of this new command using the same event. + syclex::submit(q2, syclex::record_event{&e}, [&](sycl::handler cgh) { + cgh.depends_on(e); + syclex::parallel_for(cgh, r, [=](sycl::item<> it) { /* ... */ }); + }); + + // Wait for both commands to complete. + e.wait(); +} +---- + + +== Implementation notes + +It is expected that the implementation will often be able to reuse the +underlying backend event object when a SYCL event is passed to `record_event`. +However, there will still be cases when the implementation needs to release the +underlying backend event and create a new one. +For example, this will happen when the existing backend event is from a +different backend or from a different context than the command being submitted. +In these cases, we expect that the implementation will release the backend event +and associate the SYCL event with a new backend event. + + +== Issues + +* Is it possible to implement the behavior specified above regarding + `event::wait` and `record_event`? + What if the implementation needs to release the backend event when another + host thread is blocked in a call to `event:wait`? + Can we guarantee that the call to `event::wait` either remains blocked or + becomes unblocked? + (Either is fine.) + Or, is it possible that this will lead to a crash? + If a crash is possible, we need to weaken the specification to say this + condition is UB. From ce271b418bdd0cf92d72268795b29230c32880c9 Mon Sep 17 00:00:00 2001 From: Greg Lueck Date: Mon, 27 Oct 2025 09:02:29 -0400 Subject: [PATCH 2/5] Revamp specification Revamp the proposed specification to provide convenience APIs that are similar to CUDA's `cudaEventRecord` and `cudaStreamWaitEvent` because this is the immediate request from our customer. I think we do still want to add a `record_event` property, but I think we could add that separately as part of the KHR being proposed in KhronosGroup/SYCL-Docs#922, or as a separate oneapi extension based on that KHR. --- .../sycl_ext_oneapi_record_event.asciidoc | 213 -------- .../sycl_ext_oneapi_reusable_events.asciidoc | 460 ++++++++++++++++++ 2 files changed, 460 insertions(+), 213 deletions(-) delete mode 100644 sycl/doc/extensions/proposed/sycl_ext_oneapi_record_event.asciidoc create mode 100644 sycl/doc/extensions/proposed/sycl_ext_oneapi_reusable_events.asciidoc diff --git a/sycl/doc/extensions/proposed/sycl_ext_oneapi_record_event.asciidoc b/sycl/doc/extensions/proposed/sycl_ext_oneapi_record_event.asciidoc deleted file mode 100644 index 5b79d3ad4c880..0000000000000 --- a/sycl/doc/extensions/proposed/sycl_ext_oneapi_record_event.asciidoc +++ /dev/null @@ -1,213 +0,0 @@ -= sycl_ext_oneapi_record_event - -:source-highlighter: coderay -:coderay-linenums-mode: table - -// This section needs to be after the document title. -:doctype: book -:toc2: -:toc: left -:encoding: utf-8 -:lang: en -:dpcpp: pass:[DPC++] -:endnote: —{nbsp}end{nbsp}note - -// Set the default source code type in this document to C++, -// for syntax highlighting purposes. This is needed because -// docbook uses c++ and html5 uses cpp. -:language: {basebackend@docbook:c++:cpp} - - -== Notice - -[%hardbreaks] -Copyright (C) 2025 Intel Corporation. All rights reserved. - -Khronos(R) is a registered trademark and SYCL(TM) and SPIR(TM) are trademarks -of The Khronos Group Inc. OpenCL(TM) is a trademark of Apple Inc. used by -permission by Khronos. - - -== Contact - -To report problems with this extension, please open a new issue at: - -https://github.com/intel/llvm/issues - - -== Dependencies - -This extension is written against the SYCL 2020 revision 10 specification. -All references below to the "core SYCL specification" or to section numbers in -the SYCL specification refer to that revision. - -This extension also depends on the following other SYCL extensions: - -* link:../experimental/sycl_ext_oneapi_enqueue_functions.asciidoc[ - sycl_ext_oneapi_enqueue_functions] - - -== Status - -This is a proposed extension specification, intended to gather community -feedback. -Interfaces defined in this specification may not be implemented yet or may be in -a preliminary state. -The specification itself may also change in incompatible ways before it is -finalized. -*Shipping software products should not rely on APIs defined in this -specification.* - - -== Overview - -This extension adds the ability to reuse the same `event` object in multiple -command submissions, rather than creating a new event for each submission. -This pattern may perform better on some implementations because fewer event -objects need to be created and destroyed. -The pattern may also be more familiar to users porting CUDA code to SYCL. - - -== Specification - -=== Feature test macro - -This extension provides a feature-test macro as described in the core SYCL -specification. An implementation supporting this extension must predefine the -macro `SYCL_EXT_ONEAPI_RECORD_EVENT` to one of the values defined in the table -below. Applications can test for the existence of this macro to determine if -the implementation supports this feature, or applications can test the macro's -value to determine which of the extension's features the implementation -supports. - -[%header,cols="1,5"] -|=== -|Value -|Description - -|1 -|The APIs of this experimental extension are not versioned, so the - feature-test macro always has this value. -|=== - -=== New kernel launch property - -This extension adds a new kernel launch property: - -[source,c++] ----- -namespace sycl::ext::oneapi::experimental { - -struct record_event { - record_event(event* evt); (1) -}; -using record_event_key = record_event; - -} // namespace sycl::ext::oneapi::experimental ----- - -This property may be passed as a launch property to the following command -submission functions from -link:../experimental/sycl_ext_oneapi_enqueue_functions.asciidoc[ -sycl_ext_oneapi_enqueue_functions]: - -* The `submit` overload that takes parameters of type `queue` and `Properties`. -* The `single_task` overloads that take parameters of type `queue` and - `Properties`. -* The `parallel_for` overloads that take parameters of type `queue` and - `launch_config`. -* The `nd_launch` overloads that take parameters of type `queue` and - `launch_config`. -* The `memcpy` overload that takes parameters of type `queue` and `Properties`. -* The `copy` overload that takes parameters of type `queue` and `Properties`. -* The `memset` overload that takes parameters of type `queue` and `Properties`. -* The `fill` overload that takes parameters of type `queue` and `Properties`. -* The `prefetch` overload that takes parameters of type `queue` and - `Properties`. -* The `mem_advise` overload that takes parameters of type `queue` and - `Properties`. -* The `barrier` overload that takes parameters of type `queue` and `Properties`. -* The `partial_barrier` overload that takes parameters of type `queue` and - `Properties`. -* The `execute_graph` overload that takes parameters of type `queue` and - `Properties`. - -_Effects (1)_: Constructs a `record_event` property with a pointer to an `event` -object. -When `evt` is not null, the following happens. -The status of the event is disassociated with any previously submitted command, -and its status is reset to `info::event_command_status::submitted`. -For the `submit` function, this happens when the command group function returns -back to `submit`. -The event is then associated with the newly submitted command. -Assuming the event remains associated with this command, the event's status -changes according to the execution status of that command. -When `evt` is null, the property has no effect on the command submission. - -_Remarks:_ - -* If a recorded event is used as a command dependency for some other command - _C2_ (e.g. via `handler::depends_on`), the dependency is captured at the point - when _C2_ is submitted. - The dependency does _not_ change if the event is subsequently overwritten via - `record_event`. - -* If another host thread is blocked in a call to `event::wait` when that same - event is associated with a new command via `record_event`, it is unspecified - whether the call to `event:wait` unblocks. - - -== Example - -[source,c++] ----- -#include -namespace syclex = sycl::ext::oneapi::experimental; - -int main() { - sycl::queue q1; - sycl::queue q2; - sycl::event e; - sycl::range r{GLOBAL}; - - // Launch a command and record an event which tracks its completion. - syclex::launch_config cfg{r, syclex::record_event{&e}}; - syclex::parallel_for(q1, cfg, [=](sycl::item<> it) { /* ... */ }); - - // Launch another command which depends on that event and also - // record completion of this new command using the same event. - syclex::submit(q2, syclex::record_event{&e}, [&](sycl::handler cgh) { - cgh.depends_on(e); - syclex::parallel_for(cgh, r, [=](sycl::item<> it) { /* ... */ }); - }); - - // Wait for both commands to complete. - e.wait(); -} ----- - - -== Implementation notes - -It is expected that the implementation will often be able to reuse the -underlying backend event object when a SYCL event is passed to `record_event`. -However, there will still be cases when the implementation needs to release the -underlying backend event and create a new one. -For example, this will happen when the existing backend event is from a -different backend or from a different context than the command being submitted. -In these cases, we expect that the implementation will release the backend event -and associate the SYCL event with a new backend event. - - -== Issues - -* Is it possible to implement the behavior specified above regarding - `event::wait` and `record_event`? - What if the implementation needs to release the backend event when another - host thread is blocked in a call to `event:wait`? - Can we guarantee that the call to `event::wait` either remains blocked or - becomes unblocked? - (Either is fine.) - Or, is it possible that this will lead to a crash? - If a crash is possible, we need to weaken the specification to say this - condition is UB. diff --git a/sycl/doc/extensions/proposed/sycl_ext_oneapi_reusable_events.asciidoc b/sycl/doc/extensions/proposed/sycl_ext_oneapi_reusable_events.asciidoc new file mode 100644 index 0000000000000..9b74ec6ff9a79 --- /dev/null +++ b/sycl/doc/extensions/proposed/sycl_ext_oneapi_reusable_events.asciidoc @@ -0,0 +1,460 @@ += sycl_ext_oneapi_reusable_events + +:source-highlighter: coderay +:coderay-linenums-mode: table + +// This section needs to be after the document title. +:doctype: book +:toc2: +:toc: left +:encoding: utf-8 +:lang: en +:dpcpp: pass:[DPC++] +:endnote: —{nbsp}end{nbsp}note + +// Set the default source code type in this document to C++, +// for syntax highlighting purposes. This is needed because +// docbook uses c++ and html5 uses cpp. +:language: {basebackend@docbook:c++:cpp} + + +== Notice + +[%hardbreaks] +Copyright (C) 2025 Intel Corporation. All rights reserved. + +Khronos(R) is a registered trademark and SYCL(TM) and SPIR(TM) are trademarks +of The Khronos Group Inc. OpenCL(TM) is a trademark of Apple Inc. used by +permission by Khronos. + + +== Contact + +To report problems with this extension, please open a new issue at: + +https://github.com/intel/llvm/issues + + +== Dependencies + +This extension is written against the SYCL 2020 revision 10 specification. +All references below to the "core SYCL specification" or to section numbers in +the SYCL specification refer to that revision. + +This extension also depends on the following other SYCL extensions: + +* link:../experimental/sycl_ext_oneapi_properties.asciidoc[ + sycl_ext_oneapi_properties] + + +== Status + +This is a proposed extension specification, intended to gather community +feedback. +Interfaces defined in this specification may not be implemented yet or may be in +a preliminary state. +The specification itself may also change in incompatible ways before it is +finalized. +*Shipping software products should not rely on APIs defined in this +specification.* + + +== Overview + +This extension adds the ability to reuse the same `event` object to track +completion of multiple command submissions, rather than creating a new event for +each submission. +This pattern may perform better on some implementations because fewer event +objects need to be created and destroyed. +This pattern may also be more familiar to users porting CUDA code to SYCL. + +This extension also adds a way to construct an event that enables profiling +timing without enabling profiling on the entire queue. +This is more efficient on some platforms because only a subset of the events are +required to contain timestamp information. +It is also more convenient for use in libraries when the library wants to get +timing information for some commands, but the library does not control the +construction of the queue. + + +== Specification + +=== Feature test macro + +This extension provides a feature-test macro as described in the core SYCL +specification. +An implementation supporting this extension must predefine the macro +`SYCL_EXT_ONEAPI_REUSABLE_EVENTS` to one of the values defined in the table +below. +Applications can test for the existence of this macro to determine if the +implementation supports this feature, or applications can test the macro's value +to determine which of the extension's features the implementation supports. + +[%header,cols="1,5"] +|=== +|Value +|Description + +|1 +|The APIs of this experimental extension are not versioned, so the + feature-test macro always has this value. +|=== + +=== New functions to create an event + +This extension adds the following factory functions which can create an event +with a specific context. + +''' + +[source,c++] +---- +namespace sycl::ext::oneapi::experimental { + +template +event make_event(const context &ctxt, PropertyListT props = {}); + +} // namespace sycl::ext::oneapi::experimental +---- + +_Constraints:_ + +* `PropertyListT` is one of the properties listed below in section "New property + for creating an event"; or +* `is_property_list_v` is `true` and contains no properties other + than those listed below in section "New property for creating an event". + +_Returns:_ An event that is associated with context `ctxt`. + +_Throws:_ An `exception` with the `errc::feature_not_supported` error code if +`PropertyListT` contains an `enable_profiling` property that enables profiling +timestamps and if the platform containing `context` does not support creation +of such events as reported by the `event_profiling` information descriptor. + +''' + +[source,c++] +---- +namespace sycl::ext::oneapi::experimental { + +template +event make_event(PropertyListT props = {}); + +} // namespace sycl::ext::oneapi::experimental +---- + +_Constraints:_ + +* `PropertyListT` is one of the properties listed below in section "New property + for creating an event"; or +* `is_property_list_v` is `true` and contains no properties other + than those listed below in section "New property for creating an event". + +_Effects:_ Equivalent to: + +[source,c++,indent=2] +---- +sycl::device d; +sycl::context ctxt = d.get_platform().khr_get_default_context(); +return sycl::ext::oneapi::experimental::make_event(ctxt, props); +---- + +''' + +=== New property for creating an event + +This extension adds the following property, which can be used with `make_event`: + +''' + +[source,c++] +---- +namespace sycl::ext::oneapi::experimental { + +struct enable_profiling { + enable_profiling(bool enable); (1) +}; +using enable_profiling_key = enable_profiling; + +} // namespace sycl::ext::oneapi::experimental +---- + +This property controls whether the event captures profiling timestamps. + +_Effects_ (1): Creates a new `enable_profiling` property with a boolean value +indicating whether the event captures profiling timestamp information. + +''' + +=== New information descriptor for the platform class + +This extension adds the following information descriptor that can be used as the +`Param` template parameter to `platform::get_info`. + +''' + +[source,c++] +---- +namespace sycl::ext::oneapi::experimental::info::platform { + +struct event_profiling { + using return_type = bool; +}; + +} // namespace sycl::ext::oneapi::experimental::info::platform +---- + +_Remarks:_ Template parameter to `platform::get_info`. + +_Returns:_ The value `true` if this platform allows events to be created with +profiling enabled via `make_event`. + +''' + +=== New functions to enqueue event operations + +This extension adds the following free functions which submit operations related +to events onto a queue. + +''' + +[source,c++] +---- +namespace sycl::ext::oneapi::experimental { + +void enqueue_event_wait(queue q, const std::vector& evts); + +} // namespace sycl::ext::oneapi::experimental +---- + +_Effects:_ Enqueues a special barrier to `q` with the following semantic. +Any commands submitted to the queue after this barrier cannot begin execution +until all commands associated with `evts` have completed. + +_Remarks:_ The events in `evts` do _not_ need to have the same context as `q`. + +''' + +[source,c++] +---- +namespace sycl::ext::oneapi::experimental { + +void enqueue_event_signal(queue q, event& evt); + +} // namespace sycl::ext::oneapi::experimental +---- + +_Effects:_ The event `evt` is immediately disassociated with any previous +command, and its status is set to `info::event_command_status::submitted`. + +If the queue `q` is in-order (i.e. was constructed with +`property::queue::in_order`), this function enqueues a lightweight "tag" +operation that marks the current head of the queue. +The status of `evt` becomes `info::event_command_status::complete` when all +commands submitted prior to the tag have completed. + +If the queue `q` is out-of-order, it enqueues a command barrier, and any +commands submitted after this barrier cannot begin execution until all +previously submitted commands have completed. +After the barrier completes, a "tag" operation sets the status of `evt` to +`info::event_command_status::complete`. + +The event's timestamp information is also set if the queue `q` was created with +the `property::queue::enable_profiling` property or if the event `e` was created +with the `ext::oneapi::experimental::enable_profiling` property. +The event's `info::event_profiling::command_submit` timestamp reflects the time +at which `enqueue_event_signal` is called. +The event's `info::event_profiling::command_end` timestamp reflects the time at +which the event enters the "complete" state. +The event's `info::event_profiling::command_start` timestamp reflects the time +that the tag operation starts executing. +This timestamp is between the `info::event_profiling::command_submit` and +`info::event_profiling::command_end` timestamps. + +It is unspecified whether the event ever has the +`info::event_command_status::running` status. +Implementations are encouraged to transition the event directly from the +"submitted" status to the "complete" status and are encouraged to set the +"command_start" timestamp to the same value as the "command_end" timestamp. + +[_Note:_ In order to understand why the "command_start" and "command_end" +timestamps are encouraged to be the same, think of the tag operation as an empty +kernel with an implicit set of dependencies on all previous commands in the +same queue. +This theoretical kernel starts executing when the dependencies are resolved. +Since the kernel is empty, the end time is the same as the start time. +The "command_start" and "command_end" timestamps are not required to be the +same, though, in order to accommodate an implementation where the tag operation +is implemented by submitting an actual kernel, which has non-zero execution +time. +_{endnote}_] + +_Throws:_ An `exception` with the `errc::invalid` error code if `evt` and `q` +don't have the same context. + +=== Interaction with other event APIs + +An event _E_ created via `make_event` can be used as a command dependency (e.g. +via `handler::depends_on`) for a command submitted to some queue _Q_. +It is _not_ necessary for the context of _E_ to match the context of _Q_. + +If an event _E_ is used as a command dependency for some command _C_ (e.g. via +`handler::depends_on`), the dependency is captured at the point when _C_ is +submitted. +It is legal to reassociated the event _E_ to a new command via +`enqueue_event_signal` even before command _C_ completes. +Doing so does _not_ change the dependency for command _C_. + +If another host thread is blocked waiting for event _E_ to complete via +`event:wait` or `event::wait_and_throw` when event _E_ is reassociated with a +new command via `enqueue_event_signal`, the behavior of the `event:wait` or +`event::wait_and_throw` call is undefined. + + +== Examples + +=== Recording cross queue dependencies + +[source,c++] +---- +#include +namespace syclex = sycl::ext::oneapi::experimental; + +static constexpr size_t N = 1024; + +int main() { + sycl::queue q1{sycl::property::queue::in_order{}}; + sycl::queue q2{sycl::property::queue::in_order{}}; + sycl::event e = syclex::make_event(); + + // Launch a kernel on `q1` and then signal an event when the kernel completes. + syclex::parallel_for(q1, {N}, [=](sycl::item<> it) { /* ... */ }); + syclex::enqueue_event_signal(q1, e); + + // Add a dependency on `q2` which waits for the kernel on `q1` to complete. + // Then launch a kernel on `q2`. + syclex::enqueue_event_wait(q2, {e}); + syclex::parallel_for(q2, {N}, [=](sycl::item<> it) { /* ... */ }); + + // Reassociate the same event with the kernel on `q2`. + syclex::enqueue_event_signal(q2, e); + + // Wait for both commands to complete. + e.wait(); +} +---- + +=== Timing a sequence of kernels + +[source,c++] +---- +#include +#include +namespace syclex = sycl::ext::oneapi::experimental; + +static constexpr size_t N = 1024; + +int main() { + sycl::queue q; + + // This example creates a queue that does not enable profiling and then + // creates events that do enable profiling. This is an optional feature, so + // check if the platform supports this. + sycl::platform p = q.get_platform(); + if (!p.get_info) { + std::cout << "Cannot time kernels without enabling profiling on queue\n"; + return; + } + + sycl::event start = syclex::make_event(syclex::enable_profiling{true}); + sycl::event end = syclex::make_event(syclex::enable_profiling{true}); + + syclex::enqueue_event_signal(q, start); + sycl::parallel_for(q, {N}, [=](auto i) { /* first kernel */ }); + sycl::parallel_for(q, {N}, [=](auto i) { /* second kernel */ }); + syclex::enqueue_event_signal(q, end); + + q.wait(); + + uint64_t elapsed = + end.get_profiling_info() - + start.get_profiling_info(); + std::cout << "Execution time: " << elapsed << " (nanoseconds)\n"; +} +---- + + +== Implementation notes + +=== Mapping on Level Zero + +The APIs in this extension have a straightforward mapping to Level Zero when +using counter-based events: + +* The `make_event` function maps to `zeEventPoolCreate` and `zeEventCreate`. + If the `enable_profiling` property is specified to `make_event`, the event + should be created from a pool that has the + `ZE_EVENT_POOL_FLAG_KERNEL_TIMESTAMP` flag. +* Platforms on the Level Zero backend can return `true` for the + `event_profiling` information descriptor. +* The `enqueue_event_wait` function maps to `zeCommandListAppendWaitOnEvents`. +* The `enqueue_event_signal` function maps to `zeCommandListAppendSignalEvent`. +* The SYCL `event` passed to `enqueue_event_signal` will contain a + `ze_event_handle_t`. + Each call to `zeCommandListAppendSignalEvent` can pass this same + `ze_event_handle_t`, thus reusing the backend event across many calls. + +However, standard events to not directly map because a standard Level Zero event +_E_ cannot be passed to `zeCommandListAppendSignalEvent` until all previous +"signal" operations on _E_ have completed and all previous commands using _E_ +as a "wait event" (i.e. dependency) have completed. +To compensate for this, the runtime can track the lifetime of the backend event. +If the backend event associated with `enqueue_event_signal` is still in use, +the runtime can disassociate that backend event from the SYCL event, and +allocate a new backend event. + +=== Mapping on OpenCL + +The mapping is not so straightforward for OpenCL because OpenCL APIs return an +event when a command is submitted, rather than taking an event as input. + +* The `make_event` function has no direct mapping to OpenCL. + Instead, this function just creates SYCL `event` object with no underlying + OpenCL event. +* Platforms on the OpenCL backend are expected to return `false` for the + `event_profiling` information descriptor, unless we create some OpenCL + extension that makes this possible. +* The `enqueue_event_wait` function maps to either `clEnqueueMarkerWithWaitList` + (for in-order queues) or to `clEnqueueBarrierWithWaitList` (for out-of-order + queues). + In either case, the output `event` parameter should be NULL. + It would also be legal to call `clEnqueueBarrierWithWaitList` for both + in-order and out-of-order queues. +* The `enqueue_event_signal` function also maps to either + `clEnqueueMarkerWithWaitList` or `clEnqueueBarrierWithWaitList`, but in this + case the input `event_wait_list` parameter is empty and the output `event` + parameter is non-NULL. + These OpenCL functions return an output `cl_event`, and the SYCL runtime will + store this in the SYCL `event`. +* Because OpenCL is unable to reuse a `cl_event`, subsequent calls to + `enqueue_event_signal` first check to see if there is a `cl_event` from a + previous call attached to the SYCL `event`. + If so, the `cl_event` is released before calling `clEnqueueMarkerWithWaitList` + or `clEnqueueBarrierWithWaitList`. + +=== Host tasks + +Because host tasks are executed by the SYCL runtime, there can be cases where +a command _C_ is submitted at the SYCL level, but the command remains pending +inside the SYCL runtime until a host task completes. +(E.g. when command _C_ has a dependency on the host task.) +As a result, there may be cases when `enqueue_event_signal` must also leave the +"event signal" operation pending in the SYCL runtime, or when +`enqueue_event_wait` must leave the "event wait" operation pending in the SYCL +runtime. +In these cases, we expect that a backend event may not be associated with the +SYCL event until the pending operations are resolved in the runtime library. +This will likely cause the handling of events to be less efficient when host +tasks are submitted to the same queue as "native" commands like kernels or +copy operations, or when there are dependencies between host tasks and native +commands. From b6df113641d9cbd4e44920622210916c79021cdf Mon Sep 17 00:00:00 2001 From: Greg Lueck Date: Fri, 31 Oct 2025 09:33:24 -0400 Subject: [PATCH 3/5] Clarify overview --- .../proposed/sycl_ext_oneapi_reusable_events.asciidoc | 3 +++ 1 file changed, 3 insertions(+) diff --git a/sycl/doc/extensions/proposed/sycl_ext_oneapi_reusable_events.asciidoc b/sycl/doc/extensions/proposed/sycl_ext_oneapi_reusable_events.asciidoc index 9b74ec6ff9a79..6d26e6e5baa32 100644 --- a/sycl/doc/extensions/proposed/sycl_ext_oneapi_reusable_events.asciidoc +++ b/sycl/doc/extensions/proposed/sycl_ext_oneapi_reusable_events.asciidoc @@ -64,6 +64,9 @@ specification.* This extension adds the ability to reuse the same `event` object to track completion of multiple command submissions, rather than creating a new event for each submission. +An `event` object is still only able to track one command at a time, but this +extension allows an `event` to track a new command once the application no +longer needs to track the previous one. This pattern may perform better on some implementations because fewer event objects need to be created and destroyed. This pattern may also be more familiar to users porting CUDA code to SYCL. From a923a128046c760757d1eef2d38f07ddadcb7602 Mon Sep 17 00:00:00 2001 From: Greg Lueck Date: Wed, 12 Nov 2025 15:48:26 -0500 Subject: [PATCH 4/5] Change names to align with KHR proposal I've been sketching out how reusable events can be added to the proposed KHR sycl_khr_free_function_commands. Align the names here with the ones I plan to propose in the KHR. --- .../sycl_ext_oneapi_reusable_events.asciidoc | 46 ++++++++++--------- 1 file changed, 24 insertions(+), 22 deletions(-) diff --git a/sycl/doc/extensions/proposed/sycl_ext_oneapi_reusable_events.asciidoc b/sycl/doc/extensions/proposed/sycl_ext_oneapi_reusable_events.asciidoc index 6d26e6e5baa32..dcdbdb6fd8f84 100644 --- a/sycl/doc/extensions/proposed/sycl_ext_oneapi_reusable_events.asciidoc +++ b/sycl/doc/extensions/proposed/sycl_ext_oneapi_reusable_events.asciidoc @@ -37,7 +37,7 @@ https://github.com/intel/llvm/issues == Dependencies -This extension is written against the SYCL 2020 revision 10 specification. +This extension is written against the SYCL 2020 revision 11 specification. All references below to the "core SYCL specification" or to section numbers in the SYCL specification refer to that revision. @@ -225,16 +225,18 @@ to events onto a queue. ---- namespace sycl::ext::oneapi::experimental { -void enqueue_event_wait(queue q, const std::vector& evts); +void enqueue_wait_event(queue q, const event& evt); +void enqueue_wait_events(queue q, const std::vector& evts); } // namespace sycl::ext::oneapi::experimental ---- _Effects:_ Enqueues a special barrier to `q` with the following semantic. Any commands submitted to the queue after this barrier cannot begin execution -until all commands associated with `evts` have completed. +until all commands associated with `evt` or `evts` have completed. -_Remarks:_ The events in `evts` do _not_ need to have the same context as `q`. +_Remarks:_ The event `evt` and the events in `evts` do _not_ need to have the +same context as `q`. ''' @@ -242,7 +244,7 @@ _Remarks:_ The events in `evts` do _not_ need to have the same context as `q`. ---- namespace sycl::ext::oneapi::experimental { -void enqueue_event_signal(queue q, event& evt); +void enqueue_signal_event(queue q, event& evt); } // namespace sycl::ext::oneapi::experimental ---- @@ -266,7 +268,7 @@ The event's timestamp information is also set if the queue `q` was created with the `property::queue::enable_profiling` property or if the event `e` was created with the `ext::oneapi::experimental::enable_profiling` property. The event's `info::event_profiling::command_submit` timestamp reflects the time -at which `enqueue_event_signal` is called. +at which `enqueue_signal_event` is called. The event's `info::event_profiling::command_end` timestamp reflects the time at which the event enters the "complete" state. The event's `info::event_profiling::command_start` timestamp reflects the time @@ -305,12 +307,12 @@ If an event _E_ is used as a command dependency for some command _C_ (e.g. via `handler::depends_on`), the dependency is captured at the point when _C_ is submitted. It is legal to reassociated the event _E_ to a new command via -`enqueue_event_signal` even before command _C_ completes. +`enqueue_signal_event` even before command _C_ completes. Doing so does _not_ change the dependency for command _C_. If another host thread is blocked waiting for event _E_ to complete via `event:wait` or `event::wait_and_throw` when event _E_ is reassociated with a -new command via `enqueue_event_signal`, the behavior of the `event:wait` or +new command via `enqueue_signal_event`, the behavior of the `event:wait` or `event::wait_and_throw` call is undefined. @@ -332,15 +334,15 @@ int main() { // Launch a kernel on `q1` and then signal an event when the kernel completes. syclex::parallel_for(q1, {N}, [=](sycl::item<> it) { /* ... */ }); - syclex::enqueue_event_signal(q1, e); + syclex::enqueue_signal_event(q1, e); // Add a dependency on `q2` which waits for the kernel on `q1` to complete. // Then launch a kernel on `q2`. - syclex::enqueue_event_wait(q2, {e}); + syclex::enqueue_wait_event(q2, e); syclex::parallel_for(q2, {N}, [=](sycl::item<> it) { /* ... */ }); // Reassociate the same event with the kernel on `q2`. - syclex::enqueue_event_signal(q2, e); + syclex::enqueue_signal_event(q2, e); // Wait for both commands to complete. e.wait(); @@ -372,10 +374,10 @@ int main() { sycl::event start = syclex::make_event(syclex::enable_profiling{true}); sycl::event end = syclex::make_event(syclex::enable_profiling{true}); - syclex::enqueue_event_signal(q, start); + syclex::enqueue_signal_event(q, start); sycl::parallel_for(q, {N}, [=](auto i) { /* first kernel */ }); sycl::parallel_for(q, {N}, [=](auto i) { /* second kernel */ }); - syclex::enqueue_event_signal(q, end); + syclex::enqueue_signal_event(q, end); q.wait(); @@ -400,9 +402,9 @@ using counter-based events: `ZE_EVENT_POOL_FLAG_KERNEL_TIMESTAMP` flag. * Platforms on the Level Zero backend can return `true` for the `event_profiling` information descriptor. -* The `enqueue_event_wait` function maps to `zeCommandListAppendWaitOnEvents`. -* The `enqueue_event_signal` function maps to `zeCommandListAppendSignalEvent`. -* The SYCL `event` passed to `enqueue_event_signal` will contain a +* The `enqueue_wait_event` function maps to `zeCommandListAppendWaitOnEvents`. +* The `enqueue_signal_event` function maps to `zeCommandListAppendSignalEvent`. +* The SYCL `event` passed to `enqueue_signal_event` will contain a `ze_event_handle_t`. Each call to `zeCommandListAppendSignalEvent` can pass this same `ze_event_handle_t`, thus reusing the backend event across many calls. @@ -412,7 +414,7 @@ _E_ cannot be passed to `zeCommandListAppendSignalEvent` until all previous "signal" operations on _E_ have completed and all previous commands using _E_ as a "wait event" (i.e. dependency) have completed. To compensate for this, the runtime can track the lifetime of the backend event. -If the backend event associated with `enqueue_event_signal` is still in use, +If the backend event associated with `enqueue_signal_event` is still in use, the runtime can disassociate that backend event from the SYCL event, and allocate a new backend event. @@ -427,20 +429,20 @@ event when a command is submitted, rather than taking an event as input. * Platforms on the OpenCL backend are expected to return `false` for the `event_profiling` information descriptor, unless we create some OpenCL extension that makes this possible. -* The `enqueue_event_wait` function maps to either `clEnqueueMarkerWithWaitList` +* The `enqueue_wait_event` function maps to either `clEnqueueMarkerWithWaitList` (for in-order queues) or to `clEnqueueBarrierWithWaitList` (for out-of-order queues). In either case, the output `event` parameter should be NULL. It would also be legal to call `clEnqueueBarrierWithWaitList` for both in-order and out-of-order queues. -* The `enqueue_event_signal` function also maps to either +* The `enqueue_signal_event` function also maps to either `clEnqueueMarkerWithWaitList` or `clEnqueueBarrierWithWaitList`, but in this case the input `event_wait_list` parameter is empty and the output `event` parameter is non-NULL. These OpenCL functions return an output `cl_event`, and the SYCL runtime will store this in the SYCL `event`. * Because OpenCL is unable to reuse a `cl_event`, subsequent calls to - `enqueue_event_signal` first check to see if there is a `cl_event` from a + `enqueue_signal_event` first check to see if there is a `cl_event` from a previous call attached to the SYCL `event`. If so, the `cl_event` is released before calling `clEnqueueMarkerWithWaitList` or `clEnqueueBarrierWithWaitList`. @@ -451,9 +453,9 @@ Because host tasks are executed by the SYCL runtime, there can be cases where a command _C_ is submitted at the SYCL level, but the command remains pending inside the SYCL runtime until a host task completes. (E.g. when command _C_ has a dependency on the host task.) -As a result, there may be cases when `enqueue_event_signal` must also leave the +As a result, there may be cases when `enqueue_signal_event` must also leave the "event signal" operation pending in the SYCL runtime, or when -`enqueue_event_wait` must leave the "event wait" operation pending in the SYCL +`enqueue_wait_event` must leave the "event wait" operation pending in the SYCL runtime. In these cases, we expect that a backend event may not be associated with the SYCL event until the pending operations are resolved in the runtime library. From 3af50047cdad8a4983e1c788fcb58846a18c0022 Mon Sep 17 00:00:00 2001 From: Greg Lueck Date: Thu, 13 Nov 2025 15:03:51 -0500 Subject: [PATCH 5/5] Implementation note on counter vs. standard events --- .../sycl_ext_oneapi_reusable_events.asciidoc | 27 ++++++++++++++++++- 1 file changed, 26 insertions(+), 1 deletion(-) diff --git a/sycl/doc/extensions/proposed/sycl_ext_oneapi_reusable_events.asciidoc b/sycl/doc/extensions/proposed/sycl_ext_oneapi_reusable_events.asciidoc index dcdbdb6fd8f84..f5fced024e7d3 100644 --- a/sycl/doc/extensions/proposed/sycl_ext_oneapi_reusable_events.asciidoc +++ b/sycl/doc/extensions/proposed/sycl_ext_oneapi_reusable_events.asciidoc @@ -409,7 +409,7 @@ using counter-based events: Each call to `zeCommandListAppendSignalEvent` can pass this same `ze_event_handle_t`, thus reusing the backend event across many calls. -However, standard events to not directly map because a standard Level Zero event +However, standard events do not directly map because a standard Level Zero event _E_ cannot be passed to `zeCommandListAppendSignalEvent` until all previous "signal" operations on _E_ have completed and all previous commands using _E_ as a "wait event" (i.e. dependency) have completed. @@ -418,6 +418,31 @@ If the backend event associated with `enqueue_signal_event` is still in use, the runtime can disassociate that backend event from the SYCL event, and allocate a new backend event. +A counter-based event, however, cannot be used to signal a command in an +out-of-order queue. +Therefore, the implementation currently uses standard events for out-of-order +queues and counter-based events for in-order queues. +This presents a problem for the `make_event` function because we do not know +whether the event will be used with an in-order vs. an out-of-order queue at the +point when it is created. +One option is to require the application to pass a property to `make_event` +telling whether the event will be used to signal an in-order vs. an out-of-order +queue. +Doing this is less convenient for the user, though. +Instead, `make_event` can always create a counter-based event when the backend +is Level Zero. +If the application later uses the SYCL event to signal a command from an +out-of-order queue, the implementation can release the backend counter-based +event and allocate a standard-event instead. +This optimizes for the case of in-order queues and adds a bit of overhead to +the case when the queue is out-of-order. +However, this is consistent with our strategy to favor in-order queues as the +optimal path. +Alternatively, `make_event` could simply not allocate any backend event, +delaying this until the first time the event is used to signal a command. +At that point, we do know whether the event will signal an out-of-order vs. an +in order queue, so we can create the right type of backend event. + === Mapping on OpenCL The mapping is not so straightforward for OpenCL because OpenCL APIs return an