Skip to content

Towards unified device & host tasks#299

Open
devreal wants to merge 8 commits intoTESSEorg:masterfrom
devreal:device-task-for-host-master
Open

Towards unified device & host tasks#299
devreal wants to merge 8 commits intoTESSEorg:masterfrom
devreal:device-task-for-host-master

Conversation

@devreal
Copy link
Contributor

@devreal devreal commented Oct 28, 2024

We now template the ttg::device::Task on the ExecutionSpace so that we can determine whether it's a host or device task based on the space. We can then optimize away the select and kernel-wait suspension points. We could remove the send suspension point but we use coroutines for storing the final sends anyway and we don't have access to the task return type in ttg::device::send().

This allows tasks to be written once for both host and device without duplicating much of the code. Host tasks that are not coroutines will continue to be supported.

@devreal devreal requested a review from evaleev October 28, 2024 21:16
@devreal
Copy link
Contributor Author

devreal commented Oct 28, 2024

I should also mention that this could allow us to take multiple callables in ttg::make_tt and identify their execution space by the return type, laying the ground work for multi-implementation tasks in the future.

@evaleev
Copy link
Contributor

evaleev commented Oct 29, 2024

I think it looks fine. The only concern is the terminology ... dev-host is awkward ... perhaps this is the right time to revise the terminology (e.g. ttg::device::Task<> -> ttg::coTask<>?)

@devreal devreal force-pushed the device-task-for-host-master branch from b31e2e1 to 491b6c5 Compare November 8, 2024 02:26
We now template the `ttg::device::Task` on the `ExecutionSpace` so that
we can determine whether it's a host or device task based on the space.
We can then optimize away the select and kernel-wait suspension points.
We could remove the send suspension point but we use coroutines for
storing the final sends anyway and we don't have access to the
task return type in `ttg::device::send()`.

This allows tasks to be written once for both host and device without
duplicating much of the code. Host tasks that are not coroutines will
continue to be supported.

Signed-off-by: Joseph Schuchart <joseph.schuchart@stonybrook.edu>
Signed-off-by: Joseph Schuchart <joseph.schuchart@stonybrook.edu>
@devreal devreal force-pushed the device-task-for-host-master branch from 491b6c5 to 91e1d3d Compare November 8, 2024 02:54
@devreal
Copy link
Contributor Author

devreal commented Nov 8, 2024

perhaps this is the right time to revise the terminology (e.g. ttg::device::Task<> -> ttg::coTask<>?)

I'm not sure about that. We probably don't want to make coroutine tasks the default. So what are they? For me, these tasks are still device tasks but now we have the ability to execute the same task structure on the host. I agree that the naming is a bit funny though.

@therault
Copy link
Contributor

therault commented Nov 8, 2024

AsyncTask? Just to make it independent of the technique/implementation used, and try to convey that it can be used for something else than devices?

That being said, coTask (or co_task to keep a coroutine-like syntax?) conveys the idea that the task itself is a coroutine, which is maybe more helpful for the user.

Signed-off-by: Joseph Schuchart <joseph.schuchart@stonybrook.edu>
@evaleev
Copy link
Contributor

evaleev commented Nov 8, 2024

I'm not sure about that. We probably don't want to make coroutine tasks the default. So what are they? For me, these tasks are still device tasks but now we have the ability to execute the same task structure on the host. I agree that the naming is a bit funny though.

I agree we don't want to make coTask the default. But it's extremely confusing to refer to them as device task. I would prefer a name that does not refer to device to avoid misleading ppl.

@evaleev
Copy link
Contributor

evaleev commented Nov 8, 2024

AsyncTask? Just to make it independent of the technique/implementation used, and try to convey that it can be used for something else than devices?

To me the word "task" already implies "asynchrony" of execution, by decoupling statement of what to do from when to do it (i.e. after submitting task there is no way to ensure that work has completed unless you await for the result/completion). So AsyncTask sounds a bit redundant to me.

That being said, coTask (or co_task to keep a coroutine-like syntax?) conveys the idea that the task itself is a coroutine, which is maybe more helpful for the user.

I like co_task ... and we are not the only ones who thought of it: e.g. https://llvm.org/devmtg/2024-04/slides/QuickTalks/Saxena-MitigatingLifetimeIssues.pdf ... google c++ "co_task" for more.

Perhaps what we need is to make cotasks and tasks more symmetric?

using Task = void;

make_tt([]() -> Task, ...)

Device ID alone does not uniquely identify a device.
The host always has ID 0.

Signed-off-by: Joseph Schuchart <joseph.schuchart@stonybrook.edu>
Host tasks only suspend at the very end. We perfom all
communication and then destroy the coroutine handle because
there is no reason to keep it around. This may enable compiler
optimizations and enables backends that otherwise do not handle
device tasks to work with host-enabled device tasks.

Yes, this needs renaming.

Signed-off-by: Joseph Schuchart <joseph.schuchart@stonybrook.edu>
Signed-off-by: Joseph Schuchart <joseph.schuchart@stonybrook.edu>
Signed-off-by: Joseph Schuchart <joseph.schuchart@stonybrook.edu>
@devreal
Copy link
Contributor Author

devreal commented Dec 11, 2024

Proposal from the EPEXA meeting: ttg::CoTask<ttg::ExecutionSpace::CUDA> for CUDA or ttg::CoTask<ttg::ExecutionSpace::Host> for host execution.

This was decided at the December '24 EPEXA meeting. We need to figure out
what to do with the existing resumable task.

Signed-off-by: Joseph Schuchart <joseph.schuchart@stonybrook.edu>
#include "chrono.h"

#if defined(TTG_HAVE_CUDA)
#if defined(CHAIN_CUDA)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the difference between CHAIN_CUDA/HIP and ENABLE_CUDA/HIP?

#include "../devblas_helper.h"

#if (defined(TTG_ENABLE_CUDA) || defined(TTG_ENABLE_HIP))
#if (defined(TTG_ENABLE_CUDA) || defined(TTG_ENABLE_HIP) || defined(TTG_ENABLE_DEV_HOST))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Naming is inconsistent, TTG_ENABLE_CPU ?


#include "cuda_kernel.h"

#if defined(TTG_HAVE_CUDA)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shouldn't that be TTG_ENABLE_CUDA, or CHAIN_CUDA?

template <typename Key, ttg::Runtime Runtime = ttg::ttg_runtime>
inline detail::send_t sendk(std::size_t i, const Key& key) {
auto *terminal_ptr = ttg::detail::get_out_terminal<Key, void>(i, "ttg::device::send(i, key, value)");
return detail::send_t{detail::sendk_coro(key, *terminal_ptr)};
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

key parameter moves from 2nd to first parameter depending on level.

@evaleev evaleev added this to the 1.0 milestone May 23, 2025
@evaleev evaleev force-pushed the master branch 4 times, most recently from a2878ce to a49eb91 Compare November 24, 2025 13:22
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants