Open
Labels
good first issue, help wanted
Description
On some backend/hardware combinations, an explicit launch kernel performing a simple 1-d operation, such as a[i] = 2 * b[i], is faster than the equivalent gtensor expression, a = 2 * b. This needs further investigation, including which optimization techniques could close the gap. Where appropriate, reproducers should be sent to GPU vendors, ideally ported to the vendor's underlying programming model so that they do not depend on all of gtensor.
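To make the comparison concrete, here is a minimal host-only C++ sketch of the two forms being compared. It does not use the gtensor API: `array1d`, `scale_kernel`, `scaled_expr`, and `assign` are hypothetical stand-ins, with a plain loop playing the role of the launch kernel and a small lazy expression node playing the role of the expression path. It only illustrates that both forms compute the same result through different code paths, which is what a benchmark or vendor reproducer would have to isolate.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a 1-d array type (not the gtensor container).
struct array1d
{
  std::vector<double> data;
  explicit array1d(std::size_t n, double v = 0.0) : data(n, v) {}
  double& operator[](std::size_t i) { return data[i]; }
  double operator[](std::size_t i) const { return data[i]; }
  std::size_t size() const { return data.size(); }
};

// Explicit "kernel" form: one loop iteration per element, a[i] = 2 * b[i].
inline void scale_kernel(array1d& a, const array1d& b)
{
  for (std::size_t i = 0; i < b.size(); ++i) {
    a[i] = 2.0 * b[i];
  }
}

// Lazy expression node for `s * v`; nothing is computed until it is indexed.
struct scaled_expr
{
  double s;
  const array1d& v;
  double operator[](std::size_t i) const { return s * v[i]; }
  std::size_t size() const { return v.size(); }
};

inline scaled_expr operator*(double s, const array1d& v)
{
  return {s, v};
}

// Expression form: evaluate any indexable expression into `a`, a = 2 * b.
template <typename Expr>
void assign(array1d& a, const Expr& e)
{
  for (std::size_t i = 0; i < e.size(); ++i) {
    a[i] = e[i];
  }
}
```

Calling `scale_kernel(a1, b)` and `assign(a2, 2.0 * b)` on arrays of the same size should fill `a1` and `a2` with identical values. Any performance difference then comes from how the compiler handles the extra expression layer, not from the arithmetic.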
There are also potential issues when the sizes of the array dimensions are not multiples of the warp size of the underlying architecture.
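As a rough illustration of the warp-size point, the sketch below computes how many warps are needed to cover a 1-d array and how many lanes in the last warp have no element to process. The names `warp_coverage` and `launch_geometry` are hypothetical and not part of gtensor. The warp size is a parameter because it varies by architecture (32 and 64 are common values), and the sketch assumes a simple one-element-per-lane mapping.

```cpp
#include <cstddef>

// Hypothetical helper type: warps needed and idle lanes in the last warp.
struct launch_geometry
{
  std::size_t n_warps;
  std::size_t idle_lanes;
};

// Assumes one array element per lane and a 1-d mapping of lanes to indices.
inline launch_geometry warp_coverage(std::size_t n, std::size_t warp_size)
{
  // Round up so that a partial final warp is still counted.
  std::size_t n_warps = (n + warp_size - 1) / warp_size;
  // Lanes in the final warp that fall past the end of the array.
  std::size_t idle_lanes = n_warps * warp_size - n;
  return {n_warps, idle_lanes};
}
```

For example, `warp_coverage(1000, 32)` gives 32 warps with 24 idle lanes, while `warp_coverage(1024, 32)` gives 32 warps with none. Benchmarking sizes on both sides of such a boundary is one way to check whether the partial warp matters on a given backend.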
See #248 which adds benchmarks for exploring this.