micro benchmarking assign expressions on all platforms #249

There are some backend / hardware combinations where an explicit launch kernel performing a simple 1D operation, like `a[i] = 2 * b[i]`, is faster than the equivalent gtensor expression, `a = 2 * b`. This needs to be explored further, with optimization techniques considered and, where possible, reproducers sent to GPU vendors (ideally ported to the vendor's underlying programming model so that they do not depend on all of gtensor).
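The shape of such a comparison can be sketched on the host without gtensor. This is a minimal illustration under stated assumptions, not gtensor's implementation: `Scaled`, `assign_explicit`, `assign_expr`, and `time_per_call` are hypothetical names, and `std::vector` stands in for device arrays.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a lazy expression such as `2 * b`:
// it holds a reference to b and computes element i on demand.
struct Scaled {
  double factor;
  const std::vector<double>& b;
  double operator()(std::size_t i) const { return factor * b[i]; }
  std::size_t size() const { return b.size(); }
};

// Explicit-loop variant, analogous to a hand-written kernel: a[i] = 2 * b[i].
void assign_explicit(std::vector<double>& a, const std::vector<double>& b) {
  assert(a.size() == b.size());
  for (std::size_t i = 0; i < a.size(); ++i) {
    a[i] = 2.0 * b[i];
  }
}

// Expression variant, analogous to a = 2 * b: a generic assign that
// evaluates the expression object element by element.
template <typename Expr>
void assign_expr(std::vector<double>& a, const Expr& e) {
  assert(a.size() == e.size());
  for (std::size_t i = 0; i < a.size(); ++i) {
    a[i] = e(i);
  }
}

// Average wall-clock seconds per call over `reps` calls, after one warm-up call.
template <typename F>
double time_per_call(F&& f, int reps) {
  f();  // warm-up, excluded from the measurement
  auto t0 = std::chrono::steady_clock::now();
  for (int r = 0; r < reps; ++r) {
    f();
  }
  auto t1 = std::chrono::steady_clock::now();
  return std::chrono::duration<double>(t1 - t0).count() / reps;
}
```

Calling `time_per_call([&] { assign_explicit(a, b); }, reps)` and the same with `assign_expr(a, Scaled{2.0, b})` gives the two numbers to compare. On a GPU backend the timed callable would also need to synchronize with the device before the timer stops, since kernel launches are typically asynchronous.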

There are also potential issues when the sizes of the array dimensions are not multiples of the warp size of the underlying architecture.
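To pick benchmark sizes that exercise this case, it can help to compute how many lanes of the last warp are left without work. The helper below is a hypothetical sketch (`warp_coverage` is not part of gtensor), and the warp size is passed in rather than assumed, since it is commonly 32 on NVIDIA GPUs and 64 on many AMD GPUs.

```cpp
#include <cassert>
#include <cstddef>

struct WarpCoverage {
  std::size_t warps;       // warps needed to cover n elements
  std::size_t idle_lanes;  // lanes in the last warp that have no element
};

// Hypothetical helper: how a 1D extent of n elements maps onto warps.
WarpCoverage warp_coverage(std::size_t n, std::size_t warp_size) {
  assert(warp_size > 0);
  std::size_t warps = (n + warp_size - 1) / warp_size;  // ceiling division
  return {warps, warps * warp_size - n};
}
```

For example, `warp_coverage(1024, 32)` reports no idle lanes, while `warp_coverage(1000, 32)` reports 24, so benchmarking both sizes would contrast an aligned extent with an unaligned one.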

See #248, which adds benchmarks for exploring this.
