micro benchmarking assign expressions on all platforms #249

@bd4

Description

On some backend / hardware combinations, an explicit launch kernel performing a simple 1d operation, like a[i] = 2 * b[i], is faster than the equivalent gtensor expression, a = 2 * b. This needs to be explored further and optimization techniques considered. Reproducers could also be sent to GPU vendors, ideally ported to the vendor's underlying programming model without depending on all of gtensor, when possible.
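To make the comparison concrete, here is a minimal host-only sketch of the kind of timing harness such a micro-benchmark needs. It deliberately does not use gtensor: `explicit_loop` and `whole_array` are hypothetical stand-ins for the launch-kernel form and the expression form, and `best_of` is a helper invented for this example. On a GPU backend the timed callable would also have to synchronize the device before the clock stops, which is not shown here.

```cpp
#include <algorithm>
#include <chrono>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical helper (not part of gtensor): run fn() `reps` times and
// return the fastest wall-clock time in seconds. Taking the minimum
// reduces noise from warm-up and scheduling.
template <typename F>
double best_of(int reps, F&& fn)
{
  double best = std::numeric_limits<double>::max();
  for (int r = 0; r < reps; ++r) {
    auto t0 = std::chrono::steady_clock::now();
    fn();
    auto t1 = std::chrono::steady_clock::now();
    best = std::min(best, std::chrono::duration<double>(t1 - t0).count());
  }
  return best;
}

// Stand-in for the explicit kernel form: a[i] = 2 * b[i]
void explicit_loop(std::vector<double>& a, const std::vector<double>& b)
{
  for (std::size_t i = 0; i < a.size(); ++i) {
    a[i] = 2 * b[i];
  }
}

// Stand-in for the whole-array expression form: a = 2 * b
void whole_array(std::vector<double>& a, const std::vector<double>& b)
{
  std::transform(b.begin(), b.end(), a.begin(),
                 [](double x) { return 2 * x; });
}
```

A caller would time both forms on the same input, for example `best_of(10, [&] { explicit_loop(a, b); })` against `best_of(10, [&] { whole_array(a, b); })`, and check that the two outputs match before comparing times.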

There are also potential issues when the sizes of the array dimensions are not multiples of the warp size of the underlying architecture.
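As a rough illustration of the warp-size concern, the sketch below computes how many threads in the last warp have no element to process when one thread handles one element. The function names are hypothetical and the one-thread-per-element mapping is an assumption; the warp size is passed as a parameter because it differs by hardware (32 on NVIDIA GPUs, 64 for wavefronts on many AMD GPUs).

```cpp
#include <cstddef>

// Number of warps needed to cover n elements, assuming one thread per
// element (an assumption for this sketch).
constexpr std::size_t warps_needed(std::size_t n, std::size_t warp_size)
{
  return (n + warp_size - 1) / warp_size;
}

// Threads in the final warp that have no element to process.
constexpr std::size_t idle_tail_threads(std::size_t n, std::size_t warp_size)
{
  return warps_needed(n, warp_size) * warp_size - n;
}
```

A benchmark could use this to pick paired sizes, such as one that is an exact multiple of the warp size and one just above it, to see whether the partially filled tail warp affects the two forms differently.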

See #248 which adds benchmarks for exploring this.
