Have there been efforts regarding a cblas_sgemm_strided_batched API?

See https://github.com/markusheimerl/attention/blob/1316628d97b4a626725607c0a9973f5206e7277b/attention.c#L208

I would like to ask whether there have been any efforts to introduce a `cblas_sgemm_strided_batched` API in OpenBLAS.

A strided batched API would let the library itself parallelize along the batch dimension in cases where each individual GEMM is a two-dimensional problem and the batch dimension effectively adds a third axis.

At the moment, I am working around this by placing an OpenMP `parallel for` loop around the batch dimension. However, this requires setting OpenBLAS threads to 1 and OpenMP threads to 4 (instead of simply setting OpenBLAS threads to 4), which feels like an unnecessary and somewhat hacky configuration.
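To make the request concrete, here is a minimal sketch of the semantics I have in mind, written as plain C so it does not depend on any BLAS build. The function name `sgemm_strided_batched_ref`, its parameter order, and the row-major, no-transpose restriction are all assumptions for illustration, not an existing OpenBLAS interface. The `#pragma omp parallel for` over the batch index mirrors my current workaround; in the real code the inner loop nest is a single `cblas_sgemm` call with OpenBLAS limited to one thread.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical reference semantics (row-major, no transposes):
 * for each b in [0, batch):  C_b = alpha * A_b * B_b + beta * C_b
 * where A_b = A + b * stride_a is M x K,
 *       B_b = B + b * stride_b is K x N,
 *       C_b = C + b * stride_c is M x N. */
void sgemm_strided_batched_ref(int M, int N, int K, float alpha,
                               const float *A, int lda, size_t stride_a,
                               const float *B, int ldb, size_t stride_b,
                               float beta,
                               float *C, int ldc, size_t stride_c,
                               int batch)
{
    /* Leading dimensions must cover a full row in row-major layout. */
    assert(lda >= K && ldb >= N && ldc >= N);

    /* One independent GEMM per batch entry; the pragma is ignored
     * (serial loop) when compiled without OpenMP support. */
    #pragma omp parallel for
    for (int b = 0; b < batch; b++) {
        const float *Ab = A + (size_t)b * stride_a;
        const float *Bb = B + (size_t)b * stride_b;
        float *Cb = C + (size_t)b * stride_c;

        /* Naive stand-in for a single-threaded cblas_sgemm call. */
        for (int i = 0; i < M; i++) {
            for (int j = 0; j < N; j++) {
                float acc = 0.0f;
                for (int k = 0; k < K; k++)
                    acc += Ab[i * lda + k] * Bb[k * ldb + j];
                /* As in BLAS, C is not read when beta is zero. */
                Cb[i * ldc + j] = (beta == 0.0f)
                    ? alpha * acc
                    : alpha * acc + beta * Cb[i * ldc + j];
            }
        }
    }
}
```

With a native strided batched entry point, the library could decide how to split its threads between the batch axis and the individual GEMMs, so the caller would not need to coordinate two thread settings.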

An official API for strided batched GEMM would simplify this workflow significantly and provide a cleaner, more robust solution.


Have there been efforts regarding a cblas_sgemm_strided_batched API? #5447
