Have there been efforts regarding a cblas_sgemm_strided_batched API? #5447

@markusheimerl

See https://github.com/markusheimerl/attention/blob/1316628d97b4a626725607c0a9973f5206e7277b/attention.c#L208

I would like to ask whether there have been any efforts to introduce a cblas_sgemm_strided_batched API in OpenBLAS.

A strided batched API would make it possible to parallelize along the batch dimension in scenarios where the core computation is fundamentally two-dimensional, and the batch dimension effectively introduces a third axis.

At the moment, I am working around this by placing an OpenMP parallel for loop around the batch dimension, with one cblas_sgemm call per batch entry. However, this requires setting OpenBLAS threads to 1 and OpenMP threads to 4 (instead of simply setting OpenBLAS threads to 4), which feels like an unnecessary and somewhat hacky configuration.
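To make the request concrete, here is a minimal sketch of the semantics I have in mind, written as the batch-loop workaround. The function name `sgemm_strided_batched_sketch`, its parameter list, and the row-major, no-transpose layout are all assumptions for illustration, not an existing OpenBLAS API. The inner triple loop is a naive stand-in for what would be a single-threaded cblas_sgemm call per batch entry in practice.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical name and signature; NOT an existing OpenBLAS function.
 * Assumes row-major storage and no transposition. For each batch index b:
 *   C_b = alpha * A_b * B_b + beta * C_b
 * where A_b = A + b * strideA (M x K), B_b = B + b * strideB (K x N),
 * and C_b = C + b * strideC (M x N). Strides are counted in elements. */
static void sgemm_strided_batched_sketch(
    int M, int N, int K, float alpha,
    const float *A, int lda, ptrdiff_t strideA,
    const float *B, int ldb, ptrdiff_t strideB,
    float beta,
    float *C, int ldc, ptrdiff_t strideC,
    int batch)
{
    assert(M >= 0 && N >= 0 && K >= 0 && batch >= 0);
    assert(lda >= K && ldb >= N && ldc >= N);

    /* Parallelize across the batch dimension. The pragma is simply
     * ignored when the file is compiled without OpenMP support. */
    #pragma omp parallel for
    for (int b = 0; b < batch; b++) {
        const float *Ab = A + b * strideA;
        const float *Bb = B + b * strideB;
        float *Cb = C + b * strideC;

        /* Naive stand-in for one single-threaded GEMM on batch entry b. */
        for (int i = 0; i < M; i++) {
            for (int j = 0; j < N; j++) {
                float acc = 0.0f;
                for (int k = 0; k < K; k++)
                    acc += Ab[i * lda + k] * Bb[k * ldb + j];
                Cb[i * ldc + j] = alpha * acc + beta * Cb[i * ldc + j];
            }
        }
    }
}
```

With contiguous batches, the strides are simply the matrix sizes (strideA = M * K, strideB = K * N, strideC = M * N). A built-in version could choose internally whether to thread across the batch or inside each GEMM, which is exactly the decision the workaround forces onto the caller.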

An official API for strided batched GEMM would simplify this workflow significantly and provide a cleaner, more robust solution.
