Problem: The current approach using libxsmm_gemm_generator can only generate GEMMs with alpha = +/-1. If YATeTo encounters a GEMM with |alpha| != 1, it falls back to the default code path (nested for loops), which performs poorly.
Solution: Use the new libxsmm interface, which takes alpha and beta as runtime arguments: libxsmm_?gemm(transa, transb, m, n, k, alpha, a, lda, b, ldb, beta, c, ldc);
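
For reference, the operation in question is C = alpha * A * B + beta * C. The sketch below is a plain nested-loop version of it, roughly the kind of fallback described above, and may be useful as a correctness baseline when switching to the libxsmm call. The name reference_dgemm is hypothetical and belongs to neither libxsmm nor YATeTo; the sketch assumes column-major storage, double precision, and no transposition, and it passes scalars by value for brevity.

```c
#include <stddef.h>

/* Hypothetical reference kernel, not part of libxsmm or YATeTo.
 * Computes C = alpha * A * B + beta * C for column-major matrices:
 *   A is m x k with leading dimension lda,
 *   B is k x n with leading dimension ldb,
 *   C is m x n with leading dimension ldc.
 * Assumption: no transposition of A or B. */
void reference_dgemm(int m, int n, int k, double alpha,
                     const double *a, int lda,
                     const double *b, int ldb,
                     double beta, double *c, int ldc)
{
    for (int j = 0; j < n; ++j) {          /* columns of C */
        for (int i = 0; i < m; ++i) {      /* rows of C */
            double acc = 0.0;
            for (int p = 0; p < k; ++p) {  /* inner (contraction) index */
                acc += a[i + (size_t)p * lda] * b[p + (size_t)j * ldb];
            }
            /* Arbitrary alpha and beta are applied here. */
            c[i + (size_t)j * ldc] = alpha * acc + beta * c[i + (size_t)j * ldc];
        }
    }
}
```

A generated kernel that handles |alpha| != 1 should agree with this loop nest up to floating-point rounding.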