Multiplication of StridedMaybeAdjOrTransMat broken for certain matrix sizes #442

@leios

If the array has about 10 rows, a' * a works fine and the oneArray result matches the CPU result:

julia> using oneAPI

julia> rand_array = rand(Float32, 10, 2);

julia> one_array = oneArray(rand_array);

julia> rand_array' * rand_array
2×2 Matrix{Float32}:
 3.73734  2.68277
 2.68277  3.32426

julia> one_array' * one_array
2×2 oneArray{Float32, 2, oneAPI.oneL0.DeviceBuffer}:
 3.73734  2.68277
 2.68277  3.32426

With 100 rows it fails: the oneArray product silently comes back as all zeros instead of matching the CPU result:

julia> rand_array = rand(Float32, 100, 2);

julia> rand_array' * rand_array
2×2 Matrix{Float32}:
 32.107   24.3659
 24.3659  32.234

julia> one_array = oneArray(rand_array);

julia> one_array' * one_array
2×2 oneArray{Float32, 2, oneAPI.oneL0.DeviceBuffer}:
 0.0  0.0
 0.0  0.0
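
To narrow down where between 10 and 100 rows the result starts to diverge, a small sweep over row counts could help. This is only a sketch: first_failing_size is a hypothetical helper name, the range of sizes is an arbitrary assumption, and the conversion function is passed in so that the same code runs on the CPU (with identity) or on the GPU (with oneArray).

```julia
using LinearAlgebra

# Hypothetical helper: returns the first row count n for which the
# device result of a' * a disagrees with the CPU result, or `nothing`
# if every tested size agrees.
function first_failing_size(to_device; sizes = 10:10:200)
    for n in sizes
        a = rand(Float32, n, 2)
        d = to_device(a)          # e.g. oneArray(a) on the GPU
        # Copy the device result back to the host before comparing.
        if !(Array(d' * d) ≈ a' * a)
            return n
        end
    end
    return nothing
end

# CPU baseline: with `identity` both sides are the same computation.
cpu_result = first_failing_size(identity)

# With a working oneAPI setup one would instead call:
# using oneAPI
# first_failing_size(oneArray)
```

On the CPU this is expected to return nothing; on an affected device it would report the smallest tested row count that gives a wrong product.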

It seems to be calling this function in LinearAlgebra/matmul.jl:

function (*)(A::StridedMaybeAdjOrTransMat{<:BlasReal}, B::StridedMaybeAdjOrTransMat{<:BlasReal})
    TS = promote_type(eltype(A), eltype(B))
    mul!(similar(B, TS, (size(A, 1), size(B, 2))),
         wrapperop(A)(convert(AbstractArray{TS}, _unwrap(A))),
         wrapperop(B)(convert(AbstractArray{TS}, _unwrap(B))))
end
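
For reference, what this method does can be written out by hand: it allocates the output with similar and forwards to mul!. The sketch below does that on plain CPU arrays using only public LinearAlgebra calls (it skips the internal wrapperop and _unwrap helpers, since both inputs are already Float32). Swapping in oneArray(a) for a is an assumption about how one might check whether the zeros come from the mul! path itself rather than from the * wrapper.

```julia
using LinearAlgebra

a = rand(Float32, 100, 2)

# Allocate the 2×2 output the same way the generic method does.
C = similar(a, Float32, (size(a', 1), size(a, 2)))

# Call mul! directly instead of going through `*`.
mul!(C, a', a)

# On the CPU the two paths agree.
@assert C ≈ a' * a

# With oneAPI available, the same steps could be tried on the device:
# using oneAPI
# d = oneArray(a)
# Cd = similar(d, Float32, (size(d', 1), size(d, 2)))
# mul!(Cd, d', d)
# Array(Cd) ≈ a' * a
```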

Julia also segfaults on exit, while finalizers are destroying the SYCL queue:

[982661] signal (11.128): Segmentation fault
in expression starting at none:0
_ZN3NEO13DrmAllocation15makeBOsResidentEPNS_9OsContextEjPSt6vectorIPNS_12BufferObjectESaIS5_EEb at /home/u222842/.julia/artifacts/f6b6f7783395fabf32b0337c23e95719f94b00fd/lib/libze_intel_gpu.so.1 (unknown line)
_ZN3NEO24DrmCommandStreamReceiverINS_13Gen12LpFamilyEE16processResidencyERKSt6vectorIPNS_18GraphicsAllocationESaIS5_EEj at /home/u222842/.julia/artifacts/f6b6f7783395fabf32b0337c23e95719f94b00fd/lib/libze_intel_gpu.so.1 (unknown line)
_ZN3NEO24DrmCommandStreamReceiverINS_13Gen12LpFamilyEE13flushInternalERKNS_11BatchBufferERKSt6vectorIPNS_18GraphicsAllocationESaIS8_EE at /home/u222842/.julia/artifacts/f6b6f7783395fabf32b0337c23e95719f94b00fd/lib/libze_intel_gpu.so.1 (unknown line)
_ZN3NEO24DrmCommandStreamReceiverINS_13Gen12LpFamilyEE5flushERNS_11BatchBufferERSt6vectorIPNS_18GraphicsAllocationESaIS7_EE at /home/u222842/.julia/artifacts/f6b6f7783395fabf32b0337c23e95719f94b00fd/lib/libze_intel_gpu.so.1 (unknown line)
_ZN3NEO21CommandStreamReceiver17submitBatchBufferERNS_11BatchBufferERSt6vectorIPNS_18GraphicsAllocationESaIS5_EE at /home/u222842/.julia/artifacts/f6b6f7783395fabf32b0337c23e95719f94b00fd/lib/libze_intel_gpu.so.1 (unknown line)
_ZN2L015CommandQueueImp17submitBatchBufferEmRSt6vectorIPN3NEO18GraphicsAllocationESaIS4_EEPvb at /home/u222842/.julia/artifacts/f6b6f7783395fabf32b0337c23e95719f94b00fd/lib/libze_intel_gpu.so.1 (unknown line)
_ZN2L014CommandQueueHwIL14GFXCORE_FAMILY18EE26executeCommandListsRegularERNS2_27CommandListExecutionContextEjPP25_ze_command_list_handle_tP18_ze_fence_handle_tP18_ze_event_handle_tjPSB_ at /home/u222842/.julia/artifacts/f6b6f7783395fabf32b0337c23e95719f94b00fd/lib/libze_intel_gpu.so.1 (unknown line)
_ZN2L014CommandQueueHwIL14GFXCORE_FAMILY18EE19executeCommandListsEjPP25_ze_command_list_handle_tP18_ze_fence_handle_tbP18_ze_event_handle_tjPS9_ at /home/u222842/.julia/artifacts/f6b6f7783395fabf32b0337c23e95719f94b00fd/lib/libze_intel_gpu.so.1 (unknown line)
_ZN2L033zeCommandQueueExecuteCommandListsEP26_ze_command_queue_handle_tjPP25_ze_command_list_handle_tP18_ze_fence_handle_t at /home/u222842/.julia/artifacts/f6b6f7783395fabf32b0337c23e95719f94b00fd/lib/libze_intel_gpu.so.1 (unknown line)
_ZN18ur_queue_handle_t_18executeCommandListENSt3__119__hash_map_iteratorINS0_15__hash_iteratorIPNS0_11__hash_nodeINS0_17__hash_value_typeIP25_ze_command_list_handle_t22ur_command_list_info_tEEPvEEEEEEbb at /glob/development-tools/versions/oneapi/2024.1/oneapi/compiler/2024.1/lib/libpi_level_zero.so (unknown line)
_ZN18ur_queue_handle_t_26executeAllOpenCommandListsEv at /glob/development-tools/versions/oneapi/2024.1/oneapi/compiler/2024.1/lib/libpi_level_zero.so (unknown line)
urQueueRelease at /glob/development-tools/versions/oneapi/2024.1/oneapi/compiler/2024.1/lib/libpi_level_zero.so (unknown line)
piQueueRelease at /glob/development-tools/versions/oneapi/2024.1/oneapi/compiler/2024.1/lib/libpi_level_zero.so (unknown line)
_ZNK4sycl3_V16detail6plugin12call_nocheckILNS1_9PiApiKindE26EJP9_pi_queueEEE10_pi_resultDpT0_ at /glob/development-tools/versions/oneapi/2024.1/oneapi/compiler/2024.1/lib/libsycl.so.7 (unknown line)
_ZN4sycl3_V16detail10queue_implD2Ev at /glob/development-tools/versions/oneapi/2024.1/oneapi/compiler/2024.1/lib/libsycl.so.7 (unknown line)
_M_release at /opt/x86_64-linux-gnu/x86_64-linux-gnu/include/c++/8.1.0/bits/shared_ptr_base.h:161 [inlined]
~__shared_count at /opt/x86_64-linux-gnu/x86_64-linux-gnu/include/c++/8.1.0/bits/shared_ptr_base.h:712 [inlined]
~__shared_ptr at /opt/x86_64-linux-gnu/x86_64-linux-gnu/include/c++/8.1.0/bits/shared_ptr_base.h:1151 [inlined]
~queue at /opt/x86_64-linux-gnu/x86_64-linux-gnu/sys-root/usr/local/include/sycl/queue.hpp:119 [inlined]
~syclQueue_st at /workspace/srcdir/oneAPI.jl/deps/src/sycl.hpp:19 [inlined]
syclQueueDestroy at /workspace/srcdir/oneAPI.jl/deps/src/sycl.cpp:60
syclQueueDestroy at /home/u222842/projects/oneAPI.jl/lib/support/liboneapi_support.jl:58 [inlined]
#7 at /home/u222842/projects/oneAPI.jl/lib/sycl/SYCL.jl:74
unknown function (ip: 0x7faf4714b085)
_jl_invoke at /cache/build/builder-amdci4-0/julialang/julia-release-1-dot-10/src/gf.c:2895 [inlined]
ijl_apply_generic at /cache/build/builder-amdci4-0/julialang/julia-release-1-dot-10/src/gf.c:3077
run_finalizer at /cache/build/builder-amdci4-0/julialang/julia-release-1-dot-10/src/gc.c:318
jl_gc_run_finalizers_in_list at /cache/build/builder-amdci4-0/julialang/julia-release-1-dot-10/src/gc.c:408
run_finalizers at /cache/build/builder-amdci4-0/julialang/julia-release-1-dot-10/src/gc.c:454
ijl_atexit_hook at /cache/build/builder-amdci4-0/julialang/julia-release-1-dot-10/src/init.c:299
jl_repl_entrypoint at /cache/build/builder-amdci4-0/julialang/julia-release-1-dot-10/src/jlapi.c:732
main at /cache/build/builder-amdci4-0/julialang/julia-release-1-dot-10/cli/loader_exe.c:58
unknown function (ip: 0x7faf5ed83d8f)
__libc_start_main at /lib/x86_64-linux-gnu/libc.so.6 (unknown line)
unknown function (ip: 0x4010b8)
Allocations: 14926321 (Pool: 14908017; Big: 18304); GC: 23
Segmentation fault (core dumped)

I am using the Intel DevCloud for this. The GPU node is reported as:

pbsnodes | grep -B4 gpu
---
s019-n010
     state = job-exclusive
     power_state = Running
     np = 2
     properties = core,tgl,i9-11900kb,ram32gb,netgbe,gpu,gen11
---

This seems related to the issue I have been having in JuliaGPU/GPUArrays.jl#525.
