Labels: bug (Something isn't working)
Description
How is this issue impacting you?
Application hang
Share Your Debug Logs
The operation log is attached: log.txt
Steps to Reproduce the Issue
Minimal Steps:
cd nvshmem-3.4.5-0
cmake -DCUDA_ARCHITECTURES="75" -B build -S .
cd build
make -j8
cd build-install/bin/perftest/device/coll
nvshmrun -n 4 -hosts node1,node2 ./reduction_latency -w 0 -n 1 -c 1 -t 32 -b 8 -e 16
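The reported run covers more than one message size, so it may help to check whether the illegal memory access depends on the size. The script below is a minimal, hypothetical helper (not part of the NVSHMEM perftest suite). It assumes that `-b` and `-e` set the smallest and largest message size in bytes, so the reported run covers 8 and 16 bytes, and that it is launched from the `perftest/device/coll` directory used in the last step. The `DRY_RUN` switch is also an assumption of this sketch: by default it only prints the commands.

```shell
#!/usr/bin/env bash
# Hypothetical helper: rerun reduction_latency once per message size
# to see whether the failure is size-dependent.
# Assumes -b/-e are the min/max message sizes in bytes and that the
# script runs from build-install/bin/perftest/device/coll.
set -euo pipefail

HOSTS="node1,node2"      # hosts used in the report
NPES=4                   # PE count used in the report
DRY_RUN="${DRY_RUN:-1}"  # 1 = only print the commands, 0 = execute them

for size in 8 16; do
  # Same flags as the failing run, but with min size == max size.
  cmd=(nvshmrun -n "$NPES" -hosts "$HOSTS" ./reduction_latency
       -w 0 -n 1 -c 1 -t 32 -b "$size" -e "$size")
  echo "${cmd[*]}"
  if [ "$DRY_RUN" = "0" ]; then
    # Keep going after a failure so every size gets tried.
    "${cmd[@]}" || echo "size=$size failed with exit code $?"
  fi
done
```

Running it with `DRY_RUN=0` executes each size separately; a failure at only one size would narrow down where the out-of-bounds access happens.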
Environment Details:
NVSHMEM_IBGDA_SUPPORT=1
NVSHMEM_DISABLE_MNNVL=true
NVSHMEM_BUILD_EXAMPLES=1
NVSHMEM_MPI_SUPPORT=0
NVSHMEM_IBGDA_NIC_HANDLER=gpu
NVSHMEM_IB_DISABLE_DMABUF=1
NVSHMEM_USE_GDRCOPY=0
NVSHMEM_NVTX=0
NVSHMEM_DISABLE_P2P=1
NVSHMEM_BUILD_TESTS=1
NVSHMEM_IB_ENABLE_IBGDA=0
NVSHMEM_DEBUG_SUBSYS=ALL
NVSHMEM_DEVICELIB_CUDA_HOME=/usr/local/cuda
NVSHMEM_PREFIX=/opt/nvshmem-3.4.5-0/build-install
NVSHMEM_IBRC_SUPPORT=1
NVSHMEM_DISABLE_CUDA_VMM=true
NVSHMEM_DEBUG=INFO
Intermittency: every time
Previous Success: no; it also fails with nvshmem-3.2.5
NVSHMEM Version
nvshmem-3.4.5-0 with CUDA 12.6
Your platform details
Error Message & Behavior
[nvshmem-3.4.5-0/perftest/device/coll/reduction_latency.cu:253] cuda failed with an illegal memory access was encountered
[nvshmem-3.4.5-0/perftest/device/coll/reduction_latency.cu:253] cuda failed with an illegal memory access was encountered