Problem Description
We followed the instructions to install the ANP plugin v1.1.0-2 from GitHub. The build fails during the "make" step with the following errors:
$ make RCCL_BUILD=$RCCL_BUILD MPI_INCLUDE=$MPI_INCLUDE MPI_LIB_PATH=$MPI_LIB_PATH
/opt/rocm/bin/hipcc -fPIC -g -O2 -O3 -DNDEBUG -Werror -MMD -MP -DTARGET_PLUGIN -DNCCL_BUILD_RDMA_CORE -UANP_TELEMETRY_ENABLED -D__HIP_PLATFORM_AMD__ -Iinclude -I/opt/rocm/include -I/usr/include -I/home/ubuntu/github/rccl/build/include -I/home/ubuntu/github/rccl/build/hipify/src -I/home/ubuntu/github/rccl/build/hipify/src/include -I/opt/ompi/include -c src/net_ib.cc -o build/src/net_ib.o
src/net_ib.cc:46:9: error: 'NCCL_NET_OPTIONAL_RECV_COMPLETION' macro redefined [-Werror,-Wmacro-redefined]
46 | #define NCCL_NET_OPTIONAL_RECV_COMPLETION (void *)0x1
| ^
/home/ubuntu/github/rccl/build/hipify/src/include/nccl_net.h:18:9: note: previous definition is here
18 | #define NCCL_NET_OPTIONAL_RECV_COMPLETION 0x1
| ^
src/net_ib.cc:56:9: error: 'NCCL_NET_PLUGIN_SYMBOL' macro redefined [-Werror,-Wmacro-redefined]
56 | #define NCCL_NET_PLUGIN_SYMBOL ncclNetPlugin_v8
| ^
/home/ubuntu/github/rccl/build/hipify/src/include/nccl_net.h:120:9: note: previous definition is here
120 | #define NCCL_NET_PLUGIN_SYMBOL ncclNetPlugin_v9
| ^
src/net_ib.cc:2732:14: error: cannot initialize a member subobject of type 'ncclResult_t (*)(void *, void *, size_t, int, void *, void **)' (aka 'ncclResult_t (*)(void *, void *, unsigned long, int, void *, void **)') with an lvalue of type 'ncclResult_t (void *, void *, int, int, void *, void **)': type mismatch at 3rd parameter ('size_t' (aka 'unsigned long') vs 'int')
2732 | .isend = anpNetIsend,
| ^~~~~~~~~~~
src/net_ib.cc:2733:14: error: cannot initialize a member subobject of type 'ncclResult_t (*)(void *, int, void **, size_t *, int *, void **, void **)' (aka 'ncclResult_t (*)(void *, int, void **, unsigned long *, int *, void **, void **)') with an lvalue of type 'ncclResult_t (void *, int, void **, int *, int *, void **, void **)': type mismatch at 4th parameter ('size_t *' (aka 'unsigned long *') vs 'int *')
2733 | .irecv = anpNetIrecv,
| ^~~~~~~~~~~
4 errors generated when compiling for gfx942.
failed to execute:/opt/rocm-6.4.0/lib/llvm/bin/clang++ --offload-arch=gfx942 --offload-arch=gfx942 --offload-arch=gfx942 --offload-arch=gfx942 --offload-arch=gfx942 --offload-arch=gfx942 --offload-arch=gfx942 --offload-arch=gfx942 -fPIC -g -O2 -O3 -DNDEBUG -Werror -MMD -MP -DTARGET_PLUGIN -DNCCL_BUILD_RDMA_CORE -UANP_TELEMETRY_ENABLED -D__HIP_PLATFORM_AMD__ -Iinclude -I/opt/rocm/include -I/usr/include -I/home/ubuntu/github/rccl/build/include -I/home/ubuntu/github/rccl/build/hipify/src -I/home/ubuntu/github/rccl/build/hipify/src/include -I/opt/ompi/include -c -x hip src/net_ib.cc -o "build/src/net_ib.o"
make: *** [Makefile:77: build/src/net_ib.o] Error 1
Could you please investigate this issue with the AMD ANP setup?
Thank you.
Operating System
Ubuntu 22.04.5 LTS
CPU
AMD EPYC 9965 192-Core Processor
GPU
AMD Instinct MI325X
ROCm Version
ROCm 6.4.0
ROCm Component
No response
Steps to Reproduce
No response
(Optional for Linux users) Output of /opt/rocm/bin/rocminfo --support
No response
Additional Information
No response