I am trying to synthesize a (C, S, R) = (24, 8, 8) algorithm for Alltoall on my machine, but the synthesis does not finish even after a full day. The SCCL paper reports that this instance can be synthesized in 133.7s.
My script is:
TOPO="DGX1"
COLL="Alltoall"
STEPS="8"
ROUNDS=$STEPS
CHUNKS="3"
msccl solve instance ${TOPO} ${COLL} \
--steps ${STEPS} \
--rounds ${ROUNDS} \
--chunks ${CHUNKS} \
My CPU configuration is:
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Model name: Intel(R) Xeon(R) Gold 6252 CPU @ 2.10GHz