[8d6bf97c3bf4:3039 :0:3095] Caught signal 7 (Bus error: nonexistent physical address)
==== backtrace (tid: 3095) ====
0 0x0000000000043090 killpg() ???:0
1 0x000000000018bb41 __nss_database_lookup() ???:0
2 0x000000000007587d ncclGroupEnd() ???:0
3 0x000000000007b0ef ncclGroupEnd() ???:0
4 0x0000000000059e97 ncclGetUniqueId() ???:0
5 0x00000000000489b1 ???() /usr/lib/x86_64-linux-gnu/libnccl.so.2:0
6 0x000000000004a655 ???() /usr/lib/x86_64-linux-gnu/libnccl.so.2:0
7 0x0000000000063dcc ncclRedOpDestroy() ???:0
8 0x0000000000008609 start_thread() ???:0
9 0x000000000011f133 clone() ???:0
torch.distributed.elastic.multiprocessing.errors.ChildFailedError: