This happens at least with the XLNet model. With a high max_len, training just crashes without printing any error. With a low max_len, I get the error below. Training on 1 GPU works fine. I'm using 4 Nvidia V100 GPUs.
Traceback (most recent call last):
  File "src/train.py", line 830, in <module>
    main()
  File "src/train.py", line 690, in main
    train_step(dummy_batch)
  File "src/train.py", line 566, in train_step
    loss, acc, ppl = forward_step(batch)
  File "src/train.py", line 556, in forward_step
    acc = reduce_tensor(acc)
  File "src/train.py", line 530, in reduce_tensor
    reduced = tensor.clone()
AttributeError: 'float' object has no attribute 'clone'
(The same traceback is printed by each of the 4 processes; the later copies are interleaved in the output.)
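From the traceback, reduce_tensor seems to receive a plain Python float for acc instead of a tensor, which would explain why the error only shows up in the multi-GPU path. Below is a minimal sketch of a guard, not the actual code from src/train.py (its body is not shown here). It assumes reduce_tensor averages a metric across ranks with torch.distributed.all_reduce; the device argument is a hypothetical addition.

```python
import torch
import torch.distributed as dist


def reduce_tensor(value, device=None):
    """Average a scalar metric across all ranks.

    Accepts a tensor or a plain Python number, so a float accuracy
    no longer fails on .clone(). `device` is a hypothetical parameter;
    with the NCCL backend it would need to be this rank's CUDA device.
    """
    if not torch.is_tensor(value):
        # Wrap floats/ints so they can take part in all_reduce.
        value = torch.tensor(float(value), device=device)
    reduced = value.clone()
    # Without an initialized process group (e.g. a single GPU run),
    # there is nothing to reduce, so the value is returned unchanged.
    if dist.is_available() and dist.is_initialized():
        dist.all_reduce(reduced, op=dist.ReduceOp.SUM)
        reduced = reduced / dist.get_world_size()
    return reduced
```

With this guard, reduce_tensor(acc) would return a tensor whether acc arrives as a float or as a tensor. An alternative would be to keep acc as a tensor inside forward_step until after the reduction.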