Description
python3 train.py was suddenly killed after running for an hour.
How can I debug it?
2/186 [..............................] - ETA: 1:20:21 - loss: 7.3700 - accuracy: 0.4880
2020-10-31 14:04:21.869470: W tensorflow/core/framework/cpu_allocator_impl.cc:81] Allocation of 1207959552 exceeds 10% of free system memory.
7/186 [>.............................] - ETA: 2:14:49 - loss: 2.6004 - accuracy: 0.6981
186/186 [==============================] - ETA: 0s - loss: 0.7419 - accuracy: 0.7763
Epoch 00001: val_loss improved from inf to 0.64748, saving model to unet.h5
186/186 [==============================] - 9966s 54s/step - loss: 0.7419 - accuracy: 0.7763 - val_loss: 0.6475 - val_accuracy: 0.7814
Epoch 2/30
39/186 [=====>........................] - ETA: 2:29:12 - loss: 0.6430 - accuracy: 0.7831
44/186 [======>.......................] - ETA: 2:22:09 - loss: 0.6429 - accuracy: 0.7815
Killed
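
One way to start debugging, assuming the process ran out of memory: a bare "Killed" with no Python traceback, together with the allocation warning in the log above, is consistent with the Linux kernel's OOM killer, though the log alone does not prove it. The sketch below uses only the standard library to convert the warned allocation size to MiB and to report the process's peak memory. The helper names (bytes_to_mib, peak_rss_mib, log_memory) are made up for illustration and are not part of train.py.

```python
import resource
import sys


def bytes_to_mib(n_bytes):
    """Convert a byte count to mebibytes."""
    return n_bytes / (1024 * 1024)


def peak_rss_mib():
    """Return the peak resident set size of this process in MiB.

    ru_maxrss is reported in kilobytes on Linux and in bytes on macOS.
    """
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    if sys.platform == "darwin":
        return peak / (1024 * 1024)
    return peak / 1024


def log_memory(tag):
    """Print and return one line with the peak memory used so far."""
    line = f"[mem] {tag}: peak RSS {peak_rss_mib():.0f} MiB"
    print(line, flush=True)
    return line


if __name__ == "__main__":
    # Size of the single allocation named in the TensorFlow warning.
    print(f"{bytes_to_mib(1207959552):.0f} MiB")
    log_memory("startup")
```

If log_memory were called at the end of every batch or epoch (for example from a Keras callback, which is an assumption about how train.py is structured), a steadily growing number would point to a leak, while a flat but high number would suggest the batch or model is simply too large for the machine. On Linux, the kernel log (dmesg) normally records a message when the OOM killer terminates a process, which would confirm or rule out this explanation.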