I am training the T5 variant of the model (grail_train_t5.jsonnet).
The config specifies:
- training batch size 1
- gradient accumulation of 8
- epochs 30
I am running on an A100 with 40 GB of memory. It shows a total training time of around 120 hours (at roughly 70 seconds per step).
Is this the expected training time? Is there any room for optimization? Can I increase the batch size, and would that affect model performance?
Edit: It actually shows 120 hours for a single epoch, not for all 30. Am I making a mistake somewhere?
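For reference, a quick back-of-the-envelope check of whether 120 hours per epoch is internally consistent with the observed step time. This assumes the ~70 s figure is per optimizer step and that each optimizer step consumes batch_size × accumulation = 8 examples; these are inferences from the numbers above, not confirmed facts about the config:

```python
SECONDS_PER_STEP = 70    # observed step time (from the question)
HOURS_PER_EPOCH = 120    # reported time for one epoch
BATCH_SIZE = 1           # from the config
GRAD_ACCUM = 8           # from the config

# Implied number of optimizer steps in one epoch.
steps_per_epoch = HOURS_PER_EPOCH * 3600 / SECONDS_PER_STEP
print(f"optimizer steps per epoch: {steps_per_epoch:.0f}")   # ~6171

# Each optimizer step consumes batch_size * grad_accum examples,
# so this implies roughly this many training examples:
examples = steps_per_epoch * BATCH_SIZE * GRAD_ACCUM
print(f"implied training examples: {examples:.0f}")          # ~49371
```

If the actual training-set size is far smaller than the implied ~49k examples, the 70 s may be per forward/backward pass rather than per optimizer step, or something else (e.g. evaluation inside the loop) is inflating the step time.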