Many thanks for sharing the amazing work!
I'm trying to fine-tune a sliced 7B model on a large dataset with millions of samples.
However, distribute-model seems to use model parallelism.
How can we fine-tune the model on, say, 8 GPUs in a data-parallel setting?