How to finetune with multiple GPUs in a data-parallel setting?

Many thanks for sharing the amazing work!

I'm trying to finetune a sliced 7B model on a large dataset with millions of samples.
However, `distribute-model` appears to use model parallelism.
How can we finetune the model with, say, 8 GPUs in a data-parallel setting?
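
To make the question concrete: by data parallel I mean that each GPU holds a full copy of the sliced model and processes a different shard of the dataset, with gradients averaged across GPUs after each step. Below is a minimal sketch of the sharding I have in mind, in plain Python. The function name `shard_indices` is hypothetical and not part of this repo, and the strided split is only one possible assignment of samples to GPUs.

```python
def shard_indices(num_samples: int, world_size: int, rank: int) -> list[int]:
    """Return the sample indices that one rank (one GPU) would process.

    Hypothetical helper for illustration only; not an API of this repo.
    """
    if not 0 <= rank < world_size:
        raise ValueError("rank must be in [0, world_size)")
    # Strided split: rank r takes samples r, r + world_size, r + 2 * world_size, ...
    return list(range(rank, num_samples, world_size))


if __name__ == "__main__":
    world_size = 8  # assumed setup: 8 GPUs, one process per GPU
    for rank in range(world_size):
        print(f"rank {rank}: {shard_indices(20, world_size, rank)}")
```

Each of the 8 processes would then run forward and backward passes on its own shard, while the model weights stay identical on every GPU.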