Hi there!
I want to use this model, and I would like to fine-tune it on my custom dataset. The only constraint is that I have only videos and the corresponding chat.json; I don't have any images. Will I be able to do that? If so, what modifications do I need to make?