GitHub - nkrasner/ML-CLIP: Parallel Multilingual Pre-training for Multimodal Representations

Parallel Multilingual Pre-training for Multimodal Representations

Multimodal representations are useful in many downstream tasks, such as image-grounded caption generation and text-grounded image generation. Most of the datasets available for training these representations are in English or are heavily skewed toward a few languages. We extend the CLIP technique of Radford et al. (2021) by incorporating artificial parallel data in three additional, diverse languages, and find that this improves not only the multilingual performance of the CLIP model but also its performance in English.

File Descriptions

translate_data.py

This takes a dictionary of captions and translates each caption into additional languages to create n-way parallel data.
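As a rough illustration of what "n-way parallel data" means here, the sketch below expands a caption dictionary into one caption per language for every image. It is not the script's actual code: the function name `make_parallel_captions`, the `translate` callback, the `{image_id: caption}` layout, and the language codes are all assumptions made for the example, and `fake_translate` only stands in for a real machine translation call.

```python
from typing import Callable, Dict, List


def make_parallel_captions(
    captions: Dict[str, str],
    target_langs: List[str],
    translate: Callable[[str, str], str],
    source_lang: str = "en",
) -> Dict[str, Dict[str, str]]:
    """Return {image_id: {lang: caption}} with one caption per language.

    `translate(text, lang)` is a hypothetical callback; plug in whatever
    machine translation system you actually use.
    """
    parallel: Dict[str, Dict[str, str]] = {}
    for image_id, caption in captions.items():
        row = {source_lang: caption}  # keep the original caption
        for lang in target_langs:
            row[lang] = translate(caption, lang)
        parallel[image_id] = row
    return parallel


def fake_translate(text: str, lang: str) -> str:
    """Placeholder translator so the example runs without any MT system."""
    return f"[{lang}] {text}"


# Placeholder language codes, not necessarily those used in this repository.
captions = {"img_001": "A dog runs on the beach."}
parallel = make_parallel_captions(captions, ["de", "ja", "sw"], fake_translate)
```

With three target languages, each image ends up with four aligned captions (the source plus three translations), which is the 4-way parallel structure the later steps would consume.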

finetune_clip.py

This fine-tunes the text and image encoders so that their embedding distributions merge into a shared space.
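The training objective is not spelled out in this README. A minimal sketch, assuming the script follows the symmetric contrastive loss described for CLIP by Radford et al. (2021), is shown below in plain Python so it runs without any deep learning framework. The function names and the temperature value of 0.07 are illustrative choices, not taken from `finetune_clip.py`.

```python
import math
from typing import List, Sequence


def l2_normalize(v: Sequence[float]) -> List[float]:
    """Scale a vector to unit length (assumes the vector is not all zeros)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def cross_entropy_row(logits: Sequence[float], target: int) -> float:
    """Cross-entropy of one row of logits against the index of the true match."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target]


def clip_contrastive_loss(
    image_embs: Sequence[Sequence[float]],
    text_embs: Sequence[Sequence[float]],
    temperature: float = 0.07,
) -> float:
    """Symmetric image-to-text and text-to-image contrastive loss.

    Row k of `image_embs` is assumed to be paired with row k of `text_embs`.
    """
    imgs = [l2_normalize(v) for v in image_embs]
    txts = [l2_normalize(v) for v in text_embs]
    n = len(imgs)
    # Pairwise cosine similarities, scaled by the temperature.
    logits = [
        [sum(a * b for a, b in zip(i, t)) / temperature for t in txts]
        for i in imgs
    ]
    loss_i2t = sum(cross_entropy_row(logits[k], k) for k in range(n)) / n
    loss_t2i = sum(
        cross_entropy_row([logits[r][k] for r in range(n)], k) for k in range(n)
    ) / n
    return (loss_i2t + loss_t2i) / 2


# Toy check: correctly paired embeddings should score a lower loss.
aligned_loss = clip_contrastive_loss([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
swapped_loss = clip_contrastive_loss([[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]])
```

Under this assumption, captions in every language for the same image would be treated as positives for that image, which is one way parallel data could pull the languages toward a common region of the embedding space.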

generate_vectors.py

This generates vectors from your data that you can use for downstream tasks or evaluation.
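One common downstream use of such vectors is retrieval by cosine similarity. The sketch below assumes the vectors are available as an `{image_id: vector}` dictionary; that layout, the function names, and the toy three-dimensional vectors are hypothetical and do not reflect the output format of `generate_vectors.py`.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two non-zero vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_images(
    text_vec: Sequence[float],
    image_vecs: Dict[str, Sequence[float]],
) -> List[Tuple[str, float]]:
    """Return (image_id, similarity) pairs, most similar first."""
    scored = [(image_id, cosine(text_vec, vec)) for image_id, vec in image_vecs.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy vectors standing in for real encoder outputs.
image_vecs = {"img_001": [0.9, 0.1, 0.0], "img_002": [0.0, 0.2, 0.9]}
query_vec = [1.0, 0.0, 0.0]
ranking = rank_images(query_vec, image_vecs)
```

The same ranking works for a caption in any language, as long as the text vector comes from the same fine-tuned text encoder as the image vectors' counterpart.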

