
beaarend/capincho

 
 


Capincho

Capincho is an image captioning model composed of three modules:
1) a decoder-only language model (OPT) that generates the text;
2) the vision-language model CLIP, which provides aligned representations of images and texts;
3) an embeddings mapper that maps each CLIP embedding to k OPT word embeddings, used as a prefix for OPT.
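To make the shapes concrete, here is a minimal plain-Python sketch of the mapper idea. It assumes the simplest possible design: a single linear layer whose output of size k * d_model is split into k prefix embeddings. The class name `LinearPrefixMapper`, the initialisation and the toy sizes are hypothetical. The mapper in this repository may use a different architecture (for example an MLP or a transformer), and real sizes would be the CLIP embedding dimension (512 for ViT-B/32) and the OPT hidden size (768 for OPT-125m).

```python
import random
from typing import List

Vector = List[float]


class LinearPrefixMapper:
    """Hypothetical linear mapper: one CLIP embedding -> k prefix embeddings.

    A single weight matrix of shape (k * d_model, clip_dim) produces a flat
    vector that is split into k chunks of size d_model, one per prefix slot.
    """

    def __init__(self, clip_dim: int, d_model: int, k: int, seed: int = 0) -> None:
        rng = random.Random(seed)
        self.clip_dim, self.d_model, self.k = clip_dim, d_model, k
        out_dim = k * d_model
        # Small random weights and zero bias (illustrative initialisation only).
        self.weight = [[rng.gauss(0.0, 0.02) for _ in range(clip_dim)] for _ in range(out_dim)]
        self.bias = [0.0] * out_dim

    def __call__(self, clip_embedding: Vector) -> List[Vector]:
        if len(clip_embedding) != self.clip_dim:
            raise ValueError(f"expected {self.clip_dim} values, got {len(clip_embedding)}")
        # Affine map: flat[i] = bias[i] + sum_j weight[i][j] * x[j]
        flat = [
            b + sum(w * x for w, x in zip(row, clip_embedding))
            for row, b in zip(self.weight, self.bias)
        ]
        # Reshape the flat (k * d_model) vector into k vectors of length d_model.
        return [flat[i * self.d_model:(i + 1) * self.d_model] for i in range(self.k)]


# Toy sizes: an 8-dimensional "CLIP" vector mapped to k=10 prefix embeddings of size 4.
mapper = LinearPrefixMapper(clip_dim=8, d_model=4, k=10)
prefix = mapper([0.1] * 8)
print(len(prefix), len(prefix[0]))  # 10 4
```

In a real model this would be a trainable `torch.nn.Linear` (or a deeper network) operating on batches; the plain-Python version only illustrates how one vector becomes k vectors the size of an OPT word embedding.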

captioning model pipeline

Some examples from the COCO dataset, after training for only 2 epochs and learning a prefix of length 10 (k=10):
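During training, the k prefix embeddings produced by the mapper are placed in front of the caption's token embeddings, and the loss should only be computed on caption tokens. A minimal sketch of how the labels and attention mask could be laid out, assuming right padding and the -100 ignore index used by `torch.nn.CrossEntropyLoss` and Hugging Face causal language models. The function name and the pad id in the example are hypothetical, not taken from trainDecoder.py.

```python
from typing import List, Tuple

IGNORE_INDEX = -100  # default ignore_index of torch.nn.CrossEntropyLoss


def build_caption_targets(
    caption_ids: List[int], k: int, max_len: int, pad_id: int
) -> Tuple[List[int], List[int], List[int]]:
    """Lay out one example as [k prefix slots] + [caption tokens] + [padding].

    Returns (input_ids, labels, attention_mask):
      input_ids      caption tokens padded to max_len (embedded and appended after the prefix)
      labels         one entry per position of the full sequence (k + max_len)
      attention_mask 1 for prefix and caption positions, 0 for padding
    """
    caption_ids = caption_ids[:max_len]  # truncate captions that are too long
    n_pad = max_len - len(caption_ids)
    input_ids = caption_ids + [pad_id] * n_pad
    # Prefix slots hold mapped CLIP embeddings, not vocabulary tokens, and padding
    # carries no information, so both are excluded from the loss.
    labels = [IGNORE_INDEX] * k + caption_ids + [IGNORE_INDEX] * n_pad
    attention_mask = [1] * (k + len(caption_ids)) + [0] * n_pad
    return input_ids, labels, attention_mask


input_ids, labels, mask = build_caption_targets([10, 11, 12], k=2, max_len=5, pad_id=1)
print(input_ids)  # [10, 11, 12, 1, 1]
print(labels)     # [-100, -100, 10, 11, 12, -100, -100]
print(mask)       # [1, 1, 1, 1, 1, 0, 0]
```

Hugging Face causal language models shift the labels by one position internally, so the last prefix slot is the position whose output predicts the first caption token.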

image 1

image 2

image 3
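Captions like the examples above are produced autoregressively: the mapped prefix is fed to OPT and tokens are appended one at a time. This README does not say which decoding strategy evaluateCaptioning.py uses, so as an assumption here is a minimal greedy-decoding sketch in which the language model (already conditioned on the k prefix embeddings) is hidden behind a hypothetical `next_token_scores` callable; `toy_scores` is a made-up stand-in for that model.

```python
from typing import Callable, List, Sequence

# Hypothetical interface: given the token ids generated so far,
# return one score per vocabulary entry for the next position.
NextTokenScores = Callable[[Sequence[int]], Sequence[float]]


def greedy_decode(next_token_scores: NextTokenScores, eos_id: int, max_new_tokens: int = 20) -> List[int]:
    """Repeatedly append the highest-scoring token until EOS or the length limit."""
    generated: List[int] = []
    for _ in range(max_new_tokens):
        scores = next_token_scores(generated)
        # argmax over the vocabulary; max() keeps the first maximum, so ties go to the lowest id
        best = max(range(len(scores)), key=scores.__getitem__)
        if best == eos_id:
            break
        generated.append(best)
    return generated


def toy_scores(tokens: Sequence[int]) -> List[float]:
    """Made-up 5-token "model": emits 1, 2, 3 and then EOS (id 0)."""
    step = len(tokens)
    scores = [0.0] * 5
    scores[step + 1 if step < 3 else 0] = 1.0
    return scores


print(greedy_decode(toy_scores, eos_id=0))  # [1, 2, 3]
```

Beam search or sampling would replace the `max()` step; with Hugging Face Transformers these strategies are usually selected through arguments of `generate()` such as `num_beams` or `do_sample`.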

Installation

pip install git+https://github.com/openai/CLIP.git
pip install -r requirements.txt

Usage

Check the following files:

extractFeatures.py: extracts feature vectors from the COCO dataset using CLIP or OpenCLIP.

trainDecoder.py: trains the mapper module and fine-tunes OPT, or an OPT LoRA model.

evaluateCaptioning.py: qualitatively evaluates the results.
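trainDecoder.py can train an OPT LoRA model instead of fine-tuning all of OPT. LoRA keeps a pretrained weight W0 frozen and learns a low-rank update, so a layer computes h = W0 x + (alpha / r) B A x, with A of shape r x d_in and B of shape d_out x r; B starts at zero, so training begins from the pretrained behaviour. The plain-Python sketch below illustrates only that forward pass. The class name and sizes are hypothetical, and the repository may well rely on a library such as Hugging Face PEFT instead of code like this.

```python
import random
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Matrix-vector product for row-major lists."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


class LoRALinear:
    """Hypothetical LoRA layer: frozen W0 plus a trainable low-rank update B A scaled by alpha / r."""

    def __init__(self, w0: Matrix, r: int, alpha: float, seed: int = 0) -> None:
        rng = random.Random(seed)
        d_out, d_in = len(w0), len(w0[0])
        self.w0 = w0  # frozen pretrained weight, shape d_out x d_in
        # A: r x d_in, small random values; B: d_out x r, zeros so the update starts at 0.
        self.a = [[rng.gauss(0.0, 0.01) for _ in range(d_in)] for _ in range(r)]
        self.b = [[0.0] * r for _ in range(d_out)]
        self.scale = alpha / r

    def __call__(self, x: Vector) -> Vector:
        base = matvec(self.w0, x)                  # W0 x
        update = matvec(self.b, matvec(self.a, x))  # B (A x)
        return [h + self.scale * u for h, u in zip(base, update)]


w0 = [[1.0, 0.0, 2.0], [0.0, 1.0, -1.0]]
layer = LoRALinear(w0, r=2, alpha=4.0)
print(layer([1.0, 2.0, 3.0]))  # [7.0, -1.0], identical to W0 x because B is still zero
```

Only A and B (r * (d_in + d_out) values per layer) would be trained, which is what makes LoRA cheaper than fine-tuning the full d_out x d_in weights.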
