
Capincho

Capincho is an image-captioning model composed of three modules: 1) a decoder-only language model (OPT) that generates text, 2) a vision-language model (CLIP) that provides aligned representations of images and text, and 3) an embeddings mapper that maps a CLIP embedding to k OPT word embeddings.

[Figure: captioning model pipeline]
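As an illustration of module 3, here is a minimal NumPy sketch of mapping one CLIP image embedding to k OPT word embeddings (the "prefix"). All dimensions and the single-linear-layer form are assumptions for illustration; the actual mapper in this repository may be a deeper network.

```python
import numpy as np

# Hypothetical dimensions -- adjust to the CLIP/OPT variants in use.
D_CLIP = 512   # CLIP image-embedding size (e.g. ViT-B/32)
D_OPT = 768    # OPT word-embedding size (e.g. facebook/opt-125m)
K = 10         # prefix length: number of OPT word embeddings produced

rng = np.random.default_rng(0)

# A single linear layer standing in for the learned mapper module.
W = rng.normal(0, 0.02, size=(D_CLIP, K * D_OPT))
b = np.zeros(K * D_OPT)

def map_clip_to_prefix(clip_embedding: np.ndarray) -> np.ndarray:
    """Map one CLIP embedding to k OPT word embeddings (the prefix)."""
    flat = clip_embedding @ W + b     # shape (K * D_OPT,)
    return flat.reshape(K, D_OPT)     # shape (K, D_OPT)

prefix = map_clip_to_prefix(rng.normal(size=D_CLIP))
print(prefix.shape)  # (10, 768)
```

The resulting (K, D_OPT) block is what gets prepended, as `inputs_embeds`, to the decoder's input during training and generation.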

Some examples from the COCO dataset, after training for only 2 epochs while learning a prefix of length 10 (k = 10):

[Image: example 1]

Installation

  1. Create a Python 3.12 environment:
     conda create -n capincho python=3.12
  2. Install CUDA toolkit 11.8:
     conda install conda-forge::cudatoolkit
  3. Install PyTorch compatible with CUDA 11.8:
     pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
  4. Install the requirements:
     pip install -r requirements.txt
  5. Install radgraph compatible with Python 3.12 (needed for MIMIC evaluation):
     conda install git
     pip install git+https://github.com/aehrc/radgraph.git

Usage

Check the following files:

extractFeatures.py — extracts feature vectors from the COCO dataset using CLIP or OpenCLIP.

trainDecoder.py — trains the mapper module and finetunes OPT, or an OPT LoRA model.

evaluateCaptioning.py — qualitatively evaluates the results.
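Putting the three modules together, inference follows encode → map → decode. The sketch below wires that flow end to end with toy stand-ins (random weights, a mean-pooling "decoder", made-up dimensions and vocabulary); the real scripts use CLIP for `clip_encode` and OPT with greedy/beam decoding in place of the stub loop.

```python
import numpy as np

# Illustrative sizes only -- not the repository's actual configuration.
D_CLIP, D_OPT, K, VOCAB, MAX_LEN = 512, 768, 10, 1000, 12
EOS = 0
rng = np.random.default_rng(0)

def clip_encode(image) -> np.ndarray:
    """Stub for the CLIP image encoder: image -> one D_CLIP embedding."""
    return rng.normal(size=D_CLIP)

W_map = rng.normal(0, 0.02, size=(D_CLIP, K * D_OPT))  # mapper weights
E = rng.normal(0, 0.02, size=(VOCAB, D_OPT))           # word embeddings
W_out = rng.normal(0, 0.02, size=(D_OPT, VOCAB))       # LM head

def caption(image) -> list[int]:
    """Greedy decoding conditioned on the mapped prefix."""
    prefix = (clip_encode(image) @ W_map).reshape(K, D_OPT)
    context, tokens = prefix, []
    for _ in range(MAX_LEN):
        h = context.mean(axis=0)           # toy decoder: mean-pool context
        tok = int(np.argmax(h @ W_out))    # greedy next-token pick
        if tok == EOS:
            break
        tokens.append(tok)
        context = np.vstack([context, E[tok]])  # feed token embedding back
    return tokens

ids = caption(None)
print(ids)
```

The key structural point the stub preserves: the decoder never sees the image directly, only the k mapped embeddings prepended to its input sequence.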
