GitHub - Andersonsr/capincho

Capincho

An image captioning system composed of three modules: 1) a decoder-only language model (OPT) that generates text, 2) a vision-language model (CLIP) that provides aligned representations of images and text, and 3) an embedding mapper that maps each CLIP embedding to k OPT word embeddings.
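To make the role of the embedding mapper concrete, here is a minimal, dependency-free sketch of the idea: a single linear projection turns one CLIP vector into k vectors of the language model's embedding size. The name `make_mapper`, the toy dimensions, and the single-linear-layer design are assumptions for illustration only; the mapper in `src` may use a different architecture, and its weights are learned rather than random.

```python
import random


def make_mapper(clip_dim, opt_dim, k, seed=0):
    """Build a toy linear mapper: one CLIP vector -> k vectors of size opt_dim."""
    rng = random.Random(seed)
    # Weight matrix with k * opt_dim rows and clip_dim columns (random, untrained).
    weights = [
        [rng.gauss(0.0, 0.02) for _ in range(clip_dim)]
        for _ in range(k * opt_dim)
    ]

    def mapper(clip_embedding):
        if len(clip_embedding) != clip_dim:
            raise ValueError(f"expected {clip_dim} values, got {len(clip_embedding)}")
        # Linear projection to a flat vector of length k * opt_dim.
        flat = [sum(w * x for w, x in zip(row, clip_embedding)) for row in weights]
        # Split the flat vector into k prefix embeddings.
        return [flat[i * opt_dim:(i + 1) * opt_dim] for i in range(k)]

    return mapper


# Toy sizes for illustration; real CLIP and OPT dimensions are much larger.
mapper = make_mapper(clip_dim=8, opt_dim=16, k=10)
prefix = mapper([0.1] * 8)
print(len(prefix), len(prefix[0]))  # prints: 10 16
```

The k output vectors act as a prefix: they sit in front of the caption's token embeddings so the language model can condition its generated text on the image.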

Some examples from the COCO dataset, after training for only 2 epochs while learning a prefix of length 10 (k=10):

Installation

Create a Python 3.12 environment:

conda create -n capincho python=3.12

Install CUDA Toolkit 11.8:

conda install conda-forge::cudatoolkit=11.8

Install a PyTorch build compatible with CUDA 11.8:

pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
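After this step it can help to confirm that the installed PyTorch build actually sees the GPU. The following is a small sanity check, not part of the repository; the function name `describe_torch_install` is hypothetical. It reports the PyTorch version, the CUDA version the build targets, and whether a GPU is visible, and it degrades gracefully when PyTorch is missing.

```python
import importlib.util


def describe_torch_install():
    """Return a one-line summary of the local PyTorch/CUDA setup."""
    if importlib.util.find_spec("torch") is None:
        return "torch is not installed in this environment"
    import torch

    return (
        f"torch {torch.__version__}, "
        f"built for CUDA {torch.version.cuda}, "
        f"GPU available: {torch.cuda.is_available()}"
    )


print(describe_torch_install())
```

If the reported CUDA version is `None`, a CPU-only wheel was installed, and re-running the `pip install` command with the `cu118` index URL is the likely fix.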

Install the requirements:

pip install -r requirements.txt

Install a RadGraph build compatible with Python 3.12 (needed for MIMIC evaluation):

conda install git
pip install git+https://github.com/aehrc/radgraph.git

Usage

Check the following files:

extractFeatures.py extracts feature vectors from the COCO dataset using CLIP or OpenCLIP.

trainDecoder.py trains the mapper module and fine-tunes either OPT or an OPT LoRA model.

evaluateCaptioning.py qualitatively evaluates the results.
