Image Captioning with Occlusion Analysis

This project implements a ViT-GPT2-based image captioning model, with experiments analyzing model robustness under different levels of image occlusion (10%, 50%, 80%). Link to dataset: https://github.com/eco-mini/custom_captions_dataset
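For orientation, here is a minimal captioning sketch using a public ViT-GPT2 checkpoint from Hugging Face Transformers. The checkpoint name is an assumption for illustration only; the notebooks train and load their own ViT-GPT2 model rather than this exact one.

```python
# Minimal sketch of ViT-GPT2 captioning with Hugging Face Transformers.
# The public checkpoint below is an assumption for illustration; the notebooks
# train their own ViT-GPT2 model rather than using this exact checkpoint.
from PIL import Image
from transformers import AutoTokenizer, VisionEncoderDecoderModel, ViTImageProcessor

ckpt = "nlpconnect/vit-gpt2-image-captioning"
model = VisionEncoderDecoderModel.from_pretrained(ckpt)
processor = ViTImageProcessor.from_pretrained(ckpt)
tokenizer = AutoTokenizer.from_pretrained(ckpt)

image = Image.open("example.jpg").convert("RGB")  # placeholder input image
pixel_values = processor(images=image, return_tensors="pt").pixel_values
output_ids = model.generate(pixel_values, max_length=32, num_beams=4)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```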

Contents

The repository includes the following files:

  1. Notebooks
     • 20_PartA-1.ipynb: Zero-shot smolVLM captioning and the custom ImageCaptioningModel, with training and evaluation on the dataset specified in the problem statement.
     • 20_PartB-1.ipynb: Part 1 of the robustness analysis, evaluating captions generated by smolVLM on images occluded at 10%, 50%, and 80% (see the occlusion sketch after this list).
     • 20_PartB-2.ipynb: Part 2 of the robustness analysis, evaluating captions generated by ImageCaptioningModel on images occluded at the same levels.
     • 20_PartC-1.ipynb: A custom BERT classifier that labels each generated caption as smolVLM (0) or ImageCaptioningModel (1); the notebook trains and evaluates this classifier (see the classifier sketch after this list).
  2. Captions
     • Captions - Custom: Captions generated by the custom captioning model at all occlusion levels above.
     • Captions - SmolVLM: Captions generated by smolVLM at all occlusion levels above.
  3. Results
     • Results: Scores for the captions generated by both smolVLM and the custom model at all occlusion levels; predictions.csv contains the output of the CaptionClassifier.
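The exact occlusion strategy (patch shape, placement, fill value) lives in the notebooks, so the following is only a hedged sketch of how a fixed fraction of an image might be blacked out to reach the 10%/50%/80% coverage levels:

```python
# Hedged sketch: occlude a fixed fraction of an image with a black square patch.
# The notebooks' actual occlusion strategy (patch shape, placement, fill value)
# may differ; this only illustrates the 10% / 50% / 80% coverage idea.
import math
import random
from PIL import Image

def occlude(image: Image.Image, fraction: float, seed: int = 0) -> Image.Image:
    """Black out a random square patch covering roughly `fraction` of the area."""
    rng = random.Random(seed)
    w, h = image.size
    side = int(math.sqrt(fraction * w * h))  # square patch with the target area
    side = min(side, w, h)                   # clamp for strongly non-square images
    x = rng.randint(0, w - side)
    y = rng.randint(0, h - side)
    out = image.copy()
    out.paste((0, 0, 0), (x, y, x + side, y + side))
    return out

for level in (0.10, 0.50, 0.80):
    img = Image.open("example.jpg").convert("RGB")  # placeholder input image
    occlude(img, level).save(f"occluded_{int(level * 100)}.png")
```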
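Likewise, a minimal sketch of a BERT-based caption classifier using the standard Transformers sequence-classification setup; the notebook's actual architecture, checkpoint, and training loop may differ:

```python
# Hedged sketch of a BERT caption classifier: smolVLM -> 0, ImageCaptioningModel -> 1.
# The notebook's exact architecture, checkpoint, and training loop may differ;
# this shows only the standard Transformers sequence-classification setup.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)

captions = ["a dog running on the beach", "an image of a cat sitting on a sofa"]
batch = tokenizer(captions, padding=True, truncation=True, return_tensors="pt")
with torch.no_grad():
    logits = model(**batch).logits  # untrained head: fine-tune before real use
preds = logits.argmax(dim=-1)       # 0 -> smolVLM, 1 -> ImageCaptioningModel
print(preds.tolist())
```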

Running Instructions

There are no dependencies beyond standard Python libraries, PyTorch, and Hugging Face Transformers. All experiments were run in Kaggle Notebooks.

To reproduce results:

  1. Open any notebook in Kaggle.
  2. Ensure a GPU accelerator is enabled.
  3. Run all cells top to bottom.

The trained model file (best_captioning_model.pt) is loaded directly from a Kaggle dataset in the captioning and evaluation notebooks.
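A hedged sketch of how such a checkpoint might be loaded in a Kaggle notebook; the dataset slug below is a placeholder, and ImageCaptioningModel is defined inside the notebooks themselves:

```python
# Hedged sketch: loading the trained checkpoint inside a Kaggle notebook.
# "<dataset-slug>" is a placeholder, not the real dataset name; adjust it to
# the Kaggle dataset actually attached to the notebook.
from pathlib import Path
import torch

ckpt_path = Path("/kaggle/input") / "<dataset-slug>" / "best_captioning_model.pt"
state_dict = torch.load(ckpt_path, map_location="cpu")

# `ImageCaptioningModel` is defined in the notebooks; build it there first:
# model = ImageCaptioningModel(...)
# model.load_state_dict(state_dict)
# model.eval()
```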

Team Members

Thanks and Acknowledgements

We sincerely thank our Deep Learning course instructor and TAs for their support and feedback throughout the course.

We also acknowledge the open-source community and tools that made this work possible:

  • Hugging Face Transformers
  • PyTorch
  • Kaggle Datasets and Notebooks
