AI-Powered Image Captioning Web App

A clean, deployable AI project that generates natural language captions from uploaded images using Salesforce BLIP.

This repository is designed to be recruiter-friendly: practical AI integration, clear architecture, and readable code.

Project Overview

This app demonstrates a complete inference workflow:

  1. User uploads an image in the Streamlit UI.
  2. Image is processed with Pillow.
  3. BLIP processor converts image into tensors.
  4. BLIP model generates token IDs for a caption.
  5. Tokens are decoded into readable text.
  6. Caption is shown and can be downloaded.
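The six steps above can be condensed into one self-contained sketch. The model ID and library calls are the standard Hugging Face BLIP API; the actual repo splits this logic across model.py and utils.py, and the function name here is illustrative:

```python
from PIL import Image
from transformers import BlipProcessor, BlipForConditionalGeneration

MODEL_ID = "Salesforce/blip-image-captioning-base"

def generate_caption(image_path: str, max_new_tokens: int = 30) -> str:
    """End-to-end BLIP captioning: load, preprocess, generate, decode."""
    processor = BlipProcessor.from_pretrained(MODEL_ID)
    model = BlipForConditionalGeneration.from_pretrained(MODEL_ID)
    image = Image.open(image_path).convert("RGB")          # step 2: Pillow
    inputs = processor(images=image, return_tensors="pt")  # step 3: image -> tensors
    token_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)  # step 4
    return processor.decode(token_ids[0], skip_special_tokens=True)      # step 5
```

In the app itself the processor and model are loaded once at startup rather than on every call, which keeps per-image latency down to the generate/decode steps.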

Demo Screenshot

Main UI and generated caption preview:

[Screenshots: App Home · Generated Caption]

Tech Stack

  • Python
  • Streamlit
  • PyTorch
  • Hugging Face Transformers
  • Salesforce BLIP (Salesforce/blip-image-captioning-base)
  • Pillow

How It Works

High-Level Architecture (for Recruiters)

The project uses a simple modular architecture:

  • app.py: Streamlit presentation layer and user interactions.
  • model.py: AI model loading and caption generation functions.
  • utils.py: Image/file helper utilities.

This separation keeps the UI code clean while making model logic easier to test and maintain.

Inference Flow

  1. app.py loads BLIP once using @st.cache_resource.
  2. User uploads an image (or chooses one from example_images/).
  3. utils.py converts the image to RGB PIL format.
  4. model.py runs BLIP inference on CPU/GPU.
  5. The generated caption is displayed and offered as a .txt download.
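The RGB-normalization step in utils.py can be sketched with plain Pillow. The helper name `to_rgb` is hypothetical, but `convert("RGB")` is the standard idiom for flattening PNGs with alpha channels or grayscale inputs before they are turned into tensors:

```python
from io import BytesIO
from PIL import Image

def to_rgb(uploaded_file) -> Image.Image:
    """Open an uploaded image (file path or file-like object) and
    normalize any mode (RGBA, L, P, ...) to a 3-channel RGB image."""
    image = Image.open(uploaded_file)
    return image.convert("RGB")
```

Streamlit's file uploader returns a file-like object, so the same helper works for both uploaded files and the bundled example images.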

Project Structure

AI-Captioning-App/
├── app.py
├── model.py
├── utils.py
├── requirements.txt
├── README.md
├── assets/
│   ├── README.md
│   ├── p1.png
│   └── p2.png
└── example_images/
    └── README.md

How to Run Locally

  1. Clone the repository:
     git clone https://github.com/asmiverma/AI-Captioning-App.git
     cd AI-Captioning-App
  2. Create a virtual environment:
     python -m venv .venv
  3. Activate the environment:
     # Windows PowerShell
     .\.venv\Scripts\Activate.ps1
     # macOS/Linux
     source .venv/bin/activate
  4. Install dependencies:
     pip install -r requirements.txt
  5. Run the app:
     streamlit run app.py
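For reference, a minimal requirements.txt for this stack might look like the following (the repo's own file is authoritative; this is only an illustrative sketch with no version pins):

```
streamlit
torch
transformers
Pillow
```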

Future Improvements

  • Support beam search/temperature options in the UI for caption diversity.
  • Add multilingual caption translation.
  • Add lightweight evaluation metrics over a small benchmark dataset.
  • Package with Docker for one-command deployment.
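The first improvement (beam search/temperature options) could be wired up with a small helper that maps a UI choice to keyword arguments for Hugging Face's `model.generate()`. The helper itself is hypothetical, but `num_beams`, `do_sample`, `temperature`, and `max_new_tokens` are real `generate()` parameters:

```python
def generation_config(strategy: str = "beam", num_beams: int = 4,
                      temperature: float = 1.0) -> dict:
    """Build keyword arguments for model.generate() from UI settings.

    "beam"   -> deterministic beam search over `num_beams` hypotheses
    anything else -> temperature-controlled sampling for more varied captions
    """
    if strategy == "beam":
        return {"num_beams": num_beams, "max_new_tokens": 30}
    return {"do_sample": True, "temperature": temperature, "max_new_tokens": 30}
```

In the app this would be called as `model.generate(**inputs, **generation_config(choice))`, with `choice` coming from a Streamlit selectbox.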
