AI-Powered Image Captioning Web App

A clean, deployable AI project that generates natural language captions from uploaded images using Salesforce BLIP.

This repository is designed to be recruiter-friendly: practical AI integration, clear architecture, and readable code.

Project Overview

This app demonstrates a complete inference workflow:

  1. User uploads an image in the Streamlit UI.
  2. Image is processed with Pillow.
  3. BLIP processor converts image into tensors.
  4. BLIP model generates token IDs for a caption.
  5. Tokens are decoded into readable text.
  6. Caption is shown and can be downloaded.
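The six steps above can be condensed into one self-contained sketch. The model ID and library calls are the standard Hugging Face BLIP API; the actual repo splits this logic across model.py and utils.py, and the function name here is illustrative:

```python
from PIL import Image
from transformers import BlipProcessor, BlipForConditionalGeneration

MODEL_ID = "Salesforce/blip-image-captioning-base"

def generate_caption(image_path: str, max_new_tokens: int = 30) -> str:
    """End-to-end BLIP captioning: load, preprocess, generate, decode."""
    processor = BlipProcessor.from_pretrained(MODEL_ID)
    model = BlipForConditionalGeneration.from_pretrained(MODEL_ID)
    image = Image.open(image_path).convert("RGB")          # step 2: Pillow
    inputs = processor(images=image, return_tensors="pt")  # step 3: image -> tensors
    token_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)  # step 4
    return processor.decode(token_ids[0], skip_special_tokens=True)      # step 5
```

In the app itself the processor and model are loaded once at startup rather than on every call, which keeps per-image latency down to the generate/decode steps.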

Demo Screenshot

Main UI and generated caption preview:

[Screenshots: App Home · Generated Caption]

Tech Stack

  • Python
  • Streamlit
  • PyTorch
  • Hugging Face Transformers
  • Salesforce BLIP (Salesforce/blip-image-captioning-base)
  • Pillow

How It Works

High-Level Architecture (for Recruiters)

The project uses a simple modular architecture:

  • app.py: Streamlit presentation layer and user interactions.
  • model.py: AI model loading and caption generation functions.
  • utils.py: Image/file helper utilities.

This separation keeps the UI code clean while making model logic easier to test and maintain.

Inference Flow

  1. app.py loads BLIP once using @st.cache_resource.
  2. User uploads an image (or chooses one from example_images/).
  3. utils.py converts the image to RGB PIL format.
  4. model.py runs BLIP inference on CPU/GPU.
  5. The generated caption is displayed and offered as a .txt download.
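The RGB-normalization step in utils.py can be sketched with plain Pillow. The helper name `to_rgb` is hypothetical, but `convert("RGB")` is the standard idiom for flattening PNGs with alpha channels or grayscale inputs before they are turned into tensors:

```python
from io import BytesIO
from PIL import Image

def to_rgb(uploaded_file) -> Image.Image:
    """Open an uploaded image (file path or file-like object) and
    normalize any mode (RGBA, L, P, ...) to a 3-channel RGB image."""
    image = Image.open(uploaded_file)
    return image.convert("RGB")
```

Streamlit's file uploader returns a file-like object, so the same helper works for both uploaded files and the bundled example images.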

Project Structure

AI-Captioning-App/
├── app.py
├── model.py
├── utils.py
├── requirements.txt
├── README.md
├── assets/
│   ├── README.md
│   ├── p1.png
│   └── p2.png
└── example_images/
    └── README.md

How to Run Locally

  1. Clone the repository:
     git clone https://github.com/asmiverma/AI-Captioning-App.git
     cd AI-Captioning-App
  2. Create a virtual environment:
     python -m venv .venv
  3. Activate the environment:
     # Windows PowerShell
     .\.venv\Scripts\Activate.ps1
     # macOS/Linux
     source .venv/bin/activate
  4. Install dependencies:
     pip install -r requirements.txt
  5. Run the app:
     streamlit run app.py
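For reference, a minimal requirements.txt for this stack might look like the following (the repo's own file is authoritative; this is only an illustrative sketch with no version pins):

```
streamlit
torch
transformers
Pillow
```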

Future Improvements

  • Support beam search/temperature options in the UI for caption diversity.
  • Add multilingual caption translation.
  • Add lightweight evaluation metrics over a small benchmark dataset.
  • Package with Docker for one-command deployment.
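The first improvement (beam search/temperature options) could be wired up with a small helper that maps a UI choice to keyword arguments for Hugging Face's `model.generate()`. The helper itself is hypothetical, but `num_beams`, `do_sample`, `temperature`, and `max_new_tokens` are real `generate()` parameters:

```python
def generation_config(strategy: str = "beam", num_beams: int = 4,
                      temperature: float = 1.0) -> dict:
    """Build keyword arguments for model.generate() from UI settings.

    "beam"   -> deterministic beam search over `num_beams` hypotheses
    anything else -> temperature-controlled sampling for more varied captions
    """
    if strategy == "beam":
        return {"num_beams": num_beams, "max_new_tokens": 30}
    return {"do_sample": True, "temperature": temperature, "max_new_tokens": 30}
```

In the app this would be called as `model.generate(**inputs, **generation_config(choice))`, with `choice` coming from a Streamlit selectbox.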
