A clean, deployable AI project that generates natural language captions from uploaded images using Salesforce BLIP.
This repository is designed to be recruiter-friendly: practical AI integration, clear architecture, and readable code.
This app demonstrates a complete inference workflow:
- User uploads an image in the Streamlit UI.
- Image is processed with Pillow.
- BLIP processor converts image into tensors.
- BLIP model generates token IDs for a caption.
- Tokens are decoded into readable text.
- Caption is shown and can be downloaded.
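The steps above can be sketched end to end with the Hugging Face Transformers API. This is a minimal sketch, not the repo's exact code: the model ID matches the one this app uses, but the function and variable names are illustrative.

```python
from PIL import Image
from transformers import BlipProcessor, BlipForConditionalGeneration

MODEL_ID = "Salesforce/blip-image-captioning-base"

# Load the processor (image transforms + tokenizer) and the model once.
processor = BlipProcessor.from_pretrained(MODEL_ID)
model = BlipForConditionalGeneration.from_pretrained(MODEL_ID)

def caption_image(path: str) -> str:
    """Generate a natural-language caption for the image at `path`."""
    image = Image.open(path).convert("RGB")                # Pillow preprocessing
    inputs = processor(images=image, return_tensors="pt")  # image -> tensors
    output_ids = model.generate(**inputs, max_new_tokens=30)  # token IDs
    # Decode token IDs into readable text, dropping BOS/EOS tokens.
    return processor.decode(output_ids[0], skip_special_tokens=True)
```

The first call downloads the model weights from the Hugging Face Hub; subsequent calls use the local cache.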
Main UI and generated caption preview: see the screenshots in `assets/`.

Built with:
- Python
- Streamlit
- PyTorch
- Hugging Face Transformers
- Salesforce BLIP (`Salesforce/blip-image-captioning-base`)
- Pillow
The project uses a simple modular architecture:
- `app.py`: Streamlit presentation layer and user interactions.
- `model.py`: AI model loading and caption-generation functions.
- `utils.py`: image/file helper utilities.
This separation keeps the UI code clean while making model logic easier to test and maintain.
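For example, the image helper in `utils.py` can be as small as the following sketch (the actual helper name in the repo may differ):

```python
from PIL import Image

def to_rgb_image(uploaded_file) -> Image.Image:
    """Normalize an uploaded file (PNG, JPEG, RGBA, grayscale, ...) to an RGB PIL image."""
    image = Image.open(uploaded_file)
    # BLIP's processor expects 3-channel RGB input, so convert unconditionally.
    return image.convert("RGB")
```

Streamlit's `st.file_uploader` returns a file-like object, which `Image.open` accepts directly, so no temporary files are needed.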
- `app.py` loads BLIP once using `@st.cache_resource`.
- User uploads an image (or chooses one from `example_images/`).
- `utils.py` converts the image to an RGB PIL image.
- `model.py` runs BLIP inference on CPU/GPU.
- The generated caption is displayed and offered as a `.txt` download.
```
AI-Captioning-App/
├── app.py
├── model.py
├── utils.py
├── requirements.txt
├── README.md
├── assets/
│   ├── README.md
│   ├── p1.png
│   └── p2.png
└── example_images/
    └── README.md
```
- Clone the repository:

  ```shell
  git clone https://github.com/asmiverma/AI-Captioning-App.git
  cd AI-Captioning-App
  ```

- Create a virtual environment:

  ```shell
  python -m venv .venv
  ```

- Activate the environment:

  ```shell
  # Windows PowerShell
  .\.venv\Scripts\Activate.ps1

  # macOS/Linux
  source .venv/bin/activate
  ```

- Install dependencies:

  ```shell
  pip install -r requirements.txt
  ```

- Run the app:

  ```shell
  streamlit run app.py
  ```

Future improvements:
- Support beam search/temperature options in the UI for caption diversity.
- Add multilingual caption translation.
- Add lightweight evaluation metrics over a small benchmark dataset.
- Package with Docker for one-command deployment.
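For the beam search/temperature item, the generation call in `model.py` could expose standard `generate` decoding options. This is a sketch; the function name and parameter defaults are illustrative, not the repo's current code.

```python
import torch

def generate_caption(model, processor, image,
                     num_beams: int = 4,
                     do_sample: bool = False,
                     temperature: float = 1.0) -> str:
    """Caption `image` with user-tunable decoding options.

    num_beams > 1 enables beam search; do_sample=True with temperature != 1.0
    trades deterministic output for caption diversity.
    """
    inputs = processor(images=image, return_tensors="pt")
    with torch.no_grad():  # inference only; no gradients needed
        output_ids = model.generate(
            **inputs,
            max_new_tokens=30,
            num_beams=num_beams,
            do_sample=do_sample,
            temperature=temperature,
        )
    return processor.decode(output_ids[0], skip_special_tokens=True)
```

In the UI these parameters could be wired to `st.slider` and `st.checkbox` widgets and passed through on each rerun.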

