
🚀 caption-craft.ai: AI-Powered Social Media Caption Generator

Transform your images into engaging social media content with cutting-edge AI technology

Python · React · FastAPI · TypeScript · Docker


🎯 Project Overview

caption-craft.ai is a full-stack web application that leverages state-of-the-art artificial intelligence to transform ordinary images into compelling social media content. Built with modern web technologies and advanced AI models, it provides intelligent, platform-specific caption generation for Instagram, Facebook, and LinkedIn.

🌟 Why This Project Matters

In today's digital landscape, creating engaging social media content is crucial for personal branding, business growth, and community building. However, crafting the perfect caption that resonates with your audience while maintaining platform-specific best practices can be time-consuming and challenging. This application democratizes high-quality content creation by making AI-powered caption generation accessible to everyone.

✨ Core Features

🤖 Advanced AI Integration

  • LLAVA 7B Vision Model: State-of-the-art image analysis with detailed scene understanding
  • GPT-OSS 120B: Large language model for sophisticated caption generation
  • DeepSeek-R1: Alternative reasoning model for diverse content creation
  • Transparent AI Reasoning: View the AI's decision-making process with <think></think> tags
  • Multi-Model Support: Switch between different AI models for varied results
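Since the reasoning models wrap their thought process in <think></think> tags, the backend needs to separate that reasoning from the final caption before display. A minimal sketch of that parsing step (the function name is illustrative, not the repo's actual implementation):

```python
import re

def split_reasoning(raw: str) -> tuple[str, str]:
    """Split a model response into (reasoning, caption), where the
    reasoning is wrapped in <think></think> tags."""
    match = re.search(r"<think>(.*?)</think>", raw, flags=re.DOTALL)
    if not match:
        return "", raw.strip()
    reasoning = match.group(1).strip()
    caption = (raw[:match.start()] + raw[match.end():]).strip()
    return reasoning, caption
```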

📱 Platform-Specific Optimization

  • Instagram: Trendy, aesthetic-focused captions with strategic emojis and hashtags
  • Facebook: Community-oriented, conversational tone with engagement-focused content
  • LinkedIn: Professional, thought-leadership content with industry-relevant insights
  • Smart Adaptation: AI automatically adjusts tone, style, and hashtags for each platform
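As a rough illustration of how such per-platform adaptation can be wired into a prompt, consider a simple style-hint lookup (the mapping below is a hypothetical sketch, not the prompts the app actually uses):

```python
# Hypothetical per-platform style hints; the app's real prompts differ.
PLATFORM_STYLE = {
    "instagram": "trendy, aesthetic-focused tone with emojis and hashtags",
    "facebook": "conversational, community-oriented tone",
    "linkedin": "professional, thought-leadership tone",
}

def style_hint(platform: str) -> str:
    """Return the prompt hint for a platform, defaulting to Instagram."""
    return PLATFORM_STYLE.get(platform.lower(), PLATFORM_STYLE["instagram"])
```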

🎯 Five Caption Styles

  • SHORT: Concise, punchy captions perfect for quick engagement
  • STORY: Narrative, storytelling approach that draws readers in
  • PHILOSOPHY: Deep, thought-provoking content that sparks reflection
  • LIFESTYLE: Aspirational, lifestyle-focused content that inspires
  • QUOTE: Inspirational, quote-style captions that motivate and uplift

🚀 User Experience Features

  • Drag & Drop Interface: Intuitive file upload with visual feedback
  • Real-time Processing: Live progress indicators during AI analysis
  • Dark/Light Mode: Beautiful, responsive UI with theme switching
  • History Management: Save, organize, and revisit your generated captions
  • Smart Sharing: Direct integration with social media platforms
  • Duplicate Detection: Intelligent image deduplication using SHA-256 hashing

๐Ÿ—๏ธ System Architecture

๐Ÿ“Š High-Level Architecture

graph TB
    A[User Interface] --> B[React Frontend]
    B --> C[FastAPI Backend]
    C --> D[LLAVA Vision Model]
    C --> E[GPT-OSS/DeepSeek]
    C --> F[History Storage]
    D --> G[Image Analysis]
    E --> H[Caption Generation]
    G --> H
    H --> I[Platform Optimization]
    I --> J[Social Media Sharing]

๐Ÿ—‚๏ธ Project Structure

๐Ÿ“ AI-Caption-Generator/
โ”œโ”€โ”€ ๐Ÿ“ backend/                    # FastAPI Backend Server
โ”‚   โ”œโ”€โ”€ ๐Ÿ“„ main.py                # Core API endpoints and logic
โ”‚   โ”œโ”€โ”€ ๐Ÿ“„ requirements.txt       # Python dependencies
โ”‚   โ”œโ”€โ”€ ๐Ÿ“„ Dockerfile            # Backend container configuration
โ”‚   โ”œโ”€โ”€ ๐Ÿ“„ setup.py              # Environment setup script
โ”‚   โ”œโ”€โ”€ ๐Ÿ“ uploads/               # Temporary file storage
โ”‚   โ””โ”€โ”€ ๐Ÿ“„ history.json          # Persistent data storage
โ”œโ”€โ”€ ๐Ÿ“ frontend/                  # React + TypeScript Frontend
โ”‚   โ”œโ”€โ”€ ๐Ÿ“ src/                   # Source code directory
โ”‚   โ”‚   โ”œโ”€โ”€ ๐Ÿ“„ App.tsx           # Main application component
โ”‚   โ”‚   โ”œโ”€โ”€ ๐Ÿ“„ main.tsx          # Application entry point
โ”‚   โ”‚   โ”œโ”€โ”€ ๐Ÿ“ pages/            # Page components
โ”‚   โ”‚   โ”‚   โ”œโ”€โ”€ ๐Ÿ“„ Home.tsx      # Main application page
โ”‚   โ”‚   โ”‚   โ””โ”€โ”€ ๐Ÿ“„ Result.tsx    # Results display page
โ”‚   โ”‚   โ””โ”€โ”€ ๐Ÿ“„ types.d.ts        # TypeScript definitions
โ”‚   โ”œโ”€โ”€ ๐Ÿ“„ package.json          # Node.js dependencies
โ”‚   โ”œโ”€โ”€ ๐Ÿ“„ Dockerfile            # Frontend container configuration
โ”‚   โ”œโ”€โ”€ ๐Ÿ“„ tailwind.config.js    # Tailwind CSS configuration
โ”‚   โ””โ”€โ”€ ๐Ÿ“„ vite.config.mjs       # Vite build configuration
โ”œโ”€โ”€ ๐Ÿ“„ docker-compose.yml         # Multi-container orchestration
โ”œโ”€โ”€ ๐Ÿ“„ setup.py                   # Project setup automation
โ””โ”€โ”€ ๐Ÿ“„ README.md                  # Project documentation

🔄 Data Flow Architecture

  1. Image Upload → Frontend receives file via drag & drop
  2. File Validation → Backend validates file type and size
  3. Hash Calculation → SHA-256 hash for duplicate detection
  4. AI Analysis → LLAVA model analyzes image content
  5. Caption Generation → GPT-OSS/DeepSeek creates platform-specific captions
  6. Result Processing → Backend formats and optimizes output
  7. User Display → Frontend presents organized results
  8. History Storage → JSON-based persistent storage
  9. Social Sharing → Direct platform integration
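Steps 2 and 3 of this flow can be sketched in a few lines; the allowed types and size limit below are assumptions for illustration, not the backend's actual values:

```python
import hashlib

# Assumed limits for illustration; the backend's real values may differ.
ALLOWED_TYPES = {"image/jpeg", "image/png", "image/webp"}
MAX_SIZE = 10 * 1024 * 1024  # 10 MB

def validate_upload(content_type: str, data: bytes) -> None:
    """Step 2: reject uploads with an unexpected type or excessive size."""
    if content_type not in ALLOWED_TYPES:
        raise ValueError(f"unsupported file type: {content_type}")
    if len(data) > MAX_SIZE:
        raise ValueError("file too large")

def image_hash(data: bytes) -> str:
    """Step 3: SHA-256 hex digest used as the duplicate-detection key."""
    return hashlib.sha256(data).hexdigest()
```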

🚀 Quick Start Guide

📋 Prerequisites

Requirement       Version   Purpose
Python            3.11+     Backend API development
Node.js           18+       Frontend development
Docker            Latest    Containerized deployment
Hugging Face API  Required  AI model access
Ollama            Latest    Local LLAVA model serving

🔧 System Requirements

  • RAM: Minimum 8GB (16GB recommended for optimal performance)
  • Storage: 10GB free space for models and dependencies
  • Network: Stable internet connection for AI model access
  • OS: Windows 10+, macOS 10.15+, or Linux (Ubuntu 20.04+)

📥 Step 1: Clone the Repository

# Clone the repository
git clone https://github.com/Harry-jain/caption-craft.ai.git
cd caption-craft.ai

# Verify the project structure
ls -la

๐Ÿ Step 2: Backend Setup

# Navigate to backend directory
cd backend

# Create Python virtual environment
python -m venv venv

# Activate virtual environment
# On Windows:
venv\Scripts\activate
# On macOS/Linux:
source venv/bin/activate

# Upgrade pip to latest version
python -m pip install --upgrade pip

# Install Python dependencies
pip install -r requirements.txt

# Configure environment variables
# Option A: Use automated setup script (Recommended)
python ../setup.py

# Option B: Manual configuration
echo "HF_TOKEN=your_huggingface_token_here" > .env
echo "HF_GPT_OSS_MODEL=openai/gpt-oss-120b:together" >> .env

# Start the FastAPI server
uvicorn main:app --host 0.0.0.0 --port 8000 --reload

🤖 Step 3: Ollama Setup (Required for Image Analysis)

# Install Ollama from https://ollama.ai
# Download the appropriate installer for your OS

# Pull the LLAVA 7B model for image analysis
ollama pull llava:7b

# Start Ollama service in the background
ollama serve &

# Verify the installation and model
ollama list

# Test the model (optional)
ollama run llava:7b "Describe this image: /path/to/test/image.jpg"
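Beyond the CLI, the backend can reach LLAVA through Ollama's HTTP API, which accepts base64-encoded images in the `images` field of a /api/generate request. A sketch of building such a request body (function name and default prompt are illustrative):

```python
import base64

def build_llava_request(image_bytes: bytes,
                        prompt: str = "Describe this image") -> dict:
    """Request body for POST {OLLAMA_URL}/api/generate; LLAVA reads
    base64-encoded images from the `images` field."""
    return {
        "model": "llava:7b",
        "prompt": prompt,
        "images": [base64.b64encode(image_bytes).decode("ascii")],
        "stream": False,  # return one complete response, not a stream
    }
```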

โš›๏ธ Step 4: Frontend Setup

# Navigate to frontend directory
cd frontend

# Install Node.js dependencies
npm install

# Start the development server
npm run dev

# Alternative: Build for production
npm run build
npm run preview

๐ŸŒ Step 5: Access the Application

Service URL Description
Frontend http://localhost:5173 Main application interface
Backend API http://localhost:8000 REST API endpoints
API Documentation http://localhost:8000/docs Interactive API docs
Health Check http://localhost:8000/health Service status

✅ Verification Steps

  1. Backend Health Check: Visit http://localhost:8000/docs
  2. Frontend Loading: Visit http://localhost:5173
  3. Ollama Status: Run ollama list in terminal
  4. API Connectivity: Check browser developer tools for API calls

๐Ÿณ Docker Deployment

๐Ÿš€ Quick Docker Setup

# Build and run with Docker Compose (Recommended)
docker-compose up --build

# Run in detached mode
docker-compose up -d --build

# View logs
docker-compose logs -f

# Stop services
docker-compose down

🔧 Individual Container Deployment

# Build backend container
docker build -t caption-backend ./backend

# Build frontend container
docker build -t caption-frontend ./frontend

# Run backend container
docker run -p 8000:8000 -e HF_TOKEN=your_token_here caption-backend

# Run frontend container
docker run -p 5173:5173 caption-frontend

📊 Docker Compose Configuration

The docker-compose.yml file orchestrates the entire application stack:

version: '3.8'
services:
  backend:
    build: ./backend
    ports:
      - "8000:8000"
    environment:
      - HF_TOKEN=${HF_TOKEN}
      - HF_GPT_OSS_MODEL=${HF_GPT_OSS_MODEL}
    volumes:
      - ./backend:/app
    command: uvicorn main:app --host 0.0.0.0 --port 8000

  frontend:
    build: ./frontend
    ports:
      - "5173:5173"
    volumes:
      - ./frontend:/app
    command: npm run dev
    depends_on:
      - backend

🔧 Configuration & Environment

🔐 Environment Variables

Variable          Description                                 Required  Default
HF_TOKEN          Hugging Face API token for AI model access  ✅ Yes    -
HF_GPT_OSS_MODEL  Primary model for caption generation        ❌ No     openai/gpt-oss-120b:together
OLLAMA_URL        Ollama service endpoint                     ❌ No     http://localhost:11434
UPLOAD_DIR        Directory for temporary file storage        ❌ No     uploads
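A minimal sketch of loading these variables with the documented defaults (illustrative, not the repo's actual config code):

```python
import os

def load_config() -> dict:
    """Read the variables from the table above, applying their defaults."""
    token = os.environ.get("HF_TOKEN")
    if not token:
        raise RuntimeError("HF_TOKEN not set")  # the only required variable
    return {
        "hf_token": token,
        "model": os.environ.get("HF_GPT_OSS_MODEL", "openai/gpt-oss-120b:together"),
        "ollama_url": os.environ.get("OLLAMA_URL", "http://localhost:11434"),
        "upload_dir": os.environ.get("UPLOAD_DIR", "uploads"),
    }
```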

๐Ÿ›ก๏ธ Security Best Practices

  • ๐Ÿ”’ API Token Security: Never commit .env files to version control
  • ๐Ÿ”„ Token Rotation: Regularly rotate Hugging Face API tokens
  • ๐Ÿ“ File Permissions: Ensure proper file permissions for upload directories
  • ๐ŸŒ Network Security: Use HTTPS in production environments
  • ๐Ÿ” Input Validation: All file uploads are validated for type and size

🤖 Available AI Models

Model         Provider      Size             Use Case                    Performance
GPT-OSS 120B  Together AI   120B parameters  Primary caption generation  High quality, slower
GPT-OSS 20B   Together AI   20B parameters   Faster caption generation   Good quality, faster
DeepSeek-R1   Fireworks AI  7B parameters    Alternative reasoning       Fast, diverse output
LLAVA 7B      Ollama        7B parameters    Image analysis              Local processing

📱 How It Works

  1. Upload Image: Drag & drop or select an image file
  2. Choose Platform: Select Instagram, Facebook, or LinkedIn
  3. Select Model: Choose between GPT-OSS or DeepSeek-R1
  4. AI Analysis: LLAVA analyzes the image content
  5. Caption Generation: AI generates 5 distinct caption types
  6. Review & Share: Copy captions or share directly to platforms

🎨 Caption Types

Each platform generates 5 unique caption styles:

  • SHORT: Concise, punchy captions with hashtags
  • STORY: Narrative, storytelling approach
  • PHILOSOPHY: Deep, thought-provoking content
  • LIFESTYLE: Aspirational, lifestyle-focused
  • QUOTE: Inspirational, quote-style captions

🔌 API Endpoints

Image Analysis

POST /api/describe
Content-Type: multipart/form-data

file: [image file]

Caption Generation

POST /api/caption
Content-Type: application/json

{
  "description": "image description",
  "image_name": "filename.jpg",
  "image_hash": "optional_hash",
  "tone": "instagram|facebook|linkedin",
  "model_id": "optional_model_override"
}
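A small helper for assembling that request body, matching the schema above (optional fields are included only when provided; the helper itself is illustrative, not part of the repo):

```python
from typing import Optional

def build_caption_request(description: str, image_name: str, tone: str,
                          image_hash: Optional[str] = None,
                          model_id: Optional[str] = None) -> dict:
    """Assemble the /api/caption JSON body; optional keys are omitted
    when not supplied."""
    payload = {"description": description, "image_name": image_name, "tone": tone}
    if image_hash is not None:
        payload["image_hash"] = image_hash
    if model_id is not None:
        payload["model_id"] = model_id
    return payload
```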

History Management

GET /api/history          # Get all captions
GET /api/history/{id}     # Get specific caption
DELETE /api/history/{id}  # Delete caption
DELETE /api/history       # Clear all history
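The JSON-based history storage behind these endpoints can be sketched as an append-to-file helper (hypothetical; the repo's actual main.py may structure this differently):

```python
import json
import uuid
from pathlib import Path

def save_caption(entry: dict, path: str = "history.json") -> str:
    """Append a caption record to the JSON history file; returns its new id."""
    history_file = Path(path)
    history = json.loads(history_file.read_text()) if history_file.exists() else []
    record = {"id": str(uuid.uuid4()), **entry}
    history.append(record)
    history_file.write_text(json.dumps(history, indent=2))
    return record["id"]
```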

๐Ÿ› ๏ธ Tech Stack

Backend

  • FastAPI: Modern, fast web framework
  • Python 3.11+: Latest Python features
  • Hugging Face: AI model inference
  • LLAVA: Vision-language model for image analysis
  • OpenAI Client: Router API integration

Frontend

  • React 18: Modern React with hooks
  • TypeScript: Type-safe development
  • Tailwind CSS: Utility-first styling
  • Vite: Fast build tool
  • React Router: Client-side routing

AI Models

  • LLAVA 7B: Image recognition and description
  • GPT-OSS 120B: Advanced caption generation
  • DeepSeek-R1: Alternative caption model

🌟 Key Features

Smart Platform Adaptation

  • Instagram: Trendy, aesthetic-focused with strategic emojis
  • Facebook: Community-oriented, conversational tone
  • LinkedIn: Professional, thought-leadership content

Advanced AI Reasoning

  • View the AI's thought process with <think></think> tags
  • Understand how captions are adapted for each platform
  • Transparent AI decision-making

Seamless Sharing

  • Direct platform integration
  • Optimized content for each social network
  • Copy-to-clipboard functionality

📊 Performance

  • Image Analysis: ~5-10 seconds with LLAVA
  • Caption Generation: ~10-20 seconds with GPT-OSS
  • Response Time: Optimized for real-time interaction
  • Scalability: Docker-ready for production deployment

๐Ÿค Contributing

We welcome contributions! Please see our Contributing Guide for details.

Development Setup

  1. Fork the repository
  2. Create a feature branch
  3. Make your changes
  4. Add tests if applicable
  5. Submit a pull request

๐Ÿ“ Project Information

This project is developed as a comprehensive demonstration of modern AI integration with full-stack web development. The application showcases advanced capabilities in computer vision, natural language processing, and user experience design.

๐Ÿ™ Acknowledgments

  • Hugging Face for providing the AI models
  • Together AI for GPT-OSS model hosting
  • Fireworks AI for DeepSeek-R1 hosting
  • Open Source Community for the amazing tools

📞 Support

🔮 Roadmap

  • Multi-language support
  • Video caption generation
  • Advanced hashtag optimization
  • Social media scheduling integration
  • Team collaboration features
  • Analytics and insights

🚨 Troubleshooting

Common Issues

1. "Ollama service not available" Error

# Install Ollama
# Download from: https://ollama.ai

# Pull LLAVA model
ollama pull llava:7b

# Start service
ollama serve

# Verify in another terminal
ollama list

2. "HF_TOKEN not set" Error

# Create .env file in backend/ directory
echo "HF_TOKEN=your_actual_token_here" > backend/.env
echo "HF_GPT_OSS_MODEL=openai/gpt-oss-120b:together" >> backend/.env

3. Frontend Build Errors

# Clear node_modules and reinstall
cd frontend
rm -rf node_modules package-lock.json
npm install

4. Port Already in Use

# Check what's using the port
netstat -ano | findstr :8000  # Windows
lsof -i :8000                 # Mac/Linux

# Kill the process or use different port
uvicorn main:app --host 0.0.0.0 --port 8001 --reload

Made with ❤️ by Harsh Jain

⭐ Star this repo if you found it helpful!
