A powerful multimodal AI system that analyzes video content by extracting and combining information from multiple sources: audio transcription, visual object detection, and text recognition. Built with FastAPI for high-performance video processing.
- 🎙️ Audio Transcription: Automatic speech-to-text using OpenAI's Whisper model
- 👁️ Object Detection: Real-time object recognition using YOLOv8
- 📝 Text Extraction: OCR capabilities with Tesseract for detecting text in video frames
- 🤖 AI Summarization: Intelligent context fusion using local LLM (with fallback to rule-based summarization)
- ⚡ Fast API: RESTful API built with FastAPI for easy integration
- 🔄 Automatic Cleanup: Temporary files are automatically managed and cleaned up
```
Video Upload → Frame Extraction → Parallel Processing
├─ Audio  → Whisper   → Transcription
├─ Visual → YOLO      → Object Detection
└─ Text   → Tesseract → OCR
        ↓
  LLM Summarizer
        ↓
  Unified Summary
```
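In code, this fan-out amounts to running Whisper, YOLOv8, and Tesseract independently over the extracted audio and frames, then handing all three results to the summarizer. The sketch below illustrates that stage with the libraries the project uses; the function names, placeholder file paths, and threading setup are illustrative assumptions, not the repository's actual `video_processor.py`:

```python
# Illustrative sketch of the parallel analysis stage (not the repository's actual code)
from concurrent.futures import ThreadPoolExecutor

import pytesseract
import whisper
from PIL import Image
from ultralytics import YOLO

def transcribe(audio_path: str) -> str:
    model = whisper.load_model("base")
    return model.transcribe(audio_path, fp16=False)["text"]

def detect_objects(frame_path: str) -> list:
    results = YOLO("yolov8n.pt")(frame_path)
    return [results[0].names[int(c)] for c in results[0].boxes.cls]

def read_text(frame_path: str) -> str:
    return pytesseract.image_to_string(Image.open(frame_path))

# The three modalities are independent, so they can run concurrently
with ThreadPoolExecutor(max_workers=3) as pool:
    transcription = pool.submit(transcribe, "audio.wav")   # placeholder audio path
    objects = pool.submit(detect_objects, "frame_1.jpg")   # placeholder frame path
    on_screen_text = pool.submit(read_text, "frame_1.jpg")
    print(transcription.result(), objects.result(), on_screen_text.result())
```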
- Python: 3.9 or higher
- Operating System: Windows, macOS, or Linux
- RAM: Minimum 8GB (16GB recommended for larger models)
- Storage: At least 5GB free space for model weights
**Important:** Tesseract OCR must be installed separately on your system:
- Windows: Download from GitHub Releases
- macOS: `brew install tesseract`
- Linux: `sudo apt-get install tesseract-ocr`
If Tesseract is not in your system PATH, update line 19 in `visual_analyser.py`:

```python
pytesseract.pytesseract.tesseract_cmd = r'C:\Program Files\Tesseract-OCR\tesseract.exe'
```

UV is a fast Python package installer and resolver.
- Install UV (if not already installed):

  ```bash
  # Windows (PowerShell)
  powershell -c "irm https://astral.sh/uv/install.ps1 | iex"

  # macOS/Linux
  curl -LsSf https://astral.sh/uv/install.sh | sh
  ```

- Clone or navigate to the project directory:

  ```bash
  cd path/to/Contex_AI
  ```

- Install dependencies:

  ```bash
  uv sync
  ```
Alternatively, using pip and a virtual environment:

```bash
# Create a virtual environment
python -m venv venv

# Activate the virtual environment
# Windows:
venv\Scripts\activate
# macOS/Linux:
source venv/bin/activate

# Install dependencies
pip install -r requirements.txt
```

The system works out of the box with a rule-based fallback summarizer. For enhanced AI summarization:
- Download a GGUF model file (e.g., from Hugging Face)
- Update `video_processor.py` line 14:

  ```python
  self.llm_summarizer = LLMSummarizer(model_path="path/to/your/model.gguf")
  ```
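For reference, the fallback behavior described above could be structured roughly as follows. This is a hedged sketch using llama-cpp-python; the prompt, parameters, and class body are assumptions, not the actual contents of `llm_summarizer.py`:

```python
# Illustrative sketch of an LLM summarizer with a rule-based fallback
from typing import Optional

from llama_cpp import Llama

class LLMSummarizer:
    def __init__(self, model_path: Optional[str] = None):
        # Only load the local GGUF model when a path is provided
        self.llm = Llama(model_path=model_path, n_ctx=2048) if model_path else None

    def summarize(self, transcription: str, objects: str, text: str) -> str:
        if self.llm is None:
            # Rule-based fallback: concatenate the three modalities
            return (f"Audio Content: {transcription} | "
                    f"Detected Objects: {objects} | Text in Video: {text}")
        prompt = (
            "Summarize this video from its transcript, detected objects, and on-screen text.\n"
            f"Transcript: {transcription}\nObjects: {objects}\nOn-screen text: {text}\nSummary:"
        )
        out = self.llm(prompt, max_tokens=200, stop=["\n\n"])
        return out["choices"][0]["text"].strip()
```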
On first run, the following models will be automatically downloaded:
- Whisper (base model): ~140MB
- YOLOv8n: ~6MB
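If you prefer to fetch these weights ahead of time (for example on a slow connection or a build machine), a short helper script like the one below should work. The script is only a convenience and is not part of the repository:

```python
# prefetch_models.py - optional helper to download model weights up front
import whisper
from ultralytics import YOLO

whisper.load_model("base")  # pulls the Whisper "base" weights (~140MB) on first call
YOLO("yolov8n.pt")          # pulls the YOLOv8n weights (~6MB) on first call
```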
Start the server:

```bash
# Using UV
uv run python main.py

# Using standard Python
python main.py
```

The server will start at http://localhost:8000.
Once the server is running, visit:
- Interactive API docs: http://localhost:8000/docs
- Alternative docs: http://localhost:8000/redoc
cURL:

```bash
curl -X POST "http://localhost:8000/analyze_video/" \
  -H "accept: application/json" \
  -H "Content-Type: multipart/form-data" \
  -F "file=@your_video.mp4"
```

Python:

```python
import requests
url = "http://localhost:8000/analyze_video/"
files = {"file": open("your_video.mp4", "rb")}
response = requests.post(url, files=files)
result = response.json()
print("Summary:", result["summary"])
print("Transcription:", result["transcription"])
print("Detected Objects:", result["detected_objects"])const formData = new FormData();
formData.append('file', videoFile);
fetch('http://localhost:8000/analyze_video/', {
method: 'POST',
body: formData
})
.then(response => response.json())
.then(data => {
console.log('Summary:', data.summary);
console.log('Transcription:', data.transcription);
console.log('Detected Objects:', data.detected_objects);
});
```

Example response:

```json
{
"summary": "Audio Content: This is a tutorial about... | Detected Objects: person, laptop, book | Text in Video: Tutorial Title",
"transcription": "Full audio transcription text here...",
"detected_objects": {
"frame_1": {
"objects": [
{"object": "person", "confidence": 0.95},
{"object": "laptop", "confidence": 0.87}
],
"text": "Detected text from frame"
},
"frame_2": {
"objects": [...],
"text": "..."
}
}
}
```

Project structure:

```
Contex_AI/
├── main.py # FastAPI application entry point
├── video_processor.py # Main video processing orchestrator
├── audio_analyser.py # Whisper-based audio transcription
├── visual_analyser.py # YOLO object detection + Tesseract OCR
├── llm_summarizer.py # LLM-based context summarization
├── requirements.txt # Python dependencies (legacy)
├── pyproject.toml # Modern Python project configuration
├── uv.lock # Locked dependency versions
└── README.md            # This file
```
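To connect the project layout to the API above, a `main.py`-style entry point typically looks something like the sketch below. The endpoint body, the `process_video` stand-in, and the temporary-file handling are assumptions for illustration, not the file's verbatim contents:

```python
# Illustrative FastAPI entry point (not the repository's actual main.py)
import os
import tempfile

import uvicorn
from fastapi import FastAPI, File, UploadFile

app = FastAPI()

def process_video(path: str) -> dict:
    # Placeholder for the real pipeline (Whisper + YOLO + Tesseract + summarizer)
    return {"summary": "", "transcription": "", "detected_objects": {}}

@app.post("/analyze_video/")
async def analyze_video(file: UploadFile = File(...)):
    # Persist the upload to a temporary file so the analysers can read it from disk
    with tempfile.NamedTemporaryFile(suffix=".mp4", delete=False) as tmp:
        tmp.write(await file.read())
        tmp_path = tmp.name
    try:
        return process_video(tmp_path)
    finally:
        os.remove(tmp_path)  # the temporary upload is always cleaned up

if __name__ == "__main__":
    uvicorn.run(app, host="0.0.0.0", port=8000)
```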
Issue: `ModuleNotFoundError: No module named 'whisper'`
- Solution: Install openai-whisper: `uv add openai-whisper` or `pip install openai-whisper`
Issue: `pytesseract.pytesseract.TesseractNotFoundError`
- Solution: Install Tesseract OCR (see Requirements section) and configure the path in `visual_analyser.py`
Issue: CUDA/GPU errors with Whisper
- Solution: The code uses `fp16=False` for CPU compatibility. For GPU acceleration, ensure CUDA is installed and set `fp16=True`.
Issue: Out of memory errors
- Solution:
  - Use a smaller Whisper model: change `whisper.load_model("base")` to `"tiny"` in `audio_analyser.py`
  - Reduce the frame extraction interval in `video_processor.py` line 53
Issue: Slow processing
- Solution:
  - Use GPU acceleration if available
  - Increase the `frame_interval` parameter (default: 5 seconds), as sketched below
  - Use smaller AI models
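The memory and speed tweaks above come down to a couple of one-line changes. The snippet below shows the general idea; the exact line numbers, defaults, and surrounding code in `audio_analyser.py` and `video_processor.py` will differ, so treat it as an illustration only:

```python
# audio_analyser.py (illustrative): load a smaller model, toggle fp16 for GPU
import whisper

model = whisper.load_model("tiny")                  # "tiny" needs far less RAM than "base"
result = model.transcribe("audio.wav", fp16=False)  # set fp16=True only on a CUDA GPU

# video_processor.py (illustrative): sample frames less often to reduce work
frame_interval = 10  # seconds between extracted frames; the default is 5
```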
For development:

```bash
# Install dev dependencies
uv sync --all-extras

# Run tests (when available)
pytest
```

```bash
# Format code with Black
black .

# Lint with Ruff
ruff check .
```

Key dependencies:
- fastapi (>=0.104.0): Web framework for building APIs
- uvicorn (>=0.24.0): ASGI server
- openai-whisper (>=20231117): Audio transcription
- ultralytics (>=8.0.0): YOLOv8 object detection
- pytesseract (>=0.3.10): OCR text extraction
- llama-cpp-python (>=0.2.0): Local LLM inference
- moviepy (>=1.0.3): Video processing
- opencv-python (>=4.8.0): Computer vision operations
- Pillow (>=10.0.0): Image processing
- python-multipart (>=0.0.6): File upload handling
Contributions are welcome! Please feel free to submit issues or pull requests.
MIT License - feel free to use this project for personal or commercial purposes.
- OpenAI Whisper: State-of-the-art speech recognition
- Ultralytics YOLOv8: Real-time object detection
- Tesseract OCR: Open-source text recognition
- FastAPI: Modern, fast web framework
Built with ❤️ for multimodal AI analysis