AI File Utilities Microservice

A production-ready FastAPI microservice for AI-powered file processing with comprehensive enterprise features. Deployed and tested with 91% coverage across 439 tests with dual AI model support (OpenAI + Local Models).

PRODUCTION ARCHITECTURE

┌─────────────────────────────────────────────────────────────┐
│             AI File Utilities v1.0.0 - DEPLOYED            │
├─────────────────────────────────────────────────────────────┤
│   FastAPI + JWT Auth                    Port: 8000        │
│  ├── General Files (/api/v1/general)                       │
│  ├── Image Processing (/api/v1/image)                      │
│  ├── Audio/STT (/api/v1/audio)                            │
│  ├── Video Analysis (/api/v1/video)                       │
│  └── PDF Operations (/api/v1/pdf)                         │
├─────────────────────────────────────────────────────────────┤
│   Cloud Integration Stack                                │
│  ├── Redis Cloud (Caching + Locks)                        │
│  ├── Supabase Storage (Files + Database)                  │
│  ├── OpenAI GPT-4 + Whisper OR Local Models (CLIP)      │
│  └── Kafka Events (Docker Local)                          │
├─────────────────────────────────────────────────────────────┤
│   Monitoring & Observability                            │
│  ├── Prometheus Metrics         Port: 9090                │
│  ├── Grafana Dashboard          Port: 3000                │
│  ├── Structured JSON Logging                              │
│  └── Health Checks + Service Status                       │
└─────────────────────────────────────────────────────────────┘

API Coverage - 20 Endpoints

| Service Category | Endpoints | Key Features |
|-----------------|-----------|--------------|
| Authentication | 2 endpoints | JWT tokens, secure auth |
| General Files | 5 endpoints | File_ID architecture, upload, metadata extraction, text extraction, download, delete |
| Image Processing | 4 endpoints | Convert, classify, defect detection, thumbnails |
| Audio Processing | 4 endpoints | Transcription with timestamps, language detection, summarization, conversion |
| Video Processing | 3 endpoints | Frame extraction, object detection, summarization |
| PDF Operations | 2 endpoints | PDF splitting, cached text retrieval |

File Processing Capabilities

  • File ID Architecture: Secure file_id-based processing for all operations
  • Multi-format Support: Images, Videos, Audio, PDFs, Documents
  • Dual AI Models: OpenAI GPT-4/Whisper OR Local Models (CLIP for CV, Whisper for STT)
  • Format Conversion: Image/audio conversion, PDF operations
  • Metadata Extraction: EXIF, PDF properties, video specs
  • OCR + Text Extraction: Tesseract integration

AI MODELS & SERVICE ARCHITECTURE

Complete Model Mapping Table - OpenAI vs Local Models

| Service | API Endpoint | OpenAI Model (USE_OPENAI=true) | Local Model (USE_OPENAI=false) | Local Model Recommendations | Purpose |
|---------|--------------|-------------------------------|-------------------------------|----------------------------|---------|
| Audio/STT | POST /api/v1/audio/transcribe | whisper-1 | openai/whisper-tiny | Better: whisper-base (290MB) or whisper-small (966MB) for higher accuracy. Best: whisper-medium (3GB) for production quality. Enable GPU with device="cuda" | Audio transcription with timestamps |
| Audio/STT | POST /api/v1/audio/detect-language | whisper-1 | openai/whisper-tiny | Better: whisper-base for more reliable language detection. Use longer audio samples (>10s) for better accuracy | Audio language detection |
| Audio/STT | POST /api/v1/audio/summarize | whisper-1 + gpt-3.5-turbo | openai/whisper-tiny + t5-small | STT: Use whisper-base for cleaner transcripts. Summary: Upgrade to t5-base (850MB) or flan-t5-base for better summarization quality | Transcription + summarization |
| Audio/STT | POST /api/v1/audio/convert | No AI (FFmpeg) | No AI (FFmpeg) | No AI - pure FFmpeg processing | Audio format conversion |
| Image/CV | POST /api/v1/image/classify | gpt-4o | openai/clip-vit-base-patch32 | Better: clip-vit-base-patch16 (350MB) for higher resolution analysis. Best: clip-vit-large-patch14 (1.7GB) for production. Use image preprocessing (resize, normalize) for stability | Image classification & description |
| Image/CV | POST /api/v1/image/detect-defects | gpt-4o | openai/clip-vit-base-patch32 | Better: clip-vit-base-patch16 + custom defect categories. Advanced: Fine-tune CLIP on defect datasets or use specialized models like microsoft/DiT-base for industrial QA | Defect detection & quality analysis |
| Image/CV | POST /api/v1/image/convert | No AI (PIL) | No AI (PIL) | No AI - pure PIL processing | Image format conversion |
| Image/CV | POST /api/v1/image/thumbnail | No AI (PIL) | No AI (PIL) | No AI - pure PIL processing | Thumbnail generation |
| Video/CV | POST /api/v1/video/extract-frames | No AI (OpenCV) | No AI (OpenCV) | No AI - pure OpenCV processing | Frame extraction |
| Video/CV | POST /api/v1/video/detect-objects | gpt-4o | openai/clip-vit-base-patch32 | Better: clip-vit-base-patch16 + frame sampling strategy (every 5th frame). Advanced: YOLOv8 or DETR models for real object detection vs semantic analysis | Object detection in video frames |
| Video/CV | POST /api/v1/video/summarize | gpt-4o | openai/clip-vit-base-patch32 + t5-small | CV: Use clip-vit-base-patch16 for better frame analysis. Text Gen: Upgrade to t5-base or bart-large-cnn (1.6GB) for video summarization. Implement keyframe extraction | Video content summarization |
| PDF | POST /api/v1/pdf/split | No AI (PyPDF2) | No AI (PyPDF2) | No AI - pure PyPDF2 processing | PDF page splitting |
| PDF | GET /api/v1/pdf/text/{cache_id} | No AI (Tesseract OCR) | No AI (Tesseract OCR) | Better OCR: Use PaddleOCR or EasyOCR for higher accuracy. Pre-process images (deskew, denoise) before OCR | Cached OCR text retrieval |
| General | POST /api/v1/general/extract-text | No AI (Tesseract OCR) | No AI (Tesseract OCR) | Better OCR: Upgrade to PaddleOCR (multi-language) or TrOCR transformer model for document text extraction | Document text extraction |

Model Specifications:

OpenAI Models:

  • whisper-1: OpenAI's production Whisper model for audio transcription
  • gpt-3.5-turbo: Text summarization (200 tokens max)
  • gpt-4o: Vision model for image/video analysis (200-300 tokens max)

Local Models:

  • openai/whisper-tiny: 39M parameter Whisper model (fast initialization, 244MB download)
    • Upgrade Path: whisper-base (290MB) → whisper-small (966MB) → whisper-medium (3GB)
    • Best For: Development/testing, low-resource environments
  • t5-small: 60M parameter T5 model for text summarization (242MB download)
    • Upgrade Path: t5-base (850MB) → flan-t5-base (990MB) → bart-large-cnn (1.6GB)
    • Best For: Quick summarization, limited GPU memory
  • openai/clip-vit-base-patch32: 151MB CLIP model for computer vision (optimized for speed)
    • Upgrade Path: clip-vit-base-patch16 (350MB) → clip-vit-large-patch14 (1.7GB)
    • Best For: Fast image classification, real-time processing

Local Model Optimization Strategies:

Performance Optimization:

  • GPU Acceleration: All models support CUDA with torch.float16 optimization
  • Model Quantization: Use 8-bit or 4-bit quantization for memory reduction
  • Batch Processing: Process multiple files simultaneously for better throughput
  • Model Caching: Models stay loaded in memory after first use

Quality Improvements:

  • Whisper Audio: Use 16kHz sampling rate, reduce background noise, longer audio clips (>10s)
  • CLIP Vision: Resize images to optimal dimensions (224x224 or 336x336), proper normalization
  • T5 Text: Provide context, use appropriate prompt engineering, limit output length
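The CLIP normalization step above is normally handled by the model's preprocessor; the arithmetic it performs can be sketched directly. The mean/std constants below are the standard CLIP preprocessing values published with the model's Hugging Face preprocessor config; `normalize_pixel` is a hypothetical helper name for illustration.

```python
# Standard CLIP preprocessing constants: per-channel mean/std over RGB
# values scaled to [0, 1] (from the model's preprocessor config).
CLIP_MEAN = (0.48145466, 0.4578275, 0.40821073)
CLIP_STD = (0.26862954, 0.26130258, 0.27577711)

def normalize_pixel(rgb):
    """Map one 8-bit RGB pixel into the normalized range CLIP expects."""
    return tuple((channel / 255.0 - mean) / std
                 for channel, mean, std in zip(rgb, CLIP_MEAN, CLIP_STD))

# 8-bit values land in roughly [-1.8, 2.1] per channel after normalization
print(normalize_pixel((128, 128, 128)))
```

In the real service this would run after resizing to 224x224 (or 336x336), as recommended above; skipping the normalization is a common cause of unstable CLIP scores.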

Production Recommendations:

  • Memory Requirements:
    • Tiny models: 2-4GB RAM
    • Base models: 6-8GB RAM
    • Large models: 12-16GB RAM + GPU
  • Storage: Pre-download models to avoid cold starts
  • Monitoring: Track model inference times and memory usage

Model Selection Logic:

  • Environment Variable: USE_OPENAI=true|false controls model selection
  • Fallback Behavior: None - the service runs entirely on OpenAI or entirely on local models; there is no automatic cross-fallback
  • Performance: Local models optimized with torch.float16 for CUDA acceleration
  • Initialization: Local models downloaded and cached on first use from Hugging Face
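The selection rule above boils down to a single environment-variable check. The sketch below is illustrative only (the function name and the default value are assumptions, not the service's actual code), but it captures the documented behavior: the flag alone decides the backend, with no fallback.

```python
import os

def select_backend() -> str:
    """Pick the AI backend from USE_OPENAI.

    Illustrative sketch of the rule described above: the flag fully
    determines the stack; there is no automatic fallback. The "true"
    default here is an assumption for the sketch.
    """
    use_openai = os.getenv("USE_OPENAI", "true").lower() == "true"
    return "openai" if use_openai else "local"

os.environ["USE_OPENAI"] = "false"
print(select_backend())  # -> local
```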

Enterprise Security

  • JWT Authentication: Secure token-based auth with expiration
  • Authorization Middleware: Protected endpoints with user context
  • Input Validation: Comprehensive request/response validation
  • Secure Headers: CORS, security headers implemented
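The JWT flow above (signed token with an expiration claim) can be illustrated with a minimal HS256 sketch. A real FastAPI service would typically use a library such as PyJWT or python-jose rather than hand-rolled crypto; the function names below are hypothetical and exist only to show the mechanics of header.payload.signature tokens.

```python
import base64
import hashlib
import hmac
import json
import time

def _b64url(data: bytes) -> str:
    """Base64url-encode without padding, as JWT requires."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()

def create_token(user_id: str, secret: str, hours: int = 24) -> str:
    """Build a minimal HS256 JWT with an expiration (exp) claim."""
    header = _b64url(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    payload = _b64url(json.dumps(
        {"sub": user_id, "exp": int(time.time()) + hours * 3600}).encode())
    signing_input = f"{header}.{payload}".encode()
    sig = _b64url(hmac.new(secret.encode(), signing_input, hashlib.sha256).digest())
    return f"{header}.{payload}.{sig}"

def verify_token(token: str, secret: str) -> dict:
    """Check the signature and expiry, returning the claims if valid."""
    header, payload, sig = token.split(".")
    signing_input = f"{header}.{payload}".encode()
    expected = _b64url(hmac.new(secret.encode(), signing_input, hashlib.sha256).digest())
    if not hmac.compare_digest(sig, expected):
        raise ValueError("bad signature")
    claims = json.loads(base64.urlsafe_b64decode(payload + "=" * (-len(payload) % 4)))
    if claims["exp"] < time.time():
        raise ValueError("token expired")
    return claims
```

In the service, `verify_token`'s role is played by the authorization middleware, which attaches the decoded user context to each protected request.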

Modern File Processing Architecture

  • File ID-Based Operations: All processing endpoints use secure file_id parameters instead of file uploads
  • Supabase-Only Storage: 100% cloud-native storage with NO local fallback - all files stored in Supabase buckets
  • Decoupled Storage: Files uploaded once via /upload, then processed by reference
  • General API Refactored: Extract-text endpoint now uses file_id instead of file upload
  • Metadata Bug Fixed: Async metadata service now properly awaits coroutines (was causing 500 errors)
  • Enhanced Security: No direct file exposure in processing endpoints
  • Improved Performance: Cached file access and reduced data transfer
  • Async Processing: Background processing with event-driven architecture
  • Consistent Architecture: All APIs (general, image, audio, video, PDF) use file_id pattern
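The file_id pattern above can be reduced to a small sketch: bytes are stored once at upload, and every later operation dereferences an opaque ID instead of re-sending the file. Everything here is hypothetical and in-memory (the real service stores bytes in Supabase buckets and runs OCR rather than a decode).

```python
import uuid

class FileStore:
    """Toy illustration of the file_id architecture: upload once,
    then process by reference. Not the service's real storage layer."""

    def __init__(self) -> None:
        self._files: dict[str, bytes] = {}

    def upload(self, data: bytes) -> str:
        """Store the bytes and hand back an opaque file_id."""
        file_id = str(uuid.uuid4())
        self._files[file_id] = data
        return file_id

    def extract_text(self, file_id: str) -> str:
        """Process by reference -- no file bytes cross this call."""
        data = self._files[file_id]
        return data.decode("utf-8", "ignore")  # stand-in for real OCR/parsing

    def delete(self, file_id: str) -> None:
        del self._files[file_id]

store = FileStore()
fid = store.upload(b"hello world")
print(store.extract_text(fid))  # -> hello world
```

This is why only /upload accepts multipart files: processing endpoints never see raw bytes, which is the security and data-transfer win the bullets above describe.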

Production Monitoring & Test Coverage

  • 91% Test Coverage (439 tests passing across all APIs - June 2025)
  • Comprehensive Module Coverage:
    • CV Service: 91% coverage (387 statements, 34 missing) - Full OpenAI/Local support
    • STT Service: 86% coverage (366 statements, 51 missing) - Full OpenAI/Local support
    • Audio API: 95% coverage (188 statements, 10 missing) - Dual model endpoints
    • Image API: 94% coverage (197 statements, 11 missing) - Complete file_id architecture
    • Video API: 96% coverage (141 statements, 6 missing) - Frame extraction + AI analysis
    • PDF API: 100% coverage (81 statements, 0 missing) - Split + cached text
    • General API: 78% coverage (185 statements, 41 missing) - File_id refactoring
    • Auth API: 100% coverage (35 statements, 0 missing) - JWT token management
    • Storage Services: 93-100% coverage - Redis (100%), Supabase (100%), Local Storage (93%)
  • Prometheus Metrics: Request counters, duration histograms
  • Structured Logging: JSON logs with correlation IDs
  • Health Endpoints: /health with service status
  • Performance Tracking: Response time monitoring
  • Dual Model Testing: All CV, STT, and Audio services tested with both OpenAI and local models

☁️ Cloud-Ready Infrastructure

  • Redis Cloud: Distributed caching and session management
  • Supabase Storage: Scalable file storage with CDN (100% cloud-native, no local fallback)
  • Kafka Events: Async event processing for workflows
  • Docker Orchestration: Multi-service deployment ready

QUICK START (Production Tested)

Option 1: Full Docker Stack (Recommended)

# 1. Clone repository
git clone <repository-url>
cd ai-file-utilities

# 2. Configure environment
Copy-Item .env.example .env
# Edit .env with your API keys (see Configuration section)

# 3. Deploy full stack
docker-compose up -d

# 4. Verify deployment
docker-compose ps
# Expected: All services "Up (healthy)"

# 5. Test API
Invoke-RestMethod "http://localhost:8000/health"

Services deployed: API (port 8000), Prometheus (port 9090), Grafana (port 3000), Kafka, and Redis.

Option 2: Virtual Environment Setup

# Create and activate virtual environment
python -m venv venv
.\venv\Scripts\Activate.ps1

# Install dependencies
pip install -r requirements.txt

# Set environment variables
$env:OPENAI_API_KEY = "your_openai_api_key_here"
$env:SUPABASE_URL = "your_supabase_url"
$env:SUPABASE_KEY = "your_supabase_anon_key"

# Install Tesseract OCR (Windows)
# Download from: https://github.com/UB-Mannheim/tesseract/wiki
# Add to PATH or set TESSERACT_CMD environment variable

# Run the service
uvicorn ai_file_utilities.main:app --reload --host 0.0.0.0 --port 8000

Configuration

Environment Variables

| Variable | Description | Default | Required |
|----------|-------------|---------|----------|
| OPENAI_API_KEY | OpenAI API key for AI features | - | Yes |
| SUPABASE_URL | Supabase project URL | - | No |
| SUPABASE_KEY | Supabase anon key | - | No |
| JWT_SECRET_KEY | Secret for JWT token signing | auto-generated | No |
| REDIS_URL | Redis connection URL | redis://localhost:6379 | No |
| KAFKA_BOOTSTRAP_SERVERS | Kafka brokers | localhost:9092 | No |
| KAFKA_ENABLED | Enable Kafka events | true | No |
| UPLOAD_DIR | Local upload directory | ./uploads | No |
| TESSERACT_CMD | Tesseract executable path | tesseract | No |
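A settings loader mirroring the defaults in the table above might look like the following. This is an illustrative sketch using only the standard library (the `Settings` class name is hypothetical; the real service may well use pydantic or another config layer).

```python
import os
from dataclasses import dataclass, field

@dataclass
class Settings:
    """Reads the environment variables from the table above,
    falling back to the documented defaults."""
    redis_url: str = field(
        default_factory=lambda: os.getenv("REDIS_URL", "redis://localhost:6379"))
    kafka_servers: str = field(
        default_factory=lambda: os.getenv("KAFKA_BOOTSTRAP_SERVERS", "localhost:9092"))
    kafka_enabled: bool = field(
        default_factory=lambda: os.getenv("KAFKA_ENABLED", "true").lower() == "true")
    upload_dir: str = field(
        default_factory=lambda: os.getenv("UPLOAD_DIR", "./uploads"))
    tesseract_cmd: str = field(
        default_factory=lambda: os.getenv("TESSERACT_CMD", "tesseract"))

settings = Settings()
print(settings.redis_url)
```

Because the defaults live in one place, a bare development machine needs no .env at all, while production overrides everything through the environment.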

CONFIGURATION SETUP

Required Environment Variables

# Core AI Services
OPENAI_API_KEY=sk-your-openai-api-key-here

# JWT Authentication
JWT_SECRET=your-secure-jwt-secret-minimum-64-characters
JWT_ALGORITHM=HS256
JWT_EXPIRATION_HOURS=24

# Supabase Cloud Storage
SUPABASE_URL=https://your-project-id.supabase.co
SUPABASE_ANON_KEY=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9...
SUPABASE_SERVICE_KEY=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9...
SUPABASE_BUCKET=ai-file-utilities

Supabase Storage Setup (Step-by-step)

1. Create Supabase Project

# 1. Go to https://supabase.com and sign up
# 2. Create new project (takes 2-3 minutes)
# 3. Navigate to Project Settings → API
# 4. Copy Project URL and API keys

2. Create Storage Bucket

-- In Supabase Dashboard → Storage, create bucket:
-- Name: ai-file-utilities
-- Public: false (private)
-- File size limit: 100MB

3. Set Bucket Policies (Important!)

-- Go to Storage → Policies and add these:

-- Allow uploads for authenticated users
CREATE POLICY "Allow upload for authenticated users" 
ON storage.objects FOR INSERT 
WITH CHECK (bucket_id = 'ai-file-utilities');

-- Allow downloads for authenticated users  
CREATE POLICY "Allow download for authenticated users" 
ON storage.objects FOR SELECT 
USING (bucket_id = 'ai-file-utilities');

-- Allow deletions for authenticated users
CREATE POLICY "Allow delete for authenticated users" 
ON storage.objects FOR DELETE 
USING (bucket_id = 'ai-file-utilities');

4. Test Supabase Integration

# After configuring .env, test with health check:
Invoke-RestMethod "http://localhost:8000/health" | ConvertTo-Json

# Should show: "supabase": "configured"

Additional Configuration

Redis Cloud Setup (Optional)

# For production, use Redis Cloud:
REDIS_ENABLED=true
REDIS_URL=redis://default:password@host:port
REDIS_DEFAULT_TTL=3600
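REDIS_DEFAULT_TTL gives cached entries a lifetime in seconds, like Redis's SETEX. The in-memory stand-in below is purely illustrative (the `TtlCache` class is hypothetical; the service talks to a real Redis client), but it shows the expiry semantics the setting controls.

```python
import time

class TtlCache:
    """Toy in-memory cache illustrating REDIS_DEFAULT_TTL semantics:
    each entry expires after `ttl` seconds, like SETEX in Redis."""

    def __init__(self, default_ttl: float = 3600):
        self.default_ttl = default_ttl
        self._data = {}

    def set(self, key, value, ttl=None):
        # Record the value with its absolute expiry time
        self._data[key] = (value, time.monotonic() + (ttl or self.default_ttl))

    def get(self, key):
        # Expired or missing keys behave identically: None
        value, expires = self._data.get(key, (None, 0.0))
        return value if time.monotonic() < expires else None

cache = TtlCache()
cache.set("ocr:abc", "cached text", ttl=0.05)
print(cache.get("ocr:abc"))  # -> cached text
```

The same TTL idea is what makes the cached OCR endpoint (GET /api/v1/pdf/text/{cache_id}) cheap to hit repeatedly.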

Kafka Configuration

# For Docker (local development):
KAFKA_ENABLED=true
KAFKA_BOOTSTRAP_SERVERS=kafka:9092

# For production Kafka cluster:
# KAFKA_BOOTSTRAP_SERVERS=your-kafka-cluster:9092

OCR Setup

# Windows: Download Tesseract OCR
# https://github.com/UB-Mannheim/tesseract/wiki
# Add to PATH or set:
$env:TESSERACT_CMD = "C:\Program Files\Tesseract-OCR\tesseract.exe"

Deployment Options

Production Environment Variables

# For production deployment:
LOG_LEVEL=INFO
DEBUG=false
MAX_WORKERS=4
REQUEST_TIMEOUT=300
PROCESSING_TIMEOUT=600

Local Development

# For development:
LOG_LEVEL=DEBUG
DEBUG=true
RELOAD=true
TEST_MODE=false

Testing & Model Switching

Run All Tests with Dual Model Support

For OpenAI Model Testing (Recommended)

# Set environment variables for OpenAI testing
$env:USE_OPENAI="true"
$env:OPENAI_API_KEY="test-key"

# Run all tests with coverage
python -m pytest tests/ --cov=ai_file_utilities --cov-report=html --cov-report=xml --cov-report=term-missing

# View HTML coverage report
start htmlcov/index.html

For Local Model Testing

# First install local model dependencies (if not already installed)
pip install torch torchvision transformers pillow

# Set environment variables for local model testing
$env:USE_OPENAI="false"

# Run tests (now with local model dependencies available)
python -m pytest tests/ --cov=ai_file_utilities --cov-report=html --cov-report=xml --cov-report=term-missing

# Note: Local models will be downloaded automatically on first use
# Models downloaded: whisper-tiny (~244MB), CLIP (~151MB), T5-small (~242MB)

Quick Test Run (No Coverage)

# Fast test run with OpenAI mocking
$env:USE_OPENAI="true"; $env:OPENAI_API_KEY="test-key"; python -m pytest tests/ -v

# Fast test run with local models
$env:USE_OPENAI="false"; python -m pytest tests/ -v

PowerShell Environment Commands for Model Switching

Testing with OpenAI Models

# Enable OpenAI models for testing
$env:USE_OPENAI="true"
$env:OPENAI_API_KEY="test-key"

# Run specific test modules
python -m pytest tests/test_cv_service_comprehensive_fixed.py -v
python -m pytest tests/test_stt_service_fixed.py -v
python -m pytest tests/test_audio_endpoints_updated.py -v

Testing with Local Models

# First install local model dependencies
pip install torch torchvision transformers pillow

# Enable local models for testing
$env:USE_OPENAI="false"

# Run dual-model test suites
python -m pytest tests/test_cv_service_comprehensive_fixed.py -v
python -m pytest tests/test_stt_service_fixed.py -v
python -m pytest tests/test_audio_endpoints_updated.py -v

# Note: First run will download models (~640MB total)

Test Categories & Dual Model Coverage

  • API Endpoint Tests: Complete coverage of all REST endpoints with dual model support
  • CV Service Tests: Both OpenAI GPT-4o and local CLIP model testing (65 tests)
  • STT Service Tests: OpenAI Whisper and local Whisper model testing (55 tests)
  • Audio Processing Tests: Dual model transcription, language detection, summarization (30 tests)
  • Service Layer Tests: Storage, OCR, metadata services with comprehensive mocking
  • Integration Tests: End-to-end workflows with environment variable switching
  • File_ID Architecture Tests: All processing endpoints use secure file_id parameters

Current Test Results: 91% Coverage

  • Total Tests: 439 tests across 19 comprehensive test modules
  • Test Status: All tests passing (0 failures)
  • Coverage: 91% overall code coverage (2412 statements, 219 missing)
  • Dual Model Test Support: Both OpenAI and local model paths fully tested
  • Test Files:
    • test_cv_service_comprehensive_fixed.py - CV service with OpenAI + local models (65 tests)
    • test_stt_service_fixed.py - STT service with OpenAI Whisper + local models (55 tests)
    • test_audio_endpoints_updated.py - Audio API with dual model support (30 tests)
    • test_image_fileid.py - Image API endpoints with file_id architecture (20 tests)
    • test_video_fileid.py - Video API endpoints (28 tests)
    • test_*_comprehensive.py - Complete service coverage (241 additional tests)

Service Coverage Breakdown

  • CV Service: 91% coverage (387 statements, 34 missing) - Full OpenAI/Local support
  • STT Service: 86% coverage (366 statements, 51 missing) - Full OpenAI/Local support
  • Audio API: 95% coverage (188 statements, 10 missing) - Dual model endpoints
  • Image API: 94% coverage (197 statements, 11 missing) - Complete file_id architecture
  • Video API: 96% coverage (141 statements, 6 missing) - Frame extraction + AI analysis
  • PDF API: 100% coverage (81 statements, 0 missing) - Split + cached text
  • General API: 78% coverage (185 statements, 41 missing) - File_id refactoring
  • Auth API: 100% coverage (35 statements, 0 missing) - JWT token management
  • Storage Services: 93-100% coverage - Redis (100%), Supabase (100%), Local Storage (93%)
  • Utility Services: 89-100% coverage - Metadata (100%), OCR (100%), Kafka (95%)

Model Support in Tests

The test suite supports both OpenAI and local models:

OpenAI Model Tests (Default - set USE_OPENAI=true):

  • Computer Vision: Uses GPT-4 Vision for image classification and defect detection
  • Speech-to-Text: Uses OpenAI Whisper for audio transcription
  • Audio Processing: Uses OpenAI models for language detection and summarization

Local Model Tests (set USE_OPENAI=false):

  • Computer Vision: Uses CLIP and T5 models for local processing
  • Speech-to-Text: Uses local Whisper models and ASR libraries
  • Audio Processing: Uses local NLP models for analysis

API Usage Examples

Authentication

# Generate JWT token
$response = Invoke-RestMethod -Uri "http://localhost:8000/auth/generate-token" -Method POST -ContentType "application/json" -Body '{"user_id": "test-user"}'
$token = $response.access_token

# Use token in requests
$headers = @{ "Authorization" = "Bearer $token" }

General API - File ID Architecture

The General API has been updated to use a file_id-based architecture for improved security and performance:

# Step 1: Upload file (ONLY upload endpoint accepts files)
$uploadResult = Invoke-RestMethod -Uri "http://localhost:8000/api/v1/general/upload" -Method POST -Form @{ file = Get-Item "document.txt" } -Headers $headers
$fileId = $uploadResult.file_id

# Step 2: Extract text using file_id (NO file upload required)
$extractResult = Invoke-RestMethod -Uri "http://localhost:8000/api/v1/general/extract-text" -Method POST -Form @{ file_id = $fileId } -Headers $headers

# Step 3: Get metadata using file_id
$metadata = Invoke-RestMethod -Uri "http://localhost:8000/api/v1/general/metadata/$fileId" -Method GET -Headers $headers

# Step 4: Download file using file_id
Invoke-WebRequest -Uri "http://localhost:8000/api/v1/general/download/$fileId" -Method GET -Headers $headers -OutFile "downloaded_file.txt"

# Step 5: Delete file using file_id
$deleteResult = Invoke-RestMethod -Uri "http://localhost:8000/api/v1/general/$fileId" -Method DELETE -Headers $headers

General API:

  • Upload: Only /upload endpoint accepts file uploads
  • Extract Text: Now uses file_id parameter instead of file upload
  • Metadata: Uses file_id in URL path
  • Download: Uses file_id in URL path
  • Delete: Uses file_id in URL path
  • Security: No file exposure in processing endpoints
  • Performance: Cached file access and reduced data transfer

File Upload and Processing

# Upload and process image
$form = @{
    file = Get-Item "image.jpg"
    target_format = "png"
    width = 800
    height = 600
}
$result = Invoke-RestMethod -Uri "http://localhost:8000/api/v1/image/convert" -Method POST -Form $form -Headers $headers

Example Scripts & Model Switching

Complete PowerShell examples are available in the examples/ directory with full dual model support:

Image Processing Examples (OpenAI GPT-4o + Local CLIP)

# Run with OpenAI models (default)
$env:USE_OPENAI="true"; $env:OPENAI_API_KEY="your-key-here"
.\examples\image_working_test.ps1

# Switch to local models (CLIP for CV)
$env:USE_OPENAI="false"
.\examples\image_working_test.ps1

# Compare OpenAI vs Local model results
$env:USE_OPENAI="true"; .\examples\image_working_test.ps1
$env:USE_OPENAI="false"; .\examples\image_working_test.ps1

Audio Processing Examples (OpenAI Whisper + Local STT)

# Audio transcription with OpenAI Whisper
$env:USE_OPENAI="true"; $env:OPENAI_API_KEY="your-key-here"
.\examples\audio_working_test.ps1

# Switch to local STT models (whisper-tiny)
$env:USE_OPENAI="false"
.\examples\audio_working_test.ps1

# Test language detection with both models
$env:USE_OPENAI="true"; .\examples\audio_working_test.ps1  # Uses OpenAI whisper-1
$env:USE_OPENAI="false"; .\examples\audio_working_test.ps1 # Uses local whisper-tiny

Video Processing Examples (OpenAI + Local CV)

# Video analysis with OpenAI GPT-4o
$env:USE_OPENAI="true"; $env:OPENAI_API_KEY="your-key-here"
.\examples\video_working_test.ps1

# Test with local models (CLIP + T5)
$env:USE_OPENAI="false"
.\examples\video_working_test.ps1

# Compare AI model outputs
$env:USE_OPENAI="true"; .\examples\video_working_test.ps1  # GPT-4o analysis
$env:USE_OPENAI="false"; .\examples\video_working_test.ps1 # Local model analysis

Complete Test Suite (All Services)

# Run comprehensive tests across all services with OpenAI
$env:USE_OPENAI="true"; $env:OPENAI_API_KEY="your-key-here"
.\examples\complete_test_suite.ps1

# Test with local models
$env:USE_OPENAI="false"
.\examples\complete_test_suite.ps1

# Test API endpoints with curl (model-agnostic)
.\examples\curl_test_suite.ps1

Model Configuration Commands (PowerShell)

Switch between OpenAI and local models at runtime:

OpenAI Configuration (Recommended for production)

# Set environment variables for OpenAI models
$env:USE_OPENAI="true"
$env:OPENAI_API_KEY="your-openai-api-key"

# Verify configuration
Write-Output "Model Mode: $env:USE_OPENAI"
Write-Output "API Key Set: $($env:OPENAI_API_KEY -ne $null)"

# Run example with OpenAI
.\examples\audio_working_test.ps1

Local Models Configuration (For offline/private deployments)

# Set environment variables for local models
$env:USE_OPENAI="false"
# Note: Local models are automatically downloaded and cached on first use

# Verify configuration
Write-Output "Model Mode: $env:USE_OPENAI"
Write-Output "Local models will be used (Whisper-tiny, CLIP, T5-small)"

# Run example with local models (no API key required)
.\examples\audio_working_test.ps1

Quick Model Switching for Testing

# Test with OpenAI, then switch to local models
$env:USE_OPENAI="true"; $env:OPENAI_API_KEY="test-key"; .\examples\complete_test_suite.ps1
$env:USE_OPENAI="false"; .\examples\complete_test_suite.ps1

# Run specific service tests with model switching
$env:USE_OPENAI="true"; python -m pytest tests/test_cv_service_comprehensive_fixed.py -v
$env:USE_OPENAI="false"; python -m pytest tests/test_cv_service_comprehensive_fixed.py -v

Available Example Scripts (Updated June 2025)

  • general_api_demo.ps1 - Complete General API file_id workflow (Upload → Extract-Text → Metadata → Download → Delete)
  • audio_working_test.ps1 - Audio processing with dual model support (OpenAI Whisper + Local whisper-tiny)
  • video_working_test.ps1 - Video processing with dual model support (OpenAI GPT-4o + Local CLIP/T5)
  • image_working_test.ps1 - Image processing with dual model support (OpenAI GPT-4o + Local CLIP)
  • complete_test_suite.ps1 - Full API test suite with model switching (All endpoints + dual models)
  • pdf_test.ps1 - PDF processing (split, text extraction, cached retrieval)
  • curl_test_suite.ps1 - cURL command examples for all endpoints
  • image_test.ps1 - Legacy image processing test (kept for compatibility)
  • video_test.ps1 - Legacy video processing test (kept for compatibility)

Examples Directory Structure (Updated)

examples/
├── PowerShell Scripts (All with Dual Model Support)
│   ├── audio_working_test.ps1      #  STT: OpenAI Whisper vs Local whisper-tiny
│   ├── video_working_test.ps1      #  CV: OpenAI GPT-4o vs Local CLIP+T5  
│   ├── image_working_test.ps1      #  CV: OpenAI GPT-4o vs Local CLIP
│   ├── complete_test_suite.ps1     #  Full test suite with model switching
│   ├── general_api_demo.ps1        #  Complete file_id workflow demo
│   ├── pdf_test.ps1               #  PDF processing examples
│   └── curl_test_suite.ps1         # cURL command examples
├── Test Media Files (Real files for testing)
│   ├── audio.mp3                   # Real audio file (41KB) - English speech
│   ├── video.mp4                   # Real video file (1.7MB) - Sample video
│   ├── test_image.png              # Real image file (1KB) - Test image
│   └── test.pdf                    # Real PDF file (218KB) - Multi-page document
└── Generated Test Results
    ├── test_debug.png              # Debug output images from processing
    ├── test_final.png              # Final processed images
    └── test_fresh.png              # Fresh test outputs

Dual Model Testing Commands (PowerShell)

All example scripts support environment variable configuration for immediate model switching:

For OpenAI Models (Production)

# Set environment for OpenAI models and run examples
$env:USE_OPENAI="true"; $env:OPENAI_API_KEY="your-key"; .\examples\audio_working_test.ps1
$env:USE_OPENAI="true"; $env:OPENAI_API_KEY="your-key"; .\examples\image_working_test.ps1
$env:USE_OPENAI="true"; $env:OPENAI_API_KEY="your-key"; .\examples\video_working_test.ps1

# Run comprehensive test suite with OpenAI
$env:USE_OPENAI="true"; $env:OPENAI_API_KEY="your-key"; .\examples\complete_test_suite.ps1

For Local Models (Offline/Private)

# First install local model dependencies (one-time setup)
pip install torch torchvision transformers pillow

# Set environment for local models and run examples (no API key required)
$env:USE_OPENAI="false"; .\examples\audio_working_test.ps1      # Local whisper-tiny + T5-small
$env:USE_OPENAI="false"; .\examples\image_working_test.ps1      # Local CLIP model
$env:USE_OPENAI="false"; .\examples\video_working_test.ps1      # Local CLIP + T5-small

# Run comprehensive test suite with local models
$env:USE_OPENAI="false"; .\examples\complete_test_suite.ps1

# Note: Models are downloaded automatically on first use (~640MB total)

Model Comparison Testing

# Compare outputs between OpenAI and local models
Write-Output "=== Testing with OpenAI Models ==="
$env:USE_OPENAI="true"; $env:OPENAI_API_KEY="your-key"; .\examples\audio_working_test.ps1

Write-Output "=== Testing with Local Models ==="
$env:USE_OPENAI="false"; .\examples\audio_working_test.ps1

# Results will show model differences in transcription quality, speed, and output format



General API Demo Features

File: examples/general_api_demo.ps1

Complete File_ID Architecture Test:

1. Upload: POST /api/v1/general/upload - Accepts file uploads → Returns file_id
2. Extract-Text: POST /api/v1/general/extract-text - Uses file_id (NOT file upload)
3. Metadata: GET /api/v1/general/metadata/{file_id} - File metadata extraction
4. Download: GET /api/v1/general/download/{file_id} - File download
5. Delete: DELETE /api/v1/general/{file_id} - File cleanup

Sample Output:

=== General API Complete Test - File ID Architecture ===
Upload successful! File ID: df90a98c-8f82-4503-bc4c-009aa0d94984
Text extraction successful! Processing Time: 2.149s
Metadata retrieval successful! Keys: file_size, content_type, filename, page_count, is_encrypted, pdf_metadata
Download successful! Downloaded: 218,741 bytes
File deleted successfully!



EXAMPLE WORKFLOWS (Working & Tested)

Pre-requisites for Examples

# 1. Ensure Docker stack is running
docker-compose ps  # All services should be "Up (healthy)"

# 2. Generate authentication token
$token = (Invoke-RestMethod -Uri "http://localhost:8000/auth/generate-token" -Method POST -ContentType "application/json" -Body '{"user_id": "test-user"}').access_token
Write-Host "Token: $token"

Example 1: General API Complete Workflow

# Navigate to examples directory
cd examples

# Run complete General API file_id architecture demo
powershell.exe -ExecutionPolicy Bypass -File general_api_demo.ps1

Example 2: Audio Processing

# Navigate to examples directory
cd examples

# Run audio transcription and conversion workflow
powershell.exe -ExecutionPolicy Bypass -File audio_test.ps1

Example 3: Video Processing

# Test video frame extraction and analysis
powershell.exe -ExecutionPolicy Bypass -File video_test.ps1

Example 4: Image Processing

# Test image processing pipeline
powershell.exe -ExecutionPolicy Bypass -File image_test.ps1

Example 5: PDF Processing

# Test PDF processing capabilities
powershell.exe -ExecutionPolicy Bypass -File pdf_test.ps1


### **OpenAI/Local Model Switching Examples**

The following examples demonstrate automatic model switching between OpenAI and local models:

#### **Audio Processing with Model Switching**
```powershell
# Navigate to examples directory
cd examples

# Test with LOCAL models (default)
powershell.exe -ExecutionPolicy Bypass -File audio_working_test.ps1

# Test with OPENAI models (requires OPENAI_API_KEY)
powershell.exe -ExecutionPolicy Bypass -File audio_working_test.ps1 -UseOpenAI
```

#### **Image Processing with Model Switching**
```powershell
# Test with LOCAL models (CLIP)
powershell.exe -ExecutionPolicy Bypass -File image_working_test.ps1

# Test with OPENAI models (GPT-4 Vision)
powershell.exe -ExecutionPolicy Bypass -File image_working_test.ps1 -UseOpenAI
```

#### **Video Processing with Model Switching**
```powershell
# Test with LOCAL models (CLIP, OpenCV)
powershell.exe -ExecutionPolicy Bypass -File video_working_test.ps1

# Test with OPENAI models (GPT-4 Vision for frames)
powershell.exe -ExecutionPolicy Bypass -File video_working_test.ps1 -UseOpenAI
```

#### **Model Switching Configuration**
```powershell
# Option 1: Environment variable (affects all services)
$env:USE_OPENAI = "true"
$env:OPENAI_API_KEY = "your-api-key"

# Option 2: Parameter-based switching (per test)
powershell.exe -ExecutionPolicy Bypass -File audio_working_test.ps1 -UseOpenAI
powershell.exe -ExecutionPolicy Bypass -File image_working_test.ps1 -UseOpenAI
powershell.exe -ExecutionPolicy Bypass -File video_working_test.ps1 -UseOpenAI

# Option 3: Docker environment switching
# Update docker-compose.yml:
#   - USE_OPENAI=true
#   - OPENAI_API_KEY=${OPENAI_API_KEY}
docker-compose down && docker-compose up -d
```
## 📋 **COMPLETE API REFERENCE**

### **AUTHENTICATION SERVICE**

| Method | Endpoint | Description | OpenAI Model | Local Model |
|--------|----------|-------------|--------------|-------------|
| POST | `/api/v1/auth/generate-token` | Generate JWT token | None | None |
| GET | `/api/v1/auth/verify` | Verify JWT token | None | None |

**Features:**

- JWT token generation with configurable expiration
- Token verification and user context extraction
- Secure authentication middleware
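To illustrate what HS256 JWT generation and verification involve, here is a dependency-free Python sketch. The function names and `ttl` default are illustrative; the service itself may rely on a JWT library rather than hand-rolled signing:

```python
import base64
import hashlib
import hmac
import json
import time


def _b64url(data: bytes) -> str:
    """Base64url-encode without padding, as JWT requires."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()


def make_token(user_id: str, secret: str, ttl: int = 3600) -> str:
    """Build an HS256 JWT with a configurable expiration (ttl in seconds)."""
    header = _b64url(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    payload = _b64url(json.dumps({"sub": user_id,
                                  "exp": int(time.time()) + ttl}).encode())
    signing_input = f"{header}.{payload}".encode()
    sig = _b64url(hmac.new(secret.encode(), signing_input, hashlib.sha256).digest())
    return f"{header}.{payload}.{sig}"


def verify_token(token: str, secret: str) -> dict:
    """Check the signature and expiration, returning the claims on success."""
    header, payload, sig = token.split(".")
    signing_input = f"{header}.{payload}".encode()
    expected = _b64url(hmac.new(secret.encode(), signing_input,
                                hashlib.sha256).digest())
    if not hmac.compare_digest(sig, expected):
        raise ValueError("invalid signature")
    claims = json.loads(base64.urlsafe_b64decode(payload + "=" * (-len(payload) % 4)))
    if claims["exp"] < time.time():
        raise ValueError("token expired")
    return claims
```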

### **GENERAL FILES SERVICE**

| Method | Endpoint | Description | OpenAI Model | Local Model |
|--------|----------|-------------|--------------|-------------|
| POST | `/api/v1/general/upload` | Upload files to Supabase storage | None | None |
| GET | `/api/v1/general/download/{file_id}` | Download files by `file_id` | None | None |
| GET | `/api/v1/general/metadata/{file_id}` | Extract file metadata | None | None |
| POST | `/api/v1/general/extract-text` | OCR text extraction (by `file_id`) | None | None |
| DELETE | `/api/v1/general/{file_id}` | Delete files from storage | None | None |

**Architecture:**

- **File-ID Based:** All processing uses secure `file_id` parameters
- **Supabase-Only Storage:** 100% cloud-native, no local fallback
- **OCR Integration:** Tesseract OCR for text extraction
- **Metadata Extraction:** EXIF, PDF properties, document analysis

### **AUDIO/STT SERVICE**

| Method | Endpoint | Description | OpenAI Model | Local Model |
|--------|----------|-------------|--------------|-------------|
| POST | `/api/v1/audio/transcribe` | Audio transcription with timestamps | whisper-1 | openai/whisper-tiny |
| POST | `/api/v1/audio/detect-language` | Audio language detection | whisper-1 | openai/whisper-tiny |
| POST | `/api/v1/audio/summarize` | Transcription + AI summarization | whisper-1 + gpt-3.5-turbo | whisper-tiny + t5-small |
| POST | `/api/v1/audio/convert` | Audio format conversion | None (FFmpeg) | None (FFmpeg) |

**Features:**

- **Dual Model Support:** OpenAI Whisper-1 OR local Whisper-tiny
- **Language Detection:** Automatic language identification
- **VTT Timestamps:** WebVTT format with precise timing
- **AI Summarization:** GPT-3.5-turbo OR T5-small for summaries
- **Format Conversion:** MP3, WAV, M4A, FLAC support
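As a concrete example of the WebVTT timing format, here is a minimal formatter (an illustrative sketch, not the service's actual code):

```python
def vtt_timestamp(seconds: float) -> str:
    """Format a time offset in seconds as a WebVTT timestamp: HH:MM:SS.mmm."""
    ms = round(seconds * 1000)
    hours, ms = divmod(ms, 3_600_000)
    minutes, ms = divmod(ms, 60_000)
    secs, ms = divmod(ms, 1_000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{ms:03d}"


def vtt_cue(start: float, end: float, text: str) -> str:
    """Render one WebVTT cue block: timing line followed by the caption text."""
    return f"{vtt_timestamp(start)} --> {vtt_timestamp(end)}\n{text}"
```

For example, `vtt_timestamp(3661.5)` yields `01:01:01.500`.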

### **IMAGE/COMPUTER VISION SERVICE**

| Method | Endpoint | Description | OpenAI Model | Local Model |
|--------|----------|-------------|--------------|-------------|
| POST | `/api/v1/image/convert` | Image format conversion | None (PIL) | None (PIL) |
| POST | `/api/v1/image/classify` | AI image classification | gpt-4o | openai/clip-vit-base-patch32 |
| POST | `/api/v1/image/detect-defects` | AI defect detection | gpt-4o | openai/clip-vit-base-patch32 |
| POST | `/api/v1/image/thumbnail` | Thumbnail generation | None (PIL) | None (PIL) |

**AI Features:**

- **Classification:** Detailed image description and categorization
- **Defect Detection:** Quality control and damage assessment
- **Dual Models:** GPT-4o vision OR CLIP for computer vision tasks
- **Format Support:** JPEG, PNG, GIF, BMP, TIFF, WebP
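Thumbnail generation typically fits an image inside a bounding box while preserving its aspect ratio (the behavior of PIL's `Image.thumbnail`). A pure-Python sketch of that size calculation, with illustrative names:

```python
def fit_within(width: int, height: int, max_w: int, max_h: int) -> tuple:
    """Largest size that fits inside (max_w, max_h) while preserving
    aspect ratio; never upscales a smaller image."""
    scale = min(max_w / width, max_h / height, 1.0)
    return max(1, round(width * scale)), max(1, round(height * scale))
```

For a 1920x1080 source and a 256x256 box, this yields 256x144 rather than a distorted 256x256.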

### **VIDEO/COMPUTER VISION SERVICE**

| Method | Endpoint | Description | OpenAI Model | Local Model |
|--------|----------|-------------|--------------|-------------|
| POST | `/api/v1/video/extract-frames` | Extract frames from video | None (OpenCV) | None (OpenCV) |
| POST | `/api/v1/video/detect-objects` | AI object detection in video | gpt-4o | openai/clip-vit-base-patch32 |
| POST | `/api/v1/video/summarize` | AI video content summarization | gpt-4o | clip-vit-base-patch32 + t5-small |

**AI Features:**

- **Object Detection:** Identify and describe objects in video frames
- **Content Summarization:** AI-powered video content analysis
- **Frame Analysis:** Processes the first 3-5 frames for optimization
- **Dual Models:** GPT-4o vision OR CLIP + T5 for video analysis
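The 3-5 frame cap amounts to limiting how many frames are sent to the vision model. A hedged sketch of one common sampling strategy (evenly spaced indices; the service may instead simply take the leading frames):

```python
def sample_frame_indices(total_frames: int, max_frames: int = 5) -> list:
    """Pick up to max_frames evenly spaced frame indices from a video."""
    if total_frames <= max_frames:
        return list(range(total_frames))
    step = total_frames / max_frames
    return [int(i * step) for i in range(max_frames)]
```

The selected indices would then be passed to an OpenCV capture loop that decodes only those frames.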

### **PDF SERVICE**

| Method | Endpoint | Description | OpenAI Model | Local Model |
|--------|----------|-------------|--------------|-------------|
| POST | `/api/v1/pdf/split` | Split PDF by page ranges | None (PyPDF2) | None (PyPDF2) |
| GET | `/api/v1/pdf/text/{cache_id}` | Retrieve cached OCR text | None (Tesseract) | None (Tesseract) |

**Features:**

- **Page Range Splitting:** Split PDFs by specified page ranges (e.g., `"1-3,5-7"`)
- **Cached Text Retrieval:** Redis-cached OCR text with `cache_id` lookup
- **PDF Processing:** PyPDF2 for manipulation, Tesseract for OCR
- **Multiple Cache Patterns:** Supports various cache key formats for compatibility
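A page-range spec like `"1-3,5-7"` can be parsed into a flat list of 1-based page numbers as follows (an illustrative sketch, not the service's actual parser):

```python
def parse_page_ranges(spec: str) -> list:
    """Parse a spec like "1-3,5-7" into 1-based page numbers: [1,2,3,5,6,7]."""
    pages = []
    for part in spec.split(","):
        part = part.strip()
        if "-" in part:
            start, end = part.split("-")
            pages.extend(range(int(start), int(end) + 1))  # inclusive range
        else:
            pages.append(int(part))  # single page, e.g. "4"
    return pages
```

The resulting page numbers would drive the per-range PyPDF2 extraction.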

### **MODEL CONFIGURATION**

**Environment Control:**

```bash
USE_OPENAI=true   # Use OpenAI models (whisper-1, gpt-4o, gpt-3.5-turbo)
USE_OPENAI=false  # Use local models (whisper-tiny, clip-vit-base-patch32, t5-small)
```

**Model Performance:**

- **OpenAI:** Cloud-based, consistent performance, per-request API costs
- **Local:** Self-hosted, CUDA acceleration, one-time download cost
- **Optimization:** Local models use `torch.float16` for CUDA acceleration
- **Initialization:** Local models are downloaded from Hugging Face on first use
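A minimal sketch of how the `USE_OPENAI` flag might resolve to the model names listed above (the mapping dict and function name are illustrative, not the service's own code):

```python
import os

# Illustrative mapping from backend to the model names in the tables above
MODEL_MAP = {
    "openai": {"stt": "whisper-1",
               "vision": "gpt-4o",
               "summarize": "gpt-3.5-turbo"},
    "local": {"stt": "openai/whisper-tiny",
              "vision": "openai/clip-vit-base-patch32",
              "summarize": "t5-small"},
}


def select_models(env=None) -> dict:
    """Return the active model set based on the USE_OPENAI environment flag."""
    env = os.environ if env is None else env
    backend = "openai" if env.get("USE_OPENAI", "false").lower() == "true" else "local"
    return MODEL_MAP[backend]
```

With `USE_OPENAI=true` set, `select_models()["vision"]` resolves to `gpt-4o`; unset or false, it falls back to the local CLIP model.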

### **Tests Directory Structure**

```
tests/
├── conftest.py                              # Test configuration and fixtures
├── test_api_endpoints.py                    # API endpoint integration tests
├── test_auth_comprehensive.py               # Authentication system tests
├── test_audio_endpoints_updated.py          # Audio API endpoint tests
├── test_audio_conversion_comprehensive.py   # Audio conversion service tests
├── test_cv_service_comprehensive_fixed.py   # Computer Vision service tests
├── test_general_updated.py                  # General API tests
├── test_image_fileid.py                     # Image API file_id tests
├── test_video_fileid.py                     # Video API file_id tests
├── test_pdf_fileid.py                       # PDF API file_id tests
├── test_storage_comprehensive.py            # Storage service tests
├── test_supabase_storage_comprehensive.py   # Supabase storage tests
├── test_stt_service_fixed.py                # Speech-to-Text service tests
├── test_ocr_service_comprehensive.py        # OCR service tests
├── test_redis_service_comprehensive.py      # Redis caching tests
├── test_kafka_events_comprehensive.py       # Kafka event handling tests
├── test_metadata_service_comprehensive.py   # Metadata extraction tests
├── test_middleware_comprehensive.py         # Middleware tests
└── __init__.py                              # Package initialization
```

**Note:** Tests require a Docker environment with all dependencies installed:

```bash
# Run tests in Docker container
docker-compose up -d
docker-compose exec ai-file-utilities python -m pytest tests/ -v
```
