A production-ready FastAPI microservice for AI-powered file processing with comprehensive enterprise features. Deployed and tested with 91% coverage across 439 tests with dual AI model support (OpenAI + Local Models).
┌─────────────────────────────────────────────────────────────┐
│ AI File Utilities v1.0.0 - DEPLOYED │
├─────────────────────────────────────────────────────────────┤
│ FastAPI + JWT Auth Port: 8000 │
│ ├── General Files (/api/v1/general) │
│ ├── Image Processing (/api/v1/image) │
│ ├── Audio/STT (/api/v1/audio) │
│ ├── Video Analysis (/api/v1/video) │
│ └── PDF Operations (/api/v1/pdf) │
├─────────────────────────────────────────────────────────────┤
│ Cloud Integration Stack │
│ ├── Redis Cloud (Caching + Locks) │
│ ├── Supabase Storage (Files + Database) │
│ ├── OpenAI GPT-4 + Whisper OR Local Models (CLIP) │
│ └── Kafka Events (Docker Local) │
├─────────────────────────────────────────────────────────────┤
│ Monitoring & Observability │
│ ├── Prometheus Metrics Port: 9090 │
│ ├── Grafana Dashboard Port: 3000 │
│ ├── Structured JSON Logging │
│ └── Health Checks + Service Status │
└─────────────────────────────────────────────────────────────┘
| Service Category | Endpoints | Key Features |
|-----------------|-----------|--------------|
| Authentication | 2 endpoints | JWT tokens, secure auth |
| General Files | 5 endpoints | File_ID architecture: upload, metadata extraction, text extraction, download, delete |
| Image Processing | 4 endpoints | Convert, classify, defect detection, thumbnails |
| Audio Processing | 4 endpoints | Transcription with timestamps, language detection, summarization, conversion |
| Video Processing | 3 endpoints | Frame extraction, object detection, summarization |
| PDF Operations | 2 endpoints | PDF splitting, cached text retrieval |
- File ID Architecture: Secure file_id-based processing for all operations
- Multi-format Support: Images, Videos, Audio, PDFs, Documents
- Dual AI Models: OpenAI GPT-4/Whisper OR Local Models (CLIP for CV, Whisper for STT)
- Format Conversion: Image/audio conversion, PDF operations
- Metadata Extraction: EXIF, PDF properties, video specs
- OCR + Text Extraction: Tesseract integration
Complete Model Mapping Table - OpenAI vs Local Models
| Service | API Endpoint | OpenAI Model (USE_OPENAI=true) | Local Model (USE_OPENAI=false) | Local Model Recommendations | Purpose |
|---|---|---|---|---|---|
| Audio/STT | POST /api/v1/audio/transcribe | whisper-1 | openai/whisper-tiny | Better: whisper-base (290MB) or whisper-small (966MB) for higher accuracy. Best: whisper-medium (3GB) for production quality. Enable GPU with device="cuda" | Audio transcription with timestamps |
| Audio/STT | POST /api/v1/audio/detect-language | whisper-1 | openai/whisper-tiny | Better: whisper-base for more reliable language detection. Use longer audio samples (>10s) for better accuracy | Audio language detection |
| Audio/STT | POST /api/v1/audio/summarize | whisper-1 + gpt-3.5-turbo | openai/whisper-tiny + t5-small | STT: use whisper-base for cleaner transcripts. Summary: upgrade to t5-base (850MB) or flan-t5-base for better summarization quality | Transcription + summarization |
| Audio/STT | POST /api/v1/audio/convert | No AI (FFmpeg) | No AI (FFmpeg) | No AI - pure FFmpeg processing | Audio format conversion |
| Image/CV | POST /api/v1/image/classify | gpt-4o | openai/clip-vit-base-patch32 | Better: clip-vit-base-patch16 (350MB) for higher-resolution analysis. Best: clip-vit-large-patch14 (1.7GB) for production. Use image preprocessing (resize, normalize) for stability | Image classification & description |
| Image/CV | POST /api/v1/image/detect-defects | gpt-4o | openai/clip-vit-base-patch32 | Better: clip-vit-base-patch16 + custom defect categories. Advanced: fine-tune CLIP on defect datasets or use specialized models like microsoft/DiT-base for industrial QA | Defect detection & quality analysis |
| Image/CV | POST /api/v1/image/convert | No AI (PIL) | No AI (PIL) | No AI - pure PIL processing | Image format conversion |
| Image/CV | POST /api/v1/image/thumbnail | No AI (PIL) | No AI (PIL) | No AI - pure PIL processing | Thumbnail generation |
| Video/CV | POST /api/v1/video/extract-frames | No AI (OpenCV) | No AI (OpenCV) | No AI - pure OpenCV processing | Frame extraction |
| Video/CV | POST /api/v1/video/detect-objects | gpt-4o | openai/clip-vit-base-patch32 | Better: clip-vit-base-patch16 + a frame sampling strategy (every 5th frame). Advanced: YOLOv8 or DETR models for real object detection vs semantic analysis | Object detection in video frames |
| Video/CV | POST /api/v1/video/summarize | gpt-4o | openai/clip-vit-base-patch32 + t5-small | CV: clip-vit-base-patch16 for better frame analysis. Text gen: upgrade to t5-base or bart-large-cnn (1.6GB) for video summarization. Implement keyframe extraction | Video content summarization |
| PDF | POST /api/v1/pdf/split | No AI (PyPDF2) | No AI (PyPDF2) | No AI - pure PyPDF2 processing | PDF page splitting |
| PDF | GET /api/v1/pdf/text/{cache_id} | No AI (Tesseract OCR) | No AI (Tesseract OCR) | Better OCR: use PaddleOCR or EasyOCR for higher accuracy. Pre-process images (deskew, denoise) before OCR | Cached OCR text retrieval |
| General | POST /api/v1/general/extract-text | No AI (Tesseract OCR) | No AI (Tesseract OCR) | Better OCR: upgrade to PaddleOCR (multi-language) or the TrOCR transformer model for document text extraction | Document text extraction |
OpenAI Models:
- whisper-1: OpenAI's production Whisper model for audio transcription
- gpt-3.5-turbo: Text summarization (200 tokens max)
- gpt-4o: Vision model for image/video analysis (200-300 tokens max)

Local Models:
- openai/whisper-tiny: 39M-parameter Whisper model (fast initialization, 244MB download)
  - Upgrade Path: whisper-base (290MB) → whisper-small (966MB) → whisper-medium (3GB)
  - Best For: Development/testing, low-resource environments
- t5-small: 60M-parameter T5 model for text summarization (242MB download)
  - Upgrade Path: t5-base (850MB) → flan-t5-base (990MB) → bart-large-cnn (1.6GB)
  - Best For: Quick summarization, limited GPU memory
- openai/clip-vit-base-patch32: 151MB CLIP model for computer vision (optimized for speed)
  - Upgrade Path: clip-vit-base-patch16 (350MB) → clip-vit-large-patch14 (1.7GB)
  - Best For: Fast image classification, real-time processing
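The upgrade paths above can be captured as a small registry that picks the largest model tier fitting a given download budget. This is an illustrative sketch, not part of the service; model names and sizes follow the lists above, and the helper function is hypothetical.

```python
# Illustrative sketch: pick a local model tier from the upgrade paths above.
# Approximate sizes (MB) follow the README; pick_model itself is hypothetical.
UPGRADE_PATHS = {
    "stt": [("openai/whisper-tiny", 244), ("whisper-base", 290),
            ("whisper-small", 966), ("whisper-medium", 3000)],
    "summarize": [("t5-small", 242), ("t5-base", 850),
                  ("flan-t5-base", 990), ("bart-large-cnn", 1600)],
    "vision": [("openai/clip-vit-base-patch32", 151),
               ("clip-vit-base-patch16", 350), ("clip-vit-large-patch14", 1700)],
}

def pick_model(task: str, budget_mb: int) -> str:
    """Return the largest model on the task's upgrade path within budget."""
    path = UPGRADE_PATHS[task]
    fitting = [name for name, size in path if size <= budget_mb]
    # Fall back to the smallest tier when nothing fits the budget.
    return fitting[-1] if fitting else path[0][0]

print(pick_model("stt", 1000))    # whisper-small
print(pick_model("vision", 200))  # openai/clip-vit-base-patch32
```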
Performance Optimization:
- GPU Acceleration: All models support CUDA with torch.float16 optimization
- Model Quantization: Use 8-bit or 4-bit quantization for memory reduction
- Batch Processing: Process multiple files simultaneously for better throughput
- Model Caching: Models stay loaded in memory after first use
Quality Improvements:
- Whisper Audio: Use 16kHz sampling rate, reduce background noise, longer audio clips (>10s)
- CLIP Vision: Resize images to optimal dimensions (224x224 or 336x336), proper normalization
- T5 Text: Provide context, use appropriate prompt engineering, limit output length
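The CLIP preprocessing advice above (resize to 224x224 or 336x336, proper normalization) amounts to a centered crop plus CLIP's published per-channel mean/std normalization. In practice the transformers CLIPProcessor does this for you; the stdlib-only sketch below just shows the math, and the helper names are illustrative.

```python
# Sketch of CLIP-style preprocessing math (normally done by CLIPProcessor).
# The mean/std constants are CLIP's published values; helpers are illustrative.
CLIP_MEAN = (0.48145466, 0.4578275, 0.40821073)
CLIP_STD = (0.26862954, 0.26130258, 0.27577711)

def center_crop_box(width: int, height: int, size: int = 224):
    """Return (left, top, right, bottom) for a centered size x size crop."""
    left = (width - size) // 2
    top = (height - size) // 2
    return (left, top, left + size, top + size)

def normalize_pixel(rgb):
    """Scale 0-255 RGB to 0-1, then apply CLIP per-channel normalization."""
    return tuple((c / 255.0 - m) / s for c, m, s in zip(rgb, CLIP_MEAN, CLIP_STD))
```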
Production Recommendations:
- Memory Requirements:
- Tiny models: 2-4GB RAM
- Base models: 6-8GB RAM
- Large models: 12-16GB RAM + GPU
- Storage: Pre-download models to avoid cold starts
- Monitoring: Track model inference times and memory usage
- Environment Variable: USE_OPENAI=true|false controls model selection
- Automatic Fallback: No fallback - pure OpenAI or pure local model operation
- Performance: Local models optimized with torch.float16 for CUDA acceleration
- Initialization: Local models downloaded and cached on first use from Hugging Face
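The single-flag, no-fallback behavior described above can be sketched as a backend factory that reads USE_OPENAI once and fails loudly instead of falling back. The function name and error message below are assumptions for illustration, not the service's actual internals.

```python
import os

# Hypothetical sketch of the USE_OPENAI switch: one flag, no silent fallback.
def select_backend() -> str:
    """Return 'openai' or 'local' based on USE_OPENAI; raise on bad config."""
    use_openai = os.environ.get("USE_OPENAI", "true").lower() == "true"
    if use_openai:
        if not os.environ.get("OPENAI_API_KEY"):
            # Pure OpenAI mode: a missing key is an error, not a fallback.
            raise RuntimeError("USE_OPENAI=true but OPENAI_API_KEY is not set")
        return "openai"
    return "local"

os.environ["USE_OPENAI"] = "false"
print(select_backend())  # local
```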
- JWT Authentication: Secure token-based auth with expiration
- Authorization Middleware: Protected endpoints with user context
- Input Validation: Comprehensive request/response validation
- Secure Headers: CORS, security headers implemented
- File ID-Based Operations: All processing endpoints use secure file_id parameters instead of file uploads
- Supabase-Only Storage: 100% cloud-native storage with NO local fallback - all files stored in Supabase buckets
- Decoupled Storage: Files uploaded once via /upload, then processed by reference
- General API Refactored: Extract-text endpoint now uses file_id instead of file upload
- Metadata Bug Fixed: Async metadata service now properly awaits coroutines (was causing 500 errors)
- Enhanced Security: No direct file exposure in processing endpoints
- Improved Performance: Cached file access and reduced data transfer
- Async Processing: Background processing with event-driven architecture
- Consistent Architecture: All APIs (general, image, audio, video, PDF) use the file_id pattern
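The upload-once, process-by-reference contract above can be sketched as a registry: upload returns a UUID file_id, and every later operation resolves that id instead of accepting bytes. The names below are illustrative and the dict is a stand-in — the real service persists objects to Supabase.

```python
import uuid

# Illustrative in-memory stand-in for the file_id architecture.
# The real service stores files in Supabase; this dict just shows the contract.
_store: dict[str, bytes] = {}

def upload(data: bytes) -> str:
    """Upload once; all later operations reference the returned file_id."""
    file_id = str(uuid.uuid4())
    _store[file_id] = data
    return file_id

def extract_text(file_id: str) -> str:
    """Process by reference - no file bytes cross the wire after upload."""
    data = _store.get(file_id)
    if data is None:
        raise KeyError(f"unknown file_id: {file_id}")
    return data.decode("utf-8", errors="replace")

fid = upload(b"hello world")
print(extract_text(fid))  # hello world
```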
- 91% Test Coverage (439 tests passing across all APIs - June 2025)
- Comprehensive Module Coverage:
- CV Service: 91% coverage (387 statements, 34 missing) - Full OpenAI/Local support
- STT Service: 86% coverage (366 statements, 51 missing) - Full OpenAI/Local support
- Audio API: 95% coverage (188 statements, 10 missing) - Dual model endpoints
- Image API: 94% coverage (197 statements, 11 missing) - Complete file_id architecture
- Video API: 96% coverage (141 statements, 6 missing) - Frame extraction + AI analysis
- PDF API: 100% coverage (81 statements, 0 missing) - Split + cached text
- General API: 78% coverage (185 statements, 41 missing) - File_id refactoring
- Auth API: 100% coverage (35 statements, 0 missing) - JWT token management
- Storage Services: 93-100% coverage - Redis (100%), Supabase (100%), Local Storage (93%)
- Prometheus Metrics: Request counters, duration histograms
- Structured Logging: JSON logs with correlation IDs
- Health Endpoints: /health with service status
- Performance Tracking: Response time monitoring
- Dual Model Testing: All CV, STT, and Audio services tested with both OpenAI and local models
- Redis Cloud: Distributed caching and session management
- Supabase Storage: Scalable file storage with CDN (100% cloud-native, no local fallback)
- Kafka Events: Async event processing for workflows
- Docker Orchestration: Multi-service deployment ready
# 1. Clone repository
git clone <repository-url>
cd ai-file-utilities
# 2. Configure environment
Copy-Item .env.example .env
# Edit .env with your API keys (see Configuration section)
# 3. Deploy full stack
docker-compose up -d
# 4. Verify deployment
docker-compose ps
# Expected: All services "Up (healthy)"
# 5. Test API
Invoke-RestMethod "http://localhost:8000/health"

Services deployed:
- API: http://localhost:8000 (Interactive docs: /docs)
- Grafana: http://localhost:3000 (admin/admin)
- Prometheus: http://localhost:9090
- Redis: localhost:6379
- Kafka: localhost:9092
# Create and activate virtual environment
python -m venv venv
.\venv\Scripts\Activate.ps1
# Install dependencies
pip install -r requirements.txt
# Set environment variables
$env:OPENAI_API_KEY = "your_openai_api_key_here"
$env:SUPABASE_URL = "your_supabase_url"
$env:SUPABASE_KEY = "your_supabase_anon_key"
# Install Tesseract OCR (Windows)
# Download from: https://github.com/UB-Mannheim/tesseract/wiki
# Add to PATH or set TESSERACT_CMD environment variable
# Run the service
uvicorn ai_file_utilities.main:app --reload --host 0.0.0.0 --port 8000

| Variable | Description | Default | Required |
|---|---|---|---|
| OPENAI_API_KEY | OpenAI API key for AI features | - | Yes |
| SUPABASE_URL | Supabase project URL | - | No |
| SUPABASE_KEY | Supabase anon key | - | No |
| JWT_SECRET_KEY | Secret for JWT token signing | auto-generated | No |
| REDIS_URL | Redis connection URL | redis://localhost:6379 | No |
| KAFKA_BOOTSTRAP_SERVERS | Kafka brokers | localhost:9092 | No |
| KAFKA_ENABLED | Enable Kafka events | true | No |
| UPLOAD_DIR | Local upload directory | ./uploads | No |
| TESSERACT_CMD | Tesseract executable path | tesseract | No |
# Core AI Services
OPENAI_API_KEY=sk-your-openai-api-key-here
# JWT Authentication
JWT_SECRET=your-secure-jwt-secret-minimum-64-characters
JWT_ALGORITHM=HS256
JWT_EXPIRATION_HOURS=24
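An HS256 JWT like the one configured above is just two base64url-encoded JSON segments signed with HMAC-SHA256 using JWT_SECRET. The stdlib-only sketch below shows the signing step for illustration; production code should use a vetted library such as PyJWT rather than this hand-rolled version.

```python
import base64
import hashlib
import hmac
import json
import time

def _b64url(data: bytes) -> str:
    # JWT uses unpadded base64url encoding for each segment.
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()

def sign_jwt(user_id: str, secret: str, expiration_hours: int = 24) -> str:
    """Build an HS256 JWT: base64url(header).base64url(payload).signature."""
    header = {"alg": "HS256", "typ": "JWT"}
    payload = {"sub": user_id, "exp": int(time.time()) + expiration_hours * 3600}
    signing_input = (f"{_b64url(json.dumps(header).encode())}."
                     f"{_b64url(json.dumps(payload).encode())}")
    sig = hmac.new(secret.encode(), signing_input.encode(), hashlib.sha256).digest()
    return f"{signing_input}.{_b64url(sig)}"

token = sign_jwt("test-user", "a" * 64)
print(token.count("."))  # 2
```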
# Supabase Cloud Storage
SUPABASE_URL=https://your-project-id.supabase.co
SUPABASE_ANON_KEY=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9...
SUPABASE_SERVICE_KEY=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9...
SUPABASE_BUCKET=ai-file-utilities

# 1. Go to https://supabase.com and sign up
# 2. Create new project (takes 2-3 minutes)
# 3. Navigate to Project Settings → API
# 4. Copy Project URL and API keys

-- In Supabase Dashboard → Storage, create bucket:
-- Name: ai-file-utilities
-- Public: false (private)
-- File size limit: 100MB

-- Go to Storage → Policies and add these:
-- Allow uploads for authenticated users
CREATE POLICY "Allow upload for authenticated users"
ON storage.objects FOR INSERT
WITH CHECK (bucket_id = 'ai-file-utilities');
-- Allow downloads for authenticated users
CREATE POLICY "Allow download for authenticated users"
ON storage.objects FOR SELECT
USING (bucket_id = 'ai-file-utilities');
-- Allow deletions for authenticated users
CREATE POLICY "Allow delete for authenticated users"
ON storage.objects FOR DELETE
USING (bucket_id = 'ai-file-utilities');

# After configuring .env, test with health check:
Invoke-RestMethod "http://localhost:8000/health" | ConvertTo-Json
# Should show: "supabase": "configured"

# For production, use Redis Cloud:
REDIS_ENABLED=true
REDIS_URL=redis://default:password@host:port
REDIS_DEFAULT_TTL=3600

# For Docker (local development):
KAFKA_ENABLED=true
KAFKA_BOOTSTRAP_SERVERS=kafka:9092
# For production Kafka cluster:
# KAFKA_BOOTSTRAP_SERVERS=your-kafka-cluster:9092

# Windows: Download Tesseract OCR
# https://github.com/UB-Mannheim/tesseract/wiki
# Add to PATH or set:
$env:TESSERACT_CMD = "C:\Program Files\Tesseract-OCR\tesseract.exe"

# For production deployment:
LOG_LEVEL=INFO
DEBUG=false
MAX_WORKERS=4
REQUEST_TIMEOUT=300
PROCESSING_TIMEOUT=600

# For development:
LOG_LEVEL=DEBUG
DEBUG=true
RELOAD=true
TEST_MODE=false

# Set environment variables for OpenAI testing
$env:USE_OPENAI="true"
$env:OPENAI_API_KEY="test-key"
# Run all tests with coverage
python -m pytest tests/ --cov=ai_file_utilities --cov-report=html --cov-report=xml --cov-report=term-missing
# View HTML coverage report
start htmlcov/index.html

# First install local model dependencies (if not already installed)
pip install torch torchvision transformers pillow
# Set environment variables for local model testing
$env:USE_OPENAI="false"
# Run tests (now with local model dependencies available)
python -m pytest tests/ --cov=ai_file_utilities --cov-report=html --cov-report=xml --cov-report=term-missing
# Note: Local models will be downloaded automatically on first use
# Models downloaded: whisper-tiny (~244MB), CLIP (~151MB), T5-small (~242MB)

# Fast test run with OpenAI mocking
$env:USE_OPENAI="true"; $env:OPENAI_API_KEY="test-key"; python -m pytest tests/ -v
# Fast test run with local models
$env:USE_OPENAI="false"; python -m pytest tests/ -v

# Enable OpenAI models for testing
$env:USE_OPENAI="true"
$env:OPENAI_API_KEY="test-key"
# Run specific test modules
python -m pytest tests/test_cv_service_comprehensive_fixed.py -v
python -m pytest tests/test_stt_service_fixed.py -v
python -m pytest tests/test_audio_endpoints_updated.py -v

# First install local model dependencies
pip install torch torchvision transformers pillow
# Enable local models for testing
$env:USE_OPENAI="false"
# Run dual-model test suites
python -m pytest tests/test_cv_service_comprehensive_fixed.py -v
python -m pytest tests/test_stt_service_fixed.py -v
python -m pytest tests/test_audio_endpoints_updated.py -v
# Note: First run will download models (~640MB total)

- API Endpoint Tests: Complete coverage of all REST endpoints with dual model support
- CV Service Tests: Both OpenAI GPT-4o and local CLIP model testing (65 tests)
- STT Service Tests: OpenAI Whisper and local Whisper model testing (55 tests)
- Audio Processing Tests: Dual model transcription, language detection, summarization (30 tests)
- Service Layer Tests: Storage, OCR, metadata services with comprehensive mocking
- Integration Tests: End-to-end workflows with environment variable switching
- File_ID Architecture Tests: All processing endpoints use secure file_id parameters
- Total Tests: 439 tests across 19 comprehensive test modules
- Test Status: All tests passing (0 failures)
- Coverage: 91% overall code coverage (2412 statements, 219 missing)
- Dual Model Test Support: Both OpenAI and local model paths fully tested
- Test Files:
  - test_cv_service_comprehensive_fixed.py - CV service with OpenAI + local models (65 tests)
  - test_stt_service_fixed.py - STT service with OpenAI Whisper + local models (55 tests)
  - test_audio_endpoints_updated.py - Audio API with dual model support (30 tests)
  - test_image_fileid.py - Image API endpoints with file_id architecture (20 tests)
  - test_video_fileid.py - Video API endpoints (28 tests)
  - test_*_comprehensive.py - Complete service coverage (241 additional tests)
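The environment-variable switching these tests rely on is easiest to exercise with unittest.mock.patch.dict, which restores os.environ after each case. A hedged sketch — the checked function is a stand-in for whichever service code reads USE_OPENAI, not the service's real internals:

```python
import os
from unittest import mock

# Illustrative pattern for dual-model tests: patch USE_OPENAI per case so
# each test sees a clean environment. model_for_classify is a stand-in.
def model_for_classify() -> str:
    if os.environ.get("USE_OPENAI") == "true":
        return "gpt-4o"
    return "openai/clip-vit-base-patch32"

def test_both_backends():
    with mock.patch.dict(os.environ, {"USE_OPENAI": "true"}):
        assert model_for_classify() == "gpt-4o"
    with mock.patch.dict(os.environ, {"USE_OPENAI": "false"}):
        assert model_for_classify() == "openai/clip-vit-base-patch32"

test_both_backends()
```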
- CV Service: 91% coverage (387 statements, 34 missing) - Full OpenAI/Local support
- STT Service: 86% coverage (366 statements, 51 missing) - Full OpenAI/Local support
- Audio API: 95% coverage (188 statements, 10 missing) - Dual model endpoints
- Image API: 94% coverage (197 statements, 11 missing) - Complete file_id architecture
- Video API: 96% coverage (141 statements, 6 missing) - Frame extraction + AI analysis
- PDF API: 100% coverage (81 statements, 0 missing) - Split + cached text
- General API: 78% coverage (185 statements, 41 missing) - File_id refactoring
- Auth API: 100% coverage (35 statements, 0 missing) - JWT token management
- Storage Services: 93-100% coverage - Redis (100%), Supabase (100%), Local Storage (93%)
- Utility Services: 89-100% coverage - Metadata (100%), OCR (100%), Kafka (95%)
- General API: 78% coverage (file upload, metadata, text extraction)
- Image API: 94% coverage (classification, defect detection, conversion, thumbnails)
- Audio API: 95% coverage (transcription, language detection, conversion)
- Video API: 96% coverage (frame extraction, object detection, summarization)
- PDF API: 100% coverage (text extraction, metadata, cached retrieval)
- CV Service: 91% coverage (dual OpenAI/local model support)
- STT Service: 86% coverage (dual OpenAI Whisper/local model support)
- Storage Service: 93% coverage (Supabase integration)
- Redis Service: 100% coverage (caching and distributed locks)
- Kafka Events: 95% coverage (event publishing)
The test suite supports both OpenAI and local models:
OpenAI Model Tests (Default - set USE_OPENAI=true):
- Computer Vision: Uses GPT-4 Vision for image classification and defect detection
- Speech-to-Text: Uses OpenAI Whisper for audio transcription
- Audio Processing: Uses OpenAI models for language detection and summarization
Local Model Tests (set USE_OPENAI=false):
- Computer Vision: Uses CLIP and T5 models for local processing
- Speech-to-Text: Uses local Whisper models and ASR libraries
- Audio Processing: Uses local NLP models for analysis
# Generate JWT token
$response = Invoke-RestMethod -Uri "http://localhost:8000/auth/generate-token" -Method POST -ContentType "application/json" -Body '{"user_id": "test-user"}'
$token = $response.access_token
# Use token in requests
$headers = @{ "Authorization" = "Bearer $token" }

The General API has been updated to use a file_id-based architecture for improved security and performance:
# Step 1: Upload file (ONLY upload endpoint accepts files)
$uploadResult = Invoke-RestMethod -Uri "http://localhost:8000/api/v1/general/upload" -Method POST -Form @{ file = Get-Item "document.txt" } -Headers $headers
$fileId = $uploadResult.file_id
# Step 2: Extract text using file_id (NO file upload required)
$extractResult = Invoke-RestMethod -Uri "http://localhost:8000/api/v1/general/extract-text" -Method POST -Form @{ file_id = $fileId } -Headers $headers
# Step 3: Get metadata using file_id
$metadata = Invoke-RestMethod -Uri "http://localhost:8000/api/v1/general/metadata/$fileId" -Method GET -Headers $headers
# Step 4: Download file using file_id
Invoke-WebRequest -Uri "http://localhost:8000/api/v1/general/download/$fileId" -Method GET -Headers $headers -OutFile "downloaded_file.txt"
# Step 5: Delete file using file_id
$deleteResult = Invoke-RestMethod -Uri "http://localhost:8000/api/v1/general/$fileId" -Method DELETE -Headers $headers

General API:
- Upload: Only the /upload endpoint accepts file uploads
- Extract Text: Now uses a file_id parameter instead of a file upload
- Metadata: Uses file_id in the URL path
- Download: Uses file_id in the URL path
- Delete: Uses file_id in the URL path
- Security: No file exposure in processing endpoints
- Performance: Cached file access and reduced data transfer
# Upload and process image
$form = @{
file = Get-Item "image.jpg"
target_format = "png"
width = 800
height = 600
}
$result = Invoke-RestMethod -Uri "http://localhost:8000/api/v1/image/convert" -Method POST -Form $form -Headers $headers

Complete PowerShell examples are available in the examples/ directory with full dual model support:
# Run with OpenAI models (default)
$env:USE_OPENAI="true"; $env:OPENAI_API_KEY="your-key-here"
.\examples\image_working_test.ps1
# Switch to local models (CLIP for CV)
$env:USE_OPENAI="false"
.\examples\image_working_test.ps1
# Compare OpenAI vs Local model results
$env:USE_OPENAI="true"; .\examples\image_working_test.ps1
$env:USE_OPENAI="false"; .\examples\image_working_test.ps1

# Audio transcription with OpenAI Whisper
$env:USE_OPENAI="true"; $env:OPENAI_API_KEY="your-key-here"
.\examples\audio_working_test.ps1
# Switch to local STT models (whisper-tiny)
$env:USE_OPENAI="false"
.\examples\audio_working_test.ps1
# Test language detection with both models
$env:USE_OPENAI="true"; .\examples\audio_working_test.ps1 # Uses OpenAI whisper-1
$env:USE_OPENAI="false"; .\examples\audio_working_test.ps1 # Uses local whisper-tiny

# Video analysis with OpenAI GPT-4o
$env:USE_OPENAI="true"; $env:OPENAI_API_KEY="your-key-here"
.\examples\video_working_test.ps1
# Test with local models (CLIP + T5)
$env:USE_OPENAI="false"
.\examples\video_working_test.ps1
# Compare AI model outputs
$env:USE_OPENAI="true"; .\examples\video_working_test.ps1 # GPT-4o analysis
$env:USE_OPENAI="false"; .\examples\video_working_test.ps1 # Local model analysis

# Run comprehensive tests across all services with OpenAI
$env:USE_OPENAI="true"; $env:OPENAI_API_KEY="your-key-here"
.\examples\complete_test_suite.ps1
# Test with local models
$env:USE_OPENAI="false"
.\examples\complete_test_suite.ps1
# Test API endpoints with curl (model-agnostic)
.\examples\curl_test_suite.ps1

Switch between OpenAI and local models at runtime:
# Set environment variables for OpenAI models
$env:USE_OPENAI="true"
$env:OPENAI_API_KEY="your-openai-api-key"
# Verify configuration
Write-Output "Model Mode: $env:USE_OPENAI"
Write-Output "API Key Set: $($env:OPENAI_API_KEY -ne $null)"
# Run example with OpenAI
.\examples\audio_working_test.ps1

# Set environment variables for local models
$env:USE_OPENAI="false"
# Note: Local models are automatically downloaded and cached on first use
# Verify configuration
Write-Output "Model Mode: $env:USE_OPENAI"
Write-Output "Local models will be used (Whisper-tiny, CLIP, T5-small)"
# Run example with local models (no API key required)
.\examples\audio_working_test.ps1

# Test with OpenAI, then switch to local models
$env:USE_OPENAI="true"; $env:OPENAI_API_KEY="test-key"; .\examples\complete_test_suite.ps1
$env:USE_OPENAI="false"; .\examples\complete_test_suite.ps1
# Run specific service tests with model switching
$env:USE_OPENAI="true"; python -m pytest tests/test_cv_service_comprehensive_fixed.py -v
$env:USE_OPENAI="false"; python -m pytest tests/test_cv_service_comprehensive_fixed.py -v

- general_api_demo.ps1 - Complete General API file_id workflow (Upload → Extract-Text → Metadata → Download → Delete)
- audio_working_test.ps1 - Audio processing with dual model support (OpenAI Whisper + Local whisper-tiny)
- video_working_test.ps1 - Video processing with dual model support (OpenAI GPT-4o + Local CLIP/T5)
- image_working_test.ps1 - Image processing with dual model support (OpenAI GPT-4o + Local CLIP)
- complete_test_suite.ps1 - Full API test suite with model switching (All endpoints + dual models)
- pdf_test.ps1 - PDF processing (split, text extraction, cached retrieval)
- curl_test_suite.ps1 - cURL command examples for all endpoints
- image_test.ps1 - Legacy image processing test (kept for compatibility)
- video_test.ps1 - Legacy video processing test (kept for compatibility)
examples/
├── PowerShell Scripts (All with Dual Model Support)
│ ├── audio_working_test.ps1 # STT: OpenAI Whisper vs Local whisper-tiny
│ ├── video_working_test.ps1 # CV: OpenAI GPT-4o vs Local CLIP+T5
│ ├── image_working_test.ps1 # CV: OpenAI GPT-4o vs Local CLIP
│ ├── complete_test_suite.ps1 # Full test suite with model switching
│ ├── general_api_demo.ps1 # Complete file_id workflow demo
│ ├── pdf_test.ps1 # PDF processing examples
│ └── curl_test_suite.ps1 # cURL command examples
├── Test Media Files (Real files for testing)
│ ├── audio.mp3 # Real audio file (41KB) - English speech
│ ├── video.mp4 # Real video file (1.7MB) - Sample video
│ ├── test_image.png # Real image file (1KB) - Test image
│ └── test.pdf # Real PDF file (218KB) - Multi-page document
└── Generated Test Results
├── test_debug.png # Debug output images from processing
├── test_final.png # Final processed images
└── test_fresh.png # Fresh test outputs
All example scripts support environment variable configuration for immediate model switching:
# Set environment for OpenAI models and run examples
$env:USE_OPENAI="true"; $env:OPENAI_API_KEY="your-key"; .\examples\audio_working_test.ps1
$env:USE_OPENAI="true"; $env:OPENAI_API_KEY="your-key"; .\examples\image_working_test.ps1
$env:USE_OPENAI="true"; $env:OPENAI_API_KEY="your-key"; .\examples\video_working_test.ps1
# Run comprehensive test suite with OpenAI
$env:USE_OPENAI="true"; $env:OPENAI_API_KEY="your-key"; .\examples\complete_test_suite.ps1

# First install local model dependencies (one-time setup)
pip install torch torchvision transformers pillow
# Set environment for local models and run examples (no API key required)
$env:USE_OPENAI="false"; .\examples\audio_working_test.ps1 # Local whisper-tiny + T5-small
$env:USE_OPENAI="false"; .\examples\image_working_test.ps1 # Local CLIP model
$env:USE_OPENAI="false"; .\examples\video_working_test.ps1 # Local CLIP + T5-small
# Run comprehensive test suite with local models
$env:USE_OPENAI="false"; .\examples\complete_test_suite.ps1
# Note: Models are downloaded automatically on first use (~640MB total)

# Compare outputs between OpenAI and local models
Write-Output "=== Testing with OpenAI Models ==="
$env:USE_OPENAI="true"; $env:OPENAI_API_KEY="your-key"; .\examples\audio_working_test.ps1
Write-Output "=== Testing with Local Models ==="
$env:USE_OPENAI="false"; .\examples\audio_working_test.ps1
# Results will show model differences in transcription quality, speed, and output format

├── general_api_demo.ps1 # Complete General API demo (file_id architecture)
├── audio_working_test.ps1 # Audio processing with OpenAI/Local models (141 lines)
├── video_working_test.ps1 # Video processing with OpenAI/Local models (164 lines)
├── image_working_test.ps1 # Image processing with OpenAI/Local models (172 lines)
├── pdf_test.ps1 # PDF processing test
├── curl_test_suite.ps1 # cURL commands reference
├── complete_test_suite.ps1 # Comprehensive test suite with model switching
├── image_test.ps1 # Legacy image processing test
├── video_test.ps1 # Legacy video processing test
├── audio.mp3 # Real audio file for testing (41KB)
├── video.mp4 # Real video file for testing (1.7MB)
├── test_debug.png # Test image file (335 bytes)
├── test_final.png # Test image file (335 bytes)
├── test_fresh.png # Test image file (485 bytes)
├── test_image.png # Real image file for testing (1KB)
└── test.pdf # Real PDF file for testing (218KB)
### **General API Demo Features**
**File:** `examples/general_api_demo.ps1`
**Complete File_ID Architecture Test:**
1. **Upload:** `POST /api/v1/general/upload` - Accepts file uploads → Returns `file_id`
2. **Extract-Text:** `POST /api/v1/general/extract-text` - Uses `file_id` (NOT file upload)
3. **Metadata:** `GET /api/v1/general/metadata/{file_id}` - File metadata extraction
4. **Download:** `GET /api/v1/general/download/{file_id}` - File download
5. **Delete:** `DELETE /api/v1/general/{file_id}` - File cleanup
**Sample Output:**
=== General API Complete Test - File ID Architecture ===
Upload successful! File ID: df90a98c-8f82-4503-bc4c-009aa0d94984
Text extraction successful! Processing Time: 2.149s
Metadata retrieval successful! Keys: file_size, content_type, filename, page_count, is_encrypted, pdf_metadata
Download successful! Downloaded: 218,741 bytes
File deleted successfully!
## **EXAMPLE WORKFLOWS** (Working & Tested)
### **Pre-requisites for Examples**
```powershell
# 1. Ensure Docker stack is running
docker-compose ps # All services should be "Up (healthy)"
# 2. Generate authentication token
$token = (Invoke-RestMethod -Uri "http://localhost:8000/auth/generate-token" -Method POST -ContentType "application/json" -Body '{"user_id": "test-user"}').access_token
Write-Host "Token: $token"
# Navigate to examples directory
cd examples
# Run complete General API file_id architecture demo
powershell.exe -ExecutionPolicy Bypass -File general_api_demo.ps1
# Navigate to examples directory
cd examples
# Run audio transcription and conversion workflow
powershell.exe -ExecutionPolicy Bypass -File audio_test.ps1
# Test video frame extraction and analysis
powershell.exe -ExecutionPolicy Bypass -File video_test.ps1
# Test image processing pipeline
powershell.exe -ExecutionPolicy Bypass -File image_test.ps1
# Test PDF processing capabilities
powershell.exe -ExecutionPolicy Bypass -File pdf_test.ps1
```
### **OpenAI/Local Model Switching Examples**
The following examples demonstrate automatic model switching between OpenAI and Local models:
#### **Audio Processing with Model Switching**
```powershell
# Navigate to examples directory
cd examples
# Test with LOCAL models (default)
powershell.exe -ExecutionPolicy Bypass -File audio_working_test.ps1
# Test with OPENAI models (requires OPENAI_API_KEY)
powershell.exe -ExecutionPolicy Bypass -File audio_working_test.ps1 -UseOpenAI
# Test with LOCAL models (CLIP)
powershell.exe -ExecutionPolicy Bypass -File image_working_test.ps1
# Test with OPENAI models (GPT-4 Vision)
powershell.exe -ExecutionPolicy Bypass -File image_working_test.ps1 -UseOpenAI
# Test with LOCAL models (CLIP, OpenCV)
powershell.exe -ExecutionPolicy Bypass -File video_working_test.ps1
# Test with OPENAI models (GPT-4 Vision for frames)
powershell.exe -ExecutionPolicy Bypass -File video_working_test.ps1 -UseOpenAI
# Option 1: Environment variable (affects all services)
$env:USE_OPENAI = "true"
$env:OPENAI_API_KEY = "your-api-key"
# Option 2: Parameter-based switching (per test)
powershell.exe -ExecutionPolicy Bypass -File audio_working_test.ps1 -UseOpenAI
powershell.exe -ExecutionPolicy Bypass -File image_working_test.ps1 -UseOpenAI
powershell.exe -ExecutionPolicy Bypass -File video_working_test.ps1 -UseOpenAI
# Option 3: Docker environment switching
# Update docker-compose.yml:
# - USE_OPENAI=true
# - OPENAI_API_KEY=${OPENAI_API_KEY}
docker-compose down && docker-compose up -d
```

| Method | Endpoint | Description | OpenAI Model | Local Model |
|---|---|---|---|---|
| POST | /api/v1/auth/generate-token | Generate JWT token | None | None |
| GET | /api/v1/auth/verify | Verify JWT token | None | None |
Features:
- JWT token generation with configurable expiration
- Token verification and user context extraction
- Secure authentication middleware
| Method | Endpoint | Description | OpenAI Model | Local Model |
|---|---|---|---|---|
| POST | /api/v1/general/upload | Upload files to Supabase storage | None | None |
| GET | /api/v1/general/download/{file_id} | Download files by file_id | None | None |
| GET | /api/v1/general/metadata/{file_id} | Extract file metadata | None | None |
| POST | /api/v1/general/extract-text | OCR text extraction (file_id) | None | None |
| DELETE | /api/v1/general/{file_id} | Delete files from storage | None | None |
Architecture:
- File-ID Based: All processing uses secure file_id parameters
- Supabase-Only Storage: 100% cloud-native, no local fallback
- OCR Integration: Tesseract OCR for text extraction
- Metadata Extraction: EXIF, PDF properties, document analysis
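The file_id architecture can be illustrated with a minimal sketch. The registry dict, the uploads/ path prefix, and the field names here are assumptions for illustration, not the service's actual Supabase schema.

```python
import mimetypes
import pathlib
import uuid

def register_upload(filename: str, store: dict) -> str:
    """Assign an opaque file_id to an upload and record where the object lives.
    All later operations (metadata, download, delete) key off this id, never a path."""
    file_id = uuid.uuid4().hex
    content_type = mimetypes.guess_type(filename)[0] or "application/octet-stream"
    store[file_id] = {
        "object_path": f"uploads/{file_id}{pathlib.Path(filename).suffix}",  # hypothetical layout
        "content_type": content_type,
        "original_name": filename,
    }
    return file_id

store = {}
fid = register_upload("report.pdf", store)
print(store[fid]["content_type"])  # application/pdf
```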
| Method | Endpoint | Description | OpenAI Model | Local Model |
|---|---|---|---|---|
| POST | /api/v1/audio/transcribe | Audio transcription with timestamps | whisper-1 | openai/whisper-tiny |
| POST | /api/v1/audio/detect-language | Audio language detection | whisper-1 | openai/whisper-tiny |
| POST | /api/v1/audio/summarize | Transcription + AI summarization | whisper-1 + gpt-3.5-turbo | whisper-tiny + t5-small |
| POST | /api/v1/audio/convert | Audio format conversion | None (FFmpeg) | None (FFmpeg) |
Features:
- Dual Model Support: OpenAI Whisper-1 OR local Whisper-tiny
- Language Detection: Automatic language identification
- VTT Timestamps: WebVTT format with precise timing
- AI Summarization: GPT-3.5-turbo OR T5-small for summaries
- Format Conversion: MP3, WAV, M4A, FLAC support
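The WebVTT output that timestamped transcription produces can be sketched as follows; the (start, end, text) tuples stand in for whatever segment structure the STT model actually returns.

```python
def to_vtt(segments):
    """Render (start_sec, end_sec, text) segments as a WebVTT document."""
    def ts(sec):
        # WebVTT cue timestamps: HH:MM:SS.mmm
        h, rem = divmod(sec, 3600)
        m, s = divmod(rem, 60)
        return f"{int(h):02d}:{int(m):02d}:{s:06.3f}"
    lines = ["WEBVTT", ""]
    for start, end, text in segments:
        lines += [f"{ts(start)} --> {ts(end)}", text, ""]
    return "\n".join(lines)

print(to_vtt([(0.0, 2.5, "Hello world."), (2.5, 5.0, "Second cue.")]))
```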
| Method | Endpoint | Description | OpenAI Model | Local Model |
|---|---|---|---|---|
| POST | /api/v1/image/convert | Image format conversion | None (PIL) | None (PIL) |
| POST | /api/v1/image/classify | AI image classification | gpt-4o | openai/clip-vit-base-patch32 |
| POST | /api/v1/image/detect-defects | AI defect detection | gpt-4o | openai/clip-vit-base-patch32 |
| POST | /api/v1/image/thumbnail | Thumbnail generation | None (PIL) | None (PIL) |
AI Features:
- Classification: Detailed image description and categorization
- Defect Detection: Quality control and damage assessment
- Dual Models: GPT-4o vision OR CLIP for computer vision tasks
- Format Support: JPEG, PNG, GIF, BMP, TIFF, WebP
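Thumbnail generation preserves aspect ratio; a minimal sketch of the size computation, roughly what PIL's Image.thumbnail does (only downscaling, never enlarging):

```python
def thumbnail_size(width: int, height: int, max_side: int = 256):
    """Fit (width, height) inside a max_side box, keeping aspect ratio."""
    scale = min(max_side / width, max_side / height, 1.0)  # never upscale
    return (round(width * scale), round(height * scale))

print(thumbnail_size(1920, 1080))  # (256, 144)
print(thumbnail_size(100, 100))   # (100, 100) - already small enough
```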
| Method | Endpoint | Description | OpenAI Model | Local Model |
|---|---|---|---|---|
| POST | /api/v1/video/extract-frames | Extract frames from video | None (OpenCV) | None (OpenCV) |
| POST | /api/v1/video/detect-objects | AI object detection in video | gpt-4o | openai/clip-vit-base-patch32 |
| POST | /api/v1/video/summarize | AI video content summarization | gpt-4o | clip-vit-base-patch32 + t5-small |
AI Features:
- Object Detection: Identify and describe objects in video frames
- Content Summarization: AI-powered video content analysis
- Frame Analysis: Processes first 3-5 frames for optimization
- Dual Models: GPT-4o vision OR CLIP + T5 for video analysis
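The frame-budget optimization above (only the first 3-5 frames are analyzed) can be sketched as a selection step that also maps frame indices to timestamps; the helper name and tuple shape are made up for illustration.

```python
def select_frames(total_frames: int, fps: float, max_frames: int = 5):
    """Take the first few frame indices and pair each with its timestamp in
    seconds (frame_index / fps), capping the per-video analysis cost."""
    indices = range(min(total_frames, max_frames))
    return [(i, round(i / fps, 3)) for i in indices]

print(select_frames(300, 30.0, max_frames=3))  # [(0, 0.0), (1, 0.033), (2, 0.067)]
```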
| Method | Endpoint | Description | OpenAI Model | Local Model |
|---|---|---|---|---|
| POST | /api/v1/pdf/split | Split PDF by page ranges | None (PyPDF2) | None (PyPDF2) |
| GET | /api/v1/pdf/text/{cache_id} | Retrieve cached OCR text | None (Tesseract) | None (Tesseract) |
Features:
- Page Range Splitting: Split PDFs by specified page ranges (e.g., "1-3,5-7")
- Cached Text Retrieval: Redis-cached OCR text with cache_id lookup
- PDF Processing: PyPDF2 for manipulation, Tesseract for OCR
- Multiple Cache Patterns: Supports various cache key formats for compatibility
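Parsing a page-range spec like "1-3,5-7" into explicit page numbers can be done as below; the helper name is hypothetical, not the service's actual function.

```python
def parse_page_ranges(spec: str):
    """Expand a spec like "1-3,5-7" into a list of 1-based page numbers."""
    pages = []
    for part in spec.split(","):
        part = part.strip()
        if "-" in part:
            start, end = map(int, part.split("-"))
            pages.extend(range(start, end + 1))  # ranges are inclusive
        else:
            pages.append(int(part))
    return pages

print(parse_page_ranges("1-3,5-7"))  # [1, 2, 3, 5, 6, 7]
```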
USE_OPENAI=true   # Use OpenAI models (whisper-1, gpt-4o, gpt-3.5-turbo)
USE_OPENAI=false  # Use local models (whisper-tiny, clip-vit-base-patch32, t5-small)

- OpenAI: Cloud-based, consistent performance, API costs
- Local: Self-hosted, CUDA acceleration, one-time download cost
- Optimization: Local models use torch.float16 for CUDA acceleration
- Initialization: Local models downloaded from Hugging Face on first use
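The USE_OPENAI switch maps to a per-task model choice; a minimal sketch, assuming a task-to-model registry (the helper name select_model is made up; the model names come from the endpoint tables above):

```python
import os

# Dual-model registry mirroring the endpoint tables above
MODELS = {
    "stt": {"openai": "whisper-1", "local": "openai/whisper-tiny"},
    "vision": {"openai": "gpt-4o", "local": "openai/clip-vit-base-patch32"},
    "summarize": {"openai": "gpt-3.5-turbo", "local": "t5-small"},
}

def select_model(task: str) -> str:
    """Pick the OpenAI or local model for a task based on the USE_OPENAI env var."""
    use_openai = os.getenv("USE_OPENAI", "false").lower() == "true"
    return MODELS[task]["openai" if use_openai else "local"]

os.environ["USE_OPENAI"] = "true"
print(select_model("stt"))     # whisper-1
os.environ["USE_OPENAI"] = "false"
print(select_model("vision"))  # openai/clip-vit-base-patch32
```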
tests/
├── conftest.py # Test configuration and fixtures
├── test_api_endpoints.py # API endpoint integration tests
├── test_auth_comprehensive.py # Authentication system tests
├── test_audio_endpoints_updated.py # Audio API endpoint tests
├── test_audio_conversion_comprehensive.py # Audio conversion service tests
├── test_cv_service_comprehensive_fixed.py # Computer Vision service tests
├── test_general_updated.py # General API tests
├── test_image_fileid.py # Image API file_id tests
├── test_video_fileid.py # Video API file_id tests
├── test_pdf_fileid.py # PDF API file_id tests
├── test_storage_comprehensive.py # Storage service tests
├── test_supabase_storage_comprehensive.py # Supabase storage tests
├── test_stt_service_fixed.py # Speech-to-Text service tests
├── test_ocr_service_comprehensive.py # OCR service tests
├── test_redis_service_comprehensive.py # Redis caching tests
├── test_kafka_events_comprehensive.py # Kafka event handling tests
├── test_metadata_service_comprehensive.py # Metadata extraction tests
├── test_middleware_comprehensive.py # Middleware tests
└── __init__.py # Package initialization
Note: Tests require a Docker environment with all dependencies installed:
# Run tests in Docker container
docker-compose up -d
docker-compose exec ai-file-utilities python -m pytest tests/ -v