Alpha Content Engine

Overview

Enterprise-grade content pipeline that transforms web documentation into intelligent AI assistants through automated scraping, processing, and deployment.

Core Achievements:

30+ Articles Processed: Zendesk API → Clean Markdown with preserved structure
Zero-UI Deployment: 100% programmatic OpenAI API integration
Smart Delta Detection: Hash-based change tracking for efficient updates
Production Ready: Dockerized with daily automation and comprehensive logging
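
The hash-based delta detection mentioned above can be illustrated with a minimal sketch. It assumes each article is identified by an ID and that the previous run stored one hash per article. The function names `content_hash` and `detect_delta`, the choice of SHA-256, and the three category labels are illustrative assumptions, not taken from the project's source.

```python
import hashlib


def content_hash(markdown: str) -> str:
    """Return the SHA-256 hex digest of an article's Markdown text."""
    return hashlib.sha256(markdown.encode("utf-8")).hexdigest()


def detect_delta(articles: dict[str, str], known_hashes: dict[str, str]) -> dict[str, list[str]]:
    """Classify articles as added, updated, or skipped by comparing hashes.

    `articles` maps an article ID to its current Markdown (hypothetical shape);
    `known_hashes` maps an article ID to the hash stored by the previous run.
    """
    delta = {"added": [], "updated": [], "skipped": []}
    for article_id, markdown in articles.items():
        new_hash = content_hash(markdown)
        old_hash = known_hashes.get(article_id)
        if old_hash is None:
            delta["added"].append(article_id)      # never seen before
        elif old_hash != new_hash:
            delta["updated"].append(article_id)    # content changed
        else:
            delta["skipped"].append(article_id)    # identical, no upload needed
    return delta


if __name__ == "__main__":
    previous = {"101": content_hash("# Reset password\nOld steps")}
    current = {
        "101": "# Reset password\nNew steps",
        "102": "# Billing FAQ\nAnswers",
    }
    print(detect_delta(current, previous))
```

Only the articles in the added and updated groups would need to be uploaded again, which is where the efficiency gain would come from.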

Technical Excellence:

API-First Architecture: No manual uploads, fully automated vector store management
Intelligent Chunking: File-based strategy optimized for accurate citations
Cost-Effective Scaling: GitHub Actions over expensive cloud platforms
Security Best Practices: Environment-based secrets, no hardcoded keys
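
As a rough illustration of environment-based secrets, the sketch below reads `OPENAI_API_KEY` from the environment and fails fast when it is missing. The helper names `load_api_key` and `mask`, and the check that rejects values starting with `your-` (the placeholder style used in the setup instructions), are assumptions made for this example rather than the project's actual code.

```python
import os


def load_api_key(env=os.environ, name="OPENAI_API_KEY"):
    """Read the API key from the environment and fail fast if it is unusable."""
    key = env.get(name, "").strip()
    # Reject a missing value or leftover placeholder text (assumed "your-" prefix).
    if not key or key.startswith("your-"):
        raise RuntimeError(f"{name} is not set; add it to .env or export it")
    return key


def mask(secret):
    """Return a log-safe form of a secret, showing at most the last 4 characters."""
    if len(secret) <= 4:
        return "*" * len(secret)
    return "*" * (len(secret) - 4) + secret[-4:]


if __name__ == "__main__":
    try:
        print("Using key:", mask(load_api_key()))
    except RuntimeError as err:
        print("Configuration error:", err)
```

Masking the key before logging keeps the secret out of CI logs, which matters when run logs are public.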

System Architecture

Screenshots

OpenAI Playground Answer - Strict Compliance Achieved

GPT-4o: Perfect adherence to "Only answer using uploaded docs" with proper citations

Comparison: GPT-3.5-turbo hallucinates, in contrast to the compliant behavior shown above

GitHub Actions Deployment - Zero-Cost Automation

Production Pipeline: Automated daily execution with assistant reuse and intelligent delta tracking

Docker Local Testing - One-Command Deployment

Container Success: Clean execution showing scraping, processing, and exit 0 compliance

Quick Start

Setup:

# Windows: Copy template file and rename
copy .env.sample .env
# Edit .env file and replace with your actual API key:
# OPENAI_API_KEY=your-actual-openai-api-key

# Linux/Mac alternative:
# cp .env.sample .env

How to run locally:

# Install dependencies and run
pip install -r requirements.txt
python main.py

# Build and run Docker container (exits 0 as required)
docker build -t alpha-content-engine .
docker run -e OPENAI_API_KEY=your-api-key alpha-content-engine

Link to daily job logs:

All GitHub Actions Logs - Public repository with complete run history, logs, and downloadable artifacts including config files and scraper logs.

Note: The first run shows an "artifact not found" warning. This is expected, because no previous config exists yet. Assistant reuse starts from the second run onward.
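
The first-run behavior in the note above can be sketched as follows. This is a minimal illustration that assumes the config is a JSON file holding an `assistant_id` field. The file layout, the function names, and the `create_assistant` callable are hypothetical stand-ins, not the project's real API calls.

```python
import json
from pathlib import Path


def load_config(path):
    """Return the saved config, or an empty dict when no previous run exists."""
    p = Path(path)
    if not p.exists():
        return {}
    return json.loads(p.read_text(encoding="utf-8"))


def get_or_create_assistant(path, create_assistant):
    """Reuse the stored assistant ID, creating and saving one only if absent.

    `create_assistant` is a hypothetical callable that returns a new ID.
    Returns a tuple of (assistant_id, reused).
    """
    config = load_config(path)
    assistant_id = config.get("assistant_id")
    reused = assistant_id is not None
    if not reused:
        assistant_id = create_assistant()
        config["assistant_id"] = assistant_id
        # Persist the ID so the next run can pick it up.
        Path(path).write_text(json.dumps(config, indent=2), encoding="utf-8")
    return assistant_id, reused
```

On the first call the config file does not exist, so an assistant is created and its ID is saved. Later calls find the stored ID and skip creation.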

Chunking Strategy

File-based chunking: Each article = 1 file uploaded to OpenAI Vector Store

Benefits:

Preserves article structure for better context
Maintains URLs for accurate citations
Simple & reliable for support use cases
Optimal for retrieval accuracy

Process: HTML → Clean Markdown → Metadata footer → API upload
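
The last two steps before upload can be sketched as below, under the assumption that the HTML has already been converted to Markdown. The footer layout (a rule followed by an `Article URL:` line), the slug-based file naming, and the function names are illustrative guesses, not the project's confirmed format.

```python
import re


def slugify(title: str) -> str:
    """Turn an article title into a filesystem-safe file stem."""
    slug = re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")
    return slug or "untitled"


def build_article_file(title: str, markdown_body: str, url: str) -> tuple[str, str]:
    """Return (filename, content) for one article, with a metadata footer.

    One article maps to one file, matching the file-based chunking strategy.
    """
    # Hypothetical footer format: keeps the source URL available for citations.
    footer = f"\n\n---\nArticle URL: {url}\n"
    content = f"# {title}\n\n{markdown_body.strip()}{footer}"
    return f"{slugify(title)}.md", content


if __name__ == "__main__":
    name, content = build_article_file(
        "How to Reset Your Password?",
        "1. Open Settings.\n2. Choose Reset.",
        "https://example.com/articles/1",
    )
    print(name)
    print(content)
```

Keeping the URL inside the file itself means a retrieved chunk from the end of the article carries its own citation target.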

Logged: The number of files embedded in the vector store and the number of chunks processed, for full transparency
