
Project: [Creative Apps] - Latam Book Generator #131

@crissins

Description


Track

Creative Apps (GitHub Copilot)

Project Name

Latam Book Generator

GitHub Username

@crissins

Repository URL

https://github.com/crissins/Agent-Framework

Project Description

LATAM Book Generator is an Agent Framework app that automates educational book creation for 350+ million children across Latin America.

Using 17 specialized AI agents, it turns simple prompts into publication-ready, culturally relevant children's books in minutes. Our multi-provider LLM strategy enables:

  • Cost optimization: Production books cost ~$0.05 on Qwen vs. $0.30 on Claude
  • No vendor lock-in: Switch providers (GitHub Models, Qwen, Claude, Azure OpenAI) with environment variables
  • Multimodal output: HTML, PDF, Markdown, audiobooks with voice cloning, and JSON exports
  • Intelligence layer: Curriculum design, fact-checking with live web search, educational activity generation, and AI illustrations

Built with Python/Streamlit and Microsoft Agent Framework, it orchestrates agents for Chat, Curriculum, Chapter Writing, Image Generation, Voice Cloning, Fact-Checking, and Export across multiple formats.

The platform demonstrates enterprise-grade agent coordination while maintaining cost efficiency—critical for emerging markets.

Demo Video or Screenshots

Demo video: https://youtu.be/Tr-6JnxO-9k
Demo files: demo_files

Primary Programming Language

Python

Key Technologies Used

  • Agent Framework: Microsoft Agent Framework 1.0.0b260107
  • LLM SDKs: OpenAI Python SDK (GitHub Models), DashScope SDK, Anthropic SDK
  • UI/Backend: Streamlit 1.54.0, FastAPI (HTTP server mode)
  • AI Services: DashScope (TTS, image generation, web search), Qwen3, Claude
  • Data: Pydantic (schema validation), JSON, Markdown
  • Export: fpdf2 (PDF), Pillow (images), HTML/CSS (10 templates)
  • Audio: QwenTtsRealtime (WebSocket TTS), Edge TTS, voice cloning
  • Observability: OpenTelemetry, AI Toolkit Agent Inspector
  • Search: DuckDuckGo integration, DashScope web search
  • Utilities: qrcode, ThreadPoolExecutor (batch processing)
  • Deployment: Docker, docker-compose, Streamlit Cloud ready

Submission Type

Individual

Team Members

No response

Submission Requirements

  • My project meets the track-specific challenge requirements
  • My repository includes a comprehensive README.md with setup instructions
  • My code does not contain hardcoded API keys or secrets
  • I have included demo materials (video or screenshots)
  • My project is my own work with proper attribution for any third-party code
  • I agree to the Code of Conduct
  • I have read and agree to the Disclaimer
  • My submission does NOT contain any confidential, proprietary, or sensitive information
  • I confirm I have the rights to submit this content and grant the necessary licenses

Quick Setup Summary

Option 1: Streamlit UI (Recommended for demo)

git clone https://github.com/crissins/Agent-Framework
cd Agent-Framework
pip install -r requirements.txt
cp .env.example .env
# Add your API keys to .env
streamlit run app.py

Navigate to http://localhost:8501 and start creating books.

Option 2: CLI Mode

python main.py --prompt "Describe your book..." --provider qwen

Option 3: HTTP Server

python server.py
# API available at http://localhost:8000

Option 4: Batch Testing (Compare providers)

python main.py --batch --providers github qwen claude azure

All modes use the same .env configuration. See README.md for detailed setup.
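The CLI flags used in Options 2 and 4 can be modeled with a small argparse parser. This is a hypothetical reconstruction for illustration, not the project's main.py: the provider list comes from the examples above, while the `LLM_PROVIDER` default and the "prompt required unless batching" rule are assumptions.

```python
import argparse
import os
from typing import Optional

# Provider names come from the setup examples above; the real CLI may accept others.
PROVIDERS = ["github", "qwen", "claude", "azure"]


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical reconstruction of the main.py flags, not the project's actual code."""
    parser = argparse.ArgumentParser(prog="main.py", description="LATAM Book Generator CLI (sketch)")
    parser.add_argument("--prompt", help="natural-language description of the book")
    parser.add_argument(
        "--provider",
        choices=PROVIDERS,
        # Assumption: default to the same LLM_PROVIDER variable the .env file sets.
        default=os.environ.get("LLM_PROVIDER", "github"),
        help="LLM backend for a single run",
    )
    parser.add_argument("--batch", action="store_true", help="run one prompt on several providers")
    parser.add_argument("--providers", nargs="+", choices=PROVIDERS, default=list(PROVIDERS),
                        help="providers to compare in --batch mode")
    return parser


def parse(argv: Optional[list] = None) -> argparse.Namespace:
    parser = build_parser()
    args = parser.parse_args(argv)
    # Assumption: single-book mode needs a prompt; batch mode may use a built-in test prompt.
    if not args.batch and not args.prompt:
        parser.error("--prompt is required unless --batch is given")
    return args
```

Keeping the provider names in one `choices` list means a typo such as `--provider qwne` fails immediately with a usage message instead of surfacing later as a missing API key.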


Technical Highlights

1. Multi-Agent Orchestration Architecture

  • 17 specialized agents coordinated via Microsoft Agent Framework
  • Each agent has a distinct responsibility: curriculum design, chapter writing, image generation, fact-checking, export
  • Agents communicate through structured Pydantic schemas, so malformed outputs fail validation instead of propagating
  • Flexible scheduling: agents run in parallel or in sequence depending on task dependencies
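Running agents "in parallel or in sequence based on task dependencies" amounts to scheduling a dependency graph. The sketch below uses the standard library's `graphlib.TopologicalSorter` to group agents into waves: agents in one wave could run concurrently, and waves run in order. The agent names and dependency edges are hypothetical, loosely based on the responsibilities listed above; the project itself delegates orchestration to Microsoft Agent Framework.

```python
from graphlib import TopologicalSorter

# Hypothetical dependency graph: each agent maps to the agents whose output it needs.
PIPELINE = {
    "curriculum": set(),
    "chapter_writer": {"curriculum"},
    "image_generator": {"curriculum"},
    "fact_checker": {"chapter_writer"},
    "exporter": {"chapter_writer", "image_generator", "fact_checker"},
}


def execution_waves(graph: dict) -> list:
    """Group agents into waves. Agents in the same wave have no unmet dependencies
    and could run in parallel; the waves themselves run in sequence."""
    sorter = TopologicalSorter(graph)
    sorter.prepare()  # raises graphlib.CycleError if the graph contains a cycle
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)
    return waves
```

With this graph, `execution_waves(PIPELINE)` returns `[['curriculum'], ['chapter_writer', 'image_generator'], ['fact_checker'], ['exporter']]`, so chapter writing and image generation share a wave.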

2. Provider-Agnostic LLM Layer

  • Abstraction layer supporting GitHub Models, Qwen DashScope, Anthropic Claude, Azure OpenAI
  • Single configuration point (LLM_PROVIDER env var) switches backends without code changes
  • Cost tracking per provider for transparent economics
  • Fallback logic handles provider rate limits gracefully
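One way to combine a single `LLM_PROVIDER` switch with rate-limit fallback is a registry of provider adapters tried in order. This is a minimal sketch under stated assumptions: `RateLimitError`, the `complete` function, and the adapter shape are hypothetical stand-ins, not the project's abstraction layer or any SDK's API.

```python
import os
from typing import Callable, Mapping, Optional


class RateLimitError(Exception):
    """Stand-in for the rate-limit exception a real provider SDK would raise."""


# Hypothetical adapter type: takes a prompt, returns generated text.
# Real adapters would wrap the OpenAI, DashScope, Anthropic, or Azure OpenAI SDKs.
Provider = Callable[[str], str]


def complete(
    prompt: str,
    registry: Mapping[str, Provider],
    fallback_order: list,
    env: Optional[Mapping[str, str]] = None,
) -> tuple:
    """Call the provider named by LLM_PROVIDER first; on a rate limit, try the others in order.

    Returns (provider_name, text)."""
    env = os.environ if env is None else env
    primary = env.get("LLM_PROVIDER", fallback_order[0])
    order = [primary] + [name for name in fallback_order if name != primary]
    last_error: Optional[Exception] = None
    for name in order:
        provider = registry.get(name)
        if provider is None:  # not configured, e.g. missing API key
            continue
        try:
            return name, provider(prompt)
        except RateLimitError as exc:
            last_error = exc  # remember it and move on to the next provider
    raise RuntimeError("no configured provider could serve the request") from last_error
```

Returning the provider name alongside the text lets the caller attribute cost to whichever backend actually answered.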

3. Integrated Multimodal Capabilities

  • Content generation, image creation, voice cloning, and fact-checking in a single workflow
  • Leverages DashScope's unified API (TTS, images, web search)
  • Exports to 5+ formats (HTML, PDF, Markdown, JSON, audio)
  • HTML templates use responsive design for multiple devices
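Exporting one book to several formats works naturally as a dispatch table over a single book object. The sketch below covers only the text formats (Markdown, HTML, JSON) with the standard library; `Book`, `Chapter`, and `export` are hypothetical names, and the project's PDF (fpdf2) and audio exports are left out.

```python
import html
import json
from dataclasses import asdict, dataclass, field


@dataclass
class Chapter:
    title: str
    body: str


@dataclass
class Book:
    title: str
    chapters: list = field(default_factory=list)


def to_markdown(book: Book) -> str:
    parts = [f"# {book.title}"]
    for ch in book.chapters:
        parts.append(f"## {ch.title}\n\n{ch.body}")
    return "\n\n".join(parts) + "\n"


def to_html(book: Book) -> str:
    # Escape generated text so it cannot inject markup; the viewport tag supports mobile layouts.
    sections = "".join(
        f"<section><h2>{html.escape(ch.title)}</h2><p>{html.escape(ch.body)}</p></section>"
        for ch in book.chapters
    )
    return ('<!DOCTYPE html><html><head><meta charset="utf-8">'
            '<meta name="viewport" content="width=device-width, initial-scale=1">'
            f"<title>{html.escape(book.title)}</title></head>"
            f"<body><h1>{html.escape(book.title)}</h1>{sections}</body></html>")


def to_json(book: Book) -> str:
    # ensure_ascii=False keeps Spanish and Portuguese characters readable.
    return json.dumps(asdict(book), ensure_ascii=False, indent=2)


EXPORTERS = {"md": to_markdown, "html": to_html, "json": to_json}


def export(book: Book, fmt: str) -> str:
    exporter = EXPORTERS.get(fmt)
    if exporter is None:
        raise ValueError(f"unsupported format: {fmt}")
    return exporter(book)
```

Adding a format then means registering one more function in `EXPORTERS` rather than touching the generation agents.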

4. Cost Optimization at Scale

  • Development: GitHub Models free tier (~$0 cost)
  • Production: Qwen DashScope (~$0.05/book, about 6x cheaper than Claude at ~$0.30/book)
  • Token counting and cost estimation before generation
  • Batch processing with ThreadPoolExecutor for parallel multi-provider testing
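Pre-generation cost estimation and parallel multi-provider batches can be sketched together. The prices below are placeholders, not real provider pricing, and the 4-characters-per-token rule is a rough heuristic rather than a real tokenizer; `run_batch` and its `generate` callable are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable

# Placeholder prices in USD per 1,000 tokens. These are NOT real provider prices.
PRICE_PER_1K_TOKENS = {"github": 0.0, "qwen": 0.001, "claude": 0.006, "azure": 0.005}


def estimate_tokens(text: str) -> int:
    """Rough heuristic of about 4 characters per token; a real tokenizer is more accurate."""
    return max(1, len(text) // 4)


def estimate_cost(provider: str, prompt: str, expected_output_tokens: int) -> float:
    tokens = estimate_tokens(prompt) + expected_output_tokens
    return tokens / 1000 * PRICE_PER_1K_TOKENS[provider]


def run_batch(
    prompt: str,
    providers: list,
    generate: Callable[[str, str], str],
    expected_output_tokens: int = 20_000,
    max_workers: int = 4,
) -> dict:
    """Estimate cost up front, then generate the same book on every provider in parallel.

    `generate(provider, prompt)` is a hypothetical callable that returns the book text."""
    results = {p: {"estimated_cost_usd": estimate_cost(p, prompt, expected_output_tokens)}
               for p in providers}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(generate, p, prompt): p for p in providers}
        for future in as_completed(futures):
            results[futures[future]]["output"] = future.result()
    return results
```

Threads suit this workload because each provider call spends most of its time waiting on the network, not on the CPU.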

5. Production-Ready Observability

  • OpenTelemetry instrumentation for tracing agent calls
  • Structured logging with context propagation
  • Performance metrics: avg 3m 46s per 4-chapter book
  • Agent Inspector integration for debugging
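The "structured logging with context propagation" item can be illustrated with the standard library alone. This sketch uses `contextvars` and `logging` as a stand-in for OpenTelemetry context and is not the project's instrumentation; the `book_id` and `agent` fields and the `book_generator` logger name are assumptions.

```python
import contextvars
import json
import logging

# asyncio tasks inherit these values automatically; new threads need contextvars.copy_context().
current_book = contextvars.ContextVar("current_book", default="-")
current_agent = contextvars.ContextVar("current_agent", default="-")


class ContextFilter(logging.Filter):
    """Attach the current book and agent IDs to every log record."""

    def filter(self, record: logging.LogRecord) -> bool:
        record.book_id = current_book.get()
        record.agent = current_agent.get()
        return True


class JsonFormatter(logging.Formatter):
    """Emit one JSON object per line so logs can be queried by book or agent."""

    def format(self, record: logging.LogRecord) -> str:
        return json.dumps({
            "level": record.levelname,
            "book_id": getattr(record, "book_id", "-"),
            "agent": getattr(record, "agent", "-"),
            "message": record.getMessage(),
        })


def make_logger(stream) -> logging.Logger:
    logger = logging.getLogger("book_generator")
    logger.setLevel(logging.INFO)
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    handler.addFilter(ContextFilter())
    logger.handlers = [handler]
    logger.propagate = False
    return logger
```

An orchestrator would set `current_book` once per request and `current_agent` around each agent call, so agent code can log plain messages without passing IDs around.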

Challenges & Learnings

Challenge 1: Agent Coordination Complexity
Problem: Coordinating 17 agents with different input/output formats was error-prone.
Solution: Implemented strict Pydantic schemas for all inter-agent communication. Schemas acted as contracts, catching misalignments early and making failures easier to trace.
Learning: Strong typing in multi-agent systems is non-negotiable: it prevents silent failures and makes workflows auditable.
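A schema-as-contract check at an agent boundary might look like the following. The project uses Pydantic; to stay dependency-free, this sketch swaps in a standard-library dataclass that mimics a few of Pydantic's checks. `ChapterPlan`, its fields, and `parse_plan` are hypothetical names.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class ChapterPlan:
    """Hypothetical contract between a curriculum agent and a chapter-writing agent.
    Stdlib stand-in for a Pydantic model; it only mimics a few type checks."""

    number: int
    title: str
    learning_objectives: list

    def __post_init__(self):
        if type(self.number) is not int or self.number < 1:  # rejects bools and strings too
            raise TypeError("number must be a positive int")
        if not isinstance(self.title, str) or not self.title.strip():
            raise TypeError("title must be a non-empty str")
        if not isinstance(self.learning_objectives, list) or not all(
            isinstance(o, str) for o in self.learning_objectives
        ):
            raise TypeError("learning_objectives must be a list of str")


def parse_plan(raw: dict) -> ChapterPlan:
    """Validate an agent's raw JSON output before the next agent consumes it."""
    expected = {f.name for f in fields(ChapterPlan)}
    unknown = set(raw) - expected
    if unknown:
        raise TypeError(f"unexpected fields: {sorted(unknown)}")
    return ChapterPlan(**raw)  # missing fields also raise TypeError here
```

Failing at the boundary points the error at the agent that produced the bad output, instead of at whichever downstream agent first trips over it.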

Challenge 2: Cost vs. Quality Tradeoff
Problem: Production books needed to cost under $0.10 each, but cheaper models raised quality concerns.
Solution: Added support for multiple model providers behind a single configuration switch, so cost and quality can be balanced per deployment.
Learning: Provider diversity is a feature, not a limitation. Customers value optionality.
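Choosing a provider under a per-book budget can be expressed as a short selection rule. The cost figures are the approximate per-book numbers quoted in this submission; the `preference` ranking and the rule that the free GitHub Models tier is reserved for development are assumptions made for illustration.

```python
# Approximate per-book costs quoted in this submission; GitHub Models is the free development tier.
COST_PER_BOOK_USD = {"github": 0.0, "qwen": 0.05, "claude": 0.30}


def pick_provider(budget_usd: float, preference: list, production: bool = True) -> str:
    """Return the first provider in `preference` whose per-book cost fits the budget.

    `preference` is a hypothetical quality ranking supplied by the operator."""
    for name in preference:
        if production and name == "github":
            continue  # assumption: the free tier is reserved for development
        if COST_PER_BOOK_USD.get(name, float("inf")) <= budget_usd:
            return name
    raise ValueError(f"no provider fits a ${budget_usd:.2f} budget")
```

With the $0.10 target above, `pick_provider(0.10, ["claude", "qwen", "github"])` skips Claude and returns `"qwen"`; raising the budget to $0.50 selects `"claude"`.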

Challenge 3: Streaming Multimodal Output
Problem: Users want real-time feedback while agents work, but coordinating image generation, text, and audio creates bottlenecks.
Solution: Built a WebSocket server for streaming agent updates. Images are generated in parallel while chapters are being written, and exports run asynchronously.
Learning: Async-first architecture is essential for good UX in agent systems. Users tolerate slower operations if they see progress.
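The "images generate in parallel while chapters are being written" pattern, with progress streamed as it happens, can be sketched with asyncio. The `asyncio.sleep` calls stand in for LLM and image-API requests, and the `on_progress` callback stands in for pushing messages over the project's WebSocket; all function names are hypothetical.

```python
import asyncio


async def write_chapter(n: int, progress: asyncio.Queue) -> str:
    await asyncio.sleep(0.02)  # stands in for an LLM call
    await progress.put(f"chapter {n} written")
    return f"Chapter {n} text"


async def generate_image(n: int, progress: asyncio.Queue) -> str:
    await asyncio.sleep(0.01)  # stands in for an image-generation call
    await progress.put(f"image {n} ready")
    return f"image_{n}.png"


async def build_book(chapters: int, on_progress) -> tuple:
    progress: asyncio.Queue = asyncio.Queue()

    async def relay():
        # Forward each progress event to the UI (e.g. a WebSocket) as soon as it arrives.
        while True:
            event = await progress.get()
            on_progress(event)
            progress.task_done()

    relay_task = asyncio.create_task(relay())
    texts, images = await asyncio.gather(
        asyncio.gather(*(write_chapter(i, progress) for i in range(1, chapters + 1))),
        asyncio.gather(*(generate_image(i, progress) for i in range(1, chapters + 1))),
    )
    await progress.join()  # wait until every event has been delivered
    relay_task.cancel()
    try:
        await relay_task
    except asyncio.CancelledError:
        pass
    return list(texts), list(images)
```

`asyncio.gather` preserves input order, so chapters and images stay aligned by index even though they finish at different times.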

Challenge 4: Fact-Checking at Scale
Problem: Educational content must be accurate, but fact-checking every statement is expensive.
Solution: Integrated a DashScope web-search agent that samples facts (every third paragraph) and verifies them against live sources. Inconsistencies are flagged for human review.
Learning: Imperfect automation + human-in-the-loop beats expensive perfect automation. Transparency about limitations builds trust.
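The sampling strategy reduces to choosing which paragraphs to verify and collecting flags rather than auto-correcting. In this sketch "every third paragraph" is read as the 3rd, 6th, 9th, and so on; `verify` is a hypothetical callable that would wrap the web-search agent and return `(is_supported, explanation)`.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Flag:
    paragraph_index: int
    text: str
    reason: str


def sample_paragraphs(paragraphs: list, every: int = 3) -> list:
    """Zero-based indices of the 3rd, 6th, 9th, ... paragraphs."""
    return list(range(every - 1, len(paragraphs), every))


def fact_check(paragraphs: list, verify: Callable[[str], tuple], every: int = 3) -> list:
    """Check sampled paragraphs and return flags for human review.
    Unsupported claims are flagged, never rewritten automatically."""
    flags = []
    for i in sample_paragraphs(paragraphs, every):
        supported, why = verify(paragraphs[i])
        if not supported:
            flags.append(Flag(i, paragraphs[i], why))
    return flags
```

The tradeoff is visible in the sampling: an error in an unsampled paragraph goes undetected, which is why the flags feed a human reviewer instead of certifying the whole book.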

Challenge 5: Emerging Market Connectivity
Problem: Some regions have unreliable internet; users need offline-capable books.
Solution: All exports (PDF, HTML, Markdown) work offline. Audio is pre-generated and packaged with the book. Only the generation step requires an internet connection.
Learning: Think beyond the happy path. Accessibility includes offline-first design.
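Packaging pre-generated exports into a single archive is one way to deliver an offline-ready book. This is a hypothetical sketch using the standard library's `zipfile`; the file layout (for example `audio/ch1.mp3`) is an assumption, and relative paths are enforced so links inside the HTML keep working after extraction.

```python
import io
import zipfile


def package_offline(files: dict) -> bytes:
    """Bundle pre-generated exports (HTML, PDF, Markdown, audio) into one ZIP.

    Keys are relative paths inside the bundle; values are file contents as bytes."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for path, data in sorted(files.items()):
            # Relative paths keep links like <audio src="audio/ch1.mp3"> valid offline.
            if path.startswith("/") or ".." in path.split("/"):
                raise ValueError(f"path must be relative and stay inside the bundle: {path}")
            zf.writestr(path, data)
    return buf.getvalue()
```

A single file is easier to share over intermittent connections or copy to a USB drive than a folder of loose assets.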


Contact Information

https://www.linkedin.com/in/cristopher-olivares/

Country/Region

MEXICO
