Description
Track
Creative Apps (GitHub Copilot)
Project Name
Latam Book Generator
GitHub Username
Repository URL
https://github.com/crissins/Agent-Framework
Project Description
LATAM Book Generator is an Agent Framework app that creates educational books for 350+ million children across Latin America.
Using 17 specialized AI agents, it transforms simple prompts into publication-ready, culturally-relevant children's books in minutes. Our multi-provider LLM strategy enables:
- Cost optimization: Production books cost ~$0.05 on Qwen vs. $0.30 on Claude
- No vendor lock-in: Switch providers (GitHub Models, Qwen, Claude, Azure OpenAI) with environment variables
- Multimodal output: HTML, PDF, Markdown, audiobooks with voice cloning, and JSON exports
- Intelligence layer: Curriculum design, fact-checking with live web search, educational activity generation, and AI illustrations
Built with Python/Streamlit and Microsoft Agent Framework, it orchestrates agents for Chat, Curriculum, Chapter Writing, Image Generation, Voice Cloning, Fact-Checking, and Export across multiple formats.
The platform demonstrates enterprise-grade agent coordination while maintaining cost efficiency—critical for emerging markets.
Demo Video or Screenshots
Demo video: https://youtu.be/Tr-6JnxO-9k
Demo files: demo_files
Primary Programming Language
Python
Key Technologies Used
- Agent Framework: Microsoft Agent Framework 1.0.0b260107
- LLM SDKs: OpenAI Python SDK (GitHub Models), DashScope SDK, Anthropic SDK
- UI/Backend: Streamlit 1.54.0, FastAPI (HTTP server mode)
- AI Services: DashScope (TTS, image generation, web search), Qwen3, Claude
- Data: Pydantic (schema validation), JSON, Markdown
- Export: fpdf2 (PDF), Pillow (images), HTML/CSS (10 templates)
- Audio: QwenTtsRealtime (WebSocket TTS), Edge TTS, voice cloning
- Observability: OpenTelemetry, AI Toolkit Agent Inspector
- Search: DuckDuckGo integration, DashScope web search
- Utilities: qrcode, ThreadPoolExecutor (batch processing)
- Deployment: Docker, docker-compose, Streamlit Cloud ready
Submission Type
Individual
Team Members
No response
Submission Requirements
- My project meets the track-specific challenge requirements
- My repository includes a comprehensive README.md with setup instructions
- My code does not contain hardcoded API keys or secrets
- I have included demo materials (video or screenshots)
- My project is my own work with proper attribution for any third-party code
- I agree to the Code of Conduct
- I have read and agree to the Disclaimer
- My submission does NOT contain any confidential, proprietary, or sensitive information
- I confirm I have the rights to submit this content and grant the necessary licenses
Quick Setup Summary
Option 1: Streamlit UI (Recommended for demo)
git clone https://github.com/crissins/Agent-Framework
cd Agent-Framework
pip install -r requirements.txt
cp .env.example .env
# Add your API keys to .env
streamlit run app.py
Navigate to http://localhost:8501 and start creating books.
Option 2: CLI Mode
python main.py --prompt "Describe your book..." --provider qwen
Option 3: HTTP Server
python server.py
# API available at http://localhost:8000
Option 4: Batch Testing (Compare providers)
python main.py --batch --providers github qwen claude azure
All modes use the same .env configuration. See README.md for detailed setup.
Technical Highlights
1. Multi-Agent Orchestration Architecture
- 17 specialized agents coordinated via Microsoft Agent Framework
- Each agent has distinct responsibility: curriculum design, chapter writing, image generation, fact-checking, export
- Agents communicate through structured Pydantic schemas ensuring type safety
- Horizontal scalability—agents can run in parallel or sequence based on task dependencies
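The "parallel or sequence based on task dependencies" idea can be sketched with the standard library alone. This is a minimal illustration, not the project's orchestration code: the agent names, the dependency map, and `run_agent` are hypothetical, and the real app coordinates agents through Microsoft Agent Framework rather than `graphlib`.

```python
from concurrent.futures import ThreadPoolExecutor
from graphlib import TopologicalSorter

# Hypothetical agents and their prerequisites (node -> set of predecessors).
DEPENDENCIES = {
    "curriculum": set(),
    "chapters": {"curriculum"},
    "images": {"curriculum"},
    "fact_check": {"chapters"},
    "export": {"chapters", "images", "fact_check"},
}


def run_agent(name: str) -> str:
    """Placeholder for a real agent call; returns a marker string."""
    return f"{name}:done"


def run_pipeline(dependencies: dict) -> list:
    """Run agents in dependency order.

    Agents whose prerequisites are all finished form a "wave" and run in
    parallel; waves themselves run in sequence. Returns the waves in the
    order they executed (names sorted within each wave).
    """
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()
    waves = []
    with ThreadPoolExecutor(max_workers=4) as pool:
        while sorter.is_active():
            ready = sorted(sorter.get_ready())
            # Every agent in `ready` has all of its inputs available.
            list(pool.map(run_agent, ready))
            sorter.done(*ready)
            waves.append(ready)
    return waves
```

With this map, chapter writing and image generation share a wave because both depend only on the curriculum, while export waits for everything else.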
2. Provider-Agnostic LLM Layer
- Abstraction layer supporting GitHub Models, Qwen DashScope, Anthropic Claude, Azure OpenAI
- Single configuration point (LLM_PROVIDER env var) switches backends without code changes
- Cost tracking per provider for transparent economics
- Fallback logic handles provider rate limits gracefully
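A rough sketch of how an `LLM_PROVIDER` switch with rate-limit fallback could look. Only the `LLM_PROVIDER` variable and the four provider families come from this submission; the short provider keys, the fallback order, `RateLimitError`, and the `call` callback are assumptions made for illustration.

```python
import os
from typing import Callable, List, Mapping, Optional, Tuple

# Hypothetical provider keys and fallback order.
PROVIDERS = ["github", "qwen", "claude", "azure"]


class RateLimitError(Exception):
    """Stand-in for a provider-specific rate-limit exception."""


def provider_order(env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Return the configured provider first, followed by the fallbacks."""
    env = os.environ if env is None else env
    primary = env.get("LLM_PROVIDER", "github").lower()
    if primary not in PROVIDERS:
        raise ValueError(f"unknown LLM_PROVIDER: {primary!r}")
    return [primary] + [p for p in PROVIDERS if p != primary]


def generate(
    prompt: str,
    call: Callable[[str, str], str],
    env: Optional[Mapping[str, str]] = None,
) -> Tuple[str, str]:
    """Try each provider in order, moving on only when one is rate limited.

    `call(provider, prompt)` is a caller-supplied function that talks to
    the actual SDK. Returns (provider_used, response_text).
    """
    last_error = None
    for provider in provider_order(env):
        try:
            return provider, call(provider, prompt)
        except RateLimitError as exc:
            last_error = exc
    raise RuntimeError("all providers are rate limited") from last_error
```

Keeping the SDK call behind a single callback means the selection logic can be tested without network access.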
3. Integrated Multimodal Capabilities
- Content generation, image creation, voice cloning, and fact-checking in single workflow
- Leverages DashScope's unified API (TTS, images, web search)
- Exports to 5+ formats (HTML, PDF, Markdown, JSON, audio)
- HTML templates use responsive design for multiple devices
4. Cost Optimization at Scale
- Development: GitHub Models free tier (~$0 cost)
- Production: Qwen DashScope (~$0.05/book, 40x cheaper than alternatives)
- Token counting and cost estimation before generation
- Batch processing with ThreadPoolExecutor for parallel multi-provider testing
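The cost figures can be turned into a small calculator. The per-book prices below are the ones quoted in this submission (~$0.05 on Qwen, $0.30 on Claude); the per-million-token rates passed to `estimate_cost` are caller-supplied placeholders, not published prices, and the function names are hypothetical.

```python
from decimal import Decimal

# Approximate per-book costs quoted in this submission, in USD.
COST_PER_BOOK = {"qwen": Decimal("0.05"), "claude": Decimal("0.30")}


def batch_cost(provider: str, books: int) -> Decimal:
    """Projected cost of generating `books` books on one provider."""
    return COST_PER_BOOK[provider] * books


def estimate_cost(
    input_tokens: int,
    output_tokens: int,
    usd_per_m_input: Decimal,
    usd_per_m_output: Decimal,
) -> Decimal:
    """Pre-generation estimate from token counts and per-million-token rates."""
    million = Decimal(1_000_000)
    total = Decimal(input_tokens) * usd_per_m_input
    total += Decimal(output_tokens) * usd_per_m_output
    return total / million
```

On the two quoted figures, 1,000 books come to $50 on Qwen versus $300 on Claude, a 6x difference per book. `Decimal` is used so that these small currency amounts do not pick up binary floating-point error.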
5. Production-Ready Observability
- OpenTelemetry instrumentation for tracing agent calls
- Structured logging with context propagation
- Performance metrics: avg 3m 46s per 4-chapter book
- Agent Inspector integration for debugging
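Structured logging with context propagation can be approximated with `contextvars` and the standard `logging` module. This sketch deliberately leaves out OpenTelemetry and is not the project's instrumentation; the logger name and the `book_id` and `agent` fields are assumptions.

```python
import contextvars
import json
import logging

# Context carried implicitly across function calls (and asyncio tasks).
book_id_var = contextvars.ContextVar("book_id", default="-")
agent_var = contextvars.ContextVar("agent", default="-")


class ContextFilter(logging.Filter):
    """Copy the current context values onto every log record."""

    def filter(self, record: logging.LogRecord) -> bool:
        record.book_id = book_id_var.get()
        record.agent = agent_var.get()
        return True


class JsonFormatter(logging.Formatter):
    """Render each record as one JSON object per line."""

    def format(self, record: logging.LogRecord) -> str:
        return json.dumps(
            {
                "level": record.levelname,
                "book_id": record.book_id,
                "agent": record.agent,
                "message": record.getMessage(),
            }
        )


def build_logger(stream) -> logging.Logger:
    """Create a logger that writes context-enriched JSON lines to `stream`."""
    logger = logging.getLogger("bookgen.sketch")
    logger.handlers.clear()
    logger.propagate = False
    handler = logging.StreamHandler(stream)
    handler.addFilter(ContextFilter())
    handler.setFormatter(JsonFormatter())
    logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    return logger
```

An agent sets `book_id_var` and `agent_var` once at the start of its work, and every later log call in that context carries both fields without passing them as arguments.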
Challenges & Learnings
Challenge 1: Agent Coordination Complexity
Problem: Coordinating 17 agents with different input/output formats was error-prone.
Solution: Implemented strict Pydantic schemas for all inter-agent communication. The schemas act as contracts: they catch mismatched inputs and outputs early and make failures easier to debug.
Learning: Strong typing in multi-agent systems is non-negotiable—it prevents silent failures and makes workflows auditable.
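The "schema as contract" idea looks roughly like the sketch below. To keep it dependency-free it uses a standard-library dataclass with hand-written checks in place of Pydantic, which the project actually uses; the `ChapterOutline` message and its fields are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ChapterOutline:
    """Hypothetical message from a curriculum agent to a chapter writer."""

    title: str
    learning_goals: List[str]
    target_age: int

    def __post_init__(self) -> None:
        # Validate at construction time so bad data never reaches the
        # next agent (Pydantic performs this kind of check declaratively).
        if not isinstance(self.title, str) or not self.title.strip():
            raise ValueError("title must be a non-empty string")
        goals = self.learning_goals
        if (
            not isinstance(goals, list)
            or not goals
            or not all(isinstance(g, str) for g in goals)
        ):
            raise ValueError("learning_goals must be a non-empty list of strings")
        if not isinstance(self.target_age, int) or not 3 <= self.target_age <= 18:
            raise ValueError("target_age must be an integer between 3 and 18")


def parse_outline(payload: dict) -> ChapterOutline:
    """Turn an agent's raw dict output into a validated message."""
    try:
        return ChapterOutline(**payload)
    except TypeError as exc:  # missing or unexpected keys
        raise ValueError(f"malformed outline: {exc}") from exc
```

A payload with a missing field or a wrongly typed value fails loudly at the boundary between two agents instead of surfacing later as a malformed chapter.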
Challenge 2: Cost vs. Quality Tradeoff
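Problem: Needed production-viable books at <$0.10/book, but cheaper models raised quality concerns.
Solution: Supported multiple model providers behind one abstraction layer, so generation can run on the GitHub Models free tier during development and on Qwen DashScope (~$0.05/book) in production, with Claude and Azure OpenAI still selectable.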
Learning: Provider diversity is a feature, not a limitation. Customers value optionality.
Challenge 3: Streaming Multimodal Output
Problem: Users want real-time feedback while agents work, but coordinating image gen + text + audio creates bottlenecks.
Solution: Built a WebSocket server that streams agent updates. Images are generated in parallel while chapters are being written, and exports run asynchronously.
Learning: Async-first architecture is essential for good UX in agent systems. Users tolerate slower operations if they see progress.
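The pattern described here, text and image jobs running concurrently while progress events are pushed to the client, can be sketched with `asyncio`. The worker functions are placeholders for real LLM and image calls, and `send` stands in for whatever writes to the WebSocket; none of these names come from the project.

```python
import asyncio
from typing import Callable, List


async def write_chapter(n: int, events: asyncio.Queue) -> str:
    await asyncio.sleep(0)  # placeholder for an LLM call
    await events.put(f"chapter {n} written")
    return f"text-{n}"


async def draw_image(n: int, events: asyncio.Queue) -> str:
    await asyncio.sleep(0)  # placeholder for an image-generation call
    await events.put(f"image {n} generated")
    return f"image-{n}"


async def forward_events(events: asyncio.Queue, send: Callable[[str], None]) -> None:
    """Relay progress events to the client until a None sentinel arrives."""
    while True:
        event = await events.get()
        if event is None:
            break
        send(event)


async def generate_book(chapters: int, send: Callable[[str], None]) -> List[str]:
    """Run chapter and image jobs concurrently while streaming progress."""
    events: asyncio.Queue = asyncio.Queue()
    forwarder = asyncio.create_task(forward_events(events, send))
    jobs = [write_chapter(n, events) for n in range(1, chapters + 1)]
    jobs += [draw_image(n, events) for n in range(1, chapters + 1)]
    results = await asyncio.gather(*jobs)  # results keep the order of `jobs`
    await events.put(None)  # tell the forwarder to stop
    await forwarder
    return list(results)
```

The queue decouples workers from the transport: workers only report what they finished, and a single forwarder decides how those updates reach the user.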
Challenge 4: Fact-Checking at Scale
Problem: Educational content must be accurate, but fact-checking every statement is expensive.
Solution: Integrated a DashScope web search agent that samples the content (every 3rd paragraph), verifies the sampled claims against live sources, and flags inconsistencies for human review.
Learning: Imperfect automation + human-in-the-loop beats expensive perfect automation. Transparency about limitations builds trust.
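The sampling step can be illustrated in a few lines. This sketch assumes paragraphs are separated by blank lines and that "every 3rd paragraph" means paragraphs 3, 6, 9, and so on. The `verify` callback stands in for the web-search check, and both function names are hypothetical.

```python
from typing import Callable, List, Tuple


def sample_paragraphs(text: str, every: int = 3) -> List[Tuple[int, str]]:
    """Return (1-based index, paragraph) for every `every`-th paragraph."""
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    return [
        (index, paragraph)
        for index, paragraph in enumerate(paragraphs, start=1)
        if index % every == 0
    ]


def flag_for_review(
    samples: List[Tuple[int, str]], verify: Callable[[str], bool]
) -> List[int]:
    """Return the indices of sampled paragraphs that failed verification."""
    return [index for index, paragraph in samples if not verify(paragraph)]
```

Only the sampled paragraphs trigger a search call, which is what keeps the cost bounded; everything flagged goes to a human reviewer rather than being rewritten automatically.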
Challenge 5: Emerging Market Connectivity
Problem: Some regions have unreliable internet; users need offline-capable books.
Solution: All exports (PDF, HTML, Markdown) work offline. Audio is pre-generated and packaged with the book. Only the generation step requires an internet connection.
Learning: Think beyond the happy path. Accessibility includes offline-first design.
Contact Information
https://www.linkedin.com/in/cristopher-olivares/
Country/Region
MEXICO