Project: [Creative Apps] - Latam Book Generator

### Track

Creative Apps (GitHub Copilot)

### Project Name

Latam Book Generator

### GitHub Username

@crissins

### Repository URL

https://github.com/crissins/Agent-Framework

### Project Description

LATAM Book Generator is an Agent Framework app that generates educational books for an audience of 350+ million children across Latin America.

Using **17 specialized AI agents**, it transforms simple prompts into publication-ready, culturally relevant children's books in minutes. Our **multi-provider LLM strategy** enables:

- **Cost optimization**: Production books cost ~$0.05 on Qwen vs. $0.30 on Claude
- **No vendor lock-in**: Switch providers (GitHub Models, Qwen, Claude, Azure OpenAI) with environment variables
- **Multimodal output**: HTML, PDF, Markdown, audiobooks with voice cloning, and JSON exports
- **Intelligence layer**: Curriculum design, fact-checking with live web search, educational activity generation, and AI illustrations

Built with Python/Streamlit and Microsoft Agent Framework, it orchestrates agents for Chat, Curriculum, Chapter Writing, Image Generation, Voice Cloning, Fact-Checking, and Export across multiple formats.

The platform demonstrates enterprise-grade agent coordination while keeping costs low, which is critical for emerging markets.

### Demo Video or Screenshots

- Demo video: https://youtu.be/Tr-6JnxO-9k
- Demo files: `demo_files`


### Primary Programming Language

Python

### Key Technologies Used

- **Agent Framework**: Microsoft Agent Framework 1.0.0b260107
- **LLM SDKs**: OpenAI Python SDK (GitHub Models), DashScope SDK, Anthropic SDK
- **UI/Backend**: Streamlit 1.54.0, FastAPI (HTTP server mode)
- **AI Services**: DashScope (TTS, image generation, web search), Qwen3, Claude
- **Data**: Pydantic (schema validation), JSON, Markdown
- **Export**: fpdf2 (PDF), Pillow (images), HTML/CSS (10 templates)
- **Audio**: QwenTtsRealtime (WebSocket TTS), Edge TTS, voice cloning
- **Observability**: OpenTelemetry, AI Toolkit Agent Inspector
- **Search**: DuckDuckGo integration, DashScope web search
- **Utilities**: qrcode, ThreadPoolExecutor (batch processing)
- **Deployment**: Docker, docker-compose, Streamlit Cloud ready

### Submission Type

Individual

### Team Members

_No response_

### Submission Requirements

- [x] My project meets the track-specific challenge requirements
- [x] My repository includes a comprehensive README.md with setup instructions
- [x] My code does not contain hardcoded API keys or secrets
- [x] I have included demo materials (video or screenshots)
- [x] My project is my own work with proper attribution for any third-party code
- [x] I agree to the [Code of Conduct](https://github.com/microsoft/agentsleague/blob/main/CODE_OF_CONDUCT.md)
- [x] I have read and agree to the [Disclaimer](https://github.com/microsoft/agentsleague/blob/main/DISCLAIMER.md)
- [x] My submission does NOT contain any confidential, proprietary, or sensitive information
- [x] I confirm I have the rights to submit this content and grant the necessary licenses

### Quick Setup Summary

**Option 1: Streamlit UI (Recommended for demo)**
```bash
git clone https://github.com/crissins/Agent-Framework
cd Agent-Framework
pip install -r requirements.txt
cp .env.example .env
# Add your API keys to .env
streamlit run app.py
```
Navigate to `http://localhost:8501` and start creating books.

**Option 2: CLI Mode**
```bash
python main.py --prompt "Describe your book..." --provider qwen
```

**Option 3: HTTP Server**
```bash
python server.py
# API available at http://localhost:8000
```

**Option 4: Batch Testing (Compare providers)**
```bash
python main.py --batch --providers github qwen claude azure
```

All modes use the same `.env` configuration. See README.md for detailed setup.

---

### Technical Highlights

**1. Multi-Agent Orchestration Architecture**
- 17 specialized agents coordinated via Microsoft Agent Framework
- Each agent has a distinct responsibility: curriculum design, chapter writing, image generation, fact-checking, export
- Agents communicate through structured Pydantic schemas, which enforce type safety
- Horizontal scalability: agents can run in parallel or in sequence, depending on task dependencies
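The schema-as-contract idea can be sketched as follows. This is a minimal illustration, not the project's code: it uses stdlib `dataclasses` as a stand-in for Pydantic so it runs without dependencies, and `ChapterOutline`, its fields, and `parse_outline` are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ChapterOutline:
    """Hypothetical hand-off from a curriculum agent to a chapter-writing agent."""

    number: int
    title: str
    learning_goals: list[str] = field(default_factory=list)

    def __post_init__(self) -> None:
        # Validate eagerly so a malformed hand-off fails at the agent boundary.
        if not isinstance(self.number, int) or self.number < 1:
            raise ValueError("number must be a positive integer")
        if not isinstance(self.title, str) or not self.title.strip():
            raise ValueError("title must be a non-empty string")
        if not self.learning_goals:
            raise ValueError("at least one learning goal is required")


def parse_outline(payload: dict) -> ChapterOutline:
    """Turn an agent's raw dict output into a validated ChapterOutline."""
    try:
        return ChapterOutline(**payload)
    except TypeError as exc:  # missing or unexpected keys
        raise ValueError(f"payload does not match the schema: {exc}") from exc
```

With Pydantic, the same contract would typically be a `BaseModel` subclass with field constraints. The point is the same either way: the receiving agent never sees an unvalidated payload.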

**2. Provider-Agnostic LLM Layer**
- Abstraction layer supporting GitHub Models, Qwen DashScope, Anthropic Claude, Azure OpenAI
- Single configuration point (`LLM_PROVIDER` env var) switches backends without code changes
- Cost tracking per provider for transparent economics
- Fallback logic handles provider rate limits gracefully
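A rough sketch of how an `LLM_PROVIDER` switch with rate-limit fallback could look. Only the `LLM_PROVIDER` variable name comes from this submission. `RateLimitError`, `provider_order`, `generate_with_fallback`, and the callable-based provider interface are assumptions made for illustration.

```python
import os
from typing import Callable, Mapping, Optional


class RateLimitError(Exception):
    """Hypothetical error a provider client raises when it is rate limited."""


# Hypothetical provider interface: prompt in, generated text out.
Provider = Callable[[str], str]


def provider_order(available: list[str],
                   env: Optional[Mapping[str, str]] = None) -> list[str]:
    """Put the provider named by LLM_PROVIDER first, the rest as fallbacks."""
    env = os.environ if env is None else env
    preferred = env.get("LLM_PROVIDER", available[0]).lower()
    if preferred not in available:
        raise ValueError(f"unknown provider: {preferred!r}")
    return [preferred] + [name for name in available if name != preferred]


def generate_with_fallback(prompt: str,
                           providers: Mapping[str, Provider],
                           env: Optional[Mapping[str, str]] = None) -> tuple[str, str]:
    """Return (provider_name, text) from the first provider that is not rate limited."""
    last_error: Optional[Exception] = None
    for name in provider_order(list(providers), env):
        try:
            return name, providers[name](prompt)
        except RateLimitError as exc:
            last_error = exc  # try the next provider
    raise RuntimeError("all providers are rate limited") from last_error
```

Passing `env` explicitly keeps the selection logic testable without touching the real process environment.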

**3. Integrated Multimodal Capabilities**
- Content generation, image creation, voice cloning, and fact-checking in a single workflow
- Leverages DashScope's unified API (TTS, images, web search)
- Exports to 5+ formats (HTML, PDF, Markdown, JSON, audio)
- HTML templates use responsive design for multiple devices

**4. Cost Optimization at Scale**
- Development: GitHub Models free tier (~$0 cost)
- Production: Qwen DashScope (~$0.05/book, about 6x cheaper than the ~$0.30/book measured on Claude)
- Token counting and cost estimation before generation
- Batch processing with ThreadPoolExecutor for parallel multi-provider testing
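Pre-generation cost estimation and a parallel provider comparison might be sketched like this. The prices are placeholders, not real provider rates, and the four-characters-per-token heuristic, the provider names, and all function names are assumptions. In a real batch run the worker would be the network-bound generation call rather than a pure calculation.

```python
from concurrent.futures import ThreadPoolExecutor

# Placeholder prices in USD per 1,000 tokens; NOT real provider rates.
PRICE_PER_1K_TOKENS = {"provider_a": 0.001, "provider_b": 0.006}


def estimate_tokens(text: str) -> int:
    """Rough heuristic (an assumption): about 4 characters per token."""
    return max(1, len(text) // 4)


def estimate_cost(provider: str, prompt: str, expected_output_tokens: int) -> float:
    """Estimate the cost of one generation before running it."""
    total_tokens = estimate_tokens(prompt) + expected_output_tokens
    return total_tokens / 1000 * PRICE_PER_1K_TOKENS[provider]


def compare_providers(prompt: str, expected_output_tokens: int) -> dict[str, float]:
    """Evaluate every provider in parallel, one worker thread per provider."""
    names = list(PRICE_PER_1K_TOKENS)
    with ThreadPoolExecutor(max_workers=len(names)) as pool:
        costs = pool.map(
            lambda name: estimate_cost(name, prompt, expected_output_tokens),
            names,
        )
        # map() preserves input order, so names and costs line up.
        return dict(zip(names, costs))
```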

**5. Production-Ready Observability**
- OpenTelemetry instrumentation for tracing agent calls
- Structured logging with context propagation
- Performance metrics: avg 3m 46s per 4-chapter book
- Agent Inspector integration for debugging

---

### Challenges & Learnings

**Challenge 1: Agent Coordination Complexity**
*Problem*: Coordinating 17 agents with different input/output formats was error-prone.
*Solution*: Implemented strict Pydantic schemas for all inter-agent communication. Schemas acted as contracts, catching misalignments early and making failures easier to debug.
*Learning*: Strong typing in multi-agent systems is non-negotiable: it prevents silent failures and makes workflows auditable.

**Challenge 2: Cost vs. Quality Tradeoff**
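*Problem*: Books needed to be production-viable at under $0.10 each, but cheaper models raised quality concerns.
*Solution*: Added support for multiple model providers (GitHub Models, Qwen, Claude, Azure OpenAI), so the provider can be chosen per run to balance cost and quality.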
*Learning*: Provider diversity is a feature, not a limitation. Customers value optionality.

**Challenge 3: Streaming Multimodal Output**
*Problem*: Users want real-time feedback while agents work, but coordinating image gen + text + audio creates bottlenecks.
*Solution*: Built WebSocket server for streaming agent updates. Images generate in parallel while chapters are being written. Exports happen asynchronously.
*Learning*: Async-first architecture is essential for good UX in agent systems. Users tolerate slower operations if they see progress.
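The overlap between image generation and chapter writing, with progress reported as it happens, can be sketched with `asyncio`. This is an illustration only: an `asyncio.Queue` stands in for the WebSocket channel, and `write_chapter`, `draw_image`, and the other names are hypothetical placeholders for the real agent calls.

```python
import asyncio


async def write_chapter(number: int) -> str:
    await asyncio.sleep(0)  # stands in for an LLM call
    return f"chapter {number} text"


async def draw_image(number: int) -> str:
    await asyncio.sleep(0)  # stands in for an image-generation call
    return f"chapter {number} image"


async def generate_book(chapters: int, updates: asyncio.Queue) -> list[tuple[str, str]]:
    # Start all image jobs up front so they run while chapters are written.
    image_tasks = [asyncio.create_task(draw_image(n)) for n in range(1, chapters + 1)]
    book = []
    for number, image_task in enumerate(image_tasks, start=1):
        text = await write_chapter(number)
        await updates.put(f"chapter {number} written")
        image = await image_task
        await updates.put(f"chapter {number} illustrated")
        book.append((text, image))
    await updates.put(None)  # sentinel: generation finished
    return book


async def collect_updates(updates: asyncio.Queue) -> list[str]:
    """Consume progress messages; a real server would push them to the client."""
    seen = []
    while (message := await updates.get()) is not None:
        seen.append(message)
    return seen


async def main(chapters: int = 2):
    updates: asyncio.Queue = asyncio.Queue()
    return await asyncio.gather(
        generate_book(chapters, updates),
        collect_updates(updates),
    )
```

Calling `asyncio.run(main(2))` returns the assembled book together with the ordered progress messages the user would have seen.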

**Challenge 4: Fact-Checking at Scale**
*Problem*: Educational content must be accurate, but fact-checking every statement is expensive.
*Solution*: Integrated a DashScope web search agent that samples facts (every 3rd paragraph) and verifies them against live sources, flagging inconsistencies for human review.
*Learning*: Imperfect automation + human-in-the-loop beats expensive perfect automation. Transparency about limitations builds trust.
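A minimal sketch of the sampling step, assuming paragraphs are separated by blank lines and that sampling starts at the 3rd paragraph. The `verify` callable is a hypothetical stand-in for the web-search check, and the function names are illustrative.

```python
from typing import Callable


def sample_paragraphs(text: str, every: int = 3) -> list[tuple[int, str]]:
    """Return (1-based index, paragraph) for every `every`-th paragraph."""
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    return [
        (index, paragraph)
        for index, paragraph in enumerate(paragraphs, start=1)
        if index % every == 0
    ]


def flag_for_review(text: str, verify: Callable[[str], bool], every: int = 3) -> list[int]:
    """Return indices of sampled paragraphs that failed verification."""
    return [
        index
        for index, paragraph in sample_paragraphs(text, every)
        if not verify(paragraph)
    ]
```

Only the sampled paragraphs incur a search call, which bounds verification cost at roughly one third of the full text. The flagged indices are what a human reviewer would look at.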

**Challenge 5: Emerging Market Connectivity**
*Problem*: Some regions have unreliable internet; users need offline-capable books.
*Solution*: All exports (PDF, HTML, Markdown) work offline. Audio is pre-generated and packaged. Only the generation step requires internet access.
*Learning*: Think beyond the happy path. Accessibility includes offline-first design.

---

### Contact Information

https://www.linkedin.com/in/cristopher-olivares/

### Country/Region

MEXICO
