A comprehensive AI-powered productivity platform with email management, task/calendar integration, and live Google Meet transcription for mobile devices.
- Email Management: AI-powered email summarization and reply drafting
- Task & Calendar Integration: Google Calendar sync with task management
- Live Meeting Transcription: Real-time Google Meet transcription using Whisper
- Mobile-First Audio Streaming: Capture device audio and stream to backend via WebSocket
- AI Summaries: Structured meeting summaries with key points, decisions, and action items
- Background Processing: Auto-detect calendar meetings and start transcription
- Grace Period Handling: 90-second grace period for unexpected disconnections
- Architecture
- Quick Start
- Environment Setup
- Running the Application
- API Reference
- Mobile Integration
- WebSocket Protocol
- Database Models
- Testing
- Docker Deployment
- Troubleshooting
- Security & Privacy
```text
Mobile App (React Native/Flutter)
    ↓  (Audio Capture: MediaProjection/ReplayKit)
    ↓  (WebSocket: Binary audio chunks)
FastAPI Backend
    ↓  (Whisper: Real-time transcription)
    ↓  (WebSocket: JSON transcript chunks)
Mobile App (Live transcript display)
    ↓  (Meeting ends)
Backend (LLM: Structured summary)
    ↓  (Database: Store transcript + summary)
```
Key Components:
- Whisper (local): Real-time speech-to-text transcription
- LLM (Ollama/OpenAI-compatible): Structured meeting summaries
- PostgreSQL: Persistent data storage
- WebSocket: Bidirectional real-time streaming
- Background Tasks: Calendar polling, inactive session cleanup
- Python 3.10+
- PostgreSQL
- ffmpeg (for audio processing)
- Google Cloud Project with OAuth credentials
- Ollama (for local LLM) or OpenAI API key
```bash
# Clone repository
git clone https://github.com/Covenantmondei/The_Agent.git
cd AI_Agent

# Create virtual environment
python -m venv .venv
source .venv/bin/activate  # On Windows: .venv\Scripts\Activate.ps1

# Install dependencies
pip install --upgrade pip
pip install -r requirements.txt

# Install Whisper
pip install openai-whisper
```

Create a `.env` file in the project root:
```env
# Security
SECRET_KEY=your-super-secret-key-here

# Database
DATABASE_URL=postgresql://user:password@localhost:5432/productivity_db

# Google OAuth
GOOGLE_CLIENT_ID=your-google-client-id
GOOGLE_CLIENT_SECRET=your-google-client-secret

# AI Services
OPENAI_API_KEY=your-openai-key          # If using OpenAI cloud
OLLAMA_BASE_URL=http://localhost:11434  # If using local Ollama

# Whisper
WHISPER_DEVICE=cpu  # or 'cuda' for GPU
```
```bash
# Run migrations
alembic upgrade head

# Create new migration (if models changed)
alembic revision --autogenerate -m "description"
alembic upgrade head
```

```bash
# Start the server
uvicorn main:app --reload

# Or run main.py directly
python main.py
```

The application will:
- Start on http://localhost:8000
- Initialize database tables
- Start background services:
  - Task scheduler
  - Calendar poller (checks every 60s)
  - Inactive session checker (checks every 30s)

For production, run with multiple workers:

```bash
uvicorn main:app --host 0.0.0.0 --port 8000 --workers 4
```

Most endpoints require JWT authentication via the `Authorization: Bearer <token>` header.
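As an illustration, any protected endpoint can be called from Python using only the standard library; the helper names below are illustrative, not part of the project:

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # adjust for your deployment

def auth_headers(token: str) -> dict:
    """Build the Authorization header expected by every protected endpoint."""
    return {"Authorization": f"Bearer {token}"}

def list_live_meetings(token: str) -> dict:
    """GET /meetings/live with a JWT (requires a running backend)."""
    req = urllib.request.Request(f"{BASE_URL}/meetings/live", headers=auth_headers(token))
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())

print(auth_headers("demo-token"))  # {'Authorization': 'Bearer demo-token'}
```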
| Endpoint | Method | Description |
|---|---|---|
| `/email/unread-list` | GET | List unread emails with pagination |
| `/email/summarize` | POST | Summarize email (supports `?force=true`) |
| `/email/process` | POST | Full email processing (summary + draft reply) |
| `/email/draft-reply` | POST | Generate reply (streaming) |
| `/email/send-reply` | POST | Send drafted or custom reply |
| Endpoint | Method | Description |
|---|---|---|
| `/meetings/join` | POST | Start ad-hoc meeting session |
| `/meetings/live` | GET | Get active and upcoming meetings |
| `/meetings/{id}/stop` | POST | Stop meeting and generate summary |
| `/meetings/{id}/transcript` | GET | Get full transcript and summary |
| `/meetings/{id}/summary` | GET | Get summary only |
| `/meetings/{id}/summary/retry` | POST | Retry failed summarization |
| `/meetings/{id}/status` | GET | Get meeting status and stats |
| `/meetings/` | GET | List all meetings (with filters) |
| `/meetings/{id}` | DELETE | Delete meeting |
```
ws://<host>/ws/meeting/{meeting_id}?token=<jwt>
```
```javascript
// Ad-hoc join
const response = await fetch('http://localhost:8000/meetings/join', {
  method: 'POST',
  headers: {
    'Authorization': `Bearer ${token}`,
    'Content-Type': 'application/json'
  },
  body: JSON.stringify({
    meet_url: 'https://meet.google.com/abc-defg-hij',
    title: 'Project Sync'
  })
});

const { session_id, websocket_url } = await response.json();
// Returns: { session_id: 123, websocket_url: "/ws/meeting/123", ... }
```

Then open the WebSocket and listen for transcripts:

```javascript
const ws = new WebSocket(`ws://localhost:8000/ws/meeting/${session_id}?token=${token}`);

ws.onopen = () => {
  console.log('Connected to meeting transcription');
};

ws.onmessage = (event) => {
  const data = JSON.parse(event.data);
  if (data.type === 'transcript') {
    // Display transcript
    console.log(`[${data.sequence_number}] ${data.text}`);
  }
};
```
```javascript
// React Native example with expo-av
import { Audio } from 'expo-av';

const { recording } = await Audio.Recording.createAsync({
  android: {
    extension: '.m4a',
    outputFormat: Audio.RECORDING_OPTION_ANDROID_OUTPUT_FORMAT_MPEG_4,
    audioEncoder: Audio.RECORDING_OPTION_ANDROID_AUDIO_ENCODER_AAC,
    sampleRate: 16000,
    numberOfChannels: 1,
    bitRate: 128000,
  },
  ios: {
    extension: '.m4a',
    audioQuality: Audio.RECORDING_OPTION_IOS_AUDIO_QUALITY_HIGH,
    sampleRate: 16000,
    numberOfChannels: 1,
    bitRate: 128000,
  },
});

// Send chunks every 3 seconds
setInterval(async () => {
  const uri = recording.getURI();
  const response = await fetch(uri);
  const blob = await response.blob();
  const arrayBuffer = await blob.arrayBuffer();
  if (ws.readyState === WebSocket.OPEN) {
    ws.send(arrayBuffer);
  }
}, 3000);
```
```javascript
// Close WebSocket
ws.close();

// Or call the API
await fetch(`http://localhost:8000/meetings/${session_id}/stop`, {
  method: 'POST',
  headers: { 'Authorization': `Bearer ${token}` }
});
```

Once the meeting has ended, fetch the transcript and summary:

```javascript
const response = await fetch(`http://localhost:8000/meetings/${session_id}/transcript`, {
  headers: { 'Authorization': `Bearer ${token}` }
});

const { meeting, transcripts, summary } = await response.json();
console.log('Key Points:', summary.key_points);
console.log('Action Items:', summary.action_items);
```

**Audio Chunks (Binary)**
- Send raw audio bytes every 1-3 seconds
- Format: PCM, WAV, WebM, or M4A
- Preferred: 16 kHz, mono, 16-bit

**Control Messages (JSON)**

```json
{"action": "ping"}
{"action": "status"}
```

**Connection Confirmation**
```json
{
  "type": "connection",
  "message": "Connected to meeting transcription",
  "meeting_id": 123,
  "status": "active"
}
```

**Transcript Chunks**
```json
{
  "type": "transcript",
  "meeting_id": 123,
  "timestamp": "2025-10-26T10:30:00Z",
  "text": "Let's discuss the Q4 roadmap.",
  "sequence_number": 5,
  "is_final": true
}
```

**Status Response**
```json
{
  "type": "status",
  "meeting_id": 123,
  "is_recording": true,
  "sequence_number": 42,
  "buffer_size": 48000
}
```
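On the client side, these server frames can be dispatched by their `type` field. A minimal Python sketch of that handling (a real client would receive these frames over the WebSocket, e.g. via the `websockets` package; the function name is illustrative):

```python
import json

def handle_server_message(raw: str) -> str:
    """Render one server frame from the transcription WebSocket as a display line."""
    data = json.loads(raw)
    if data["type"] == "connection":
        return f"connected to meeting {data['meeting_id']} ({data['status']})"
    if data["type"] == "transcript":
        return f"[{data['sequence_number']}] {data['text']}"
    if data["type"] == "status":
        return f"recording={data['is_recording']} chunks={data['sequence_number']}"
    return f"unhandled message type: {data['type']}"

frame = ('{"type": "transcript", "meeting_id": 123, '
         '"timestamp": "2025-10-26T10:30:00Z", '
         '"text": "Let\'s discuss the Q4 roadmap.", '
         '"sequence_number": 5, "is_final": true}')
print(handle_server_message(frame))  # [5] Let's discuss the Q4 roadmap.
```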
**MeetingSession**

- id: int (PK)
- user_id: int (FK)
- meet_link: str
- title: str
- start_time: datetime
- end_time: datetime (nullable)
- status: str (scheduled, active, finalizing, completed, failed)
- is_manual: bool
- last_activity: datetime
- calendar_event_id: str (nullable)

**MeetingTranscript**

- id: int (PK)
- meeting_id: int (FK)
- timestamp: datetime
- text: str
- sequence_number: int
- is_final: bool
- speaker: str (nullable)

**MeetingSummary**

- id: int (PK)
- meeting_id: int (FK)
- full_transcript: text
- key_points: text
- decisions: text
- action_items: json (array)
- follow_ups: text
- summary_unavailable: bool
- error_message: text (nullable)
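As a concrete illustration, the transcript fields above map to a table like the following. This is a sketch in SQLite DDL for readability; the project itself defines these as ORM models managed by Alembic, and column details may differ:

```python
import sqlite3

# Illustrative DDL mirroring the MeetingTranscript fields listed above
DDL = """
CREATE TABLE meeting_transcripts (
    id INTEGER PRIMARY KEY,
    meeting_id INTEGER NOT NULL REFERENCES meeting_sessions(id),
    timestamp TEXT NOT NULL,
    text TEXT NOT NULL,
    sequence_number INTEGER NOT NULL,
    is_final INTEGER NOT NULL DEFAULT 1,
    speaker TEXT
)
"""

conn = sqlite3.connect(":memory:")
conn.execute(DDL)
conn.execute(
    "INSERT INTO meeting_transcripts (meeting_id, timestamp, text, sequence_number) "
    "VALUES (?, ?, ?, ?)",
    (1, "2025-10-26T10:30:00Z", "Let's discuss the Q4 roadmap.", 5),
)
row = conn.execute("SELECT text, sequence_number FROM meeting_transcripts").fetchone()
print(row)  # ("Let's discuss the Q4 roadmap.", 5)
```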
```bash
# Unit tests
pytest

# Integration test
python test_meeting.py
```
```bash
# Install wscat
npm install -g wscat

# Connect and test
wscat -c "ws://localhost:8000/ws/meeting/123?token=YOUR_JWT_TOKEN"

# Send ping
> {"action": "ping"}

# Send audio (from file)
# Use a tool that can send binary frames, or modify wscat
```
```bash
# Start meeting
curl -X POST http://localhost:8000/meetings/join \
  -H "Authorization: Bearer YOUR_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"meet_url":"https://meet.google.com/test","title":"Test Meeting"}'

# Get status
curl http://localhost:8000/meetings/123/status \
  -H "Authorization: Bearer YOUR_TOKEN"

# Stop meeting
curl -X POST http://localhost:8000/meetings/123/stop \
  -H "Authorization: Bearer YOUR_TOKEN"
```
```yaml
version: '3.8'

services:
  api:
    build: .
    ports:
      - "8000:8000"
    environment:
      - DATABASE_URL=postgresql://user:password@db:5432/productivity_db
      - OLLAMA_BASE_URL=http://ollama:11434
    depends_on:
      - db
      - ollama
    volumes:
      - ./logs:/app/logs

  db:
    image: postgres:15
    environment:
      POSTGRES_USER: user
      POSTGRES_PASSWORD: password
      POSTGRES_DB: productivity_db
    volumes:
      - postgres_data:/var/lib/postgresql/data

  ollama:
    image: ollama/ollama
    ports:
      - "11434:11434"
    volumes:
      - ollama_data:/root/.ollama

volumes:
  postgres_data:
  ollama_data:
```
```bash
# Build and start
docker-compose up --build

# Stop
docker-compose down

# View logs
docker-compose logs -f api
```

**1. Google OAuth Errors**
Error: Token refresh failed
Solution:
- Check GOOGLE_CLIENT_ID and GOOGLE_CLIENT_SECRET in .env
- User may need to re-authenticate
- Verify OAuth scopes match those in auth.py
**2. Whisper Model Load Failures**
Error: Model not found
Solution:
- Ensure internet connection for first-time download
- Check ~/.cache/whisper/ for model files
- For GPU: ensure CUDA installed and fp16=True in code
**3. LLM Connection Issues**
Error: Invalid URL 'None/api/tags'
Solution:
- Set OLLAMA_BASE_URL in .env
- Verify Ollama is running: curl http://localhost:11434/api/tags
- Check firewall/Docker network configuration
**4. WebSocket Disconnections**
Error: WebSocket closed unexpectedly
Solution:
- Verify token passed as query parameter: ?token=JWT
- Check logs/transcription.log for errors
- Ensure mobile maintains network connection
- Grace period (90s) allows reconnection
**5. No Summary Generated**
Error: summary_unavailable = true
Solution:
- Check logs for AI summarization errors
- Verify transcript has sufficient content (>10 chars)
- Use POST /meetings/{id}/summary/retry to regenerate
- Check LLM service is accessible
**6. Audio Processing Errors**
Error: Could not detect audio format
Solution:
- Install ffmpeg: sudo apt install ffmpeg (Linux) or brew install ffmpeg (Mac)
- Ensure pydub is installed: pip install pydub
- Check audio chunk format from mobile (prefer 16kHz WAV/PCM)
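To check on the client that a chunk already matches the preferred upload format (16 kHz, mono, 16-bit WAV) before sending, a stdlib-only sketch can be used (`is_preferred_wav` is illustrative; the backend itself relies on pydub/ffmpeg for conversion):

```python
import io
import struct
import wave

def is_preferred_wav(chunk: bytes) -> bool:
    """True if the bytes are a WAV file at 16 kHz, mono, 16-bit."""
    try:
        with wave.open(io.BytesIO(chunk), "rb") as wav:
            return (wav.getframerate() == 16000
                    and wav.getnchannels() == 1
                    and wav.getsampwidth() == 2)
    except wave.Error:
        return False

# Build a 0.1 s silent test clip in memory
buf = io.BytesIO()
with wave.open(buf, "wb") as wav:
    wav.setnchannels(1)
    wav.setsampwidth(2)
    wav.setframerate(16000)
    wav.writeframes(struct.pack("<1600h", *([0] * 1600)))
print(is_preferred_wav(buf.getvalue()))  # True
```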
```text
logs/
├── app.log            # General application logs
├── meeting.log        # Meeting service logs
└── transcription.log  # Whisper transcription logs
```
Enable detailed logging in `main.py`:

```python
logging.basicConfig(
    level=logging.DEBUG,  # Change from INFO to DEBUG
    format="%(asctime)s - %(name)s - %(levelname)s - %(message)s"
)
```
- **Transport Security**
  - Use TLS/SSL in production (`wss://` for WebSocket)
  - Enable HTTPS for all API endpoints
  - Never send tokens in URL paths (use headers or query params over a secure connection)
- **Data Protection**
  - Encrypt the database at rest
  - Implement data retention policies
  - Provide transcript deletion endpoints
  - Redact PII from logs
- **Authentication**
  - Use short-lived JWT tokens (1-24 hours)
  - Implement a token refresh mechanism
  - Validate the WebSocket token on connect
  - Rate limit API endpoints
- **Audio Privacy**
  - Audio chunks are processed and discarded (not stored permanently)
  - Only transcripts are persisted
  - Implement GDPR-compliant data export/deletion
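For intuition, a short-lived HS256 JWT with an `exp` claim can be sketched using only the standard library. This is purely illustrative: in the real app, use a maintained library such as pyjwt or python-jose, and the actual claims and secret handling will differ:

```python
import base64
import hashlib
import hmac
import json
import time

def b64url(data: bytes) -> str:
    """Base64url-encode without padding, as JWT requires."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()

def make_jwt(secret: str, user_id: int, ttl_seconds: int = 3600) -> str:
    """Encode a signed, short-lived HS256 token with an exp claim."""
    header = b64url(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    payload = b64url(json.dumps({"sub": user_id, "exp": int(time.time()) + ttl_seconds}).encode())
    signing_input = f"{header}.{payload}".encode()
    sig = b64url(hmac.new(secret.encode(), signing_input, hashlib.sha256).digest())
    return f"{header}.{payload}.{sig}"

token = make_jwt("dev-secret", user_id=42, ttl_seconds=3600)
print(token.count("."))  # 2: header.payload.signature
```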
```bash
# Never commit the .env file
echo ".env" >> .gitignore

# Use a strong SECRET_KEY (generate with)
python -c "import secrets; print(secrets.token_urlsafe(32))"
```

AI_Agent/
```text
├── main.py                  # FastAPI application entry point
├── requirements.txt         # Python dependencies
├── alembic.ini              # Alembic configuration
├── .env                     # Environment variables (not in git)
├── alembic/
│   ├── env.py               # Alembic environment
│   └── versions/            # Database migrations
├── app/
│   ├── api/
│   │   └── v1/
│   │       ├── auth.py              # Authentication endpoints
│   │       ├── email_manage.py      # Email endpoints
│   │       ├── meeting.py           # Meeting REST API
│   │       ├── meeting_ws.py        # WebSocket handler
│   │       ├── task.py              # Task endpoints
│   │       ├── calendar.py          # Calendar endpoints
│   │       └── summary.py           # Summary endpoints
│   ├── core/
│   │   ├── config.py        # Configuration
│   │   ├── security.py      # Security utilities
│   │   └── oauth.py         # OAuth handlers
│   ├── db/
│   │   ├── base.py          # Database base
│   │   ├── session.py       # DB session
│   │   └── models/
│   │       ├── meeting.py       # Meeting models
│   │       ├── email_manage.py  # Email models
│   │       ├── user.py          # User model
│   │       └── task.py          # Task model
│   ├── schemas/
│   │   ├── meeting.py       # Meeting Pydantic schemas
│   │   ├── email.py         # Email schemas
│   │   └── ...
│   ├── services/
│   │   ├── ai_processor.py          # LLM integration
│   │   ├── meeting_service.py       # Meeting business logic
│   │   ├── transcription_service.py # Whisper transcription
│   │   ├── email_service.py         # Gmail integration
│   │   ├── calendar_service.py      # Google Calendar
│   │   └── scheduler.py             # Background tasks
│   └── utils/
│       ├── logger.py        # Logging utilities
│       └── notifications.py # Notification helpers
├── logs/                    # Application logs
├── tests/                   # Unit tests
└── test_meeting.py          # Integration test
```
- **Speaker Diarization**
  - Identify and label different speakers
  - Update the `MeetingTranscript.speaker` field
- **Partial Results**
  - Stream non-final transcripts for ultra-low latency
  - Add confidence scores
- **Multi-language Support**
  - Auto-detect language with Whisper
  - Translate summaries
- **Advanced Audio Processing**
  - Noise reduction
  - Echo cancellation
  - Audio quality metrics
- **Analytics Dashboard**
  - Meeting duration statistics
  - Word clouds from transcripts
  - Action item tracking
- **Integrations**
  - Slack notifications for summaries
  - Export to Google Docs/Notion
  - Calendar event updates with summary
- FastAPI Documentation
- Whisper Documentation
- Ollama Documentation
- Google Calendar API
- WebSocket Protocol
[Add your license here]
[Add contribution guidelines]
[Add authors/maintainers]
Built with ❤️ using FastAPI, Whisper, and Ollama