An advanced AI-powered chatbot designed to help students and professionals solve mathematics problems, from basic arithmetic to advanced calculus, linear algebra, statistics, and more. This project demonstrates expertise in full-stack development, AI/LLM integration, multimodal input handling, and software engineering best practices.
- Step-by-step math problem solving powered by Google Gemini LLM
- Mathematical concept explanations with real examples
- LaTeX rendering for beautiful math display (KaTeX)
- File upload support (images, PDFs, text, DOCX) with intelligent backend processing (up to 5 files at once)
- Speech-to-text for voice input (Whisper integration)
- Streaming responses with typing effect for real-time, interactive UX
- Chat history management with persistent storage (SQLite + SQLAlchemy)
- Robust error handling and accessibility (A11y) features
- ChatGPT-like UI with Shadcn UI components
- Chat History Sidebar with search, rename, and delete functionality
- User messages with copy-to-clipboard, regenerate, and edit actions
- Model responses with copy-to-clipboard and regenerate actions
- Context Management with multi-turn chat history for coherent, contextual conversations
Here are some screenshots demonstrating the AI Math Chatbot in action:
- Large Language Model (LLM) Integration: Google Gemini API (Gemini 2.5 Flash) for advanced math reasoning, step-by-step explanations, and conceptual understanding
- Prompt Engineering: Custom system instructions to define chatbot persona and capabilities
- Multimodal Input: Backend supports text, images, PDFs, and DOCX; intelligently extracts and routes content for optimal LLM use
- Streaming & Real-Time UX: Implements streaming LLM responses for instant feedback
- Speech-to-Text: Whisper (via Hugging Face) for accessible, voice-driven input
- Context Management: Multi-turn chat history for coherent, contextual conversations
- File API Usage: Handles large files with Gemini's File API, including TTL management
- Error Handling: User-friendly error messages, robust backend exception handling, and logging
- Accessibility & UX: WCAG 2.1 AA compliance, semantic HTML, keyboard navigation, and responsive design
- Modern Full-Stack Engineering: FastAPI backend, Next.js/React frontend, Zustand state management, Tailwind CSS, Dockerized deployment
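As an illustration of the context-management item above, here is a minimal sketch of turning stored chat messages into a multi-turn history. The `role`/`parts` shape follows the Gemini content format; the stored message shape (`role` and `content` keys), the function name `build_history`, and the 20-message cap are assumptions for illustration, not the project's actual code.

```python
from typing import Dict, List


def build_history(messages: List[Dict[str, str]], max_messages: int = 20) -> List[Dict[str, object]]:
    """Convert stored chat messages into a Gemini-style history, oldest first.

    Hypothetical helper: assumes each stored message has "role" and "content" keys.
    """
    # Keep only the most recent messages; guard against the [-0:] slice pitfall.
    recent = messages[-max_messages:] if max_messages > 0 else []
    history = []
    for msg in recent:
        # Gemini uses "user" and "model" as role names.
        role = "user" if msg["role"] == "user" else "model"
        history.append({"role": role, "parts": [{"text": msg["content"]}]})
    return history
```

Trimming by message count is a simplification; a real implementation might trim by token count instead.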
- Python 3.9+
- FastAPI
- Google Gen AI SDK (Gemini 2.5 Flash)
- SQLAlchemy with SQLite
- Hugging Face Inference API (Whisper for speech-to-text)
- Next.js with TypeScript
- Tailwind CSS with Shadcn UI components
- KaTeX for LaTeX rendering
- Zustand for state management
Refer to the project-structure.md file for a detailed breakdown of the project structure.
- `backend/`: FastAPI app, API endpoints, business logic, DB models, Gemini integration
- `frontend/`: Next.js/React app, UI components, state management, API services
- `docs/`: Project documentation, rules, and references
- `.env.example`: Template for environment variables (located in `backend/`)
- `LICENSE`: MIT License (see LICENSE)
- `docker-compose.yml`: Orchestrates local development for frontend & backend
- Python 3.9 or higher
- Node.js 16 or higher
- Google Gemini API key
- Hugging Face API token (optional, for speech-to-text)
Copy .env.example from the backend/ directory and create your own .env file in the same location:
GEMINI_API_KEY=your_gemini_api_key
GEMINI_MODEL_NAME=gemini-2.5-flash-preview-04-17
HUGGINGFACE_API_TOKEN=your_huggingface_token
UPLOAD_DIR=/path/to/upload/directory
AUDIO_DIR=/path/to/audio/directory
ALLOWED_ORIGINS=http://localhost:3000,https://yourdomain.com
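For reference, a minimal sketch of how these variables could be read and validated at startup. The `Settings` class, the `load_settings` helper, and the fallback directory names are hypothetical; the default model name is simply the one from the template above, and the project's real configuration code may differ.

```python
import os
from dataclasses import dataclass
from typing import List, Mapping, Optional


@dataclass(frozen=True)
class Settings:
    gemini_api_key: str
    gemini_model_name: str
    huggingface_api_token: Optional[str]
    upload_dir: str
    audio_dir: str
    allowed_origins: List[str]


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Build Settings from environment variables (hypothetical helper)."""
    api_key = env.get("GEMINI_API_KEY", "").strip()
    if not api_key:
        raise ValueError("GEMINI_API_KEY is required")
    # ALLOWED_ORIGINS is a comma-separated list; drop blanks and whitespace.
    origins = [o.strip() for o in env.get("ALLOWED_ORIGINS", "").split(",") if o.strip()]
    return Settings(
        gemini_api_key=api_key,
        gemini_model_name=env.get("GEMINI_MODEL_NAME", "gemini-2.5-flash-preview-04-17"),
        # The Hugging Face token is optional (only needed for speech-to-text).
        huggingface_api_token=env.get("HUGGINGFACE_API_TOKEN") or None,
        upload_dir=env.get("UPLOAD_DIR", "uploads"),  # assumed fallback
        audio_dir=env.get("AUDIO_DIR", "audio"),      # assumed fallback
        allowed_origins=origins,
    )
```

Passing a plain dictionary instead of `os.environ` makes the helper easy to exercise without touching the real environment.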
- Navigate to the backend directory: `cd backend`
- Create and activate a virtual environment: `python -m venv venv`, then `venv\Scripts\activate` (Windows) or `source venv/bin/activate` (macOS/Linux)
- Install dependencies: `pip install -r requirements.txt`
- Run the server: `uvicorn app.main:app --reload`
- Navigate to the frontend directory: `cd frontend`
- Install dependencies: `npm install` (or `pnpm install`)
- Create a `.env.local` file with: `NEXT_PUBLIC_API_BASE_URL=http://localhost:8000`
- Run the development server: `npm run dev` (or `pnpm dev`)

The frontend will be available at http://localhost:3000
Alternatively, use Docker Compose to run both frontend and backend:
- Ensure Docker and Docker Compose are installed
- Create a `.env` file in the `backend/` directory with your API keys
- From the project root, run: `docker-compose up --build`
- Access the app at:
  - Frontend: http://localhost:3000
  - Backend API: http://localhost:8000
  - API Docs: http://localhost:8000/docs
- `/chats`, `/chats/{chat_id}`: Chat CRUD
- `/chats/{chat_id}/messages/`: Message CRUD
- `/chats/{chat_id}/stream`: Streaming LLM responses
- `/upload-file`: File upload (PDF, image, DOCX, text; up to 5 files per request)
- `/stt`: Speech-to-text (Whisper)
See app/routers/ for implementation details.
POST /api/chat: Send a message to the chatbot (proxies to backend)
- Chat Management: Fetch, create, and manage chats via backend API
- Streaming Responses: Real-time LLM responses via Server-Sent Events (SSE)
- File Upload: Multi-part form data for sending up to 5 files to backend
- Speech-to-Text: Audio recordings sent to backend for transcription
- Centralized API Logic: All API calls managed in `frontend/lib/api-service.ts`
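To make the SSE streaming item above concrete, here is a rough sketch of how a client could split a Server-Sent Events stream into message chunks. It assumes the stream endpoint emits standard `data:` fields separated by blank lines; the actual payload format is not documented here, and `iter_sse_data` is a hypothetical name.

```python
from typing import Iterable, Iterator, List


def iter_sse_data(lines: Iterable[str]) -> Iterator[str]:
    """Yield the data payload of each Server-Sent Event from a stream of lines."""
    buffer: List[str] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line terminates the current event.
            if buffer:
                yield "\n".join(buffer)
                buffer = []
            continue
        if line.startswith(":"):
            continue  # comment line, often used as a keep-alive
        field, _, value = line.partition(":")
        if field == "data":
            # A single leading space after the colon is not part of the value.
            buffer.append(value[1:] if value.startswith(" ") else value)
    # Lenient choice: flush a trailing event even without a final blank line
    # (the SSE specification would discard it).
    if buffer:
        yield "\n".join(buffer)
```

Each yielded string can be appended to the message being displayed, which is one way to produce a typing effect.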
- Speech-to-Text: The chatbot supports voice input using Whisper (via Hugging Face Inference API). Users can record their math questions or problems directly in the chat interface using the microphone button.
- Real-Time Transcription: Audio is streamed to the backend, transcribed to text, and inserted into the chat input for review and editing before sending.
- Supported Formats: Accepts standard audio formats (e.g., WAV, MP3, M4A, WebM) as supported by browsers and Whisper API.
- Accessibility: Voice input is fully accessible via keyboard navigation and screen readers. Visual feedback is provided during recording and transcription.
- Error Handling: User-friendly error messages are shown for unsupported formats, transcription failures, or permission issues.
- Privacy: Audio is processed in-memory and not stored on the server beyond the request lifecycle.
- Frontend Display: A modal or inline UI provides recording controls and displays transcription progress. See the demo screenshot below:
- Supported Types:
  - PDF: `application/pdf`
  - Images: `image/jpeg`, `image/png`, `image/webp`, `image/heic`, `image/heif`
  - Text: `text/plain`
  - Word: `application/vnd.openxmlformats-officedocument.wordprocessingml.document` (.docx)
- Size Limits:
  - Inline processing for files < ~20MB
  - Gemini Files API for files up to 2GB (48-hour TTL)
- File Count Limit:
  - Up to 5 files can be uploaded at once per request
- Processing:
  - `.txt` and `.docx`: Text extracted before sending to Gemini
  - PDF/Images: Sent directly to Gemini
- Frontend Display: Uploaded files shown as chips with remove functionality (max 5)
- No persistent backend storage beyond Gemini's 48-hour TTL
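The rules above can be sketched as a small validation and routing function. This is an illustrative outline only: `plan_upload`, the route labels, and the exact byte thresholds (20 MiB for "~20MB", 2 GiB for "2GB") are assumptions, not the backend's actual implementation.

```python
from typing import List, Tuple

DOCX_MIME = "application/vnd.openxmlformats-officedocument.wordprocessingml.document"
ALLOWED_MIME_TYPES = {
    "application/pdf",
    "image/jpeg", "image/png", "image/webp", "image/heic", "image/heif",
    "text/plain",
    DOCX_MIME,
}
TEXT_EXTRACT_TYPES = {"text/plain", DOCX_MIME}
MAX_FILES = 5
INLINE_LIMIT_BYTES = 20 * 1024 * 1024   # assumed exact value for "~20MB"
FILES_API_LIMIT_BYTES = 2 * 1024 ** 3   # assumed exact value for "2GB"


def plan_upload(files: List[Tuple[str, int]]) -> List[str]:
    """Return a processing route for each (mime_type, size_bytes) pair."""
    if len(files) > MAX_FILES:
        raise ValueError(f"At most {MAX_FILES} files per request")
    routes = []
    for mime_type, size in files:
        if mime_type not in ALLOWED_MIME_TYPES:
            raise ValueError(f"Unsupported file type: {mime_type}")
        if size > FILES_API_LIMIT_BYTES:
            raise ValueError("File exceeds the 2GB limit")
        if mime_type in TEXT_EXTRACT_TYPES:
            routes.append("extract_text")   # text is extracted before calling Gemini
        elif size < INLINE_LIMIT_BYTES:
            routes.append("inline")         # small PDF/image sent with the request
        else:
            routes.append("files_api")      # large PDF/image goes via the Files API
    return routes
```

For example, a small PDF, a text file, and a 25 MiB image would be routed to inline processing, text extraction, and the Files API respectively.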
Note: Backend and frontend tests will be implemented in a later stage. The project is structured for easy test integration using:
- Backend: `pytest` for unit/integration tests, `httpx` for API testing
- Frontend: `jest` and `react-testing-library` for component and integration tests
- E2E: Playwright or Cypress for end-to-end user flow testing
This project is licensed under the MIT License. See the LICENSE file for details.
- Advanced Math Capabilities: Integrate symbolic computation (e.g., SymPy) for algebraic manipulation and equation solving
- User Authentication: Optional login for saving personal chat history and preferences
- Admin Dashboard: Analytics, usage stats, and moderation tools
- Multilingual Support: Expand to support multiple languages for global accessibility
- Mobile App: React Native or Flutter client for mobile devices
- Enhanced A11y: Further improvements for screen readers and cognitive accessibility
- Cloud Deployment: One-click deploy to GCP, AWS, or Azure
- Automated Testing: CI/CD integration for automated test runs and deployments
- Plugin System: Allow users to extend chatbot capabilities with custom plugins
- Calculator: Add a calculator with basic and scientific modes to the chatbot
- Graphs and Charts: Add the ability to create graphs and charts from data
- Canvas drawing: Add the ability to draw on a canvas mathematical expressions, geometry, graphs, etc.
- Unit/Integration/E2E Testing: Add unit, integration, and E2E tests
For questions or feedback, please reach out via:
- GitHub: EvanGks
- X (Twitter): @Evan6471133782
- LinkedIn: Evangelos Gakias
- Kaggle: evangelosgakias
- Email: evangks88@gmail.com




