Adept AI Interview Trainer is an AI-powered platform designed to help users practice and improve their interview skills. It records video and audio of a mock interview, analyzes various aspects of the user's performance using machine learning models, and provides comprehensive feedback generated by Google Gemini.
Features:

- Video and Audio Recording: Captures the user's interview performance.
- Multi-Modal Analysis:
  - Audio Analysis:
    - Speech-to-text transcription (including filler words) using Crisper Whisper.
    - Metrics: pace (WPM), pitch, volume, pause duration, and filler word count.
  - Video Analysis:
    - Emotion detection (dominant emotion and distribution).
    - Posture analysis (upright posture percentage).
    - Eye tracking (planned future enhancement).
- AI-Generated Feedback: Google Gemini synthesizes the audio and video analysis to provide:
  - Strengths and areas for improvement.
  - Scores for clarity, relevance, tone, and vocabulary.
  - Actionable insights and suggestions.
- Personalized Experience: Tailors feedback to the specific interview question asked.
- User-Friendly Interface: Web-based platform for easy access and interaction.
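As a rough illustration of how the pace and filler-word metrics above can be derived from a transcript, here is a minimal pure-Python sketch. The actual pipeline uses Whisper and Librosa; the function names and the filler list are hypothetical:

```python
import re

# Illustrative filler list; the real detector works on the Whisper
# transcript and may also recognise multi-word fillers like "you know".
FILLER_WORDS = {"um", "uh", "erm", "hmm"}

def pace_wpm(transcript: str, duration_s: float) -> float:
    """Speaking pace in words per minute over the answer's duration."""
    words = re.findall(r"[\w']+", transcript.lower())
    return len(words) / (duration_s / 60.0)

def filler_count(transcript: str) -> int:
    """Naive count of single-word fillers in the transcript."""
    words = re.findall(r"[\w']+", transcript.lower())
    return sum(1 for w in words if w in FILLER_WORDS)
```

For example, a six-word answer spoken over four seconds works out to 90 WPM.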
Backend (Python - Flask):
- Flask: Web framework.
- Librosa: Audio analysis (pitch, volume, pace).
- OpenAI Whisper: Speech-to-text transcription (customized for filler word detection).
- OpenCV: Video processing (emotion detection, posture analysis).
- TensorFlow/Keras: Runs the deep learning model (`best_model_full.h5`) used in video analysis.
- Google Gemini API: For generating comprehensive analytical feedback.
- FFmpeg: For video file conversion (e.g., webm to mp4).
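For reference, the webm-to-mp4 conversion can be driven from Python roughly like this. This is a sketch; the exact codec flags the server passes to FFmpeg are an assumption:

```python
import subprocess

def build_ffmpeg_convert_cmd(src: str, dst: str) -> list[str]:
    """Build an FFmpeg command line converting e.g. a .webm upload to .mp4."""
    return [
        "ffmpeg",
        "-y",               # overwrite the output file if it exists
        "-i", src,          # input file
        "-c:v", "libx264",  # H.264 video for broad .mp4 compatibility
        "-c:a", "aac",      # AAC audio
        dst,
    ]

def convert(src: str, dst: str) -> None:
    """Run the conversion, raising if FFmpeg exits non-zero."""
    subprocess.run(build_ffmpeg_convert_cmd(src, dst), check=True)
```

Keeping the command construction separate from the `subprocess.run` call makes the flags easy to inspect and test without invoking FFmpeg.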
Frontend (React - Vite):
- React: JavaScript library for building user interfaces.
- TypeScript: Superset of JavaScript adding static typing.
- Vite: Next-generation frontend tooling (fast build and dev server).
- Tailwind CSS: Utility-first CSS framework.
- Shadcn/ui: Re-usable UI components.
- Fetch: For making API requests to the backend.
- React Router: For client-side routing.
Development & Other:
- Jupyter Notebooks: For model development, experimentation, and API testing.
- Git & GitHub: Version control and repository hosting.
Prerequisites:

Backend:
- Python 3.9+
- Pip (Python package installer)
- FFmpeg: Ensure FFmpeg is installed and accessible in your system PATH.
  - On Debian/Ubuntu:
    ```bash
    sudo apt update && sudo apt install ffmpeg
    ```
  - On macOS (using Homebrew):
    ```bash
    brew install ffmpeg
    ```
  - On Windows: Download from ffmpeg.org and add to PATH.
- A Google Gemini API Key.
Frontend:
- Node.js (which includes npm) or Bun.
Setup:

1. Clone the Repository:

   ```bash
   git clone <repository-url>
   cd ai-interview-trainer
   ```

2. Backend Setup (Server):
a. Navigate to the Server directory:
   ```bash
   cd Server
   ```
b. Create a virtual environment (recommended):
   ```bash
   python -m venv venv
   source venv/bin/activate  # On Windows: venv\Scripts\activate
   ```
c. Install Python dependencies:
   ```bash
   pip install -r ../requirements.txt
   ```
   (Note: `requirements.txt` is in the project root, hence `../requirements.txt`.)
d. Set up Environment Variables:
Create a .env file in the Server directory and add your Google Gemini API Key:
   ```env
   GEMINI_API_KEY=YOUR_GEMINI_API_KEY
   ```
The Server/gemini.py file loads this key.
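A minimal sketch of that lookup is shown below. The real `gemini.py` may use python-dotenv to read the `.env` file before consulting the environment, and the function name here is hypothetical:

```python
import os

def load_gemini_api_key() -> str:
    """Read the Gemini API key from the environment.

    Sketch of the lookup performed by Server/gemini.py; fails loudly
    when the key is missing rather than sending unauthenticated requests.
    """
    key = os.environ.get("GEMINI_API_KEY")
    if not key:
        raise RuntimeError("GEMINI_API_KEY is not set; add it to Server/.env")
    return key
```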
e. Run the Flask Server:
   ```bash
   python server.py
   ```
The backend server will typically start on http://127.0.0.1:5000.
3. Frontend Setup (Website):
a. Navigate to the Website directory (from the root project folder):
   ```bash
   cd Website
   ```
b. Install Node.js dependencies:
If using npm:
   ```bash
   npm install
   ```
   If using Bun (as indicated by `bun.lockb`):
   ```bash
   bun install
   ```
c. Run the Frontend Development Server:
If using npm:
   ```bash
   npm run dev
   ```
If using Bun:
   ```bash
   bun run dev
   ```
The frontend development server will typically start on http://localhost:5173.
4. Accessing the Application:
Open your web browser and navigate to the frontend URL (e.g., http://localhost:5173). The frontend will make requests to the backend API.
The primary backend API endpoints are served by the Flask application:

- `POST /process`
  - Description: Processes the uploaded video interview. It extracts audio, transcribes it, analyzes audio and video features, and then sends the data to Google Gemini for a comprehensive report.
  - Request Type: `multipart/form-data`
  - Form Data:
    - `video`: The video file (e.g., `.mp4`, `.webm`).
    - `interview_question`: A string containing the interview question the user was answering.
  - Success Response (200 OK): A JSON object containing:
    - `video_file`: Filename of the processed video.
    - `interview_question`: The question asked.
    - `status`: `"complete"`.
    - `transcript`: The audio transcript.
    - `audio_metrics`: Object with audio analysis data (e.g., `avg_pause_duration_s`, `pace_wpm`).
    - `video_metrics`: Object with video analysis data (e.g., `dominant_emotion`, `upright_posture_percentage`).
    - `gemini_analysis`: Object or string containing the feedback from Gemini (strengths, areas for improvement, scores).
    - Example: See `test_results_confident_interview.json` for the structure.
  - Error Responses:
    - `400 Bad Request`: If the `video` part or `interview_question` is missing, no file is selected, or the file type is not allowed.
    - `500 Internal Server Error`: If any part of the processing pipeline fails (e.g., audio extraction, transcription, Gemini API error, video conversion). The JSON response will contain an `error` message.

- `GET /`
  - Description: A simple HTML form for direct upload and testing of the `/process` endpoint without needing the React frontend. Useful for quick backend tests.
  - Response Type: `text/html`
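The 400-level file checks typically follow Flask's standard upload pattern; a hedged sketch is below. The extension whitelist is an assumption, and the authoritative logic lives in `server.py`:

```python
# Assumed whitelist; check server.py for the authoritative set.
ALLOWED_EXTENSIONS = {"mp4", "webm"}

def allowed_file(filename: str) -> bool:
    """True when the uploaded file has a permitted video extension."""
    return ("." in filename
            and filename.rsplit(".", 1)[1].lower() in ALLOWED_EXTENSIONS)
```

A request whose `video` filename fails this check would receive the `400 Bad Request` response described above.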
Project Structure:

- `best_model_full.h5`: Pre-trained deep learning model for video analysis.
- `*.ipynb` (e.g., `gemini.ipynb`, `model.ipynb`): Jupyter Notebooks used for development, experimentation, and testing of different modules like Gemini API integration and model building.
- `requirements.txt`: Lists Python dependencies for the backend.
- `test_results_*.json`: Example JSON outputs from the `/process` endpoint, showing the structure of the final analysis.
- `Server/`: Contains all backend Python code.
  - `server.py`: Main Flask application file; defines API endpoints and orchestrates the processing.
  - `audio.py`: Handles audio extraction, transcription (Whisper), and feature analysis (Librosa).
  - `video.py`: Handles video feature analysis (OpenCV, `best_model_full.h5`).
  - `gemini.py`: Interface for interacting with the Google Gemini API.
  - `uploads/`: Default directory where uploaded videos are temporarily stored and processed.
- `Website/`: Contains all frontend React application code.
  - `public/`: Static assets.
  - `src/`: Main source code for the React application.
    - `App.tsx`: Main application component.
    - `main.tsx`: Entry point for the React application.
    - `components/`: Reusable UI components (e.g., interview controls, layout, Shadcn/ui components).
    - `hooks/`: Custom React hooks.
    - `lib/`: Utility functions.
    - `pages/`: Top-level page components (e.g., Home, Practice, About).
    - `utils/`: Utility functions, including API service calls (`api.ts`, `apiService.ts`).
  - `package.json`: Lists Node.js dependencies and scripts for the frontend.
  - `vite.config.ts`: Configuration for the Vite build tool.
  - `tailwind.config.ts`: Configuration for Tailwind CSS.
Future Enhancements:

- Real-time feedback during the interview session.
- More detailed eye-tracking analysis.
- User accounts and history of past interviews.
- Customizable interview question sets.
- Advanced STAR method adherence scoring.