MatricalDefunkt/ai-interview-trainer

Adept AI Interview Trainer

Adept AI Interview Trainer is an AI-powered platform designed to help users practice and improve their interview skills. It records video and audio of a mock interview, analyzes various aspects of the user's performance using machine learning models, and provides comprehensive feedback generated by Google Gemini.

Features

  • Video and Audio Recording: Captures the user's interview performance.
  • Multi-Modal Analysis:
    • Audio Analysis:
      • Speech-to-text transcription (including filler words) using Crisper Whisper.
      • Metrics: Pace (WPM), pitch, volume, pause duration, and filler word count.
    • Video Analysis:
      • Emotion detection (dominant emotion and distribution).
      • Posture analysis (upright posture percentage).
      • Eye tracking (future enhancement).
  • AI-Generated Feedback: Google Gemini synthesizes the audio and video analysis to provide:
    • Strengths and areas for improvement.
    • Scores for clarity, relevance, tone, and vocabulary.
    • Actionable insights and suggestions.
  • Personalized Experience: Tailors feedback to the specific interview question asked.
  • User-Friendly Interface: Web-based platform for easy access and interaction.

Technologies Used

Backend (Python - Flask):

  • Flask: Web framework.
  • Librosa: Audio analysis (pitch, volume, pace).
  • OpenAI Whisper (CrisperWhisper variant): Speech-to-text transcription, customized for filler word detection.
  • OpenCV: Video processing (emotion detection, posture analysis).
  • TensorFlow/Keras: For the deep learning model (best_model_full.h5) used in video analysis.
  • Google Gemini API: For generating comprehensive analytical feedback.
  • FFmpeg: For video file conversion (e.g., webm to mp4).
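The webm-to-mp4 conversion step can be sketched as a small subprocess call. This is a minimal illustration, not the repository's actual command: the exact FFmpeg flags (H.264 video, AAC audio) and the helper names `build_ffmpeg_cmd`/`convert` are assumptions.

```python
import subprocess

def build_ffmpeg_cmd(src: str, dst: str) -> list[str]:
    """Build an FFmpeg command converting an uploaded .webm to .mp4.
    Flag choices are assumptions, not necessarily what Server/server.py uses."""
    return [
        "ffmpeg", "-y",     # -y: overwrite the output file if it already exists
        "-i", src,          # input file (e.g. the uploaded .webm)
        "-c:v", "libx264",  # transcode video to H.264
        "-c:a", "aac",      # transcode audio to AAC
        dst,
    ]

def convert(src: str, dst: str) -> None:
    # Raises CalledProcessError if FFmpeg exits with a non-zero status.
    subprocess.run(build_ffmpeg_cmd(src, dst), check=True)
```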

Frontend (React - Vite):

  • React: JavaScript library for building user interfaces.
  • TypeScript: Superset of JavaScript adding static typing.
  • Vite: Next-generation frontend tooling (fast build and dev server).
  • Tailwind CSS: Utility-first CSS framework.
  • Shadcn/ui: Re-usable UI components.
  • Fetch API: For making HTTP requests to the backend.
  • React Router: For client-side routing.

Development & Other:

  • Jupyter Notebooks: For model development, experimentation, and API testing.
  • Git & GitHub: Version control and repository hosting.

Prerequisites

Backend:

  • Python 3.9+
  • Pip (Python package installer)
  • FFmpeg: Ensure FFmpeg is installed and accessible in your system PATH.
    • On Debian/Ubuntu: sudo apt update && sudo apt install ffmpeg
    • On macOS (using Homebrew): brew install ffmpeg
    • On Windows: Download from ffmpeg.org and add to PATH.
  • A Google Gemini API Key.
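The FFmpeg-on-PATH prerequisite can be verified from Python before starting the server. A small stdlib-only sketch (the function name `ffmpeg_available` is illustrative, not part of the codebase):

```python
import shutil

def ffmpeg_available() -> bool:
    """Return True if an `ffmpeg` executable is found on the system PATH."""
    return shutil.which("ffmpeg") is not None

if __name__ == "__main__":
    if not ffmpeg_available():
        print("ffmpeg missing: install it and add it to your PATH")
```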

Frontend:

  • Node.js (which includes npm) or Bun.

Project Setup and Running

1. Clone the Repository:

```bash
git clone <repository-url>
cd ai-interview-trainer
```

2. Backend Setup (Server):

a. Navigate to the Server directory:

```bash
cd Server
```

b. Create a virtual environment (recommended):

```bash
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
```

c. Install Python dependencies (requirements.txt is in the project root, hence the `../` path):

```bash
pip install -r ../requirements.txt
```

d. Set up environment variables. Create a .env file in the Server directory and add your Google Gemini API key:

```env
GEMINI_API_KEY=YOUR_GEMINI_API_KEY
```

The Server/gemini.py file loads this key.
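How Server/gemini.py actually reads the key is not shown here; below is a minimal stdlib-only sketch of loading a `.env` file into the process environment (projects commonly use the python-dotenv package for this instead, and the helper name `load_env` is an assumption):

```python
import os

def load_env(path: str = ".env") -> None:
    """Parse simple KEY=VALUE lines from a .env file into os.environ.
    Blank lines and '#' comments are skipped; variables already set in
    the environment are not overwritten."""
    with open(path) as fh:
        for line in fh:
            line = line.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            os.environ.setdefault(key.strip(), value.strip())

# Usage (paths illustrative):
# load_env("Server/.env")
# api_key = os.environ["GEMINI_API_KEY"]
```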

e. Run the Flask server:

```bash
python server.py
```

The backend server will typically start on http://127.0.0.1:5000.

3. Frontend Setup (Website):

a. Navigate to the Website directory (from the project root):

```bash
cd Website
```

b. Install Node.js dependencies:

```bash
npm install   # or, with Bun (a bun.lockb is present): bun install
```

c. Run the frontend development server:

```bash
npm run dev   # or, with Bun: bun run dev
```

The frontend development server will typically start on http://localhost:5173.

4. Accessing the Application:

Open your web browser and navigate to the frontend URL (e.g., http://localhost:5173). The frontend will make requests to the backend API.

API Endpoints

The primary backend API endpoint is served by the Flask application:

  • POST /process

    • Description: Processes the uploaded video interview. It extracts audio, transcribes it, analyzes audio and video features, and then sends the data to Google Gemini for a comprehensive report.
    • Request Type: multipart/form-data
    • Form Data:
      • video: The video file (e.g., .mp4, .webm).
      • interview_question: A string containing the interview question the user was answering.
    • Success Response (200 OK):
      • A JSON object containing:
        • video_file: Filename of the processed video.
        • interview_question: The question asked.
        • status: "complete".
        • transcript: The audio transcript.
        • audio_metrics: Object with audio analysis data (e.g., avg_pause_duration_s, pace_wpm).
        • video_metrics: Object with video analysis data (e.g., dominant_emotion, upright_posture_percentage).
        • gemini_analysis: Object or string containing the feedback from Gemini (strengths, areas for improvement, scores).
      • Example: See test_results_confident_interview.json for the structure.
    • Error Responses:
      • 400 Bad Request: If the video part or interview_question field is missing, no file is selected, or the file type is not allowed.
      • 500 Internal Server Error: If any part of the processing pipeline fails (e.g., audio extraction, transcription, Gemini API error, video conversion). The JSON response will contain an error message.
  • GET /

    • Description: A simple HTML form for direct upload and testing of the /process endpoint without needing the React frontend. Useful for quick backend tests.
    • Response Type: text/html
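A client call to POST /process can be sketched with only the Python standard library. The multipart encoding below is hand-rolled for illustration (a real client would more likely use the `requests` package); the field names `video` and `interview_question` match the documentation above, while the helper names and the hardcoded `video/webm` content type are assumptions.

```python
import io
import urllib.request
import uuid

def build_multipart(question: str, video_name: str, video_bytes: bytes):
    """Encode the two documented form fields -- interview_question and
    video -- as a multipart/form-data body. Returns (body, content_type)."""
    boundary = uuid.uuid4().hex
    buf = io.BytesIO()
    # Text field: interview_question
    buf.write((f"--{boundary}\r\n"
               'Content-Disposition: form-data; name="interview_question"\r\n\r\n'
               f"{question}\r\n").encode())
    # File field: video (content type assumed to be video/webm)
    buf.write((f"--{boundary}\r\n"
               'Content-Disposition: form-data; name="video"; '
               f'filename="{video_name}"\r\n'
               "Content-Type: video/webm\r\n\r\n").encode())
    buf.write(video_bytes)
    buf.write(f"\r\n--{boundary}--\r\n".encode())
    return buf.getvalue(), f"multipart/form-data; boundary={boundary}"

def send(question: str, video_path: str,
         url: str = "http://127.0.0.1:5000/process") -> bytes:
    with open(video_path, "rb") as fh:
        body, ctype = build_multipart(question, video_path, fh.read())
    req = urllib.request.Request(url, data=body,
                                 headers={"Content-Type": ctype})
    with urllib.request.urlopen(req) as resp:  # raises on 4xx/5xx
        return resp.read()
```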

Project Structure Explanation

  • best_model_full.h5: Pre-trained deep learning model for video analysis.
  • *.ipynb (e.g., gemini.ipynb, model.ipynb): Jupyter Notebooks used for development, experimentation, and testing of different modules like Gemini API integration and model building.
  • requirements.txt: Lists Python dependencies for the backend.
  • test_results_*.json: Example JSON outputs from the /process endpoint, showing the structure of the final analysis.
  • Server/: Contains all backend Python code.
    • server.py: Main Flask application file, defines API endpoints and orchestrates the processing.
    • audio.py: Handles audio extraction, transcription (Whisper), and feature analysis (Librosa).
    • video.py: Handles video feature analysis (OpenCV, best_model_full.h5).
    • gemini.py: Interface for interacting with the Google Gemini API.
  • uploads/: Default directory where uploaded videos are temporarily stored and processed.
  • Website/: Contains all frontend React application code.
    • public/: Static assets.
    • src/: Main source code for the React application.
      • App.tsx: Main application component.
      • main.tsx: Entry point for the React application.
      • components/: Reusable UI components (e.g., for interview controls, layout, Shadcn/ui components).
      • hooks/: Custom React hooks.
      • lib/: Utility functions.
      • pages/: Top-level page components (e.g., Home, Practice, About).
      • utils/: Utility functions, including API service calls (api.ts, apiService.ts).
    • package.json: Lists Node.js dependencies and scripts for the frontend.
    • vite.config.ts: Configuration for the Vite build tool.
    • tailwind.config.ts: Configuration for Tailwind CSS.

Future Enhancements

  • Real-time feedback during the interview session.
  • More detailed eye-tracking analysis.
  • User accounts and history of past interviews.
  • Customizable interview question sets.
  • Advanced STAR method adherence scoring.
