An AI-powered interview practice system that generates role-specific questions and provides personalized feedback using Retrieval-Augmented Generation (RAG) and Chain-of-Thought evaluation.
- Intelligent Question Generation: Uses RAG to generate role-specific questions grounded in curated job descriptions
- Chain-of-Thought Evaluation: Provides detailed reasoning for feedback scores with transparent analysis
- Performance Analytics: Track your progress across multiple interview sessions
- Multi-Category Questions: Technical, behavioral, and situational questions tailored to your role
- Progress Tracking: Visualize improvement over time with detailed metrics
- Personalized Feedback: Tailored suggestions based on comprehensive response analysis
- Resume Integration: Upload your resume for more personalized question generation
- Session History: Review all past interviews with complete feedback
Check out the demo video to see the application in action!
mock.interview.agent.mp4
┌─────────────────────────────────┐
│     Streamlit Web Interface     │
│  (User Input & Visualization)   │
└───────────────┬─────────────────┘
                │
       ┌────────▼─────────┐
       │   Question Gen   │
       │   (RAG-based)    │
       └────────┬─────────┘
                │
       ┌────────▼─────────┐
       │   FAISS Vector   │
       │     Database     │
       │   (Embeddings)   │
       └────────┬─────────┘
                │
       ┌────────▼─────────┐
       │    Evaluator     │
       │(Chain-of-Thought)│
       └──────────────────┘
- Frontend: Streamlit
- Embeddings: Sentence Transformers (all-MiniLM-L6-v2)
- Vector Database: FAISS (Facebook AI Similarity Search)
- Evaluation: Custom Chain-of-Thought reasoning engine
- File Processing: PyPDF2, python-docx
- Data Storage: CSV, JSON
- Python 3.8 or higher
- pip (Python package manager)
- 2GB RAM minimum (for model loading)
- Clone the repository
git clone https://github.com/trishthakur/mock-interview-agent.git
cd mock-interview-agent
- Create and activate virtual environment (recommended)
# On macOS/Linux
python3 -m venv venv
source venv/bin/activate
# On Windows
python -m venv venv
venv\Scripts\activate
- Install dependencies
pip install -r requirements.txt
- Set up data directories
mkdir -p data
The application will automatically create the necessary CSV files on first run.
- Run the application
streamlit run app.py
The app will open automatically in your default browser at http://localhost:8501.
mock-interview-agent/
├── app.py # Main Streamlit application
├── requirements.txt # Python dependencies
├── .gitignore # Git ignore rules
├── README.md # This file
├── demo_video.mp4 # Application demonstration
│
├── data/ # Data storage
│ ├── job_descriptions.json # Curated job descriptions
│ ├── questions_bank.json # Question database by category
│ └── user_history.csv # Interview session history (auto-generated)
│
├── src/ # Core application logic
│ ├── __init__.py
│ ├── rag_engine.py # RAG implementation with FAISS
│ ├── question_generator.py # Question generation logic
│ ├── evaluator.py # Chain-of-Thought evaluator
│ └── prompts.py # Prompt templates
│
├── utils/ # Utility functions
│ ├── __init__.py
│ ├── file_handler.py # File processing (PDF, DOCX, TXT)
│ └── embeddings.py # Embedding utilities
│
└── tests/ # Unit tests
├── test_rag.py # RAG engine tests
└── test_evaluator.py # Evaluator tests
The system uses Retrieval-Augmented Generation to produce questions relevant to the selected role:
Step 1: Embedding Creation
- Job descriptions and questions are encoded using Sentence Transformers
- Embeddings are stored in a FAISS vector database for fast retrieval
Step 2: Semantic Search
- When a job description is selected, the system searches for similar questions
- FAISS performs cosine similarity search across the question bank
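A minimal, library-free sketch of this search step, using plain NumPy in place of FAISS and toy 3-dimensional vectors in place of real embeddings (all names and numbers here are illustrative):

```python
import numpy as np

def cosine_search(query_vec, bank_vecs, top_k=2):
    """Return indices of the top_k bank vectors by cosine similarity."""
    # Normalize so that the inner product equals cosine similarity
    # (FAISS achieves the same effect with an inner-product index over
    # normalized vectors).
    q = query_vec / np.linalg.norm(query_vec)
    b = bank_vecs / np.linalg.norm(bank_vecs, axis=1, keepdims=True)
    scores = b @ q
    return np.argsort(scores)[::-1][:top_k]

# Toy "embeddings" for a job description and three candidate questions
job = np.array([1.0, 0.2, 0.0])
questions = np.array([
    [0.9, 0.3, 0.1],   # closely related
    [0.0, 1.0, 0.0],   # unrelated
    [1.0, 0.1, 0.0],   # very closely related
])
top = cosine_search(job, questions)  # indices of the two best matches
```

In the real pipeline the 384-dimensional all-MiniLM-L6-v2 embeddings replace these toy vectors, and FAISS handles the indexing and search.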
Step 3: Context-Aware Filtering
- Questions are filtered by category (technical/behavioral/situational)
- Difficulty level matching
- Skill alignment with job requirements
- Duplicate prevention (won't ask the same question twice)
Step 4: Question Selection
- Top 5 most relevant questions are identified
- Random selection from top results for variety
- Fallback to curated question bank if needed
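The selection step above can be sketched as follows (function and variable names are assumptions for illustration; the project's actual logic lives in src/question_generator.py):

```python
import random

def select_question(ranked_questions, asked, top_n=5, seed=None):
    """Choose one unasked question from the top_n ranked candidates."""
    rng = random.Random(seed)
    # Prefer the top results, skipping questions already asked this session
    candidates = [q for q in ranked_questions[:top_n] if q not in asked]
    if not candidates:
        # Fallback: widen the pool to the rest of the ranked bank
        candidates = [q for q in ranked_questions if q not in asked]
    return rng.choice(candidates) if candidates else None

ranked = ["q1", "q2", "q3", "q4", "q5", "q6"]
pick = select_question(ranked, asked={"q1", "q2"}, seed=0)
```

The random choice among the top results is what keeps repeated sessions from always opening with the same question.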
# Example: Generate a question
question = question_gen.generate_question(
job_context={
'title': 'Software Engineer',
'description': 'Full-stack development with React and Node.js...'
},
category='technical',
difficulty='Medium'
)
The evaluator analyzes responses using explicit reasoning steps:
Evaluation Process:
Step 1: Length Analysis
├─ Count words in response
├─ Compare against thresholds (20, 50, 100+ words)
└─ Score: 0-100 based on detail level
Step 2: Structure Analysis (STAR Method)
├─ Search for Situation indicators
├─ Search for Task indicators
├─ Search for Action indicators
├─ Search for Result indicators
└─ Score: 25 points per component found
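A rough sketch of the STAR check above (the indicator keyword lists here are illustrative assumptions, not the project's actual lists):

```python
import re

# Hypothetical indicator words for each STAR component
STAR_INDICATORS = {
    "situation": ["when", "during", "last", "at the time"],
    "task": ["tasked", "responsible", "needed"],
    "action": ["implemented", "analyzed", "built", "deployed"],
    "result": ["reduced", "increased", "saved", "improved"],
}

def structure_score(response: str) -> int:
    """Award 25 points per STAR component whose indicators appear."""
    text = response.lower()
    found = sum(
        1 for words in STAR_INDICATORS.values()
        if any(re.search(rf"\b{re.escape(w)}\b", text) for w in words)
    )
    return 25 * found  # 0, 25, 50, 75, or 100

answer = ("Last quarter, I was tasked with fixing failures. "
          "I implemented mutex locks, which reduced failures to 0.1%.")
```

A terse answer like "I fixed a bug." triggers none of the indicators and scores 0 on structure.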
Step 3: Specificity Analysis
├─ Detect percentages/metrics (e.g., "increased by 40%")
├─ Detect timeframes (e.g., "6 months", "2 years")
├─ Detect technologies (e.g., "React", "Python", "AWS")
├─ Detect quantifiable improvements
└─ Score: 0-100 based on concrete examples
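The specificity checks can be approximated with simple pattern matching (the regexes and technology list below are illustrative assumptions):

```python
import re

METRIC_RE = re.compile(r"\d+(\.\d+)?%|\$\d+")                      # "40%", "$50K"
TIMEFRAME_RE = re.compile(r"\d+\s*(months?|years?|weeks?|days?)", re.I)
TECH_TERMS = {"react", "python", "aws", "redis", "docker", "postgresql"}

def specificity_signals(response: str) -> dict:
    """Flag which kinds of concrete detail appear in a response."""
    words = set(re.findall(r"[a-z]+", response.lower()))
    return {
        "metrics": bool(METRIC_RE.search(response)),
        "timeframes": bool(TIMEFRAME_RE.search(response)),
        "technologies": sorted(words & TECH_TERMS),
    }

signals = specificity_signals(
    "Over 6 months I added Redis caching, cutting failures by 40%."
)
```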
Step 4: Relevance Analysis
├─ Extract keywords from question
├─ Extract keywords from response
├─ Calculate keyword overlap
└─ Score: 0-100 based on alignment
Final Score = Weighted Average:
├─ Length: 20%
├─ Structure: 25%
├─ Specificity: 25%
└─ Relevance: 30%
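The final score is a plain weighted average of the four component scores. The weights mirror the breakdown above; the component values in the example are illustrative:

```python
# Weights match the evaluation criteria in src/evaluator.py
WEIGHTS = {"length": 0.20, "structure": 0.25, "specificity": 0.25, "relevance": 0.30}

def final_score(components: dict) -> float:
    """Combine per-criterion scores (each 0-100) into one weighted score."""
    return sum(WEIGHTS[name] * score for name, score in components.items())

score = final_score({
    "length": 90, "structure": 100, "specificity": 80, "relevance": 75,
})
# 0.20*90 + 0.25*100 + 0.25*80 + 0.30*75 = 85.5
```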
Example Evaluation Output:
Score: 85%
Chain-of-Thought Reasoning:
Step 1 - Length Analysis: Response contains 127 words.
→ Excellent length, provides detailed information
Step 2 - Structure Analysis: Checking for clear organization...
→ Found 4/4 STAR components: SITUATION, TASK, ACTION, RESULT
Step 3 - Specificity Analysis: Checking for concrete examples...
→ Strong specificity with: percentages/metrics, timeframes, specific technologies
Step 4 - Relevance Analysis: Assessing alignment with question...
→ High relevance: 75% keyword alignment
Strengths:
- Comprehensive response with 127 words
- Well-structured answer using SITUATION, TASK, ACTION, RESULT
- Included specific details: metrics, timeframes, technologies
Areas for Improvement:
- Consider mentioning team collaboration aspects
Based on the evaluation, the system provides:
- Strengths: Specific things done well with reasoning
- Improvements: Actionable, concrete suggestions
- Follow-up Questions: Generated when score < 70%
- Score Breakdown: Transparent reasoning for every point
- Progress Tracking: Historical performance analysis
Select Job Description
- Choose from pre-loaded library of common roles
- Upload custom job description (PDF/TXT)
- Paste job description text manually
Optional: Upload Resume
- Helps personalize questions to your background
- Supports PDF, TXT, and DOCX formats
Practice Workflow
- Click "Generate New Question" to get a role-specific question
- Optionally filter by category (Technical/Behavioral/Situational)
- Select difficulty level (Easy/Medium/Hard)
- Type your response (aim for 50+ words)
- Click "Submit for Evaluation"
- Review detailed feedback and scoring
Question Categories
- Technical: Coding, system design, debugging, technologies
- Behavioral: Past experiences, teamwork, conflict resolution
- Situational: Hypothetical scenarios, decision-making
Review Past Interviews
- View all previous questions and responses
- Filter by category
- Sort by date, highest score, or lowest score
- Review feedback for each response
- Track improvement areas
Performance Metrics
- Total questions completed
- Average score across all responses
- Highest and lowest scores
- Performance by category breakdown
- Score progression over time (line chart)
- Common improvement areas
Every behavioral/situational response should include:
- Situation: Set the context (when, where, what was happening)
- Task: Explain your responsibility (what needed to be done)
- Action: Describe what YOU did (specific steps taken)
- Result: Share the outcome (metrics, achievements, learnings)
Example:
❌ Bad: "I fixed a bug in production."
✅ Good: "Last quarter at TechCorp, our payment system was
experiencing 5% transaction failures (Situation). I was responsible
for finding and fixing the root cause (Task). I analyzed server
logs, identified a race condition in our Redis cache, implemented
mutex locks, and deployed the fix (Action). This reduced failures
to 0.1% and saved $50K in lost revenue (Result)."
Include concrete details:
- Numbers: "increased by 40%", "reduced time from 2 hours to 15 minutes"
- Technologies: "React", "PostgreSQL", "AWS Lambda", "Docker"
- Timeframes: "over 6 months", "within 2 weeks"
- Team size: "led a team of 5 engineers"
- Impact: "affected 10,000 users", "saved 20 hours per week"
- Minimum: 50 words (brief responses score poorly)
- Optimal: 75-150 words (detailed but concise)
- Maximum: 200 words (avoid rambling)
- Read carefully and answer ALL parts
- Use keywords from the question in your response
- Stay on topic throughout
Edit data/job_descriptions.json:
{
"id": 4,
"title": "DevOps Engineer",
"company": "Cloud Systems Inc",
"description": "We're seeking a DevOps Engineer to manage our cloud infrastructure. You'll work with Kubernetes, Terraform, and CI/CD pipelines. Requirements include 3+ years of experience, AWS/GCP expertise, and scripting skills in Python or Bash.",
"skills": ["Kubernetes", "Terraform", "AWS", "Python", "CI/CD"],
"level": "Mid-Senior"
}
Edit data/questions_bank.json:
{
"technical": [
{
"question": "Explain how you would implement a rate limiter for an API.",
"difficulty": "Hard",
"skills": ["System Design", "Backend", "Scalability"],
"category": "technical"
}
]
}
Modify src/evaluator.py:
self.evaluation_criteria = {
'length': {'weight': 0.2, 'threshold': 50}, # 20% of score
'structure': {'weight': 0.25, 'threshold': 0.6}, # 25% of score
'specificity': {'weight': 0.25, 'threshold': 0.5}, # 25% of score
'relevance': {'weight': 0.3, 'threshold': 0.6} # 30% of score
}
In src/rag_engine.py and utils/embeddings.py:
# Current: all-MiniLM-L6-v2 (fast, 384 dimensions)
# Alternative options:
model = SentenceTransformer('all-mpnet-base-v2') # More accurate, slower
model = SentenceTransformer('paraphrase-MiniLM-L3-v2') # Faster, smaller
# Run all tests
python -m pytest tests/ -v
# Run a single test file
python -m pytest tests/test_rag.py -v
# Run with coverage
pip install pytest-cov
python -m pytest tests/ --cov=src --cov-report=html
# Run the app on a specific port
streamlit run app.py --server.port 8501
- Push code to GitHub
- Go to share.streamlit.io
- Connect your GitHub repository
- Deploy with one click
Question: "Tell me about a time you optimized system performance."
Response: "At TechCorp last year, our API response times increased to 3 seconds during peak traffic (Situation). I was tasked with reducing this to under 500ms (Task). I profiled the application, identified N+1 queries in our ORM, implemented Redis caching for frequently accessed data, and added database indexes on commonly queried fields (Action). Response times dropped to 200ms average, handling 5x more traffic, and user satisfaction increased by 40% (Result)."
Feedback:
- Score: 85%
- Strengths: Excellent STAR structure, specific metrics, concrete technologies
- Improvements: Could mention team collaboration
Question: "Tell me about a time you optimized system performance."
Response: "I made the system faster by fixing some code issues."
Feedback:
- Score: 35%
- Improvements: Too brief (only 10 words), lacks structure, no specifics, no metrics
- Add more job descriptions and questions
- Implement voice recording for responses
- Add GPT integration for more advanced evaluation
- Create mobile-responsive design
- Add multi-language support
- Implement user authentication
- Add interview scheduling features
- Create Chrome extension for LinkedIn integration
This project implements concepts from:
- Retrieval-Augmented Generation (RAG): Combines retrieval systems with generation for more accurate, grounded responses
- Chain-of-Thought Prompting: Explicit reasoning steps for transparent AI decision-making
- Semantic Search: Uses embeddings for meaning-based question retrieval
- STAR Framework: Industry-standard method for structured interview responses