An AI-powered interview practice system that generates role-specific questions and provides personalized feedback using Retrieval-Augmented Generation (RAG) and Chain-of-Thought evaluation.
- Intelligent Question Generation: Uses RAG to generate role-specific questions grounded in curated job descriptions
- Chain-of-Thought Evaluation: Provides detailed reasoning for feedback scores with transparent analysis
- Performance Analytics: Track your progress across multiple interview sessions
- Multi-Category Questions: Technical, behavioral, and situational questions tailored to your role
- Progress Tracking: Visualize improvement over time with detailed metrics
- Personalized Feedback: Tailored suggestions based on comprehensive response analysis
- Resume Integration: Upload your resume for more personalized question generation
- Session History: Review all past interviews with complete feedback
Check out the demo video to see the application in action!
mock.interview.agent.mp4
┌─────────────────────────────────┐
│     Streamlit Web Interface     │
│  (User Input & Visualization)   │
└───────────────┬─────────────────┘
                │
       ┌────────▼─────────┐
       │   Question Gen   │
       │   (RAG-based)    │
       └────────┬─────────┘
                │
       ┌────────▼─────────┐
       │   FAISS Vector   │
       │     Database     │
       │   (Embeddings)   │
       └────────┬─────────┘
                │
       ┌────────▼─────────┐
       │    Evaluator     │
       │(Chain-of-Thought)│
       └──────────────────┘
- Frontend: Streamlit
- Embeddings: Sentence Transformers (all-MiniLM-L6-v2)
- Vector Database: FAISS (Facebook AI Similarity Search)
- Evaluation: Custom Chain-of-Thought reasoning engine
- File Processing: PyPDF2, python-docx
- Data Storage: CSV, JSON
- Python 3.8 or higher
- pip (Python package manager)
- 2GB RAM minimum (for model loading)
- Clone the repository
git clone https://github.com/trishthakur/mock-interview-agent.git
cd mock-interview-agent
- Create and activate virtual environment (recommended)
# On macOS/Linux
python3 -m venv venv
source venv/bin/activate
# On Windows
python -m venv venv
venv\Scripts\activate
- Install dependencies
pip install -r requirements.txt
- Set up data directories
mkdir -p data
The application will automatically create the necessary CSV files on first run.
- Run the application
streamlit run app.py
The app will open automatically in your default browser at http://localhost:8501.
mock-interview-agent/
├── app.py # Main Streamlit application
├── requirements.txt # Python dependencies
├── .gitignore # Git ignore rules
├── README.md # This file
├── demo_video.mp4 # Application demonstration
│
├── data/ # Data storage
│ ├── job_descriptions.json # Curated job descriptions
│ ├── questions_bank.json # Question database by category
│ └── user_history.csv # Interview session history (auto-generated)
│
├── src/ # Core application logic
│ ├── __init__.py
│ ├── rag_engine.py # RAG implementation with FAISS
│ ├── question_generator.py # Question generation logic
│ ├── evaluator.py # Chain-of-Thought evaluator
│ └── prompts.py # Prompt templates
│
├── utils/ # Utility functions
│ ├── __init__.py
│ ├── file_handler.py # File processing (PDF, DOCX, TXT)
│ └── embeddings.py # Embedding utilities
│
└── tests/ # Unit tests
├── test_rag.py # RAG engine tests
└── test_evaluator.py # Evaluator tests
The system uses Retrieval-Augmented Generation to produce questions relevant to the selected role:
Step 1: Embedding Creation
- Job descriptions and questions are encoded using Sentence Transformers
- Embeddings are stored in a FAISS vector database for fast retrieval
Step 2: Semantic Search
- When a job description is selected, the system searches for similar questions
- FAISS performs cosine similarity search across the question bank
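A minimal, library-free sketch of this search step, using plain NumPy in place of FAISS and toy 3-dimensional vectors in place of real embeddings (all names and numbers here are illustrative):

```python
import numpy as np

def cosine_search(query_vec, bank_vecs, top_k=2):
    """Return indices of the top_k bank vectors by cosine similarity."""
    # Normalize so that the inner product equals cosine similarity
    # (FAISS achieves the same effect with an inner-product index over
    # normalized vectors).
    q = query_vec / np.linalg.norm(query_vec)
    b = bank_vecs / np.linalg.norm(bank_vecs, axis=1, keepdims=True)
    scores = b @ q
    return np.argsort(scores)[::-1][:top_k]

# Toy "embeddings" for a job description and three candidate questions
job = np.array([1.0, 0.2, 0.0])
questions = np.array([
    [0.9, 0.3, 0.1],   # closely related
    [0.0, 1.0, 0.0],   # unrelated
    [1.0, 0.1, 0.0],   # very closely related
])
top = cosine_search(job, questions)  # indices of the two best matches
```

In the real pipeline the 384-dimensional all-MiniLM-L6-v2 embeddings replace these toy vectors, and FAISS handles the indexing and search.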
Step 3: Context-Aware Filtering
- Questions are filtered by category (technical/behavioral/situational)
- Difficulty level matching
- Skill alignment with job requirements
- Duplicate prevention (won't ask the same question twice)
Step 4: Question Selection
- Top 5 most relevant questions are identified
- Random selection from top results for variety
- Fallback to curated question bank if needed
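The selection step above can be sketched as follows (function and variable names are assumptions for illustration; the project's actual logic lives in src/question_generator.py):

```python
import random

def select_question(ranked_questions, asked, top_n=5, seed=None):
    """Choose one unasked question from the top_n ranked candidates."""
    rng = random.Random(seed)
    # Prefer the top results, skipping questions already asked this session
    candidates = [q for q in ranked_questions[:top_n] if q not in asked]
    if not candidates:
        # Fallback: widen the pool to the rest of the ranked bank
        candidates = [q for q in ranked_questions if q not in asked]
    return rng.choice(candidates) if candidates else None

ranked = ["q1", "q2", "q3", "q4", "q5", "q6"]
pick = select_question(ranked, asked={"q1", "q2"}, seed=0)
```

The random choice among the top results is what keeps repeated sessions from always opening with the same question.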
# Example: Generate a question
question = question_gen.generate_question(
job_context={
'title': 'Software Engineer',
'description': 'Full-stack development with React and Node.js...'
},
category='technical',
difficulty='Medium'
)
The evaluator analyzes responses using explicit reasoning steps:
Evaluation Process:
Step 1: Length Analysis
├─ Count words in response
├─ Compare against thresholds (20, 50, 100+ words)
└─ Score: 0-100 based on detail level
Step 2: Structure Analysis (STAR Method)
├─ Search for Situation indicators
├─ Search for Task indicators
├─ Search for Action indicators
├─ Search for Result indicators
└─ Score: 25 points per component found
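A rough sketch of the STAR check above (the indicator keyword lists here are illustrative assumptions, not the project's actual lists):

```python
import re

# Hypothetical indicator words for each STAR component
STAR_INDICATORS = {
    "situation": ["when", "during", "last", "at the time"],
    "task": ["tasked", "responsible", "needed"],
    "action": ["implemented", "analyzed", "built", "deployed"],
    "result": ["reduced", "increased", "saved", "improved"],
}

def structure_score(response: str) -> int:
    """Award 25 points per STAR component whose indicators appear."""
    text = response.lower()
    found = sum(
        1 for words in STAR_INDICATORS.values()
        if any(re.search(rf"\b{re.escape(w)}\b", text) for w in words)
    )
    return 25 * found  # 0, 25, 50, 75, or 100

answer = ("Last quarter, I was tasked with fixing failures. "
          "I implemented mutex locks, which reduced failures to 0.1%.")
```

A terse answer like "I fixed a bug." triggers none of the indicators and scores 0 on structure.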
Step 3: Specificity Analysis
├─ Detect percentages/metrics (e.g., "increased by 40%")
├─ Detect timeframes (e.g., "6 months", "2 years")
├─ Detect technologies (e.g., "React", "Python", "AWS")
├─ Detect quantifiable improvements
└─ Score: 0-100 based on concrete examples
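The specificity checks can be approximated with simple pattern matching (the regexes and technology list below are illustrative assumptions):

```python
import re

METRIC_RE = re.compile(r"\d+(\.\d+)?%|\$\d+")                      # "40%", "$50K"
TIMEFRAME_RE = re.compile(r"\d+\s*(months?|years?|weeks?|days?)", re.I)
TECH_TERMS = {"react", "python", "aws", "redis", "docker", "postgresql"}

def specificity_signals(response: str) -> dict:
    """Flag which kinds of concrete detail appear in a response."""
    words = set(re.findall(r"[a-z]+", response.lower()))
    return {
        "metrics": bool(METRIC_RE.search(response)),
        "timeframes": bool(TIMEFRAME_RE.search(response)),
        "technologies": sorted(words & TECH_TERMS),
    }

signals = specificity_signals(
    "Over 6 months I added Redis caching, cutting failures by 40%."
)
```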
Step 4: Relevance Analysis
├─ Extract keywords from question
├─ Extract keywords from response
├─ Calculate keyword overlap
└─ Score: 0-100 based on alignment
Final Score = Weighted Average:
├─ Length: 20%
├─ Structure: 25%
├─ Specificity: 25%
└─ Relevance: 30%
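The final score is a plain weighted average of the four component scores. The weights mirror the breakdown above; the component values in the example are illustrative:

```python
# Weights match the evaluation criteria in src/evaluator.py
WEIGHTS = {"length": 0.20, "structure": 0.25, "specificity": 0.25, "relevance": 0.30}

def final_score(components: dict) -> float:
    """Combine per-criterion scores (each 0-100) into one weighted score."""
    return sum(WEIGHTS[name] * score for name, score in components.items())

score = final_score({
    "length": 90, "structure": 100, "specificity": 80, "relevance": 75,
})
# 0.20*90 + 0.25*100 + 0.25*80 + 0.30*75 = 85.5
```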
Example Evaluation Output:
Score: 85%
Chain-of-Thought Reasoning:
Step 1 - Length Analysis: Response contains 127 words.
→ Excellent length, provides detailed information
Step 2 - Structure Analysis: Checking for clear organization...
→ Found 4/4 STAR components: SITUATION, TASK, ACTION, RESULT
Step 3 - Specificity Analysis: Checking for concrete examples...
→ Strong specificity with: percentages/metrics, timeframes, specific technologies
Step 4 - Relevance Analysis: Assessing alignment with question...
→ High relevance: 75% keyword alignment
Strengths:
- Comprehensive response with 127 words
- Well-structured answer using SITUATION, TASK, ACTION, RESULT
- Included specific details: metrics, timeframes, technologies
Areas for Improvement:
- Consider mentioning team collaboration aspects
Based on the evaluation, the system provides:
- Strengths: Specific things done well with reasoning
- Improvements: Actionable, concrete suggestions
- Follow-up Questions: Generated when score < 70%
- Score Breakdown: Transparent reasoning for every point
- Progress Tracking: Historical performance analysis
Select Job Description
- Choose from pre-loaded library of common roles
- Upload custom job description (PDF/TXT)
- Paste job description text manually
Optional: Upload Resume
- Helps personalize questions to your background
- Supports PDF, TXT, and DOCX formats
Practice Workflow
- Click "Generate New Question" to get a role-specific question
- Optionally filter by category (Technical/Behavioral/Situational)
- Select difficulty level (Easy/Medium/Hard)
- Type your response (aim for 50+ words)
- Click "Submit for Evaluation"
- Review detailed feedback and scoring
Question Categories
- Technical: Coding, system design, debugging, technologies
- Behavioral: Past experiences, teamwork, conflict resolution
- Situational: Hypothetical scenarios, decision-making
Review Past Interviews
- View all previous questions and responses
- Filter by category
- Sort by date, highest score, or lowest score
- Review feedback for each response
- Track improvement areas
Performance Metrics
- Total questions completed
- Average score across all responses
- Highest and lowest scores
- Performance by category breakdown
- Score progression over time (line chart)
- Common improvement areas
Every behavioral/situational response should include:
- Situation: Set the context (when, where, what was happening)
- Task: Explain your responsibility (what needed to be done)
- Action: Describe what YOU did (specific steps taken)
- Result: Share the outcome (metrics, achievements, learnings)
Example:
❌ Bad: "I fixed a bug in production."
✅ Good: "Last quarter at TechCorp, our payment system was
experiencing 5% transaction failures (Situation). I was responsible
for finding and fixing the root cause (Task). I analyzed server
logs, identified a race condition in our Redis cache, implemented
mutex locks, and deployed the fix (Action). This reduced failures
to 0.1% and saved $50K in lost revenue (Result)."
Include concrete details:
- Numbers: "increased by 40%", "reduced time from 2 hours to 15 minutes"
- Technologies: "React", "PostgreSQL", "AWS Lambda", "Docker"
- Timeframes: "over 6 months", "within 2 weeks"
- Team size: "led a team of 5 engineers"
- Impact: "affected 10,000 users", "saved 20 hours per week"
- Minimum: 50 words (brief responses score poorly)
- Optimal: 75-150 words (detailed but concise)
- Maximum: 200 words (avoid rambling)
- Read carefully and answer ALL parts
- Use keywords from the question in your response
- Stay on topic throughout
Edit data/job_descriptions.json:
{
"id": 4,
"title": "DevOps Engineer",
"company": "Cloud Systems Inc",
"description": "We're seeking a DevOps Engineer to manage our cloud infrastructure. You'll work with Kubernetes, Terraform, and CI/CD pipelines. Requirements include 3+ years of experience, AWS/GCP expertise, and scripting skills in Python or Bash.",
"skills": ["Kubernetes", "Terraform", "AWS", "Python", "CI/CD"],
"level": "Mid-Senior"
}
Edit data/questions_bank.json:
{
"technical": [
{
"question": "Explain how you would implement a rate limiter for an API.",
"difficulty": "Hard",
"skills": ["System Design", "Backend", "Scalability"],
"category": "technical"
}
]
}
Modify src/evaluator.py:
self.evaluation_criteria = {
'length': {'weight': 0.2, 'threshold': 50}, # 20% of score
'structure': {'weight': 0.25, 'threshold': 0.6}, # 25% of score
'specificity': {'weight': 0.25, 'threshold': 0.5}, # 25% of score
'relevance': {'weight': 0.3, 'threshold': 0.6} # 30% of score
}
In src/rag_engine.py and utils/embeddings.py:
# Current: all-MiniLM-L6-v2 (fast, 384 dimensions)
# Alternative options:
model = SentenceTransformer('all-mpnet-base-v2') # More accurate, slower
model = SentenceTransformer('paraphrase-MiniLM-L3-v2') # Faster, smaller
# Run all tests
python -m pytest tests/ -v
# Run a single test file
python -m pytest tests/test_rag.py -v
# Run with coverage
pip install pytest-cov
python -m pytest tests/ --cov=src --cov-report=html
# Run the app on a specific port
streamlit run app.py --server.port 8501
- Push code to GitHub
- Go to share.streamlit.io
- Connect your GitHub repository
- Deploy with one click
Question: "Tell me about a time you optimized system performance."
Response: "At TechCorp last year, our API response times increased to 3 seconds during peak traffic (Situation). I was tasked with reducing this to under 500ms (Task). I profiled the application, identified N+1 queries in our ORM, implemented Redis caching for frequently accessed data, and added database indexes on commonly queried fields (Action). Response times dropped to 200ms average, handling 5x more traffic, and user satisfaction increased by 40% (Result)."
Feedback:
- Score: 85%
- Strengths: Excellent STAR structure, specific metrics, concrete technologies
- Improvements: Could mention team collaboration
Question: "Tell me about a time you optimized system performance."
Response: "I made the system faster by fixing some code issues."
Feedback:
- Score: 35%
- Improvements: Too brief (only 10 words), lacks structure, no specifics, no metrics
- Add more job descriptions and questions
- Implement voice recording for responses
- Add GPT integration for more advanced evaluation
- Create mobile-responsive design
- Add multi-language support
- Implement user authentication
- Add interview scheduling features
- Create Chrome extension for LinkedIn integration
This project implements concepts from:
- Retrieval-Augmented Generation (RAG): Combines retrieval systems with generation for more accurate, grounded responses
- Chain-of-Thought Prompting: Explicit reasoning steps for transparent AI decision-making
- Semantic Search: Uses embeddings for meaning-based question retrieval
- STAR Framework: Industry-standard method for structured interview responses