Worked on a simulation demonstrating the role of data science in British Airways' success. The project involved scraping and analyzing customer review data to uncover key insights, and building a predictive model to identify the factors that influence buying behaviour.

| Name | Description |
|---|---|
| Review | Customer's comments on the airline flown |
| Rating | Overall rating of the airline flown |
| Country | Customer's country of origin |
| Type Of Traveller | Type of customer (e.g., Business Traveler, Leisure Traveler) |
| Seat Type | Class of the seat (e.g., Economy, Business, First Class) |
| Date Flown | Date when the customer flew with the airline |
| Recommended | Whether the customer recommends the airline (Yes/No) |
| Aircraft | Type or model of the aircraft flown |
| Seat Comfort | Customer's rating of seat comfort |
| Cabin Staff Service | Customer's rating of the service provided by cabin staff |
| Food & Beverages | Customer's rating of food and beverages |
| Inflight Entertainment | Customer's rating of inflight entertainment options |
| Ground Service | Customer's rating of ground services provided by the airline |
| Wifi & Connectivity | Customer's rating of onboard wifi and connectivity |
| Value For Money | Customer's rating of overall value for money |
| Month Flown | Month when the flight was taken |
| Year Flown | Year when the flight was taken |
| Departures | Departure airport or city |
| Arrivals | Arrival airport or city |
| Layover | Whether there was a layover and, if so, its details |

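
The cleaning step applied to these columns can be sketched with pandas and `re`. This is a minimal sketch, not the project's actual code: the three toy rows are invented for illustration, only two of the rating columns are shown, and imputing missing ratings with the column median is one assumed strategy among several.

```python
import re

import numpy as np
import pandas as pd

# Toy rows invented for illustration only; column names follow the data dictionary.
raw = pd.DataFrame({
    "Review": ["  Great crew, ON-TIME flight!! ", "Seat was broken... never again", None],
    "Seat Comfort": [4.0, np.nan, 2.0],
    "Value For Money": [5.0, 1.0, np.nan],
    "Recommended": ["yes", "no", "no"],
})

def normalize_text(text):
    """Lowercase, replace anything that is not a letter or whitespace, collapse spaces."""
    if not isinstance(text, str):
        return ""  # missing reviews become empty strings
    text = re.sub(r"[^a-z\s]", " ", text.lower())
    return re.sub(r"\s+", " ", text).strip()

def clean_reviews(df):
    df = df.copy()  # leave the raw frame untouched
    df["Review"] = df["Review"].apply(normalize_text)
    # Assumed strategy: fill missing ratings with each column's median.
    rating_cols = ["Seat Comfort", "Value For Money"]
    df[rating_cols] = df[rating_cols].fillna(df[rating_cols].median())
    # Encode the Yes/No target as 1/0 for classification.
    df["Recommended"] = df["Recommended"].str.strip().str.lower().map({"yes": 1, "no": 0})
    return df

clean = clean_reviews(raw)
print(clean)
```

Working on a copy keeps the raw scrape available for re-running the pipeline with a different imputation choice.
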
- Data Cleaning & Preprocessing: Handling missing values, normalizing text, and feature cleaning.
- Exploratory Data Analysis (EDA): Understanding distribution patterns, trends, and relationships.
- Natural Language Processing (NLP): Sentiment analysis using VADER and word cloud analysis.
- Predictive Modeling: Implementing classification models to predict customer recommendations.
- Model Evaluation & Interpretation: Assessing model accuracy and analyzing feature importance.
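
With nltk, VADER scores are typically obtained from `SentimentIntensityAnalyzer().polarity_scores(text)` (after `nltk.download("vader_lexicon")`), which returns `neg`, `neu`, `pos` and `compound` values per review; the source does not say whether nltk or the standalone `vaderSentiment` package was used. The sketch below reproduces only VADER's final stage: squashing a summed valence score x into a compound score in [-1, 1] via x / sqrt(x² + α) with α = 15, then bucketing it with the typical ±0.05 thresholds given in the VADER README. The raw sums are hypothetical, and the lexicon lookup and rule-based adjustments are omitted.

```python
import math

def vader_normalize(score, alpha=15):
    """Map a raw summed valence score into [-1, 1], as VADER does for its compound score."""
    norm = score / math.sqrt(score * score + alpha)
    return max(-1.0, min(1.0, norm))

def label_sentiment(compound, threshold=0.05):
    """Bucket a compound score using the +/-0.05 thresholds from the VADER README."""
    if compound >= threshold:
        return "positive"
    if compound <= -threshold:
        return "negative"
    return "neutral"

# Hypothetical raw valence sums for three reviews (not real data).
for raw_sum in (3.1, -2.4, 0.1):
    c = vader_normalize(raw_sum)
    print(f"{raw_sum:+.1f} -> compound {c:+.3f} -> {label_sentiment(c)}")
```
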
- Natural Language Processing (NLP): VADER sentiment analysis, word cloud visualization.
- Random Forest Classifier: Used for predicting customer recommendations.
- Feature Importance Analysis: Identifying key factors that influence customer recommendations.
- Evaluation Metrics: Accuracy score, confusion matrix, classification report.
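
Training a Random Forest, computing the listed metrics, and ranking feature importances with scikit-learn could look like the sketch below. It generates synthetic data with `make_classification` instead of using the real reviews, so its printed numbers say nothing about British Airways. The column names reuse rating fields from the data dictionary plus a hypothetical `sentiment_compound` feature, and the hyperparameters (`n_estimators=200`, an 80/20 stratified split) are assumptions.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the cleaned features; "sentiment_compound" is hypothetical.
feature_names = ["Seat Comfort", "Cabin Staff Service", "Food & Beverages",
                 "Ground Service", "Value For Money", "sentiment_compound"]
X, y = make_classification(n_samples=500, n_features=len(feature_names),
                           n_informative=3, random_state=42)
X = pd.DataFrame(X, columns=feature_names)

# Stratify so both classes keep the same proportion in train and test.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)

model = RandomForestClassifier(n_estimators=200, random_state=42)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

# Evaluation metrics: accuracy, confusion matrix, per-class report.
accuracy = accuracy_score(y_test, y_pred)
cm = confusion_matrix(y_test, y_pred)
print(f"Accuracy: {accuracy:.3f}")
print(cm)
print(classification_report(y_test, y_pred))

# Impurity-based feature importances, highest first.
importances = pd.Series(model.feature_importances_, index=feature_names)
importances = importances.sort_values(ascending=False)
print(importances)
```

`feature_importances_` are impurity-based and can favour high-cardinality features; `sklearn.inspection.permutation_importance` on the held-out set is a common cross-check.
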
- Programming Language: Python
- Libraries: pandas, numpy, scikit-learn, regex, matplotlib
- Visualization Tools: plotly, seaborn
- Data Collection: Gather customer reviews and flight booking data.
- Data Cleaning: Process missing values, normalize text data, and clean categorical fields.
- Exploratory Data Analysis: Generate summary statistics, visualizations, and sentiment insights.
- Feature Engineering: Extract relevant attributes for better prediction accuracy.
- Model Training: Train a Random Forest Classifier for recommendation prediction.
- Model Evaluation: Analyze accuracy scores, classification reports, and confusion matrices.
- Results Interpretation: Identify key insights from feature importance and sentiment analysis.
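
The feature-engineering step might look like this in pandas. It assumes, without confirmation from the source, that `Date Flown` holds strings such as `"March 2023"`; the toy rows are invented, and deriving `Month Flown` and `Year Flown` from it is only one plausible way those columns could be produced. Categorical fields are one-hot encoded so the Random Forest receives numeric inputs.

```python
import pandas as pd

# Toy rows invented for illustration; column names follow the data dictionary.
df = pd.DataFrame({
    "Date Flown": ["March 2023", "July 2022", "March 2023"],
    "Seat Type": ["Economy Class", "Business Class", "Economy Class"],
    "Type Of Traveller": ["Couple Leisure", "Business", "Solo Leisure"],
    "Recommended": [1, 0, 1],
})

def engineer_features(df):
    out = df.copy()
    # Assumed "Month YYYY" format; unparseable values become NaT instead of raising.
    dates = pd.to_datetime(out["Date Flown"], format="%B %Y", errors="coerce")
    out["Month Flown"] = dates.dt.month
    out["Year Flown"] = dates.dt.year
    # One-hot encode categorical fields as 0/1 integer columns.
    out = pd.get_dummies(out, columns=["Seat Type", "Type Of Traveller"], dtype=int)
    return out.drop(columns=["Date Flown"])

features = engineer_features(df)
print(features)
```
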