Yandex Practicum Data Science Projects

This repository contains a series of hands-on projects completed as part of the Yandex Practicum Data Science program. Each folder or notebook corresponds to a case study modeled on a real-world business problem, showcasing skills in data analysis, statistical modeling, and machine learning.

📘 About the Course

Yandex Practicum’s Data Science track provides an immersive curriculum covering Python programming, data preprocessing, exploratory data analysis (EDA), statistical testing, machine learning model development, and deployment. Students work on diverse projects simulating business challenges in industries such as automotive, oil & gas, micromobility, gaming, real estate, agriculture, HR, telecom, retail, transportation, and content moderation.

📁 Project List

1. Used Car Price Estimation

Folder: auto-ml (gradient boosting)
Notebook: auto-ml.ipynb
Description:
A used-car pricing service is building an app to help owners quickly estimate their vehicle’s market value. Using historical data on technical specifications, trim levels, and sale prices, develop and tune a gradient-boosting regression model, balancing three criteria: prediction quality, inference speed, and training time.
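
The three criteria can be measured with one small harness. The sketch below is illustrative only and is not taken from the notebook: `MeanRegressor` is a hypothetical stand-in for a real gradient-boosting model, the toy data is made up, and RMSE is assumed as the quality metric. Any object with `fit` and `predict` methods can be passed to `benchmark`.

```python
import math
import time


class MeanRegressor:
    """Hypothetical stand-in model: always predicts the mean training price."""

    def fit(self, X, y):
        self.mean_ = sum(y) / len(y)
        return self

    def predict(self, X):
        return [self.mean_ for _ in X]


def rmse(y_true, y_pred):
    """Root mean squared error between two equal-length sequences."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


def benchmark(model, X_train, y_train, X_valid, y_valid):
    """Return validation RMSE plus wall-clock training and inference time."""
    start = time.perf_counter()
    model.fit(X_train, y_train)
    fit_seconds = time.perf_counter() - start

    start = time.perf_counter()
    predictions = model.predict(X_valid)
    predict_seconds = time.perf_counter() - start

    return {
        "rmse": rmse(y_valid, predictions),
        "fit_seconds": fit_seconds,
        "predict_seconds": predict_seconds,
    }


# Made-up toy data: one feature (vehicle age in years) and a sale price.
X_train, y_train = [[1], [3], [5], [7]], [9000, 7000, 5000, 3000]
X_valid, y_valid = [[2], [6]], [8000, 4000]

report = benchmark(MeanRegressor(), X_train, y_train, X_valid, y_valid)
print(report)
```

Running the same harness over several candidate models gives a comparable table of quality and timing figures from which to choose.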

2. Oil Well Location & Profitability Analysis

Folder: boreholes-ml
Notebook: boreholes-ml.ipynb
Description:
An energy company must decide where to drill new wells. Given samples from three regions with oil quality and reserve volume data, build a model to predict future yields. Then apply bootstrap sampling to estimate total profit and risk for each region, and recommend the region with the highest expected return.
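
The bootstrap step can be sketched as follows. This is a minimal illustration under stated assumptions, not the notebook's code: the function name, the sample size, the number of wells selected, the revenue per unit, and the budget are all hypothetical, and the reserve data is synthetic.

```python
import random


def bootstrap_profit(predicted, actual, *, sample_size, n_select,
                     revenue_per_unit, budget, n_boot=1000, seed=0):
    """Bootstrap the profit of drilling the n_select best-predicted wells.

    Each round draws sample_size candidate wells with replacement, picks the
    n_select wells with the highest predicted reserves, and values them by
    their actual reserves.
    """
    rng = random.Random(seed)
    indices = range(len(predicted))
    profits = []
    for _ in range(n_boot):
        sample = rng.choices(indices, k=sample_size)
        best = sorted(sample, key=lambda i: predicted[i], reverse=True)[:n_select]
        revenue = sum(actual[i] for i in best) * revenue_per_unit
        profits.append(revenue - budget)

    profits.sort()
    lower = profits[int(0.025 * n_boot)]
    upper = profits[int(0.975 * n_boot) - 1]
    return {
        "mean_profit": sum(profits) / n_boot,
        "interval_95": (lower, upper),
        # Risk is the share of bootstrap rounds that end in a loss.
        "loss_risk": sum(p < 0 for p in profits) / n_boot,
    }


# Synthetic region: actual reserves, and model predictions with added noise.
data_rng = random.Random(42)
actual = [data_rng.uniform(0, 200) for _ in range(2000)]
predicted = [a + data_rng.gauss(0, 30) for a in actual]

result = bootstrap_profit(predicted, actual, sample_size=100, n_select=20,
                          revenue_per_unit=1.0, budget=3000.0)
print(result)
```

Repeating the call once per region and comparing `mean_profit` against `loss_risk` is one way to arrive at a recommendation.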

3. Electric Scooter Rental Analysis

Folder: data-analysis
Notebook: e-scooters-analysis.ipynb
Description:
Perform statistical and exploratory analysis for a scooter-sharing service. Load user, trip, and subscription data; preprocess and merge; calculate revenue metrics for free vs. subscription plans; and test hypotheses to guide pricing and marketing strategies.
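
One way to test whether subscribers and free users differ in revenue is a permutation test, sketched below. The description above does not say which test the notebook uses, so treat this as an illustrative substitute; the function name and the revenue figures are made up.

```python
import random
from statistics import mean


def permutation_test(group_a, group_b, n_permutations=5000, seed=0):
    """Two-sided permutation test for a difference in group means.

    Returns the observed difference (mean_a - mean_b) and an approximate
    p-value: the share of random relabelings whose absolute difference is
    at least as large as the observed one.
    """
    rng = random.Random(seed)
    observed = mean(group_a) - mean(group_b)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)

    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        diff = mean(pooled[:n_a]) - mean(pooled[n_a:])
        if abs(diff) >= abs(observed):
            extreme += 1
    # The add-one correction keeps the estimate away from an impossible p = 0.
    p_value = (extreme + 1) / (n_permutations + 1)
    return observed, p_value


# Made-up monthly revenue per user, in arbitrary currency units.
subscribers = [362, 341, 390, 355, 372, 348, 401, 366]
free_users = [318, 305, 342, 297, 330, 311, 325, 308]

diff, p_value = permutation_test(subscribers, free_users)
print(f"difference in means: {diff:.3f}, p-value: {p_value:.4f}")
```

A small p-value suggests the revenue gap between plans is unlikely to be a labeling accident, which is the kind of evidence a pricing decision would rest on.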

4. Video Game Sales Analysis

Folder: data-analysis
Notebook: games-analysis.ipynb
Description:
Analyze global video game sales, user and critic ratings, genres, and platforms. Identify patterns that drive success and provide actionable insights for product development and promotional campaigns.

5. Real Estate Price Data Analysis

Folder: data-analysis
Notebook: real-estate-analysis.ipynb
Description:
Explore historical housing listings: clean anomalies, engineer features (price per square meter, floor type, distance to center, etc.), visualize distributions, and determine the key factors influencing property prices.
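
The engineered features named above can be sketched in plain Python. The field names (`price`, `total_area`, `floor`, `floors_total`, `center_distance_m`) and the sample listing are assumptions for illustration; the notebook's actual column names may differ.

```python
def floor_type(floor, floors_total):
    """Classify a flat's floor as 'first', 'last', or 'other'."""
    if floor == 1:
        return "first"
    if floor == floors_total:
        return "last"
    return "other"


def add_features(listing):
    """Return a copy of a listing dict with derived features added."""
    enriched = dict(listing)
    enriched["price_per_sqm"] = round(listing["price"] / listing["total_area"], 2)
    enriched["floor_type"] = floor_type(listing["floor"], listing["floors_total"])
    # Distance is assumed to be stored in meters; convert to whole kilometers.
    enriched["center_km"] = round(listing["center_distance_m"] / 1000)
    return enriched


# Hypothetical listing used only to exercise the functions.
listing = {
    "price": 5_400_000,
    "total_area": 54.0,
    "floor": 9,
    "floors_total": 9,
    "center_distance_m": 12_800,
}
enriched = add_features(listing)
print(enriched["price_per_sqm"], enriched["floor_type"], enriched["center_km"])
# prints: 100000.0 last 13
```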

6. Dairy Farm Yield & Taste Prediction

Folder: farm-ml
Notebook: farm-ml.ipynb
Description:
A dairy farm owner wants to select cows that will produce at least 6,000 kg of milk per year with a desirable taste. Build two models: a regression model to predict annual yield and a classification model to predict milk taste. Finally, recommend which cows to purchase based on both criteria.
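
Combining the two models' outputs into a purchase list can be sketched as a simple filter. The 6,000 kg threshold comes from the task above; the `CowPrediction` class, the 0.5 taste-probability cutoff, and the sample herd are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class CowPrediction:
    cow_id: str
    predicted_yield_kg: float  # output of the regression model
    tasty_probability: float   # output of the classification model


def select_cows(predictions, min_yield_kg=6000.0, min_tasty_probability=0.5):
    """Keep the IDs of cows that pass both the yield and the taste criterion."""
    return [
        cow.cow_id
        for cow in predictions
        if cow.predicted_yield_kg >= min_yield_kg
        and cow.tasty_probability >= min_tasty_probability
    ]


# Made-up predictions for four candidate cows.
herd = [
    CowPrediction("cow-01", 6350.0, 0.81),
    CowPrediction("cow-02", 5900.0, 0.92),
    CowPrediction("cow-03", 6120.0, 0.34),
    CowPrediction("cow-04", 6000.0, 0.50),
]
selected = select_cows(herd)
print(selected)  # prints: ['cow-01', 'cow-04']
```

Raising `min_tasty_probability` trades a shorter purchase list for fewer cows whose milk turns out not to be tasty, so the cutoff is a business decision rather than a fixed constant.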

7. Employee Satisfaction & Turnover Modeling

Folder: hr-ml
Notebook: hr-ml.ipynb
Description:
HR analytics for a large organization: predict employee satisfaction scores from survey data and model the probability of churn. Provide actionable recommendations to reduce turnover and associated costs.

8. Customer Purchasing Activity Segmentation

Folder: market-ml
Notebook: market-ml.ipynb
Description:
An e-commerce retailer wants to personalize offers for loyal customers to boost engagement. Using transaction, web-behavior, and communication data, label customers by activity level, build supervised models with pipelines, evaluate feature importance via SHAP, and perform customer segmentation with business recommendations.

9. Telecom Customer Churn Prediction

Folder: final-project
Notebook: final-project.ipynb
Description:
For a telecom operator, develop a model to predict contract cancellations. Combine contract, personal, internet, and phone service data to train and evaluate churn-prediction models. Deliver insights to target high-risk subscribers with retention offers.

10. Age Detection at Supermarket Checkout (Computer Vision)

Folder: neural-networks
Notebook / Script:

computer-vision-project.ipynb
computer-vision-project.py
Description:
Implement a computer-vision pipeline to estimate customer age group at supermarket checkouts. Preprocess image data, train a neural network, and build an inference workflow for personalized marketing and age-restricted sales compliance.

11. Airport Taxi Demand Forecasting

Folder: taxi-ml (time-series)
Notebook: taxi-ml.ipynb
Description:
Forecast hour-ahead taxi demand at airports using time-series modeling. The goal is to achieve RMSE ≤ 48 on the test set to ensure reliable driver allocation during peak periods.
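
A useful sanity check before training any model is a previous-hour baseline scored with RMSE, alongside the lag features a model would later consume. The sketch below assumes hourly order counts in a plain list; the function names and the sample series are illustrative, and only the RMSE ≤ 48 target comes from the task.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error between two equal-length sequences."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


def previous_value_forecast(series):
    """Predict each hour's orders as the previous hour's orders.

    Returns aligned (actual, predicted) lists; the first hour is skipped
    because it has no predecessor.
    """
    return series[1:], series[:-1]


def make_lag_features(series, n_lags):
    """Build rows of the n_lags preceding values and the matching targets."""
    rows, targets = [], []
    for t in range(n_lags, len(series)):
        rows.append(series[t - n_lags:t])
        targets.append(series[t])
    return rows, targets


# Made-up hourly order counts.
hourly_orders = [120, 135, 150, 140, 160, 170]

actual, predicted = previous_value_forecast(hourly_orders)
baseline_rmse = rmse(actual, predicted)
print(f"baseline RMSE: {baseline_rmse:.2f}, meets target: {baseline_rmse <= 48}")

rows, targets = make_lag_features(hourly_orders, n_lags=2)
print(rows[0], "->", targets[0])  # prints: [120, 135] -> 150
```

Lag rows only ever contain values from before the target hour, which avoids leaking future information into training.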

12. Toxic Comment Classification

Folder: text-ml
Notebooks:

text-ml.ipynb
text-ml-v1 BERT.ipynb
training-task-texts.ipynb
Description:
Train a text-classification model to detect toxic user comments on a product review platform. Using labeled data (toxic_comments.csv), achieve F1 ≥ 0.75 to enable automated moderation of harmful content.
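
The F1 target can be checked with a few lines of Python. This sketch computes binary F1 from scratch to show what the metric measures; the function name `binary_f1` and the sample labels are made up, and in practice a library metric would typically be used instead.

```python
def binary_f1(y_true, y_pred, positive=1):
    """F1 score for the positive class: harmonic mean of precision and recall."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    if tp == 0:
        # No true positives: precision or recall is zero, so F1 is zero.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Made-up labels: 1 marks a toxic comment, 0 a non-toxic one.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

score = binary_f1(y_true, y_pred)
print(f"F1 = {score:.2f}, meets target: {score >= 0.75}")
# prints: F1 = 0.75, meets target: True
```

F1 is preferred over accuracy here because toxic comments are usually a minority class, and a model that labels everything non-toxic would still score high accuracy.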

🛠 Technologies & Tools

Languages: Python
Libraries: pandas, NumPy, matplotlib, scikit-learn, statsmodels, XGBoost / LightGBM, TensorFlow / PyTorch, SHAP
Environments: Jupyter Notebook, GitHub
Techniques:
- Data cleaning & preprocessing
- Exploratory Data Analysis (EDA)
- Statistical hypothesis testing & A/B testing
- Regression (Linear, Gradient Boosting)
- Classification (Logistic Regression, Neural Networks, Transformers)
- Time-series forecasting
- Bootstrap & risk analysis
- Computer Vision pipelines
- Model pipelines & hyperparameter tuning
- Feature importance & interpretability (SHAP)

nikkhav/yandex-practicum-projects
