nikkhav/yandex-practicum-projects

Yandex Practicum Data Science Projects

This repository contains a series of hands-on projects completed as part of the Yandex Practicum Data Science program. Each folder or notebook corresponds to a real-world case study, showcasing skills in data analysis, statistical modeling, machine learning, and more.


📘 About the Course

Yandex Practicum’s Data Science track provides an immersive curriculum covering Python programming, data preprocessing, exploratory data analysis (EDA), statistical testing, machine learning model development, and deployment. Students work on diverse projects simulating business challenges in industries such as automotive, oil & gas, micromobility, gaming, real estate, agriculture, HR, telecom, retail, transportation, and content moderation.


📁 Project List

1. Used Car Price Estimation

Folder: auto-ml (gradient boosting)
Notebook: auto-ml.ipynb
Description:
A used-car pricing service is building an app that lets owners quickly estimate their vehicle’s market value. Using historical data on technical specifications, trim levels, and sale prices, develop and tune a gradient-boosting regression model that balances prediction quality, inference speed, and training time.
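
Because the model is judged on quality, inference speed, and training time together, it can help to record all three in one pass. The following is a minimal sketch, not the notebook's code: `MeanRegressor` and `benchmark` are hypothetical names, RMSE is assumed as the quality metric, and the stand-in model only mimics the `fit`/`predict` interface that gradient-boosting libraries typically expose.

```python
import time
from math import sqrt


class MeanRegressor:
    """Hypothetical stand-in model: always predicts the training mean."""

    def fit(self, X, y):
        self.mean_ = sum(y) / len(y)
        return self

    def predict(self, X):
        return [self.mean_ for _ in X]


def benchmark(model, X_train, y_train, X_test, y_test):
    """Return RMSE plus wall-clock training and inference time in seconds."""
    start = time.perf_counter()
    model.fit(X_train, y_train)
    fit_seconds = time.perf_counter() - start

    start = time.perf_counter()
    predictions = model.predict(X_test)
    predict_seconds = time.perf_counter() - start

    squared_errors = [(p - t) ** 2 for p, t in zip(predictions, y_test)]
    rmse = sqrt(sum(squared_errors) / len(y_test))
    return {"rmse": rmse, "fit_seconds": fit_seconds, "predict_seconds": predict_seconds}
```

Any object with `fit` and `predict` methods could be passed in place of `MeanRegressor`, so several candidate models can be compared on the same three numbers.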

2. Oil Well Location & Profitability Analysis

Folder: boreholes-ml
Notebook: boreholes-ml.ipynb
Description:
An energy company must decide where to drill new wells. Given samples from three regions with oil quality and reserve volume data, build a model to predict future yields. Then apply bootstrap sampling to estimate total profit and risk for each region, and recommend the region with the highest expected return.
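
The bootstrap step can be sketched roughly as follows. This is an illustration under stated assumptions rather than the notebook's implementation: `bootstrap_profit` is a hypothetical helper, its input is assumed to be a list of per-well profit figures for one region, the interval is assumed to be a 95% percentile interval, and "risk" is taken to mean the share of resamples with a negative total.

```python
import random


def bootstrap_profit(well_profits, sample_size, n_iterations=1000, seed=42):
    """Resample wells with replacement and summarize the total profit."""
    rng = random.Random(seed)
    totals = []
    for _ in range(n_iterations):
        # Draw `sample_size` wells with replacement and sum their profit.
        sample = rng.choices(well_profits, k=sample_size)
        totals.append(sum(sample))

    totals.sort()
    mean_profit = sum(totals) / n_iterations
    # Percentile-based 95% interval (assumed confidence level).
    lower = totals[int(0.025 * n_iterations)]
    upper = totals[int(0.975 * n_iterations) - 1]
    # Risk of loss: fraction of resamples whose total profit is negative.
    loss_risk = sum(total < 0 for total in totals) / n_iterations
    return {"mean": mean_profit, "ci": (lower, upper), "loss_risk": loss_risk}
```

Running the helper once per region gives comparable mean, interval, and risk figures on which a recommendation can be based.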

3. Electric Scooter Rental Analysis

Folder: data-analysis
Notebook: e-scooters-analysis.ipynb
Description:
Perform statistical and exploratory analysis for a scooter-sharing service. Load user, trip, and subscription data; preprocess and merge; calculate revenue metrics for free vs. subscription plans; and test hypotheses to guide pricing and marketing strategies.
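
As an illustration of the hypothesis-testing step, the sketch below compares the mean of some per-user metric between two plans. It uses a two-sided permutation test, which is a swapped-in technique chosen because it needs only the standard library; the notebook may well use a different test. The function name and its inputs are hypothetical.

```python
import random
from statistics import mean


def permutation_test(group_a, group_b, n_permutations=5000, seed=0):
    """Two-sided permutation test for a difference in group means.

    Returns the observed difference and an approximate p-value.
    """
    rng = random.Random(seed)
    observed = mean(group_a) - mean(group_b)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)

    extreme = 0
    for _ in range(n_permutations):
        # Reassign group labels at random and recompute the difference.
        rng.shuffle(pooled)
        diff = mean(pooled[:n_a]) - mean(pooled[n_a:])
        if abs(diff) >= abs(observed):
            extreme += 1

    # Add-one correction keeps the estimated p-value strictly above zero.
    p_value = (extreme + 1) / (n_permutations + 1)
    return observed, p_value
```

A small p-value suggests the two plans differ on the metric by more than random label assignment would explain; the significance threshold is left to the analyst.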

4. Video Game Sales Analysis

Folder: data-analysis
Notebook: games-analysis.ipynb
Description:
Analyze global video game sales, user and critic ratings, genres, and platforms. Identify patterns that drive success and provide actionable insights for product development and promotional campaigns.

5. Real Estate Price Data Analysis

Folder: data-analysis
Notebook: real-estate-analysis.ipynb
Description:
Explore historical housing listings: clean anomalies, engineer features (price per square meter, floor type, distance to center, etc.), visualize distributions, and determine the key factors influencing property prices.
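
The engineered features named above could be derived along these lines. This is a minimal sketch: the field names (`price`, `total_area`, `floor`, `floors_total`, `center_distance_m`) and the floor-type labels are assumptions for illustration, not taken from the dataset.

```python
def floor_type(floor, floors_total):
    """Classify a floor as 'first', 'last', or 'other'."""
    if floor == 1:
        return "first"
    if floor == floors_total:
        return "last"
    return "other"


def add_features(listing):
    """Return a copy of the listing dict with engineered features added."""
    enriched = dict(listing)
    # Price normalized by area makes listings of different sizes comparable.
    enriched["price_per_sqm"] = listing["price"] / listing["total_area"]
    enriched["floor_type"] = floor_type(listing["floor"], listing["floors_total"])
    # Distance to the city center in whole kilometers.
    enriched["center_distance_km"] = round(listing["center_distance_m"] / 1000)
    return enriched
```

The function returns a new dictionary instead of mutating its input, so the raw listing stays available for checks during cleaning.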

6. Dairy Farm Yield & Taste Prediction

Folder: farm-ml
Notebook: farm-ml.ipynb
Description:
A dairy farm owner wants to select cows that will produce at least 6,000 kg of milk per year with a desirable taste. Build two models: a regression model to predict annual yield and a classification model to predict milk taste. Finally, recommend which cows to purchase based on both criteria.
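
Once both models have produced predictions, the final recommendation reduces to a filter on two conditions. The sketch below assumes each cow is a dictionary with hypothetical keys `id`, `predicted_yield_kg`, and `taste_probability`; the 6,000 kg threshold comes from the task statement, while the 0.5 taste-probability cutoff is an assumed default that would normally be tuned.

```python
def select_cows(cows, min_yield_kg=6000, min_taste_probability=0.5):
    """Return IDs of cows that satisfy both the yield and taste criteria."""
    return [
        cow["id"]
        for cow in cows
        if cow["predicted_yield_kg"] >= min_yield_kg
        and cow["taste_probability"] >= min_taste_probability
    ]
```

Raising `min_taste_probability` trades fewer recommended cows for higher confidence that the milk of each recommended cow will be tasty.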

7. Employee Satisfaction & Turnover Modeling

Folder: hr-ml
Notebook: hr-ml.ipynb
Description:
HR analytics for a large organization: predict employee satisfaction scores from survey data and model the probability of churn. Provide actionable recommendations to reduce turnover and associated costs.

8. Customer Purchasing Activity Segmentation

Folder: market-ml
Notebook: market-ml.ipynb
Description:
An e-commerce retailer wants to personalize offers for loyal customers to boost engagement. Using transaction, web-behavior, and communication data, label customers by activity level, build supervised models with pipelines, evaluate feature importance via SHAP, and perform customer segmentation with business recommendations.

9. Telecom Customer Churn Prediction

Folder: final-project
Notebook: final-project.ipynb
Description:
For a telecom operator, develop a model to predict contract cancellations. Combine contract, personal, internet, and phone service data to train and evaluate churn-prediction models. Deliver insights to target high-risk subscribers with retention offers.

10. Age Detection at Supermarket Checkout (Computer Vision)

Folder: neural-networks
Notebook / Script:

  • computer-vision-project.ipynb
  • computer-vision-project.py

Description:
Implement a computer-vision pipeline to estimate customer age group at supermarket checkouts. Preprocess image data, train a neural network, and build an inference workflow for personalized marketing and age-restricted sales compliance.

11. Airport Taxi Demand Forecasting

Folder: taxi-ml (time-series)
Notebook: taxi-ml.ipynb
Description:
Forecast hour-ahead taxi demand at airports using time-series modeling. The goal is to achieve RMSE ≤ 48 on the test set to ensure reliable driver allocation during peak periods.
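
To make the RMSE target concrete, the sketch below computes RMSE and scores a naive baseline that predicts each hour's demand as the previous hour's value. The baseline is an assumption added for illustration (a common sanity check for time-series models), and both function names are hypothetical.

```python
from math import sqrt


def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return sqrt(sum(squared_errors) / len(squared_errors))


def naive_baseline_rmse(hourly_orders):
    """RMSE of predicting each hour with the value of the previous hour."""
    actual = hourly_orders[1:]       # hours that have a predecessor
    predicted = hourly_orders[:-1]   # previous-hour values used as forecasts
    return rmse(actual, predicted)
```

A trained model is only useful if it beats this baseline on the same test period, in addition to meeting the RMSE ≤ 48 requirement.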

12. Toxic Comment Classification

Folder: text-ml
Notebooks:

  • text-ml.ipynb
  • text-ml-v1 BERT.ipynb
  • training-task-texts.ipynb

Description:
Train a text-classification model to detect toxic user comments on a product review platform. Using labeled data (toxic_comments.csv), achieve F1 ≥ 0.75 to enable automated moderation of harmful content.
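
For reference, the F1 target is the harmonic mean of precision and recall on the positive (toxic) class. The following is a small standard-library sketch of that calculation; the function name and the convention that label `1` means "toxic" are assumptions for illustration.

```python
def f1_score_binary(y_true, y_pred, positive=1):
    """F1 for the positive class; returns 0.0 when it is undefined."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)

    if tp == 0:
        # No true positives: precision or recall is zero or undefined.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```

Because toxic comments are usually the minority class, F1 is more informative here than plain accuracy, which a model could inflate by predicting "not toxic" for everything.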

🛠 Technologies & Tools

  • Languages: Python
  • Libraries: pandas, NumPy, matplotlib, scikit-learn, statsmodels, XGBoost / LightGBM, TensorFlow / PyTorch, SHAP
  • Environments: Jupyter Notebook, GitHub
  • Techniques:
    • Data cleaning & preprocessing
    • Exploratory Data Analysis (EDA)
    • Statistical hypothesis testing & A/B testing
    • Regression (Linear, Gradient Boosting)
    • Classification (Logistic Regression, Neural Networks, Transformers)
    • Time-series forecasting
    • Bootstrap & risk analysis
    • Computer Vision pipelines
    • Model pipelines & hyperparameter tuning
    • Feature importance & interpretability (SHAP)
