This repository contains practical exercises and mini-projects related to data preprocessing and data statistics.
📂 Data Practice/
Exercises focused on data preprocessing techniques.
- Titanic_Data Cleaning: Data cleaning with Pandas, covering missing values, outliers, duplicates, and text and datetime processing.
- Salary_Data Transformation: Data transformation with Pandas, covering merging datasets, summarizing, handling missing values and outliers, aggregation, pivot tables, log transformation, one-hot encoding, scaling, and PCA.
- Speed Dating_Feature Engineering: Feature engineering with Pandas, using the Speed Dating dataset to derive additional insights from the provided data.
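
As a rough illustration of the cleaning steps listed for the Titanic exercise, here is a minimal sketch. The toy DataFrame, its column names, and the `clean` helper are hypothetical and not taken from the notebooks. The median fill and the 1.5 × IQR rule are just one common choice among several.

```python
import pandas as pd

# Hypothetical toy data standing in for a Titanic-style table.
df = pd.DataFrame(
    {
        "name": [" Alice ", "Bob", "Bob", "Cara", "Dan", "Eve"],
        "age": [22.0, None, None, 35.0, 29.0, 31.0],
        "fare": [7.25, 8.05, 8.05, 512.33, 13.0, 9.5],
    }
)


def clean(frame: pd.DataFrame) -> pd.DataFrame:
    """Apply basic text, duplicate, missing-value, and outlier handling."""
    out = frame.copy()
    # Text processing: trim stray whitespace in names.
    out["name"] = out["name"].str.strip()
    # Duplicate removal: drop rows that are identical across all columns.
    out = out.drop_duplicates().reset_index(drop=True)
    # Missing values: fill age with the median of the observed ages.
    out["age"] = out["age"].fillna(out["age"].median())
    # Outliers: flag fares more than 1.5 * IQR beyond the quartiles.
    q1, q3 = out["fare"].quantile([0.25, 0.75])
    iqr = q3 - q1
    out["fare_outlier"] = (out["fare"] < q1 - 1.5 * iqr) | (out["fare"] > q3 + 1.5 * iqr)
    return out


cleaned = clean(df)
print(cleaned)
```

Flagging outliers in a separate column, rather than dropping them, leaves the decision about what to do with them to a later step.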
📁 Data Project/
Projects that apply data preprocessing techniques to real-world datasets.
- Segmentation_Project: Analysis based on RFM (Recency, Frequency, Monetary) segmentation.
- TaxiFare_Project: Analysis based on data cleaning techniques.
- Used Car Prices_Project: Analysis based on data transformation techniques.
- Credit Transaction Anomaly Detection_Project: Analysis based on feature engineering techniques.
- AARRR & Statistical Analysis: Analysis based on the AARRR framework.
- Basic Statistics: Understanding the fundamentals of statistics.
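
To give a feel for the RFM idea behind Segmentation_Project, the sketch below computes recency, frequency, and monetary value per customer. The transaction table, its column names, the reference date, and the `rfm_table` helper are all assumptions made for illustration. The project itself may define these quantities differently.

```python
import pandas as pd

# Hypothetical transaction log; column names are assumptions.
tx = pd.DataFrame(
    {
        "customer_id": ["a", "a", "b", "c", "c", "c"],
        "order_date": pd.to_datetime(
            [
                "2024-01-05",
                "2024-03-01",
                "2024-01-20",
                "2024-02-10",
                "2024-02-25",
                "2024-03-09",
            ]
        ),
        "amount": [50.0, 70.0, 20.0, 10.0, 15.0, 30.0],
    }
)


def rfm_table(frame: pd.DataFrame, as_of: pd.Timestamp) -> pd.DataFrame:
    """Return one row per customer with recency (days), frequency, monetary."""
    grouped = frame.groupby("customer_id").agg(
        last_order=("order_date", "max"),
        frequency=("order_date", "count"),
        monetary=("amount", "sum"),
    )
    # Recency: days between the reference date and the latest order.
    grouped["recency"] = (as_of - grouped["last_order"]).dt.days
    return grouped[["recency", "frequency", "monetary"]]


rfm = rfm_table(tx, pd.Timestamp("2024-03-10"))
print(rfm)
```

A typical next step is to bin each of the three columns into scores and combine them into segments. That part is left out here because the binning scheme is a project-specific choice.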
🛠️ Tech Stack
Pandas, NumPy, SQL, BigQuery, Jupyter Notebook, Google Colab
📊 Techniques
- Missing value handling, Outlier detection, Duplicate removal, One-hot encoding, Scaling, Principal Component Analysis (PCA), Feature engineering
- Acquisition, Activation, Retention, Revenue (ARPU), Revenue (CLV), Distribution Visualization, One-Sample t-Test, Independent Sample t-Test, Paired Sample t-Test, Sampling, Confidence Interval, Hypothesis Testing, A/B Test
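
For the one-sample t-test in the list above, the statistic can be sketched with the standard library alone. The sample values, the hypothesized mean, and the `one_sample_t` helper below are made up for illustration. Turning the statistic into a p-value needs the t distribution, for example via `scipy.stats.ttest_1samp`, which is not used here.

```python
import math
import statistics


def one_sample_t(sample, mu0):
    """Return (t statistic, degrees of freedom) for H0: mean == mu0."""
    n = len(sample)
    mean = statistics.mean(sample)
    # statistics.stdev uses the n - 1 (sample) denominator.
    std_err = statistics.stdev(sample) / math.sqrt(n)
    return (mean - mu0) / std_err, n - 1


# Hypothetical measurements tested against a hypothesized mean of 5.0.
sample = [5.1, 4.9, 5.3, 5.0, 5.2]
t_stat, dof = one_sample_t(sample, 5.0)
print(f"t = {t_stat:.3f}, dof = {dof}")
```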