
A collection of Python scripts demonstrating core statistical concepts like percentile analysis, Z-scores, modified Z-tests, and cosine similarity with real datasets and visualizations.


SaurabhSSB/statistics_workout


📊 statistics_workout

This repository contains a collection of Python scripts that explore fundamental statistical concepts using real-world datasets. The exercises cover percentile-based filtering, Z-score calculations, modified Z-scores, and cosine similarity, with visualizations built in Seaborn and Matplotlib.

📁 Contents

| File Name | Description |
| --- | --- |
| 1_percentile.py | Calculates percentiles and removes outliers based on the 90th percentile of household size. |
| 2_mean_absolute_deviation_standard_deviation_z_value.py | Performs outlier detection using standard deviation and Z-scores on BMI data. |
| 3_log.py | Visualizes highway population data and introduces logarithmic plotting. |
| 4_Normal.py | Plots income vs credit limit with log-scaled axes using Seaborn. |
| 5_cosine.py | Demonstrates cosine similarity and cosine distance for basic NLP-like document vectors. |
| 6_modified_z_test.py | Implements both standard Z-score and Modified Z-score methods for income-based outlier detection. |
| modified_z_score.xlsx | Example Excel sheet supporting the modified Z-score implementation. |
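
To give a feel for the first two techniques in the table, here is a minimal sketch of percentile-based filtering and Z-score outlier flagging with pandas. The column names (`household_size`, `bmi`), the sample values, and the Z-score threshold of 2 are assumptions made for this illustration; they are not taken from the repository's scripts or datasets.

```python
import pandas as pd

# Hypothetical sample data; the real scripts read their own CSV files.
df = pd.DataFrame({
    "household_size": [1, 2, 2, 3, 3, 4, 4, 5, 6, 15],
    "bmi": [21.0, 22.5, 23.0, 24.1, 24.8, 25.3, 26.0, 27.2, 28.0, 55.0],
})

# Percentile-based filtering: keep rows at or below the 90th percentile.
p90 = df["household_size"].quantile(0.90)
filtered = df[df["household_size"] <= p90]

# Z-score: how many standard deviations each value lies from the mean.
mean = df["bmi"].mean()
std = df["bmi"].std()  # pandas defaults to the sample standard deviation (ddof=1)
df["bmi_z"] = (df["bmi"] - mean) / std

# Flag values whose absolute Z-score exceeds the chosen threshold (assumed: 2).
z_outliers = df[df["bmi_z"].abs() > 2]

print(f"90th percentile of household_size: {p90}")
print(f"Rows kept after percentile filter: {len(filtered)}")
print(z_outliers[["bmi", "bmi_z"]])
```

A threshold of 3 is a common convention on larger datasets, but in a sample of only ten values no point can reach a Z-score of 3, which is why this toy example uses 2.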

🔧 Technologies Used

  • Python 3.x
  • pandas
  • numpy
  • matplotlib
  • seaborn
  • scikit-learn

📌 Key Concepts

  • Descriptive Statistics
  • Percentile Analysis
  • Outlier Detection (Z-score, Modified Z-score)
  • Data Cleaning & Preprocessing
  • Cosine Similarity & Distance
  • Data Visualization
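
Two of the concepts listed above, the modified Z-score and cosine similarity, can be sketched with NumPy alone. The example below is only an illustration under stated assumptions: the helper names, the sample incomes, and the document vectors are hypothetical. The constant 0.6745 and the cutoff of 3.5 follow the commonly cited Iglewicz and Hoaglin convention and may differ from what `6_modified_z_test.py` actually uses. Likewise, `5_cosine.py` may rely on scikit-learn rather than a hand-written function.

```python
import numpy as np

def modified_z_scores(values):
    """Return modified Z-scores based on the median and the MAD."""
    x = np.asarray(values, dtype=float)
    median = np.median(x)
    mad = np.median(np.abs(x - median))  # median absolute deviation
    if mad == 0:
        raise ValueError("MAD is zero; modified Z-scores are undefined")
    return 0.6745 * (x - median) / mad

def cosine_similarity(a, b):
    """Return the cosine of the angle between two non-zero vectors."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hypothetical incomes (in thousands) with one extreme value.
incomes = [40, 42, 45, 47, 50, 52, 55, 500]
scores = modified_z_scores(incomes)
outliers = [v for v, s in zip(incomes, scores) if abs(s) > 3.5]

# Hypothetical term-count vectors for three small "documents".
doc_a = [3, 1, 0]
doc_b = [6, 2, 0]  # same direction as doc_a, twice the length
doc_c = [0, 0, 4]  # shares no terms with doc_a

sim_ab = cosine_similarity(doc_a, doc_b)
sim_ac = cosine_similarity(doc_a, doc_c)
dist_ab = 1.0 - sim_ab  # cosine distance is one minus cosine similarity

print("Modified Z-score outliers:", outliers)
print("Similarity a-b:", sim_ab, "distance a-b:", dist_ab)
print("Similarity a-c:", sim_ac)
```

Because the modified Z-score uses the median and the MAD instead of the mean and standard deviation, a single extreme income barely shifts the baseline, so the outlier stands out clearly. Cosine similarity ignores vector length, which is why `doc_a` and `doc_b` come out as essentially identical.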

🚀 How to Run

  1. Clone the repository:
    git clone https://github.com/SaurabhSSB/statistics_workout.git
    cd statistics_workout
  2. Install required libraries:
    pip install pandas numpy matplotlib seaborn scikit-learn
  3. Run any script using:
    python filename.py

⚠️ Ensure that the required CSV files are in the paths the scripts expect, or update the paths in the scripts accordingly.
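
One possible way to make a missing dataset fail early with a clear message is to resolve the path before loading it. This is a sketch only: the `resolve_csv` helper and the `households.csv` file name are hypothetical and do not appear in the repository.

```python
from pathlib import Path

def resolve_csv(filename, base_dir=None):
    """Return the absolute path to a CSV file, or raise if it is missing."""
    # Fall back to the current working directory when no base is given.
    base = Path(base_dir) if base_dir is not None else Path.cwd()
    path = (base / filename).resolve()
    if not path.is_file():
        raise FileNotFoundError(
            f"Expected dataset at {path}; move the file there or update the path"
        )
    return path
```

Inside a script, a call such as `resolve_csv("households.csv", Path(__file__).parent)` would look for the file next to the script rather than in whatever directory the script was launched from.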

📥 Download

Click here to download this repository as a ZIP file


📫 Contact

If you have questions or suggestions, feel free to reach out via GitHub Issues.
