This repository showcases a collection of well-documented projects focused on hypothesis testing and both descriptive and inferential statistical analysis. It includes practical implementations of statistical tests such as t-tests, ANOVA, chi-square, Mann–Whitney U, and correlation analysis, alongside summary statistics and visual insights.

Jabulente/Statistical-Testing-in-Data-Analysis


📊 Statistical Testing in Data Analysis – Practical Implementation

📜 Overview

This repository is a comprehensive and hands-on resource for implementing statistical tests within modern data analysis workflows. It bridges the gap between statistical theory and practical application by providing ready-to-use Python scripts and annotated Jupyter notebooks designed to work with both real-world and synthetically generated datasets.

The structure and content of the repository are carefully organized to guide users from basic descriptive statistics to advanced inferential testing, covering concepts such as:

  • Comparing group means and variances
  • Testing relationships between variables
  • Evaluating model assumptions before applying statistical methods
  • Translating test results into actionable business, scientific, or policy insights
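As a brief illustration of testing a relationship between two variables, the sketch below contrasts Pearson's r (linear association) with Spearman's rho (monotonic association computed on ranks) using `scipy.stats.pearsonr` and `scipy.stats.spearmanr`. The data are synthetic and the variable names are illustrative assumptions, not files or functions from this repository.

```python
import numpy as np
from scipy import stats

# Synthetic data (assumption): a monotonic, non-linear relationship plus noise
rng = np.random.default_rng(42)
x = rng.uniform(0, 10, size=100)
y = np.exp(0.3 * x) + rng.normal(0, 1, size=100)

# Pearson: strength of the linear association
pearson_r, pearson_p = stats.pearsonr(x, y)

# Spearman: strength of the monotonic association (computed on ranks)
spearman_rho, spearman_p = stats.spearmanr(x, y)

print(f"Pearson r    = {pearson_r:.3f} (p = {pearson_p:.3g})")
print(f"Spearman rho = {spearman_rho:.3f} (p = {spearman_p:.3g})")
```

When a relationship is monotonic but curved, or when outliers are present, Spearman's rho is often the more robust summary; reporting both makes the choice explicit.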

Whether your focus is academic research, business intelligence, agriculture, healthcare, or general analytics, this repository equips you with clear, reusable, and scalable tools for performing rigorous and interpretable statistical analyses.

Objectives

The core mission of this repository is to demystify statistical testing and make it a reliable companion in data-driven decision-making.

Key goals include:

  1. Practical Accessibility – Provide ready-to-deploy scripts for commonly used tests (t-tests, ANOVA, correlation, regression diagnostics, non-parametric tests, etc.).
  2. Interpretation Beyond p-values – Equip users with statistical reasoning skills, encouraging context-aware conclusions rather than blind reliance on significance thresholds.
  3. Workflow Integration – Ensure that tests fit seamlessly into data analysis pipelines, from data cleaning to result reporting.
  4. Educational Clarity – Combine in-line explanations, interpretation guides, and best practices so that both beginners and experienced analysts can benefit.
  5. Reusability and Scalability – Provide well-structured code that adapts to various datasets and domains without heavy modification.
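To make "interpretation beyond p-values" concrete, here is a minimal sketch, assuming two independent samples of synthetic data, that reports Welch's t-test together with Cohen's d and a 95% confidence interval for the difference in means. The group labels are hypothetical, and Cohen's d here uses the pooled standard deviation, which is one common convention among several.

```python
import numpy as np
from scipy import stats

# Synthetic samples (assumption): e.g. a control and a treatment group
rng = np.random.default_rng(0)
group_a = rng.normal(loc=50, scale=10, size=40)
group_b = rng.normal(loc=55, scale=12, size=45)
n_a, n_b = len(group_a), len(group_b)

# Welch's t-test: does not assume equal variances
t_stat, p_value = stats.ttest_ind(group_b, group_a, equal_var=False)

# Cohen's d with the pooled standard deviation (one common convention)
var_a, var_b = group_a.var(ddof=1), group_b.var(ddof=1)
pooled_sd = np.sqrt(((n_a - 1) * var_a + (n_b - 1) * var_b) / (n_a + n_b - 2))
diff = group_b.mean() - group_a.mean()
cohens_d = diff / pooled_sd

# 95% CI for the mean difference, using Welch's standard error and
# Welch-Satterthwaite degrees of freedom (consistent with the test above)
sq_se_a, sq_se_b = var_a / n_a, var_b / n_b  # squared standard errors
se_diff = np.sqrt(sq_se_a + sq_se_b)
dof = (sq_se_a + sq_se_b) ** 2 / (sq_se_a**2 / (n_a - 1) + sq_se_b**2 / (n_b - 1))
t_crit = stats.t.ppf(0.975, dof)
ci_low, ci_high = diff - t_crit * se_diff, diff + t_crit * se_diff

print(f"t = {t_stat:.2f}, p = {p_value:.3g}, d = {cohens_d:.2f}")
print(f"Mean difference = {diff:.2f}, 95% CI [{ci_low:.2f}, {ci_high:.2f}]")
```

Because the interval uses the same Welch standard error and degrees of freedom as the test, it excludes zero exactly when the two-sided p-value is below 0.05, while also showing how large the difference plausibly is.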

📦 Scope and Features

This repository goes beyond simple code snippets. It covers:

  • Descriptive Statistics – Summarization of datasets with measures of central tendency, dispersion, and distribution shape.

  • Inferential Statistics

    • Parametric tests (t-tests, ANOVA, Pearson correlation)
    • Non-parametric tests (Mann–Whitney U, Kruskal–Wallis, Spearman correlation)
    • Tests for categorical data and proportions (chi-square, Fisher’s exact)
    • Regression diagnostics and residual analysis
  • Effect Size Metrics – Quantifying the magnitude of observed effects to assess practical significance.

  • Assumption Checking – Verifying the conditions for valid statistical inference (normality, homogeneity of variance, independence).

  • Data Visualization for Statistics – Tailored plots to illustrate statistical findings (boxplots, violin plots, Q-Q plots, regression plots, confidence intervals).

  • Reproducible Jupyter Notebooks – Combining code, outputs, and commentary for self-contained learning and reference.
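A minimal sketch of descriptive statistics followed by assumption checking, assuming a long-format pandas DataFrame with a hypothetical `group` column and a numeric `value` column (synthetic here): per-group summaries of central tendency, dispersion, and shape, then Shapiro–Wilk normality tests and Levene's test for equal variances.

```python
import numpy as np
import pandas as pd
from scipy import stats

# Synthetic long-format data (assumption): one grouping column, one numeric column
rng = np.random.default_rng(1)
df = pd.DataFrame({
    "group": np.repeat(["A", "B"], 50),
    "value": np.concatenate([rng.normal(20, 3, 50), rng.normal(23, 5, 50)]),
})

# Descriptive statistics per group: central tendency, dispersion, and shape
summary = df.groupby("group")["value"].agg(
    mean="mean",
    median="median",
    std="std",
    iqr=lambda s: s.quantile(0.75) - s.quantile(0.25),
    skew="skew",
    kurtosis=lambda s: s.kurt(),  # excess kurtosis (normal distribution = 0)
)
print(summary.round(2))

# Assumption check 1: Shapiro-Wilk normality test per group (H0: data are normal)
normality_p = {name: stats.shapiro(g).pvalue for name, g in df.groupby("group")["value"]}

# Assumption check 2: Levene's test for homogeneity of variance (H0: equal variances);
# center="median" (SciPy's default) is the Brown-Forsythe variant, robust to non-normality
samples = [g.to_numpy() for _, g in df.groupby("group")["value"]]
levene_stat, levene_p = stats.levene(*samples, center="median")

print("Shapiro-Wilk p-values:", {k: round(float(v), 3) for k, v in normality_p.items()})
print(f"Levene p-value: {levene_p:.3g}")
```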


🛠️ Technologies and Libraries

This repository is implemented in Python 3.x and leverages the following core packages:

  • Pandas – High-performance data manipulation and analysis.
  • NumPy – Foundational package for numerical computing.
  • SciPy (scipy.stats) – Implementation of a wide range of statistical tests.
  • Statsmodels – Advanced statistical modeling, regression, and inference.
  • Matplotlib – Flexible and highly customizable plotting library.
  • Seaborn – Statistical data visualization built on Matplotlib.
  • Jupyter Notebooks – Interactive execution with embedded explanations and results.
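The sketch below, assuming synthetic data with a single predictor, shows one way statsmodels and SciPy can be combined in a simple regression workflow: an ordinary least squares fit via the formula API, followed by a Breusch–Pagan test for heteroscedasticity and a Shapiro–Wilk test on the residuals. The column names `x` and `y` are illustrative assumptions.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
from scipy import stats
from statsmodels.stats.diagnostic import het_breuschpagan

# Synthetic data (assumption): y depends linearly on x with constant-variance noise
rng = np.random.default_rng(7)
data = pd.DataFrame({"x": rng.uniform(0, 10, 200)})
data["y"] = 2.0 + 1.5 * data["x"] + rng.normal(0, 2, 200)

# Ordinary least squares via the formula API (an intercept is added automatically)
model = smf.ols("y ~ x", data=data).fit()
print(model.params)      # estimated intercept and slope
print(model.conf_int())  # 95% confidence intervals for the coefficients

# Residual diagnostics
# Breusch-Pagan: H0 = residual variance does not depend on the regressors
bp_lm, bp_lm_p, bp_f, bp_f_p = het_breuschpagan(model.resid, model.model.exog)
# Shapiro-Wilk on residuals: H0 = residuals are normally distributed
resid_normality_p = stats.shapiro(model.resid).pvalue

print(f"Breusch-Pagan p = {bp_lm_p:.3g}, Shapiro-Wilk (residuals) p = {resid_normality_p:.3g}")
```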

📂 Structure

├── /notebooks            # Annotated Jupyter Notebooks for each statistical test
├── /scripts              # Modular Python scripts for integration into projects
├── /data                 # Sample real-world and synthetic datasets
├── /visualizations       # Exported statistical charts and diagnostic plots
├── README.md             # Project documentation
└── requirements.txt      # Dependencies list

🚀 How to Use

  1. Clone this repository

    git clone https://github.com/Jabulente/Statistical-Testing-in-Data-Analysis.git
    cd Statistical-Testing-in-Data-Analysis
  2. Install dependencies

    pip install -r requirements.txt
  3. Explore the example notebooks. Open JupyterLab or Jupyter Notebook and browse the /notebooks directory for step-by-step demonstrations.

  4. Adapt the scripts for your project. Import functions from /scripts into your analysis pipeline for quick deployment.


📖 Learning Path

The repository is structured to gradually increase complexity:

  1. Foundations – Basic descriptive statistics and visual summaries.
  2. Basic Inferential Tests – t-tests, correlations, chi-square tests.
  3. Intermediate Level – One-way and two-way ANOVA, non-parametric equivalents.
  4. Advanced Analysis – Regression inference, mixed-effects models, multivariate testing.
  5. Interpretation Skills – Effect size, confidence intervals, real-world implications.
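For the intermediate stage, a minimal sketch, assuming three hypothetical treatment groups of synthetic yield data, runs a one-way ANOVA alongside its non-parametric counterpart (Kruskal–Wallis) and reports eta-squared as an effect size.

```python
import numpy as np
from scipy import stats

# Synthetic yields for three hypothetical treatments (assumption)
rng = np.random.default_rng(3)
groups = {
    "treatment_A": rng.normal(30, 4, 25),
    "treatment_B": rng.normal(33, 4, 25),
    "treatment_C": rng.normal(35, 4, 25),
}
samples = list(groups.values())

# Parametric: one-way ANOVA (assumes normality and equal variances)
f_stat, anova_p = stats.f_oneway(*samples)

# Non-parametric counterpart: Kruskal-Wallis H test (rank-based)
h_stat, kruskal_p = stats.kruskal(*samples)

# Effect size: eta-squared = SS_between / SS_total
all_values = np.concatenate(samples)
grand_mean = all_values.mean()
ss_between = sum(len(s) * (s.mean() - grand_mean) ** 2 for s in samples)
ss_total = ((all_values - grand_mean) ** 2).sum()
eta_squared = ss_between / ss_total

print(f"ANOVA: F = {f_stat:.2f}, p = {anova_p:.3g}, eta^2 = {eta_squared:.2f}")
print(f"Kruskal-Wallis: H = {h_stat:.2f}, p = {kruskal_p:.3g}")
```

A significant omnibus result only says that at least one group mean differs; a post-hoc procedure such as Tukey's HSD is typically used to identify which pairs differ.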

🧠 Why This Matters

In modern data science, statistical literacy is essential, not optional. Misapplied tests, misunderstood p-values, or ignored assumptions can lead to faulty conclusions with real-world consequences. This repository helps analysts, researchers, and decision-makers:

  • Avoid statistical pitfalls
  • Validate claims with rigorous methods
  • Communicate findings with clarity and impact
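One common pitfall is applying the chi-square test to a contingency table with small expected counts. The hypothetical helper below, a sketch rather than a function from this repository, checks expected frequencies with `scipy.stats.chi2_contingency` and falls back to `scipy.stats.fisher_exact` for 2×2 tables when any expected count is below 5 (a conventional rule of thumb, not a hard limit).

```python
import numpy as np
from scipy import stats


def independence_test_2x2(table, min_expected=5.0):
    """Hypothetical helper: choose chi-square or Fisher's exact test for a 2x2 table.

    Returns (test_name, p_value).
    """
    table = np.asarray(table)
    if table.shape != (2, 2):
        raise ValueError("this sketch only handles 2x2 tables")

    # For 2x2 tables, chi2_contingency applies Yates' continuity correction by default
    _, p_value, _, expected = stats.chi2_contingency(table)

    if (expected < min_expected).any():
        # Small expected counts make the chi-square approximation unreliable
        _, p_value = stats.fisher_exact(table)
        return "fisher_exact", float(p_value)
    return "chi_square", float(p_value)


# Usage with made-up counts (rows: groups, columns: outcome yes/no)
print(independence_test_2x2([[3, 1], [1, 4]]))      # small expected counts
print(independence_test_2x2([[40, 20], [25, 35]]))  # adequate expected counts
```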

📜 License

This project is released under the MIT License, allowing full use, modification, and distribution with attribution.
