Skip to content

Add comprehensive statistical testing utilities#17

Open
dgenio wants to merge 1 commit intodevelopfrom
feature/statistical-testing
Open

Add comprehensive statistical testing utilities#17
dgenio wants to merge 1 commit intodevelopfrom
feature/statistical-testing

Conversation

@dgenio
Copy link
Owner

@dgenio dgenio commented Sep 14, 2025

  • Add t_test, mann_whitney_u_test for comparing samples
  • Add chi_square_test for goodness of fit and independence
  • Add kolmogorov_smirnov_test for distribution testing
  • Add bootstrap_confidence_interval for robust CI estimation
  • Add permutation_test for non-parametric testing
  • Add multiple_comparison_correction for multiple testing
  • Add power_analysis and sample_size_calculation for study design
  • Include comprehensive error handling and validation
  • Add StatisticalTest dataclass for structured results
  • Update init.py to export statistical functions
  • Add comprehensive test suite for all statistical functions

Pull Request

📋 Description

Type of Change

  • 🐛 Bug fix (non-breaking change which fixes an issue)
  • ✨ New feature (non-breaking change which adds functionality)
  • 💥 Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • 📚 Documentation update
  • 🧹 Code refactoring (no functional changes)
  • ⚡ Performance improvement
  • 🧪 Test improvements
  • 🔧 Build/CI improvements

🔗 Related Issues

  • Fixes #
  • Related to #

🧪 Testing

Test Coverage

  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes
  • I have added integration tests if applicable

Manual Testing

  • Tested locally with make check
  • Tested with example scripts
  • Tested edge cases

Test commands run:

# List the commands you used to test
make check
python examples/quickstart.py

📝 Changes Made

Code Changes

API Changes

  • No API changes
  • Backward compatible API additions
  • Breaking API changes (requires major version bump)

API changes:

📚 Documentation

  • I have updated the documentation accordingly
  • I have updated docstrings for new/modified functions
  • I have added examples if applicable
  • I have updated the CHANGELOG.md

✅ Checklist

Code Quality

  • My code follows the project's style guidelines
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • My changes generate no new warnings
  • I have run ruff check and ruff format
  • I have run mypy type checking

Testing & CI

  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes
  • All CI checks pass
  • Code coverage is maintained or improved

Documentation & Communication

  • I have made corresponding changes to the documentation
  • My commit messages follow the conventional commit format
  • I have updated the CHANGELOG.md if applicable

🔍 Review Notes

Focus Areas

Questions for Reviewers

📸 Screenshots/Examples

# Example usage of new feature
import skdr_eval

# Your example here

🚀 Deployment Notes

  • No special deployment considerations
  • Requires database migrations
  • Requires environment variable changes
  • Requires dependency updates

Additional Context:

- Add t_test, mann_whitney_u_test for comparing samples
- Add chi_square_test for goodness of fit and independence
- Add kolmogorov_smirnov_test for distribution testing
- Add bootstrap_confidence_interval for robust CI estimation
- Add permutation_test for non-parametric testing
- Add multiple_comparison_correction for multiple testing
- Add power_analysis and sample_size_calculation for study design
- Include comprehensive error handling and validation
- Add StatisticalTest dataclass for structured results
- Update __init__.py to export statistical functions
- Add comprehensive test suite for all statistical functions
@dgenio dgenio requested a review from Copilot September 14, 2025 10:18
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR adds comprehensive statistical testing utilities to the skdr-eval library to enhance its analytical capabilities for offline policy evaluation. The changes include a wide range of statistical tests and utilities commonly needed in experimental design and hypothesis testing.

  • Implementation of parametric (t-test) and non-parametric (Mann-Whitney U) tests for sample comparison
  • Addition of goodness-of-fit tests (chi-square, Kolmogorov-Smirnov) for distribution testing
  • Bootstrap and permutation testing methods for robust statistical inference
  • Power analysis and sample size calculation utilities for experimental planning
  • Multiple comparison correction methods and comprehensive error handling

Reviewed Changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 8 comments.

File Description
src/skdr_eval/statistical.py Main implementation of statistical functions with comprehensive error handling and structured results
src/skdr_eval/init.py Export statistical functions and exception classes to the public API
tests/test_statistical.py Comprehensive test suite covering all statistical functions with edge cases and error conditions

Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.

Comment on lines +93 to +95
se = np.sqrt(np.var(sample1, ddof=1) / len(sample1) + np.var(sample2, ddof=1) / len(sample2))
if equal_var:
se = se * np.sqrt(1 / len(sample1) + 1 / len(sample2))
Copy link

Copilot AI Sep 14, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The standard error calculation is incorrect when equal_var=True. Line 95 modifies the standard error computed on line 93, but for equal variance t-tests, the pooled standard error should be calculated differently using the pooled variance formula.

Suggested change
se = np.sqrt(np.var(sample1, ddof=1) / len(sample1) + np.var(sample2, ddof=1) / len(sample2))
if equal_var:
se = se * np.sqrt(1 / len(sample1) + 1 / len(sample2))
n1, n2 = len(sample1), len(sample2)
var1, var2 = np.var(sample1, ddof=1), np.var(sample2, ddof=1)
if equal_var:
pooled_var = (((n1 - 1) * var1) + ((n2 - 1) * var2)) / (n1 + n2 - 2)
se = np.sqrt(pooled_var * (1/n1 + 1/n2))
else:
se = np.sqrt(var1 / n1 + var2 / n2)

Copilot uses AI. Check for mistakes.

# Calculate effect size (Cramér's V)
n = np.sum(observed)
effect_size = np.sqrt(chi2_stat / (n * (min(observed.shape) - 1)))
Copy link

Copilot AI Sep 14, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Cramér's V calculation is incorrect for 1D arrays. observed.shape for a 1D array returns a tuple like (5,), so min(observed.shape) returns the length of the array, not the number of dimensions. For goodness-of-fit tests with 1D data, the formula should use len(observed) - 1 directly.

Suggested change
effect_size = np.sqrt(chi2_stat / (n * (min(observed.shape) - 1)))
effect_size = np.sqrt(chi2_stat / (n * (len(observed) - 1)))

Copilot uses AI. Check for mistakes.
Comment on lines +45 to +46
"assess_propensity_calibration",
"assess_propensity_discrimination",
Copy link

Copilot AI Sep 14, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The all list includes many functions and classes that are not imported in the file. These exports will cause ImportError when users try to import them, as they are not available in the module namespace.

Copilot uses AI. Check for mistakes.
Comment on lines +50 to +51
"check_propensity_balance",
"check_propensity_overlap",
Copy link

Copilot AI Sep 14, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The all list includes many functions and classes that are not imported in the file. These exports will cause ImportError when users try to import them, as they are not available in the module namespace.

Copilot uses AI. Check for mistakes.
Comment on lines +53 to +56
"comprehensive_propensity_diagnostics",
"compute_balance_statistics",
"compute_propensity_log_loss",
"compute_propensity_statistics",
Copy link

Copilot AI Sep 14, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The all list includes many functions and classes that are not imported in the file. These exports will cause ImportError when users try to import them, as they are not available in the module namespace.

Copilot uses AI. Check for mistakes.
"compute_propensity_statistics",
"dr_value_with_clip",
"evaluate_pairwise_models",
"evaluate_propensity_diagnostics",
Copy link

Copilot AI Sep 14, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The all list includes many functions and classes that are not imported in the file. These exports will cause ImportError when users try to import them, as they are not available in the module namespace.

Copilot uses AI. Check for mistakes.
"evaluate_sklearn_models",
"fit_outcome_crossfit",
"fit_propensity_timecal",
"generate_propensity_report",
Copy link

Copilot AI Sep 14, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The all list includes many functions and classes that are not imported in the file. These exports will cause ImportError when users try to import them, as they are not available in the module namespace.

Copilot uses AI. Check for mistakes.
Comment on lines +74 to +88
"PropensityDiagnostics",
"BootstrapError",
"ConfigurationError",
"ConvergenceError",
"DataValidationError",
"EstimationError",
"InsufficientDataError",
"MemoryError",
"ModelValidationError",
"OutcomeModelError",
"PairwiseEvaluationError",
"PolicyInductionError",
"PropensityScoreError",
"SkdrEvalError",
"VersionError",
Copy link

Copilot AI Sep 14, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The all list includes many functions and classes that are not imported in the file. These exports will cause ImportError when users try to import them, as they are not available in the module namespace.

Copilot uses AI. Check for mistakes.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants