70 changes: 70 additions & 0 deletions CHANGELOG.md
@@ -2,6 +2,76 @@

All notable changes to this project will be documented in this file.

## [0.6.0] - 2026-02-26

### Breaking changes
- **Removed `set_backend()`, `get_backend_info()`, `reset_backend()`** — only one backend (C++ native) exists since v0.5.0, so the multi-backend API was dead code. Use `from mcpower.backends import get_backend` if you need the backend instance directly
- **Removed `set_heterogeneity()` and `set_heteroskedasticity()`** — heterogeneity and heteroskedasticity are now controlled exclusively through scenario configurations (`set_scenario_configs()`). The optimistic scenario uses zero perturbation; realistic/doomer scenarios apply these automatically
- **Removed dead scipy fallback code** from `distributions.py` — scipy was never a runtime dependency since v0.5.0, so the fallback paths were unreachable dead code. The module now cleanly fails with an `ImportError` if the C++ native backend is missing
- **`_create_power_plot()` returns `fig`** — the function now accepts a `show=True` parameter and always returns the matplotlib figure object. Set `show=False` to suppress `plt.show()` for programmatic use
- **`apply()` made private (`_apply()`)** — the method is now `_apply()` and is called automatically by `find_power()` / `find_sample_size()`; call `model._apply()` directly only if you need manual control
- **`[all]` extra no longer includes `statsmodels`** — use `pip install mcpower[lme]` to get statsmodels for mixed-effects models

### Added
- **`test_formula` parameter** on `find_power()` and `find_sample_size()` — test a reduced model against data generated from the full model to evaluate power under model misspecification. For example, generate data with `y = x1 + x2 + x3` but test with `test_formula="y ~ x1 + x2"` to see power when `x3` is omitted. Supports interactions, factors, and mixed models.
- **C++ non-normal residual generation** — scenario perturbations now generate heavy-tailed (Student-t) and skewed (chi-squared) residuals directly in C++ via `residual_dist`/`residual_df` parameters in `generate_y()`, replacing the Python-side post-hoc perturbation approach. Applies to all model types (OLS and LME)
- **`optimistic` scenario** is now a first-class entry in `DEFAULT_SCENARIO_CONFIG` with all-zero perturbation values, eliminating the special `scenario_config=None` code path. Custom scenarios inherit from the optimistic baseline, ensuring all required keys exist
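
The inheritance rule above can be sketched as a dict merge: a custom scenario starts from the all-zero optimistic baseline, so every required key exists even when the user supplies only a few. (Illustrative sketch only; the key names are examples, not the exact `DEFAULT_SCENARIO_CONFIG` schema.)

```python
# All-zero baseline: the "optimistic" scenario applies no perturbation.
# Key names are illustrative, not the exact DEFAULT_SCENARIO_CONFIG schema.
OPTIMISTIC = {
    "heterogeneity": 0.0,
    "heteroskedasticity": 0.0,
    "correlation_noise_sd": 0.0,
    "distribution_change_prob": 0.0,
}

def build_scenario(overrides: dict[str, float]) -> dict[str, float]:
    # Inherit from the optimistic baseline so all required keys exist,
    # and reject keys that the baseline does not define.
    unknown = set(overrides) - set(OPTIMISTIC)
    if unknown:
        raise ValueError(f"Unknown scenario keys: {sorted(unknown)}")
    return {**OPTIMISTIC, **overrides}

realistic = build_scenario({"heterogeneity": 0.2, "heteroskedasticity": 0.15})
```

With this shape, the former `scenario_config=None` special case disappears: the optimistic scenario is just `build_scenario({})`.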

### Fixed
- **`set_variable_type()` docstring listed wrong distribution types** — documented non-existent `"skewed"` type; now lists all supported types: `right_skewed`, `left_skewed`, `high_kurtosis`, `uniform`
- **`set_scenario_configs()` docstring referenced non-existent keys** — `"effect_size_jitter"` and `"distribution_jitter"` replaced with actual keys (`correlation_noise_sd`, `distribution_change_prob`, etc.)
- **String factor levels crash in LME variance computation** — `proportions[level - 1]` crashed when factor levels were strings (e.g. `"Japan"`). Now looks up level position in the label list
- **Division by zero on constant-variance columns** — `upload_data()` normalization produced `inf`/`NaN` when a column had zero variance. Now raises `ValueError` with the column name
- **Pending state not cleared after `_apply()`** — calling `_apply()` twice could re-apply the same effects. Pending fields are now reset after each `_apply()` call
- **Parser crash on unbalanced parentheses** — unmatched `)` caused `paren_count` to go negative, producing silent misparses. Now raises `ValueError`
- **Update checker wrote cache inside installed package** — moved cache file to `~/.cache/mcpower/update_cache.json`
- **Update checker unbounded response read** — `response.read()` now limited to 1 MB
- **`scenario_config` dict access on `None`** — added `None` guards for optional scenario configuration lookups
- **NaN values in uploaded data** — `upload_data()` now rejects data containing NaN values with a clear error message listing affected columns
- **Formula minus-sign silently dropped terms** — `y = x1 - x2` silently ignored `x2`. Now raises `ValueError` explaining that term removal with `-` is not supported
- **`_create_table` crash on empty rows** — formatter now handles empty row lists by computing column widths from headers only
- **`_create_power_plot` crash when `first_achieved` not in sample sizes** — added bounds check before `.index()` call
- **Redundant `_validate_cluster_sample_size` call** — removed duplicate validation in `find_power()` (already called per-sample-size in `find_sample_size()`)
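
Several of these fixes replace silent misbehavior with an early `ValueError`. The parenthesis guard, for example, amounts to refusing to let the nesting depth go negative; a minimal sketch (not the actual parser code):

```python
def check_parens(formula: str) -> None:
    # Reject unbalanced parentheses up front instead of silently misparsing.
    depth = 0
    for i, ch in enumerate(formula):
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                raise ValueError(f"Unmatched ')' at position {i} in {formula!r}")
    if depth > 0:
        raise ValueError(f"{depth} unclosed '(' in {formula!r}")

check_parens("y ~ x1 + (1|group)")  # balanced: no error
```

The zero-variance and NaN checks in `upload_data()` follow the same philosophy: fail loudly with the offending column named, rather than propagate `inf`/`NaN` into the simulation.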

### Changed
- **`upload_data()` returns `self`** for method chaining consistency
- **Assert statements replaced with `RuntimeError`** — internal assertions now raise proper exceptions instead of using `assert`
- **Removed "(not yet implemented)" from mixed-model docstrings** — mixed model testing has been implemented since v0.4.2
- **Thread-safe RNG in data generation** — replaced global `np.random.seed()` with local `np.random.RandomState()` for thread safety
- **Update checker runs in a background thread** — no longer blocks `import mcpower` on slow networks
- **Module-level deduplication for update checker** — prevents redundant version checks within the same Python session
- **Removed unused `cluster_column_indices` parameter** from `_lme_analysis_wrapper()` and `_lme_analysis_statsmodels()` — was explicitly marked unused and kept only for API compatibility
- **Scenario formatters iterate dynamically** — no longer hardcode scenario names, enabling custom scenario display
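
The RNG change follows the standard local-generator pattern: seed an instance instead of the global state, so parallel simulations cannot interfere with each other. Sketched here with the stdlib `random` module; per the entry above, the package itself uses `np.random.RandomState`.

```python
import random

def simulate(seed: int, n: int = 5) -> list[float]:
    # A local Random instance is reproducible per call and safe to use
    # from multiple threads, unlike seeding the shared global RNG.
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(n)]

# Same seed, same draws, independent of any other thread's RNG use.
a = simulate(42)
b = simulate(42)
```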

### Packaging
- **`tqdm` added as core dependency** (`>=4.60.0`) — used for progress bars
- **Removed stale pytest warning filter** for `"Mixed-effects models are experimental"` (warning was removed in v0.5.4)
- **NumPy minimum version relaxed** to `>=1.26.0` (was `>=2.0.0`) in both build-requires and runtime dependencies
- **`scikit-build-core` bumped** to `>=0.10` (was `>=0.5`)
- **`statsmodels` added to `[dev]` extras** for test/development convenience
- **Documentation URL** now points to the GitHub wiki
- **Changelog URL** added to project URLs
- **Removed unused pytest markers** (`unit`, `integration`) — only `lme` marker remains
- **Per-module mypy overrides** replace blanket `ignore_missing_imports`

### Documentation
- Updated README requirements section: added `tqdm`, specified `NumPy (>=1.26.0)`
- Changed `pip install mcpower[all]` → `pip install mcpower[lme]` for statsmodels installation
- Wiki documentation review and cleanup: fixed broken links, corrected API signatures (`set_scenario_configs` parameter name), removed stale `apply()` and `set_heterogeneity()` wiki pages, fixed formula redundancy in Model Specification, corrected Tukey return value docs, added mixed-model caveats

### Technical
- Removed ~150 lines of dead scipy fallback shims from `distributions.py`
- Removed `_BACKEND` sentinel variable (only one backend exists)
- C++ `generate_y()` now accepts `residual_dist` and `residual_df` parameters for non-normal error generation
- `suppress_output` test fixture now actually suppresses stdout (was a no-op)
- Removed unused `correlation_matrix_3x3` test fixture
- Removed empty `tests/mcpower/` artifact directory
- Added unit tests for `ResultsProcessor` (`test_results.py`)
- Added unit tests for `normalize_upload_input` (`test_upload_data_utils.py`)
- Added integration tests for `test_formula` feature (`test_test_formula.py`)
- Added unit tests for `test_formula_utils` (`test_test_formula_utils.py`)
- Rewrote optimizer tests to test native backend directly (removed dead scipy fallback tests)

## [0.5.4] - 2026-02-22

### Changed
81 changes: 53 additions & 28 deletions README.md
@@ -21,6 +21,10 @@

It's a Python package, but prefer a graphical interface? **[MCPower GUI](https://github.com/pawlenartowicz/mcpower-gui)** is a standalone desktop app — no Python installation required. Download ready-to-run executables for Windows, Linux, and macOS from the [releases page](https://github.com/pawlenartowicz/mcpower-gui/releases/latest).

| Model setup | Results |
|:---:|:---:|
| <img src="https://raw.githubusercontent.com/pawlenartowicz/MCPower/main/docs/screenshots/gui-model-setup.png" alt="MCPower GUI — model setup" width="400"> | <img src="https://raw.githubusercontent.com/pawlenartowicz/MCPower/main/docs/screenshots/gui-results.png" alt="MCPower GUI — results" width="400"> |

## Why MCPower?

Traditional power formulas break down with interactions, correlated predictors, categorical variables, or non-normal data. MCPower simulates instead — generates thousands of datasets like yours, fits your model, and counts how often the effects are detected.
@@ -297,19 +301,20 @@ model.set_effects("group[2]=0.4, group[3]=0.6, covariate=0.3")
# Use "vs" syntax for pairwise comparisons + correction="tukey"
model.find_power(
sample_size=150,
target_test="group[0] vs group[1], group[0] vs group[2]",
target_test="group[1] vs group[2], group[1] vs group[3]",
correction="tukey"
)
```

### Test Individual Assumption Violations
```python
# Manually add specific violations (without full scenario analysis)
model.set_heterogeneity(0.2) # Effect sizes vary between people
model.set_heteroskedasticity(0.15) # Violation of equal variance assumption
# Add specific violations via custom scenario configs
model.set_scenario_configs({
"my_test": {"heterogeneity": 0.2, "heteroskedasticity": 0.15}
})

# Run with your manual settings (no automatic scenario variations)
model.find_sample_size(target_test="treatment")
# Run with scenario variations
model.find_sample_size(target_test="treatment", scenarios=True)
```

### Mixed-Effects Models
@@ -392,7 +397,7 @@ model.find_power(sample_size=200, progress_callback=False)
| **Factor effects** | **`model.set_effects("var[2]=0.5, var[3]=0.7")`** |
| Correlated predictors | `model.set_correlations("corr(var1, var2)=0.4")` |
| Multiple testing correction | Add `correction="FDR"`, `"Holm"`, `"Bonferroni"`, or `"Tukey"`|
| Post-hoc pairwise comparison | `target_test="group[0] vs group[1]"` with `correction="tukey"` |
| Post-hoc pairwise comparison | `target_test="group[1] vs group[2]"` with `correction="tukey"` |
| Mixed model (random intercept) | `MCPower("y ~ x + (1\|group)")` + `model.set_cluster(...)` |
| Random slopes | `MCPower("y ~ x + (1+x\|group)")` + `set_cluster(..., random_slopes=["x"], slope_variance=0.1)` |
| Nested random effects | `MCPower("y ~ x + (1\|A/B)")` + two `set_cluster()` calls |
@@ -424,7 +429,7 @@ model.find_power(sample_size=200, progress_callback=False)
- For simple models where all assumptions are clearly met.
- For large analyses with tens of thousands of observations, tiny effects, or very low alpha levels.

## What Makes Scenarios Different? (Be careful, unvalidated, preliminary scenarios)
## What Makes Scenarios Different? (Rule-of-thumb scenarios)

**Traditional power analysis assumes perfect conditions.** MCPower's scenarios add realistic "messiness":

@@ -478,8 +483,8 @@ model.set_variable_type("treatment=(factor,3), education=(factor,4)")
# Set effects for specific levels
model.set_effects("treatment[2]=0.5, treatment[3]=0.7, education[2]=0.3")

# Or set same effect for all levels of a factor
model.set_effects("treatment=0.5") # Applies to treatment[2] and treatment[3]
# Each non-reference level needs its own effect
model.set_effects("treatment[2]=0.5, treatment[3]=0.7")

# Important: Factors cannot be used in correlations
# This will error: model.set_correlations("corr(treatment, education)=0.3")
@@ -508,12 +513,31 @@ model.set_alpha(0.01)  # Stricter significance (p < 0.01)
model.set_simulations(10000) # High precision (slower)
```

### Model Misspecification Testing

Use `test_formula` to generate data with one model but test with a simpler one, which is useful for evaluating the power impact of omitting variables:

```python
# Generate with 3 predictors, test with 2 (omitting x3)
model = MCPower("y = x1 + x2 + x3")
model.set_effects("x1=0.5, x2=0.3, x3=0.2")
model.find_power(100, test_formula="y = x1 + x2")

# Generate with clusters, test without (ignoring clustering)
model = MCPower("y ~ treatment + (1|school)")
model.set_cluster("school", ICC=0.2, n_clusters=20)
model.set_effects("treatment=0.5")
model.find_power(1000, test_formula="y ~ treatment")
```

See the [Test Formula Tutorial](https://github.com/pawlenartowicz/MCPower/wiki/Tutorial-Test-Formula) for details.

### Formula Syntax
```python
# These are equivalent:
"y = x1 + x2 + x1*x2" # Assignment style
"y ~ x1 + x2 + x1*x2" # R-style formula
"x1 + x2 + x1*x2" # Predictors only
"y = x1 + x2 + x1:x2" # Assignment style
"y ~ x1 + x2 + x1:x2" # R-style formula
"x1 + x2 + x1:x2" # Predictors only

# Interactions:
"x1*x2" # Main effects + interaction (x1 + x2 + x1:x2)
@@ -538,9 +562,8 @@ model.set_correlations("(x1, x2)=0.3, (x1, x3)=-0.2")
## Requirements

- Python ≥ 3.10
- NumPy, matplotlib, joblib
- NumPy (≥1.26.0), matplotlib, joblib, tqdm
- pandas (optional, for DataFrame input — install with `pip install mcpower[pandas]`)
- statsmodels (optional, for mixed-effects models — install with `pip install mcpower[all]`)


## Documentation
@@ -549,11 +572,11 @@ Full documentation is available on the **[MCPower Wiki](https://github.com/pawle

- [Quick Start](https://github.com/pawlenartowicz/MCPower/wiki/Quick-Start)
- [Model Specification](https://github.com/pawlenartowicz/MCPower/wiki/Model-Specification)
- [Variable Types](https://github.com/pawlenartowicz/MCPower/wiki/Variable-Types)
- [Effect Sizes](https://github.com/pawlenartowicz/MCPower/wiki/Effect-Sizes)
- [Mixed-Effects Models](https://github.com/pawlenartowicz/MCPower/wiki/Mixed-Effects-Models) (random intercepts, slopes, nested effects)
- [ANOVA & Post-Hoc Tests](https://github.com/pawlenartowicz/MCPower/wiki/ANOVA-and-Post-Hoc-Tests)
- [Scenario Analysis](https://github.com/pawlenartowicz/MCPower/wiki/Scenario-Analysis)
- [Variable Types](https://github.com/pawlenartowicz/MCPower/wiki/Concept-Variable-Types)
- [Effect Sizes](https://github.com/pawlenartowicz/MCPower/wiki/Concept-Effect-Sizes)
- [Mixed-Effects Models](https://github.com/pawlenartowicz/MCPower/wiki/Concept-Mixed-Effects) (random intercepts, slopes, nested effects)
- [ANOVA & Post-Hoc Tests](https://github.com/pawlenartowicz/MCPower/wiki/Tutorial-ANOVA-PostHoc)
- [Scenario Analysis](https://github.com/pawlenartowicz/MCPower/wiki/Concept-Scenario-Analysis)
- [API Reference](https://github.com/pawlenartowicz/MCPower/wiki/API-Reference)

## Need Help?
@@ -568,8 +591,8 @@ Full documentation is available on the **[MCPower Wiki](https://github.com/pawle
- ✅ C++ native backend (pybind11 + Eigen, 3x speedup)
- ✅ Mixed Effects Models (random intercepts, random slopes, nested effects) — [validated against lme4](https://github.com/pawlenartowicz/MCPower/wiki/Concept-LME-Validation)
- 🚧 Logistic Regression (coming soon)
- 🚧 ANOVA (coming soon)
- 🚧 Guide about methods, corrections (coming soon)
- ✅ ANOVA (factor variables as ANOVA, post-hoc pairwise comparisons)
- ✅ Guide about methods, corrections
- 📋 2 groups comparison with alternative tests
- 📋 Robust regression methods

@@ -578,16 +601,18 @@ Full documentation is available on the **[MCPower Wiki](https://github.com/pawle

GPL v3. If you use MCPower in research, please cite:

Lenartowicz, P. (2025). MCPower: Monte Carlo Power Analysis for Statistical Models. Zenodo. DOI: 10.5281/zenodo.16502734
Lenartowicz, P. (2025). MCPower: Monte Carlo Power Analysis for Complex Statistical Models (Version <your version>) [Computer software]. Zenodo. https://doi.org/10.5281/zenodo.16502734

*Replace `<your version>` with the version you used — check with `import mcpower; print(mcpower.__version__)`.*

```bibtex
@software{mcpower2025,
author = {Pawel Lenartowicz},
title = {MCPower: Monte Carlo Power Analysis for Statistical Models},
year = {2025},
author = {Lenartowicz, Pawe{\l}},
title = {{MCPower}: Monte Carlo Power Analysis for Complex Statistical Models},
year = {2025},
publisher = {Zenodo},
doi = {10.5281/zenodo.16502734},
url = {https://doi.org/10.5281/zenodo.16502734}
doi = {10.5281/zenodo.16502734},
url = {https://doi.org/10.5281/zenodo.16502734}
}
```

11 changes: 8 additions & 3 deletions cpp/src/bindings.cpp
@@ -110,7 +110,9 @@ py::array_t<double> generate_y_wrapper(
py::array_t<double> effects,
double heterogeneity,
double heteroskedasticity,
int seed
int seed,
int residual_dist,
double residual_df
) {
auto X_buf = X.request();
auto effects_buf = effects.request();
@@ -129,7 +131,8 @@
);

Eigen::VectorXd y = generate_y(
X_map, effects_map, heterogeneity, heteroskedasticity, seed
X_map, effects_map, heterogeneity, heteroskedasticity, seed,
residual_dist, residual_df
);

py::array_t<double> result(n);
@@ -447,7 +450,9 @@ PYBIND11_MODULE(mcpower_native, m) {
py::arg("heterogeneity") = 0.0,
py::arg("heteroskedasticity") = 0.0,
py::arg("seed") = -1,
"Generate dependent variable with heterogeneity and heteroskedasticity"
py::arg("residual_dist") = 0,
py::arg("residual_df") = 10.0,
"Generate dependent variable with heterogeneity, heteroskedasticity, and non-normal residuals"
);

// LME analysis (q=1 random intercept)
51 changes: 39 additions & 12 deletions cpp/src/ols.cpp
@@ -151,19 +151,14 @@ Eigen::VectorXd generate_y(
const Eigen::Ref<const Eigen::VectorXd>& effects,
double heterogeneity,
double heteroskedasticity,
int seed
int seed,
int residual_dist,
double residual_df
) {
const int n = static_cast<int>(X.rows());
const int p = static_cast<int>(X.cols());

// Set up random generator
std::mt19937 gen;
if (seed >= 0) {
gen.seed(static_cast<unsigned int>(seed));
} else {
std::random_device rd;
gen.seed(rd());
}
std::normal_distribution<double> normal(0.0, 1.0);

// Linear predictor with heterogeneity
@@ -176,9 +171,12 @@
// Heterogeneity: vary effect sizes per observation
linear_pred.setZero();

// Change seed for heterogeneity noise
// Seed at offset +1 for heterogeneity noise
if (seed >= 0) {
gen.seed(static_cast<unsigned int>(seed + 1));
} else {
std::random_device rd;
gen.seed(rd());
}

for (int j = 0; j < p; ++j) {
@@ -192,14 +190,43 @@
}
}

// Generate errors
// Generate errors — seed at offset +2
if (seed >= 0) {
gen.seed(static_cast<unsigned int>(seed + 2));
} else {
std::random_device rd;
gen.seed(rd());
}

Eigen::VectorXd error(n);
for (int i = 0; i < n; ++i) {
error(i) = normal(gen);

if (residual_dist == 1) {
// Heavy-tailed: Student's t distribution
double df = std::max(residual_df, 3.0);
std::student_t_distribution<double> t_dist(df);
double theoretical_scale = 1.0 / std::sqrt(df / (df - 2.0));
for (int i = 0; i < n; ++i) {
error(i) = t_dist(gen) * theoretical_scale;
}
} else if (residual_dist == 2) {
// Skewed: chi-squared, centered and scaled
double df = std::max(residual_df, 3.0);
std::chi_squared_distribution<double> chi2_dist(df);
double scale = 1.0 / std::sqrt(2.0 * df);
for (int i = 0; i < n; ++i) {
error(i) = (chi2_dist(gen) - df) * scale;
}
} else {
// Normal (default)
for (int i = 0; i < n; ++i) {
error(i) = normal(gen);
}
}

// Empirical re-standardization to SD = 1
double empirical_sd = std::sqrt(error.array().square().mean());
if (empirical_sd > FLOAT_NEAR_ZERO) {
error /= empirical_sd;
}

// Apply heteroskedasticity
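
The standardization above can be checked numerically: the scale factors are `1/sqrt(df/(df-2))` for Student's t and `1/sqrt(2*df)` for the centered chi-squared, and applying them brings the empirical SD close to 1 even before the final re-standardization step. A quick sketch (Python for illustration; the shipped code is the C++ above, with the t sample built here from stdlib Gaussians):

```python
import math
import random

def t_scale(df: float) -> float:
    # Var(t_df) = df / (df - 2) for df > 2, so multiplying by this
    # factor (1 / SD) standardizes the t draws to unit variance.
    return 1.0 / math.sqrt(df / (df - 2.0))

def chi2_scale(df: float) -> float:
    # Var(chi2_df) = 2 * df; centering at df and scaling by 1/sqrt(2*df)
    # yields mean 0, variance 1.
    return 1.0 / math.sqrt(2.0 * df)

def sample_standardized_t(df: int, n: int, rng: random.Random) -> list[float]:
    # t_df = Z / sqrt(chi2_df / df), built from stdlib Gaussians.
    out = []
    for _ in range(n):
        z = rng.gauss(0.0, 1.0)
        chi2 = sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(df))
        out.append(z / math.sqrt(chi2 / df) * t_scale(df))
    return out

rng = random.Random(0)
draws = sample_standardized_t(10, 20000, rng)
sd = math.sqrt(sum(x * x for x in draws) / len(draws))  # approx. 1.0
```

Note the clamp to `df >= 3.0` in the C++: the t variance is infinite for `df <= 2`, so the scale factor would otherwise be undefined.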