diff --git a/CHANGELOG.md b/CHANGELOG.md index b654c49..24d9439 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -2,6 +2,76 @@ All notable changes to this project will be documented in this file. +## [0.6.0] - 2026-02-26 + +### Breaking changes +- **Removed `set_backend()`, `get_backend_info()`, `reset_backend()`** — only one backend (C++ native) exists since v0.5.0, so the multi-backend API was dead code. Use `from mcpower.backends import get_backend` if you need the backend instance directly +- **Removed `set_heterogeneity()` and `set_heteroskedasticity()`** — heterogeneity and heteroskedasticity are now controlled exclusively through scenario configurations (`set_scenario_configs()`). The optimistic scenario uses zero perturbation; realistic/doomer scenarios apply these automatically +- **Removed dead scipy fallback code** from `distributions.py` — scipy was never a runtime dependency since v0.5.0, so the fallback paths were unreachable dead code. The module now cleanly fails with an `ImportError` if the C++ native backend is missing +- **`_create_power_plot()` returns `fig`** — the function now accepts a `show=True` parameter and always returns the matplotlib figure object. Set `show=False` to suppress `plt.show()` for programmatic use +- **`apply()` made private (`_apply()`)** — the method is now `_apply()` and called automatically by `find_power()` / `find_sample_size()`. Direct calls should use `model._apply()` instead +- **`[all]` extra no longer includes `statsmodels`** — use `pip install mcpower[lme]` to get statsmodels for mixed-effects models + +### Added +- **`test_formula` parameter** on `find_power()` and `find_sample_size()` — test a reduced model against data generated from the full model to evaluate power under model misspecification. For example, generate data with `y = x1 + x2 + x3` but test with `test_formula="y ~ x1 + x2"` to see power when `x3` is omitted. Supports interactions, factors, and mixed models. +- **C++ non-normal residual generation** — scenario perturbations now generate heavy-tailed (Student-t) and skewed (chi-squared) residuals directly in C++ via `residual_dist`/`residual_df` parameters in `generate_y()`, replacing the Python-side post-hoc perturbation approach. Applies to all model types (OLS and LME) +- **`optimistic` scenario** is now a first-class entry in `DEFAULT_SCENARIO_CONFIG` with all-zero perturbation values, eliminating the special `scenario_config=None` code path. Custom scenarios inherit from the optimistic baseline, ensuring all required keys exist + +### Fixed +- **`set_variable_type()` docstring listed wrong distribution types** — documented non-existent `"skewed"` type; now lists all supported types: `right_skewed`, `left_skewed`, `high_kurtosis`, `uniform` +- **`set_scenario_configs()` docstring referenced non-existent keys** — `"effect_size_jitter"` and `"distribution_jitter"` replaced with actual keys (`correlation_noise_sd`, `distribution_change_prob`, etc.) +- **String factor levels crash in LME variance computation** — `proportions[level - 1]` crashed when factor levels were strings (e.g. `"Japan"`). Now looks up level position in the label list +- **Division by zero on constant-variance columns** — `upload_data()` normalization produced `inf`/`NaN` when a column had zero variance. Now raises `ValueError` with the column name +- **Pending state not cleared after `_apply()`** — calling `_apply()` twice could re-apply the same effects. 
Pending fields are now reset after each `_apply()` call +- **Parser crash on unbalanced parentheses** — unmatched `)` caused `paren_count` to go negative, producing silent misparses. Now raises `ValueError` +- **Update checker wrote cache inside installed package** — moved cache file to `~/.cache/mcpower/update_cache.json` +- **Update checker unbounded response read** — `response.read()` now limited to 1 MB +- **`scenario_config` dict access on `None`** — added `None` guards for optional scenario configuration lookups +- **NaN values in uploaded data** — `upload_data()` now rejects data containing NaN values with a clear error message listing affected columns +- **Formula minus-sign silently dropped terms** — `y = x1 - x2` silently ignored `x2`. Now raises `ValueError` explaining that term removal with `-` is not supported +- **`_create_table` crash on empty rows** — formatter now handles empty row lists by computing column widths from headers only +- **`_create_power_plot` crash when `first_achieved` not in sample sizes** — added bounds check before `.index()` call +- **Redundant `_validate_cluster_sample_size` call** — removed duplicate validation in `find_power()` (already called per-sample-size in `find_sample_size()`) + +### Changed +- **`upload_data()` returns `self`** for method chaining consistency +- **Assert statements replaced with `RuntimeError`** — internal assertions now raise proper exceptions instead of using `assert` +- **Removed "(not yet implemented)" from mixed-model docstrings** — mixed model testing has been implemented since v0.4.2 +- **Thread-safe RNG in data generation** — replaced global `np.random.seed()` with local `np.random.RandomState()` for thread safety +- **Update checker runs in a background thread** — no longer blocks `import mcpower` on slow networks +- **Module-level deduplication for update checker** — prevents redundant version checks within the same Python session +- **Removed unused `cluster_column_indices` parameter** from `_lme_analysis_wrapper()` and `_lme_analysis_statsmodels()` — was explicitly marked unused and kept only for API compatibility +- **Scenario formatters iterate dynamically** — no longer hardcode scenario names, enabling custom scenario display + +### Packaging +- **`tqdm` added as core dependency** (`>=4.60.0`) — used for progress bars +- **Removed stale pytest warning filter** for `"Mixed-effects models are experimental"` (warning was removed in v0.5.4) +- **NumPy minimum version relaxed** to `>=1.26.0` (was `>=2.0.0`) in both build-requires and runtime dependencies +- **`scikit-build-core` bumped** to `>=0.10` (was `>=0.5`) +- **`statsmodels` added to `[dev]` extras** for test/development convenience +- **Documentation URL** now points to the GitHub wiki +- **Changelog URL** added to project URLs +- **Removed unused pytest markers** (`unit`, `integration`) — only `lme` marker remains +- **Per-module mypy overrides** replace blanket `ignore_missing_imports` + +### Documentation +- Updated README requirements section: added `tqdm`, specified `NumPy (>=1.26.0)` +- Changed `pip install mcpower[all]` → `pip install mcpower[lme]` for statsmodels installation +- Wiki documentation review and cleanup: fixed broken links, corrected API signatures (`set_scenario_configs` parameter name), removed stale `apply()` and `set_heterogeneity()` wiki pages, fixed formula redundancy in Model Specification, corrected Tukey return value docs, added mixed-model caveats + +### Technical +- Removed ~150 lines of dead scipy fallback shims from 
`distributions.py`
+- Removed `_BACKEND` sentinel variable (only one backend exists)
+- C++ `generate_y()` now accepts `residual_dist` and `residual_df` parameters for non-normal error generation
+- `suppress_output` test fixture now actually suppresses stdout (was a no-op)
+- Removed unused `correlation_matrix_3x3` test fixture
+- Removed empty `tests/mcpower/` artifact directory
+- Added unit tests for `ResultsProcessor` (`test_results.py`)
+- Added unit tests for `normalize_upload_input` (`test_upload_data_utils.py`)
+- Added integration tests for `test_formula` feature (`test_test_formula.py`)
+- Added unit tests for `test_formula_utils` (`test_test_formula_utils.py`)
+- Rewrote optimizer tests to test native backend directly (removed dead scipy fallback tests)
+
 ## [0.5.4] - 2026-02-22
 
 ### Changed
diff --git a/README.md b/README.md
index a230021..2bd003a 100644
--- a/README.md
+++ b/README.md
@@ -21,6 +21,10 @@
 It's a Python package, but prefer a graphical interface?
 **[MCPower GUI](https://github.com/pawlenartowicz/mcpower-gui)** is a standalone desktop app — no Python installation required. Download ready-to-run executables for Windows, Linux, and macOS from the [releases page](https://github.com/pawlenartowicz/mcpower-gui/releases/latest).
 
+| Model setup | Results |
+|:---:|:---:|
+| ![MCPower GUI — model setup](docs/screenshots/gui-model-setup.png) | ![MCPower GUI — results](docs/screenshots/gui-results.png) |
+
 ## Why MCPower?
 
 Traditional power formulas break down with interactions, correlated predictors, categorical variables, or non-normal data. MCPower simulates instead — generates thousands of datasets like yours, fits your model, and counts how often the effects are detected.
 
@@ -297,19 +301,20 @@ model.set_effects("group[2]=0.4, group[3]=0.6, covariate=0.3")
 # Use "vs" syntax for pairwise comparisons + correction="tukey"
 model.find_power(
     sample_size=150,
-    target_test="group[0] vs group[1], group[0] vs group[2]",
+    target_test="group[1] vs group[2], group[1] vs group[3]",
     correction="tukey"
 )
 ```
 
 ### Test Individual Assumption Violations
 
 ```python
-# Manually add specific violations (without full scenario analysis)
-model.set_heterogeneity(0.2)  # Effect sizes vary between people
-model.set_heteroskedasticity(0.15)  # Violation of equal variance assumption
+# Add specific violations via custom scenario configs
+model.set_scenario_configs({
+    "my_test": {"heterogeneity": 0.2, "heteroskedasticity": 0.15}
+})
 
-# Run with your manual settings (no automatic scenario variations)
-model.find_sample_size(target_test="treatment")
+# Run with scenario variations
+model.find_sample_size(target_test="treatment", scenarios=True)
 ```
 
 ### Mixed-Effects Models
@@ -392,7 +397,7 @@ model.find_power(sample_size=200, progress_callback=False)
 | **Factor effects** | **`model.set_effects("var[2]=0.5, var[3]=0.7")`** |
 | Correlated predictors | `model.set_correlations("corr(var1, var2)=0.4")` |
 | Multiple testing correction | Add `correction="FDR"`, `"Holm"`, `"Bonferroni"`, or `"Tukey"`|
-| Post-hoc pairwise comparison | `target_test="group[0] vs group[1]"` with `correction="tukey"` |
+| Post-hoc pairwise comparison | `target_test="group[1] vs group[2]"` with `correction="tukey"` |
 | Mixed model (random intercept) | `MCPower("y ~ x + (1\|group)")` + `model.set_cluster(...)` |
 | Random slopes | `MCPower("y ~ x + (1+x\|group)")` + `set_cluster(..., random_slopes=["x"], slope_variance=0.1)` |
 | Nested random effects | `MCPower("y ~ x + (1\|A/B)")` + two `set_cluster()` calls |
@@ -424,7 +429,7 @@ model.find_power(sample_size=200, progress_callback=False)
 
- For simple models 
where all assumptions are clearly met. - For large analyses with tens of thousands of observations, tiny effects, or very low alpha levels. -## What Makes Scenarios Different? (Be careful, unvalidated, preliminary scenarios) +## What Makes Scenarios Different? (Rule-of-thumb scenarios) **Traditional power analysis assumes perfect conditions.** MCPower's scenarios add realistic "messiness": @@ -478,8 +483,8 @@ model.set_variable_type("treatment=(factor,3), education=(factor,4)") # Set effects for specific levels model.set_effects("treatment[2]=0.5, treatment[3]=0.7, education[2]=0.3") -# Or set same effect for all levels of a factor -model.set_effects("treatment=0.5") # Applies to treatment[2] and treatment[3] +# Each non-reference level needs its own effect +model.set_effects("treatment[2]=0.5, treatment[3]=0.7") # Important: Factors cannot be used in correlations # This will error: model.set_correlations("corr(treatment, education)=0.3") @@ -508,12 +513,31 @@ model.set_alpha(0.01) # Stricter significance (p < 0.01) model.set_simulations(10000) # High precision (slower) ``` +### Model Misspecification Testing + +Use `test_formula` to generate data with one model but test with a simpler one -- useful for evaluating the power impact of omitting variables: + +```python +# Generate with 3 predictors, test with 2 (omitting x3) +model = MCPower("y = x1 + x2 + x3") +model.set_effects("x1=0.5, x2=0.3, x3=0.2") +model.find_power(100, test_formula="y = x1 + x2") + +# Generate with clusters, test without (ignoring clustering) +model = MCPower("y ~ treatment + (1|school)") +model.set_cluster("school", ICC=0.2, n_clusters=20) +model.set_effects("treatment=0.5") +model.find_power(1000, test_formula="y ~ treatment") +``` + +See the [Test Formula Tutorial](https://github.com/pawlenartowicz/MCPower/wiki/Tutorial-Test-Formula) for details. 
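
+`test_formula` also pairs with `find_sample_size()`. A minimal sketch (hypothetical effect and correlation values; the default search range is assumed):
+
+```python
+# Sample size needed for x1 when a correlated x2 is omitted from the fitted model
+model = MCPower("y = x1 + x2")
+model.set_effects("x1=0.4, x2=0.3")
+model.set_correlations("(x1, x2)=0.3")  # omission matters most when predictors correlate
+model.find_sample_size(target_test="x1", test_formula="y = x1")
+```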
+
 ### Formula Syntax
 
 ```python
 # These are equivalent:
-"y = x1 + x2 + x1*x2"  # Assignment style
-"y ~ x1 + x2 + x1*x2"  # R-style formula
-"x1 + x2 + x1*x2"      # Predictors only
+"y = x1 + x2 + x1:x2"  # Assignment style
+"y ~ x1 + x2 + x1:x2"  # R-style formula
+"x1 + x2 + x1:x2"  # Predictors only
 
 # Interactions:
 "x1*x2"  # Main effects + interaction (x1 + x2 + x1:x2)
@@ -538,9 +562,8 @@ model.set_correlations("(x1, x2)=0.3, (x1, x3)=-0.2")
 
 ## Requirements
 
 - Python ≥ 3.10
-- NumPy, matplotlib, joblib
+- NumPy (≥1.26.0), matplotlib, joblib, tqdm
 - pandas (optional, for DataFrame input — install with `pip install mcpower[pandas]`)
-- statsmodels (optional, for mixed-effects models — install with `pip install mcpower[all]`)
 
 ## Documentation
 
@@ -549,11 +572,11 @@ Full documentation is available on the **[MCPower Wiki](https://github.com/pawle
 
 - [Quick Start](https://github.com/pawlenartowicz/MCPower/wiki/Quick-Start)
 - [Model Specification](https://github.com/pawlenartowicz/MCPower/wiki/Model-Specification)
-- [Variable Types](https://github.com/pawlenartowicz/MCPower/wiki/Variable-Types)
-- [Effect Sizes](https://github.com/pawlenartowicz/MCPower/wiki/Effect-Sizes)
-- [Mixed-Effects Models](https://github.com/pawlenartowicz/MCPower/wiki/Mixed-Effects-Models) (random intercepts, slopes, nested effects)
-- [ANOVA & Post-Hoc Tests](https://github.com/pawlenartowicz/MCPower/wiki/ANOVA-and-Post-Hoc-Tests)
-- [Scenario Analysis](https://github.com/pawlenartowicz/MCPower/wiki/Scenario-Analysis)
+- [Variable Types](https://github.com/pawlenartowicz/MCPower/wiki/Concept-Variable-Types)
+- [Effect Sizes](https://github.com/pawlenartowicz/MCPower/wiki/Concept-Effect-Sizes)
+- [Mixed-Effects Models](https://github.com/pawlenartowicz/MCPower/wiki/Concept-Mixed-Effects) (random intercepts, slopes, nested effects)
+- [ANOVA & Post-Hoc Tests](https://github.com/pawlenartowicz/MCPower/wiki/Tutorial-ANOVA-PostHoc)
+- [Scenario Analysis](https://github.com/pawlenartowicz/MCPower/wiki/Concept-Scenario-Analysis)
 - [API Reference](https://github.com/pawlenartowicz/MCPower/wiki/API-Reference)
 
 ## Need Help?
@@ -568,8 +591,8 @@
 - ✅ C++ native backend (pybind11 + Eigen, 3x speedup)
 - ✅ Mixed Effects Models (random intercepts, random slopes, nested effects) — [validated against lme4](https://github.com/pawlenartowicz/MCPower/wiki/Concept-LME-Validation)
 - 🚧 Logistic Regression (coming soon)
-- 🚧 ANOVA (coming soon)
-- 🚧 Guide about methods, corrections (coming soon)
+- ✅ ANOVA (factor variables as ANOVA, post-hoc pairwise comparisons)
+- ✅ Guide about methods, corrections
 - 📋 2 groups comparison with alternative tests
 - 📋 Robust regression methods
 
@@ -578,16 +601,18 @@
 
 GPL v3. If you use MCPower in research, please cite:
 
-Lenartowicz, P. (2025). MCPower: Monte Carlo Power Analysis for Statistical Models. Zenodo. DOI: 10.5281/zenodo.16502734
+Lenartowicz, P. (2025). MCPower: Monte Carlo Power Analysis for Complex Statistical Models (Version <version>) [Computer software]. Zenodo. 
https://doi.org/10.5281/zenodo.16502734
+
+*Replace `<version>` with the version you used — check with `import mcpower; print(mcpower.__version__)`.*
 
 ```bibtex
 @software{mcpower2025,
-  author = {Pawel Lenartowicz},
-  title = {MCPower: Monte Carlo Power Analysis for Statistical Models},
-  year = {2025},
+  author = {Lenartowicz, Pawe{\l}},
+  title = {{MCPower}: Monte Carlo Power Analysis for Complex Statistical Models},
+  year = {2025},
   publisher = {Zenodo},
-  doi = {10.5281/zenodo.16502734},
-  url = {https://doi.org/10.5281/zenodo.16502734}
+  doi = {10.5281/zenodo.16502734},
+  url = {https://doi.org/10.5281/zenodo.16502734}
 }
 ```
diff --git a/cpp/src/bindings.cpp b/cpp/src/bindings.cpp
index 26fee22..8c02998 100644
--- a/cpp/src/bindings.cpp
+++ b/cpp/src/bindings.cpp
@@ -110,7 +110,9 @@ py::array_t<double> generate_y_wrapper(
     py::array_t<double> effects,
     double heterogeneity,
     double heteroskedasticity,
-    int seed
+    int seed,
+    int residual_dist,
+    double residual_df
 ) {
     auto X_buf = X.request();
     auto effects_buf = effects.request();
@@ -129,7 +131,8 @@
     );
 
     Eigen::VectorXd y = generate_y(
-        X_map, effects_map, heterogeneity, heteroskedasticity, seed
+        X_map, effects_map, heterogeneity, heteroskedasticity, seed,
+        residual_dist, residual_df
     );
 
     py::array_t<double> result(n);
@@ -447,7 +450,9 @@ PYBIND11_MODULE(mcpower_native, m) {
         py::arg("heterogeneity") = 0.0,
         py::arg("heteroskedasticity") = 0.0,
         py::arg("seed") = -1,
-        "Generate dependent variable with heterogeneity and heteroskedasticity"
+        py::arg("residual_dist") = 0,
+        py::arg("residual_df") = 10.0,
+        "Generate dependent variable with heterogeneity, heteroskedasticity, and non-normal residuals"
     );
 
     // LME analysis (q=1 random intercept)
diff --git a/cpp/src/ols.cpp b/cpp/src/ols.cpp
index 7d04ec2..11bd62f 100644
--- a/cpp/src/ols.cpp
+++ b/cpp/src/ols.cpp
@@ -151,19 +151,14 @@ Eigen::VectorXd generate_y(
     const Eigen::Ref<const Eigen::VectorXd>& effects,
     double heterogeneity,
     double heteroskedasticity,
-    int seed
+    int seed,
+    int residual_dist,
+    double residual_df
 ) {
     const int n = static_cast<int>(X.rows());
     const int p = static_cast<int>(X.cols());
 
-    // Set up random generator
     std::mt19937 gen;
-    if (seed >= 0) {
-        gen.seed(static_cast<unsigned int>(seed));
-    } else {
-        std::random_device rd;
-        gen.seed(rd());
-    }
     std::normal_distribution<double> normal(0.0, 1.0);
 
     // Linear predictor with heterogeneity
@@ -176,9 +171,12 @@
         // Heterogeneity: vary effect sizes per observation
         linear_pred.setZero();
 
-        // Change seed for heterogeneity noise
+        // Seed at offset +1 for heterogeneity noise
         if (seed >= 0) {
             gen.seed(static_cast<unsigned int>(seed + 1));
+        } else {
+            std::random_device rd;
+            gen.seed(rd());
         }
 
         for (int j = 0; j < p; ++j) {
@@ -192,14 +190,43 @@
         }
     }
 
-    // Generate errors
+    // Generate errors — seed at offset +2
     if (seed >= 0) {
         gen.seed(static_cast<unsigned int>(seed + 2));
+    } else {
+        std::random_device rd;
+        gen.seed(rd());
     }
 
     Eigen::VectorXd error(n);
-    for (int i = 0; i < n; ++i) {
-        error(i) = normal(gen);
+
+    if (residual_dist == 1) {
+        // Heavy-tailed: Student's t distribution
+        double df = std::max(residual_df, 3.0);
+        std::student_t_distribution<double> t_dist(df);
+        double theoretical_scale = 1.0 / std::sqrt(df / (df - 2.0));
+        for (int i = 0; i < n; ++i) {
+            error(i) = t_dist(gen) * theoretical_scale;
+        }
+    } else if (residual_dist == 2) {
+        // Skewed: chi-squared, centered and scaled
+        double df = std::max(residual_df, 3.0);
+        std::chi_squared_distribution<double> chi2_dist(df);
+        double scale = 1.0 / std::sqrt(2.0 * df);
+        for (int i = 0; i < n; ++i) {
+            
error(i) = (chi2_dist(gen) - df) * scale;
+        }
+    } else {
+        // Normal (default)
+        for (int i = 0; i < n; ++i) {
+            error(i) = normal(gen);
+        }
+    }
+
+    // Empirical re-standardization to SD = 1
+    double empirical_sd = std::sqrt(error.array().square().mean());
+    if (empirical_sd > FLOAT_NEAR_ZERO) {
+        error /= empirical_sd;
     }
 
     // Apply heteroskedasticity
diff --git a/cpp/src/ols.hpp b/cpp/src/ols.hpp
index ad1f9b6..9e046eb 100644
--- a/cpp/src/ols.hpp
+++ b/cpp/src/ols.hpp
@@ -65,6 +65,8 @@ class OLSAnalyzer {
  * @param heterogeneity SD of effect size variation
  * @param heteroskedasticity Correlation between predictor and error variance
  * @param seed Random seed (-1 for random)
+ * @param residual_dist Error distribution: 0=normal, 1=heavy_tailed (t), 2=skewed (chi2)
+ * @param residual_df Degrees of freedom for non-normal residuals (min clamped to 3)
  * @return Response vector (n_samples,)
  */
 Eigen::VectorXd generate_y(
@@ -72,7 +74,9 @@
     const Eigen::Ref<const Eigen::VectorXd>& effects,
     double heterogeneity,
     double heteroskedasticity,
-    int seed
+    int seed,
+    int residual_dist = 0,
+    double residual_df = 10.0
 );
 
 } // namespace mcpower
diff --git a/docs/screenshots/gui-model-setup.png b/docs/screenshots/gui-model-setup.png
new file mode 100644
index 0000000..7f87a53
Binary files /dev/null and b/docs/screenshots/gui-model-setup.png differ
diff --git a/docs/screenshots/gui-results.png b/docs/screenshots/gui-results.png
new file mode 100644
index 0000000..f84152d
Binary files /dev/null and b/docs/screenshots/gui-results.png differ
diff --git a/mcpower/__init__.py b/mcpower/__init__.py
index a52560c..675a4f3 100644
--- a/mcpower/__init__.py
+++ b/mcpower/__init__.py
@@ -16,7 +16,6 @@
 from importlib.metadata import version as _get_version
 
-from .backends import get_backend_info, set_backend
 from .model import MCPower
 from .progress import PrintReporter, ProgressReporter, SimulationCancelled, TqdmReporter
 
@@ -27,14 +26,14 @@
 __all__ = [
     "MCPower",
     "SimulationCancelled",
-    "set_backend",
-    "get_backend_info",
     "ProgressReporter",
     "PrintReporter",
     "TqdmReporter",
 ]
 
+import threading as _threading
+
 from .utils.updates import _check_for_updates
 
-_check_for_updates(__version__)
+_threading.Thread(target=_check_for_updates, args=(__version__,), daemon=True).start()
diff --git a/mcpower/backends/__init__.py b/mcpower/backends/__init__.py
index 7bb03f8..8b24a73 100644
--- a/mcpower/backends/__init__.py
+++ b/mcpower/backends/__init__.py
@@ -3,11 +3,9 @@
 This module provides a unified interface for compute backends.
 
 The only supported backend is native C++ (compiled via pybind11).
-
-Users can override via set_backend('c++' | 'default') or pass a ComputeBackend instance.
 """
 
-from typing import Optional, Protocol, Union, runtime_checkable
+from typing import Optional, Protocol, runtime_checkable
 
 import numpy as np
 
@@ -24,6 +22,7 @@ def ols_analysis(
         f_crit: float,
         t_crit: float,
         correction_t_crits: np.ndarray,
+        # correction_method encoding: 0=none, 1=Bonferroni, 2=FDR (BH), 3=Holm
         correction_method: int,
     ) -> np.ndarray:
         """Run OLS regression and return significance flags.
@@ -40,9 +39,15 @@ def generate_y(
         heterogeneity: float,
         heteroskedasticity: float,
         seed: int,
+        residual_dist: int = 0,
+        residual_df: float = 10.0,
     ) -> np.ndarray:
         """Generate the dependent variable ``y = X @ effects + error``.
 
+        Args:
+            residual_dist: Error distribution (0=normal, 1=heavy_tailed, 2=skewed).
+            residual_df: Degrees of freedom for non-normal residuals.
+
         Returns:
             1-D array of length ``n_samples``.
        
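    Errors are empirically re-standardized to unit SD after generation,
            before heteroskedasticity is applied (see ``cpp/src/ols.cpp``).
        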
""" @@ -88,12 +93,8 @@ def lme_analysis( ... -# Valid backend names for set_backend() -_BACKEND_NAMES = {"default", "c++"} - # Global backend instance _backend_instance: Optional[ComputeBackend] = None -_backend_forced = False def get_backend() -> ComputeBackend: @@ -101,7 +102,7 @@ def get_backend() -> ComputeBackend: Get the active compute backend. On first call, instantiates the C++ native backend. - Subsequent calls return the cached instance unless reset_backend() is called. + Subsequent calls return the cached instance. Raises: ImportError: If the C++ extension is not compiled/installed. @@ -117,64 +118,7 @@ def get_backend() -> ComputeBackend: return _backend_instance -def set_backend(backend: Union[str, ComputeBackend]) -> None: - """ - Set the compute backend. - - Args: - backend: One of: - - 'default' -- use native C++ backend - - 'c++' -- force native C++ backend - - A ComputeBackend instance - - Raises: - ImportError: If the C++ backend is not available. - ValueError: If the string is not recognized. - """ - global _backend_instance, _backend_forced - - if isinstance(backend, str): - name = backend.lower().strip() - if name not in _BACKEND_NAMES: - raise ValueError(f"Unknown backend {backend!r}. Choose from: {', '.join(sorted(_BACKEND_NAMES))}") - - from .native import NativeBackend - - _backend_instance = NativeBackend() - _backend_forced = name != "default" - else: - _backend_instance = backend - _backend_forced = True - - -def reset_backend() -> None: - """Reset backend to automatic selection.""" - global _backend_instance, _backend_forced - _backend_instance = None - _backend_forced = False - - -def get_backend_info() -> dict: - """ - Get information about the current backend. - - Returns: - Dictionary with backend name, type, and whether it was forced. - """ - backend = get_backend() - name = type(backend).__name__ - return { - "name": name, - "is_native": name == "NativeBackend", - "module": type(backend).__module__, - "forced": _backend_forced, - } - - __all__ = [ "ComputeBackend", "get_backend", - "set_backend", - "reset_backend", - "get_backend_info", ] diff --git a/mcpower/backends/native.py b/mcpower/backends/native.py index acd7633..2338cfe 100644 --- a/mcpower/backends/native.py +++ b/mcpower/backends/native.py @@ -17,6 +17,11 @@ mcpower_native = None +def _prep(arr: np.ndarray, dtype=np.float64) -> np.ndarray: + """Ensure array is contiguous with the expected dtype for C++ interop.""" + return np.ascontiguousarray(arr, dtype=dtype) + + class NativeBackend: """ C++ compute backend using pybind11 bindings. @@ -46,8 +51,8 @@ def _initialize_tables(self) -> None: t3_ppf = manager.load_t3_ppf_table() # Ensure correct dtypes - norm_cdf = np.ascontiguousarray(norm_cdf.astype(np.float64)) - t3_ppf = np.ascontiguousarray(t3_ppf.astype(np.float64)) + norm_cdf = _prep(norm_cdf) + t3_ppf = _prep(t3_ppf) # Initialize C++ tables (generation tables only) mcpower_native.init_tables(norm_cdf, t3_ppf) @@ -77,10 +82,10 @@ def ols_analysis( Returns: Array: [f_sig, uncorrected..., corrected...] 
""" - X = np.ascontiguousarray(X, dtype=np.float64) - y = np.ascontiguousarray(y, dtype=np.float64) - target_indices = np.ascontiguousarray(target_indices, dtype=np.int32) - correction_t_crits = np.ascontiguousarray(correction_t_crits, dtype=np.float64) + X = _prep(X) + y = _prep(y) + target_indices = _prep(target_indices, np.int32) + correction_t_crits = _prep(correction_t_crits) return mcpower_native.ols_analysis(X, y, target_indices, f_crit, t_crit, correction_t_crits, correction_method) # type: ignore[no-any-return] @@ -91,6 +96,8 @@ def generate_y( heterogeneity: float, heteroskedasticity: float, seed: int, + residual_dist: int = 0, + residual_df: float = 10.0, ) -> np.ndarray: """ Generate dependent variable. @@ -101,14 +108,16 @@ def generate_y( heterogeneity: Effect size variation SD heteroskedasticity: Error-predictor correlation seed: Random seed (-1 for random) + residual_dist: Error distribution (0=normal, 1=heavy_tailed, 2=skewed) + residual_df: Degrees of freedom for non-normal residuals Returns: Response vector (n_samples,) """ - X = np.ascontiguousarray(X, dtype=np.float64) - effects = np.ascontiguousarray(effects, dtype=np.float64) + X = _prep(X) + effects = _prep(effects) - return mcpower_native.generate_y(X, effects, heterogeneity, heteroskedasticity, seed) # type: ignore[no-any-return] + return mcpower_native.generate_y(X, effects, heterogeneity, heteroskedasticity, seed, residual_dist, residual_df) # type: ignore[no-any-return] def generate_X( self, @@ -137,11 +146,11 @@ def generate_X( Returns: Design matrix (n_samples, n_vars) """ - correlation_matrix = np.ascontiguousarray(correlation_matrix, dtype=np.float64) - var_types = np.ascontiguousarray(var_types, dtype=np.int32) - var_params = np.ascontiguousarray(var_params, dtype=np.float64) - upload_normal = np.ascontiguousarray(upload_normal, dtype=np.float64) - upload_data = np.ascontiguousarray(upload_data, dtype=np.float64) + correlation_matrix = _prep(correlation_matrix) + var_types = _prep(var_types, np.int32) + var_params = _prep(var_params) + upload_normal = _prep(upload_normal) + upload_data = _prep(upload_data) return mcpower_native.generate_X( # type: ignore[no-any-return] n_samples, @@ -185,12 +194,15 @@ def lme_analysis( Returns: Array: [f_sig, uncorrected..., corrected..., wald_flag] or empty array on failure + + wald_flag: 1.0 if the Wald test was used as fallback for the overall + significance test (instead of the likelihood ratio test), 0.0 otherwise. """ - X = np.ascontiguousarray(X, dtype=np.float64) - y = np.ascontiguousarray(y, dtype=np.float64) - cluster_ids = np.ascontiguousarray(cluster_ids, dtype=np.int32) - target_indices = np.ascontiguousarray(target_indices, dtype=np.int32) - correction_z_crits = np.ascontiguousarray(correction_z_crits, dtype=np.float64) + X = _prep(X) + y = _prep(y) + cluster_ids = _prep(cluster_ids, np.int32) + target_indices = _prep(target_indices, np.int32) + correction_z_crits = _prep(correction_z_crits) return mcpower_native.lme_analysis( # type: ignore[no-any-return] X, @@ -240,14 +252,17 @@ def lme_analysis_general( Returns: Array: [f_sig, uncorrected..., corrected..., wald_flag] or empty array on failure + + wald_flag: 1.0 if the Wald test was used as fallback for the overall + significance test (instead of the likelihood ratio test), 0.0 otherwise. 
""" - X = np.ascontiguousarray(X, dtype=np.float64) - y = np.ascontiguousarray(y, dtype=np.float64) - Z = np.ascontiguousarray(Z, dtype=np.float64) - cluster_ids = np.ascontiguousarray(cluster_ids, dtype=np.int32) - target_indices = np.ascontiguousarray(target_indices, dtype=np.int32) - correction_z_crits = np.ascontiguousarray(correction_z_crits, dtype=np.float64) - warm_theta = np.ascontiguousarray(warm_theta, dtype=np.float64) + X = _prep(X) + y = _prep(y) + Z = _prep(Z) + cluster_ids = _prep(cluster_ids, np.int32) + target_indices = _prep(target_indices, np.int32) + correction_z_crits = _prep(correction_z_crits) + warm_theta = _prep(warm_theta) return mcpower_native.lme_analysis_general( # type: ignore[no-any-return] X, @@ -301,15 +316,18 @@ def lme_analysis_nested( Returns: Array: [f_sig, uncorrected..., corrected..., wald_flag] or empty array on failure + + wald_flag: 1.0 if the Wald test was used as fallback for the overall + significance test (instead of the likelihood ratio test), 0.0 otherwise. """ - X = np.ascontiguousarray(X, dtype=np.float64) - y = np.ascontiguousarray(y, dtype=np.float64) - parent_ids = np.ascontiguousarray(parent_ids, dtype=np.int32) - child_ids = np.ascontiguousarray(child_ids, dtype=np.int32) - child_to_parent = np.ascontiguousarray(child_to_parent, dtype=np.int32) - target_indices = np.ascontiguousarray(target_indices, dtype=np.int32) - correction_z_crits = np.ascontiguousarray(correction_z_crits, dtype=np.float64) - warm_theta = np.ascontiguousarray(warm_theta, dtype=np.float64) + X = _prep(X) + y = _prep(y) + parent_ids = _prep(parent_ids, np.int32) + child_ids = _prep(child_ids, np.int32) + child_to_parent = _prep(child_to_parent, np.int32) + target_indices = _prep(target_indices, np.int32) + correction_z_crits = _prep(correction_z_crits) + warm_theta = _prep(warm_theta) return mcpower_native.lme_analysis_nested( # type: ignore[no-any-return] X, diff --git a/mcpower/core/results.py b/mcpower/core/results.py index 45b178a..dfe6b77 100644 --- a/mcpower/core/results.py +++ b/mcpower/core/results.py @@ -54,6 +54,7 @@ def calculate_powers( # Individual powers individual_powers = {} individual_powers_corrected = {} + non_overall_tests = [t for t in target_tests if t != "overall"] for test in target_tests: if test == "overall": @@ -62,7 +63,6 @@ def calculate_powers( individual_powers_corrected[test] = np.mean(results_corrected_array[:, 0]) * 100 else: # Find position among non-'overall' tests and add 1 for F-test offset - non_overall_tests = [t for t in target_tests if t != "overall"] pos = non_overall_tests.index(test) col_idx = pos + 1 # +1 because column 0 is F-test individual_powers[test] = np.mean(results_array[:, col_idx]) * 100 diff --git a/mcpower/core/scenarios.py b/mcpower/core/scenarios.py index 454f8e3..2d2dd01 100644 --- a/mcpower/core/scenarios.py +++ b/mcpower/core/scenarios.py @@ -13,35 +13,51 @@ from ..utils.visualization import _create_power_plot # Default scenario configurations. +# "optimistic" is the zero-perturbation baseline — also used as the default +# scenario_config when scenarios=False and as a template for custom scenarios +# (ensures all required keys exist). # "realistic" introduces moderate assumption violations; "doomer" introduces # severe violations. Each simulation iteration draws random perturbations # from these parameters (correlation noise, distribution swaps, etc.). 
DEFAULT_SCENARIO_CONFIG = { + "optimistic": { + "heterogeneity": 0.0, + "heteroskedasticity": 0.0, + "correlation_noise_sd": 0.0, + "distribution_change_prob": 0.0, + "new_distributions": ["right_skewed", "left_skewed", "uniform"], + # Mixed model perturbations (only consumed when cluster_specs present) + "random_effect_dist": "normal", + "random_effect_df": 5, + "icc_noise_sd": 0.0, + # Residual distribution perturbations (all model types) + "residual_dists": ["heavy_tailed", "skewed"], + "residual_change_prob": 0.0, + "residual_df": 10, + }, "realistic": { "heterogeneity": 0.2, - "heteroskedasticity": 0.1, - "correlation_noise_sd": 0.2, - "distribution_change_prob": 0.3, + "heteroskedasticity": 0.15, + "correlation_noise_sd": 0.15, + "distribution_change_prob": 0.5, "new_distributions": ["right_skewed", "left_skewed", "uniform"], - # LME-specific keys (only consumed when cluster_specs present) "random_effect_dist": "heavy_tailed", - "random_effect_df": 5, + "random_effect_df": 10, "icc_noise_sd": 0.15, - "residual_dist": "heavy_tailed", - "residual_change_prob": 0.3, - "residual_df": 10, + "residual_dists": ["heavy_tailed", "skewed"], + "residual_change_prob": 0.5, + "residual_df": 8, }, "doomer": { "heterogeneity": 0.4, - "heteroskedasticity": 0.2, - "correlation_noise_sd": 0.4, - "distribution_change_prob": 0.6, + "heteroskedasticity": 0.35, + "correlation_noise_sd": 0.30, + "distribution_change_prob": 0.8, "new_distributions": ["right_skewed", "left_skewed", "uniform"], - # LME-specific keys (only consumed when cluster_specs present) "random_effect_dist": "heavy_tailed", - "random_effect_df": 3, + "random_effect_df": 5, "icc_noise_sd": 0.30, - "residual_dist": "heavy_tailed", + "residual_dists": ["heavy_tailed", "skewed"], "residual_change_prob": 0.8, "residual_df": 5, }, @@ -111,15 +127,7 @@ def run_power_analysis( if progress is not None: progress.start() - # Optimistic (user's original settings) - results["optimistic"] = run_find_power_func( - sample_size=sample_size, - target_tests=target_tests, - correction=correction, - scenario_config=None, - ) - - # Realistic & Doomer scenarios + # Run all scenarios (optimistic is always present as zero-perturbation baseline) for scenario_name, config in self.configs.items(): results[scenario_name] = run_find_power_func( sample_size=sample_size, @@ -175,15 +183,7 @@ def run_sample_size_analysis( if progress is not None: progress.start() - # Optimistic - results["optimistic"] = run_sample_size_func( - sample_sizes=sample_sizes, - target_tests=target_tests, - correction=correction, - scenario_config=None, - ) - - # Other scenarios + # Run all scenarios (optimistic is always present as zero-perturbation baseline) for scenario_name, config in self.configs.items(): results[scenario_name] = run_sample_size_func( sample_sizes=sample_sizes, @@ -209,8 +209,9 @@ def run_sample_size_analysis( def _create_scenario_plots(self, results: Dict) -> None: """Create visualizations for scenario analysis.""" scenarios = results["scenarios"] - scenario_names = ["optimistic", "realistic", "doomer"] - scenario_labels = ["Optimistic", "Realistic", "Doomer"] + # Derive scenario order from results: optimistic first, then config keys + scenario_names = ["optimistic"] + [k for k in scenarios if k != "optimistic"] + scenario_labels = [name.title() for name in scenario_names] first_scenario = scenarios.get("optimistic", {}) if "results" not in first_scenario or "sample_sizes_tested" not in first_scenario["results"]: @@ -286,7 +287,7 @@ def apply_lme_perturbations( if 
icc_noise_sd == 0.0 and re_dist == "normal": return None - rng = np.random.RandomState(sim_seed + 5000 if sim_seed is not None else None) + rng = np.random.RandomState(sim_seed + 6 if sim_seed is not None else None) # ICC jitter: multiplicative noise on tau_squared per grouping variable tau_squared_multipliers: Dict[str, float] = {} @@ -304,70 +305,6 @@ def apply_lme_perturbations( } -def apply_lme_residual_perturbations( - y: np.ndarray, - scenario_config: Dict, - sim_seed: Optional[int], -) -> np.ndarray: - """Replace normal residuals with non-normal if coin flip succeeds. - - For each simulation, independently flips a coin (probability - ``residual_change_prob``) to decide whether residuals are replaced. - If activated, reproduces the original N(0,1) errors via the known - seed, generates replacements from t(df) or shifted χ², and applies - the correction ``y += (new_error - original_error)``. - - Args: - y: Dependent variable array (modified in-place). - scenario_config: Scenario parameters with residual keys. - sim_seed: Random seed for reproducibility. - - Returns: - The (possibly modified) dependent variable array. - """ - residual_dist = scenario_config.get("residual_dist", "normal") - residual_change_prob = scenario_config.get("residual_change_prob", 0.0) - residual_df = scenario_config.get("residual_df", 10) - - if residual_dist == "normal" or residual_change_prob <= 0.0: - return y - - rng = np.random.RandomState(sim_seed + 6000 if sim_seed is not None else None) - - # Coin flip: should this simulation have non-normal residuals? - if rng.random() > residual_change_prob: - return y - - n = len(y) - - # Reproduce the original N(0,1) errors using the same seed as generate_y - # generate_y uses sim_seed + 2 for error generation - original_rng = np.random.RandomState(sim_seed + 2 if sim_seed is not None else None) - original_errors = original_rng.standard_normal(n) - - # Generate replacement errors - replacement_rng = np.random.RandomState(sim_seed + 6001 if sim_seed is not None else None) - - if residual_dist == "heavy_tailed": - # t(df) scaled to have variance 1 - df = max(residual_df, 3) - raw = replacement_rng.standard_t(df, size=n) - # t(df) has variance df/(df-2), scale to unit variance - scale = 1.0 / np.sqrt(df / (df - 2)) - new_errors = raw * scale - elif residual_dist == "skewed": - # Shifted chi-squared: mean=0, variance=1 - df = max(residual_df, 3) - raw = replacement_rng.chisquare(df, size=n) - new_errors = (raw - df) / np.sqrt(2 * df) - else: - return y - - # Apply correction: swap out original errors for new ones - y = y + (new_errors - original_errors) - return y - - def apply_per_simulation_perturbations( correlation_matrix: np.ndarray, var_types: np.ndarray, @@ -393,19 +330,22 @@ def apply_per_simulation_perturbations( if scenario_config is None: return correlation_matrix, var_types - rng = np.random.RandomState(sim_seed) + rng = np.random.RandomState(sim_seed + 5 if sim_seed is not None else None) # Perturb correlation matrix perturbed_corr = correlation_matrix - if correlation_matrix is not None and scenario_config["correlation_noise_sd"] > 0: + if correlation_matrix is not None and scenario_config.get("correlation_noise_sd", 0) > 0: perturbed_corr = correlation_matrix.copy() noise = rng.normal(0, scenario_config["correlation_noise_sd"], correlation_matrix.shape) noise = (noise + noise.T) / 2 # Keep symmetric perturbed_corr += noise + # Clip off-diagonal correlations to [-0.8, 0.8] to prevent near-singular + # matrices that cause Cholesky decomposition failures in 
data generation. perturbed_corr = np.clip(perturbed_corr, -0.8, 0.8) np.fill_diagonal(perturbed_corr, 1.0) - # Ensure positive semi-definiteness via eigenvalue clipping + # Nearest correlation matrix repair via spectral clipping: set negative + # eigenvalues to zero and reconstruct, then re-normalize to unit diagonal. eigvals, eigvecs = np.linalg.eigh(perturbed_corr) if np.any(eigvals < 0): eigvals = np.maximum(eigvals, 0.0) @@ -417,7 +357,7 @@ def apply_per_simulation_perturbations( # Perturb variable types perturbed_var_types = var_types.copy() - if scenario_config["distribution_change_prob"] > 0: + if scenario_config.get("distribution_change_prob", 0) > 0: type_mapping = {"right_skewed": 2, "left_skewed": 3, "uniform": 5} new_type_codes = [type_mapping[distribution] for distribution in scenario_config["new_distributions"]] diff --git a/mcpower/core/simulation.py b/mcpower/core/simulation.py index 266a39e..2223324 100644 --- a/mcpower/core/simulation.py +++ b/mcpower/core/simulation.py @@ -61,7 +61,7 @@ def __init__( Args: n_simulations: Number of Monte Carlo iterations. seed: Base random seed. Each iteration uses - ``seed + 4 * sim_id``. + ``seed + 12 * sim_id``. alpha: Significance level for hypothesis tests. parallel: Parallel processing mode (unused inside the runner itself; parallelism is handled at the @@ -143,12 +143,19 @@ def run_power_simulations( if metadata.cluster_specs: from ..stats.lme_solver import compute_lme_critical_values - n_fixed = len(metadata.target_indices) - # n_fixed_effects = number of columns in X_expanded (excluding intercept) - # This equals the total effect count minus cluster effects - n_fixed_total = len(metadata.effect_sizes) - if metadata.cluster_effect_indices: - n_fixed_total -= len(metadata.cluster_effect_indices) + # Use test formula dimensions when subsetting with random effects + if metadata.test_column_indices is not None and metadata.test_has_random_effects: + if metadata.test_target_indices is None: + raise RuntimeError("test_target_indices must be set when test_column_indices is present") + n_fixed = len(metadata.test_target_indices) + n_fixed_total = metadata.test_effect_count + else: + n_fixed = len(metadata.target_indices) + # n_fixed_effects = number of columns in X_expanded (excluding intercept) + # This equals the total effect count minus cluster effects + n_fixed_total = len(metadata.effect_sizes) + if metadata.cluster_effect_indices: + n_fixed_total -= len(metadata.cluster_effect_indices) chi2_crit, z_crit, correction_z_crits = compute_lme_critical_values( self.alpha, n_fixed_total, n_fixed, metadata.correction_method ) @@ -162,19 +169,18 @@ def run_power_simulations( raise SimulationCancelled("Simulation cancelled by user") - sim_seed = self.seed + 4 * sim_id if self.seed is not None else None - - # Apply perturbations if in scenario mode - if scenario_config is not None and apply_perturbations_func is not None: - perturbed_corr, perturbed_types = apply_perturbations_func( - metadata.correlation_matrix, - metadata.var_types, - scenario_config, - sim_seed, - ) - else: - perturbed_corr = metadata.correlation_matrix - perturbed_types = metadata.var_types + sim_seed = self.seed + 12 * sim_id if self.seed is not None else None + + # Apply per-simulation perturbations (correlation noise, distribution swaps) + # Zero-valued params in optimistic scenario are no-ops + if apply_perturbations_func is None: + raise RuntimeError("apply_perturbations_func must be provided") + perturbed_corr, perturbed_types = apply_perturbations_func( + 
metadata.correlation_matrix, + metadata.var_types, + scenario_config, + sim_seed, + ) result = self._single_simulation( sim_id=sim_id, @@ -326,7 +332,9 @@ def _single_simulation( first_spec = next(iter(metadata.cluster_specs.values())) sample_size = first_spec["n_clusters"] * first_spec["cluster_size"] - # Check if strict mode with uploaded data + # Strict-mode bootstrap: resample whole rows from uploaded data to + # preserve exact inter-variable relationships, then generate y from + # the bootstrapped X. This bypasses the normal X-generation pipeline. if metadata.preserve_correlation == "strict" and metadata.uploaded_raw_data is not None: # Strict mode: bootstrap uploaded data + generate created variables separately from ..stats.data_generation import bootstrap_uploaded_data @@ -336,7 +344,7 @@ def _single_simulation( sample_size, metadata.uploaded_raw_data, metadata.uploaded_var_metadata, - sim_seed, + sim_seed + 3 if sim_seed is not None else None, ) # Merge uploaded and created non-factor variables @@ -367,7 +375,7 @@ def _single_simulation( X_factors = X_uploaded_factors else: # Mixed: generate all factors, replace uploaded factor columns - X_factors = _generate_factors(sample_size, metadata.factor_specs, sim_seed) + X_factors = _generate_factors(sample_size, metadata.factor_specs, sim_seed + 3 if sim_seed is not None else None) # Overwrite uploaded factor dummy columns with bootstrapped data if X_uploaded_factors.shape[1] > 0: col_offset = 0 @@ -400,14 +408,14 @@ def _single_simulation( X_non_factors = np.empty((sample_size, 0), dtype=float) # Generate factor variables (as dummy variables) - X_factors = _generate_factors(sample_size, metadata.factor_specs, sim_seed) + X_factors = _generate_factors(sample_size, metadata.factor_specs, sim_seed + 3 if sim_seed is not None else None) # Compute LME perturbations (ICC jitter, non-normal RE dist) lme_perturbations = None - if metadata.cluster_specs and scenario_config is not None: + if metadata.cluster_specs: from ..core.scenarios import apply_lme_perturbations - lme_perturbations = apply_lme_perturbations(metadata.cluster_specs, scenario_config, sim_seed) + lme_perturbations = apply_lme_perturbations(metadata.cluster_specs, scenario_config or {}, sim_seed) # Generate cluster random effects (independent of upload mode) re_result = None # Phase 2: random effects result for slopes/nesting @@ -448,6 +456,16 @@ def _single_simulation( # Create extended design matrix with interactions (excludes cluster effects) X_expanded = create_X_extended_func(X) + # Test formula column subsetting: use reduced design matrix for analysis + if metadata.test_column_indices is not None: + X_test = X_expanded[:, metadata.test_column_indices] + if metadata.test_target_indices is None: + raise RuntimeError("test_target_indices must be set when test_column_indices is present") + test_target_indices = metadata.test_target_indices + else: + X_test = X_expanded + test_target_indices = metadata.target_indices + # Split effect sizes: fixed effects vs cluster effects # Use precomputed values (Phase 2 optimization) if metadata.cluster_effect_indices: @@ -457,6 +475,21 @@ def _single_simulation( fixed_effect_sizes = metadata.fixed_effect_sizes_cached cluster_effect_sizes = None + # Residual coin flip: decide whether this simulation uses non-normal errors + residual_dist = 0 # normal + residual_df = 10.0 + residual_change_prob = scenario_config.get("residual_change_prob", 0.0) if scenario_config else 0.0 + if residual_change_prob > 0: + if scenario_config is None: + raise 
RuntimeError("scenario_config must be provided when residual_change_prob > 0") + coin_rng = np.random.RandomState(sim_seed + 7 if sim_seed is not None else None) + if coin_rng.random() < residual_change_prob: + residual_dists = scenario_config.get("residual_dists", ["heavy_tailed", "skewed"]) + picked = coin_rng.choice(residual_dists) + dist_map = {"heavy_tailed": 1, "skewed": 2} + residual_dist = dist_map.get(picked, 0) + residual_df = float(scenario_config.get("residual_df", 10)) + # Generate dependent variable with fixed effects only y = generate_y_func( X_expanded=X_expanded, @@ -464,6 +497,8 @@ def _single_simulation( heterogeneity=metadata.heterogeneity, heteroskedasticity=metadata.heteroskedasticity, sim_seed=sim_seed, + residual_dist=residual_dist, + residual_df=residual_df, ) # Add cluster random effects contribution @@ -478,12 +513,6 @@ def _single_simulation( if re_result is not None and not np.allclose(re_result.slope_contribution, 0): y = y + re_result.slope_contribution - # Apply LME residual perturbations (non-normal residuals) - if metadata.cluster_specs and scenario_config is not None: - from ..core.scenarios import apply_lme_residual_perturbations - - y = apply_lme_residual_perturbations(y, scenario_config, sim_seed) - # Determine cluster IDs for the solver cluster_ids: Optional[np.ndarray] if re_result is not None: @@ -496,23 +525,25 @@ def _single_simulation( cluster_ids = metadata.cluster_ids_template # Route to correct analysis method - if cluster_ids is not None: + # When test_formula specifies no random effects, use OLS even if generation has clusters + use_lme = cluster_ids is not None and not (metadata.test_column_indices is not None and not metadata.test_has_random_effects) + if use_lme: # Mixed model path (LME) from ..stats.mixed_models import _lme_analysis_wrapper + assert cluster_ids is not None # narrowed by use_lme guard above lme_result = _lme_analysis_wrapper( - X_expanded, + X_test, y, - metadata.target_indices, + test_target_indices, cluster_ids, - metadata.cluster_column_indices, metadata.correction_method, self.alpha, backend="custom", verbose=metadata.verbose, - chi2_crit=getattr(metadata, "lme_chi2_crit", None), - z_crit=getattr(metadata, "lme_z_crit", None), - correction_z_crits=getattr(metadata, "lme_correction_z_crits", None), + chi2_crit=metadata.lme_chi2_crit, + z_crit=metadata.lme_z_crit, + correction_z_crits=metadata.lme_correction_z_crits, re_result=re_result, ) @@ -539,16 +570,20 @@ def _single_simulation( else: # Standard OLS path results = analyze_func( - X_expanded, + X_test, y, - metadata.target_indices, + test_target_indices, self.alpha, metadata.correction_method, ) diagnostics = None - # Extract results: [f_sig, uncorr..., corr..., (wald_flag)] - n_targets = len(metadata.target_indices) + # Result array layout: [F_sig, uncorrected[n_targets], corrected[n_targets], wald_flag?] 
+ # - F_sig (index 0): overall model F-test significance (1.0 or 0.0) + # - uncorrected[1..n]: per-target t-test significance without correction + # - corrected[n+1..2n]: per-target significance with multiple-comparison correction + # - wald_flag (optional, LME only): 1.0 if Wald test was used instead of LRT + n_targets = len(test_target_indices) f_significant = bool(results[0]) uncorrected = results[1 : 1 + n_targets].astype(bool) corrected = results[1 + n_targets : 1 + 2 * n_targets].astype(bool) @@ -560,19 +595,19 @@ def _single_simulation( wald_flag = bool(results[expected_len]) # Post-hoc pairwise contrasts (OLS path only) - if metadata.posthoc_specs and cluster_ids is None: + if metadata.posthoc_specs and not use_lme: from ..stats.ols import compute_posthoc_contrasts ph_uncorr, ph_corr, regular_override = compute_posthoc_contrasts( - X_expanded, + X_test, y, metadata.posthoc_specs, metadata.posthoc_method, metadata.posthoc_t_crit, metadata.posthoc_tukey_crits, - target_indices=metadata.target_indices, + target_indices=test_target_indices, correction_method=metadata.correction_method, - correction_t_crits_combined=getattr(metadata, "posthoc_correction_t_crits_combined", None), + correction_t_crits_combined=metadata.posthoc_correction_t_crits_combined, ) # If FDR/Holm combined correction was applied, override regular corrected @@ -645,7 +680,7 @@ class SimulationMetadata: correction_method: Encoded multiple-comparison correction (0=none, 1=Bonferroni, 2=BH, 3=Holm). heterogeneity: SD of random effect-size multiplier. - heteroskedasticity: Correlation between first predictor and error SD. + heteroskedasticity: Correlation between predicted values and error SD. preserve_correlation: Upload correlation mode (``"no"``/``"partial"``/``"strict"``). uploaded_raw_data: Normalised raw data for strict-mode bootstrap. @@ -728,6 +763,13 @@ def __init__( self.posthoc_method: str = "t-test" self.posthoc_tukey_crits: Dict[str, float] = {} self.posthoc_t_crit: float = 0.0 + self.posthoc_correction_t_crits_combined: Optional[np.ndarray] = None + + # Test formula fields (for model misspecification testing) + self.test_column_indices: Optional[np.ndarray] = None + self.test_target_indices: Optional[np.ndarray] = None + self.test_effect_count: Optional[int] = None # p for critical value computation + self.test_has_random_effects: bool = False # Whether test formula has (1|group) etc. def _compute_fixed_effect_variance(registry) -> float: @@ -779,8 +821,13 @@ def _compute_fixed_effect_variance(registry) -> float: factor_info = registry._factors[factor_name] proportions = factor_info.get("proportions") if proportions is not None: - # level is 1-indexed; proportions list is 0-indexed - p_k = proportions[level - 1] + level_labels = factor_info.get("level_labels") + if level_labels is not None: + # String level labels — look up position by label + p_k = proportions[level_labels.index(str(level))] + else: + # Integer levels are 1-indexed; proportions list is 0-indexed + p_k = proportions[level - 1] else: # Equal proportions (default) n_levels = factor_info["n_levels"] @@ -814,6 +861,7 @@ def prepare_metadata( model, target_tests: List[str], correction: Optional[str] = None, + test_formula_effects: Optional[List[str]] = None, ) -> SimulationMetadata: """ Prepare simulation metadata from model state. 
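
The subsetting that `prepare_metadata` sets up here (and `_single_simulation` consumes via `X_test = X_expanded[:, metadata.test_column_indices]`) is plain column indexing; a minimal sketch with hypothetical shapes:

```python
import numpy as np

# Full model y = x1 + x2 + x3: X_expanded columns are [x1, x2, x3]
X_expanded = np.random.randn(100, 3)

# test_formula "y ~ x1 + x2" keeps columns 0 and 1
test_column_indices = np.array([0, 1])
X_test = X_expanded[:, test_column_indices]

# A target on x1 (column 0 of the full design) stays column 0 of X_test;
# a target on x3 has no column in the test model and is dropped.
```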
@@ -825,6 +873,9 @@ def prepare_metadata( model: MCPowerModel instance target_tests: List of effects to test correction: Multiple comparison correction method + test_formula_effects: Optional list of effect names from a test + formula. When provided, the metadata will include column + indices for subsetting X_expanded to the test model. Returns: SimulationMetadata instance @@ -960,8 +1011,6 @@ def prepare_metadata( upload_data_values=model.upload_data_values if model.upload_data_values is not None else np.zeros((2, 2), dtype=np.float64), effect_sizes=effect_sizes, correction_method=correction_method, - heterogeneity=model.heterogeneity, - heteroskedasticity=model.heteroskedasticity, preserve_correlation=model._preserve_correlation, uploaded_raw_data=model._uploaded_raw_data, uploaded_var_metadata=model._uploaded_var_metadata, @@ -982,4 +1031,20 @@ def prepare_metadata( metadata.posthoc_specs = model._posthoc_specs metadata.posthoc_method = "tukey" if is_tukey_correction else "t-test" + # Test formula column subsetting + if test_formula_effects is not None: + from ..utils.test_formula_utils import _compute_test_column_indices, _remap_target_indices + + # Get all non-cluster effect names in registry order + all_effect_names = [name for name in registry._effects if name not in registry.cluster_effect_names] + + test_col_indices = _compute_test_column_indices(all_effect_names, test_formula_effects) + metadata.test_column_indices = test_col_indices + metadata.test_effect_count = len(test_col_indices) + + # Remap target indices to X_test space + # Only remap targets that exist in the test formula + valid_targets = np.array([idx for idx in target_indices if idx in test_col_indices], dtype=np.int64) + metadata.test_target_indices = _remap_target_indices(valid_targets, test_col_indices) + return metadata diff --git a/mcpower/core/variables.py b/mcpower/core/variables.py index e311606..338d40b 100644 --- a/mcpower/core/variables.py +++ b/mcpower/core/variables.py @@ -367,76 +367,42 @@ def expand_factors(self) -> None: level_labels = factor_info.get("level_labels") reference_level = factor_info.get("reference_level", 1) + # Compute non-reference levels once if level_labels is not None: - # Named levels: skip the reference, create dummies for the rest - non_ref_labels = [lb for lb in level_labels if lb != str(reference_level)] - for label in non_ref_labels: - dummy_name = f"{factor_name}[{label}]" - - # Create dummy predictor - dummy_pred = PredictorVar( - name=dummy_name, - var_type="factor_dummy", - is_dummy=True, - factor_source=factor_name, - factor_level=label, - column_index=col_idx, - level_labels=level_labels, - ) - new_predictors[dummy_name] = dummy_pred - - # Create main effect for dummy - dummy_eff = Effect( - name=dummy_name, - effect_type="main", - var_names=[dummy_name], - column_index=col_idx, - factor_source=factor_name, - factor_level=label, - ) - new_effects[dummy_name] = dummy_eff - - # Store dummy mapping - self._factor_dummies[dummy_name] = { - "factor_name": factor_name, - "level": label, - } - - col_idx += 1 + non_ref = [lb for lb in level_labels if lb != str(reference_level)] else: - # Original integer-indexed behavior - for level in range(2, n_levels + 1): - dummy_name = f"{factor_name}[{level}]" - - # Create dummy predictor - dummy_pred = PredictorVar( - name=dummy_name, - var_type="factor_dummy", - is_dummy=True, - factor_source=factor_name, - factor_level=level, - column_index=col_idx, - ) - new_predictors[dummy_name] = dummy_pred - - # Create main effect for dummy - 
dummy_eff = Effect( - name=dummy_name, - effect_type="main", - var_names=[dummy_name], - column_index=col_idx, - factor_source=factor_name, - factor_level=level, - ) - new_effects[dummy_name] = dummy_eff + non_ref = list(range(2, n_levels + 1)) + + for level in non_ref: + dummy_name = f"{factor_name}[{level}]" + + dummy_pred = PredictorVar( + name=dummy_name, + var_type="factor_dummy", + is_dummy=True, + factor_source=factor_name, + factor_level=level, + column_index=col_idx, + level_labels=level_labels if level_labels is not None else None, + ) + new_predictors[dummy_name] = dummy_pred + + dummy_eff = Effect( + name=dummy_name, + effect_type="main", + var_names=[dummy_name], + column_index=col_idx, + factor_source=factor_name, + factor_level=level, + ) + new_effects[dummy_name] = dummy_eff - # Store dummy mapping - self._factor_dummies[dummy_name] = { - "factor_name": factor_name, - "level": level, - } + self._factor_dummies[dummy_name] = { + "factor_name": factor_name, + "level": level, + } - col_idx += 1 + col_idx += 1 # Handle interactions involving factors — Cartesian product of # non-reference dummy levels across all factor components. @@ -503,6 +469,13 @@ def get_effect_sizes(self) -> np.ndarray: def get_var_types(self) -> np.ndarray: """Get variable types as numpy array (for data generation).""" + # Type codes: 0-5 are parametric distributions generated from scratch. + # 97/98/99 are sentinel codes for uploaded-data variables whose values + # come from bootstrapped/quantile-matched empirical data rather than + # parametric generation: + # 97 = uploaded_factor (factor from uploaded data) + # 98 = uploaded_binary (binary from uploaded data) + # 99 = uploaded_data (continuous from uploaded data) type_mapping = { "normal": 0, "binary": 1, @@ -717,28 +690,20 @@ def register_cluster( def _reindex_predictors(self) -> None: """Reindex all predictors to maintain order: non_factor | cluster_effect | dummies.""" - col_idx = 0 + non_factor = [] + cluster = [] + dummies = [] - # Non-factor predictors first - for name in sorted(self._predictors.keys(), key=lambda x: self._predictors[x].column_index or 0): - pred = self._predictors[name] - if not pred.is_factor and not pred.is_dummy and pred.var_type != "cluster_effect": - pred.column_index = col_idx - col_idx += 1 - - # Cluster effect predictors second - for name in sorted(self._predictors.keys(), key=lambda x: self._predictors[x].column_index or 0): - pred = self._predictors[name] - if pred.var_type == "cluster_effect": - pred.column_index = col_idx - col_idx += 1 - - # Factor dummies last for name in sorted(self._predictors.keys(), key=lambda x: self._predictors[x].column_index or 0): pred = self._predictors[name] if pred.is_dummy: - pred.column_index = col_idx - col_idx += 1 + dummies.append(pred) + elif pred.var_type == "cluster_effect": + cluster.append(pred) + elif not pred.is_factor: + non_factor.append(pred) + + for col_idx, pred in enumerate(non_factor + cluster + dummies): + pred.column_index = col_idx - # Update effect indices self._update_effect_indices() diff --git a/mcpower/model.py b/mcpower/model.py index 4fa2c4b..9a5f813 100644 --- a/mcpower/model.py +++ b/mcpower/model.py @@ -123,10 +123,9 @@ def __init__(self, data_generation_formula: str): self._pending_factor_levels: Optional[str] = None self._pending_effects: Optional[str] = None self._pending_correlations: Optional[Union[str, np.ndarray]] = None - self._pending_heterogeneity: Optional[float] = None - self._pending_heteroskedasticity: Optional[float] = None 
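
+        # Heterogeneity/heteroskedasticity are no longer pending fields; they
+        # are applied per-scenario via scenario configs (see set_scenario_configs)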
self._pending_data: Optional[Dict[str, Any]] = None self._pending_clusters: Dict[str, Dict] = {} # {grouping_var: {n_clusters, cluster_size, icc}} + self._effects_set: bool = False # True after set_effects() has been called # Detect mixed model formula if self._registry._random_effects_parsed: @@ -134,8 +133,6 @@ def __init__(self, data_generation_formula: str): # Applied state self._applied = False - self.heterogeneity = 0.0 - self.heteroskedasticity = 0.0 # Data storage self.upload_normal_values: Optional[np.ndarray] = None @@ -385,6 +382,7 @@ def set_effects(self, effects_string: str): raise ValueError("effects_string cannot be empty") self._pending_effects = effects_string + self._effects_set = True self._applied = False return self @@ -432,13 +430,16 @@ def set_variable_type(self, variable_types_string: str): - ``"normal"`` — standard normal (default). - ``"binary"`` or ``"binary(p)"`` — Bernoulli with proportion *p* (default 0.5). - - ``"skewed"`` — heavy-tailed (t-distribution, df=3). + - ``"right_skewed"`` — positively skewed distribution. + - ``"left_skewed"`` — negatively skewed distribution. + - ``"high_kurtosis"`` — heavy-tailed (t-distribution, df=3). + - ``"uniform"`` — uniform distribution. - ``"factor(k)"`` — categorical with *k* levels (creates *k-1* dummy variables). - ``"factor(k, p1, p2, ...)"`` — factor with custom level proportions. - Example: ``"x1=binary, x2=skewed, x3=factor(3)"``. + Example: ``"x1=binary, x2=right_skewed, x3=factor(3)"``. Returns: self: For method chaining. @@ -479,62 +480,6 @@ def set_factor_levels(self, spec: str): self._applied = False return self - def set_heterogeneity(self, heterogeneity: float): - """Set heterogeneity (random variation) in effect sizes. - - When non-zero, each simulation draws a per-simulation effect-size - multiplier from a normal distribution with mean 1 and the given - standard deviation. This models uncertainty about the true effect - size — for example, ``heterogeneity=0.1`` means effect sizes vary - by roughly +/- 10% across simulations. - - This setting is deferred until ``apply()`` is called. - - Args: - heterogeneity: Standard deviation of the random effect-size - multiplier. Must be non-negative. Default is 0 (no variation). - - Returns: - self: For method chaining. - - Raises: - TypeError: If *heterogeneity* is not numeric. - """ - if not isinstance(heterogeneity, (int, float)): - raise TypeError("heterogeneity must be a number") - - self._pending_heterogeneity = float(heterogeneity) - self._applied = False - return self - - def set_heteroskedasticity(self, heteroskedasticity_correlation: float): - """Set heteroskedasticity (non-constant error variance). - - Introduces a correlation between the first predictor's values and - the error standard deviation, producing variance that increases (or - decreases) with the predictor. This violates the homoskedasticity - assumption and typically reduces power. - - This setting is deferred until ``apply()`` is called. - - Args: - heteroskedasticity_correlation: Correlation between the first - predictor and the error standard deviation, in the range - [-1, 1]. Default is 0 (homoskedastic errors). - - Returns: - self: For method chaining. - - Raises: - TypeError: If the value is not numeric. 
- """ - if not isinstance(heteroskedasticity_correlation, (int, float)): - raise TypeError("heteroskedasticity_correlation must be a number") - - self._pending_heteroskedasticity = float(heteroskedasticity_correlation) - self._applied = False - return self - def set_cluster( self, grouping_var: str, @@ -769,6 +714,7 @@ def upload_data( "preserve_factor_level_names": preserve_factor_level_names, } self._applied = False + return self def set_scenario_configs(self, configs_dict: Dict): """Set custom scenario configurations for robustness analysis. @@ -786,7 +732,9 @@ def set_scenario_configs(self, configs_dict: Dict): configs_dict: Mapping of scenario names to configuration dicts. Each configuration may include keys such as ``"heterogeneity"``, ``"heteroskedasticity"``, - ``"effect_size_jitter"``, and ``"distribution_jitter"``. + ``"correlation_noise_sd"``, and ``"distribution_change_prob"``. + See ``DEFAULT_SCENARIO_CONFIG`` in ``mcpower.core.scenarios`` + for the full list of keys. Returns: self: For method chaining. @@ -802,7 +750,8 @@ def set_scenario_configs(self, configs_dict: Dict): if scenario in merged: merged[scenario].update(config) else: - merged[scenario] = config + # New custom scenarios inherit all keys from optimistic baseline + merged[scenario] = {**DEFAULT_SCENARIO_CONFIG["optimistic"], **config} self._scenario_configs = merged print(f"Custom scenario configs set: {', '.join(configs_dict.keys())}") @@ -812,7 +761,7 @@ def set_scenario_configs(self, configs_dict: Dict): # Apply method (processes all pending settings) # ========================================================================= - def apply(self): + def _apply(self): """ Apply all pending settings to the model. @@ -857,16 +806,22 @@ def apply(self): # 7. Apply correlations self._apply_correlations(_parser) - # 8. Apply heterogeneity/heteroskedasticity - self._apply_heterogeneity() - - # 9. Validate model is ready + # 8. Validate model is ready model_result = _validate_model_ready(self) model_result.raise_if_invalid() - # Invalidate effect plan cache when settings change (Phase 2 optimization) + # Invalidate the effect plan cache — apply() rebuilds the variable + # registry state, so any cached column mappings are now stale. self._effect_plan_cache = None + # Clear pending state to prevent double-application + self._pending_variable_types = None + self._pending_factor_levels = None + self._pending_effects = None + self._pending_correlations = None + self._pending_data = None + self._pending_clusters = {} + self._applied = True print("Model settings applied successfully") return self @@ -1024,6 +979,21 @@ def _apply_data(self): # Extract matched data matched_data = data[:, matched_indices] + # Reject NaN values early + try: + if np.isnan(matched_data.astype(np.float64)).any(): + nan_cols = [ + matched_columns[i] for i in range(matched_data.shape[1]) if np.isnan(matched_data[:, i].astype(np.float64)).any() + ] + raise ValueError( + f"Uploaded data contains NaN values in columns: {', '.join(nan_cols)}. " + f"Remove or impute missing values before uploading." + ) + except (ValueError, TypeError): + # Object dtype columns (strings) can't be converted to float for NaN check. + # NaN check for numeric columns will happen after string encoding below. 
@@ -1024,6 +979,22 @@ def _apply_data(self):
         # Extract matched data
         matched_data = data[:, matched_indices]

+        # Reject NaN values early
+        try:
+            nan_mask = np.isnan(matched_data.astype(np.float64))
+        except (ValueError, TypeError):
+            # Object dtype columns (strings) can't be converted to float for NaN check.
+            # NaN check for numeric columns will happen after string encoding below.
+            nan_mask = None
+        if nan_mask is not None and nan_mask.any():
+            nan_cols = [
+                matched_columns[i] for i in range(matched_data.shape[1]) if nan_mask[:, i].any()
+            ]
+            raise ValueError(
+                f"Uploaded data contains NaN values in columns: {', '.join(nan_cols)}. "
+                f"Remove or impute missing values before uploading."
+            )
+
         # Convert to float64 if object dtype (common with mixed-type DataFrames)
         # String columns are encoded to integer indices; mapping is stored in string_col_indices
         string_col_indices = {}
@@ -1178,11 +1148,7 @@ def _apply_data_normal_mode(self, data, columns, type_info, mode, data_types_ove
             level_labels = info.get("level_labels")

             # Determine reference from data_types tuple override
-            reference_level = None
-            if col in data_types_override:
-                dt = data_types_override[col]
-                if isinstance(dt, tuple) and len(dt) == 2:
-                    reference_level = str(dt[1])
+            reference_level = self._extract_reference_level(data_types_override, col)

             # Calculate proportions for each level
             proportions = []
@@ -1200,7 +1166,10 @@
         else:
             # continuous
             # Normalize: mean=0, sd=1
-            normalized = (col_data - np.mean(col_data)) / np.std(col_data, ddof=1)
+            std = np.std(col_data, ddof=1)
+            if std < 1e-15:
+                raise ValueError(f"Column '{col}' has zero variance (constant value). Remove it from the model or check your data.")
+            normalized = (col_data - np.mean(col_data)) / std

             # Create lookup tables (type 99)
             normal_vals, uploaded_vals = create_uploaded_lookup_tables(normalized.reshape(-1, 1))
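
For context on the zero-variance guard above, a minimal NumPy sketch of the failure mode it closes: a constant column makes the sample SD exactly zero, and the old normalization silently produced NaN columns.

```python
import numpy as np

col = np.array([5.0, 5.0, 5.0, 5.0])  # a constant column from an upload

std = np.std(col, ddof=1)
print(std)  # 0.0, below the 1e-15 guard threshold

# Old behavior: 0/0 division yields NaN for every row, which then
# poisons correlations and every downstream simulation.
with np.errstate(divide="ignore", invalid="ignore"):
    normalized = (col - np.mean(col)) / std
print(normalized)  # [nan nan nan nan]

# New behavior: upload_data() raises ValueError naming the column instead.
```
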
@@ -1324,11 +1293,7 @@ def _apply_data_strict_mode(self, data, columns, type_info, data_types_override=
             level_labels = info.get("level_labels")

             # Determine reference from data_types tuple override
-            reference_level = None
-            if col in data_types_override:
-                dt = data_types_override[col]
-                if isinstance(dt, tuple) and len(dt) == 2:
-                    reference_level = str(dt[1])
+            reference_level = self._extract_reference_level(data_types_override, col)

             self._uploaded_var_metadata[col] = {
                 "type": "factor",
@@ -1355,7 +1320,10 @@
                 continuous_cols.append(idx)
                 # Normalize
                 col_data = data[:, idx]
-                normalized_data[:, idx] = (col_data - np.mean(col_data)) / np.std(col_data, ddof=1)
+                std = np.std(col_data, ddof=1)
+                if std < 1e-15:
+                    raise ValueError(f"Column '{col}' has zero variance (constant value). Remove it from the model or check your data.")
+                normalized_data[:, idx] = (col_data - np.mean(col_data)) / std

                 self._uploaded_var_metadata[col] = {
                     "type": "continuous",
@@ -1481,22 +1449,6 @@ def _apply_correlations(self, _parser):
         self._registry.set_correlation_matrix(correlations_input)
         print("Correlation matrix set")

-    def _apply_heterogeneity(self):
-        """Validate and apply pending heterogeneity and heteroskedasticity settings."""
-        if self._pending_heterogeneity is not None:
-            if self._pending_heterogeneity < 0:
-                raise ValueError("heterogeneity must be non-negative")
-            self.heterogeneity = self._pending_heterogeneity
-            if self.heterogeneity > 0:
-                print(f"Heterogeneity: SD = {self.heterogeneity}")
-
-        if self._pending_heteroskedasticity is not None:
-            if not -1 <= self._pending_heteroskedasticity <= 1:
-                raise ValueError("heteroskedasticity_correlation must be between -1 and 1")
-            self.heteroskedasticity = self._pending_heteroskedasticity
-            if abs(self.heteroskedasticity) > 1e-8:
-                print(f"Heteroskedasticity: correlation = {self.heteroskedasticity}")
-
     # =========================================================================
     # Analysis methods
     # =========================================================================
@@ -1507,7 +1459,7 @@ def find_power(
         target_test: str = "all",
         correction: Optional[str] = None,
         print_results: bool = True,
-        scenarios: bool = False,
+        scenarios: Union[bool, List[str]] = False,
         summary: str = "short",
         return_results: bool = False,
         test_formula: str = "",
@@ -1529,12 +1481,16 @@
           Duplicate tests raise ``ValueError``.
        correction: Multiple comparison correction (None, "bonferroni", "benjamini-hochberg", "holm")
        print_results: Whether to print results
-       scenarios: Run scenario analysis
+       scenarios: Scenario analysis control:
+           - ``False`` (default): no scenario analysis.
+           - ``True``: run all configured scenarios.
+           - List of scenario names: run only the specified scenarios
+             (e.g. ``["optimistic", "doomer"]``). Case-insensitive.
        summary: Output detail level ("short" or "long")
        return_results: Return results dict
        test_formula: Formula for statistical testing (default: use data generation formula).
           If the formula contains random effects like (1|school), analysis switches to
-          mixed model testing (not yet implemented).
+          mixed model testing.
        progress_callback: Progress reporting control:
           - ``None`` (default): auto-use ``PrintReporter`` when
             *print_results* is ``True``.
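
Tying the two new parameters together, a hedged usage sketch. The import path and the effect-size string syntax are illustrative assumptions; the scenario names are the package defaults.

```python
from mcpower import MCPowerModel  # import path assumed

# Generate data from the full model but test the reduced model that
# omits x3, i.e. power under misspecification.
model = MCPowerModel("y = x1 + x2 + x3")
model.set_effects("x1=0.4, x2=0.3, x3=0.2")  # effects syntax assumed

model.find_power(
    sample_size=200,
    test_formula="y ~ x1 + x2",          # reduced test model
    scenarios=["optimistic", "doomer"],  # subset of configured scenarios
)
```
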
@@ -1549,7 +1505,10 @@ def find_power( """ # Auto-apply if settings have changed if not self._applied: - self.apply() + self._apply() + + # Resolve scenarios parameter + scenario_filter = self._resolve_scenarios(scenarios) # Validate sample size (basic: >= 20, type check) _validate_sample_size(sample_size).raise_if_invalid() @@ -1558,9 +1517,6 @@ def find_power( n_variables = len(self._registry.effect_names) _validate_sample_size_for_model(sample_size, n_variables).raise_if_invalid() - # Validate and adjust cluster sample sizes - self._validate_cluster_sample_size(sample_size) - # Warn if sample size is much larger than uploaded data if self._uploaded_data_n > 0 and sample_size > 3 * self._uploaded_data_n: print( @@ -1570,33 +1526,13 @@ def find_power( ) self._validate_analysis_inputs(correction) - resolved_test_formula = self._resolve_test_formula(test_formula) - target_tests = self._parse_target_tests(target_test) - - if correction and correction.lower() == "tukey" and not self._posthoc_specs: - raise ValueError( - "Tukey correction requires at least one post-hoc comparison " - "(e.g., target_test='group[0] vs group[1]'). " - "Tukey HSD only applies to pairwise contrasts between factor levels." - ) - - # Resolve progress callback - from .progress import PrintReporter, ProgressReporter, compute_total_simulations - - if progress_callback is None: - effective_cb = PrintReporter() if print_results else None - elif progress_callback is False: - effective_cb = None - else: - effective_cb = progress_callback + resolved_test_formula, test_formula_effects, test_random_effects = self._resolve_test_formula(test_formula) + target_tests = self._parse_target_tests(target_test, test_formula_effects=test_formula_effects) + self._validate_tukey_posthoc(correction) - reporter = None - if effective_cb is not None: - n_scenarios = (len(self._scenario_configs or DEFAULT_SCENARIO_CONFIG) + 1) if scenarios else 1 - total = compute_total_simulations(self._effective_n_simulations, 1, n_scenarios) - reporter = ProgressReporter(total, effective_cb) + reporter = self._resolve_progress(progress_callback, print_results, scenario_filter) - if scenarios: + if scenario_filter is not None: result = self._run_scenario_analysis( "power", sample_size=sample_size, @@ -1605,8 +1541,11 @@ def find_power( summary=summary, print_results=print_results, test_formula=resolved_test_formula, + test_formula_effects=test_formula_effects, + test_random_effects=test_random_effects, progress=reporter, cancel_check=cancel_check, + scenario_filter=scenario_filter, ) else: if reporter is not None: @@ -1615,7 +1554,10 @@ def find_power( sample_size, target_tests, correction, + scenario_config=DEFAULT_SCENARIO_CONFIG["optimistic"], test_formula=resolved_test_formula, + test_formula_effects=test_formula_effects, + test_random_effects=test_random_effects, progress=reporter, cancel_check=cancel_check, ) @@ -1623,7 +1565,7 @@ def find_power( if reporter is not None: reporter.finish() - if not scenarios and print_results: + if scenario_filter is None and print_results: print(f"\n{'=' * 80}") print("MONTE CARLO POWER ANALYSIS RESULTS") print(f"{'=' * 80}") @@ -1641,7 +1583,7 @@ def find_sample_size( by: int = 5, correction: Optional[str] = None, print_results: bool = True, - scenarios: bool = False, + scenarios: Union[bool, List[str]] = False, summary: str = "short", return_results: bool = False, test_formula: str = "", @@ -1659,12 +1601,16 @@ def find_sample_size( by: Step size between sample sizes correction: Multiple comparison correction 
print_results: Whether to print results
-       scenarios: Run scenario analysis
+       scenarios: Scenario analysis control:
+           - ``False`` (default): no scenario analysis.
+           - ``True``: run all configured scenarios.
+           - List of scenario names: run only the specified scenarios
+             (e.g. ``["optimistic", "doomer"]``). Case-insensitive.
        summary: Output detail level
        return_results: Return results dict
        test_formula: Formula for statistical testing (default: use data generation formula).
           If the formula contains random effects like (1|school), analysis switches to
-          mixed model testing (not yet implemented).
+          mixed model testing.
        progress_callback: Progress reporting control:
           - ``None`` (default): auto-use ``PrintReporter`` when
             *print_results* is ``True``.
@@ -1680,7 +1626,10 @@
        """
        # Auto-apply if settings have changed
        if not self._applied:
-            self.apply()
+            self._apply()
+
+        # Resolve scenarios parameter
+        scenario_filter = self._resolve_scenarios(scenarios)

        # Validate from_size meets minimum requirements
        _validate_sample_size(from_size).raise_if_invalid()
@@ -1696,40 +1645,20 @@
        )

        self._validate_analysis_inputs(correction)
-        resolved_test_formula = self._resolve_test_formula(test_formula)
+        resolved_test_formula, test_formula_effects, test_random_effects = self._resolve_test_formula(test_formula)

        validation_result = _validate_sample_size_range(from_size, to_size, by)
        for warning in validation_result.warnings:
            print(f"Warning: {warning}")
        validation_result.raise_if_invalid()

-        target_tests = self._parse_target_tests(target_test)
-
-        if correction and correction.lower() == "tukey" and not self._posthoc_specs:
-            raise ValueError(
-                "Tukey correction requires at least one post-hoc comparison "
-                "(e.g., target_test='group[0] vs group[1]'). "
-                "Tukey HSD only applies to pairwise contrasts between factor levels."
- ) + target_tests = self._parse_target_tests(target_test, test_formula_effects=test_formula_effects) + self._validate_tukey_posthoc(correction) sample_sizes = list(range(from_size, to_size + 1, by)) - # Resolve progress callback - from .progress import PrintReporter, ProgressReporter, compute_total_simulations - - if progress_callback is None: - effective_cb = PrintReporter() if print_results else None - elif progress_callback is False: - effective_cb = None - else: - effective_cb = progress_callback - - reporter = None - if effective_cb is not None: - n_scenarios = (len(self._scenario_configs or DEFAULT_SCENARIO_CONFIG) + 1) if scenarios else 1 - total = compute_total_simulations(self._effective_n_simulations, len(sample_sizes), n_scenarios) - reporter = ProgressReporter(total, effective_cb) + reporter = self._resolve_progress(progress_callback, print_results, scenario_filter, n_sample_sizes=len(sample_sizes)) - if scenarios: + if scenario_filter is not None: result = self._run_scenario_analysis( "sample_size", target_tests=target_tests, @@ -1738,8 +1667,11 @@ def find_sample_size( summary=summary, print_results=print_results, test_formula=resolved_test_formula, + test_formula_effects=test_formula_effects, + test_random_effects=test_random_effects, progress=reporter, cancel_check=cancel_check, + scenario_filter=scenario_filter, ) else: if reporter is not None: @@ -1748,7 +1680,10 @@ def find_sample_size( sample_sizes, target_tests, correction, + scenario_config=DEFAULT_SCENARIO_CONFIG["optimistic"], test_formula=resolved_test_formula, + test_formula_effects=test_formula_effects, + test_random_effects=test_random_effects, progress=reporter, cancel_check=cancel_check, ) @@ -1756,7 +1691,7 @@ def find_sample_size( if reporter is not None: reporter.finish() - if not scenarios and print_results: + if scenario_filter is None and print_results: print(f"\n{'=' * 80}") print("SAMPLE SIZE ANALYSIS RESULTS") print(f"{'=' * 80}") @@ -1780,6 +1715,8 @@ def _generate_dependent_variable( heterogeneity: float = 0.0, heteroskedasticity: float = 0.0, sim_seed: Optional[int] = None, + residual_dist: int = 0, + residual_df: float = 10.0, ) -> np.ndarray: """Generate the dependent variable as y = X @ beta + error via the active backend.""" return get_backend().generate_y( @@ -1788,19 +1725,103 @@ def _generate_dependent_variable( heterogeneity, heteroskedasticity, sim_seed if sim_seed is not None else -1, + residual_dist, + residual_df, ) # ========================================================================= # Internal methods # ========================================================================= + @staticmethod + def _extract_reference_level(data_types_override, col): + """Extract reference level from data_types_override tuple for a column.""" + dt = data_types_override.get(col) + if isinstance(dt, tuple) and len(dt) == 2: + return str(dt[1]) + return None + + def _resolve_scenarios(self, scenarios: Union[bool, List[str]]) -> Optional[List[str]]: + """Resolve the scenarios parameter into a list of scenario names or None. + + Args: + scenarios: ``False`` for no scenarios, ``True`` for all configured + scenarios, or a list of scenario names (case-insensitive). + + Returns: + List of validated, lowercase scenario names, or ``None`` if + scenarios are disabled. + + Raises: + ValueError: If any requested scenario name is not configured. + TypeError: If *scenarios* is not ``bool`` or a list of strings. 
+ """ + if scenarios is False: + return None + + all_configs = self._scenario_configs or DEFAULT_SCENARIO_CONFIG + available = set(all_configs.keys()) + + if scenarios is True: + return list(all_configs.keys()) + + if not isinstance(scenarios, list): + raise TypeError(f"scenarios must be True, False, or a list of scenario names, got {type(scenarios).__name__}") + + # Case-insensitive matching + available_lower = {k.lower(): k for k in available} + resolved = [] + invalid = [] + for name in scenarios: + if not isinstance(name, str): + raise TypeError(f"Scenario names must be strings, got {type(name).__name__}") + key = available_lower.get(name.lower()) + if key is None: + invalid.append(name) + else: + resolved.append(key) + + if invalid: + raise ValueError(f"Unknown scenario(s): {', '.join(repr(n) for n in invalid)}. Available: {', '.join(sorted(available))}") + + return resolved + + def _resolve_progress(self, progress_callback, print_results, scenario_filter, n_sample_sizes=1): + """Resolve progress_callback into a ProgressReporter or None.""" + from .progress import PrintReporter, ProgressReporter, compute_total_simulations + + if progress_callback is None: + effective_cb = PrintReporter() if print_results else None + elif progress_callback is False: + effective_cb = None + else: + effective_cb = progress_callback + + if effective_cb is None: + return None + + n_scenarios = len(scenario_filter) if scenario_filter is not None else 1 + total = compute_total_simulations(self._effective_n_simulations, n_sample_sizes, n_scenarios) + return ProgressReporter(total, effective_cb) + def _validate_analysis_inputs(self, correction): """Validate the multiple-comparison correction method before analysis.""" result = _validate_correction_method(correction) result.raise_if_invalid() + def _validate_tukey_posthoc(self, correction): + """Raise if Tukey correction is requested without posthoc specs.""" + if correction and correction.lower() == "tukey" and not self._posthoc_specs: + raise ValueError( + "Tukey correction requires at least one post-hoc comparison " + "(e.g., target_test='group[0] vs group[1]'). " + "Tukey HSD only applies to pairwise contrasts between factor levels." + ) + def _validate_cluster_sample_size(self, sample_size: int): """Derive missing cluster dimensions from sample_size and validate minimums.""" + # NOTE: This method both validates AND mutates — it derives missing + # cluster_size/n_clusters from sample_size before checking minimums. 
if not self._registry.cluster_names: return # No clusters, nothing to do @@ -1811,10 +1832,12 @@ def _validate_cluster_sample_size(self, sample_size: int): if spec.n_clusters is not None: spec.cluster_size = sample_size // spec.n_clusters else: - assert spec.cluster_size is not None + if spec.cluster_size is None: + raise RuntimeError(f"Cluster '{gv}': either n_clusters or cluster_size must be set") spec.n_clusters = sample_size // spec.cluster_size - assert spec.n_clusters is not None and spec.cluster_size is not None + if spec.n_clusters is None or spec.cluster_size is None: + raise RuntimeError(f"Cluster '{gv}': failed to derive n_clusters and cluster_size from sample_size={sample_size}") actual_n = spec.n_clusters * spec.cluster_size if actual_n != sample_size: print( @@ -1825,7 +1848,7 @@ def _validate_cluster_sample_size(self, sample_size: int): _validate_cluster_sample_size(sample_size, spec.n_clusters, spec.cluster_size).raise_if_invalid() - def _parse_target_tests(self, target_test: Union[str, List[str]]) -> List[str]: + def _parse_target_tests(self, target_test: Union[str, List[str]], test_formula_effects: Optional[List[str]] = None) -> List[str]: """Parse a target_test argument into a list of effect names to test. Supports regular effect names (e.g. ``"x1"``, ``"overall"``), @@ -1875,7 +1898,10 @@ def _parse_target_tests(self, target_test: Union[str, List[str]]) -> List[str]: cluster_effects = self._registry.cluster_effect_names if "all" in keywords: - fixed_effects = [e for e in self._registry.effect_names if e not in cluster_effects] + if test_formula_effects is not None: + fixed_effects = [e for e in test_formula_effects if e not in cluster_effects] + else: + fixed_effects = [e for e in self._registry.effect_names if e not in cluster_effects] keyword_expansion += ["overall"] + fixed_effects if "all-posthoc" in keywords: @@ -1929,6 +1955,17 @@ def _parse_target_tests(self, target_test: Union[str, List[str]]) -> List[str]: "(e.g. 'all'), do not also list tests that are already included." ) + # -- Phase 7b: Validate explicit tests against test formula ---------------- + if test_formula_effects is not None: + test_formula_set = set(test_formula_effects) + for test in expanded: + if " vs " in test or test == "overall": + continue + if test not in test_formula_set: + raise ValueError( + f"Target test '{test}' is not in the test formula. Available effects: {', '.join(test_formula_effects)}" + ) + # -- Phase 8: Parse posthoc specs + validate ------------------------------ regular_tests: list[str] = [] posthoc_specs: list[PostHocSpec] = [] @@ -1982,6 +2019,8 @@ def _parse_target_tests(self, target_test: Union[str, List[str]]) -> List[str]: # User level k (k≥2) = dummy factor[k] effect_order = list(self._registry._effects.keys()) + # Returns None for the reference level, which is absorbed into the + # intercept in dummy coding and has no dedicated design matrix column. 
def _level_to_col(factor_name, user_level, _effect_order=effect_order): factor_info = self._registry._factors[factor_name] reference = factor_info.get("reference_level", 1) @@ -2069,30 +2108,60 @@ def _create_X_extended(self, X): return np.column_stack(columns) if columns else np.empty((X.shape[0], 0)) - def _prepare_metadata(self, target_tests, correction=None): + def _prepare_metadata(self, target_tests, correction=None, test_formula_effects=None): """Pre-compute all static simulation metadata from the current model state.""" - return prepare_metadata(self, target_tests, correction) + return prepare_metadata(self, target_tests, correction, test_formula_effects=test_formula_effects) - def _resolve_test_formula(self, test_formula: str) -> str: - """Resolve test formula and update _test_method accordingly. + def _resolve_test_formula(self, test_formula: str): + """Resolve test formula, validate, parse, and update _test_method. - Returns the resolved formula string. + Returns: + Tuple of (formula_string, test_effect_names, random_effects). + test_effect_names is None when test_formula is empty (use generation formula). """ + from .utils.parsers import _parse_equation + if not test_formula: resolved = self._registry.equation - else: - resolved = test_formula + _, _, random_effects = _parse_equation(resolved) + if random_effects: + self._test_method = "mixed_model" + else: + self._test_method = "linear_regression" + return resolved, None, [] - from .utils.parsers import _parse_equation + # Validate test formula variables exist in the model + from .utils.validators import _validate_test_formula - _, _, random_effects = _parse_equation(resolved) + available_vars = ( + [self._registry.dependent] + self._registry.non_factor_names + self._registry.factor_names + self._registry.cluster_names + ) + validation = _validate_test_formula(test_formula, available_vars) + validation.raise_if_invalid() + + # Parse test formula to get effects and random effects + from .utils.test_formula_utils import _extract_test_formula_effects + + test_effects, random_effects = _extract_test_formula_effects(test_formula, self._registry) + + if not test_effects: + raise ValueError(f"test_formula '{test_formula}' contains no testable effects from the data generation model.") + + # Check for OLS -> LME cross (invalid: no cluster data to fit) + if random_effects and not self._registry._cluster_specs: + grouping_vars = [re["grouping_var"] for re in random_effects] + raise ValueError( + f"test_formula contains random effects ({grouping_vars}) but the " + f"data generation model has no cluster structure. Cannot fit a " + f"mixed model to data without clusters." 
+            )

        if random_effects:
            self._test_method = "mixed_model"
        else:
            self._test_method = "linear_regression"

-        return resolved
+        return test_formula, test_effects, random_effects

    def _run_find_power(
        self,
@@ -2101,6 +2170,8 @@
        correction,
        scenario_config=None,
        test_formula=None,
+        test_formula_effects=None,
+        test_random_effects=None,
        progress=None,
        cancel_check=None,
    ):
@@ -2109,13 +2180,15 @@
        self._validate_cluster_sample_size(sample_size)

        # Route based on test method (routing logic handled in simulation.py)
-        metadata = self._prepare_metadata(target_tests, correction)
+        metadata = self._prepare_metadata(target_tests, correction, test_formula_effects)

-        if scenario_config:
-            metadata.heterogeneity = scenario_config["heterogeneity"]
-            metadata.heteroskedasticity = scenario_config["heteroskedasticity"]
-            if metadata.cluster_specs:
-                metadata.lme_scenario_config = scenario_config
+        # Set the random effects flag for test formula
+        if test_random_effects:
+            metadata.test_has_random_effects = True
+
+        # scenario_config is always a dict (the optimistic baseline or a user-provided scenario)
+        metadata.heterogeneity = scenario_config["heterogeneity"]
+        metadata.heteroskedasticity = scenario_config["heteroskedasticity"]

        runner = SimulationRunner(
            n_simulations=self._effective_n_simulations,
@@ -2127,9 +2200,15 @@
        )

        # Compute critical values once before the simulation loop
-        p = len(metadata.effect_sizes)
+        # Use test formula's effect count for critical values when subsetting
+        if metadata.test_column_indices is not None:
+            p = metadata.test_effect_count
+            n_targets = len(metadata.test_target_indices)
+        else:
+            p = len(metadata.effect_sizes)
+            n_targets = len(metadata.target_indices)
+
        dof = sample_size - p - 1
-        n_targets = len(metadata.target_indices)
        n_posthoc = len(metadata.posthoc_specs)

        if n_posthoc > 0 and metadata.posthoc_method == "t-test":
@@ -2185,7 +2264,7 @@ def analyze_func(X, y, indices, alpha, correction):
            analyze_func=analyze_func,
            create_X_extended_func=self._create_X_extended,
            scenario_config=scenario_config,
-            apply_perturbations_func=(apply_per_simulation_perturbations if scenario_config else None),
+            apply_perturbations_func=apply_per_simulation_perturbations,
            progress=progress,
            cancel_check=cancel_check,
        )
@@ -2193,11 +2272,17 @@
        if not sim_results:
            return {}

+        # When test formula is active, filter target_tests to only effects in the test model
+        effective_target_tests = target_tests
+        if test_formula_effects is not None:
+            test_effect_set = set(test_formula_effects)
+            effective_target_tests = [t for t in target_tests if t == "overall" or t in test_effect_set]
+
        processor = ResultsProcessor(target_power=self.power)
        power_results = processor.calculate_powers(
            sim_results["all_results"],
            sim_results["all_results_corrected"],
-            target_tests,
+            effective_target_tests,
        )

        # Add n_simulations_failed to power_results
@@ -2207,13 +2292,13 @@
        # Tukey correction only applies to pairwise contrasts; NaN-ify others
        if correction and correction.lower() == "tukey" and power_results.get("individual_powers_corrected"):
            posthoc_labels = {s.label for s in self._posthoc_specs}
-            for test in target_tests:
+            for test in effective_target_tests:
                if test not in posthoc_labels:
                    power_results["individual_powers_corrected"][test] = float("nan")

        return build_power_result(
            model_type=self.model_type,
-            target_tests=target_tests,
+            target_tests=effective_target_tests,
formula_to_test=test_formula, equation=self.equation, sample_size=sample_size, @@ -2243,12 +2328,15 @@ def _run_sample_size_analysis( correction, scenario_config=None, test_formula=None, + test_formula_effects=None, + test_random_effects=None, progress=None, cancel_check=None, ): """Iterate over sample sizes, running power analysis for each.""" from .progress import SimulationCancelled + use_sequential = True if self._is_parallel_effective(): from joblib import Parallel, delayed @@ -2258,7 +2346,18 @@ def _run_sample_size_analysis( backend="loky", verbose=0, return_as="generator", - )(delayed(self._run_find_power)(ss, target_tests, correction, scenario_config, test_formula) for ss in sample_sizes) + )( + delayed(self._run_find_power)( + ss, + target_tests, + correction, + scenario_config, + test_formula, + test_formula_effects, + test_random_effects, + ) + for ss in sample_sizes + ) results = [] for ss, result in zip(sample_sizes, power_results, strict=False): if cancel_check is not None and cancel_check(): @@ -2266,25 +2365,13 @@ def _run_sample_size_analysis( results.append((ss, result)) if progress is not None: progress.advance(self._effective_n_simulations) + use_sequential = False except Exception as e: if isinstance(e, SimulationCancelled): raise print(f"Warning: Parallel execution failed ({e}). Falling back to sequential.") - results = [] - for ss in sample_sizes: - if cancel_check is not None and cancel_check(): - raise SimulationCancelled("Simulation cancelled by user") from None - result = self._run_find_power( - ss, - target_tests, - correction, - scenario_config, - test_formula, - progress=progress, - cancel_check=cancel_check, - ) - results.append((ss, result)) - else: + + if use_sequential: results = [] for sample_size in sample_sizes: if cancel_check is not None and cancel_check(): @@ -2295,19 +2382,28 @@ def _run_sample_size_analysis( correction, scenario_config, test_formula, + test_formula_effects=test_formula_effects, + test_random_effects=test_random_effects, progress=progress, cancel_check=cancel_check, ) results.append((sample_size, power_result)) processor = ResultsProcessor(target_power=self.power) - analysis_results = processor.process_sample_size_results(results, target_tests, correction) + # Filter target_tests to match test formula effects + if test_formula_effects is not None: + test_set = set(test_formula_effects) + effective_target_tests = [t for t in target_tests if t in test_set or t == "overall"] + else: + effective_target_tests = target_tests + + analysis_results = processor.process_sample_size_results(results, effective_target_tests, correction) # Tukey correction only applies to pairwise contrasts; NaN-ify others if correction and correction.lower() == "tukey": posthoc_labels = {s.label for s in self._posthoc_specs} if analysis_results.get("powers_by_test_corrected"): - for test in target_tests: + for test in effective_target_tests: if test not in posthoc_labels: n_points = len(analysis_results["powers_by_test_corrected"][test]) analysis_results["powers_by_test_corrected"][test] = [float("nan")] * n_points @@ -2315,7 +2411,7 @@ def _run_sample_size_analysis( return build_sample_size_result( model_type=self.model_type, - target_tests=target_tests, + target_tests=effective_target_tests, formula_to_test=test_formula, equation=self.equation, sample_sizes=sample_sizes, @@ -2331,9 +2427,16 @@ def _run_scenario_analysis(self, analysis_type, **kwargs): """Delegate to ScenarioRunner for multi-scenario power or sample-size analysis.""" from functools import 
partial - configs = self._scenario_configs or DEFAULT_SCENARIO_CONFIG + all_configs = self._scenario_configs or DEFAULT_SCENARIO_CONFIG + scenario_filter = kwargs.pop("scenario_filter", None) + if scenario_filter is not None: + configs = {k: all_configs[k] for k in scenario_filter} + else: + configs = all_configs scenario_runner = ScenarioRunner(self, configs) test_formula = kwargs.get("test_formula") + test_formula_effects = kwargs.get("test_formula_effects") + test_random_effects = kwargs.get("test_random_effects") progress = kwargs.get("progress") cancel_check = kwargs.get("cancel_check") @@ -2341,6 +2444,8 @@ def _run_scenario_analysis(self, analysis_type, **kwargs): run_power_func = partial( self._run_find_power, test_formula=test_formula, + test_formula_effects=test_formula_effects, + test_random_effects=test_random_effects, progress=progress, cancel_check=cancel_check, ) @@ -2357,6 +2462,8 @@ def _run_scenario_analysis(self, analysis_type, **kwargs): run_ss_func = partial( self._run_sample_size_analysis, test_formula=test_formula, + test_formula_effects=test_formula_effects, + test_random_effects=test_random_effects, progress=progress, cancel_check=cancel_check, ) diff --git a/mcpower/progress.py b/mcpower/progress.py index e733148..dca3c25 100644 --- a/mcpower/progress.py +++ b/mcpower/progress.py @@ -87,10 +87,7 @@ def __init__(self, **tqdm_kwargs): self._bar = None def __call__(self, current: int, total: int): - try: - from tqdm import tqdm - except ImportError: - raise ImportError("tqdm is required for TqdmReporter. Install with: pip install tqdm") from None + from tqdm import tqdm if self._bar is None: self._bar = tqdm(total=total, unit="sim", **self._tqdm_kwargs) diff --git a/mcpower/stats/data_generation.py b/mcpower/stats/data_generation.py index 0d8800c..3c46d89 100644 --- a/mcpower/stats/data_generation.py +++ b/mcpower/stats/data_generation.py @@ -23,7 +23,6 @@ SKEW_STD = np.sqrt(np.exp(2) - np.exp(1)) NORM_SCALE = (DIST_RESOLUTION - 1) / (NORM_RANGE[1] - NORM_RANGE[0]) PERC_SCALE = (DIST_RESOLUTION - 1) / (PERCENTILE_RANGE[1] - PERCENTILE_RANGE[0]) -FLOAT_NEAR_ZERO = 1e-15 # Global lookup tables NORM_CDF_TABLE = None @@ -58,13 +57,12 @@ def _compute_t3_sd(): Replicates the vectorised norm-CDF -> t(3)-PPF lookup chain on a large fixed-seed sample to get a stable SD estimate. """ - assert NORM_CDF_TABLE is not None - assert T3_PPF_TABLE is not None + if NORM_CDF_TABLE is None or T3_PPF_TABLE is None: + raise RuntimeError("Distribution tables not initialized — _init_tables() must be called first") - rng_state = np.random.get_state() - np.random.seed(999999) - z = np.random.standard_normal(200000) - np.random.set_state(rng_state) + # Use a local RNG to avoid affecting the global state and to be thread-safe. + rng = np.random.RandomState(999999) + z = rng.standard_normal(200000) # Step 1: Normal CDF lookup (z -> percentile) z_clipped = np.clip(z, NORM_RANGE[0], NORM_RANGE[1]) @@ -107,9 +105,16 @@ def create_uploaded_lookup_tables( for var_idx in range(n_vars): data = data_matrix[:, var_idx] - normalized = (data - np.mean(data)) / np.std(data) + std = np.std(data) + if std < 1e-15: + raise ValueError( + f"Variable at index {var_idx} has zero variance (constant value). Remove it from the model or check your data." + ) + normalized = (data - np.mean(data)) / std sorted_uploaded = np.sort(normalized) + # Weibull plotting positions: i/(n+1) avoids 0 and 1, which would map + # to -inf/+inf under the normal PPF, giving well-behaved quantiles. 
percentiles = np.linspace(1 / (n_samples + 1), n_samples / (n_samples + 1), n_samples) normal_quantiles = norm_ppf_array(percentiles) @@ -126,13 +131,12 @@ def _generate_factors(sample_size, factor_specs, seed): Args: sample_size: Number of observations factor_specs: List of {'n_levels': int, 'proportions': [float, ...]} - seed: Random seed + seed: Random seed (callers pass sim_seed + 3) Returns: X_factors: (sample_size, total_dummies) array """ - if seed is not None: - np.random.seed(seed) + rng = np.random.RandomState(seed) if not factor_specs: return np.empty((sample_size, 0), dtype=float) @@ -141,7 +145,7 @@ def _generate_factors(sample_size, factor_specs, seed): for spec in factor_specs: n_levels = spec["n_levels"] proportions = spec["proportions"] - factor_data = np.random.choice(n_levels, size=sample_size, p=proportions) + factor_data = rng.choice(n_levels, size=sample_size, p=proportions) dummies = np.eye(n_levels, dtype=float)[factor_data] factor_columns.append(dummies[:, 1:]) @@ -170,12 +174,11 @@ def bootstrap_uploaded_data( X_non_factors: Non-factor variables (continuous + binary mapped to 0-1) X_factors: Factor dummy variables """ - if seed is not None: - np.random.seed(seed) + rng = np.random.RandomState(seed) # Bootstrap whole rows n_samples = raw_data.shape[0] - row_indices = np.random.choice(n_samples, size=sample_size, replace=True) + row_indices = rng.choice(n_samples, size=sample_size, replace=True) bootstrapped_data = raw_data[row_indices, :] # Separate by type @@ -286,12 +289,17 @@ def _generate_cluster_effects( Returns: X_cluster: (sample_size, n_cluster_vars) array of random effect columns """ - if sim_seed is not None: - # Use a derived seed to avoid collision with X generation seed - np.random.seed(sim_seed + 3) + rng = np.random.RandomState(sim_seed + 4 if sim_seed is not None else None) columns = [] + # Extract perturbation defaults once + perturb = lme_perturbations or {} + tau_mults = perturb.get("tau_squared_multipliers", {}) + re_dist_val = perturb.get("random_effect_dist", "normal") + re_df_val = perturb.get("random_effect_df", 5) + has_perturb = lme_perturbations is not None + for gv, spec in cluster_specs.items(): n_clusters = spec["n_clusters"] cluster_size = spec["cluster_size"] @@ -302,19 +310,16 @@ def _generate_cluster_effects( cluster_size = sample_size // n_clusters # Apply LME perturbations if present - if lme_perturbations is not None: - multiplier = lme_perturbations["tau_squared_multipliers"].get(gv, 1.0) - tau_sq = tau_sq * multiplier + if has_perturb: + tau_sq = tau_sq * tau_mults.get(gv, 1.0) tau = np.sqrt(tau_sq) # Generate random intercepts (possibly non-normal) - if lme_perturbations is not None: - re_dist = lme_perturbations.get("random_effect_dist", "normal") - re_df = lme_perturbations.get("random_effect_df", 5) - random_intercepts = _generate_non_normal_intercepts(n_clusters, tau, re_dist, re_df) + if has_perturb: + random_intercepts = _generate_non_normal_intercepts(n_clusters, tau, re_dist_val, re_df_val, rng_state=rng) else: - random_intercepts = np.random.normal(0, tau, size=n_clusters) + random_intercepts = rng.normal(0, tau, size=n_clusters) # Create id_effect column: repeat each cluster's intercept # cluster_id assignment: [0,0,...,0, 1,1,...,1, ..., K-1,K-1,...,K-1] @@ -413,14 +418,20 @@ def _generate_random_effects( A :class:`RandomEffectsResult` with intercept columns, slope contributions, cluster IDs, Z matrices, and nesting metadata. 
""" - if sim_seed is not None: - np.random.seed(sim_seed + 3) + rng = np.random.RandomState(sim_seed + 4 if sim_seed is not None else None) intercept_cols: List[np.ndarray] = [] slope_contribution = np.zeros(sample_size) cluster_ids_dict: Dict[str, np.ndarray] = {} Z_matrices: Dict[str, np.ndarray] = {} + # Extract perturbation defaults once (avoids repeated dict lookups) + perturb = lme_perturbations or {} + tau_multipliers = perturb.get("tau_squared_multipliers", {}) + re_dist = perturb.get("random_effect_dist", "normal") + re_df = perturb.get("random_effect_df", 5) + has_perturbations = lme_perturbations is not None + # Nested model bookkeeping child_to_parent: Optional[np.ndarray] = None K_parent = 0 @@ -460,19 +471,16 @@ def _generate_random_effects( # Apply LME perturbations: ICC jitter on tau_squared tau_sq = spec["tau_squared"] - if lme_perturbations is not None: - multiplier = lme_perturbations["tau_squared_multipliers"].get(gv, 1.0) - tau_sq = tau_sq * multiplier + if has_perturbations: + tau_sq = tau_sq * tau_multipliers.get(gv, 1.0) if q == 1: # --- Random intercept only --- tau = np.sqrt(tau_sq) - if lme_perturbations is not None: - re_dist = lme_perturbations.get("random_effect_dist", "normal") - re_df = lme_perturbations.get("random_effect_df", 5) - random_intercepts = _generate_non_normal_intercepts(n_clusters, tau, re_dist, re_df) + if has_perturbations: + random_intercepts = _generate_non_normal_intercepts(n_clusters, tau, re_dist, re_df, rng_state=rng) else: - random_intercepts = np.random.normal(0, tau, size=n_clusters) + random_intercepts = rng.normal(0, tau, size=n_clusters) id_effect = _trim_or_pad(np.repeat(random_intercepts, cluster_size), sample_size) intercept_cols.append(id_effect) @@ -482,33 +490,29 @@ def _generate_random_effects( slope_vars = spec.get("random_slope_vars", []) # Apply ICC jitter to G_matrix intercept variance - if lme_perturbations is not None: + if has_perturbations: ratio = tau_sq / spec["tau_squared"] if spec["tau_squared"] > 0 else 1.0 # Scale intercept row/column of G by sqrt(ratio) sqrt_ratio = np.sqrt(ratio) G_matrix[0, :] *= sqrt_ratio G_matrix[:, 0] *= sqrt_ratio - # Draw correlated [b_int, b_slope1, ...] 
per cluster - re_dist = lme_perturbations.get("random_effect_dist", "normal") if lme_perturbations else "normal" - re_df = lme_perturbations.get("random_effect_df", 5) if lme_perturbations else 5 - - if re_dist == "heavy_tailed" and lme_perturbations is not None: + if re_dist == "heavy_tailed" and has_perturbations: # Multivariate t: MVN(0, G * (df-2)/df) × sqrt(df / chi2(df)) df = max(re_df, 3) G_scaled = G_matrix * ((df - 2.0) / df) - b_normal = np.random.multivariate_normal(np.zeros(q), G_scaled, size=n_clusters) - chi2_samples = np.random.chisquare(df, size=n_clusters) + b_normal = rng.multivariate_normal(np.zeros(q), G_scaled, size=n_clusters) + chi2_samples = rng.chisquare(df, size=n_clusters) mixing = np.sqrt(df / chi2_samples) b = b_normal * mixing[:, np.newaxis] - elif re_dist == "skewed" and lme_perturbations is not None: + elif re_dist == "skewed" and has_perturbations: # Independent skewed marginals via shifted chi-squared, scaled by Cholesky df = max(re_df, 3) L = np.linalg.cholesky(G_matrix) - raw = (np.random.chisquare(df, size=(n_clusters, q)) - df) / np.sqrt(2 * df) + raw = (rng.chisquare(df, size=(n_clusters, q)) - df) / np.sqrt(2 * df) b = raw @ L.T else: - b = np.random.multivariate_normal(np.zeros(q), G_matrix, size=n_clusters) + b = rng.multivariate_normal(np.zeros(q), G_matrix, size=n_clusters) # Intercept component intercept_effect = _trim_or_pad(np.repeat(b[:, 0], cluster_size), sample_size) @@ -549,21 +553,19 @@ def _generate_random_effects( tau_sq_parent = p_spec["tau_squared"] tau_sq_child = c_spec["tau_squared"] - if lme_perturbations is not None: - tau_sq_parent *= lme_perturbations["tau_squared_multipliers"].get(p_gv, 1.0) - tau_sq_child *= lme_perturbations["tau_squared_multipliers"].get(c_gv, 1.0) + if has_perturbations: + tau_sq_parent *= tau_multipliers.get(p_gv, 1.0) + tau_sq_child *= tau_multipliers.get(c_gv, 1.0) tau_parent = np.sqrt(tau_sq_parent) tau_child = np.sqrt(tau_sq_child) - if lme_perturbations is not None: - re_dist = lme_perturbations.get("random_effect_dist", "normal") - re_df = lme_perturbations.get("random_effect_df", 5) - b_parent = _generate_non_normal_intercepts(K_parent, tau_parent, re_dist, re_df) - b_child = _generate_non_normal_intercepts(K_child, tau_child, re_dist, re_df) + if has_perturbations: + b_parent = _generate_non_normal_intercepts(K_parent, tau_parent, re_dist, re_df, rng_state=rng) + b_child = _generate_non_normal_intercepts(K_child, tau_child, re_dist, re_df, rng_state=rng) else: - b_parent = np.random.normal(0, tau_parent, size=K_parent) - b_child = np.random.normal(0, tau_child, size=K_child) + b_parent = rng.normal(0, tau_parent, size=K_parent) + b_child = rng.normal(0, tau_child, size=K_child) # IDs: parent_ids assigns each observation to a parent cluster, # child_ids assigns each observation to a child cluster. diff --git a/mcpower/stats/distributions.py b/mcpower/stats/distributions.py index cf13bca..6ee5265 100644 --- a/mcpower/stats/distributions.py +++ b/mcpower/stats/distributions.py @@ -3,9 +3,7 @@ Provides F, t, chi2, normal, and studentized range distribution functions plus batch critical-value computation and table generation. -Backend priority: - 1. C++ native (Boost.Math + R Tukey port) via mcpower_native - 2. scipy (optional shim, for when C++ is not compiled) +All functions are provided by the C++ native backend (Boost.Math + R Tukey port). 
Usage: from mcpower.stats.distributions import norm_ppf, compute_critical_values_ols @@ -14,13 +12,11 @@ import numpy as np # ============================================================================ -# Backend selection +# Backend — native C++ only # ============================================================================ -_BACKEND = None - try: - from mcpower.backends.mcpower_native import ( # type: ignore[import] + from mcpower.backends.mcpower_native import ( # type: ignore[import] # noqa: F401 chi2_cdf, chi2_ppf, compute_critical_values_lme, @@ -36,171 +32,17 @@ t_ppf, ) - _BACKEND = "native" - -except ImportError: - # ------------------------------------------------------------------- - # scipy shim -- temporary fallback for when C++ is not compiled. - # Will be removed when Python fallback backends are fully dropped. - # ------------------------------------------------------------------- - try: - from scipy.stats import ( # isort: skip - chi2 as _chi2_dist, - f as _f_dist, - norm as _norm_dist, - studentized_range as _sr_dist, - t as _t_dist, - ) - - def norm_ppf(p): # noqa: F811 - """Standard normal quantile function (inverse CDF).""" - return float(_norm_dist.ppf(p)) - - def norm_cdf(x): # noqa: F811 - """Standard normal CDF.""" - return float(_norm_dist.cdf(x)) - - def t_ppf(p, df): # noqa: F811 - """Student's t quantile function.""" - return float(_t_dist.ppf(p, df)) - - def f_ppf(p, dfn, dfd): # noqa: F811 - """Fisher F quantile function.""" - return float(_f_dist.ppf(p, dfn, dfd)) - - def chi2_ppf(p, df): # noqa: F811 - """Chi-squared quantile function.""" - return float(_chi2_dist.ppf(p, df)) - - def chi2_cdf(x, df): # noqa: F811 - """Chi-squared CDF.""" - return float(_chi2_dist.cdf(x, df)) - - def studentized_range_ppf(p, k, df): # noqa: F811 - """Studentized range quantile (Tukey). k=groups, df=denom df.""" - if df < 2 or k < 2 or k > 200 or p <= 0.0 or p >= 1.0: - return float("inf") - return float(_sr_dist.ppf(p, k, df)) - - def compute_critical_values_ols(alpha, dfn, dfd, n_targets, correction_method): # noqa: F811 - """Compute OLS critical values using scipy (fallback). - - Args: - alpha: Significance level. - dfn: Numerator degrees of freedom (number of predictors). - dfd: Denominator degrees of freedom (n - p - 1). - n_targets: Number of individual effects being tested. - correction_method: 0=none, 1=Bonferroni, 2=FDR (BH), 3=Holm. - - Returns: - Tuple of (f_crit, t_crit, correction_t_crits) where - correction_t_crits is an ndarray of length n_targets. 
- """ - if dfd <= 0: - return np.inf, np.inf, np.full(max(n_targets, 1), np.inf) - - f_crit = _f_dist.ppf(1 - alpha, dfn, dfd) if dfn > 0 else np.inf - t_crit = _t_dist.ppf(1 - alpha / 2, dfd) - - m = n_targets - if m == 0: - return f_crit, t_crit, np.empty(0) - - if correction_method == 0: # None - correction_t_crits = np.full(m, t_crit) - elif correction_method == 1: # Bonferroni - bonf_crit = _t_dist.ppf(1 - alpha / (2 * m), dfd) - correction_t_crits = np.full(m, bonf_crit) - elif correction_method == 2: # FDR (Benjamini-Hochberg) - correction_t_crits = np.array( - [_t_dist.ppf(1 - (k + 1) / m * alpha / 2, dfd) if (k + 1) / m * alpha / 2 >= 1e-12 else np.inf for k in range(m)] - ) - elif correction_method == 3: # Holm - correction_t_crits = np.array( - [_t_dist.ppf(1 - alpha / (2 * (m - k)), dfd) if alpha / (2 * (m - k)) >= 1e-12 else np.inf for k in range(m)] - ) - else: - correction_t_crits = np.full(m, t_crit) - - return f_crit, t_crit, correction_t_crits - - def compute_tukey_critical_value(alpha, n_levels, dfd): # noqa: F811 - """Compute Tukey HSD critical value (q / sqrt(2)).""" - if dfd <= 0: - return np.inf - q_crit = _sr_dist.ppf(1 - alpha, n_levels, dfd) - return q_crit / np.sqrt(2) - - def compute_critical_values_lme(alpha, n_fixed, n_targets, correction_method): # noqa: F811 - """Compute LME critical values using scipy (fallback). - - Args: - alpha: Significance level. - n_fixed: Number of fixed effects (excluding intercept). - n_targets: Number of individual effects being tested. - correction_method: 0=none, 1=Bonferroni, 2=FDR (BH), 3=Holm. - - Returns: - Tuple of (chi2_crit, z_crit, correction_z_crits) where - correction_z_crits is an ndarray of length n_targets. - """ - chi2_crit = _chi2_dist.ppf(1 - alpha, n_fixed) if n_fixed > 0 else np.inf - z_crit = _norm_dist.ppf(1 - alpha / 2) - - m = n_targets - if m == 0: - return chi2_crit, z_crit, np.empty(0) - - if correction_method == 0: # None - correction_z_crits = np.full(m, z_crit) - elif correction_method == 1: # Bonferroni - bonf = _norm_dist.ppf(1 - alpha / (2 * m)) - correction_z_crits = np.full(m, bonf) - elif correction_method == 2: # FDR (Benjamini-Hochberg) - correction_z_crits = np.array( - [_norm_dist.ppf(1 - (k + 1) / m * alpha / 2) if (k + 1) / m * alpha / 2 >= 1e-12 else np.inf for k in range(m)] - ) - elif correction_method == 3: # Holm - correction_z_crits = np.array( - [_norm_dist.ppf(1 - alpha / (2 * (m - k))) if alpha / (2 * (m - k)) >= 1e-12 else np.inf for k in range(m)] - ) - else: - correction_z_crits = np.full(m, z_crit) - - return chi2_crit, z_crit, correction_z_crits - - def generate_norm_cdf_table(x_min, x_max, resolution): # noqa: F811 - """Generate normal CDF lookup table.""" - x = np.linspace(x_min, x_max, resolution) - return _norm_dist.cdf(x).astype(np.float64) - - def generate_t3_ppf_table(perc_min, perc_max, resolution): # noqa: F811 - """Generate t(3) PPF lookup table (divided by sqrt(3)).""" - p = np.linspace(perc_min, perc_max, resolution) - return (_t_dist.ppf(p, 3) / np.sqrt(3)).astype(np.float64) - - def norm_ppf_array(percentiles): # noqa: F811 - """Vectorized normal PPF for percentile array.""" - return _norm_dist.ppf(np.asarray(percentiles)).astype(np.float64) - - _BACKEND = "scipy" - - except ImportError as exc: - raise ImportError( - "No distribution backend available. 
" - "Install from PyPI for prebuilt C++ wheels: pip install MCPower\n" - "Or install scipy as fallback: pip install scipy" - ) from exc +except ImportError as exc: + raise ImportError("Native C++ backend not available. Install from PyPI for prebuilt wheels: pip install MCPower") from exc # ============================================================================ -# Also re-export scipy optimizer shims for lme_solver.py -# These replace scipy.optimize.minimize and minimize_scalar +# Optimizer wrappers for lme_solver.py # ============================================================================ def minimize_lbfgsb(objective, x0, bounds, maxiter=200, ftol=1e-10, gtol=1e-6): - """L-BFGS-B minimization -- C++ native or scipy fallback. + """L-BFGS-B minimization via native C++ backend. Args: objective: Callable f(x) -> float @@ -213,53 +55,15 @@ def minimize_lbfgsb(objective, x0, bounds, maxiter=200, ftol=1e-10, gtol=1e-6): Returns: Object with .x (optimal point), .fun (optimal value), .converged (bool) """ - if _BACKEND == "native": - try: - from mcpower.backends.mcpower_native import lbfgsb_minimize_fd # type: ignore[import] - - lb = np.array([b[0] for b in bounds]) - ub = np.array([b[1] for b in bounds]) - return lbfgsb_minimize_fd(objective, np.asarray(x0, dtype=np.float64), lb, ub, maxiter, ftol, gtol) - except ImportError: - import warnings - - warnings.warn( - "Native L-BFGS-B optimizer not available despite native backend being loaded. Falling back to scipy.", - RuntimeWarning, - stacklevel=2, - ) - except Exception as e: - import warnings + from mcpower.backends.mcpower_native import lbfgsb_minimize_fd # type: ignore[import] - warnings.warn( - f"Native L-BFGS-B optimizer failed ({type(e).__name__}: {e}), falling back to scipy.", - RuntimeWarning, - stacklevel=2, - ) - - # scipy fallback - from scipy.optimize import minimize - - result = minimize( - objective, - x0, - method="L-BFGS-B", - bounds=bounds, - options={"maxiter": maxiter, "ftol": ftol, "gtol": gtol}, - ) - - class _Result: - __slots__ = ("x", "fun", "converged") - - r = _Result() - r.x = result.x - r.fun = result.fun - r.converged = result.success - return r + lb = np.array([b[0] for b in bounds]) + ub = np.array([b[1] for b in bounds]) + return lbfgsb_minimize_fd(objective, np.asarray(x0, dtype=np.float64), lb, ub, maxiter, ftol, gtol) def minimize_scalar_brent(objective, bounds, tol=1e-8, maxiter=150): - """Brent 1D minimization -- C++ native or scipy fallback. + """Brent 1D minimization via native C++ backend. Args: objective: Callable f(x) -> float @@ -270,43 +74,6 @@ def minimize_scalar_brent(objective, bounds, tol=1e-8, maxiter=150): Returns: Object with .x (optimal point), .fun (optimal value), .converged (bool) """ - if _BACKEND == "native": - try: - from mcpower.backends.mcpower_native import brent_minimize_scalar # type: ignore[import] - - return brent_minimize_scalar(objective, bounds[0], bounds[1], tol, maxiter) - except ImportError: - import warnings - - warnings.warn( - "Native Brent optimizer not available despite native backend being loaded. 
Falling back to scipy.", - RuntimeWarning, - stacklevel=2, - ) - except Exception as e: - import warnings - - warnings.warn( - f"Native Brent optimizer failed ({type(e).__name__}: {e}), falling back to scipy.", - RuntimeWarning, - stacklevel=2, - ) - - # scipy fallback - from scipy.optimize import minimize_scalar - - result = minimize_scalar( - objective, - bounds=bounds, - method="bounded", - options={"xatol": tol, "maxiter": maxiter}, - ) - - class _Result: - __slots__ = ("x", "fun", "converged") + from mcpower.backends.mcpower_native import brent_minimize_scalar # type: ignore[import] - r = _Result() - r.x = result.x - r.fun = result.fun - r.converged = bool(getattr(result, "success", True)) - return r + return brent_minimize_scalar(objective, bounds[0], bounds[1], tol, maxiter) diff --git a/mcpower/stats/mixed_models.py b/mcpower/stats/mixed_models.py index 414e911..07de5e8 100644 --- a/mcpower/stats/mixed_models.py +++ b/mcpower/stats/mixed_models.py @@ -11,10 +11,12 @@ import threading import warnings -from typing import Any, Dict, List, Optional, Union +from typing import Any, Dict, Optional, Union import numpy as np +from ..backends.native import _prep + # Suppress statsmodels convergence warnings (expected with small samples/low ICC). # Module-level filterwarnings with module= is unreliable for statsmodels internals, # so we also use catch_warnings() context managers around .fit() calls below. @@ -30,7 +32,6 @@ def _lme_analysis_wrapper( y: np.ndarray, target_indices: np.ndarray, cluster_ids: np.ndarray, - cluster_column_indices: List[int], correction_method: int, alpha: float, backend: str = "custom", @@ -51,7 +52,6 @@ def _lme_analysis_wrapper( y: (n,) response vector target_indices: Coefficient indices to test (fixed effects only) cluster_ids: (n,) cluster membership array [0,0,0, 1,1,1, ...] 
- cluster_column_indices: Indices of cluster effect columns (unused) correction_method: 0=none, 1=Bonferroni, 2=FDR, 3=Holm alpha: Significance level backend: "custom" (default) or "statsmodels" (fallback) @@ -118,9 +118,7 @@ def _lme_analysis_wrapper( verbose=verbose, ) elif backend == "statsmodels": - return _lme_analysis_statsmodels( - X_expanded, y, target_indices, cluster_ids, cluster_column_indices, correction_method, alpha, verbose - ) + return _lme_analysis_statsmodels(X_expanded, y, target_indices, cluster_ids, correction_method, alpha, verbose) else: raise ValueError(f"Unknown backend: {backend}") @@ -130,7 +128,6 @@ def _lme_analysis_statsmodels( y: np.ndarray, target_indices: np.ndarray, cluster_ids: np.ndarray, - cluster_column_indices: List[int], correction_method: int, alpha: float, verbose: bool = False, @@ -145,11 +142,10 @@ def _lme_analysis_statsmodels( - Convergence retry strategy (allows ≤3% failures) Args: - X_expanded: (n, p) design matrix (includes cluster effect columns) + X_expanded: (n, p) design matrix (excludes cluster effect columns) y: (n,) response vector target_indices: Coefficient indices to test (fixed effects only) cluster_ids: (n,) cluster membership array - cluster_column_indices: Indices of cluster effect columns to remove correction_method: 0=none, 1=Bonferroni, 2=FDR, 3=Holm alpha: Significance level verbose: Return detailed diagnostics @@ -172,8 +168,6 @@ def _lme_analysis_statsmodels( n, p = X_expanded.shape n_targets = len(target_indices) - # Note: X_expanded already excludes cluster effects (they're not in the design matrix) - # cluster_column_indices is now unused in this function but kept for API compatibility X_fixed = X_expanded # Step 1: Add intercept to fixed effects @@ -530,6 +524,29 @@ def _compute_wald_test(result, alpha): return results_array +def _ensure_lme_crits(alpha, p, n_targets, correction_method, chi2_crit, z_crit, correction_z_crits): + """Compute LME critical values on-the-fly if not precomputed.""" + if z_crit is None or chi2_crit is None or correction_z_crits is None: + from .lme_solver import compute_lme_critical_values + + return compute_lme_critical_values(alpha, p, n_targets, correction_method) + return chi2_crit, z_crit, correction_z_crits + + +def _wrap_native_result(result, verbose, solver_name, extra_diag=None) -> Optional[Union[np.ndarray, Dict]]: + """Wrap C++ solver result with optional verbose diagnostics.""" + if len(result) > 0: + if verbose: + diag = {"solver": solver_name} + if extra_diag: + diag.update(extra_diag) + return {"results": result, "diagnostics": diag} + return np.asarray(result) + if verbose: + return {"results": None, "failure_reason": f"C++ {solver_name} returned empty result"} + return None + + def _lme_analysis_custom( X_expanded: np.ndarray, y: np.ndarray, @@ -545,39 +562,29 @@ def _lme_analysis_custom( """LME analysis for random-intercept models via C++ backend. Uses precomputed critical values (chi2_crit, z_crit) to avoid - per-simulation scipy calls. Falls back to computing them if not provided. + per-simulation distribution calls. Falls back to computing them if not provided. 
""" n, p = X_expanded.shape n_targets = len(target_indices) K = int(cluster_ids.max()) + 1 - if z_crit is None or chi2_crit is None or correction_z_crits is None: - from .lme_solver import compute_lme_critical_values - - chi2_crit, z_crit, correction_z_crits = compute_lme_critical_values(alpha, p, n_targets, correction_method) + chi2_crit, z_crit, correction_z_crits = _ensure_lme_crits(alpha, p, n_targets, correction_method, chi2_crit, z_crit, correction_z_crits) from mcpower.backends import mcpower_native as _native # type: ignore[attr-defined] result = _native.lme_analysis( - np.ascontiguousarray(X_expanded, dtype=np.float64), - np.ascontiguousarray(y, dtype=np.float64), - np.ascontiguousarray(cluster_ids, dtype=np.int32), + _prep(X_expanded), + _prep(y), + _prep(cluster_ids, np.int32), K, - np.ascontiguousarray(target_indices, dtype=np.int32), + _prep(target_indices, np.int32), float(chi2_crit), float(z_crit), - np.ascontiguousarray(correction_z_crits, dtype=np.float64), + _prep(correction_z_crits), int(correction_method), float(-1.0), ) - if len(result) > 0: - if verbose: - return {"results": result, "diagnostics": {"solver": "native_q1"}} - return result # type: ignore[no-any-return] - - if verbose: - return {"results": None, "failure_reason": "C++ solver returned empty result"} - return None + return _wrap_native_result(result, verbose, "native_q1") def _lme_analysis_custom_general( @@ -594,8 +601,6 @@ def _lme_analysis_custom_general( verbose: bool = False, ) -> Optional[Union[np.ndarray, Dict]]: """LME analysis for random slopes (q > 1) via C++ backend.""" - from .lme_solver import compute_lme_critical_values - n, p = X_expanded.shape n_targets = len(target_indices) @@ -604,34 +609,26 @@ def _lme_analysis_custom_general( q = Z.shape[1] K = int(cluster_ids.max()) + 1 - if z_crit is None or chi2_crit is None or correction_z_crits is None: - chi2_crit, z_crit, correction_z_crits = compute_lme_critical_values(alpha, p, n_targets, correction_method) + chi2_crit, z_crit, correction_z_crits = _ensure_lme_crits(alpha, p, n_targets, correction_method, chi2_crit, z_crit, correction_z_crits) from mcpower.backends import mcpower_native as _native # type: ignore[attr-defined] warm_theta_arr = np.empty(0, dtype=np.float64) result = _native.lme_analysis_general( - np.ascontiguousarray(X_expanded, dtype=np.float64), - np.ascontiguousarray(y, dtype=np.float64), - np.ascontiguousarray(Z, dtype=np.float64), - np.ascontiguousarray(cluster_ids, dtype=np.int32), + _prep(X_expanded), + _prep(y), + _prep(Z), + _prep(cluster_ids, np.int32), K, q, - np.ascontiguousarray(target_indices, dtype=np.int32), + _prep(target_indices, np.int32), float(chi2_crit), float(z_crit), - np.ascontiguousarray(correction_z_crits, dtype=np.float64), + _prep(correction_z_crits), int(correction_method), warm_theta_arr, ) - if len(result) > 0: - if verbose: - return {"results": result, "diagnostics": {"solver": "native_general", "q": q}} - return result # type: ignore[no-any-return] - - if verbose: - return {"results": None, "failure_reason": "C++ general solver returned empty result"} - return None + return _wrap_native_result(result, verbose, "native_general", extra_diag={"q": q}) def _lme_analysis_custom_nested( @@ -647,8 +644,6 @@ def _lme_analysis_custom_nested( verbose: bool = False, ) -> Optional[Union[np.ndarray, Dict]]: """LME analysis for nested random intercepts via C++ backend.""" - from .lme_solver import compute_lme_critical_values - n, p = X_expanded.shape n_targets = len(target_indices) @@ -658,35 +653,27 @@ def 
_lme_analysis_custom_nested( K_child = re_result.K_child child_to_parent = re_result.child_to_parent - if z_crit is None or chi2_crit is None or correction_z_crits is None: - chi2_crit, z_crit, correction_z_crits = compute_lme_critical_values(alpha, p, n_targets, correction_method) + chi2_crit, z_crit, correction_z_crits = _ensure_lme_crits(alpha, p, n_targets, correction_method, chi2_crit, z_crit, correction_z_crits) from mcpower.backends import mcpower_native as _native # type: ignore[attr-defined] warm_theta_arr = np.empty(0, dtype=np.float64) result = _native.lme_analysis_nested( - np.ascontiguousarray(X_expanded, dtype=np.float64), - np.ascontiguousarray(y, dtype=np.float64), - np.ascontiguousarray(parent_ids, dtype=np.int32), - np.ascontiguousarray(child_ids, dtype=np.int32), + _prep(X_expanded), + _prep(y), + _prep(parent_ids, np.int32), + _prep(child_ids, np.int32), K_parent, K_child, - np.ascontiguousarray(child_to_parent, dtype=np.int32), - np.ascontiguousarray(target_indices, dtype=np.int32), + _prep(child_to_parent, np.int32), + _prep(target_indices, np.int32), float(chi2_crit), float(z_crit), - np.ascontiguousarray(correction_z_crits, dtype=np.float64), + _prep(correction_z_crits), int(correction_method), warm_theta_arr, ) - if len(result) > 0: - if verbose: - return {"results": result, "diagnostics": {"solver": "native_nested", "K_parent": K_parent, "K_child": K_child}} - return result # type: ignore[no-any-return] - - if verbose: - return {"results": None, "failure_reason": "C++ nested solver returned empty result"} - return None + return _wrap_native_result(result, verbose, "native_nested", extra_diag={"K_parent": K_parent, "K_child": K_child}) def reset_warm_start_cache(): diff --git a/mcpower/tables/lookup.py b/mcpower/tables/lookup.py index b66fefa..5f4d691 100644 --- a/mcpower/tables/lookup.py +++ b/mcpower/tables/lookup.py @@ -15,7 +15,7 @@ class LookupTableManager: """Manages pre-computed lookup tables for data-generation transforms. Tables are lazily loaded from disk (``tables/data/*.npz``) on first - access and generated from scipy if the cache files are missing. + access and generated via the C++ native backend if the cache files are missing. The C++ native backend consumes these tables for distribution transforms. @@ -47,47 +47,37 @@ def ensure_data_dir(self) -> None: """Ensure data directory exists.""" self.data_dir.mkdir(parents=True, exist_ok=True) - def load_norm_cdf_table(self) -> np.ndarray: - """Load (or generate and cache) the normal CDF lookup table. + def _load_table(self, key: str, generate_fn) -> np.ndarray: + """Load a table from cache, disk, or generate it on the fly. + + Args: + key: Cache key and npz array name (e.g. ``"norm_cdf"``). + generate_fn: Bound method to generate and cache the table. Returns: 1-D float64 array of length ``DIST_RESOLUTION``. """ - if "norm_cdf" in self._tables: - return self._tables["norm_cdf"] - - cache_file = self.data_dir / "norm_cdf.npz" + if key in self._tables: + return self._tables[key] + cache_file = self.data_dir / f"{key}.npz" try: data = np.load(cache_file) - self._tables["norm_cdf"] = data["norm_cdf"] - return self._tables["norm_cdf"] + self._tables[key] = data[key] + return self._tables[key] except (FileNotFoundError, KeyError): pass - self._generate_norm_cdf_table() - return self._tables["norm_cdf"] - - def load_t3_ppf_table(self) -> np.ndarray: - """Load (or generate and cache) the t(df=3) PPF lookup table. 
+ generate_fn() + return self._tables[key] - Returns: - 1-D float64 array of length ``DIST_RESOLUTION``. - """ - if "t3_ppf" in self._tables: - return self._tables["t3_ppf"] - - cache_file = self.data_dir / "t3_ppf.npz" - - try: - data = np.load(cache_file) - self._tables["t3_ppf"] = data["t3_ppf"] - return self._tables["t3_ppf"] - except (FileNotFoundError, KeyError): - pass + def load_norm_cdf_table(self) -> np.ndarray: + """Load (or generate and cache) the normal CDF lookup table.""" + return self._load_table("norm_cdf", self._generate_norm_cdf_table) - self._generate_t3_ppf_table() - return self._tables["t3_ppf"] + def load_t3_ppf_table(self) -> np.ndarray: + """Load (or generate and cache) the t(df=3) PPF lookup table.""" + return self._load_table("t3_ppf", self._generate_t3_ppf_table) def load_all_generation_tables(self) -> Tuple[np.ndarray, np.ndarray]: """ @@ -110,6 +100,8 @@ def _generate_norm_cdf_table(self) -> None: self._tables["norm_cdf"] = norm_cdf self.ensure_data_dir() + # Silently ignore cache write failures (e.g. read-only filesystem, + # permission denied). Tables are still usable from memory. try: np.savez_compressed(self.data_dir / "norm_cdf.npz", norm_cdf=norm_cdf, x_range=x_norm) except Exception: @@ -125,6 +117,8 @@ def _generate_t3_ppf_table(self) -> None: self._tables["t3_ppf"] = t3_ppf self.ensure_data_dir() + # Silently ignore cache write failures (e.g. read-only filesystem, + # permission denied). Tables are still usable from memory. try: np.savez_compressed( self.data_dir / "t3_ppf.npz", diff --git a/mcpower/utils/formatters.py b/mcpower/utils/formatters.py index b8bd0b0..8bc91b3 100644 --- a/mcpower/utils/formatters.py +++ b/mcpower/utils/formatters.py @@ -6,6 +6,7 @@ """ import math +from itertools import combinations from typing import Any, Dict, List, Optional import numpy as np @@ -13,6 +14,11 @@ __all__ = [] +def _is_nan(value) -> bool: + """Check if a value is NaN (float type check + math.isnan).""" + return isinstance(value, float) and math.isnan(value) + + class _TableFormatter: """Static helpers for building fixed-width text tables.""" @@ -25,7 +31,10 @@ def _create_table( """Create formatted table with headers and rows.""" if not col_widths: - col_widths = [max(len(str(h)), max(len(str(row[i])) + 2 for row in rows)) for i, h in enumerate(headers)] + if rows: + col_widths = [max(len(str(h)), max(len(str(row[i])) + 2 for row in rows)) for i, h in enumerate(headers)] + else: + col_widths = [len(str(h)) for h in headers] lines = [] @@ -131,7 +140,7 @@ def _format_short_power(self, data: Dict) -> str: for test in model["target_tests"]: power_corr = results["individual_powers_corrected"][test] - if isinstance(power_corr, float) and math.isnan(power_corr): + if _is_nan(power_corr): rows_corrected.append([test, "-", f"{target:.0f}", "-"]) else: status = "✓" if power_corr >= target else "✗" @@ -162,7 +171,7 @@ def _format_long_power(self, data: Dict) -> str: power = results["individual_powers"][test] power_corr = results.get("individual_powers_corrected", {}).get(test, power) target = model.get("target_power", 80.0) - if isinstance(power_corr, float) and math.isnan(power_corr): + if _is_nan(power_corr): rows.append([test, f"{power:.2f}", "-", f"{target:.1f}", "-"]) else: achieved = "✓" if power_corr >= target else "✗" @@ -331,13 +340,15 @@ def _format_scenario_power_short(self, scenarios: Dict, target_tests: List[str], lines = [f"\n{'=' * 80}", "SCENARIO SUMMARY", f"{'=' * 80}"] + scenario_names = list(scenarios.keys()) + headers = ["Test"] + [name.title() 
for name in scenario_names] + col_widths = [40] + [12] * len(scenario_names) + # Uncorrected table - headers = ["Test", "Optimistic", "Realistic", "Doomer"] rows = [] - for test in target_tests: row = [test] - for scenario in ["optimistic", "realistic", "doomer"]: + for scenario in scenario_names: if scenario in scenarios and "results" in scenarios[scenario]: power = scenarios[scenario]["results"]["individual_powers"][test] row.append(f"{power:.1f}") @@ -346,17 +357,17 @@ def _format_scenario_power_short(self, scenarios: Dict, target_tests: List[str], rows.append(row) lines.append("\nUncorrected Power:") - lines.append(self._table._create_table(headers, rows, [40, 12, 12, 12])) + lines.append(self._table._create_table(headers, rows, col_widths)) # Corrected table if applicable if correction: rows_corr = [] for test in target_tests: row = [test] - for scenario in ["optimistic", "realistic", "doomer"]: + for scenario in scenario_names: if scenario in scenarios and "results" in scenarios[scenario]: power_corr = scenarios[scenario]["results"]["individual_powers_corrected"][test] - if isinstance(power_corr, float) and math.isnan(power_corr): + if _is_nan(power_corr): row.append("-") else: row.append(f"{power_corr:.1f}") @@ -365,7 +376,7 @@ def _format_scenario_power_short(self, scenarios: Dict, target_tests: List[str], rows_corr.append(row) lines.append(f"\nCorrected Power ({correction}):") - lines.append(self._table._create_table(headers, rows_corr, [40, 12, 12, 12])) + lines.append(self._table._create_table(headers, rows_corr, col_widths)) lines.append(f"{'=' * 80}") @@ -395,74 +406,74 @@ def _format_scenario_power_long( lines.append("DETAILED SCENARIO RESULTS") lines.append(f"{'=' * 80}") - for scenario_name in ["optimistic", "realistic", "doomer"]: - if scenario_name in scenarios: - lines.append(f"\n{'-' * 80}") - lines.append(f"{scenario_name.upper()} SCENARIO") - lines.append(f"{'-' * 80}") - - # Use regular power formatter for each scenario - scenario_data = { - "model": scenarios[scenario_name]["model"], - "results": scenarios[scenario_name]["results"], - } - lines.append(self._format_long_power(scenario_data)) - - # 3. Comparison analysis - lines.append(f"\n{'=' * 80}") - lines.append("ROBUSTNESS ANALYSIS") - lines.append(f"{'=' * 80}") - - # Power reduction table - headers = ["Test", "Opt→Real Drop", "Opt→Doom Drop", "Vulnerability"] - rows = [] - vulnerable_tests = [] - inflated_tests = [] + for scenario_name in scenarios: + lines.append(f"\n{'-' * 80}") + lines.append(f"{scenario_name.upper()} SCENARIO") + lines.append(f"{'-' * 80}") + + scenario_data = { + "model": scenarios[scenario_name]["model"], + "results": scenarios[scenario_name]["results"], + } + lines.append(self._format_long_power(scenario_data)) + + # 3. 
Comparison analysis — compare each non-optimistic scenario to optimistic
+        if "optimistic" in scenarios and len(scenarios) > 1:
+            lines.append(f"\n{'=' * 80}")
+            lines.append("ROBUSTNESS ANALYSIS")
+            lines.append(f"{'=' * 80}")
+
+            other_scenarios = [s for s in scenarios if s != "optimistic"]
+            headers = ["Test"] + [f"Opt→{s.title()} Drop" for s in other_scenarios] + ["Vulnerability"]
+            rows = []
+            vulnerable_tests = []
+            inflated_tests = []
 
-        for test in target_tests:
-            opt_power = scenarios["optimistic"]["results"]["individual_powers"][test]
-            real_power = scenarios.get("realistic", {}).get("results", {}).get("individual_powers", {}).get(test, opt_power)
-            doom_power = scenarios.get("doomer", {}).get("results", {}).get("individual_powers", {}).get(test, opt_power)
-
-            real_drop = opt_power - real_power
-            doom_drop = opt_power - doom_power
-
-            # Format drops with proper signs
-            real_drop_str = f"+{abs(real_drop):.1f}%" if real_drop < 0 else f"-{real_drop:.1f}%"
-            doom_drop_str = f"+{abs(doom_drop):.1f}%" if doom_drop < 0 else f"-{doom_drop:.1f}%"
-
-            # Vulnerability assessment and categorization
-            if doom_drop > HIGH_VULNERABILITY_THRESHOLD:
-                vulnerability = "HIGH"
-                vulnerable_tests.append(test)
-            elif doom_drop > MEDIUM_VULNERABILITY_THRESHOLD:
-                vulnerability = "MEDIUM"
-            elif doom_drop < INFLATED_ERROR_THRESHOLD:
-                vulnerability = "INFLATED FALSE POSITIVES"
-                inflated_tests.append(test)
-            else:
-                vulnerability = "LOW"
+            for test in target_tests:
+                opt_power = scenarios["optimistic"]["results"]["individual_powers"][test]
+                row = [test]
+                max_drop = float("-inf")  # not 0.0: lets uniformly negative drops (power gains) reach the inflated branch below
+
+                for scenario in other_scenarios:
+                    other_power = scenarios.get(scenario, {}).get("results", {}).get("individual_powers", {}).get(test, opt_power)
+                    drop = opt_power - other_power
+                    max_drop = max(max_drop, drop)
+                    drop_str = f"+{abs(drop):.1f}%" if drop < 0 else f"-{drop:.1f}%"
+                    row.append(drop_str)
+
+                if max_drop > HIGH_VULNERABILITY_THRESHOLD:
+                    vulnerability = "HIGH"
+                    vulnerable_tests.append(test)
+                elif max_drop > MEDIUM_VULNERABILITY_THRESHOLD:
+                    vulnerability = "MEDIUM"
+                elif max_drop < INFLATED_ERROR_THRESHOLD:
+                    vulnerability = "INFLATED FALSE POSITIVES"
+                    inflated_tests.append(test)
+                else:
+                    vulnerability = "LOW"
 
-            rows.append([test, real_drop_str, doom_drop_str, vulnerability])
+                row.append(vulnerability)
+                rows.append(row)
 
-        lines.append(self._table._create_table(headers, rows))
+            lines.append(self._table._create_table(headers, rows))

        # 4. 
Recommendations - lines.append(f"\n{'=' * 80}") - lines.append("RECOMMENDATIONS") - lines.append(f"{'=' * 80}") + if "optimistic" in scenarios and len(scenarios) > 1: + lines.append(f"\n{'=' * 80}") + lines.append("RECOMMENDATIONS") + lines.append(f"{'=' * 80}") - if vulnerable_tests: - lines.append(f"• High vulnerability tests: {', '.join(vulnerable_tests)}") - lines.append("• Consider increasing sample size to maintain power under adverse conditions") + if vulnerable_tests: + lines.append(f"• High vulnerability tests: {', '.join(vulnerable_tests)}") + lines.append("• Consider increasing sample size to maintain power under adverse conditions") - if inflated_tests: - lines.append(f"• Inflated false positive risk: {', '.join(inflated_tests)}") - lines.append("• Be careful about interpretation") + if inflated_tests: + lines.append(f"• Inflated false positive risk: {', '.join(inflated_tests)}") + lines.append("• Be careful about interpretation") - if not vulnerable_tests and not inflated_tests: - lines.append("• Power analysis appears robust to assumption violations") - lines.append("• Original sample size should be sufficient") + if not vulnerable_tests and not inflated_tests: + lines.append("• Power analysis appears robust to assumption violations") + lines.append("• Original sample size should be sufficient") return "\n".join(lines) @@ -484,25 +495,22 @@ def _format_scenario_sample_size_short(self, scenarios: Dict, target_tests: List """Short scenario sample size summary.""" lines = [f"\n{'=' * 80}", "SCENARIO SUMMARY", f"{'=' * 80}"] + scenario_names = list(scenarios.keys()) if correction: # Combined table with uncorrected and corrected lines.append("\nSample Size Requirements:") - headers = [ - "Test", - "Opt(U)", - "Opt(C)", - "Real(U)", - "Real(C)", - "Doom(U)", - "Doom(C)", - ] + headers = ["Test"] + for name in scenario_names: + abbrev = name[:4].title() + headers.extend([f"{abbrev}(U)", f"{abbrev}(C)"]) + col_widths = [40] + [8] * (len(scenario_names) * 2) rows = [] for test in target_tests: - row = [test[:40]] # Truncate to 40 chars + row = [test[:40]] - for scenario in ["optimistic", "realistic", "doomer"]: + for scenario in scenario_names: if scenario in scenarios and "results" in scenarios[scenario]: n_uncorr = scenarios[scenario]["results"]["first_achieved"][test] n_corr = scenarios[scenario]["results"]["first_achieved_corrected"][test] @@ -520,16 +528,17 @@ def _format_scenario_sample_size_short(self, scenarios: Dict, target_tests: List row.extend(["N/A", "N/A"]) rows.append(row) - lines.append(self._table._create_table(headers, rows, [40, 8, 8, 8, 8, 8, 8])) + lines.append(self._table._create_table(headers, rows, col_widths)) lines.append("Note: (U) = Uncorrected, (C) = Corrected") else: # Uncorrected only - headers = ["Test", "Optimistic", "Realistic", "Doomer"] + headers = ["Test"] + [name.title() for name in scenario_names] + col_widths = [40] + [12] * len(scenario_names) rows = [] for test in target_tests: - row = [test[:40]] # Truncate to 40 chars - for scenario in ["optimistic", "realistic", "doomer"]: + row = [test[:40]] + for scenario in scenario_names: if scenario in scenarios and "results" in scenarios[scenario]: n_required = scenarios[scenario]["results"]["first_achieved"][test] if n_required > 0: @@ -542,7 +551,7 @@ def _format_scenario_sample_size_short(self, scenarios: Dict, target_tests: List rows.append(row) lines.append("\nUncorrected Sample Sizes:") - lines.append(self._table._create_table(headers, rows, [40, 12, 12, 12])) + 
lines.append(self._table._create_table(headers, rows, col_widths)) lines.append(f"{'=' * 80}") @@ -562,39 +571,33 @@ def _format_scenario_sample_size_long( # 1. Overall summary lines.append(self._format_scenario_sample_size_short(scenarios, target_tests, correction)) - # 2. Recommendations + # 2. Recommendations — summarize max N per non-optimistic scenario lines.append(f"\n{'=' * 80}") lines.append("RECOMMENDATIONS") lines.append(f"{'=' * 80}") - # Calculate max required N across scenarios - max_n_realistic = max( - (scenarios.get("realistic", {}).get("results", {}).get("first_achieved", {}).get(test, 0) for test in target_tests), - default=0, - ) - max_n_doomer = max( - (scenarios.get("doomer", {}).get("results", {}).get("first_achieved", {}).get(test, 0) for test in target_tests), - default=0, - ) - - max_tested = scenarios.get("realistic", {}).get("model", {}).get("sample_size_range", {}).get("to_size", 200) - - if max_n_realistic > 0 and max_n_realistic <= max_tested: - lines.append(f"• For robust power under realistic conditions: N = {max_n_realistic}") - elif max_n_realistic <= 0: - lines.append(f"• For robust power under realistic conditions: N > {max_tested}") - - if max_n_doomer > 0 and max_n_doomer <= max_tested: - lines.append(f"• For power under worst-case conditions: N = {max_n_doomer}") - elif max_n_doomer <= 0: - lines.append(f"• For power under worst-case conditions: N > {max_tested}") - - # Check if any tests couldn't achieve power - unachievable = [ - test for test in target_tests if scenarios.get("doomer", {}).get("results", {}).get("first_achieved", {}).get(test, -1) <= 0 - ] - if unachievable: - lines.append(f"• Warning: These tests may not achieve target power under adverse conditions: {', '.join(unachievable)}") + other_scenarios = [s for s in scenarios if s != "optimistic"] + for scenario in other_scenarios: + max_n = max( + (scenarios.get(scenario, {}).get("results", {}).get("first_achieved", {}).get(test, 0) for test in target_tests), + default=0, + ) + max_tested = scenarios.get(scenario, {}).get("model", {}).get("sample_size_range", {}).get("to_size", 200) + label = scenario.title() + + if max_n > 0 and max_n <= max_tested: + lines.append(f"• For power under {label} conditions: N = {max_n}") + elif max_n <= 0: + lines.append(f"• For power under {label} conditions: N > {max_tested}") + + # Check unachievable across worst scenario (last non-optimistic) + if other_scenarios: + worst = other_scenarios[-1] + unachievable = [ + test for test in target_tests if scenarios.get(worst, {}).get("results", {}).get("first_achieved", {}).get(test, -1) <= 0 + ] + if unachievable: + lines.append(f"• Warning: These tests may not achieve target power under {worst} conditions: {', '.join(unachievable)}") # Add cumulative probability analysis cumulative_lines = self._format_cumulative_recommendations(data, is_scenario=True) @@ -706,7 +709,7 @@ def _add_cumulative_sample_size_table( # Filter out tests with NaN power (e.g. 
non-contrast tests under Tukey correction)
         def _has_nan_power(t: str) -> bool:
             vals = powers_by_test[t]
-            return bool(vals and isinstance(vals[0], float) and math.isnan(vals[0]))
+            return bool(vals and _is_nan(vals[0]))
 
         valid_tests = [t for t in target_tests if not _has_nan_power(t)]
         if not valid_tests:
@@ -741,8 +744,6 @@ def _has_nan_power(t: str) -> bool:
             else:  # ≥k cases
                 # Approximate using independence assumption
                 prob_at_least_k = 0.0
-                from itertools import combinations
-
                 # Sum over all ways to choose at least k tests
                 for num_sig in range(k, n_tests + 1):
                     for combo in combinations(range(n_tests), num_sig):
@@ -859,6 +860,12 @@ def _format_cumulative_recommendations(self, results: Dict, is_scenario: bool =
             if prob >= target_power:
                 min_n_target = sample_sizes[i]
                 break
+
+        if min_n_target:
+            lines.append(f"• N={min_n_target} for {target_power:.0f}% chance all tests significant")
+        else:
+            max_tested = sample_sizes[-1]
+            lines.append(f"• >{max_tested} needed for {target_power:.0f}% chance all tests significant")
 
         return lines
 
diff --git a/mcpower/utils/parsers.py b/mcpower/utils/parsers.py
index e89d140..c14533f 100644
--- a/mcpower/utils/parsers.py
+++ b/mcpower/utils/parsers.py
@@ -105,6 +105,8 @@ def _split_assignments(self, input_string: str) -> List[str]:
                 paren_count += 1
             elif char == ")":
                 paren_count -= 1
+                if paren_count < 0:
+                    raise ValueError("Unbalanced parentheses: unexpected ')'")
             current.append(char)
 
         if current:
@@ -424,7 +426,11 @@ def _parse_independent_variables(formula: str) -> Tuple[Dict, Dict]:
     """
    from itertools import combinations
 
-    terms = re.split(r"[+\-]", formula)
+    # Check for minus sign (term removal), which is not supported
+    if re.search(r"(?<=[\w)])\s*-", formula):
+        raise ValueError("Term removal with '-' is not supported. Remove the term from the equation instead.")
+
+    terms = formula.split("+")
+
+
+def _parse_test_formula(test_formula: str, registry) -> Tuple[List[str], List[Dict]]:
+    """Extract effect names from a test formula, matched against the registry.
+
+    Parses the test formula, expands factor variables to their dummies,
+    and returns the list of effect names (in registry order) that belong
+    to the test formula.
+
+    Args:
+        test_formula: Formula string (e.g. ``"y ~ x1 + x2"``).
+        registry: ``VariableRegistry`` instance.
+
+    Returns:
+        Tuple of ``(effect_names, random_effects)`` where *effect_names*
+        are the registry effect names present in the test formula (in
+        registry order), and *random_effects* is the list of parsed
+        random-effect dicts from the test formula.
+    """
+    _dep_var, fixed_formula, random_effects = _parse_equation(test_formula)
+
+    # Parse fixed effects into a set of term names
+    test_terms = _parse_fixed_terms(fixed_formula)
+
+    # Determine which registry effects belong to the test formula
+    cluster_effects = set(registry.cluster_effect_names)
+    test_effects: List[str] = []
+
+    for effect_name in registry._effects:
+        if effect_name in cluster_effects:
+            continue
+
+        effect = registry._effects[effect_name]
+
+        if effect.effect_type == "main":
+            # Direct match (continuous or interaction-less variable)
+            if effect_name in test_terms:
+                test_effects.append(effect_name)
+            elif effect_name in registry._factor_dummies:
+                # Factor dummy -- include if parent factor is in test terms
+                parent_factor = registry._factor_dummies[effect_name]["factor_name"]
+                if parent_factor in test_terms:
+                    test_effects.append(effect_name)
+        else:
+            # Interaction -- check if the interaction term is in test terms
+            if effect_name in test_terms:
+                test_effects.append(effect_name)
+
+    return test_effects, random_effects
+
+
+def _parse_fixed_terms(fixed_formula: str) -> Set[str]:
+    """Parse a fixed-effect formula string into a set of term names. 
+ + Handles ``+`` for additive terms, ``:`` for specific interactions, + and ``*`` for full factorial expansion (main effects plus all + two-way through n-way interactions). + + Args: + fixed_formula: Right-hand side of the equation, spaces already + stripped by ``_parse_equation`` (e.g. ``"x1+x2+x1:x2"``). + + Returns: + Set of term names (variable names and interaction terms like + ``"x1:x2"``). + """ + if not fixed_formula.strip(): + return set() + + terms: Set[str] = set() + raw_terms = re.split(r"\+", fixed_formula) + + for raw in raw_terms: + raw = raw.strip() + if not raw: + continue + + if "*" in raw: + # Full factorial: x1*x2 -> x1, x2, x1:x2 + vars_in_star = [v.strip() for v in raw.split("*") if v.strip()] + for v in vars_in_star: + terms.add(v) + for r in range(2, len(vars_in_star) + 1): + for combo in combinations(vars_in_star, r): + terms.add(":".join(combo)) + else: + # Plain term (may contain ":" for explicit interaction) + terms.add(raw) + + return terms + + +def _compute_test_column_indices( + all_effect_names: List[str], + test_effect_names: List[str], +) -> np.ndarray: + """Compute column indices in X_expanded for test formula effects. + + Args: + all_effect_names: All non-cluster effect names in registry order. + test_effect_names: Effect names present in the test formula + (a subset of *all_effect_names*). + + Returns: + Integer array of column indices into X_expanded. + """ + test_set = set(test_effect_names) + indices = [i for i, name in enumerate(all_effect_names) if name in test_set] + return np.array(indices, dtype=np.int64) + + +def _remap_target_indices( + original_target_indices: np.ndarray, + test_column_indices: np.ndarray, +) -> np.ndarray: + """Remap target indices from full X_expanded space to X_test space. + + Args: + original_target_indices: Indices in X_expanded being tested. + test_column_indices: Columns of X_expanded included in X_test. + + Returns: + Indices remapped to positions within X_test. + """ + # Build mapping: full_index -> position in X_test + index_map = {int(full_idx): test_idx for test_idx, full_idx in enumerate(test_column_indices)} + return np.array( + [index_map[int(idx)] for idx in original_target_indices], + dtype=np.int64, + ) diff --git a/mcpower/utils/updates.py b/mcpower/utils/updates.py index 8c3c76b..c7f57a5 100644 --- a/mcpower/utils/updates.py +++ b/mcpower/utils/updates.py @@ -12,6 +12,8 @@ from datetime import datetime, timedelta from pathlib import Path +_already_checked = False + def _check_for_updates(current_version): """Check PyPI weekly for a newer MCPower version and warn if found. @@ -20,18 +22,24 @@ def _check_for_updates(current_version): silently in worker processes (detected via environment variable) and in frozen (PyInstaller) bundles where pip is unavailable. 
""" + global _already_checked # Skip in frozen bundles (PyInstaller) — the GUI has its own update checker if getattr(sys, "frozen", False): return + # Skip if already checked in this process + if _already_checked: + return + # Skip in worker processes (loky/joblib inherit env vars from parent) if os.environ.get("_MCPOWER_UPDATE_CHECKED"): return os.environ["_MCPOWER_UPDATE_CHECKED"] = "1" + _already_checked = True - cache_path = Path(__file__).parent.parent / ".mcpower_cache.json" - cache_path.parent.mkdir(exist_ok=True) + cache_path = Path.home() / ".cache" / "mcpower" / "update_cache.json" + cache_path.parent.mkdir(parents=True, exist_ok=True) # Load cache cache = {} @@ -57,9 +65,8 @@ def _check_for_updates(current_version): # Show update message only when PyPI version is strictly newer latest = cache.get("latest_version") - current = cache.get("current_version") - if latest and current and _is_newer(latest, current): - msg = f"\nNEW MCPower VERSION AVAILABLE: {latest} (you have {current})\nUpdate now: pip install --upgrade MCPower\n" + if latest and _is_newer(latest, current_version): + msg = f"\nNEW MCPower VERSION AVAILABLE: {latest} (you have {current_version})\nUpdate now: pip install --upgrade MCPower\n" warnings.warn(msg, stacklevel=3) @@ -77,7 +84,10 @@ def _get_latest_version(): """Fetch the latest MCPower version string from the PyPI JSON API.""" try: with urllib.request.urlopen("https://pypi.org/pypi/MCPower/json", timeout=5) as response: - data = json.loads(response.read()) + raw = response.read(1_000_000) + if len(raw) >= 1_000_000: + return None + data = json.loads(raw) return data["info"]["version"] except Exception: return None diff --git a/mcpower/utils/validators.py b/mcpower/utils/validators.py index 5853af6..1c344fd 100644 --- a/mcpower/utils/validators.py +++ b/mcpower/utils/validators.py @@ -27,6 +27,11 @@ class _ValidationResult: errors: List[str] warnings: List[str] + @classmethod + def from_errors(cls, errors: List[str], warnings: Optional[List[str]] = None) -> "_ValidationResult": + """Create a result from error/warning lists, deriving ``is_valid`` automatically.""" + return cls(len(errors) == 0, errors, warnings or []) + def raise_if_invalid(self): """Raise ``ValueError`` if the validation failed.""" if not self.is_valid: @@ -88,12 +93,12 @@ def _validate_numeric_parameter( errors.append(range_error) # Rounding warning for floats when int expected - if allow_rounding and isinstance(value, float) and (int, float) in expected_types: + if allow_rounding and isinstance(value, float) and int in expected_types: rounded = int(round(value)) if value != rounded: warnings.append(f"{name} rounded from {value} to {rounded}") - return _ValidationResult(len(errors) == 0, errors, warnings) + return _ValidationResult.from_errors(errors, warnings) def _validate_power(power: Any) -> _ValidationResult: @@ -112,6 +117,8 @@ def _validate_simulations(n_simulations: Any) -> Tuple[int, _ValidationResult]: if result.is_valid: rounded = int(round(n_simulations)) + # 800 simulations threshold: below this, Monte Carlo standard error + # exceeds ~1.5% for power near 50%, reducing result reliability. if rounded < 800: result.warnings.append(f"Low simulation count ({rounded}). Consider using at least 1000 for reliable results.") return rounded, result @@ -139,7 +146,7 @@ def _validate_sample_size(sample_size: Any) -> _ValidationResult: f"sample_size too large ({sample_size:,}). Maximum recommended: 100,000. We cannot guarantee stability for such small p-values." 
)
 
-    return _ValidationResult(len(errors) == 0, errors, [])
+    return _ValidationResult.from_errors(errors)
 
 
 def _validate_sample_size_for_model(sample_size: int, n_variables: int) -> _ValidationResult:
@@ -157,6 +164,8 @@ def _validate_sample_size_for_model(sample_size: int, n_variables: int) -> _Vali
         _ValidationResult with errors if sample size is insufficient.
     """
     errors = []
+    # Conservative rule of thumb: at least 15 observations plus one per
+    # predictor (N >= 15 + p, where p = number of design matrix columns).
     min_required = 15 + n_variables
 
     if sample_size < min_required:
@@ -165,7 +174,7 @@ def _validate_sample_size_for_model(sample_size: int, n_variables: int) -> _Vali
             f"variables. Minimum required: {min_required} (15 + {n_variables} variables)."
         )
 
-    return _ValidationResult(len(errors) == 0, errors, [])
+    return _ValidationResult.from_errors(errors)
 
 
 def _validate_sample_size_range(from_size: Any, to_size: Any, by: Any) -> _ValidationResult:
@@ -193,7 +202,7 @@ def _validate_sample_size_range(from_size: Any, to_size: Any, by: Any) -> _Valid
     if n_tests > 100:
         warnings.append(f"Large number of sample sizes to test ({n_tests}). This may take significant time.")
 
-    return _ValidationResult(len(errors) == 0, errors, warnings)
+    return _ValidationResult.from_errors(errors, warnings)
 
 
 def _validate_correlation_matrix(
@@ -226,12 +235,14 @@
     # Positive semi-definite check
     try:
         eigenvals = np.linalg.eigvals(corr_matrix)
+        # -1e-8 tolerance for positive semi-definiteness: allows small negative
+        # eigenvalues from floating-point rounding in correlation matrices.
         if np.any(eigenvals < -1e-8):  # Tolerance for floating point noise
             errors.append("Correlation matrix must be positive semi-definite. ")
     except np.linalg.LinAlgError:
         errors.append("Cannot compute eigenvalues of correlation matrix")
 
-    return _ValidationResult(len(errors) == 0, errors, [])
+    return _ValidationResult.from_errors(errors)
 
 
 def _validate_correction_method(correction: Optional[str]) -> _ValidationResult:
@@ -285,7 +296,7 @@ def _validate_parallel_settings(enable: Any, n_cores: Optional[int]) -> Tuple[Tu
     else:
         validated_n_cores = min(n_cores, max_cores)
 
-    return (enable, validated_n_cores), _ValidationResult(len(errors) == 0, errors, [])
+    return (enable, validated_n_cores), _ValidationResult.from_errors(errors)
 
 
 def _validate_model_ready(model) -> _ValidationResult:
@@ -301,9 +312,10 @@
     errors: List[str] = []
     warnings: List[str] = []
 
-    # Check effect sizes - check if pending effects were set
-    has_effects = hasattr(model, "_pending_effects") and model._pending_effects is not None
-    if not has_effects:
+    # Check effect sizes — pending (pre-apply) or flagged as set by user
+    has_pending = hasattr(model, "_pending_effects") and model._pending_effects is not None
+    has_set = hasattr(model, "_effects_set") and model._effects_set
+    if not has_pending and not has_set:
         if hasattr(model, "_registry"):
             available = model._registry.effect_names
             errors.append(
@@ -318,7 +330,7 @@
         if not hasattr(model, attr):
             errors.append(f"Model missing required attribute: {attr}")
 
-    return _ValidationResult(len(errors) == 0, errors, warnings)
+    return _ValidationResult.from_errors(errors, warnings)
 
 
 def _validate_test_formula(test_formula: str, available_variables: List[str]) -> _ValidationResult:
@@ -361,7 +373,7 @@
                 f"Variables not found in 
original model: {', '.join(sorted(missing_vars))}. Available: {', '.join(available_variables)}" ) - return _ValidationResult(len(errors) == 0, errors, []) + return _ValidationResult.from_errors(errors) except Exception as e: errors.append(f"Error parsing test_formula: {str(e)}") @@ -399,6 +411,8 @@ def _validate_factor_specification(n_levels: int, proportions: List[float]) -> _ # Check if they sum to approximately 1 if not errors: # Only if no errors with individual proportions total = sum(proportions) + # 1e-6 tolerance: proportions are normalized later, so small deviations + # from 1.0 are acceptable and only warrant a warning. if abs(total - 1.0) > 1e-6: warnings.append(f"Proportions sum to {total:.4f}, not 1.0 (will be normalized)") @@ -406,7 +420,7 @@ def _validate_factor_specification(n_levels: int, proportions: List[float]) -> _ if n_levels > 10: warnings.append(f"Factor has {n_levels} levels. This creates {n_levels - 1} dummy variables, which may require large sample sizes") - return _ValidationResult(len(errors) == 0, errors, warnings) + return _ValidationResult.from_errors(errors, warnings) def _validate_upload_data(data: np.ndarray) -> _ValidationResult: @@ -425,7 +439,7 @@ def _validate_upload_data(data: np.ndarray) -> _ValidationResult: if data.shape[0] < 25: errors.append(f"Need at least 25 samples for reliable quantile matching, got {data.shape[0]}") - return _ValidationResult(len(errors) == 0, errors, []) + return _ValidationResult.from_errors(errors) def _validate_cluster_config( @@ -475,7 +489,7 @@ def _validate_cluster_config( if not isinstance(cluster_size, int) or cluster_size < 5: errors.append(f"cluster_size must be an integer >= 5 for reliable mixed model estimation. Got {cluster_size}.") - return _ValidationResult(len(errors) == 0, errors, warnings) + return _ValidationResult.from_errors(errors, warnings) def _validate_cluster_sample_size( @@ -517,4 +531,4 @@ def _validate_cluster_sample_size( f"Small cluster sizes may cause convergence issues or biased variance estimates." ) - return _ValidationResult(len(errors) == 0, errors, warnings) + return _ValidationResult.from_errors(errors, warnings) diff --git a/mcpower/utils/visualization.py b/mcpower/utils/visualization.py index 22544a2..ab7eadd 100644 --- a/mcpower/utils/visualization.py +++ b/mcpower/utils/visualization.py @@ -18,6 +18,7 @@ def _create_power_plot( target_tests: List[str], target_power: float, title: str, + show: bool = True, ): """Create a sample-size vs. power line plot with achievement markers. 
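
Because the helper now returns the Matplotlib figure, a caller can render the power curve to disk without opening a window. A minimal sketch of that pattern follows; the keyword names for the leading arguments are inferred from the hunks around this function and the data values are invented, so treat it as an illustration rather than the library's documented API:

```python
# Hypothetical call to the private plotting helper shown in this diff.
fig = _create_power_plot(
    sample_sizes=[50, 100, 150],                # tested N values (assumed name)
    powers_by_test={"x1": [41.2, 70.8, 87.5]},  # power (%) per N (assumed name)
    first_achieved={"x1": 150},                 # first N meeting the target (assumed name)
    target_tests=["x1"],
    target_power=80.0,
    title="Power vs. sample size",
    show=False,  # suppress plt.show() for programmatic use
)
fig.savefig("power_curve.png", dpi=150)  # possible because the figure is returned
```
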
@@ -58,7 +59,7 @@ def _create_power_plot( ) # Mark achievement point - if first_achieved[test] > 0: + if first_achieved[test] > 0 and first_achieved[test] in sample_sizes: achieved_idx = sample_sizes.index(first_achieved[test]) achieved_power = powers[achieved_idx] ax.plot( @@ -112,4 +113,6 @@ def _create_power_plot( color="#888888", ) plt.tight_layout(rect=(0, 0.03, 1, 1)) - plt.show() + if show: + plt.show() + return fig diff --git a/pyproject.toml b/pyproject.toml index 983e3d3..ec1f41f 100644 --- a/pyproject.toml +++ b/pyproject.toml @@ -1,14 +1,14 @@ [build-system] requires = [ - "scikit-build-core>=0.5", + "scikit-build-core>=0.10", "pybind11>=2.11", - "numpy>=2.0.0", + "numpy>=1.26.0", ] build-backend = "scikit_build_core.build" [project] name = "MCPower" -version = "0.5.4" +version = "0.6.0" description = "Monte Carlo Power Analysis for Statistical Models" readme = "README.md" license = {text = "GPL-3.0-or-later"} @@ -31,9 +31,10 @@ classifiers = [ ] requires-python = ">=3.10" dependencies = [ - "numpy>=2.0.0", + "numpy>=1.26.0", "matplotlib>=3.8.0", "joblib>=1.3.0", + "tqdm>=4.60.0", ] [project.optional-dependencies] @@ -41,6 +42,7 @@ lme = ["statsmodels>=0.14.0"] pandas = ["pandas>=2.0.0"] dev = [ "pandas>=2.0.0", + "statsmodels>=0.14.0", "pytest>=7.0.0", "pytest-cov>=4.0.0", "scipy>=1.11.0", @@ -52,14 +54,14 @@ dev = [ ] all = [ "pandas>=2.0.0", - "statsmodels>=0.14.0", -] + ] [project.urls] Homepage = "https://github.com/pawlenartowicz/MCPower" -Documentation = "https://github.com/pawlenartowicz/MCPower#readme" +Documentation = "https://github.com/pawlenartowicz/MCPower/wiki" Repository = "https://github.com/pawlenartowicz/MCPower" Issues = "https://github.com/pawlenartowicz/MCPower/issues" +Changelog = "https://github.com/pawlenartowicz/MCPower/blob/main/CHANGELOG.md" [tool.scikit-build] wheel.packages = ["mcpower"] @@ -77,8 +79,6 @@ python_files = ["test_*.py"] python_classes = ["Test*"] python_functions = ["test_*"] markers = [ - "unit: Unit tests", - "integration: Integration tests", "lme: LME mixed-effects model tests", ] addopts = "-v --tb=short --strict-markers" @@ -86,7 +86,6 @@ filterwarnings = [ "ignore::FutureWarning", "ignore::DeprecationWarning", "ignore::UserWarning:statsmodels", - "ignore:Mixed-effects models are experimental:UserWarning", ] [tool.ruff] @@ -107,6 +106,20 @@ known-first-party = ["mcpower"] python_version = "3.10" warn_return_any = true warn_unused_configs = true -ignore_missing_imports = true check_untyped_defs = true exclude = ["build", "dist", "tests"] + +[[tool.mypy.overrides]] +module = [ + "mcpower_native", + "mcpower_native.*", + "statsmodels", + "statsmodels.*", + "tqdm", + "tqdm.*", + "joblib", + "joblib.*", + "pandas", + "pandas.*", +] +ignore_missing_imports = true diff --git a/tests/config.py b/tests/config.py index 63c1999..a8b874c 100644 --- a/tests/config.py +++ b/tests/config.py @@ -5,9 +5,21 @@ across the test suite. 
""" -# Monte Carlo simulation parameters -N_SIMS = 5000 -"""Number of Monte Carlo simulations for power analysis tests.""" +# Monte Carlo simulation parameters — 4-tier ladder +N_SIMS_CHECK = 50 +"""Smoke tests — just verify no crash, structure, API contract.""" + +N_SIMS_ORDERING = 1000 +"""Ordering tests — monotonicity, correction hierarchy, A < B checks.""" + +N_SIMS_STANDARD = 1600 +"""Standard tests — null calibration, Type I error, general validation.""" + +N_SIMS_ACCURACY = 5000 +"""Accuracy tests — comparison against analytical power formulas.""" + +N_SIMS = N_SIMS_ACCURACY +"""Backward-compat alias for accuracy-level simulations.""" SEED = 2137 """Default random seed for reproducibility.""" diff --git a/tests/conftest.py b/tests/conftest.py index 93c8814..d2100b5 100644 --- a/tests/conftest.py +++ b/tests/conftest.py @@ -63,12 +63,6 @@ def correlation_matrix_2x2(): return np.array([[1.0, 0.5], [0.5, 1.0]]) -@pytest.fixture -def correlation_matrix_3x3(): - """Create a 3x3 correlation matrix.""" - return np.array([[1.0, 0.3, 0.2], [0.3, 1.0, 0.4], [0.2, 0.4, 1.0]]) - - @pytest.fixture def sample_data(): """Create sample empirical data.""" @@ -80,41 +74,13 @@ def sample_data(): @pytest.fixture -def suppress_output(capsys): - """Suppress print output during tests by capturing it.""" - yield - # Output is automatically captured by capsys - - -BACKENDS = ["c++"] - - -@pytest.fixture(params=BACKENDS) -def backend(request): - """ - Force MCPower to run on a specific backend. - - Parametrizes tests against C++ (primary backend). - Automatically resets backend after each test. - """ - from mcpower.backends import reset_backend, set_backend - - set_backend(request.param) - yield request.param - reset_backend() - - -@pytest.fixture(autouse=True) -def reset_backend_after_test(): - """ - Automatically reset backend to default after every test. - - Ensures no hidden backend state leaks between tests. 
- """ - yield - from mcpower.backends import reset_backend +def suppress_output(): + """Suppress print output during tests.""" + import contextlib + import io - reset_backend() + with contextlib.redirect_stdout(io.StringIO()): + yield def _statsmodels_available(): diff --git a/tests/helpers/power_helpers.py b/tests/helpers/power_helpers.py index d509a79..620e995 100644 --- a/tests/helpers/power_helpers.py +++ b/tests/helpers/power_helpers.py @@ -40,42 +40,3 @@ def compute_crits(X, target_indices, alpha=DEFAULT_ALPHA, correction_method=0): return compute_critical_values(alpha, p, dof, n_targets, correction_method) -def run_with_backend( - backend_name, - equation, - effects_str, - sample_size, - n_sims, - seed, - target_test="all", - correction=None, - correlations_str=None, - alpha=DEFAULT_ALPHA, -): - """Run a full MCPower power analysis with a specific backend forced.""" - import contextlib - import io - - from mcpower import MCPower - from mcpower.backends import reset_backend, set_backend - - set_backend(backend_name) - try: - m = MCPower(equation) - m.set_simulations(n_sims) - m.set_seed(seed) - m.set_alpha(alpha) - m.set_effects(effects_str) - if correlations_str: - m.set_correlations(correlations_str) - with contextlib.redirect_stdout(io.StringIO()): - result = m.find_power( - sample_size=sample_size, - target_test=target_test, - correction=correction, - print_results=False, - return_results=True, - ) - finally: - reset_backend() - return result diff --git a/tests/integration/test_find_power_api.py b/tests/integration/test_find_power_api.py index 69f6bc5..fe56de3 100644 --- a/tests/integration/test_find_power_api.py +++ b/tests/integration/test_find_power_api.py @@ -222,14 +222,14 @@ def test_all_targets(self, suppress_output): class TestHeterogeneity: - """Test heterogeneity settings.""" + """Test heterogeneity via scenario configs.""" def test_with_heterogeneity(self, suppress_output): from mcpower import MCPower model = MCPower("y = x1 + x2") model.set_effects("x1=0.3, x2=0.2") - model.set_heterogeneity(0.1) + model.set_scenario_configs({"het": {"heterogeneity": 0.1}}) result = model.find_power(100, print_results=False, return_results=True) assert result is not None @@ -239,7 +239,7 @@ def test_with_heteroskedasticity(self, suppress_output): model = MCPower("y = x1 + x2") model.set_effects("x1=0.3, x2=0.2") - model.set_heteroskedasticity(0.2) + model.set_scenario_configs({"hsked": {"heteroskedasticity": 0.2}}) result = model.find_power(100, print_results=False, return_results=True) assert result is not None @@ -249,8 +249,7 @@ def test_combined(self, suppress_output): model = MCPower("y = x1 + x2") model.set_effects("x1=0.3, x2=0.2") - model.set_heterogeneity(0.1) - model.set_heteroskedasticity(0.2) + model.set_scenario_configs({"combo": {"heterogeneity": 0.1, "heteroskedasticity": 0.2}}) result = model.find_power(100, print_results=False, return_results=True) assert result is not None @@ -330,7 +329,7 @@ def test_all_features_combined(self, suppress_output): model.upload_data({"x1": np.random.exponential(2, 100)}) model.set_correlations("(x1,x2)=0.3") model.set_effects("group[2]=0.4, group[3]=0.3, x1=0.2, x2=0.15, x1:x2=0.1") - model.set_heterogeneity(0.05) + model.set_scenario_configs({"test": {"heterogeneity": 0.05}}) result = model.find_power(200, print_results=False, return_results=True) assert result is not None diff --git a/tests/integration/test_model.py b/tests/integration/test_model.py index da00d33..a57b051 100644 --- a/tests/integration/test_model.py +++ 
b/tests/integration/test_model.py @@ -86,16 +86,6 @@ def test_set_variable_type(self, suppress_output): assert model._pending_variable_types == "group=(factor,3)" assert model._applied is False - def test_set_heterogeneity(self, simple_model): - simple_model.set_heterogeneity(0.1) - assert simple_model._pending_heterogeneity == 0.1 - assert simple_model._applied is False - - def test_set_heteroskedasticity(self, simple_model): - simple_model.set_heteroskedasticity(0.2) - assert simple_model._pending_heteroskedasticity == 0.2 - assert simple_model._applied is False - def test_upload_data_dict(self, simple_model, sample_data): simple_model.upload_data(sample_data) assert simple_model._pending_data is not None @@ -131,12 +121,12 @@ class TestApply: """Test apply() method.""" def test_apply_sets_flag(self, configured_model): - configured_model.apply() + configured_model._apply() assert configured_model._applied is True def test_apply_processes_effects(self, simple_model): simple_model.set_effects("x1=0.5, x2=0.3") - simple_model.apply() + simple_model._apply() effect_sizes = simple_model._registry.get_effect_sizes() assert effect_sizes[0] == 0.5 assert effect_sizes[1] == 0.3 @@ -147,22 +137,17 @@ def test_apply_processes_variable_types(self, suppress_output): model = MCPower("y = group + x1") model.set_variable_type("group=(factor,3)") model.set_effects("group[2]=0.4, group[3]=0.3, x1=0.2") - model.apply() + model._apply() assert len(model._registry.factor_names) == 1 assert len(model._registry.dummy_names) == 2 def test_apply_processes_correlations(self, simple_model): simple_model.set_effects("x1=0.3, x2=0.2") simple_model.set_correlations("(x1,x2)=0.5") - simple_model.apply() + simple_model._apply() corr = simple_model.correlation_matrix assert corr[0, 1] == 0.5 - def test_apply_processes_heterogeneity(self, configured_model): - configured_model.set_heterogeneity(0.15) - configured_model.apply() - assert configured_model.heterogeneity == 0.15 - def test_apply_order_independence(self, suppress_output): """Test that set_* methods can be called in any order.""" from mcpower import MCPower @@ -172,14 +157,14 @@ def test_apply_order_independence(self, suppress_output): m1.set_effects("group[2]=0.4, group[3]=0.3, x1=0.2, x2=0.1") m1.set_variable_type("group=(factor,3)") m1.set_correlations("(x1,x2)=0.5") - m1.apply() + m1._apply() # Order 2: variable_type, correlations, effects m2 = MCPower("y = group + x1 + x2") m2.set_variable_type("group=(factor,3)") m2.set_correlations("(x1,x2)=0.5") m2.set_effects("group[2]=0.4, group[3]=0.3, x1=0.2, x2=0.1") - m2.apply() + m2._apply() # Both should have same effect sizes assert np.allclose(m1._registry.get_effect_sizes(), m2._registry.get_effect_sizes()) @@ -242,7 +227,7 @@ def test_sample_sizes_tested(self, configured_model): assert result["results"]["sample_sizes_tested"] == [50, 75, 100] def test_first_achieved(self, configured_model): - result = configured_model.find_sample_size(from_size=50, to_size=200, by=25, print_results=False, return_results=True) + result = configured_model.find_sample_size(from_size=50, to_size=200, by=50, print_results=False, return_results=True) assert "first_achieved" in result["results"] def test_find_sample_size_runs(self, configured_model): @@ -257,7 +242,7 @@ class TestErrors: def test_invalid_effect_name(self, simple_model): simple_model.set_effects("invalid=0.3") with pytest.raises(ValueError, match="not found"): - simple_model.apply() + simple_model._apply() def test_missing_effects(self, simple_model): with 
pytest.raises(ValueError, match="Effect sizes must be set"): @@ -290,7 +275,7 @@ def test_basic_named_levels(self): model = MCPower("y = treatment + x1") model.set_factor_levels("treatment=placebo,drug_a,drug_b") model.set_effects("treatment[drug_a]=0.5, treatment[drug_b]=0.8, x1=0.3") - model.apply() + model._apply() assert "treatment" in model._registry.factor_names assert "treatment[drug_a]" in model._registry.dummy_names assert "treatment[drug_b]" in model._registry.dummy_names @@ -302,7 +287,7 @@ def test_multiple_factors(self): model = MCPower("y = group + dose") model.set_factor_levels("group=control,treatment; dose=low,medium,high") model.set_effects("group[treatment]=0.5, dose[medium]=0.3, dose[high]=0.6") - model.apply() + model._apply() assert "group[treatment]" in model._registry.dummy_names assert "dose[medium]" in model._registry.dummy_names assert "dose[high]" in model._registry.dummy_names @@ -313,7 +298,7 @@ def test_unknown_variable_raises(self): model = MCPower("y = x1") with pytest.raises(ValueError, match="not found"): model.set_factor_levels("unknown=a,b,c") - model.apply() + model._apply() def test_single_level_raises(self): from mcpower import MCPower @@ -321,7 +306,7 @@ def test_single_level_raises(self): model = MCPower("y = x1") with pytest.raises(ValueError, match="at least 2"): model.set_factor_levels("x1=only_one") - model.apply() + model._apply() def test_find_power_with_named_levels(self): """End-to-end: find_power works with set_factor_levels.""" diff --git a/tests/integration/test_parallel.py b/tests/integration/test_parallel.py index 682372c..2048d7e 100644 --- a/tests/integration/test_parallel.py +++ b/tests/integration/test_parallel.py @@ -4,6 +4,8 @@ import pytest +from tests.config import N_SIMS_CHECK + def _joblib_available(): """Check if joblib is available.""" @@ -26,6 +28,7 @@ def test_parallel_results_match_sequential(self, suppress_output): model = MCPower("y = x1 + x2") model.set_effects("x1=0.3, x2=0.2") model.set_seed(42) + model.set_simulations(N_SIMS_CHECK) # Run sequential analysis model.set_parallel(False) @@ -56,6 +59,7 @@ def test_parallel_with_scenarios(self, suppress_output): model = MCPower("y = x1 + x2") model.set_effects("x1=0.3, x2=0.2") model.set_seed(42) + model.set_simulations(N_SIMS_CHECK) # Run sequential with scenarios model.set_parallel(False) @@ -93,6 +97,7 @@ def test_parallel_with_interactions(self, suppress_output): model = MCPower("y = a + b + a:b") model.set_effects("a=0.4, b=0.3, a:b=0.2") model.set_seed(42) + model.set_simulations(N_SIMS_CHECK) # Run sequential model.set_parallel(False) @@ -128,6 +133,7 @@ def test_parallel_fallback_on_failure(self, suppress_output, monkeypatch): model = MCPower("y = x1 + x2") model.set_effects("x1=0.3, x2=0.2") model.set_seed(42) + model.set_simulations(N_SIMS_CHECK) model.set_parallel(True, n_cores=2) # Mock joblib.Parallel to raise an exception @@ -157,6 +163,7 @@ def test_find_power_ignores_parallel(self, suppress_output): model = MCPower("y = x1 + x2") model.set_effects("x1=0.3, x2=0.2") model.set_seed(42) + model.set_simulations(N_SIMS_CHECK) # Run with parallel=False model.set_parallel(False) diff --git a/tests/integration/test_posthoc_integration.py b/tests/integration/test_posthoc_integration.py index 9499388..b6b7f21 100644 --- a/tests/integration/test_posthoc_integration.py +++ b/tests/integration/test_posthoc_integration.py @@ -15,7 +15,7 @@ def test_parse_vs_syntax(self, suppress_output): model = MCPower("y = group + x1") model.set_variable_type("group=(factor,3)") 
model.set_effects("group[2]=0.4, group[3]=0.3, x1=0.2") - model.apply() + model._apply() tests = model._parse_target_tests("group[1] vs group[2]") assert "group[1] vs group[2]" in tests @@ -27,7 +27,7 @@ def test_parse_multiple_vs(self, suppress_output): model = MCPower("y = group + x1") model.set_variable_type("group=(factor,3)") model.set_effects("group[2]=0.4, group[3]=0.3, x1=0.2") - model.apply() + model._apply() tests = model._parse_target_tests("group[1] vs group[2], group[2] vs group[3]") assert "group[1] vs group[2]" in tests @@ -40,7 +40,7 @@ def test_parse_mixed_regular_and_posthoc(self, suppress_output): model = MCPower("y = group + x1") model.set_variable_type("group=(factor,3)") model.set_effects("group[2]=0.4, group[3]=0.3, x1=0.2") - model.apply() + model._apply() tests = model._parse_target_tests("overall, group[1] vs group[2]") assert "overall" in tests @@ -52,7 +52,7 @@ def test_all_does_not_include_posthoc(self, suppress_output): model = MCPower("y = group + x1") model.set_variable_type("group=(factor,3)") model.set_effects("group[2]=0.4, group[3]=0.3, x1=0.2") - model.apply() + model._apply() tests = model._parse_target_tests("all") # "all" should NOT include any post-hoc comparisons @@ -66,7 +66,7 @@ def test_invalid_factor_name(self, suppress_output): model = MCPower("y = group + x1") model.set_variable_type("group=(factor,3)") model.set_effects("group[2]=0.4, group[3]=0.3, x1=0.2") - model.apply() + model._apply() with pytest.raises(ValueError, match="Factor.*not found"): model._parse_target_tests("notafactor[1] vs notafactor[2]") @@ -77,7 +77,7 @@ def test_invalid_level(self, suppress_output): model = MCPower("y = group + x1") model.set_variable_type("group=(factor,3)") model.set_effects("group[2]=0.4, group[3]=0.3, x1=0.2") - model.apply() + model._apply() with pytest.raises(ValueError, match="out of range"): model._parse_target_tests("group[0] vs group[5]") @@ -88,7 +88,7 @@ def test_same_level_comparison_rejected(self, suppress_output): model = MCPower("y = group + x1") model.set_variable_type("group=(factor,3)") model.set_effects("group[2]=0.4, group[3]=0.3, x1=0.2") - model.apply() + model._apply() with pytest.raises(ValueError, match="Cannot compare a level to itself"): model._parse_target_tests("group[2] vs group[2]") @@ -99,7 +99,7 @@ def test_cross_factor_comparison_rejected(self, suppress_output): model = MCPower("y = a + b") model.set_variable_type("a=(factor,3), b=(factor,2)") model.set_effects("a[2]=0.3, a[3]=0.2, b[2]=0.1") - model.apply() + model._apply() with pytest.raises(ValueError, match="same factor"): model._parse_target_tests("a[1] vs b[1]") @@ -396,7 +396,7 @@ def test_all_posthoc_keyword(self, suppress_output): model = MCPower("y = group + x1") model.set_variable_type("group=(factor,3)") model.set_effects("group[2]=0.4, group[3]=0.3, x1=0.2") - model.apply() + model._apply() tests = model._parse_target_tests("all-posthoc") # 3-level factor → C(3,2) = 3 pairs @@ -415,7 +415,7 @@ def test_all_plus_all_posthoc(self, suppress_output): model = MCPower("y = group + x1") model.set_variable_type("group=(factor,3)") model.set_effects("group[2]=0.4, group[3]=0.3, x1=0.2") - model.apply() + model._apply() tests = model._parse_target_tests("all, all-posthoc") # "all" → overall + group[2] + group[3] + x1 = 4 @@ -434,7 +434,7 @@ def test_all_posthoc_multiple_factors(self, suppress_output): model = MCPower("y = a + b") model.set_variable_type("a=(factor,3), b=(factor,2)") model.set_effects("a[2]=0.3, a[3]=0.2, b[2]=0.1") - model.apply() + model._apply() 
tests = model._parse_target_tests("all-posthoc") # a: C(3,2)=3, b: C(2,2)=1 → 4 total @@ -448,7 +448,7 @@ def test_all_posthoc_no_factors_with_all(self, suppress_output): model = MCPower("y = x1 + x2") model.set_effects("x1=0.5, x2=0.3") - model.apply() + model._apply() tests = model._parse_target_tests("all, all-posthoc") assert "overall" in tests @@ -461,7 +461,7 @@ def test_all_posthoc_alone_no_factors_raises(self, suppress_output): model = MCPower("y = x1 + x2") model.set_effects("x1=0.5, x2=0.3") - model.apply() + model._apply() with pytest.raises(ValueError, match="no factor variables"): model._parse_target_tests("all-posthoc") @@ -472,7 +472,7 @@ def test_exclusion_removes_test(self, suppress_output): model = MCPower("y = x1 + x2") model.set_effects("x1=0.5, x2=0.3") - model.apply() + model._apply() tests = model._parse_target_tests("all, -overall") assert "overall" not in tests @@ -486,7 +486,7 @@ def test_exclusion_posthoc(self, suppress_output): model = MCPower("y = group + x1") model.set_variable_type("group=(factor,3)") model.set_effects("group[2]=0.4, group[3]=0.3, x1=0.2") - model.apply() + model._apply() tests = model._parse_target_tests("all-posthoc, -group[1] vs group[2]") assert "group[1] vs group[2]" not in tests @@ -500,7 +500,7 @@ def test_exclusion_invalid_raises(self, suppress_output): model = MCPower("y = x1 + x2") model.set_effects("x1=0.5, x2=0.3") - model.apply() + model._apply() with pytest.raises(ValueError, match="does not match"): model._parse_target_tests("all, -nonexistent") @@ -511,7 +511,7 @@ def test_exclusion_all_raises(self, suppress_output): model = MCPower("y = x1 + x2") model.set_effects("x1=0.5, x2=0.3") - model.apply() + model._apply() with pytest.raises(ValueError, match="nothing left"): model._parse_target_tests("all, -overall, -x1, -x2") @@ -522,7 +522,7 @@ def test_duplicate_raises(self, suppress_output): model = MCPower("y = x1 + x2") model.set_effects("x1=0.5, x2=0.3") - model.apply() + model._apply() with pytest.raises(ValueError, match="Duplicate"): model._parse_target_tests("all, x1") diff --git a/tests/integration/test_scenarios.py b/tests/integration/test_scenarios.py index 5172fb3..b0339cc 100644 --- a/tests/integration/test_scenarios.py +++ b/tests/integration/test_scenarios.py @@ -5,8 +5,10 @@ from unittest.mock import MagicMock import numpy as np +import pytest from mcpower.core.scenarios import ( + DEFAULT_SCENARIO_CONFIG, ScenarioRunner, apply_per_simulation_perturbations, ) @@ -82,6 +84,152 @@ def test_create_scenario_plots_early_return(self): runner._create_scenario_plots({"scenarios": {"optimistic": {}}}) +class TestSetScenarioConfigs: + """Test set_scenario_configs() merge behavior and KeyError prevention.""" + + # All keys that must exist in every scenario config + ALL_KEYS = sorted(DEFAULT_SCENARIO_CONFIG["optimistic"].keys()) + + def _make_model(self): + from mcpower import MCPower + + m = MCPower("y = x1 + x2") + m.set_effects("x1=0.3, x2=0.2") + return m + + # ── Merge semantics ────────────────────────────────────────── + + def test_custom_scenario_inherits_all_optimistic_keys(self): + """New custom scenario with one key still has every required key.""" + m = self._make_model() + m.set_scenario_configs({"extreme": {"heterogeneity": 0.6}}) + cfg = m._scenario_configs["extreme"] + missing = set(self.ALL_KEYS) - set(cfg.keys()) + assert not missing, f"Missing keys: {missing}" + + def test_custom_scenario_overrides_value(self): + """Provided key overrides the optimistic default.""" + m = self._make_model() + 
m.set_scenario_configs({"extreme": {"heterogeneity": 0.6}}) + assert m._scenario_configs["extreme"]["heterogeneity"] == 0.6 + + def test_custom_scenario_non_overridden_keys_are_optimistic(self): + """Non-overridden keys equal the optimistic baseline.""" + m = self._make_model() + m.set_scenario_configs({"extreme": {"heterogeneity": 0.6}}) + opt = DEFAULT_SCENARIO_CONFIG["optimistic"] + cfg = m._scenario_configs["extreme"] + for key in self.ALL_KEYS: + if key != "heterogeneity": + assert cfg[key] == opt[key], f"Key {key}: {cfg[key]} != {opt[key]}" + + def test_existing_scenario_update_preserves_other_keys(self): + """Updating one key on 'realistic' keeps the rest intact.""" + m = self._make_model() + m.set_scenario_configs({"realistic": {"heterogeneity": 0.99}}) + cfg = m._scenario_configs["realistic"] + assert cfg["heterogeneity"] == 0.99 + # Other keys should match original realistic defaults + assert cfg["correlation_noise_sd"] == DEFAULT_SCENARIO_CONFIG["realistic"]["correlation_noise_sd"] + + def test_defaults_still_present_after_adding_custom(self): + """Adding a custom scenario doesn't remove optimistic/realistic/doomer.""" + m = self._make_model() + m.set_scenario_configs({"custom": {"heterogeneity": 0.1}}) + for name in ("optimistic", "realistic", "doomer", "custom"): + assert name in m._scenario_configs + + def test_multiple_custom_scenarios(self): + """Multiple custom scenarios each inherit independently.""" + m = self._make_model() + m.set_scenario_configs({ + "mild": {"heterogeneity": 0.05}, + "severe": {"heterogeneity": 0.8, "heteroskedasticity": 0.5}, + }) + assert m._scenario_configs["mild"]["heterogeneity"] == 0.05 + assert m._scenario_configs["mild"]["heteroskedasticity"] == 0.0 # optimistic default + assert m._scenario_configs["severe"]["heterogeneity"] == 0.8 + assert m._scenario_configs["severe"]["heteroskedasticity"] == 0.5 + + def test_empty_custom_scenario_equals_optimistic(self): + """An empty custom config is identical to the optimistic baseline.""" + m = self._make_model() + m.set_scenario_configs({"empty": {}}) + opt = DEFAULT_SCENARIO_CONFIG["optimistic"] + for key in self.ALL_KEYS: + assert m._scenario_configs["empty"][key] == opt[key] + + # ── Type validation ────────────────────────────────────────── + + def test_non_dict_raises_type_error(self): + m = self._make_model() + with pytest.raises(TypeError): + m.set_scenario_configs("not_a_dict") + + def test_returns_self_for_chaining(self): + m = self._make_model() + result = m.set_scenario_configs({"custom": {"heterogeneity": 0.1}}) + assert result is m + + # ── End-to-end: no KeyError during simulation ──────────────── + + def test_custom_partial_config_runs_without_error(self): + """Custom scenario with only one key runs find_power without KeyError.""" + m = self._make_model() + m.set_scenario_configs({"partial": {"heterogeneity": 0.3}}) + result = m.find_power( + 50, scenarios=True, print_results=False, return_results=True + ) + assert "partial" in result["scenarios"] + power = result["scenarios"]["partial"]["results"]["individual_powers"]["overall"] + assert 0 <= power <= 100 + + def test_custom_residual_only_config_runs(self): + """Custom scenario with only residual keys runs without error.""" + m = self._make_model() + m.set_scenario_configs({ + "residual_test": { + "residual_change_prob": 1.0, + "residual_dists": ["heavy_tailed"], + "residual_df": 5, + } + }) + result = m.find_power( + 50, scenarios=True, print_results=False, return_results=True + ) + assert "residual_test" in result["scenarios"] + + def 
test_custom_lme_keys_on_ols_model_ignored(self): + """LME-specific keys on an OLS model don't cause errors.""" + m = self._make_model() + m.set_scenario_configs({ + "lme_on_ols": { + "icc_noise_sd": 0.3, + "random_effect_dist": "heavy_tailed", + "random_effect_df": 3, + } + }) + result = m.find_power( + 50, scenarios=True, print_results=False, return_results=True + ) + assert "lme_on_ols" in result["scenarios"] + + def test_overriding_all_three_defaults(self): + """Overriding optimistic, realistic, and doomer all at once.""" + m = self._make_model() + m.set_scenario_configs({ + "optimistic": {"heterogeneity": 0.01}, + "realistic": {"heterogeneity": 0.5}, + "doomer": {"heterogeneity": 0.9}, + }) + assert m._scenario_configs["optimistic"]["heterogeneity"] == 0.01 + assert m._scenario_configs["realistic"]["heterogeneity"] == 0.5 + assert m._scenario_configs["doomer"]["heterogeneity"] == 0.9 + # Other keys preserved from defaults + assert m._scenario_configs["realistic"]["correlation_noise_sd"] == DEFAULT_SCENARIO_CONFIG["realistic"]["correlation_noise_sd"] + assert m._scenario_configs["doomer"]["correlation_noise_sd"] == DEFAULT_SCENARIO_CONFIG["doomer"]["correlation_noise_sd"] + + class TestApplyPerSimulationPerturbations: """Test apply_per_simulation_perturbations function.""" @@ -122,3 +270,152 @@ def test_var_type_perturbation(self): # All normal (type 0) vars should be changed to right_skewed (type 2) assert np.all(p_types == 2) + + +class TestScenarioConfigKeysE2E: + """End-to-end tests for each individual config key and mixed combinations. + + Each test verifies that setting a single config key (or combination) + via set_scenario_configs() runs find_power(scenarios=True) without + error and produces valid power values. + """ + + N_SIMS = 50 + SAMPLE_SIZE = 80 + + def _make_model(self): + from mcpower import MCPower + + m = MCPower("y = x1 + x2") + m.set_effects("x1=0.3, x2=0.2") + m.set_simulations(self.N_SIMS) + return m + + def _run(self, model, config, scenario_name="test_scenario"): + model.set_scenario_configs({scenario_name: config}) + result = model.find_power( + self.SAMPLE_SIZE, + scenarios=True, + print_results=False, + return_results=True, + ) + power = result["scenarios"][scenario_name]["results"]["individual_powers"]["overall"] + assert 0 <= power <= 100, f"Power out of range: {power}" + return result + + # ── Individual general keys ─────────────────────────────────── + + def test_heterogeneity_only(self): + self._run(self._make_model(), {"heterogeneity": 0.3}) + + def test_heteroskedasticity_only(self): + self._run(self._make_model(), {"heteroskedasticity": 0.2}) + + def test_correlation_noise_sd_only(self): + m = self._make_model() + m.set_correlations("(x1,x2)=0.4") + self._run(m, {"correlation_noise_sd": 0.3}) + + def test_distribution_change_prob_only(self): + self._run(self._make_model(), {"distribution_change_prob": 0.5}) + + def test_new_distributions_with_change_prob(self): + self._run(self._make_model(), { + "distribution_change_prob": 1.0, + "new_distributions": ["uniform"], + }) + + # ── Individual residual keys ────────────────────────────────── + + def test_residual_change_prob_only(self): + self._run(self._make_model(), {"residual_change_prob": 0.5}) + + def test_residual_df_only(self): + self._run(self._make_model(), { + "residual_change_prob": 1.0, + "residual_df": 3, + }) + + def test_residual_dists_only(self): + self._run(self._make_model(), { + "residual_change_prob": 1.0, + "residual_dists": ["heavy_tailed"], + }) + + # ── Mixed general combinations 
──────────────────────────────── + + def test_heterogeneity_and_correlation_noise(self): + m = self._make_model() + m.set_correlations("(x1,x2)=0.3") + self._run(m, { + "heterogeneity": 0.25, + "correlation_noise_sd": 0.3, + }) + + def test_distribution_change_and_heteroskedasticity(self): + self._run(self._make_model(), { + "distribution_change_prob": 0.5, + "heteroskedasticity": 0.15, + }) + + def test_all_general_keys_together(self): + m = self._make_model() + m.set_correlations("(x1,x2)=0.3") + self._run(m, { + "heterogeneity": 0.2, + "heteroskedasticity": 0.1, + "correlation_noise_sd": 0.2, + "distribution_change_prob": 0.3, + }) + + # ── Mixed general + residual ────────────────────────────────── + + def test_general_plus_residual_keys(self): + self._run(self._make_model(), { + "heterogeneity": 0.2, + "residual_change_prob": 0.5, + "residual_df": 5, + }) + + def test_all_ols_keys_together(self): + m = self._make_model() + m.set_correlations("(x1,x2)=0.3") + self._run(m, { + "heterogeneity": 0.3, + "heteroskedasticity": 0.15, + "correlation_noise_sd": 0.25, + "distribution_change_prob": 0.4, + "new_distributions": ["right_skewed", "uniform"], + "residual_change_prob": 0.5, + "residual_dists": ["heavy_tailed", "skewed"], + "residual_df": 6, + }) + + # ── Boundary values ─────────────────────────────────────────── + + def test_zero_perturbation_matches_optimistic(self): + """A custom scenario with all zeros should match optimistic power.""" + m = self._make_model() + m.set_seed(42) + result = self._run(m, { + "heterogeneity": 0.0, + "heteroskedasticity": 0.0, + "correlation_noise_sd": 0.0, + "distribution_change_prob": 0.0, + "residual_change_prob": 0.0, + }) + opt_power = result["scenarios"]["optimistic"]["results"]["individual_powers"]["overall"] + custom_power = result["scenarios"]["test_scenario"]["results"]["individual_powers"]["overall"] + # Same seed, same zero config → should be close (not exact due to seed offsets) + assert abs(opt_power - custom_power) < 15 + + def test_max_perturbation_runs(self): + """Extreme perturbation values should not crash.""" + self._run(self._make_model(), { + "heterogeneity": 0.9, + "heteroskedasticity": 0.5, + "correlation_noise_sd": 0.8, + "distribution_change_prob": 1.0, + "residual_change_prob": 1.0, + "residual_df": 2, + }) diff --git a/tests/integration/test_test_formula.py b/tests/integration/test_test_formula.py new file mode 100644 index 0000000..77b2e6e --- /dev/null +++ b/tests/integration/test_test_formula.py @@ -0,0 +1,388 @@ +""" +End-to-end integration tests for the test_formula feature. + +The test_formula feature generates data using one model formula but fits a +different (reduced) model for statistical testing, enabling model +misspecification analysis (e.g. omitted variable bias). 
+""" + +import numpy as np +import pandas as pd +import pytest + +from mcpower import MCPower + +N_SIMS = 200 +SEED = 42 + + +# --------------------------------------------------------------------------- +# Helpers +# --------------------------------------------------------------------------- + +def _power_model(formula, effects, *, n_sims=N_SIMS, seed=SEED, **kwargs): + """Create a configured MCPower model ready for find_power.""" + model = MCPower(formula) + + # Apply optional configuration before effects + if "variable_types" in kwargs: + model.set_variable_type(kwargs.pop("variable_types")) + if "correlations" in kwargs: + model.set_correlations(kwargs.pop("correlations")) + if "cluster" in kwargs: + cluster_cfg = kwargs.pop("cluster") + model.set_cluster(**cluster_cfg) + if "max_failed" in kwargs: + model.set_max_failed_simulations(kwargs.pop("max_failed")) + if "upload_data" in kwargs: + model.upload_data(kwargs.pop("upload_data")) + + model.set_effects(effects) + model.set_simulations(n_sims) + model.set_seed(seed) + return model + + +def _run_power(model, sample_size, **kwargs): + """Run find_power with standard test defaults.""" + return model.find_power( + sample_size, + print_results=False, + return_results=True, + progress_callback=False, + **kwargs, + ) + + +def _individual_powers(result): + """Extract individual_powers dict from a result.""" + return result["results"]["individual_powers"] + + +# =========================================================================== +# Class 1: TestOLSSubset +# =========================================================================== + + +class TestOLSSubset: + """Test basic OLS test_formula subsetting scenarios.""" + + def test_omitted_variable_reduces_power(self): + """Omitting x3 from test formula excludes it from results.""" + model = _power_model( + "y = x1 + x2 + x3", + "x1=0.5, x2=0.3, x3=0.5", + ) + result = _run_power(model, 100, test_formula="y = x1 + x2") + + powers = _individual_powers(result) + assert "x1" in powers + assert "x2" in powers + assert "x3" not in powers + + def test_omitted_interaction(self): + """Omitting interaction from test formula excludes it from results.""" + model = _power_model( + "y = x1 + x2 + x1:x2", + "x1=0.5, x2=0.3, x1:x2=0.2", + ) + result = _run_power(model, 100, test_formula="y = x1 + x2") + + powers = _individual_powers(result) + assert "x1" in powers + assert "x2" in powers + assert "x1:x2" not in powers + + def test_single_variable_test(self): + """Testing only x1 from a 3-variable generation model.""" + model = _power_model( + "y = x1 + x2 + x3", + "x1=0.5, x2=0.3, x3=0.2", + ) + result = _run_power(model, 100, test_formula="y = x1") + + powers = _individual_powers(result) + assert "x1" in powers + assert "overall" in powers + assert "x2" not in powers + assert "x3" not in powers + + def test_same_formula_matches_no_test_formula(self): + """Using test_formula identical to generation gives same powers.""" + model_a = _power_model("y = x1 + x2", "x1=0.5, x2=0.3") + result_a = _run_power(model_a, 100, test_formula="y = x1 + x2") + + model_b = _power_model("y = x1 + x2", "x1=0.5, x2=0.3") + result_b = _run_power(model_b, 100) + + powers_a = _individual_powers(result_a) + powers_b = _individual_powers(result_b) + + for key in powers_b: + assert abs(powers_a[key] - powers_b[key]) < 0.01, ( + f"Power mismatch for {key}: {powers_a[key]} vs {powers_b[key]}" + ) + + def test_empty_test_formula_uses_generation(self): + """Empty test_formula string uses the generation formula (default).""" + model_a 
= _power_model("y = x1 + x2", "x1=0.5, x2=0.3") + result_a = _run_power(model_a, 100, test_formula="") + + model_b = _power_model("y = x1 + x2", "x1=0.5, x2=0.3") + result_b = _run_power(model_b, 100) + + powers_a = _individual_powers(result_a) + powers_b = _individual_powers(result_b) + + for key in powers_b: + assert abs(powers_a[key] - powers_b[key]) < 0.01, ( + f"Power mismatch for {key}: {powers_a[key]} vs {powers_b[key]}" + ) + + +# =========================================================================== +# Class 2: TestFactorVariables +# =========================================================================== + + +class TestFactorVariables: + """Test test_formula with factor (categorical) variables.""" + + def test_omitted_factor(self): + """Omitting a factor variable from test formula excludes its dummies.""" + model = _power_model( + "y = x1 + x2", + "x1=0.5, x2[2]=0.3, x2[3]=0.4", + variable_types="x2=(factor,3)", + ) + result = _run_power(model, 150, test_formula="y = x1") + + powers = _individual_powers(result) + assert "x1" in powers + # Factor dummies should not be in results + assert "x2[2]" not in powers + assert "x2[3]" not in powers + + def test_factor_kept_continuous_dropped(self): + """Keeping factor but dropping continuous variable.""" + model = _power_model( + "y = x1 + x2", + "x1=0.5, x2[2]=0.3, x2[3]=0.4", + variable_types="x2=(factor,3)", + ) + result = _run_power(model, 150, test_formula="y = x2") + + powers = _individual_powers(result) + # x1 excluded + assert "x1" not in powers + # Factor dummies should be present + assert "x2[2]" in powers + assert "x2[3]" in powers + + +# =========================================================================== +# Class 3: TestCorrelationStructures +# =========================================================================== + + +class TestCorrelationStructures: + """Test test_formula with correlated predictors.""" + + def test_correlated_variables_subset(self): + """Subsetting correlated variables runs without error.""" + model = _power_model( + "y = x1 + x2", + "x1=0.5, x2=0.3", + correlations="(x1,x2)=0.5", + ) + result = _run_power(model, 100, test_formula="y = x1") + + assert result is not None + powers = _individual_powers(result) + assert "x1" in powers + assert "x2" not in powers + + +# =========================================================================== +# Class 4: TestResultsStructure +# =========================================================================== + + +class TestResultsStructure: + """Test that result dict contains correct test_formula metadata.""" + + def test_results_contain_both_formulas(self): + """Result should have data_formula and test_formula fields.""" + model = _power_model( + "y = x1 + x2 + x3", + "x1=0.5, x2=0.3, x3=0.2", + ) + result = _run_power(model, 100, test_formula="y = x1 + x2") + + assert "data_formula" in result["model"] + assert "test_formula" in result["model"] + # data_formula should be the generation formula + assert "x3" in result["model"]["data_formula"] + # test_formula should be the reduced formula + assert result["model"]["test_formula"] == "y = x1 + x2" + + def test_target_tests_reflect_test_formula(self): + """target_tests in results should not contain excluded effects.""" + model = _power_model( + "y = x1 + x2 + x3", + "x1=0.5, x2=0.3, x3=0.2", + ) + result = _run_power(model, 100, test_formula="y = x1 + x2") + + target_tests = result["model"]["target_tests"] + assert "x1" in target_tests + assert "x2" in target_tests + assert "x3" not in target_tests + 
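+
+# ---------------------------------------------------------------------------
+# Minimal usage sketch of the test_formula workflow this module exercises,
+# using only the public API already shown in the tests above (MCPower,
+# set_effects, find_power). Kept as a comment, not a collected test:
+#
+#     model = MCPower("y = x1 + x2 + x3")                # full generation model
+#     model.set_effects("x1=0.5, x2=0.3, x3=0.2")
+#     model.find_power(100, test_formula="y = x1 + x2")  # fit with x3 omitted
+#
+# Power for x3 is absent from the results because the reduced model never
+# estimates it; the x1/x2 powers reflect the omitted-variable misspecification.
+# ---------------------------------------------------------------------------
+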
+ +# =========================================================================== +# Class 5: TestValidation +# =========================================================================== + + +class TestValidation: + """Test validation errors for invalid test_formula usage.""" + + def test_nonexistent_variable_raises(self): + """test_formula with unknown variable raises ValueError.""" + model = _power_model( + "y = x1 + x2", + "x1=0.5, x2=0.3", + ) + with pytest.raises(ValueError, match="not found"): + _run_power(model, 100, test_formula="y = x1 + x99") + + def test_ols_to_lme_raises(self): + """test_formula with random effects on OLS model raises ValueError. + + When the grouping variable (school) is not in the generation model, + validation fails with 'not found'. When it is present but has no + cluster config, it fails with 'random effects'. + """ + # Case 1: grouping var not in model at all -> "not found" + model = _power_model( + "y = x1 + x2", + "x1=0.5, x2=0.3", + ) + with pytest.raises(ValueError, match="not found"): + _run_power(model, 100, test_formula="y = x1 + (1|school)") + + def test_ols_with_cluster_var_but_no_cluster_config_raises(self): + """test_formula with random effects when var exists but no cluster config. + + When the generation model knows about 'school' as a variable but has + no cluster specification, the random effects check triggers. + """ + # This would require a model that has 'school' as a predictor but + # no set_cluster call. The generation model includes school as a + # fixed effect, so it's a known variable. + model = _power_model( + "y = x1 + school", + "x1=0.5, school=0.3", + ) + with pytest.raises(ValueError, match="random effects"): + _run_power(model, 100, test_formula="y = x1 + (1|school)") + + +# =========================================================================== +# Class 6: TestFindSampleSize +# =========================================================================== + + +class TestFindSampleSize: + """Test test_formula with find_sample_size.""" + + def test_subset_via_find_sample_size(self): + """find_sample_size with test_formula excludes omitted variable.""" + model = _power_model( + "y = x1 + x2 + x3", + "x1=0.5, x2=0.3, x3=0.2", + ) + result = model.find_sample_size( + target_test="x1", + from_size=30, + to_size=100, + by=10, + test_formula="y = x1 + x2", + print_results=False, + return_results=True, + progress_callback=False, + ) + + assert result is not None + powers_by_test = result["results"]["powers_by_test"] + assert "x1" in powers_by_test + assert "x3" not in powers_by_test + + +# =========================================================================== +# Class 7: TestMixedModelCross (LME) +# =========================================================================== + + +@pytest.mark.lme +class TestMixedModelCross: + """Test test_formula across mixed model boundaries.""" + + def test_lme_gen_ols_test(self): + """Generate with LME, test with OLS (drop random effects).""" + model = _power_model( + "y ~ x1 + x2 + (1|school)", + "x1=0.5, x2=0.3", + cluster={"grouping_var": "school", "ICC": 0.2, "n_clusters": 20}, + max_failed=0.10, + ) + result = _run_power(model, 1000, test_formula="y ~ x1 + x2") + + powers = _individual_powers(result) + assert "x1" in powers + assert "x2" in powers + + def test_lme_gen_lme_subset(self): + """Generate with LME full model, test with LME subset (drop x2).""" + model = _power_model( + "y ~ x1 + x2 + (1|school)", + "x1=0.5, x2=0.3", + cluster={"grouping_var": "school", "ICC": 0.2, "n_clusters": 
20}, + max_failed=0.10, + ) + result = _run_power(model, 1000, test_formula="y ~ x1 + (1|school)") + + powers = _individual_powers(result) + assert "x1" in powers + assert "x2" not in powers + + +# =========================================================================== +# Class 8: TestUploadedData +# =========================================================================== + + +class TestUploadedData: + """Test test_formula with uploaded empirical data.""" + + def test_upload_with_test_formula(self): + """Uploaded data with test_formula excludes omitted variable.""" + np.random.seed(SEED) + data = pd.DataFrame({ + "x1": np.random.normal(0, 1, 50), + "x2": np.random.normal(0, 1, 50), + "x3": np.random.normal(0, 1, 50), + }) + + model = _power_model( + "y = x1 + x2 + x3", + "x1=0.5, x2=0.3, x3=0.2", + upload_data=data, + ) + result = _run_power(model, 100, test_formula="y = x1 + x2") + + powers = _individual_powers(result) + assert "x1" in powers + assert "x2" in powers + assert "x3" not in powers diff --git a/tests/integration/test_upload_data.py b/tests/integration/test_upload_data.py index 93a7d8f..05602f0 100644 --- a/tests/integration/test_upload_data.py +++ b/tests/integration/test_upload_data.py @@ -71,7 +71,7 @@ def test_binary_auto_detection(self, cars_data): model = MCPower("mpg = vs + am") model.upload_data(_select(cars_data, ["vs", "am"])) model.set_effects("vs=0.3, am=0.4") - model.apply() + model._apply() # Check that vs and am were detected as uploaded_binary vs_pred = model._registry.get_predictor("vs") @@ -85,7 +85,7 @@ def test_factor_auto_detection(self, cars_data): model = MCPower("mpg = cyl + gear") model.upload_data(_select(cars_data, ["cyl", "gear"]), preserve_factor_level_names=False) model.set_effects("cyl[2]=0.3, cyl[3]=0.4, gear[2]=0.2, gear[3]=0.3") - model.apply() + model._apply() # Check that cyl and gear were detected as factor # After expansion, check the factor names @@ -103,7 +103,7 @@ def test_continuous_auto_detection(self, cars_data): model = MCPower("mpg = hp + wt") model.upload_data(_select(cars_data, ["hp", "wt"])) model.set_effects("hp=0.5, wt=0.3") - model.apply() + model._apply() # Check that hp and wt were detected as continuous (uploaded_data) hp_pred = model._registry.get_predictor("hp") @@ -122,14 +122,14 @@ def test_constant_column_dropped(self, cars_data): # Should raise error because 'constant' will be dropped with pytest.raises(ValueError, match="All uploaded columns were dropped"): model.upload_data(_select(data, ["constant"])) - model.apply() + model._apply() def test_mixed_types_auto_detection(self, cars_data): """Test auto-detection with mixed variable types.""" model = MCPower("mpg = vs + cyl + hp") model.upload_data(_select(cars_data, ["vs", "cyl", "hp"]), preserve_factor_level_names=False) model.set_effects("vs=0.3, cyl[2]=0.2, cyl[3]=0.4, hp=0.5") - model.apply() + model._apply() vs_pred = model._registry.get_predictor("vs") hp_pred = model._registry.get_predictor("hp") @@ -147,7 +147,7 @@ def test_override_to_continuous(self, cars_data): model = MCPower("mpg = cyl + hp") model.upload_data(_select(cars_data, ["cyl", "hp"]), data_types={"cyl": "continuous"}) model.set_effects("cyl=0.4, hp=0.5") - model.apply() + model._apply() cyl_pred = model._registry.get_predictor("cyl") # Should be uploaded_data (continuous) instead of factor @@ -178,7 +178,7 @@ def test_override_to_binary(self, cars_data): model_binary = MCPower("mpg = hp_binary + wt") model_binary.upload_data(data, data_types={"hp_binary": "binary"}) 
model_binary.set_effects("hp_binary=0.4, wt=0.3") - model_binary.apply() + model_binary._apply() hp_pred = model_binary._registry.get_predictor("hp_binary") assert hp_pred.var_type == "uploaded_binary" @@ -206,7 +206,7 @@ def test_no_correlation_from_data(self, cars_data): model = MCPower("mpg = hp + wt") model.upload_data(_select(cars_data, ["hp", "wt"]), preserve_correlation="no") model.set_effects("hp=0.5, wt=0.3") - model.apply() + model._apply() # Correlation matrix should be identity (or user-specified) corr = model.correlation_matrix @@ -219,7 +219,7 @@ def test_binary_uses_standard_generation(self, cars_data): model = MCPower("mpg = vs + am") model.upload_data(_select(cars_data, ["vs", "am"]), preserve_correlation="no") model.set_effects("vs=0.3, am=0.4") - model.apply() + model._apply() # Should detect proportions from data vs_pred = model._registry.get_predictor("vs") @@ -231,7 +231,7 @@ def test_continuous_uses_lookup_tables(self, cars_data): model = MCPower("mpg = hp + wt") model.upload_data(_select(cars_data, ["hp", "wt"]), preserve_correlation="no") model.set_effects("hp=0.5, wt=0.3") - model.apply() + model._apply() # Should have lookup tables populated assert model.upload_normal_values.shape[0] > 0 @@ -246,7 +246,7 @@ def test_strict_is_default(self, cars_data): model = MCPower("mpg = hp + wt") model.upload_data(_select(cars_data, ["hp", "wt"])) model.set_effects("hp=0.5, wt=0.3") - model.apply() + model._apply() assert model._preserve_correlation == "strict" @@ -255,7 +255,7 @@ def test_correlations_computed_from_data(self, cars_data): model = MCPower("mpg = hp + wt") model.upload_data(_select(cars_data, ["hp", "wt"]), preserve_correlation="partial") model.set_effects("hp=0.5, wt=0.3") - model.apply() + model._apply() # Correlation should match data correlation hp_arr = np.array(cars_data["hp"]) @@ -278,7 +278,7 @@ def test_user_can_override_correlations(self, cars_data): # This tests that user correlations can override data correlations # For now, the implementation always uses data correlations # TODO: Implement user override priority - model.apply() + model._apply() # Just verify it doesn't crash assert model.correlation_matrix is not None @@ -292,7 +292,7 @@ def test_strict_mode_sets_metadata(self, cars_data): model = MCPower("mpg = hp + wt") model.upload_data(_select(cars_data, ["hp", "wt"]), preserve_correlation="strict") model.set_effects("hp=0.5, wt=0.3") - model.apply() + model._apply() assert model._preserve_correlation == "strict" assert model._uploaded_raw_data is not None @@ -303,7 +303,7 @@ def test_strict_mode_warns_cross_correlations(self, cars_data, capsys): model = MCPower("mpg = hp + wt + x1") # x1 is created, hp/wt uploaded model.upload_data(_select(cars_data, ["hp", "wt"]), preserve_correlation="strict") model.set_effects("hp=0.5, wt=0.3, x1=0.4") - model.apply() + model._apply() captured = capsys.readouterr() # Should warn about cross-correlations @@ -314,7 +314,7 @@ def test_strict_mode_bootstrap_preserves_relationships(self, cars_data): model = MCPower("mpg = hp + wt") model.upload_data(_select(cars_data, ["hp", "wt"]), preserve_correlation="strict") model.set_effects("hp=0.5, wt=0.3") - model.apply() + model._apply() # Should be able to run simulation without error result = model.find_power(sample_size=50, print_results=False, return_results=True) @@ -356,7 +356,7 @@ def test_strict_mode_with_binary(self, cars_data): model = MCPower("mpg = vs + am") model.upload_data(_select(cars_data, ["vs", "am"]), preserve_correlation="strict") 
model.set_effects("vs=0.3, am=0.4") - model.apply() + model._apply() # Check metadata assert "vs" in model._uploaded_var_metadata @@ -373,7 +373,7 @@ def test_strict_mode_with_factor(self, cars_data): preserve_factor_level_names=False, ) model.set_effects("cyl[2]=0.3, cyl[3]=0.4, gear[2]=0.2, gear[3]=0.3") - model.apply() + model._apply() # Check metadata assert "cyl" in model._uploaded_var_metadata @@ -390,7 +390,7 @@ def test_warning_for_unmatched_columns(self, cars_data, capsys): model = MCPower("mpg = hp + wt") model.upload_data(_select(cars_data, ["hp", "wt", "vs"])) # vs not in model model.set_effects("hp=0.5, wt=0.3") - model.apply() + model._apply() captured = capsys.readouterr() assert "Ignoring unmatched columns" in captured.out @@ -401,7 +401,7 @@ def test_warning_for_large_sample_size(self, cars_data, capsys): model = MCPower("mpg = hp + wt") model.upload_data(_select(cars_data, ["hp", "wt"])) model.set_effects("hp=0.5, wt=0.3") - model.apply() + model._apply() # 32 samples * 3 = 96, so 100 should trigger warning model.find_power(sample_size=100, print_results=False) @@ -424,7 +424,7 @@ def test_warning_for_dropped_constant_columns(self, cars_data, capsys): # This should raise an error because constant was dropped and no effect was set for it # But the auto-detection output should show it was dropped try: - model.apply() + model._apply() except ValueError: pass # Expected to fail because constant column missing @@ -442,7 +442,7 @@ def test_full_dict_with_unmatched_columns(self, cars_data): model = MCPower("mpg = hp + wt") model.upload_data(cars_data) # Full dict, not pre-filtered model.set_effects("hp=0.5, wt=0.3") - model.apply() + model._apply() hp_pred = model._registry.get_predictor("hp") wt_pred = model._registry.get_predictor("wt") @@ -463,7 +463,7 @@ def test_full_dict_with_mixed_var_types(self, cars_data): model = MCPower("mpg = vs + cyl + hp") model.upload_data(cars_data, preserve_factor_level_names=False) # Full dict model.set_effects("vs=0.3, cyl[2]=0.2, cyl[3]=0.4, hp=0.5") - model.apply() + model._apply() vs_pred = model._registry.get_predictor("vs") hp_pred = model._registry.get_predictor("hp") @@ -493,7 +493,7 @@ def test_string_matched_column_auto_detected_as_factor(self): model = MCPower("y = x") model.upload_data(data) model.set_effects("x[b]=0.3, x[c]=0.4") - model.apply() + model._apply() assert "x" in model._registry.factor_names assert "x[b]" in model._registry.dummy_names assert "x[c]" in model._registry.dummy_names @@ -507,7 +507,7 @@ def test_no_matching_columns_ignores_data(self, cars_data, capsys): model = MCPower("mpg = x1 + x2") model.upload_data(_select(cars_data, ["hp", "wt"])) model.set_effects("x1=0.3, x2=0.4") - model.apply() + model._apply() captured = capsys.readouterr() assert "uploaded data ignored" in captured.out.lower() @@ -557,7 +557,7 @@ def test_dict_format(self, cars_data): } model.upload_data(data_dict) model.set_effects("hp=0.5, wt=0.3") - model.apply() + model._apply() assert model._applied is True @@ -566,10 +566,10 @@ def test_sample_size_warning_in_find_sample_size(self, cars_data, capsys): model = MCPower("mpg = hp + wt") model.upload_data(_select(cars_data, ["hp", "wt"])) model.set_effects("hp=0.5, wt=0.3") - model.apply() + model._apply() - # 32 * 3 = 96, so to_size=150 should trigger warning - model.find_sample_size(from_size=30, to_size=150, by=20, print_results=False) + # 32 * 3 = 96, so size=110 > 96 triggers warning + model.find_sample_size(from_size=50, to_size=110, by=30, print_results=False) captured = 
capsys.readouterr() assert "Warning" in captured.out @@ -583,14 +583,14 @@ def test_string_column_auto_detected_as_factor(self, cars_data): model = MCPower("mpg = origin + hp") model.upload_data(_select(cars_data, ["origin", "hp"])) model.set_effects("origin[Japan]=0.3, origin[USA]=0.4, hp=0.5") - model.apply() + model._apply() assert "origin" in model._registry.factor_names def test_string_column_creates_named_dummies(self, cars_data): model = MCPower("mpg = origin + hp") model.upload_data(_select(cars_data, ["origin", "hp"])) model.set_effects("origin[Japan]=0.3, origin[USA]=0.4, hp=0.5") - model.apply() + model._apply() dummy_names = model._registry.dummy_names assert "origin[Japan]" in dummy_names assert "origin[USA]" in dummy_names @@ -600,7 +600,7 @@ def test_string_column_no_mode(self, cars_data): model = MCPower("mpg = origin + hp") model.upload_data(_select(cars_data, ["origin", "hp"]), preserve_correlation="no") model.set_effects("origin[Japan]=0.3, origin[USA]=0.4, hp=0.5") - model.apply() + model._apply() assert "origin" in model._registry.factor_names def test_too_many_string_levels_raises(self): @@ -611,7 +611,7 @@ def test_too_many_string_levels_raises(self): model = MCPower("y = name + x1") with pytest.raises(ValueError, match="too many unique"): model.upload_data(_select(data, ["name", "x1"])) - model.apply() + model._apply() class TestPreserveFactorLevelNames: @@ -621,7 +621,7 @@ def test_numeric_factor_uses_original_values(self, cars_data): model = MCPower("mpg = cyl + hp") model.upload_data(_select(cars_data, ["cyl", "hp"])) model.set_effects("cyl[6]=0.3, cyl[8]=0.4, hp=0.5") - model.apply() + model._apply() dummy_names = model._registry.dummy_names assert "cyl[6]" in dummy_names assert "cyl[8]" in dummy_names @@ -631,7 +631,7 @@ def test_preserve_false_uses_integer_indices(self, cars_data): model = MCPower("mpg = cyl + hp") model.upload_data(_select(cars_data, ["cyl", "hp"]), preserve_factor_level_names=False) model.set_effects("cyl[2]=0.3, cyl[3]=0.4, hp=0.5") - model.apply() + model._apply() dummy_names = model._registry.dummy_names assert "cyl[2]" in dummy_names assert "cyl[3]" in dummy_names @@ -640,7 +640,7 @@ def test_custom_reference_via_data_types_tuple(self, cars_data): model = MCPower("mpg = cyl + hp") model.upload_data(_select(cars_data, ["cyl", "hp"]), data_types={"cyl": ("factor", 6)}) model.set_effects("cyl[4]=0.3, cyl[8]=0.4, hp=0.5") - model.apply() + model._apply() dummy_names = model._registry.dummy_names assert "cyl[4]" in dummy_names assert "cyl[8]" in dummy_names @@ -650,7 +650,7 @@ def test_invalid_reference_level_raises(self, cars_data): model = MCPower("mpg = cyl + hp") with pytest.raises(ValueError, match="not found in"): model.upload_data(_select(cars_data, ["cyl", "hp"]), data_types={"cyl": ("factor", 99)}) - model.apply() + model._apply() def test_string_custom_reference(self, cars_data): model = MCPower("mpg = origin + hp") @@ -658,7 +658,7 @@ def test_string_custom_reference(self, cars_data): _select(cars_data, ["origin", "hp"]), data_types={"origin": ("factor", "Japan")} ) model.set_effects("origin[Europe]=0.3, origin[USA]=0.4, hp=0.5") - model.apply() + model._apply() dummy_names = model._registry.dummy_names assert "origin[Europe]" in dummy_names assert "origin[USA]" in dummy_names @@ -737,7 +737,7 @@ def test_origin_as_factor(self, cars_data): model = MCPower("mpg = origin + hp") model.upload_data(_select(cars_data, ["origin", "hp"])) model.set_effects("origin[Japan]=0.3, origin[USA]=0.5, hp=0.4") - model.apply() + model._apply() 
assert "origin" in model._registry.factor_names assert "origin[Japan]" in model._registry.dummy_names @@ -762,7 +762,7 @@ def test_origin_with_cyl_mixed(self, cars_data): model = MCPower("mpg = origin + cyl") model.upload_data(_select(cars_data, ["origin", "cyl"])) model.set_effects("origin[Japan]=0.3, origin[USA]=0.5, cyl[6]=0.2, cyl[8]=0.4") - model.apply() + model._apply() assert "origin[Japan]" in model._registry.dummy_names assert "cyl[6]" in model._registry.dummy_names @@ -822,7 +822,7 @@ def test_dataframe_upload(self): model = MCPower("mpg = hp + wt") model.upload_data(df[["hp", "wt"]]) model.set_effects("hp=0.5, wt=0.3") - model.apply() + model._apply() hp_pred = model._registry.get_predictor("hp") assert hp_pred.var_type == "uploaded_data" @@ -833,7 +833,7 @@ def test_dataframe_with_string_index_column(self): model = MCPower("mpg = hp + wt") model.upload_data(df) model.set_effects("hp=0.5, wt=0.3") - model.apply() + model._apply() hp_pred = model._registry.get_predictor("hp") assert hp_pred.var_type == "uploaded_data" diff --git a/tests/mixed_models/test_cluster_validators.py b/tests/mixed_models/test_cluster_validators.py index 0cca25e..27540d2 100644 --- a/tests/mixed_models/test_cluster_validators.py +++ b/tests/mixed_models/test_cluster_validators.py @@ -102,7 +102,7 @@ def test_sufficient_observations_per_cluster(self): model.set_cluster("cluster", ICC=0.2, n_clusters=5) model.set_effects("x=0.5") model.set_simulations(10) - model.apply() + model._apply() # 50 / 5 = 10 (above warning band) result = model.find_power(sample_size=50, return_results=True) @@ -118,7 +118,7 @@ def test_insufficient_observations_per_cluster_rejected(self): model.set_cluster("cluster", ICC=0.2, n_clusters=5) model.set_effects("x=0.5") model.set_simulations(10) - model.apply() + model._apply() # 20 / 5 = 4 (below minimum) with pytest.raises(ValueError, match="Insufficient observations per cluster"): @@ -134,7 +134,7 @@ def test_validation_message_suggestions(self): model.set_cluster("cluster", ICC=0.2, n_clusters=10) model.set_effects("x=0.5") model.set_simulations(10) - model.apply() + model._apply() with pytest.raises(ValueError) as exc_info: model.find_power(sample_size=30) # 30/10 = 3 < 5 @@ -155,7 +155,7 @@ def test_valid_config_runs_successfully(self): model.set_cluster("cluster", ICC=0.2, n_clusters=5) model.set_effects("x=0.5") model.set_simulations(10) - model.apply() + model._apply() result = model.find_power(sample_size=50, return_results=True) # 10 per cluster @@ -170,7 +170,7 @@ def test_edge_case_exactly_5_per_cluster(self): model.set_effects("x=0.5") model.set_simulations(10) model.set_max_failed_simulations(0.30) # Allow more failures at edge - model.apply() + model._apply() result = model.find_power(sample_size=20, return_results=True) # 20/4 = 5 @@ -182,7 +182,7 @@ def test_icc_zero_no_convergence_issues(self): model.set_cluster("cluster", ICC=0.0, n_clusters=5) model.set_effects("x=0.5") model.set_simulations(20) - model.apply() + model._apply() result = model.find_power(sample_size=250, return_results=True) diff --git a/tests/mixed_models/test_integration_phase2.py b/tests/mixed_models/test_integration_phase2.py index 42d8621..0d04919 100644 --- a/tests/mixed_models/test_integration_phase2.py +++ b/tests/mixed_models/test_integration_phase2.py @@ -23,7 +23,7 @@ def test_slope_model_setup(self): slope_intercept_corr=0.3, ) model.set_effects("x1=0.5") - model.apply() + model._apply() # Verify cluster spec was configured correctly spec = model._registry._cluster_specs["school"] @@ 
-106,7 +106,7 @@ def test_nested_model_setup(self): model.set_cluster("school", ICC=0.15, n_clusters=10) model.set_cluster("classroom", ICC=0.10, n_per_parent=3) model.set_effects("treatment=0.5") - model.apply() + model._apply() assert "school" in model._registry._cluster_specs assert "school:classroom" in model._registry._cluster_specs diff --git a/tests/mixed_models/test_mixed_models.py b/tests/mixed_models/test_mixed_models.py index de4b422..e0a867c 100644 --- a/tests/mixed_models/test_mixed_models.py +++ b/tests/mixed_models/test_mixed_models.py @@ -405,7 +405,6 @@ def test_unknown_backend_raises(self): np.zeros(10), np.array([0]), np.zeros(10, dtype=int), - [], 0, 0.05, backend="nonexistent", diff --git a/tests/mixed_models/test_mixed_models_validation.py b/tests/mixed_models/test_mixed_models_validation.py index fbe2ecf..5502bfe 100644 --- a/tests/mixed_models/test_mixed_models_validation.py +++ b/tests/mixed_models/test_mixed_models_validation.py @@ -90,7 +90,7 @@ def test_icc_recovery_medium(self): from mcpower.stats.data_generation import _generate_cluster_effects - sample_size = 500 + sample_size = 1000 n_clusters = 20 icc_target = ICC_MODERATE_HIGH @@ -270,7 +270,7 @@ def test_diagnostics_available(self): y=y, target_indices=np.array([0]), cluster_ids=cluster_ids, - cluster_column_indices=[], + correction_method=0, alpha=0.05, backend="statsmodels", diff --git a/tests/mixed_models/test_scenarios_lme.py b/tests/mixed_models/test_scenarios_lme.py index f3d5ad6..41597d3 100644 --- a/tests/mixed_models/test_scenarios_lme.py +++ b/tests/mixed_models/test_scenarios_lme.py @@ -14,7 +14,6 @@ from mcpower.core.scenarios import ( DEFAULT_SCENARIO_CONFIG, apply_lme_perturbations, - apply_lme_residual_perturbations, ) from mcpower.stats.data_generation import ( _generate_cluster_effects, @@ -35,7 +34,7 @@ class TestDefaultConfig: "random_effect_dist", "random_effect_df", "icc_noise_sd", - "residual_dist", + "residual_dists", "residual_change_prob", "residual_df", ] @@ -49,22 +48,37 @@ def test_doomer_has_lme_keys(self): assert key in DEFAULT_SCENARIO_CONFIG["doomer"], f"Missing key: {key}" def test_realistic_values(self): + """Realistic scenario has non-zero LME perturbation values.""" cfg = DEFAULT_SCENARIO_CONFIG["realistic"] assert cfg["random_effect_dist"] == "heavy_tailed" - assert cfg["random_effect_df"] == 5 - assert cfg["icc_noise_sd"] == 0.15 - assert cfg["residual_dist"] == "heavy_tailed" - assert cfg["residual_change_prob"] == 0.3 - assert cfg["residual_df"] == 10 + assert cfg["random_effect_df"] > 0 + assert cfg["icc_noise_sd"] > 0 + assert cfg["residual_dists"] == ["heavy_tailed", "skewed"] + assert cfg["residual_change_prob"] > 0 + assert cfg["residual_df"] > 2 def test_doomer_values(self): - cfg = DEFAULT_SCENARIO_CONFIG["doomer"] - assert cfg["random_effect_dist"] == "heavy_tailed" - assert cfg["random_effect_df"] == 3 - assert cfg["icc_noise_sd"] == 0.30 - assert cfg["residual_dist"] == "heavy_tailed" - assert cfg["residual_change_prob"] == 0.8 - assert cfg["residual_df"] == 5 + """Doomer scenario has more severe perturbation than realistic.""" + real = DEFAULT_SCENARIO_CONFIG["realistic"] + doom = DEFAULT_SCENARIO_CONFIG["doomer"] + assert doom["random_effect_dist"] == "heavy_tailed" + assert doom["random_effect_df"] <= real["random_effect_df"] + assert doom["icc_noise_sd"] >= real["icc_noise_sd"] + assert doom["residual_dists"] == ["heavy_tailed", "skewed"] + assert doom["residual_change_prob"] >= real["residual_change_prob"] + assert doom["residual_df"] <= 
real["residual_df"] + + def test_optimistic_has_lme_keys(self): + for key in self.LME_KEYS: + assert key in DEFAULT_SCENARIO_CONFIG["optimistic"], f"Missing key: {key}" + + def test_optimistic_values_are_zero(self): + cfg = DEFAULT_SCENARIO_CONFIG["optimistic"] + assert cfg["heterogeneity"] == 0.0 + assert cfg["heteroskedasticity"] == 0.0 + assert cfg["residual_change_prob"] == 0.0 + assert cfg["icc_noise_sd"] == 0.0 + assert cfg["random_effect_dist"] == "normal" # --------------------------------------------------------------------------- @@ -309,73 +323,3 @@ def test_slopes_without_perturbations(self): assert result.intercept_columns.shape == (1000, 1) -# --------------------------------------------------------------------------- -# apply_lme_residual_perturbations -# --------------------------------------------------------------------------- -class TestApplyLmeResidualPerturbations: - """Test apply_lme_residual_perturbations() function.""" - - def _make_y(self, seed=42): - """Generate a deterministic y vector with known errors.""" - rng = np.random.RandomState(seed + 2) - return rng.standard_normal(500) - - def test_normal_dist_returns_unchanged(self): - y = self._make_y() - config = {"residual_dist": "normal", "residual_change_prob": 1.0, "residual_df": 5} - result = apply_lme_residual_perturbations(y.copy(), config, 42) - np.testing.assert_array_equal(result, y) - - def test_zero_prob_returns_unchanged(self): - y = self._make_y() - config = {"residual_dist": "heavy_tailed", "residual_change_prob": 0.0, "residual_df": 5} - result = apply_lme_residual_perturbations(y.copy(), config, 42) - np.testing.assert_array_equal(result, y) - - def test_prob_1_always_applies(self): - y = self._make_y() - config = {"residual_dist": "heavy_tailed", "residual_change_prob": 1.0, "residual_df": 5} - result = apply_lme_residual_perturbations(y.copy(), config, 42) - # Should be different from original - assert not np.array_equal(result, y) - - def test_heavy_tailed_residuals_have_excess_kurtosis(self): - """When residuals are replaced with t(5), the diff should have heavy tails.""" - y_orig = self._make_y() - config = {"residual_dist": "heavy_tailed", "residual_change_prob": 1.0, "residual_df": 5} - y_perturbed = apply_lme_residual_perturbations(y_orig.copy(), config, 42) - diff = y_perturbed - y_orig - # The diff = new_errors - original_errors. Both have finite variance, - # but the new_errors are t(5) which has excess kurtosis. - # For large enough N, the kurtosis of the difference should be positive. 
- sp_stats.kurtosis(diff + y_orig, fisher=True) - # Just check it ran without error and output differs - assert not np.array_equal(y_perturbed, y_orig) - - def test_skewed_residuals_applied(self): - y_orig = self._make_y() - config = {"residual_dist": "skewed", "residual_change_prob": 1.0, "residual_df": 5} - y_perturbed = apply_lme_residual_perturbations(y_orig.copy(), config, 42) - assert not np.array_equal(y_perturbed, y_orig) - - def test_coin_flip_seed_reproducible(self): - y = self._make_y() - config = {"residual_dist": "heavy_tailed", "residual_change_prob": 0.5, "residual_df": 5} - r1 = apply_lme_residual_perturbations(y.copy(), config, 42) - r2 = apply_lme_residual_perturbations(y.copy(), config, 42) - np.testing.assert_array_equal(r1, r2) - - def test_coin_flip_prob_respected(self): - """With prob=0.3, roughly 30% of simulations should be perturbed.""" - config = {"residual_dist": "heavy_tailed", "residual_change_prob": 0.3, "residual_df": 5} - n_perturbed = 0 - n_trials = 200 - y_template = np.ones(100) - for i in range(n_trials): - y = y_template.copy() - result = apply_lme_residual_perturbations(y, config, i * 100) - if not np.array_equal(result, y_template): - n_perturbed += 1 - # Should be roughly 30% ± some tolerance - pct = n_perturbed / n_trials - assert 0.10 < pct < 0.55, f"Expected ~30% perturbed, got {pct:.1%}" diff --git a/tests/specs/test_alpha_levels.py b/tests/specs/test_alpha_levels.py index 529e9db..c1f0908 100644 --- a/tests/specs/test_alpha_levels.py +++ b/tests/specs/test_alpha_levels.py @@ -1,9 +1,8 @@ """ -Non-default alpha level tests — backend-agnostic. +Non-default alpha level tests. Validates that the full alpha pipeline (power accuracy, corrections, null calibration) works correctly at alpha != 0.05. -Tests run on ALL available backends via the backend fixture. 
""" import contextlib @@ -12,7 +11,7 @@ import numpy as np import pytest -from tests.config import N_SIMS, SEED +from tests.config import N_SIMS, N_SIMS_ORDERING, N_SIMS_STANDARD, SEED from tests.helpers.analytical import analytical_f_power, analytical_t_power from tests.helpers.mc_margins import mc_accuracy_margin, mc_margin from tests.helpers.power_helpers import get_power, get_power_corrected, make_null_model @@ -44,7 +43,7 @@ class TestAlphaAccuracyVsAnalytical: (0.5, 100), ], ) - def test_single_predictor_t_test_alpha(self, backend, alpha, beta, n): + def test_single_predictor_t_test_alpha(self, alpha, beta, n): """t-test power matches analytical non-central t at non-default alpha.""" from mcpower import MCPower @@ -63,7 +62,7 @@ def test_single_predictor_t_test_alpha(self, backend, alpha, beta, n): exact_power = analytical_t_power(beta, n, p=1, sigma_eps=1.0, vif_j=1.0, alpha=alpha) margin = mc_accuracy_margin(exact_power, N_SIMS) assert abs(mc_power - exact_power) < margin, ( - f"[{backend}] alpha={alpha}, β={beta}, n={n}: MC={mc_power:.2f}%, analytical={exact_power:.2f}% ± {margin:.2f}%" + f"alpha={alpha}, β={beta}, n={n}: MC={mc_power:.2f}%, analytical={exact_power:.2f}% ± {margin:.2f}%" ) @pytest.mark.parametrize("alpha", [0.01, 0.10]) @@ -74,7 +73,7 @@ def test_single_predictor_t_test_alpha(self, backend, alpha, beta, n): (0.5, 0.3, 80), ], ) - def test_two_predictors_uncorrelated_alpha(self, backend, alpha, b1, b2, n): + def test_two_predictors_uncorrelated_alpha(self, alpha, b1, b2, n): """Each t-test and F-test with Σ = I at non-default alpha.""" from mcpower import MCPower @@ -103,14 +102,14 @@ def test_two_predictors_uncorrelated_alpha(self, backend, alpha, b1, b2, n): ) margin = mc_accuracy_margin(exact, N_SIMS) assert abs(mc_power - exact) < margin, ( - f"[{backend}] alpha={alpha}, {var}: MC={mc_power:.2f}%, analytical={exact:.2f}% ± {margin:.2f}%" + f"alpha={alpha}, {var}: MC={mc_power:.2f}%, analytical={exact:.2f}% ± {margin:.2f}%" ) mc_f = get_power(result, "overall") exact_f = analytical_f_power([b1, b2], n, Sigma, sigma_eps=1.0, alpha=alpha) margin_f = mc_accuracy_margin(exact_f, N_SIMS) assert abs(mc_f - exact_f) < margin_f, ( - f"[{backend}] alpha={alpha}, F-test: MC={mc_f:.2f}%, analytical={exact_f:.2f}% ± {margin_f:.2f}%" + f"alpha={alpha}, F-test: MC={mc_f:.2f}%, analytical={exact_f:.2f}% ± {margin_f:.2f}%" ) @pytest.mark.parametrize("alpha", [0.01, 0.10]) @@ -121,7 +120,7 @@ def test_two_predictors_uncorrelated_alpha(self, backend, alpha, b1, b2, n): (0.5, 0.3, 0.5, 80), ], ) - def test_two_predictors_correlated_alpha(self, backend, alpha, b1, b2, rho, n): + def test_two_predictors_correlated_alpha(self, alpha, b1, b2, rho, n): """VIF-corrected t-tests with correlated predictors at non-default alpha.""" from mcpower import MCPower @@ -154,7 +153,7 @@ def test_two_predictors_correlated_alpha(self, backend, alpha, b1, b2, rho, n): ) margin = mc_accuracy_margin(exact, N_SIMS) assert abs(mc_power - exact) < margin, ( - f"[{backend}] alpha={alpha}, rho={rho}, {var}: MC={mc_power:.2f}%, analytical={exact:.2f}% ± {margin:.2f}%" + f"alpha={alpha}, rho={rho}, {var}: MC={mc_power:.2f}%, analytical={exact:.2f}% ± {margin:.2f}%" ) @@ -170,9 +169,9 @@ class TestAlphaCorrectionAccuracy: @pytest.mark.parametrize("alpha", [0.01, 0.10]) @pytest.mark.parametrize("correction", ["bonferroni", "holm", "fdr"]) - def test_corrected_leq_uncorrected_at_alpha(self, backend, alpha, correction): + def test_corrected_leq_uncorrected_at_alpha(self, alpha, correction): """Corrected power <= 
uncorrected power when all effects = 0.""" - m = make_null_model("y = x1 + x2 + x3", n_sims=N_SIMS, alpha=alpha, seed=SEED) + m = make_null_model("y = x1 + x2 + x3", n_sims=N_SIMS_ORDERING, alpha=alpha, seed=SEED) result = m.find_power( sample_size=100, target_test="x1, x2, x3", @@ -184,14 +183,14 @@ def test_corrected_leq_uncorrected_at_alpha(self, backend, alpha, correction): uncorr = get_power(result, var) corr = get_power_corrected(result, var) assert corr <= uncorr + 0.5, ( - f"[{backend}] alpha={alpha}, {correction}: corrected {corr:.2f}% > uncorrected {uncorr:.2f}% for {var}" + f"alpha={alpha}, {correction}: corrected {corr:.2f}% > uncorrected {uncorr:.2f}% for {var}" ) @pytest.mark.parametrize("alpha", [0.01, 0.10]) @pytest.mark.parametrize("correction", ["bonferroni", "holm"]) - def test_fwer_controlled_at_alpha(self, backend, alpha, correction): + def test_fwer_controlled_at_alpha(self, alpha, correction): """FWER-controlling methods keep per-test rejection below nominal alpha.""" - m = make_null_model("y = x1 + x2 + x3", n_sims=N_SIMS, alpha=alpha, seed=SEED) + m = make_null_model("y = x1 + x2 + x3", n_sims=N_SIMS_ORDERING, alpha=alpha, seed=SEED) result = m.find_power( sample_size=100, target_test="x1, x2, x3", @@ -201,17 +200,17 @@ def test_fwer_controlled_at_alpha(self, backend, alpha, correction): ) for var in ["x1", "x2", "x3"]: corr = get_power_corrected(result, var) - assert corr < alpha * 100 + mc_margin(alpha, N_SIMS), ( - f"[{backend}] alpha={alpha}, {correction} FWER violation for {var}: corrected power = {corr:.2f}%" + assert corr < alpha * 100 + mc_margin(alpha, N_SIMS_ORDERING), ( + f"alpha={alpha}, {correction} FWER violation for {var}: corrected power = {corr:.2f}%" ) @pytest.mark.parametrize("alpha", [0.01, 0.10]) - def test_bonferroni_more_conservative_than_fdr_at_alpha(self, backend, alpha): + def test_bonferroni_more_conservative_than_fdr_at_alpha(self, alpha): """Bonferroni should reject <= FDR (BH) under non-null at non-default alpha.""" from mcpower import MCPower m = MCPower("y = x1 + x2 + x3") - m.set_simulations(N_SIMS) + m.set_simulations(N_SIMS_ORDERING) m.set_seed(SEED) m.set_alpha(alpha) m.set_effects("x1=0.3, x2=0.2, x3=0.1") @@ -233,7 +232,7 @@ def test_bonferroni_more_conservative_than_fdr_at_alpha(self, backend, alpha): for var in ["x1", "x2", "x3"]: bonf = get_power_corrected(result_bonf, var) fdr = get_power_corrected(result_fdr, var) - assert bonf <= fdr + 2.0, f"[{backend}] alpha={alpha}: Bonferroni ({bonf:.2f}%) > FDR ({fdr:.2f}%) for {var}" + assert bonf <= fdr + 2.0, f"alpha={alpha}: Bonferroni ({bonf:.2f}%) > FDR ({fdr:.2f}%) for {var}" # ── Class 3: Null calibration at alpha != 0.05 (multi-predictor) ──── @@ -245,29 +244,29 @@ class TestAlphaCalibrationExtended: to multi-predictor models and corrected rejection under the null. 
""" - @pytest.mark.parametrize("alpha", [0.01, 0.05, 0.10]) - def test_null_rejection_multi_predictor(self, backend, alpha): + @pytest.mark.parametrize("alpha", [0.01, 0.10]) + def test_null_rejection_multi_predictor(self, alpha): """Two-predictor null: each t-test and overall F-test reject at ~alpha.""" - m = make_null_model("y = x1 + x2", n_sims=N_SIMS, alpha=alpha, seed=SEED) + m = make_null_model("y = x1 + x2", n_sims=N_SIMS_STANDARD, alpha=alpha, seed=SEED) result = m.find_power( sample_size=100, target_test="all", print_results=False, return_results=True, ) - margin = mc_margin(alpha, N_SIMS) + margin = mc_margin(alpha, N_SIMS_STANDARD) expected = alpha * 100 for test_name in ["x1", "x2", "overall"]: power = get_power(result, test_name) assert abs(power - expected) < margin, ( - f"[{backend}] alpha={alpha}, {test_name}: observed {power:.2f}%, expected {expected}% ± {margin:.2f}%" + f"alpha={alpha}, {test_name}: observed {power:.2f}%, expected {expected}% ± {margin:.2f}%" ) - @pytest.mark.parametrize("alpha", [0.01, 0.05, 0.10]) + @pytest.mark.parametrize("alpha", [0.01, 0.10]) @pytest.mark.parametrize("correction", ["bonferroni", "holm"]) - def test_null_rejection_corrected_at_alpha(self, backend, alpha, correction): + def test_null_rejection_corrected_at_alpha(self, alpha, correction): """Corrected null rejection stays below alpha + MC margin for 3 predictors.""" - m = make_null_model("y = x1 + x2 + x3", n_sims=N_SIMS, alpha=alpha, seed=SEED) + m = make_null_model("y = x1 + x2 + x3", n_sims=N_SIMS_STANDARD, alpha=alpha, seed=SEED) result = m.find_power( sample_size=100, target_test="x1, x2, x3", @@ -275,9 +274,9 @@ def test_null_rejection_corrected_at_alpha(self, backend, alpha, correction): print_results=False, return_results=True, ) - margin = mc_margin(alpha, N_SIMS) + margin = mc_margin(alpha, N_SIMS_STANDARD) for var in ["x1", "x2", "x3"]: corr = get_power_corrected(result, var) assert corr < alpha * 100 + margin, ( - f"[{backend}] alpha={alpha}, {correction}, {var}: corrected rejection {corr:.2f}% exceeds {alpha * 100}% + {margin:.2f}%" + f"alpha={alpha}, {correction}, {var}: corrected rejection {corr:.2f}% exceeds {alpha * 100}% + {margin:.2f}%" ) diff --git a/tests/specs/test_corrections.py b/tests/specs/test_corrections.py index b26b8e1..025aee7 100644 --- a/tests/specs/test_corrections.py +++ b/tests/specs/test_corrections.py @@ -1,7 +1,5 @@ """ -Multiple comparison correction tests — backend-agnostic. - -Tests run on ALL available backends via the backend fixture. +Multiple comparison correction tests. 
""" import contextlib @@ -9,7 +7,7 @@ import pytest -from tests.config import N_SIMS, SEED +from tests.config import N_SIMS_ORDERING as N_SIMS, SEED from tests.helpers.mc_margins import mc_margin from tests.helpers.power_helpers import get_power, get_power_corrected, make_null_model @@ -28,7 +26,7 @@ class TestCorrectionConservativeness: """ @pytest.mark.parametrize("correction", ["bonferroni", "holm", "fdr"]) - def test_corrected_leq_uncorrected_under_null(self, backend, correction): + def test_corrected_leq_uncorrected_under_null(self, correction): """Corrected power ≤ uncorrected power when all effects = 0.""" m = make_null_model("y = x1 + x2 + x3", n_sims=N_SIMS, seed=SEED) result = m.find_power( @@ -42,11 +40,11 @@ def test_corrected_leq_uncorrected_under_null(self, backend, correction): uncorr = get_power(result, var) corr = get_power_corrected(result, var) assert corr <= uncorr + 0.5, ( # tiny tolerance for MC noise - f"[{backend}] {correction}: corrected {corr:.2f}% > uncorrected {uncorr:.2f}% for {var}" + f"{correction}: corrected {corr:.2f}% > uncorrected {uncorr:.2f}% for {var}" ) @pytest.mark.parametrize("correction", ["bonferroni", "holm"]) - def test_fwer_controlled_under_null(self, backend, correction): + def test_fwer_controlled_under_null(self, correction): """ Family-wise error rate under H0 should be ≤ alpha. @@ -65,10 +63,10 @@ def test_fwer_controlled_under_null(self, backend, correction): # Under complete null, FWER-controlling methods should have # per-test rejection well below the nominal alpha assert corr < m.alpha * 100 + mc_margin(m.alpha, m.n_simulations), ( - f"[{backend}] {correction} FWER violation for {var}: corrected power = {corr:.2f}%" + f"{correction} FWER violation for {var}: corrected power = {corr:.2f}%" ) - def test_bonferroni_more_conservative_than_fdr(self, backend): + def test_bonferroni_more_conservative_than_fdr(self): """Bonferroni should reject ≤ FDR (BH) under non-null.""" from mcpower import MCPower @@ -95,4 +93,4 @@ def test_bonferroni_more_conservative_than_fdr(self, backend): bonf = get_power_corrected(result_bonf, var) fdr = get_power_corrected(result_fdr, var) # Bonferroni ≤ BH-FDR (with MC tolerance) - assert bonf <= fdr + 2.0, f"[{backend}] Bonferroni ({bonf:.2f}%) > FDR ({fdr:.2f}%) for {var}" + assert bonf <= fdr + 2.0, f"Bonferroni ({bonf:.2f}%) > FDR ({fdr:.2f}%) for {var}" diff --git a/tests/specs/test_monotonicity.py b/tests/specs/test_monotonicity.py index 4261b99..bac2fe9 100644 --- a/tests/specs/test_monotonicity.py +++ b/tests/specs/test_monotonicity.py @@ -1,8 +1,7 @@ """ -Power monotonicity tests — backend-agnostic. +Power monotonicity tests. Power must increase with effect size, sample size, and alpha. -Tests run on ALL available backends via the backend fixture. 
""" import contextlib @@ -10,7 +9,7 @@ import pytest -from tests.config import N_SIMS, SEED +from tests.config import N_SIMS_ORDERING as N_SIMS, SEED from tests.helpers.power_helpers import get_power @@ -24,7 +23,7 @@ def _quiet(): class TestPowerMonotonicity: """Power must increase with effect size, sample size, and alpha.""" - def test_power_increases_with_effect_size(self, backend): + def test_power_increases_with_effect_size(self): """Larger standardised beta → higher power.""" from mcpower import MCPower @@ -43,9 +42,9 @@ def test_power_increases_with_effect_size(self, backend): powers.append(get_power(result, "x1")) for i in range(len(powers) - 1): - assert powers[i] < powers[i + 1], f"[{backend}] Power not monotonic in effect size: {powers}" + assert powers[i] < powers[i + 1], f"Power not monotonic in effect size: {powers}" - def test_power_increases_with_sample_size(self, backend): + def test_power_increases_with_sample_size(self): """Larger N → higher power (for non-zero effect).""" from mcpower import MCPower @@ -64,9 +63,9 @@ def test_power_increases_with_sample_size(self, backend): powers.append(get_power(result, "x1")) for i in range(len(powers) - 1): - assert powers[i] < powers[i + 1], f"[{backend}] Power not monotonic in N: {powers}" + assert powers[i] < powers[i + 1], f"Power not monotonic in N: {powers}" - def test_power_increases_with_alpha(self, backend): + def test_power_increases_with_alpha(self): """Less stringent alpha → higher power.""" from mcpower import MCPower @@ -86,13 +85,13 @@ def test_power_increases_with_alpha(self, backend): powers.append(get_power(result, "x1")) for i in range(len(powers) - 1): - assert powers[i] < powers[i + 1], f"[{backend}] Power not monotonic in alpha: {powers}" + assert powers[i] < powers[i + 1], f"Power not monotonic in alpha: {powers}" class TestPowerConvergence: """Power must approach 100% when signal is overwhelming.""" - def test_large_effect_high_power(self, backend): + def test_large_effect_high_power(self): """Very large effect → power near 100%.""" from mcpower import MCPower @@ -107,9 +106,9 @@ def test_large_effect_high_power(self, backend): return_results=True, ) power = get_power(result, "x1") - assert power > 99.0, f"[{backend}] Large-effect power should be ~100%, got {power:.2f}%" + assert power > 99.0, f"Large-effect power should be ~100%, got {power:.2f}%" - def test_large_n_moderate_effect(self, backend): + def test_large_n_moderate_effect(self): """Large N with moderate effect → power near 100%.""" from mcpower import MCPower @@ -124,4 +123,4 @@ def test_large_n_moderate_effect(self, backend): return_results=True, ) power = get_power(result, "x1") - assert power > 99.0, f"[{backend}] Large-N power should be ~100%, got {power:.2f}%" + assert power > 99.0, f"Large-N power should be ~100%, got {power:.2f}%" diff --git a/tests/specs/test_power_accuracy.py b/tests/specs/test_power_accuracy.py index 755d0bb..fff7235 100644 --- a/tests/specs/test_power_accuracy.py +++ b/tests/specs/test_power_accuracy.py @@ -1,9 +1,8 @@ """ -Power accuracy tests — backend-agnostic. +Power accuracy tests. Compare MC power estimates against exact analytical power from non-central t / F distributions. -Tests run on ALL available backends via the backend fixture. 
""" import contextlib @@ -42,7 +41,7 @@ class TestAccuracyVsAnalytical: (0.5, 150), ], ) - def test_single_predictor_t_test(self, backend, beta, n): + def test_single_predictor_t_test(self, beta, n): """t-test power matches analytical non-central t.""" from mcpower import MCPower @@ -60,7 +59,7 @@ def test_single_predictor_t_test(self, backend, beta, n): exact_power = analytical_t_power(beta, n, p=1, sigma_eps=1.0, vif_j=1.0) margin = mc_accuracy_margin(exact_power, N_SIMS) assert abs(mc_power - exact_power) < margin, ( - f"[{backend}] β={beta}, n={n}: MC={mc_power:.2f}%, analytical={exact_power:.2f}% ± {margin:.2f}%" + f"β={beta}, n={n}: MC={mc_power:.2f}%, analytical={exact_power:.2f}% ± {margin:.2f}%" ) @pytest.mark.parametrize( @@ -71,7 +70,7 @@ def test_single_predictor_t_test(self, backend, beta, n): (0.2, 0.2, 200), ], ) - def test_two_predictors_uncorrelated(self, backend, b1, b2, n): + def test_two_predictors_uncorrelated(self, b1, b2, n): """Each t-test and F-test with Σ = I.""" from mcpower import MCPower @@ -91,12 +90,12 @@ def test_two_predictors_uncorrelated(self, backend, b1, b2, n): mc_power = get_power(result, var) exact = analytical_t_power(beta, n, p=2, sigma_eps=1.0, vif_j=1.0) margin = mc_accuracy_margin(exact, N_SIMS) - assert abs(mc_power - exact) < margin, f"[{backend}] {var}: MC={mc_power:.2f}%, analytical={exact:.2f}% ± {margin:.2f}%" + assert abs(mc_power - exact) < margin, f"{var}: MC={mc_power:.2f}%, analytical={exact:.2f}% ± {margin:.2f}%" mc_f = get_power(result, "overall") exact_f = analytical_f_power([b1, b2], n, Sigma, sigma_eps=1.0) margin_f = mc_accuracy_margin(exact_f, N_SIMS) - assert abs(mc_f - exact_f) < margin_f, f"[{backend}] F-test: MC={mc_f:.2f}%, analytical={exact_f:.2f}% ± {margin_f:.2f}%" + assert abs(mc_f - exact_f) < margin_f, f"F-test: MC={mc_f:.2f}%, analytical={exact_f:.2f}% ± {margin_f:.2f}%" @pytest.mark.parametrize( "b1,b2,rho,n", @@ -107,7 +106,7 @@ def test_two_predictors_uncorrelated(self, backend, b1, b2, n): (0.5, 0.3, 0.5, 80), ], ) - def test_two_predictors_correlated_t_tests(self, backend, b1, b2, rho, n): + def test_two_predictors_correlated_t_tests(self, b1, b2, rho, n): """Individual t-tests with correlated predictors: VIF matters.""" from mcpower import MCPower @@ -132,5 +131,5 @@ def test_two_predictors_correlated_t_tests(self, backend, b1, b2, rho, n): exact = analytical_t_power(beta, n, p=2, sigma_eps=1.0, vif_j=vif) margin = mc_accuracy_margin(exact, N_SIMS) assert abs(mc_power - exact) < margin, ( - f"[{backend}] rho={rho}, {var}: MC={mc_power:.2f}%, analytical={exact:.2f}% ± {margin:.2f}%" + f"rho={rho}, {var}: MC={mc_power:.2f}%, analytical={exact:.2f}% ± {margin:.2f}%" ) diff --git a/tests/specs/test_type1_error.py b/tests/specs/test_type1_error.py index c56b4fe..565d80e 100644 --- a/tests/specs/test_type1_error.py +++ b/tests/specs/test_type1_error.py @@ -1,8 +1,7 @@ """ -Type I error control tests — backend-agnostic. +Type I error control tests. Under H0 (effect = 0), rejection rate must equal alpha. -Tests run on ALL available backends via the backend fixture. 
""" import contextlib @@ -10,7 +9,7 @@ import pytest -from tests.config import N_SIMS, SEED +from tests.config import N_SIMS_STANDARD as N_SIMS, SEED from tests.helpers.mc_margins import mc_margin from tests.helpers.power_helpers import get_power, make_null_model @@ -25,7 +24,7 @@ def _quiet(): class TestTypeIErrorControl: """Under H0 (effect = 0), rejection rate must equal alpha.""" - def test_single_predictor_null_overall(self, backend): + def test_single_predictor_null_overall(self): """F-test rejection rate ≈ alpha with one predictor at zero effect.""" m = make_null_model("y = x1", n_sims=N_SIMS, seed=SEED) result = m.find_power( @@ -37,9 +36,9 @@ def test_single_predictor_null_overall(self, backend): power = get_power(result, "overall") margin = mc_margin(m.alpha, m.n_simulations) expected = m.alpha * 100 - assert abs(power - expected) < margin, f"[{backend}] F-test power under H0: {power:.2f}%, expected {expected}% ± {margin:.2f}%" + assert abs(power - expected) < margin, f"F-test power under H0: {power:.2f}%, expected {expected}% ± {margin:.2f}%" - def test_single_predictor_null_individual(self, backend): + def test_single_predictor_null_individual(self): """t-test rejection rate ≈ alpha for a single zero-effect predictor.""" m = make_null_model("y = x1", n_sims=N_SIMS, seed=SEED) result = m.find_power( @@ -51,9 +50,9 @@ def test_single_predictor_null_individual(self, backend): power = get_power(result, "x1") margin = mc_margin(m.alpha, m.n_simulations) expected = m.alpha * 100 - assert abs(power - expected) < margin, f"[{backend}] t-test power under H0: {power:.2f}%, expected {expected}% ± {margin:.2f}%" + assert abs(power - expected) < margin, f"t-test power under H0: {power:.2f}%, expected {expected}% ± {margin:.2f}%" - def test_two_predictors_null_each(self, backend): + def test_two_predictors_null_each(self): """Both predictors at zero → each t-test rejects at ~alpha.""" m = make_null_model("y = x1 + x2", n_sims=N_SIMS, seed=SEED) result = m.find_power( @@ -66,9 +65,9 @@ def test_two_predictors_null_each(self, backend): expected = m.alpha * 100 for var in ["x1", "x2"]: power = get_power(result, var) - assert abs(power - expected) < margin, f"[{backend}] {var} power under H0: {power:.2f}%, expected {expected}% ± {margin:.2f}%" + assert abs(power - expected) < margin, f"{var} power under H0: {power:.2f}%, expected {expected}% ± {margin:.2f}%" - def test_large_sample_null(self, backend): + def test_large_sample_null(self): """ Large N with zero effect must NOT inflate Type I error. 
@@ -76,7 +75,7 @@ def test_large_sample_null(self, backend): """ m = make_null_model("y = x1", n_sims=N_SIMS, seed=SEED) result = m.find_power( - sample_size=1000, + sample_size=500, target_test="x1", print_results=False, return_results=True, @@ -85,7 +84,7 @@ def test_large_sample_null(self, backend): margin = mc_margin(m.alpha, m.n_simulations) expected = m.alpha * 100 assert abs(power - expected) < margin, ( - f"[{backend}] Large-N null power: {power:.2f}%, expected {expected}% ± {margin:.2f}% (Type I error inflated with N?)" + f"Large-N null power: {power:.2f}%, expected {expected}% ± {margin:.2f}% (Type I error inflated with N?)" ) @@ -93,7 +92,7 @@ class TestAlphaCalibration: """Rejection rate tracks the nominal alpha across levels.""" @pytest.mark.parametrize("alpha", [0.01, 0.05, 0.10]) - def test_null_rejection_matches_alpha(self, backend, alpha): + def test_null_rejection_matches_alpha(self, alpha): m = make_null_model("y = x1", n_sims=N_SIMS, alpha=alpha, seed=SEED) result = m.find_power( sample_size=100, @@ -104,4 +103,4 @@ def test_null_rejection_matches_alpha(self, backend, alpha): power = get_power(result, "x1") margin = mc_margin(alpha, m.n_simulations) expected = alpha * 100 - assert abs(power - expected) < margin, f"[{backend}] alpha={alpha}: observed {power:.2f}%, expected {expected}% ± {margin:.2f}%" + assert abs(power - expected) < margin, f"alpha={alpha}: observed {power:.2f}%, expected {expected}% ± {margin:.2f}%" diff --git a/tests/unit/test_distributions.py b/tests/unit/test_distributions.py index d3e77f9..693d1ff 100644 --- a/tests/unit/test_distributions.py +++ b/tests/unit/test_distributions.py @@ -24,7 +24,6 @@ import pytest from mcpower.stats.distributions import ( - _BACKEND, chi2_cdf, chi2_ppf, compute_critical_values_lme, @@ -587,29 +586,6 @@ def test_studentized_range_k_too_large_returns_inf(self): # =========================================================================== # 15. 
Backend detection -# =========================================================================== -class TestBackendDetection: - """Verify the distribution backend is correctly detected.""" - - def test_backend_is_set(self): - assert _BACKEND is not None - - def test_backend_is_string(self): - assert isinstance(_BACKEND, str) - - def test_backend_is_known_value(self): - assert _BACKEND in ("native", "scipy") - - def test_native_backend_when_compiled(self): - """When the C++ extension is compiled, backend should be 'native'.""" - try: - import mcpower.backends.mcpower_native # noqa: F401 - - assert _BACKEND == "native" - except ImportError: - pytest.skip("C++ native backend not compiled") - - # =========================================================================== # Cross-consistency checks # =========================================================================== diff --git a/tests/unit/test_distributions_coverage.py b/tests/unit/test_distributions_coverage.py new file mode 100644 index 0000000..224c8e6 --- /dev/null +++ b/tests/unit/test_distributions_coverage.py @@ -0,0 +1,42 @@ +"""Tests for distributions.py — optimizer functions and edge cases.""" + +import numpy as np +import pytest + +from mcpower.stats.distributions import minimize_lbfgsb, minimize_scalar_brent + + +class TestOptimizerLBFGSB: + """L-BFGS-B optimizer via native backend.""" + + def test_finds_correct_minimum(self): + # Simple quadratic: f(x) = (x-2)^2 + result = minimize_lbfgsb( + lambda x: float((x[0] - 2) ** 2), + x0=np.array([0.0]), + bounds=[(-10.0, 10.0)], + ) + assert abs(result.x[0] - 2.0) < 0.01 + assert result.fun < 0.01 + + +class TestOptimizerBrent: + """Brent scalar minimizer via native backend.""" + + def test_finds_correct_minimum(self): + # f(x) = (x - 3)^2 + result = minimize_scalar_brent( + lambda x: (x - 3) ** 2, + bounds=(0.0, 10.0), + ) + assert abs(result.x - 3.0) < 0.01 + assert result.fun < 0.01 + + def test_converged_flag(self): + result = minimize_scalar_brent( + lambda x: (x - 5) ** 2, + bounds=(0.0, 10.0), + ) + assert result.converged + + diff --git a/tests/unit/test_formatters_edge.py b/tests/unit/test_formatters_edge.py new file mode 100644 index 0000000..f52fd25 --- /dev/null +++ b/tests/unit/test_formatters_edge.py @@ -0,0 +1,230 @@ +"""Tests for formatter edge cases — scenario sample-size long format, cumulative recs, NaN filtering.""" + +import math + +import pytest + +from mcpower.utils.formatters import _ResultFormatter, _is_nan + + +_fmt = _ResultFormatter() + + +def _make_scenario_sample_size_data( + target_tests=("x1", "x2"), + correction=None, + sample_sizes=(50, 100, 150), + optimistic_achieved=None, + realistic_achieved=None, + doomer_achieved=None, +): + """Build a scenario sample_size result dict for formatting tests.""" + if optimistic_achieved is None: + optimistic_achieved = {"x1": 50, "x2": 100} + if realistic_achieved is None: + realistic_achieved = {"x1": 100, "x2": 150} + if doomer_achieved is None: + doomer_achieved = {"x1": 0, "x2": 0} # Not achieved + + def _make_scenario(achieved): + achieved_corr = {t: -1 for t in target_tests} if not correction else achieved + return { + "model": { + "target_tests": list(target_tests), + "correction": correction, + "sample_size_range": {"from_size": sample_sizes[0], "to_size": sample_sizes[-1]}, + "target_power": 80.0, + }, + "results": { + "first_achieved": achieved, + "first_achieved_corrected": achieved_corr, + "sample_sizes_tested": list(sample_sizes), + "powers_by_test": { + t: [30.0 + 25.0 * i for i in 
range(len(sample_sizes))] + for t in target_tests + }, + "powers_by_test_corrected": ( + {t: [25.0 + 25.0 * i for i in range(len(sample_sizes))] for t in target_tests} + if correction + else None + ), + }, + } + + return { + "analysis_type": "sample_size", + "scenarios": { + "optimistic": _make_scenario(optimistic_achieved), + "realistic": _make_scenario(realistic_achieved), + "doomer": _make_scenario(doomer_achieved), + }, + "comparison": {}, + } + + +class TestScenarioSampleSizeLongFormat: + """Test _format_scenario_sample_size with summary='long'.""" + + def test_recommendations_present(self): + data = _make_scenario_sample_size_data() + output = _fmt.format("scenario_sample_size", data, "long") + assert "RECOMMENDATIONS" in output + + def test_unachievable_tests_warning(self): + data = _make_scenario_sample_size_data( + doomer_achieved={"x1": 0, "x2": 0}, + ) + output = _fmt.format("scenario_sample_size", data, "long") + assert "Warning" in output or "may not achieve" in output + + def test_realistic_recommendation_shown(self): + data = _make_scenario_sample_size_data( + realistic_achieved={"x1": 100, "x2": 150}, + ) + output = _fmt.format("scenario_sample_size", data, "long") + assert "150" in output # max N for realistic + + def test_short_format_produces_table(self): + data = _make_scenario_sample_size_data() + output = _fmt.format("scenario_sample_size", data, "short") + assert "SCENARIO SUMMARY" in output + + def test_with_correction(self): + data = _make_scenario_sample_size_data(correction="bonferroni") + output = _fmt.format("scenario_sample_size", data, "short") + assert "Opt(U)" in output or "Uncorrected" in output.lower() or "(U)" in output + + +class TestCumulativeRecommendations: + """Test _format_cumulative_recommendations paths.""" + + def test_non_scenario_target_met(self): + data = { + "model": { + "target_tests": ["x1", "x2"], + "target_power": 80.0, + }, + "results": { + "sample_sizes_tested": [50, 100, 150], + "powers_by_test": { + "x1": [60.0, 85.0, 95.0], + "x2": [70.0, 90.0, 98.0], + }, + }, + } + lines = _fmt._format_cumulative_recommendations(data, is_scenario=False) + joined = "\n".join(lines) + assert "N=" in joined # Found a sample size + + def test_non_scenario_target_not_met(self): + data = { + "model": { + "target_tests": ["x1", "x2"], + "target_power": 80.0, + }, + "results": { + "sample_sizes_tested": [50, 100], + "powers_by_test": { + "x1": [10.0, 20.0], + "x2": [15.0, 25.0], + }, + }, + } + lines = _fmt._format_cumulative_recommendations(data, is_scenario=False) + joined = "\n".join(lines) + assert ">100" in joined # Exceeded max tested + + def test_scenario_recommendations(self): + data = _make_scenario_sample_size_data( + sample_sizes=(50, 100, 150, 200), + optimistic_achieved={"x1": 100, "x2": 150}, + ) + # Override powers so all > 80% + for scenario in data["scenarios"].values(): + scenario["results"]["powers_by_test"] = { + "x1": [50.0, 85.0, 92.0, 98.0], + "x2": [40.0, 75.0, 88.0, 95.0], + } + lines = _fmt._format_cumulative_recommendations(data, is_scenario=True) + assert len(lines) > 0 + + def test_empty_scenarios(self): + data = {"scenarios": {}} + lines = _fmt._format_cumulative_recommendations(data, is_scenario=True) + assert lines == [] + + def test_no_results_key(self): + data = {} + lines = _fmt._format_cumulative_recommendations(data, is_scenario=False) + assert lines == [] + + +class TestNaNPowerFiltering: + """NaN power values in cumulative table should be filtered out.""" + + def 
test_nan_power_filtered_in_cumulative_sample_size_table(self): + lines = [] + _fmt._add_cumulative_sample_size_table( + lines, + sample_sizes=[50, 100], + target_tests=["x1", "x2_nan"], + powers_by_test={ + "x1": [50.0, 80.0], + "x2_nan": [float("nan"), float("nan")], + }, + ) + # Should still produce output for x1 (x2_nan filtered out) + output = "\n".join(lines) + assert "N=50" in output or "50" in output + + def test_all_nan_produces_no_table(self): + lines = [] + _fmt._add_cumulative_sample_size_table( + lines, + sample_sizes=[50], + target_tests=["x1"], + powers_by_test={"x1": [float("nan")]}, + ) + # All NaN → no valid tests → no table + assert len(lines) == 0 + + +class TestIsNan: + """Test _is_nan utility.""" + + def test_nan_float(self): + assert _is_nan(float("nan")) + + def test_regular_float(self): + assert not _is_nan(42.0) + + def test_non_float(self): + assert not _is_nan("nan") + assert not _is_nan(None) + assert not _is_nan(42) + + +class TestExtractScenarioMeta: + """Test _extract_scenario_meta.""" + + def test_no_model_returns_none(self): + target_tests, correction = _fmt._extract_scenario_meta({"opt": {"results": {}}}) + assert target_tests is None + + def test_extracts_from_first_scenario(self): + scenarios = { + "optimistic": { + "model": {"target_tests": ["a", "b"], "correction": "holm"}, + } + } + target_tests, correction = _fmt._extract_scenario_meta(scenarios) + assert target_tests == ["a", "b"] + assert correction == "holm" + + +class TestFormatUnknownType: + """Unknown result type should raise.""" + + def test_unknown_result_type(self): + with pytest.raises(ValueError, match="Unknown result type"): + _fmt.format("nonexistent", {}) diff --git a/tests/unit/test_mixed_models_coverage.py b/tests/unit/test_mixed_models_coverage.py new file mode 100644 index 0000000..bc99974 --- /dev/null +++ b/tests/unit/test_mixed_models_coverage.py @@ -0,0 +1,292 @@ +"""Tests for stats/mixed_models.py — statsmodels convergence, corrections, native wrappers. + +Uses pytest.mark.lme to skip when statsmodels is not installed. 
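+
+The statsmodels paths are exercised with MagicMock fits rather than real
+model fits, so these tests pin down control flow (the warm-start retry
+chain, correction branching, and failure reporting) rather than numerical
+output; the mocked results only carry the attributes the wrapper is assumed
+to read (params, pvalues, bse, cov_re, scale, llf, cov_params).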
+""" + +import warnings +from unittest.mock import MagicMock, patch + +import numpy as np +import pytest + +from mcpower.stats.mixed_models import ( + _ensure_lme_crits, + _lme_analysis_wrapper, + _wrap_native_result, + reset_warm_start_cache, +) + +pytestmark = pytest.mark.lme + + +class TestWrapNativeResult: + """Test _wrap_native_result helper.""" + + def test_non_empty_non_verbose(self): + result = np.array([1.0, 0.0, 1.0]) + wrapped = _wrap_native_result(result, verbose=False, solver_name="native_q1") + np.testing.assert_array_equal(wrapped, result) + + def test_non_empty_verbose(self): + result = np.array([1.0, 0.0, 1.0]) + wrapped = _wrap_native_result(result, verbose=True, solver_name="native_q1") + assert isinstance(wrapped, dict) + assert "results" in wrapped + assert "diagnostics" in wrapped + assert wrapped["diagnostics"]["solver"] == "native_q1" + + def test_non_empty_verbose_with_extra_diag(self): + result = np.array([1.0]) + wrapped = _wrap_native_result( + result, verbose=True, solver_name="native_general", + extra_diag={"q": 3}, + ) + assert wrapped["diagnostics"]["q"] == 3 + + def test_empty_non_verbose_returns_none(self): + result = np.array([]) + assert _wrap_native_result(result, verbose=False, solver_name="native_q1") is None + + def test_empty_verbose_returns_failure_dict(self): + result = np.array([]) + wrapped = _wrap_native_result(result, verbose=True, solver_name="native_q1") + assert wrapped["results"] is None + assert "failure_reason" in wrapped + assert "empty result" in wrapped["failure_reason"] + + +class TestEnsureLMECrits: + """Test _ensure_lme_crits computes when None.""" + + def test_computes_when_none(self): + chi2, z, crits = _ensure_lme_crits( + alpha=0.05, p=3, n_targets=2, correction_method=0, + chi2_crit=None, z_crit=None, correction_z_crits=None, + ) + assert np.isfinite(chi2) + assert np.isfinite(z) + assert len(crits) == 2 + + def test_passthrough_when_provided(self): + chi2, z, crits = _ensure_lme_crits( + alpha=0.05, p=3, n_targets=2, correction_method=0, + chi2_crit=7.8, z_crit=1.96, correction_z_crits=np.array([1.96, 1.96]), + ) + assert chi2 == 7.8 + assert z == 1.96 + assert len(crits) == 2 + + +class TestLMEAnalysisWrapperRouting: + """Test _lme_analysis_wrapper routes to correct backend.""" + + def test_unknown_backend_raises(self): + with pytest.raises(ValueError, match="Unknown backend"): + _lme_analysis_wrapper( + np.eye(10), np.ones(10), np.array([0, 1]), + np.zeros(10, dtype=np.int32), + correction_method=0, alpha=0.05, backend="nonexistent", + ) + + +class TestStatsmodelsConvergence: + """Test statsmodels fallback path with mocked MixedLM.""" + + def _make_mock_result(self, converged=True, params=None, pvalues=None, n_params=3): + """Create a mock MixedLM result.""" + result = MagicMock() + result.converged = converged + result.params = params if params is not None else np.array([1.0, 0.5, 0.3]) + result.pvalues = pvalues if pvalues is not None else np.array([0.01, 0.02, 0.04]) + result.fe_params = result.params + result.bse = np.array([0.1, 0.1, 0.1]) + + # cov_re: random effects variance (needs .iloc[0, 0]) + cov_re = MagicMock() + cov_re.iloc.__getitem__ = MagicMock(return_value=0.5) + result.cov_re = cov_re + + result.scale = 1.0 + result.llf = -50.0 + + # Make cov_params return a proper matrix + result.cov_params.return_value = np.eye(n_params) * 0.01 + + # model attribute + result.model = MagicMock() + result.model.exog = MagicMock() + result.model.exog.shape = (100, n_params) + + return result + + 
@patch("statsmodels.regression.mixed_linear_model.MixedLM") + def test_warm_start_retry_chain(self, mock_mixedlm_cls): + """First fit fails, cold start succeeds.""" + from mcpower.stats.mixed_models import _lme_analysis_statsmodels, _lme_thread_local + + _lme_thread_local.warm_start_params = np.array([1.0, 0.5, 0.3]) + + mock_model = MagicMock() + mock_mixedlm_cls.return_value = mock_model + + good_result = self._make_mock_result() + mock_model.fit.side_effect = [ + Exception("warm start diverged"), + good_result, + ] + mock_model.loglike.return_value = -50.0 + + result = _lme_analysis_statsmodels( + X_expanded=np.random.randn(100, 2), + y=np.random.randn(100), + target_indices=np.array([0, 1]), + cluster_ids=np.repeat(np.arange(10), 10), + + correction_method=0, + alpha=0.05, + ) + assert result is not None + + @patch("statsmodels.regression.mixed_linear_model.MixedLM") + def test_all_attempts_fail_returns_none(self, mock_mixedlm_cls): + from mcpower.stats.mixed_models import _lme_analysis_statsmodels, _lme_thread_local + + _lme_thread_local.warm_start_params = None + + mock_model = MagicMock() + mock_mixedlm_cls.return_value = mock_model + mock_model.fit.side_effect = Exception("always fails") + + result = _lme_analysis_statsmodels( + X_expanded=np.random.randn(100, 2), + y=np.random.randn(100), + target_indices=np.array([0, 1]), + cluster_ids=np.repeat(np.arange(10), 10), + + correction_method=0, + alpha=0.05, + ) + assert result is None + + @patch("statsmodels.regression.mixed_linear_model.MixedLM") + def test_all_attempts_fail_verbose_returns_dict(self, mock_mixedlm_cls): + from mcpower.stats.mixed_models import _lme_analysis_statsmodels, _lme_thread_local + + _lme_thread_local.warm_start_params = None + + mock_model = MagicMock() + mock_mixedlm_cls.return_value = mock_model + mock_model.fit.side_effect = Exception("always fails") + + result = _lme_analysis_statsmodels( + X_expanded=np.random.randn(100, 2), + y=np.random.randn(100), + target_indices=np.array([0, 1]), + cluster_ids=np.repeat(np.arange(10), 10), + + correction_method=0, + alpha=0.05, + verbose=True, + ) + assert isinstance(result, dict) + assert result["results"] is None + assert "failure_reason" in result + + @patch("statsmodels.regression.mixed_linear_model.MixedLM") + def test_not_converged_returns_none(self, mock_mixedlm_cls): + """When result.converged is False for all attempts.""" + from mcpower.stats.mixed_models import _lme_analysis_statsmodels, _lme_thread_local + + _lme_thread_local.warm_start_params = None + + mock_model = MagicMock() + mock_mixedlm_cls.return_value = mock_model + + bad_result = self._make_mock_result(converged=False) + mock_model.fit.return_value = bad_result + + result = _lme_analysis_statsmodels( + X_expanded=np.random.randn(100, 2), + y=np.random.randn(100), + target_indices=np.array([0, 1]), + cluster_ids=np.repeat(np.arange(10), 10), + + correction_method=0, + alpha=0.05, + ) + assert result is None + + +class TestCorrections: + """Test statsmodels FDR, Holm, Bonferroni, no-correction paths.""" + + def _make_mock_result(self): + result = MagicMock() + result.converged = True + result.params = np.array([1.0, 0.5, 0.3]) + result.pvalues = np.array([0.001, 0.02, 0.04]) + result.fe_params = result.params + result.bse = np.array([0.1, 0.1, 0.1]) + result.scale = 1.0 + result.llf = -50.0 + result.model = MagicMock() + result.model.exog = MagicMock() + result.model.exog.shape = (100, 3) + + cov_re_mock = MagicMock() + cov_re_mock.iloc.__getitem__ = MagicMock(return_value=0.5) + result.cov_re = 
cov_re_mock + result.cov_params.return_value = np.eye(3) * 0.01 + + return result + + def _run_with_correction(self, correction_method): + from mcpower.stats.mixed_models import _lme_analysis_statsmodels, _lme_thread_local + + _lme_thread_local.warm_start_params = None + + mock_result = self._make_mock_result() + + with patch("statsmodels.regression.mixed_linear_model.MixedLM") as mock_cls: + mock_model = MagicMock() + mock_cls.return_value = mock_model + mock_model.fit.return_value = mock_result + mock_model.loglike.return_value = -50.0 + + out = _lme_analysis_statsmodels( + X_expanded=np.random.randn(100, 2), + y=np.random.randn(100), + target_indices=np.array([0, 1]), + cluster_ids=np.repeat(np.arange(10), 10), + + correction_method=correction_method, + alpha=0.05, + ) + return out + + def test_no_correction(self): + result = self._run_with_correction(0) + assert result is not None + + def test_bonferroni(self): + result = self._run_with_correction(1) + assert result is not None + + def test_fdr(self): + result = self._run_with_correction(2) + assert result is not None + + def test_holm(self): + result = self._run_with_correction(3) + assert result is not None + + +class TestResetWarmStartCache: + """Test reset_warm_start_cache.""" + + def test_clears_params(self): + from mcpower.stats.mixed_models import _lme_thread_local + + _lme_thread_local.warm_start_params = np.array([1.0]) + reset_warm_start_cache() + assert _lme_thread_local.warm_start_params is None diff --git a/tests/unit/test_model_coverage.py b/tests/unit/test_model_coverage.py new file mode 100644 index 0000000..23d47c9 --- /dev/null +++ b/tests/unit/test_model_coverage.py @@ -0,0 +1,117 @@ +"""Tests for model.py — parallel fallback, Tukey validation, NaN under Tukey correction.""" + +import warnings +from unittest.mock import MagicMock, patch + +import numpy as np +import pytest + +from mcpower import MCPower + + +class TestTukeyWithoutPosthoc: + """Tukey correction without posthoc specs should raise ValueError.""" + + def test_tukey_without_posthoc_raises(self): + model = MCPower("y = x1 + x2") + model.set_effects("x1=0.5, x2=0.3") + + with pytest.raises(ValueError, match="Tukey correction requires"): + model.find_power( + sample_size=100, + correction="tukey", + print_results=False, + ) + + +class TestTukeyNaNification: + """Non-posthoc tests should be NaN-ified under Tukey correction.""" + + def test_non_posthoc_tests_nan_under_tukey(self): + model = MCPower("y = group + x1") + model.set_variable_type("group=(factor,3)") + model.set_effects("group[2]=0.5, group[3]=0.4, x1=0.3") + model.n_simulations = 50 + model.seed = 42 + + result = model.find_sample_size( + target_test="all, all-posthoc", + correction="tukey", + from_size=30, + to_size=60, + by=30, + print_results=False, + return_results=True, + ) + + assert result is not None + results = result["results"] + corrected = results.get("powers_by_test_corrected", {}) + + # Post-hoc comparisons should have real power values + # Non-posthoc tests (like "x1", "group[2]", "group[3]", "overall") + # should have NaN values + posthoc_labels = {s.label for s in model._posthoc_specs} + for test_name, powers in corrected.items(): + if test_name not in posthoc_labels: + assert all(isinstance(v, float) and np.isnan(v) for v in powers), \ + f"Expected NaN for non-posthoc test '{test_name}', got {powers}" + + # first_achieved_corrected for non-posthoc should be -1 + for test_name, n in results.get("first_achieved_corrected", {}).items(): + if test_name not in posthoc_labels: + assert 
n == -1, f"Expected -1 for '{test_name}', got {n}" + + +class TestParallelFallback: + """Parallel execution falls back to sequential on exception.""" + + def test_parallel_exception_falls_back(self, capsys): + model = MCPower("y = x1 + x2") + model.set_effects("x1=0.5, x2=0.3") + model.parallel = True + model.n_simulations = 50 + model.seed = 42 + + # Parallel is imported inside the function via `from joblib import Parallel`, + # so we patch it at the joblib module level. + with patch("joblib.Parallel", side_effect=RuntimeError("joblib broken")): + # Should still complete via sequential fallback + result = model.find_sample_size( + from_size=30, + to_size=60, + by=30, + print_results=False, + return_results=True, + ) + assert result is not None + captured = capsys.readouterr() + assert "Falling back to sequential" in captured.out + + +class TestIsParallelEffective: + """Test _is_parallel_effective resolution.""" + + def test_true_always_parallel(self): + model = MCPower("y = x1 + x2") + model.parallel = True + assert model._is_parallel_effective() is True + + def test_false_never_parallel(self): + model = MCPower("y = x1 + x2") + model.parallel = False + assert model._is_parallel_effective() is False + + def test_mixedmodels_with_clusters(self): + model = MCPower("y ~ x1 + (1|school)") + model.set_cluster("school", ICC=0.2, n_clusters=20) + model.set_effects("x1=0.5") + model._apply() # cluster_specs are deferred until apply() + model.parallel = "mixedmodels" + assert model._is_parallel_effective() is True + + def test_mixedmodels_without_clusters(self): + model = MCPower("y = x1 + x2") + model.set_effects("x1=0.5, x2=0.3") + model.parallel = "mixedmodels" + assert model._is_parallel_effective() is False diff --git a/tests/unit/test_native_backend.py b/tests/unit/test_native_backend.py new file mode 100644 index 0000000..93b4a3e --- /dev/null +++ b/tests/unit/test_native_backend.py @@ -0,0 +1,60 @@ +"""Tests for mcpower.backends.native — import fallback and _prep utility.""" + +import numpy as np +import pytest +from unittest.mock import patch, MagicMock + +from mcpower.backends.native import _prep + + +class TestPrep: + """Test _prep array coercion for C++ interop.""" + + def test_contiguous_passthrough(self): + arr = np.array([1.0, 2.0, 3.0], dtype=np.float64) + result = _prep(arr) + assert result.flags["C_CONTIGUOUS"] + assert result.dtype == np.float64 + + def test_non_contiguous_becomes_contiguous(self): + arr = np.array([[1.0, 2.0], [3.0, 4.0]], dtype=np.float64) + col = arr[:, 1] # non-contiguous column slice + assert not col.flags["C_CONTIGUOUS"] + result = _prep(col) + assert result.flags["C_CONTIGUOUS"] + np.testing.assert_array_equal(result, [2.0, 4.0]) + + def test_dtype_conversion_float32_to_float64(self): + arr = np.array([1.0, 2.0], dtype=np.float32) + result = _prep(arr, np.float64) + assert result.dtype == np.float64 + + def test_dtype_conversion_int64_to_int32(self): + arr = np.array([0, 1, 2], dtype=np.int64) + result = _prep(arr, np.int32) + assert result.dtype == np.int32 + np.testing.assert_array_equal(result, [0, 1, 2]) + + def test_2d_array(self): + arr = np.array([[1.0, 2.0], [3.0, 4.0]], dtype=np.float64, order="F") + assert not arr.flags["C_CONTIGUOUS"] + result = _prep(arr) + assert result.flags["C_CONTIGUOUS"] + assert result.dtype == np.float64 + + +class TestNativeBackendImport: + """Test NativeBackend init when C++ extension is unavailable.""" + + def test_init_raises_when_unavailable(self): + """NativeBackend() should raise ImportError when 
_NATIVE_AVAILABLE=False.""" + with patch("mcpower.backends.native._NATIVE_AVAILABLE", False): + from mcpower.backends.native import NativeBackend + with pytest.raises(ImportError, match="Native C\\+\\+ backend not available"): + NativeBackend() + + def test_is_native_available_reflects_module_state(self): + from mcpower.backends.native import is_native_available + # Just verify it returns a bool + result = is_native_available() + assert isinstance(result, bool) diff --git a/tests/unit/test_ols_corrections.py b/tests/unit/test_ols_corrections.py new file mode 100644 index 0000000..8c6728a --- /dev/null +++ b/tests/unit/test_ols_corrections.py @@ -0,0 +1,251 @@ +"""Tests for OLS post-hoc contrast corrections and edge cases.""" + +from dataclasses import dataclass +from typing import Optional + +import numpy as np +import pytest + +from mcpower.stats.ols import compute_posthoc_contrasts + + +@dataclass +class _PostHocSpec: + """Minimal PostHocSpec stub for tests.""" + factor_name: str + col_idx_a: Optional[int] + col_idx_b: Optional[int] + label: str = "" + level_a: str = "" + level_b: str = "" + n_levels: int = 3 + + +def _make_ols_data(n=100, p=3, seed=42): + """Generate simple OLS data: X, y, and target_indices.""" + rng = np.random.RandomState(seed) + X = rng.randn(n, p) + beta = np.array([0.5, 0.3, -0.2])[:p] + y = X @ beta + rng.randn(n) + return X, y + + +class TestDegenerateDesign: + """When dof <= 0, posthoc should return zeros.""" + + def test_dof_zero_returns_zeros(self): + # n = p+1 → dof = 0 + n, p = 4, 3 + rng = np.random.RandomState(42) + X = rng.randn(n, p) + y = rng.randn(n) + specs = [_PostHocSpec("grp", 0, 1)] + + uncorr, corr, override = compute_posthoc_contrasts( + X, y, specs, "t-test", 2.0, {}, target_indices=np.array([0, 1, 2]), + ) + assert uncorr.shape == (1,) + assert not uncorr[0] + assert not corr[0] + assert override is None + + def test_singular_contrast_variance_stays_zero(self): + """When both col_idx_a and col_idx_b are None, t_abs stays 0.""" + X, y = _make_ols_data() + specs = [_PostHocSpec("grp", None, None)] + + uncorr, corr, _ = compute_posthoc_contrasts( + X, y, specs, "t-test", 2.0, {}, + ) + assert not uncorr[0] + assert not corr[0] + + +class TestCombinedFDR: + """FDR (correction_method=2) step-up across regular+posthoc t-stats.""" + + def test_fdr_combined_ranking(self): + X, y = _make_ols_data(n=200, p=3, seed=10) + specs = [ + _PostHocSpec("grp", 0, 1), + _PostHocSpec("grp", 0, 2), + ] + target_indices = np.array([0, 1, 2]) + # Create combined crits of length n_regular + n_posthoc = 5 + # Use very lenient crits so everything passes + combined_crits = np.full(5, 0.01) + + uncorr, corr, override = compute_posthoc_contrasts( + X, y, specs, "t-test", 0.01, {}, + target_indices=target_indices, + correction_method=2, + correction_t_crits_combined=combined_crits, + ) + assert override is not None + assert len(override) == 3 # n_regular + assert len(corr) == 2 # n_posthoc + + def test_fdr_no_significant(self): + """With very strict crits, nothing should be significant.""" + X, y = _make_ols_data(n=200, p=3, seed=10) + specs = [_PostHocSpec("grp", 0, 1)] + target_indices = np.array([0, 1, 2]) + # Very strict thresholds + combined_crits = np.full(4, 100.0) + + uncorr, corr, override = compute_posthoc_contrasts( + X, y, specs, "t-test", 100.0, {}, + target_indices=target_indices, + correction_method=2, + correction_t_crits_combined=combined_crits, + ) + assert not np.any(corr) + assert override is not None + assert not np.any(override) + + +class 
TestCombinedHolm: + """Holm (correction_method=3) step-down with early termination.""" + + def test_holm_combined_ranking(self): + X, y = _make_ols_data(n=200, p=3, seed=10) + specs = [_PostHocSpec("grp", 0, 1)] + target_indices = np.array([0, 1, 2]) + combined_crits = np.full(4, 0.01) # Very lenient + + uncorr, corr, override = compute_posthoc_contrasts( + X, y, specs, "t-test", 0.01, {}, + target_indices=target_indices, + correction_method=3, + correction_t_crits_combined=combined_crits, + ) + assert override is not None + assert len(override) == 3 + + def test_holm_early_termination(self): + """If the most significant test doesn't pass, none should.""" + X, y = _make_ols_data(n=200, p=3, seed=10) + specs = [_PostHocSpec("grp", 0, 1)] + target_indices = np.array([0, 1, 2]) + combined_crits = np.full(4, 1000.0) # Impossible threshold + + uncorr, corr, override = compute_posthoc_contrasts( + X, y, specs, "t-test", 1000.0, {}, + target_indices=target_indices, + correction_method=3, + correction_t_crits_combined=combined_crits, + ) + assert not np.any(corr) + + +class TestFallbackPaths: + """Fallback when correction_t_crits_combined is None or wrong length.""" + + def test_combined_crits_none_fallback(self): + X, y = _make_ols_data(n=200, p=3, seed=10) + specs = [_PostHocSpec("grp", 0, 1)] + target_indices = np.array([0, 1, 2]) + + uncorr, corr, override = compute_posthoc_contrasts( + X, y, specs, "t-test", 2.0, {}, + target_indices=target_indices, + correction_method=2, + correction_t_crits_combined=None, + ) + # Fallback: corrected = uncorrected copy, no override + np.testing.assert_array_equal(corr, uncorr) + assert override is None + + def test_combined_crits_wrong_length_fallback(self): + X, y = _make_ols_data(n=200, p=3, seed=10) + specs = [_PostHocSpec("grp", 0, 1)] + target_indices = np.array([0, 1, 2]) + # Wrong length: should be 4 (3 regular + 1 posthoc) + wrong_crits = np.full(2, 2.0) + + uncorr, corr, override = compute_posthoc_contrasts( + X, y, specs, "t-test", 2.0, {}, + target_indices=target_indices, + correction_method=2, + correction_t_crits_combined=wrong_crits, + ) + np.testing.assert_array_equal(corr, uncorr) + assert override is None + + +class TestTukeyMethod: + """Tukey post-hoc method path.""" + + def test_tukey_uses_factor_crit(self): + X, y = _make_ols_data(n=200, p=3, seed=10) + specs = [_PostHocSpec("grp", 0, 1, n_levels=3)] + tukey_crits = {"grp": 0.01} # Very lenient + + uncorr, corr, override = compute_posthoc_contrasts( + X, y, specs, "tukey", 2.0, tukey_crits, + ) + # Tukey correction: uncorrected == corrected + np.testing.assert_array_equal(uncorr, corr) + assert override is None + + def test_tukey_missing_factor_uses_inf(self): + """When factor not in tukey_crits, inf is used → not significant.""" + X, y = _make_ols_data(n=200, p=3, seed=10) + specs = [_PostHocSpec("missing_factor", 0, 1)] + + uncorr, corr, override = compute_posthoc_contrasts( + X, y, specs, "tukey", 2.0, {}, + ) + assert not uncorr[0] + assert not corr[0] + + +class TestBonferroniPosthoc: + """Bonferroni correction for posthoc (correction_method=1).""" + + def test_bonferroni_uses_combined_first_crit(self): + X, y = _make_ols_data(n=200, p=3, seed=10) + specs = [_PostHocSpec("grp", 0, 1)] + target_indices = np.array([0, 1, 2]) + combined_crits = np.full(4, 0.01) # Very lenient + + uncorr, corr, override = compute_posthoc_contrasts( + X, y, specs, "t-test", 0.01, {}, + target_indices=target_indices, + correction_method=1, + correction_t_crits_combined=combined_crits, + ) + assert override 
is None # Bonferroni doesn't produce override + + +class TestEmptySpecs: + """Empty posthoc specs return empty arrays.""" + + def test_no_specs(self): + X, y = _make_ols_data() + uncorr, corr, override = compute_posthoc_contrasts( + X, y, [], "t-test", 2.0, {}, + ) + assert len(uncorr) == 0 + assert len(corr) == 0 + assert override is None + + +class TestSingleColumnContrasts: + """Contrasts where one side is the reference level (None).""" + + def test_col_idx_a_none(self): + X, y = _make_ols_data(n=200) + specs = [_PostHocSpec("grp", None, 1)] + uncorr, corr, _ = compute_posthoc_contrasts( + X, y, specs, "t-test", 2.0, {}, + ) + assert uncorr.shape == (1,) + + def test_col_idx_b_none(self): + X, y = _make_ols_data(n=200) + specs = [_PostHocSpec("grp", 0, None)] + uncorr, corr, _ = compute_posthoc_contrasts( + X, y, specs, "t-test", 2.0, {}, + ) + assert uncorr.shape == (1,) diff --git a/tests/unit/test_parsers_errors.py b/tests/unit/test_parsers_errors.py new file mode 100644 index 0000000..b3c8c01 --- /dev/null +++ b/tests/unit/test_parsers_errors.py @@ -0,0 +1,168 @@ +"""Tests for parser error paths and edge cases.""" + +import pytest + +from mcpower.utils.parsers import _AssignmentParser, _parse_equation + + +_parser = _AssignmentParser() + + +class TestAssignmentParserErrors: + """Error paths in _AssignmentParser._parse.""" + + def test_missing_equals_sign(self): + parsed, errors = _parser._parse("x1 0.5", "effect", ["x1"]) + assert len(errors) == 1 + assert "Invalid format" in errors[0] + + def test_unknown_parse_type(self): + parsed, errors = _parser._parse("x1=0.5", "unknown_type", ["x1"]) + assert len(errors) == 1 + assert "Unknown parse type" in errors[0] + + def test_unavailable_variable(self): + parsed, errors = _parser._parse("x_missing=0.5", "effect", ["x1", "x2"]) + assert len(errors) == 1 + assert "not found" in errors[0] + assert "x_missing" in errors[0] + + def test_invalid_effect_value(self): + parsed, errors = _parser._parse("x1=abc", "effect", ["x1"]) + assert len(errors) == 1 + assert "Invalid effect size" in errors[0] + + def test_multiple_errors(self): + parsed, errors = _parser._parse("x_bad=abc, x_also_bad=xyz", "effect", ["x1"]) + assert len(errors) == 2 + + +class TestCorrelationParserErrors: + """Error paths for correlation parsing.""" + + def test_invalid_correlation_format(self): + parsed, errors = _parser._parse("x1_x2=0.5", "correlation", ["x1", "x2"]) + assert len(errors) == 1 + assert "Invalid format" in errors[0] or "Invalid correlation" in errors[0] + + def test_correlation_var_not_found(self): + parsed, errors = _parser._parse("corr(x1, x_missing)=0.5", "correlation", ["x1", "x2"]) + assert len(errors) == 1 + assert "not found" in errors[0] + + def test_self_correlation(self): + parsed, errors = _parser._parse("corr(x1, x1)=0.5", "correlation", ["x1", "x2"]) + assert len(errors) == 1 + assert "Cannot correlate variable with itself" in errors[0] + + def test_correlation_value_out_of_range(self): + parsed, errors = _parser._parse("corr(x1, x2)=1.5", "correlation", ["x1", "x2"]) + assert len(errors) == 1 + assert "between -1 and 1" in errors[0] + + def test_invalid_correlation_value(self): + parsed, errors = _parser._parse("corr(x1, x2)=abc", "correlation", ["x1", "x2"]) + assert len(errors) == 1 + assert "Invalid correlation value" in errors[0] + + +class TestVariableTypeErrors: + """Error paths for variable type parsing.""" + + def test_unsupported_type(self): + parsed, errors = _parser._parse("x1=crazy_type", "variable_type", ["x1"]) + assert 
len(errors) == 1 + assert "Unsupported type" in errors[0] + + def test_binary_proportion_out_of_range(self): + parsed, errors = _parser._parse("x1=(binary,1.5)", "variable_type", ["x1"]) + assert len(errors) == 1 + assert "between 0 and 1" in errors[0] + + def test_binary_non_numeric_proportion(self): + parsed, errors = _parser._parse("x1=(binary,abc)", "variable_type", ["x1"]) + assert len(errors) == 1 + assert "Invalid proportion" in errors[0] + + def test_binary_wrong_param_count(self): + parsed, errors = _parser._parse("x1=(binary,0.3,0.4)", "variable_type", ["x1"]) + assert len(errors) == 1 + assert "exactly 2 values" in errors[0] + + def test_factor_less_than_2_levels(self): + parsed, errors = _parser._parse("x1=(factor,1)", "variable_type", ["x1"]) + assert len(errors) == 1 + assert "at least 2 levels" in errors[0] + + def test_factor_more_than_20_levels(self): + parsed, errors = _parser._parse("x1=(factor,21)", "variable_type", ["x1"]) + assert len(errors) == 1 + assert "more than 20 levels" in errors[0] + + def test_factor_non_integer_levels(self): + parsed, errors = _parser._parse("x1=(factor,abc)", "variable_type", ["x1"]) + assert len(errors) == 1 + assert "Must be integer" in errors[0] + + def test_factor_proportions_more_than_20(self): + props = ",".join(["0.04"] * 21) + parsed, errors = _parser._parse(f"x1=(factor,{props})", "variable_type", ["x1"]) + assert len(errors) == 1 + assert "more than 20 levels" in errors[0] + + def test_factor_zero_proportion(self): + parsed, errors = _parser._parse("x1=(factor,0.5,0.0,0.5)", "variable_type", ["x1"]) + assert len(errors) == 1 + assert "positive" in errors[0] + + def test_factor_non_numeric_proportions(self): + parsed, errors = _parser._parse("x1=(factor,abc,def)", "variable_type", ["x1"]) + assert len(errors) == 1 + assert "numeric" in errors[0] + + def test_tuple_no_comma(self): + parsed, errors = _parser._parse("x1=(binary)", "variable_type", ["x1"]) + assert len(errors) == 1 + assert "Invalid tuple format" in errors[0] + + def test_tuple_unsupported_type_in_tuple(self): + parsed, errors = _parser._parse("x1=(normal,0.5)", "variable_type", ["x1"]) + assert len(errors) == 1 + assert "only supported for binary and factor" in errors[0] + + +class TestEquationParsing: + """Edge cases in _parse_equation.""" + + def test_nested_random_effects(self): + dep, formula, ranefs = _parse_equation("y ~ x1 + (1|A/B)") + assert dep == "y" + assert len(ranefs) == 2 + group_vars = {r["grouping_var"] for r in ranefs} + assert "A" in group_vars + assert "A:B" in group_vars + + def test_duplicate_grouping_var_raises(self): + with pytest.raises(ValueError, match="Duplicate random effect grouping variable"): + _parse_equation("y ~ x1 + (1|school) + (1|school)") + + def test_random_slopes(self): + dep, formula, ranefs = _parse_equation("y ~ x1 + (1 + x1|school)") + assert len(ranefs) == 1 + assert ranefs[0]["type"] == "random_slope" + assert ranefs[0]["slope_vars"] == ["x1"] + assert ranefs[0]["grouping_var"] == "school" + + def test_random_slope_duplicate_grouping_raises(self): + with pytest.raises(ValueError, match="Duplicate"): + _parse_equation("y ~ x1 + (1|school) + (1 + x1|school)") + + def test_no_separator_uses_default_dep(self): + dep, formula, ranefs = _parse_equation("x1+x2") + assert dep == "explained_variable" + assert "x1" in formula + assert "x2" in formula + + def test_nested_duplicate_parent_raises(self): + with pytest.raises(ValueError, match="Duplicate"): + _parse_equation("y ~ (1|A) + (1|A/B)") diff --git 
a/tests/unit/test_progress.py b/tests/unit/test_progress.py index 329acca..768420a 100644 --- a/tests/unit/test_progress.py +++ b/tests/unit/test_progress.py @@ -127,12 +127,6 @@ def test_completion_newline(self): class TestTqdmReporter: """Test TqdmReporter with mock tqdm.""" - def test_tqdm_missing_raises(self): - reporter = TqdmReporter() - with patch.dict("sys.modules", {"tqdm": None}): - with pytest.raises(ImportError, match="tqdm"): - reporter(0, 100) - def test_tqdm_basic_flow(self): mock_bar = MagicMock() mock_bar.n = 0 @@ -154,6 +148,51 @@ def test_tqdm_basic_flow(self): reporter(100, 100) # closes mock_bar.close.assert_called_once() + def test_tqdm_successive_sessions(self): + """After close, a new session creates a fresh bar.""" + mock_bar = MagicMock() + mock_bar.n = 0 + mock_tqdm_cls = MagicMock(return_value=mock_bar) + mock_tqdm_module = MagicMock() + mock_tqdm_module.tqdm = mock_tqdm_cls + + reporter = TqdmReporter() + + with patch.dict("sys.modules", {"tqdm": mock_tqdm_module}): + # First session + reporter(0, 50) + mock_bar.n = 0 + reporter(50, 50) + mock_bar.close.assert_called_once() + assert reporter._bar is None + + # Second session — should create a new bar + mock_tqdm_cls.reset_mock() + mock_bar2 = MagicMock() + mock_bar2.n = 0 + mock_tqdm_cls.return_value = mock_bar2 + + reporter(0, 200) + assert mock_tqdm_cls.call_count == 1 + mock_tqdm_cls.assert_called_with(total=200, unit="sim") + + def test_tqdm_no_negative_delta(self): + """When current <= bar.n, update should not be called with negative delta.""" + mock_bar = MagicMock() + mock_bar.n = 50 + mock_tqdm_cls = MagicMock(return_value=mock_bar) + mock_tqdm_module = MagicMock() + mock_tqdm_module.tqdm = mock_tqdm_cls + + reporter = TqdmReporter() + + with patch.dict("sys.modules", {"tqdm": mock_tqdm_module}): + reporter(0, 100) # creates bar + mock_bar.n = 50 + reporter(30, 100) # current < bar.n + # update should NOT have been called (delta = 30 - 50 = -20, not > 0) + mock_bar.update.assert_not_called() + class TestComputeTotalSimulations: """Test compute_total_simulations helper.""" diff --git a/tests/unit/test_results.py b/tests/unit/test_results.py new file mode 100644 index 0000000..31c3082 --- /dev/null +++ b/tests/unit/test_results.py @@ -0,0 +1,138 @@ +"""Unit tests for mcpower.core.results — ResultsProcessor and builder functions.""" + +import numpy as np +import pytest + +from mcpower.core.results import ResultsProcessor, build_power_result, build_sample_size_result + + +class TestCalculatePowers: + """Tests for ResultsProcessor.calculate_powers.""" + + def test_basic_two_tests(self): + """Power calculation with two tests (overall + one predictor).""" + proc = ResultsProcessor(target_power=80.0) + # 10 simulations, 2 columns: [overall, x1] + # overall: 8/10 sig, x1: 6/10 sig + results = [np.array([True, True])] * 6 + [ + np.array([True, False]), + np.array([True, False]), + np.array([False, False]), + np.array([False, False]), + ] + corrected = results # same for this test + + out = proc.calculate_powers(results, corrected, ["overall", "x1"]) + + assert out["individual_powers"]["overall"] == pytest.approx(80.0) + assert out["individual_powers"]["x1"] == pytest.approx(60.0) + assert out["n_simulations_used"] == 10 + + def test_all_significant(self): + proc = ResultsProcessor() + results = [np.array([True, True])] * 5 + out = proc.calculate_powers(results, results, ["overall", "x1"]) + assert out["individual_powers"]["overall"] == pytest.approx(100.0) + assert out["individual_powers"]["x1"] == 
pytest.approx(100.0) + + def test_none_significant(self): + proc = ResultsProcessor() + results = [np.array([False, False])] * 5 + out = proc.calculate_powers(results, results, ["overall", "x1"]) + assert out["individual_powers"]["overall"] == pytest.approx(0.0) + assert out["individual_powers"]["x1"] == pytest.approx(0.0) + + def test_combined_probabilities(self): + proc = ResultsProcessor() + # 4 sims, 2 tests: exactly 0, 1, 2 significant + results = [ + np.array([False, False]), # 0 sig + np.array([True, False]), # 1 sig + np.array([False, True]), # 1 sig + np.array([True, True]), # 2 sig + ] + out = proc.calculate_powers(results, results, ["overall", "x1"]) + combined = out["combined_probabilities"] + assert combined["exactly_0_significant"] == pytest.approx(25.0) + assert combined["exactly_1_significant"] == pytest.approx(50.0) + assert combined["exactly_2_significant"] == pytest.approx(25.0) + + def test_cumulative_probabilities(self): + proc = ResultsProcessor() + results = [ + np.array([False, False]), + np.array([True, True]), + np.array([True, True]), + np.array([True, True]), + ] + out = proc.calculate_powers(results, results, ["overall", "x1"]) + cumulative = out["cumulative_probabilities"] + assert cumulative["at_least_0_significant"] == pytest.approx(100.0) + assert cumulative["at_least_2_significant"] == pytest.approx(75.0) + + +class TestBuildPowerResult: + """Tests for build_power_result.""" + + def test_basic_structure(self): + power_results = { + "individual_powers": {"overall": 80.0}, + "n_simulations_used": 1000, + } + result = build_power_result( + model_type="OLS", + target_tests=["overall"], + formula_to_test=None, + equation="y = x1", + sample_size=100, + alpha=0.05, + n_simulations=1000, + correction=None, + target_power=80.0, + parallel=False, + power_results=power_results, + ) + assert result["model"]["model_type"] == "OLS" + assert result["model"]["sample_size"] == 100 + assert result["model"]["alpha"] == 0.05 + assert result["results"] is power_results + + +class TestBuildSampleSizeResult: + """Tests for build_sample_size_result.""" + + def test_basic_structure(self): + analysis_results = {"sample_sizes_tested": [50, 100]} + result = build_sample_size_result( + model_type="OLS", + target_tests=["overall"], + formula_to_test=None, + equation="y = x1", + sample_sizes=[50, 100], + alpha=0.05, + n_simulations=1000, + correction=None, + target_power=80.0, + parallel=False, + analysis_results=analysis_results, + ) + assert result["model"]["sample_size_range"]["from_size"] == 50 + assert result["model"]["sample_size_range"]["to_size"] == 100 + assert result["model"]["sample_size_range"]["by"] == 50 + assert result["results"] is analysis_results + + def test_single_sample_size(self): + result = build_sample_size_result( + model_type="OLS", + target_tests=["overall"], + formula_to_test=None, + equation="y = x1", + sample_sizes=[100], + alpha=0.05, + n_simulations=1000, + correction=None, + target_power=80.0, + parallel=False, + analysis_results={}, + ) + assert result["model"]["sample_size_range"]["by"] == 1 diff --git a/tests/unit/test_scenarios_coverage.py b/tests/unit/test_scenarios_coverage.py new file mode 100644 index 0000000..ea0d4f2 --- /dev/null +++ b/tests/unit/test_scenarios_coverage.py @@ -0,0 +1,218 @@ +"""Tests for scenario analysis — plot creation, correlation matrix repair, LME perturbations.""" + +from unittest.mock import MagicMock, patch + +import numpy as np +import pytest + +from mcpower.core.scenarios import ( + ScenarioRunner, + 
diff --git a/tests/unit/test_scenarios_coverage.py b/tests/unit/test_scenarios_coverage.py
new file mode 100644
index 0000000..ea0d4f2
--- /dev/null
+++ b/tests/unit/test_scenarios_coverage.py
@@ -0,0 +1,217 @@
+"""Tests for scenario analysis — plot creation, correlation matrix repair, LME perturbations."""
+
+from unittest.mock import MagicMock, patch
+
+import numpy as np
+
+from mcpower.core.scenarios import (
+    ScenarioRunner,
+    apply_lme_perturbations,
+    apply_per_simulation_perturbations,
+)
+
+
+class TestCorrelationMatrixRepair:
+    """Spectral clipping when noise creates negative eigenvalues."""
+
+    def test_negative_eigenvalue_repaired(self):
+        """After heavy noise, result should be positive semi-definite with unit diagonal."""
+        # Create a 3x3 identity correlation matrix
+        corr = np.eye(3)
+        var_types = np.zeros(3, dtype=np.int64)  # all normal
+
+        config = {
+            "correlation_noise_sd": 2.0,  # very heavy noise, near-certain to create negative eigenvalues
+            "distribution_change_prob": 0.0,
+            "new_distributions": [],
+        }
+
+        perturbed_corr, _ = apply_per_simulation_perturbations(corr, var_types, config, sim_seed=42)
+
+        # Eigenvalues should all be >= 0
+        eigvals = np.linalg.eigvalsh(perturbed_corr)
+        assert np.all(eigvals >= -1e-10)
+
+        # Diagonal should be 1.0
+        np.testing.assert_allclose(np.diag(perturbed_corr), 1.0, atol=1e-10)
+
+        # Should be symmetric
+        np.testing.assert_allclose(perturbed_corr, perturbed_corr.T, atol=1e-10)
+
+    def test_no_repair_needed_when_no_noise(self):
+        corr = np.array([[1.0, 0.3], [0.3, 1.0]])
+        var_types = np.zeros(2, dtype=np.int64)
+
+        config = {
+            "correlation_noise_sd": 0.0,
+            "distribution_change_prob": 0.0,
+            "new_distributions": [],
+        }
+
+        perturbed_corr, _ = apply_per_simulation_perturbations(corr, var_types, config, sim_seed=42)
+        np.testing.assert_array_equal(perturbed_corr, corr)
+
+
+class TestDistributionPerturbation:
+    """Variable type swaps in scenario mode."""
+
+    def test_distribution_swap_occurs(self):
+        var_types = np.zeros(10, dtype=np.int64)  # All normal
+        config = {
+            "correlation_noise_sd": 0.0,
+            "distribution_change_prob": 1.0,  # Always swap
+            "new_distributions": ["right_skewed"],
+        }
+
+        _, perturbed_types = apply_per_simulation_perturbations(
+            np.eye(10), var_types, config, sim_seed=42,
+        )
+        # All should be swapped from 0 to 2 (right_skewed)
+        assert np.all(perturbed_types == 2)
+
+    def test_non_normal_not_swapped(self):
+        """Binary (1) and uploaded (99) vars should not be swapped."""
+        var_types = np.array([0, 1, 99], dtype=np.int64)
+        config = {
+            "correlation_noise_sd": 0.0,
+            "distribution_change_prob": 1.0,
+            "new_distributions": ["right_skewed"],
+        }
+
+        _, perturbed_types = apply_per_simulation_perturbations(
+            np.eye(3), var_types, config, sim_seed=42,
+        )
+        assert perturbed_types[0] == 2   # normal → right_skewed
+        assert perturbed_types[1] == 1   # binary unchanged
+        assert perturbed_types[2] == 99  # uploaded unchanged
+
+    def test_none_config_passthrough(self):
+        corr = np.eye(2)
+        var_types = np.zeros(2, dtype=np.int64)
+        result_corr, result_types = apply_per_simulation_perturbations(
+            corr, var_types, None, sim_seed=42,
+        )
+        np.testing.assert_array_equal(result_corr, corr)
+        np.testing.assert_array_equal(result_types, var_types)
+
+
+class TestLMEPerturbations:
+    """LME perturbation computation."""
+
+    def test_icc_noise_creates_multipliers(self):
+        cluster_specs = {"school": {"n_clusters": 20, "cluster_size": 10, "icc": 0.2}}
+        config = {
+            "icc_noise_sd": 0.3,
+            "random_effect_dist": "normal",
+            "random_effect_df": 5,
+        }
+
+        result = apply_lme_perturbations(cluster_specs, config, sim_seed=42)
+        assert result is not None
+        assert "tau_squared_multipliers" in result
+        assert "school" in result["tau_squared_multipliers"]
+        # Multiplier should be exp(N(0, 0.3)) — positive, around 1
+        mult = result["tau_squared_multipliers"]["school"]
+        assert mult > 0
+
+    def test_no_perturbation_returns_none(self):
+        cluster_specs = {"school": {"n_clusters": 20, "cluster_size": 10, "icc": 0.2}}
+        config = {
+            "icc_noise_sd": 0.0,
+            "random_effect_dist": "normal",
+            "random_effect_df": 5,
+        }
+        result = apply_lme_perturbations(cluster_specs, config, sim_seed=42)
+        assert result is None
+
+    def test_empty_cluster_specs_returns_none(self):
+        result = apply_lme_perturbations({}, {"icc_noise_sd": 0.5}, sim_seed=42)
+        assert result is None
+
+    def test_heavy_tailed_re_dist(self):
+        cluster_specs = {"school": {"n_clusters": 20, "cluster_size": 10, "icc": 0.2}}
+        config = {
+            "icc_noise_sd": 0.0,
+            "random_effect_dist": "heavy_tailed",
+            "random_effect_df": 3,
+        }
+        result = apply_lme_perturbations(cluster_specs, config, sim_seed=42)
+        assert result is not None
+        assert result["random_effect_dist"] == "heavy_tailed"
+        assert result["random_effect_df"] == 3
+
+
+class TestScenarioRunnerPlots:
+    """Test _create_scenario_plots path."""
+
+    def test_plot_creation_with_mock(self):
+        model = MagicMock()
+        model.power = 80.0
+        runner = ScenarioRunner(model)
+
+        results = {
+            "analysis_type": "sample_size",
+            "scenarios": {
+                "optimistic": {
+                    "model": {
+                        "target_tests": ["x1"],
+                        "correction": None,
+                    },
+                    "results": {
+                        "sample_sizes_tested": [50, 100],
+                        "powers_by_test": {"x1": [50.0, 85.0]},
+                        "first_achieved": {"x1": 100},
+                    },
+                },
+            },
+        }
+
+        with patch("mcpower.core.scenarios._create_power_plot") as mock_plot:
+            runner._create_scenario_plots(results)
+            mock_plot.assert_called_once()
+
+    def test_plot_with_correction(self):
+        model = MagicMock()
+        model.power = 80.0
+        runner = ScenarioRunner(model)
+
+        results = {
+            "analysis_type": "sample_size",
+            "scenarios": {
+                "optimistic": {
+                    "model": {
+                        "target_tests": ["x1"],
+                        "correction": "bonferroni",
+                    },
+                    "results": {
+                        "sample_sizes_tested": [50, 100],
+                        "powers_by_test": {"x1": [50.0, 85.0]},
+                        "powers_by_test_corrected": {"x1": [40.0, 75.0]},
+                        "first_achieved": {"x1": 100},
+                        "first_achieved_corrected": {"x1": 150},
+                    },
+                },
+            },
+        }
+
+        with patch("mcpower.core.scenarios._create_power_plot") as mock_plot:
+            runner._create_scenario_plots(results)
+            # Should be called for both uncorrected and corrected
+            assert mock_plot.call_count == 2
+
+    def test_no_plot_when_missing_sample_sizes(self):
+        model = MagicMock()
+        model.power = 80.0
+        runner = ScenarioRunner(model)
+
+        results = {
+            "scenarios": {
+                "optimistic": {
+                    "results": {"powers_by_test": {"x1": [50.0]}},
+                },
+            },
+        }
+
+        with patch("mcpower.core.scenarios._create_power_plot") as mock_plot:
+            runner._create_scenario_plots(results)
+            mock_plot.assert_not_called()
diff --git a/tests/unit/test_simulation_coverage.py b/tests/unit/test_simulation_coverage.py
new file mode 100644
index 0000000..8bae7ff
--- /dev/null
+++ b/tests/unit/test_simulation_coverage.py
@@ -0,0 +1,270 @@
+"""Tests for simulation.py — failure handling, Wald fallback, verbose diagnostics, ICC mismatch."""
+
+import warnings
+from unittest.mock import MagicMock, patch
+
+import numpy as np
+import pytest
+
+from mcpower.core.simulation import SimulationMetadata, SimulationRunner, _warn_icc_mismatch
+
+
+def _make_metadata(
+    n_targets=2,
+    cluster_specs=None,
+    verbose=False,
+    correction_method=0,
+):
+    """Create a minimal SimulationMetadata for testing."""
+    return SimulationMetadata(
+        target_indices=np.arange(n_targets),
+        n_non_factor_vars=n_targets,
+        correlation_matrix=np.eye(n_targets),
+        var_types=np.zeros(n_targets, dtype=np.int64),
+        var_params=np.zeros(n_targets, dtype=np.float64),
+        factor_specs=[],
+        upload_normal_values=np.zeros((2, 2), dtype=np.float64),
+        upload_data_values=np.zeros((2, 2), dtype=np.float64),
+        effect_sizes=np.array([0.5] * n_targets),
+        correction_method=correction_method,
+        cluster_specs=cluster_specs or {},
+        verbose=verbose,
+    )
+
+
+def _noop_perturbations(corr, types, config, seed):
+    return corr, types
+
+
+class TestAllSimulationsFail:
+    """When all simulations return None, RuntimeError should be raised."""
+
+    def test_all_fail_raises(self):
+        runner = SimulationRunner(n_simulations=5, seed=42)
+        metadata = _make_metadata()
+
+        with patch.object(runner, "_single_simulation", return_value=None):
+            with pytest.raises(RuntimeError, match="All simulations failed"):
+                runner.run_power_simulations(
+                    sample_size=100,
+                    metadata=metadata,
+                    generate_y_func=MagicMock(),
+                    analyze_func=MagicMock(),
+                    create_X_extended_func=MagicMock(),
+                    apply_perturbations_func=_noop_perturbations,
+                )
+
+
+class TestLMEThresholdExceeded:
+    """LME failure rate exceeding threshold raises RuntimeError."""
+
+    def test_high_failure_rate_raises(self):
+        runner = SimulationRunner(n_simulations=10, seed=42, max_failed_simulations=0.05)
+        metadata = _make_metadata(cluster_specs={"school": {"n_clusters": 5, "cluster_size": 10}})
+
+        call_count = [0]
+
+        def sometimes_fail(*args, **kwargs):
+            call_count[0] += 1
+            if call_count[0] <= 5:
+                return None  # 5 out of 10 fail = 50%
+            return (np.array([1, 1, 1]), np.array([1, 1, 1]), False)
+
+        with patch.object(runner, "_single_simulation", side_effect=sometimes_fail):
+            with pytest.raises(RuntimeError, match="Too many failed simulations"):
+                runner.run_power_simulations(
+                    sample_size=100,
+                    metadata=metadata,
+                    generate_y_func=MagicMock(),
+                    analyze_func=MagicMock(),
+                    create_X_extended_func=MagicMock(),
+                    apply_perturbations_func=_noop_perturbations,
+                )
+
+
+class TestOLSHighFailureWarns:
+    """OLS high failure rate warns but doesn't raise."""
+
+    def test_ols_warns_above_10_percent(self):
+        runner = SimulationRunner(n_simulations=10, seed=42)
+        metadata = _make_metadata()  # No cluster_specs = OLS
+
+        call_count = [0]
+
+        def sometimes_fail(*args, **kwargs):
+            call_count[0] += 1
+            if call_count[0] <= 2:
+                return None  # 2 out of 10 fail = 20%
+            return (np.array([1, 1, 1]), np.array([1, 1, 1]))
+
+        with patch.object(runner, "_single_simulation", side_effect=sometimes_fail):
+            with warnings.catch_warnings(record=True) as w:
+                warnings.simplefilter("always")
+                runner.run_power_simulations(
+                    sample_size=100,
+                    metadata=metadata,
+                    generate_y_func=MagicMock(),
+                    analyze_func=MagicMock(),
+                    create_X_extended_func=MagicMock(),
+                    apply_perturbations_func=_noop_perturbations,
+                )
+            assert any("failed" in str(warning.message).lower() for warning in w)
+
+
+class TestWaldFallbackWarning:
+    """Warn if >10% iterations use Wald test."""
+
+    def test_wald_warning_above_threshold(self):
+        runner = SimulationRunner(n_simulations=10, seed=42)
+        metadata = _make_metadata()
+
+        call_count = [0]
+
+        def wald_heavy(*args, **kwargs):
+            call_count[0] += 1
+            # All return wald_flag=True
+            return (np.array([1, 1, 1]), np.array([1, 1, 1]), True)
+
+        with patch.object(runner, "_single_simulation", side_effect=wald_heavy):
+            with warnings.catch_warnings(record=True) as w:
+                warnings.simplefilter("always")
+                result = runner.run_power_simulations(
+                    sample_size=100,
+                    metadata=metadata,
+                    generate_y_func=MagicMock(),
+                    analyze_func=MagicMock(),
+                    create_X_extended_func=MagicMock(),
+                    apply_perturbations_func=_noop_perturbations,
+                )
+            assert any("Wald test fallback" in str(warning.message) for warning in w)
+            assert result["n_wald_fallbacks"] == 10
+
+
+class TestVerboseDiagnostics:
+    """Verbose mode collects diagnostics and failure reasons."""
+
+    def test_verbose_success_collects_diagnostics(self):
+        runner = SimulationRunner(n_simulations=3, seed=42)
+        metadata = _make_metadata(verbose=True)
+
+        def verbose_result(*args, **kwargs):
+            return {
+                "results": (np.array([1, 1, 1]), np.array([1, 1, 1])),
+                "diagnostics": {"icc_estimated": 0.2},
+                "wald_fallback": False,
+            }
+
+        with patch.object(runner, "_single_simulation", side_effect=verbose_result):
+            result = runner.run_power_simulations(
+                sample_size=100,
+                metadata=metadata,
+                generate_y_func=MagicMock(),
+                analyze_func=MagicMock(),
+                create_X_extended_func=MagicMock(),
+                apply_perturbations_func=_noop_perturbations,
+            )
+        assert "diagnostics" in result
+        assert len(result["diagnostics"]) == 3
+
+    def test_verbose_failure_tracking(self):
+        runner = SimulationRunner(n_simulations=5, seed=42)
+        metadata = _make_metadata(verbose=True)
+
+        call_count = [0]
+
+        def mixed_results(*args, **kwargs):
+            call_count[0] += 1
+            if call_count[0] <= 2:
+                return {"failed": True, "failure_reason": "Convergence failed"}
+            return {
+                "results": (np.array([1, 1, 1]), np.array([1, 1, 1])),
+                "diagnostics": {},
+                "wald_fallback": False,
+            }
+
+        with patch.object(runner, "_single_simulation", side_effect=mixed_results):
+            with warnings.catch_warnings():
+                warnings.simplefilter("ignore", UserWarning)
+                result = runner.run_power_simulations(
+                    sample_size=100,
+                    metadata=metadata,
+                    generate_y_func=MagicMock(),
+                    analyze_func=MagicMock(),
+                    create_X_extended_func=MagicMock(),
+                    apply_perturbations_func=_noop_perturbations,
+                )
+        assert "failure_reasons" in result
+        assert result["failure_reasons"]["Convergence failed"] == 2
+
+    def test_verbose_none_tracking(self):
+        """None results in verbose mode are tracked as unknown failures."""
+        runner = SimulationRunner(n_simulations=3, seed=42)
+        metadata = _make_metadata(verbose=True)
+
+        call_count = [0]
+
+        def mixed(*args, **kwargs):
+            call_count[0] += 1
+            if call_count[0] == 1:
+                return None
+            return {
+                "results": (np.array([1, 1, 1]), np.array([1, 1, 1])),
+                "diagnostics": {},
+                "wald_fallback": False,
+            }
+
+        with patch.object(runner, "_single_simulation", side_effect=mixed):
+            with warnings.catch_warnings():
+                warnings.simplefilter("ignore", UserWarning)
+                result = runner.run_power_simulations(
+                    sample_size=100,
+                    metadata=metadata,
+                    generate_y_func=MagicMock(),
+                    analyze_func=MagicMock(),
+                    create_X_extended_func=MagicMock(),
+                    apply_perturbations_func=_noop_perturbations,
+                )
+        assert "Unknown (returned None)" in result["failure_reasons"]
+
+
+class TestICCMismatchWarning:
+    """ICC mismatch warning when estimated ICC differs by >50%."""
+
+    def test_large_mismatch_warns(self):
+        metadata = _make_metadata(
+            cluster_specs={"school": {"icc": 0.2, "n_clusters": 20, "cluster_size": 10}},
+        )
+        with warnings.catch_warnings(record=True) as w:
+            warnings.simplefilter("always")
+            _warn_icc_mismatch(metadata, mean_estimated_icc=0.05)  # 75% deviation
+        assert any("differs from specified" in str(warning.message) for warning in w)
+
+    def test_within_tolerance_no_warning(self):
+        metadata = _make_metadata(
+            cluster_specs={"school": {"icc": 0.2, "n_clusters": 20, "cluster_size": 10}},
+        )
+        with warnings.catch_warnings(record=True) as w:
+            warnings.simplefilter("always")
+            _warn_icc_mismatch(metadata, mean_estimated_icc=0.18)  # 10% deviation
+        icc_warnings = [x for x in w if "differs from specified" in str(x.message)]
+        assert len(icc_warnings) == 0
+
+    def test_zero_estimated_icc_no_warning(self):
+        metadata = _make_metadata(
+            cluster_specs={"school": {"icc": 0.2, "n_clusters": 20, "cluster_size": 10}},
+        )
+        with warnings.catch_warnings(record=True) as w:
+            warnings.simplefilter("always")
+            _warn_icc_mismatch(metadata, mean_estimated_icc=0.0)
+        icc_warnings = [x for x in w if "differs from specified" in str(x.message)]
+        assert len(icc_warnings) == 0
+
+    def test_no_icc_in_spec_no_warning(self):
+        metadata = _make_metadata(
+            cluster_specs={"school": {"icc": None, "n_clusters": 20, "cluster_size": 10}},
+        )
+        with warnings.catch_warnings(record=True) as w:
+            warnings.simplefilter("always")
+            _warn_icc_mismatch(metadata, mean_estimated_icc=0.5)
+        icc_warnings = [x for x in w if "differs from specified" in str(x.message)]
+        assert len(icc_warnings) == 0
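Taken together, the failure-handling tests pin down a three-tier policy: all failures are fatal, LME failure rates above the configured threshold are fatal, and OLS failure rates above 10% only warn. A compact sketch of that policy — an illustrative helper, not the `SimulationRunner` source, with the thresholds taken from the tests above:

```python
# Hedged sketch of the failure policy the tests encode.
import warnings

def check_failures(n_failed, n_total, is_lme, max_failed_fraction=0.05):
    if n_failed == n_total:
        raise RuntimeError("All simulations failed")
    rate = n_failed / n_total
    if is_lme and rate > max_failed_fraction:
        raise RuntimeError(f"Too many failed simulations: {n_failed}/{n_total}")
    if not is_lme and rate > 0.10:
        warnings.warn(f"{n_failed}/{n_total} simulations failed")

check_failures(n_failed=2, n_total=10, is_lme=False)  # warns, does not raise
```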
diff --git a/tests/unit/test_test_formula_utils.py b/tests/unit/test_test_formula_utils.py
new file mode 100644
index 0000000..f3db882
--- /dev/null
+++ b/tests/unit/test_test_formula_utils.py
@@ -0,0 +1,319 @@
+"""Tests for test_formula parsing utilities."""
+
+from collections import OrderedDict
+from unittest.mock import MagicMock
+
+import numpy as np
+
+
+class TestExtractTestFormulaEffects:
+    """Test _extract_test_formula_effects helper."""
+
+    def _make_registry(
+        self,
+        effect_names,
+        factor_names=None,
+        factor_dummies=None,
+        cluster_effect_names=None,
+    ):
+        """Create a minimal mock registry for testing."""
+        reg = MagicMock()
+        reg.effect_names = effect_names
+        reg.factor_names = factor_names or []
+        reg.cluster_effect_names = cluster_effect_names or []
+
+        # Build _effects dict with correct ordering
+        effects = OrderedDict()
+        for name in effect_names:
+            eff = MagicMock()
+            eff.effect_type = "interaction" if ":" in name else "main"
+            effects[name] = eff
+        reg._effects = effects
+
+        # Factor dummies
+        reg._factor_dummies = factor_dummies or {}
+        return reg
+
+    def test_simple_subset(self):
+        """y ~ x1 + x2 from generation y ~ x1 + x2 + x3."""
+        from mcpower.utils.test_formula_utils import _extract_test_formula_effects
+
+        registry = self._make_registry(["x1", "x2", "x3"])
+        effects, random_effects = _extract_test_formula_effects("y ~ x1 + x2", registry)
+        assert effects == ["x1", "x2"]
+        assert random_effects == []
+
+    def test_single_variable(self):
+        """y ~ x1 from generation y ~ x1 + x2 + x3."""
+        from mcpower.utils.test_formula_utils import _extract_test_formula_effects
+
+        registry = self._make_registry(["x1", "x2", "x3"])
+        effects, random_effects = _extract_test_formula_effects("y ~ x1", registry)
+        assert effects == ["x1"]
+
+    def test_with_interaction(self):
+        """y ~ x1 + x2 + x1:x2 from generation y ~ x1 + x2 + x3 + x1:x2."""
+        from mcpower.utils.test_formula_utils import _extract_test_formula_effects
+
+        registry = self._make_registry(["x1", "x2", "x3", "x1:x2"])
+        effects, _ = _extract_test_formula_effects("y ~ x1 + x2 + x1:x2", registry)
+        assert effects == ["x1", "x2", "x1:x2"]
+
+    def test_interaction_omitted(self):
+        """y ~ x1 + x2 from generation y ~ x1 + x2 + x1:x2."""
+        from mcpower.utils.test_formula_utils import _extract_test_formula_effects
+
+        registry = self._make_registry(["x1", "x2", "x1:x2"])
+        effects, _ = _extract_test_formula_effects("y ~ x1 + x2", registry)
+        assert effects == ["x1", "x2"]
+
+    def test_factor_expands_to_dummies(self):
+        """y ~ x1 + gender from generation y ~ x1 + x2 + gender."""
+        from mcpower.utils.test_formula_utils import _extract_test_formula_effects
+
+        registry = self._make_registry(
+            ["x1", "x2", "gender[F]", "gender[Other]"],
+            factor_names=["gender"],
+            factor_dummies={
+                "gender[F]": {"factor_name": "gender", "level": "F"},
+                "gender[Other]": {"factor_name": "gender", "level": "Other"},
+            },
+        )
+        effects, _ = _extract_test_formula_effects("y ~ x1 + gender", registry)
+        assert effects == ["x1", "gender[F]", "gender[Other]"]
+
+    def test_factor_omitted(self):
+        """y ~ x1 from generation y ~ x1 + gender."""
+        from mcpower.utils.test_formula_utils import _extract_test_formula_effects
+
+        registry = self._make_registry(
+            ["x1", "gender[F]", "gender[Other]"],
+            factor_names=["gender"],
+            factor_dummies={
+                "gender[F]": {"factor_name": "gender", "level": "F"},
+                "gender[Other]": {"factor_name": "gender", "level": "Other"},
+            },
+        )
+        effects, _ = _extract_test_formula_effects("y ~ x1", registry)
+        assert effects == ["x1"]
+
+    def test_with_random_effects(self):
+        """y ~ x1 + (1|school) extracts random effects."""
+        from mcpower.utils.test_formula_utils import _extract_test_formula_effects
+
+        registry = self._make_registry(["x1", "x2"])
+        effects, random_effects = _extract_test_formula_effects(
+            "y ~ x1 + (1|school)", registry
+        )
+        assert effects == ["x1"]
+        assert len(random_effects) == 1
+        assert random_effects[0]["grouping_var"] == "school"
+
+    def test_star_operator_expands(self):
+        """y ~ x1*x2 expands to x1 + x2 + x1:x2."""
+        from mcpower.utils.test_formula_utils import _extract_test_formula_effects
+
+        registry = self._make_registry(["x1", "x2", "x3", "x1:x2"])
+        effects, _ = _extract_test_formula_effects("y ~ x1*x2", registry)
+        assert effects == ["x1", "x2", "x1:x2"]
+
+    def test_equals_sign_formula(self):
+        """y = x1 + x2 works same as y ~ x1 + x2."""
+        from mcpower.utils.test_formula_utils import _extract_test_formula_effects
+
+        registry = self._make_registry(["x1", "x2", "x3"])
+        effects, _ = _extract_test_formula_effects("y = x1 + x2", registry)
+        assert effects == ["x1", "x2"]
+
+    def test_preserves_registry_order(self):
+        """Effects returned in registry order, not formula order."""
+        from mcpower.utils.test_formula_utils import _extract_test_formula_effects
+
+        registry = self._make_registry(["x1", "x2", "x3", "x1:x2"])
+        # Formula lists x2 before x1
+        effects, _ = _extract_test_formula_effects("y ~ x2 + x1", registry)
+        assert effects == ["x1", "x2"]  # registry order preserved
+
+
+class TestComputeTestColumnIndices:
+    """Test _compute_test_column_indices helper."""
+
+    def test_subset_two_of_three(self):
+        """Selecting 2 of 3 effects gives correct indices."""
+        from mcpower.utils.test_formula_utils import _compute_test_column_indices
+
+        all_effect_names = ["x1", "x2", "x3"]
+        test_effect_names = ["x1", "x2"]
+        result = _compute_test_column_indices(all_effect_names, test_effect_names)
+        assert list(result) == [0, 1]
+
+    def test_skip_middle(self):
+        """Selecting first and last of 3 effects."""
+        from mcpower.utils.test_formula_utils import _compute_test_column_indices
+
+        all_effect_names = ["x1", "x2", "x3"]
+        test_effect_names = ["x1", "x3"]
+        result = _compute_test_column_indices(all_effect_names, test_effect_names)
+        assert list(result) == [0, 2]
+
+    def test_single_effect(self):
+        """Single effect selected."""
+        from mcpower.utils.test_formula_utils import _compute_test_column_indices
+
+        all_effect_names = ["x1", "x2", "x3"]
+        test_effect_names = ["x2"]
+        result = _compute_test_column_indices(all_effect_names, test_effect_names)
+        assert list(result) == [1]
+
+    def test_all_effects_returns_all_indices(self):
+        """Selecting all effects returns full range."""
+        from mcpower.utils.test_formula_utils import _compute_test_column_indices
+
+        all_effect_names = ["x1", "x2", "x3"]
+        test_effect_names = ["x1", "x2", "x3"]
+        result = _compute_test_column_indices(all_effect_names, test_effect_names)
+        assert list(result) == [0, 1, 2]
+
+    def test_with_interactions(self):
+        """Interaction effects have correct indices."""
+        from mcpower.utils.test_formula_utils import _compute_test_column_indices
+
+        all_effect_names = ["x1", "x2", "x3", "x1:x2"]
+        test_effect_names = ["x1", "x2", "x1:x2"]
+        result = _compute_test_column_indices(all_effect_names, test_effect_names)
+        assert list(result) == [0, 1, 3]
+
+
+class TestRemapTargetIndices:
+    """Test _remap_target_indices helper."""
+
+    def test_simple_remap(self):
+        """Target indices remapped to positions within test columns."""
+        from mcpower.utils.test_formula_utils import _remap_target_indices
+
+        # Original target_indices: [0, 1] (x1, x2 in full model)
+        # test_column_indices: [0, 1] (x1, x2 at positions 0, 1 in X_expanded)
+        # In X_test, x1 is at 0, x2 is at 1 -> remapped: [0, 1]
+        original = np.array([0, 1])
+        test_cols = np.array([0, 1])
+        result = _remap_target_indices(original, test_cols)
+        assert list(result) == [0, 1]
+
+    def test_remap_with_gap(self):
+        """Target indices remapped when test columns skip positions."""
+        from mcpower.utils.test_formula_utils import _remap_target_indices
+
+        # Full model: [x1=0, x2=1, x3=2, x1:x2=3]
+        # Test model: [x1=0, x1:x2=3] -> X_test columns at [0, 3]
+        # target_test="x1" -> original target_indices=[0]
+        # In X_test, x1 is at position 0 -> remapped: [0]
+        original = np.array([0])
+        test_cols = np.array([0, 3])
+        result = _remap_target_indices(original, test_cols)
+        assert list(result) == [0]
+
+    def test_remap_target_at_end(self):
+        """Target index that moves to different position in X_test."""
+        from mcpower.utils.test_formula_utils import _remap_target_indices
+
+        # Full model: [x1=0, x2=1, x3=2]
+        # Test model: [x2=1, x3=2] -> test_column_indices=[1, 2]
+        # target_test="x3" -> original target_indices=[2]
+        # In X_test, x3 is at position 1 (second column) -> remapped: [1]
+        original = np.array([2])
+        test_cols = np.array([1, 2])
+        result = _remap_target_indices(original, test_cols)
+        assert list(result) == [1]
+
+
+class TestPrepareMetadataWithTestFormula:
+    """Integration test: prepare_metadata with test_formula_effects."""
+
+    def test_metadata_has_test_indices_when_provided(self):
+        from mcpower import MCPower
+        from mcpower.core.simulation import prepare_metadata
+
+        model = MCPower("y = x1 + x2 + x3")
+        model.set_effects("x1=0.5, x2=0.3, x3=0.2")
+        model._apply()
+
+        metadata = prepare_metadata(model, ["x1", "x2"], test_formula_effects=["x1", "x2"])
+        assert metadata.test_column_indices is not None
+        assert list(metadata.test_column_indices) == [0, 1]
+        assert metadata.test_target_indices is not None
+        assert metadata.test_effect_count == 2
+
+    def test_metadata_no_test_indices_by_default(self):
+        from mcpower import MCPower
+        from mcpower.core.simulation import prepare_metadata
+
+        model = MCPower("y = x1 + x2")
+        model.set_effects("x1=0.5, x2=0.3")
+        model._apply()
+
+        metadata = prepare_metadata(model, ["x1", "x2"])
+        assert metadata.test_column_indices is None
+
+    def test_remap_skips_targets_not_in_test_formula(self):
+        from mcpower import MCPower
+        from mcpower.core.simulation import prepare_metadata
+
+        model = MCPower("y = x1 + x2 + x3")
+        model.set_effects("x1=0.5, x2=0.3, x3=0.2")
+        model._apply()
+
+        # target_tests = all 3, but test formula only has x1, x2
+        metadata = prepare_metadata(model, ["x1", "x2", "x3"], test_formula_effects=["x1", "x2"])
+        # test_target_indices should only have indices for x1 and x2 in X_test
+        assert len(metadata.test_target_indices) == 2
+
+
+class TestParseTargetTestsWithTestFormula:
+    """Test _parse_target_tests limits 'all' when test_formula is active."""
+
+    def test_all_expands_to_test_formula_effects_only(self):
+        from mcpower import MCPower
+
+        model = MCPower("y = x1 + x2 + x3")
+        model.set_effects("x1=0.5, x2=0.3, x3=0.2")
+        model._apply()
+
+        result = model._parse_target_tests("all", test_formula_effects=["x1", "x2"])
+        assert "x3" not in result
+        assert "x1" in result
+        assert "x2" in result
+        assert "overall" in result
+
+    def test_explicit_target_not_in_test_formula_raises(self):
+        import pytest
+
+        from mcpower import MCPower
+
+        model = MCPower("y = x1 + x2 + x3")
+        model.set_effects("x1=0.5, x2=0.3, x3=0.2")
+        model._apply()
+
+        with pytest.raises(ValueError, match="x3"):
+            model._parse_target_tests("x3", test_formula_effects=["x1", "x2"])
+
+    def test_overall_always_allowed(self):
+        from mcpower import MCPower
+
+        model = MCPower("y = x1 + x2 + x3")
+        model.set_effects("x1=0.5, x2=0.3, x3=0.2")
+        model._apply()
+
+        result = model._parse_target_tests("overall", test_formula_effects=["x1", "x2"])
+        assert "overall" in result
+
+    def test_no_test_formula_uses_all_effects(self):
+        from mcpower import MCPower
+
+        model = MCPower("y = x1 + x2 + x3")
+        model.set_effects("x1=0.5, x2=0.3, x3=0.2")
+        model._apply()
+
+        result = model._parse_target_tests("all")
+        assert "x1" in result
+        assert "x2" in result
+        assert "x3" in result
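The index helpers exercised above reduce to positional lookups; a plain-Python sketch consistent with the assertions (the real `_compute_test_column_indices` / `_remap_target_indices` may be vectorized differently):

```python
# Hedged sketch of the index bookkeeping the tests pin down.
import numpy as np

def compute_test_column_indices(all_names, test_names):
    # position of each tested effect within the full design
    return np.array([all_names.index(n) for n in test_names])

def remap_target_indices(original, test_cols):
    cols = list(test_cols)
    # keep only targets present in the test design, at their new positions
    return np.array([cols.index(i) for i in original if i in cols])

assert list(compute_test_column_indices(["x1", "x2", "x3"], ["x1", "x3"])) == [0, 2]
assert list(remap_target_indices(np.array([2]), np.array([1, 2]))) == [1]
```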
diff --git a/tests/unit/test_updates.py b/tests/unit/test_updates.py
index ee7a2a0..90b3003 100644
--- a/tests/unit/test_updates.py
+++ b/tests/unit/test_updates.py
@@ -101,14 +101,17 @@ def test_shows_warning_when_newer(self, monkeypatch):
         """Show warning when PyPI version is newer."""
         monkeypatch.delenv("_MCPOWER_UPDATE_CHECKED", raising=False)
 
-        # Write a cache file at the path the installed module actually reads from
-        from datetime import datetime
         import mcpower.utils.updates as upd_mod
+
+        # Reset the module-level dedup flag
+        upd_mod._already_checked = False
+
+        # Write a cache file at the path the module actually reads from
+        from datetime import datetime
         from pathlib import Path
 
-        cache_path = Path(upd_mod.__file__).parent.parent / ".mcpower_cache.json"
-        cache_path.parent.mkdir(exist_ok=True)
+        cache_path = Path.home() / ".cache" / "mcpower" / "update_cache.json"
+        cache_path.parent.mkdir(parents=True, exist_ok=True)
         cache_data = {
             "last_check": datetime.now().isoformat(),
             "latest_version": "99.0.0",
@@ -120,5 +123,6 @@ def test_shows_warning_when_newer(self, monkeypatch):
             with pytest.warns(match="NEW MCPower VERSION"):
                 _check_for_updates("1.0.0")
         finally:
-            # Clean up the cache file
+            # Clean up the cache file and reset flag
            cache_path.unlink(missing_ok=True)
+            upd_mod._already_checked = False
diff --git a/tests/unit/test_upload_data_utils.py b/tests/unit/test_upload_data_utils.py
new file mode 100644
index 0000000..c498b6a
--- /dev/null
+++ b/tests/unit/test_upload_data_utils.py
@@ -0,0 +1,62 @@
+"""Unit tests for mcpower.utils.upload_data_utils — normalize_upload_input."""
+
+import numpy as np
+import pytest
+
+from mcpower.utils.upload_data_utils import normalize_upload_input
+
+
+class TestNormalizeUploadInput:
+    """Tests for normalize_upload_input."""
+
+    def test_dict_input(self):
+        data = {"x1": [1.0, 2.0, 3.0], "x2": [4.0, 5.0, 6.0]}
+        arr, cols = normalize_upload_input(data)
+        assert cols == ["x1", "x2"]
+        assert arr.shape == (3, 2)
+        np.testing.assert_array_equal(arr[:, 0], [1.0, 2.0, 3.0])
+
+    def test_dict_with_strings(self):
+        data = {"group": ["a", "b", "a"], "x1": [1.0, 2.0, 3.0]}
+        arr, cols = normalize_upload_input(data)
+        assert arr.dtype == object
+        assert cols == ["group", "x1"]
+
+    def test_list_input(self):
+        data = [1.0, 2.0, 3.0]
+        arr, cols = normalize_upload_input(data)
+        assert arr.shape == (3, 1)
+        assert cols == ["column_1"]
+
+    def test_1d_array(self):
+        data = np.array([1.0, 2.0, 3.0])
+        arr, cols = normalize_upload_input(data)
+        assert arr.shape == (3, 1)
+        assert cols == ["column_1"]
+
+    def test_2d_array(self):
+        data = np.array([[1.0, 2.0], [3.0, 4.0]])
+        arr, cols = normalize_upload_input(data)
+        assert arr.shape == (2, 2)
+        assert cols == ["column_1", "column_2"]
+
+    def test_2d_array_with_columns(self):
+        data = np.array([[1.0, 2.0], [3.0, 4.0]])
+        arr, cols = normalize_upload_input(data, columns=["a", "b"])
+        assert cols == ["a", "b"]
+
+    def test_dataframe_input(self):
+        pd = pytest.importorskip("pandas")
+        df = pd.DataFrame({"x1": [1.0, 2.0], "x2": [3.0, 4.0]})
+        arr, cols = normalize_upload_input(df)
+        assert cols == ["x1", "x2"]
+        assert arr.shape == (2, 2)
+
+    def test_mismatched_columns_raises(self):
+        data = np.array([[1.0, 2.0], [3.0, 4.0]])
+        with pytest.raises(ValueError, match="columns length"):
+            normalize_upload_input(data, columns=["a", "b", "c"])
+
+    def test_unsupported_type_raises(self):
+        with pytest.raises(TypeError, match="data must be"):
+            normalize_upload_input("not valid data")
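A usage sketch of the contract these tests establish: `normalize_upload_input` always returns a 2-D array plus column names, generating `column_N` names when none are supplied (inputs here are illustrative):

```python
import numpy as np
from mcpower.utils.upload_data_utils import normalize_upload_input

# Dict input: column names come from the keys
arr, cols = normalize_upload_input({"x1": [1.0, 2.0, 3.0], "x2": [4.0, 5.0, 6.0]})
assert arr.shape == (3, 2) and cols == ["x1", "x2"]

# 1-D array input: reshaped to a single column with a generated name
arr, cols = normalize_upload_input(np.array([1.0, 2.0, 3.0]))
assert arr.shape == (3, 1) and cols == ["column_1"]
```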
diff --git a/tests/unit/test_utils_mixed_models.py b/tests/unit/test_utils_mixed_models.py
new file mode 100644
index 0000000..de4d98a
--- /dev/null
+++ b/tests/unit/test_utils_mixed_models.py
@@ -0,0 +1,27 @@
+"""Tests for mcpower.utils.mixed_models backward-compat re-exports."""
+
+import threading
+
+from mcpower.utils.mixed_models import (
+    _lme_analysis_wrapper,
+    _lme_thread_local,
+    reset_warm_start_cache,
+)
+
+
+class TestReExports:
+    """Verify that the backward-compatibility re-exports resolve correctly."""
+
+    def test_lme_analysis_wrapper_is_callable(self):
+        assert callable(_lme_analysis_wrapper)
+
+    def test_lme_thread_local_is_threading_local(self):
+        assert isinstance(_lme_thread_local, threading.local)
+
+    def test_reset_warm_start_cache_is_callable(self):
+        assert callable(reset_warm_start_cache)
+
+    def test_reset_warm_start_cache_clears_params(self):
+        _lme_thread_local.warm_start_params = "dummy"
+        reset_warm_start_cache()
+        assert _lme_thread_local.warm_start_params is None