diff --git a/test-env-optimization/murally-ci-build-times-research.md b/test-env-optimization/murally-ci-build-times-research.md new file mode 100644 index 0000000..94d7ab0 --- /dev/null +++ b/test-env-optimization/murally-ci-build-times-research.md @@ -0,0 +1,150 @@ +# Murally CI Build Times Research + +**Date:** 2026-02-06 +**Workflow:** `CI (if needed)` (ID: 33654084) +**Repository:** `tactivos/murally` +**Period:** 2026-01-12 to 2026-02-06 (~25 days) + +--- + +## Data Collection + +- **Source:** GitHub Actions API via `gh` CLI +- **Total unique runs collected:** 2,794 +- **Successful runs with valid duration:** 1,553 +- **Failed runs:** 773 +- **Cancelled runs:** 468 +- **Success rate:** 55.6% + +Duration is measured as `updated_at - run_started_at` for completed, successful runs only. + +--- + +## Summary Statistics + +### All Successful Runs (n=1,553) + +| Metric | Value | +|-------------|-----------| +| Min | 0.43 min | +| Max | 56.90 min | +| Mean | 7.90 min | +| Std Dev | 5.92 min | +| Median (P50) | 8.97 min | +| P75 | 11.15 min | +| P90 | 13.78 min | +| **P95** | **18.47 min** | +| P99 | 23.62 min | + +### Full CI Builds Only (>=2 min, n=1,107) + +446 runs (28.7%) completed in under 2 minutes (mean 0.67 min). These are short-circuit/skip runs where CI determined no build was needed. 
Excluding those:

| Metric | Value |
|-------------|-----------|
| Min | 2.12 min |
| Max | 56.90 min |
| **Mean** | **10.81 min** |
| **Std Dev** | **4.44 min** |
| Median (P50) | 10.10 min |
| P75 | 12.13 min |
| P90 | 15.12 min |
| **P95** | **21.60 min** |
| P99 | 24.67 min |

---

## Histogram — Full CI Builds (>=2 min)

```
  Duration   | Distribution                                                Count
 ------------|------------------------------------------------------------------
   2- 4 min  | ########                                                       48
   4- 6 min  | ########                                                       48
   6- 8 min  | #################                                              97
   8- 10 min | ############################################################  338
  10- 12 min | ###################################################           288
  12- 14 min | ##########################                                    147
  14- 16 min | #######                                                        43
  16- 18 min | ##                                                             16
  18- 20 min | ###                                                            22
  20- 22 min | ##                                                             16
  22- 24 min | #####                                                          30
  24- 26 min | #                                                               7
  26- 28 min |                                                                 3
  28- 30 min |                                                                 2
  30- 32 min |                                                                 1
  56- 58 min |                                                                 1  (outlier)
```

**Distribution shape:** The bulk of builds (~79%) fall in the 6–14 minute range, with a clear mode at 8–12 minutes. There is a long right tail, with occasional builds extending past 20 minutes.
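The duration and percentile figures above can be reproduced from the raw run data. A minimal sketch: the `run_started_at`/`updated_at` field names match the Actions API payload, but the nearest-rank percentile method is an assumption, since the exact percentile definition used for the tables isn't stated.

```python
import math
from datetime import datetime

ISO = "%Y-%m-%dT%H:%M:%SZ"

def duration_minutes(run: dict) -> float:
    """Build duration in minutes: updated_at - run_started_at."""
    start = datetime.strptime(run["run_started_at"], ISO)
    end = datetime.strptime(run["updated_at"], ISO)
    return (end - start).total_seconds() / 60.0

def percentile(values, p):
    """Nearest-rank percentile (no interpolation)."""
    ordered = sorted(values)
    rank = max(1, math.ceil(p / 100 * len(ordered)))
    return ordered[rank - 1]

# Two example runs; the real analysis feeds in ~1,500 successful runs.
runs = [
    {"run_started_at": "2026-01-15T10:00:00Z", "updated_at": "2026-01-15T10:10:06Z"},
    {"run_started_at": "2026-01-15T11:00:00Z", "updated_at": "2026-01-15T11:21:36Z"},
]
durations = [duration_minutes(r) for r in runs]
full_builds = [d for d in durations if d >= 2.0]  # drop short-circuit/skip runs
```

Filtering at the 2-minute threshold before computing percentiles is what separates the "All Successful Runs" table from the "Full CI Builds" table.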
+ +--- + +## Daily Average Build Time + +| Date | Avg Build Time | Run Count | +|-----------|---------------|-----------| +| 2026-01-12 | 7.78 min | 47 | +| 2026-01-13 | 8.65 min | 86 | +| 2026-01-14 | 8.13 min | 76 | +| 2026-01-15 | 9.66 min | 103 | +| 2026-01-16 | 9.01 min | 90 | +| 2026-01-17 | 10.29 min | 7 | +| 2026-01-19 | 10.45 min | 49 | +| 2026-01-20 | 9.66 min | 104 | +| 2026-01-21 | 8.60 min | 71 | +| 2026-01-22 | 7.78 min | 56 | +| 2026-01-23 | 7.84 min | 69 | +| 2026-01-26 | 8.04 min | 48 | +| 2026-01-27 | 8.70 min | 89 | +| 2026-01-28 | 6.48 min | 58 | +| 2026-01-29 | 5.42 min | 77 | +| 2026-01-30 | 6.63 min | 65 | +| 2026-01-31 | 5.51 min | 5 | +| 2026-02-01 | 7.19 min | 2 | +| 2026-02-02 | 6.23 min | 86 | +| 2026-02-03 | 8.25 min | 98 | +| 2026-02-04 | 6.24 min | 97 | +| 2026-02-05 | 7.62 min | 80 | +| 2026-02-06 | 6.32 min | 90 | + +**Trend:** Average build times appear to have decreased slightly from mid-January (~9-10 min) to early February (~6-8 min). Weekends (Jan 17-18, Jan 31-Feb 1) show very low volume. + +--- + +## Run Outcome Breakdown + +| Conclusion | Count | Percentage | +|------------|-------|------------| +| Success | 1,553 | 55.6% | +| Failure | 773 | 27.7% | +| Cancelled | 468 | 16.7% | + +--- + +## Key Findings + +1. **Typical build time is ~10 minutes.** The median full build takes 10.10 minutes, with most builds completing in the 8–12 minute range. + +2. **P95 is ~21.6 minutes for full builds.** 1 in 20 builds takes over 21 minutes. This is roughly 2x the median, suggesting queue contention or resource constraints on slower runs. + +3. **28.7% of CI runs are skipped** (< 2 min), meaning the "CI (if needed)" conditional logic is working — almost a third of pushes don't trigger a full build. + +4. **~45% of runs fail or get cancelled.** The 27.7% failure rate and 16.7% cancellation rate are notable and may warrant separate investigation. + +5. 
**The 22-24 min bucket has a small secondary bump (30 runs)**, which could indicate a subset of builds hitting a specific bottleneck (e.g., runner queuing, or a specific test suite timing out and retrying).

6. **One extreme outlier at ~57 minutes** — likely a runner issue or resource starvation event.

7. **Build times trended down ~20-30% from mid-January to early February**, from averages of 9-10 min to 6-8 min. This could reflect infrastructure improvements, caching improvements, or changes in PR volume/complexity.

---

## Methodology Notes

- Data was collected via the GitHub Actions API (`gh api repos/tactivos/murally/actions/workflows/...`), paginated in 100-run batches.
- The GitHub API caps results at 1,000 per query window; three overlapping queries were used to cover the full period, then deduplicated by run ID.
- Duration = `updated_at - run_started_at` (excludes queue wait time before the run starts).
- Only `conclusion: success` runs are included in the timing analysis. Failed/cancelled runs were excluded since their durations don't reflect full build time.
- Runs with duration > 180 minutes were filtered out as data anomalies.

diff --git a/test-env-optimization/pr-volume-risk-analysis.md b/test-env-optimization/pr-volume-risk-analysis.md
new file mode 100644
index 0000000..abf19a7
--- /dev/null
+++ b/test-env-optimization/pr-volume-risk-analysis.md
@@ -0,0 +1,279 @@
# Pull Request Volume & Auto-Provisioning Risk Analysis

**Author**: Willis Kirkham
**Analysis Date**: February 3, 2026
**Data Period**: February 2025 - February 2026 (52 weeks)

## Executive Summary

Auto-provisioning test environments for all PRs would result in approximately **41 concurrent environments on average**, with peaks reaching **69 environments**. This is well within current infrastructure capacity.
| Metric | Value |
|--------|-------|
| Average concurrent environments | 41 |
| Peak concurrent environments | 69 |
| P95 concurrent environments | 55 |
| Cross-repo deduplication savings | 9.4% |

**Key assumption**: Environments expire after 7 days of inactivity (current TTL policy). See [Appendix B](#appendix-b-ttl-configuration-and-impact) for details.

**Baseline validation**: The cluster currently has 32 legitimate environments (28 PR-linked + 4 pinned). Scaling the 28 PR-linked environments from today's ~50% opt-in to 100% implies ~56 concurrent environments, within the projected range (41 average, 69 peak).

**Recommendation**: Proceed with auto-provisioning. No infrastructure scaling required.

---

## The Question

**Decision**: Should we auto-provision test environments for all PRs, rather than requiring manual provisioning?

**Current state**: Approximately 50% of PRs receive test environments (those where developers manually request them).

**Proposed state**: 100% of PRs receive test environments automatically.

**Stakes**: If the concurrent environment count exceeded infrastructure capacity (~70-100 environments), we would face provisioning delays, increased costs, or service degradation.
+ +--- + +## Key Findings + +### Concurrent Environment Projections + +Analysis of 8,746 hourly data points projects the following concurrent environment counts: + +| Metric | Value | Interpretation | +|--------|-------|----------------| +| **Average** | 41 | Typical infrastructure load | +| **Median (P50)** | 42 | Half of all hours below this | +| **P95** | 55 | 95% of all hours below this | +| **Peak** | 69 | Maximum observed (July 29, 2025) | + +### Infrastructure Assessment + +| Resource | Current Capacity | Required (peak + 25% headroom) | Status | +|----------|------------------|--------------------------------|--------| +| K8s nodes | ~70-100 envs | ~85 envs | ✓ Sufficient | +| MongoDB | ~100 connections | ~85 connections | ✓ Sufficient | +| Azure resources | Current allocation | Minimal increase | ✓ Sufficient | + +Current infrastructure supports the projected load with adequate headroom. + +### Baseline Validation + +A point-in-time measurement (February 3, 2026) cross-referenced cluster namespaces and open PRs: + +| Category | Count | Notes | +|----------|-------|-------| +| PR-linked environments | 28 | Active development (~50% of open PRs) | +| Pinned environments | 4 | Intentionally kept (demos, fixtures) | +| **Total legitimate** | **32** | | + +**Validation**: With 28 PR-linked environments at ~50% opt-in, scaling to 100% opt-in implies ~56 concurrent environments. This is close to the projected average of 41 and well below the projected peak of 69, confirming the model's accuracy. + +**Note**: The cluster also contains 24 orphan namespaces from a cleanup bug that should be addressed separately. See [Appendix C](#appendix-c-orphan-namespace-cleanup) for details. 
+ +--- + +## Supporting Analysis + +### PR Volume + +Over 52 weeks, both repositories show consistent PR creation patterns: + +| Metric | murally | mural-api | Combined | +|--------|---------|-----------|----------| +| Total PRs | 4,304 | 2,390 | 6,694 | +| Unique branches | 4,230 | 2,335 | 6,135* | +| Weekly average | 81.3 | 44.9 | 118.0* | +| Cross-repo matches | - | - | 430 (6.5% of PRs) | + +*After cross-repo deduplication + +### PR Lifespan Distribution + +PR lifespan explains why 118 weekly branches result in only 41 average concurrent environments: + +| Duration | % of PRs | Cumulative | +|----------|----------|------------| +| 0-1 hour | 12.7% | 12.7% | +| 1-4 hours | 14.8% | 27.5% | +| 4-24 hours | 21.0% | 48.5% | +| 1-2 days | 11.5% | 60.0% | +| 2-7 days | 22.2% | 82.2% | +| **>7 days** | **17.8%** | 100% | + +82% of PRs close within 7 days. The remaining 18% have their environments capped by the TTL policy, preventing accumulation. + +### Cross-Repo Deduplication + +When the same branch exists in both murally and mural-api, they share a single environment: + +| Metric | Value | +|--------|-------| +| Max branches active in both repos simultaneously | 10 | +| Average branches active in both repos | 4.3 | +| Deduplication savings | 9.4% of concurrent environments | + +Cross-repo branches represent features spanning both repositories—typically larger changes that take longer to complete. Their longer lifespan means deduplication saves 9.4% of environments despite representing only 6.5% of PRs. + +--- + +## Risk Assessment + +### Capacity Exceeds Projections + +| | | +|---|---| +| **Severity** | Low | +| **Likelihood** | Low | + +Peak concurrent environments (69) are well within infrastructure capacity (~70-100 environments). + +**Monitoring recommendations**: +1. Track concurrent environment count in real-time +2. Alert at 60, 70, and 80 concurrent environments +3. 
No preemptive scaling required + +### TTL Bypass via User Interaction + +| | | +|---|---| +| **Severity** | Low | +| **Likelihood** | Low | + +Users could keep environments alive indefinitely by periodically interacting with them. Current data does not suggest this is a significant pattern. + +**Monitoring recommendations**: +1. Track environment age distribution +2. Consider 14-day absolute TTL cap if abuse is observed + +### Unexpected Cost Increase + +| | | +|---|---| +| **Severity** | Low | +| **Likelihood** | Low | + +With concurrent environments similar to current levels, cost increase is minimal. + +--- + +## Recommendations + +### Infrastructure + +1. **No scaling required** — current capacity supports projected load +2. **Add monitoring** — track concurrent environments and set alerts +3. **Review after 2 weeks** — validate projections against actual data + +### Policy + +1. **Proceed with auto-provisioning** — infrastructure risk is low +2. **Maintain current TTL** — 7-day inactivity TTL is effective +3. **Optional opt-out** — support `[skip-env]` label for PRs that don't need environments + +### Rollout + +1. Enable for both repos simultaneously +2. Monitor for 2 weeks to validate projections +3. Adjust only if needed + +--- + +## Success Criteria + +| Metric | Target | Confidence | +|--------|--------|------------| +| Peak concurrent envs | <85 | High | +| Average concurrent envs | <50 | High | +| Provisioning queue time | <5 min | High | +| Infrastructure scaling needed | No | High | + +--- + +## Appendix A: Methodology + +### Modeling Approach + +Environment lifespan is modeled as `min(PR_lifespan, 7_days)` to reflect the TTL policy. This simulates real-world behavior where environments are deleted after 7 days of inactivity, regardless of PR status. + +Concurrent environments at each hour are calculated by counting open PRs (with TTL-capped lifespans) across both repositories, deduplicating branches that exist in both. 
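The modeling approach above can be expressed as a small simulation. This is a hedged sketch under the stated assumptions, not the actual analysis script; PR records are assumed to be `(repo, branch, opened_at, closed_at)` tuples.

```python
from datetime import datetime, timedelta

TTL = timedelta(days=7)  # current inactivity TTL

def concurrent_envs(prs, at):
    """Count environments alive at time `at`.

    Each environment lives for min(PR lifespan, 7 days), and PRs with the
    same branch name in both repos share one environment (deduplication).
    """
    live = set()
    for _repo, branch, opened, closed in prs:
        env_end = min(closed, opened + TTL)  # TTL caps the lifespan
        if opened <= at < env_end:
            live.add(branch)  # dedup across repos by branch name
    return len(live)

# Toy data: a cross-repo branch counts once; a long-lived PR is TTL-capped.
t0 = datetime(2026, 1, 1)
prs = [
    ("murally",   "feature-x",  t0, t0 + timedelta(days=2)),
    ("mural-api", "feature-x",  t0, t0 + timedelta(days=3)),   # shared env
    ("murally",   "long-lived", t0, t0 + timedelta(days=30)),  # capped at 7d
]
samples = [concurrent_envs(prs, t0 + timedelta(hours=h)) for h in range(24 * 10)]
peak = max(samples)  # 2: feature-x (shared) + long-lived
```

Sampling the real PR data hourly in this way is what produces the average, P95, and peak projections in the Key Findings.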
### Data Sources

- GitHub API for PR creation and close timestamps
- 4,301 murally PRs (Feb 2025 - Feb 2026)
- 2,381 mural-api PRs (Feb 2025 - Feb 2026)
- 8,746 hourly data points generated

### Assumptions

1. Environments expire at the 7-day TTL (no user interaction modeled)
2. All PRs receive auto-provisioned environments
3. Cross-repo branches share a single environment
4. PR close triggers environment deletion

---

## Appendix B: TTL Configuration and Impact

### Current Configuration

| Setting | Value |
|---------|-------|
| TTL duration | 168 hours (7 days) |
| Trigger | Inactivity (time since `lastInteractionAt`) |
| Reset actions | Override secrets, extend environment |
| PR close behavior | Triggers deletion via webhook |
| Protection | `preventDeletion: true` flag exempts environments |

### Why TTL Is Effective

The 7-day TTL caps the "long tail" of PRs that stay open for extended periods:

| Metric | murally | mural-api |
|--------|---------|-----------|
| Raw P90 lifespan | 317 hrs (13 days) | 710 hrs (30 days) |
| Raw P95 lifespan | 1,058 hrs (44 days) | 1,660 hrs (69 days) |
| **Effective lifespan** | **168 hrs (7 days)** | **168 hrs (7 days)** |

### Impact of TTL on Projections

Without the TTL mechanism, concurrent environments would be significantly higher:

| Metric | With TTL | Without TTL |
|--------|----------|-------------|
| Average concurrent | 41 | ~132 |
| Peak concurrent | 69 | ~194 |

The TTL reduces concurrent environments by approximately 68%, making auto-provisioning feasible within current infrastructure capacity.

---

## Appendix C: Orphan Namespace Cleanup

### Problem

The baseline measurement identified 24 orphan namespaces: environments whose TestEnv CRD was cleaned up but whose namespace was never deleted.
+ +| Category | Count | +|----------|-------| +| Legitimate environments | 32 | +| Orphan namespaces | 24 | +| **Total namespaces observed** | **56** | + +### Impact + +- These namespaces consume cluster resources unnecessarily +- They do not affect the auto-provisioning analysis (projections are based on legitimate environments only) +- The discrepancy between namespace count (56) and CRD count (32) initially suggested the model might be underestimating, but investigation confirmed the model is accurate + +### Root Cause + +The `test-envs-operator` successfully deletes TestEnv CRDs when environments become stale, but namespace deletion is failing. This is a separate operational issue from auto-provisioning. + +### Recommendations + +1. **Investigate operator** — determine why namespace deletion fails after CRD cleanup +2. **Clean up orphans** — delete the 24 orphan namespaces to recover resources +3. **Add monitoring** — alert when namespace count diverges from active CRD count
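The monitoring recommendation above (alerting when namespace count diverges from active CRD count) amounts to a set difference. A sketch in which the `test-env-` naming prefix is a hypothetical convention; in practice both name lists would come from the Kubernetes API:

```python
def find_orphan_namespaces(namespaces, testenv_crds, prefix="test-env-"):
    """Return test-environment namespaces with no matching TestEnv CRD.

    These are cleanup candidates; alert if the list is non-empty.
    The `prefix` naming convention is a hypothetical placeholder.
    """
    env_namespaces = {ns for ns in namespaces if ns.startswith(prefix)}
    return sorted(env_namespaces - set(testenv_crds))

# Toy example: one namespace outlived its CRD and shows up as an orphan.
namespaces = ["test-env-feature-x", "test-env-old-branch", "kube-system"]
crds = ["test-env-feature-x"]
orphans = find_orphan_namespaces(namespaces, crds)
print(orphans)  # ['test-env-old-branch']
```

Running a check like this on a schedule would have surfaced the 24 orphans long before the baseline measurement did.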