diff --git a/test-env-optimization/murally-ci-build-times-research.md b/test-env-optimization/murally-ci-build-times-research.md new file mode 100644 index 0000000..94d7ab0 --- /dev/null +++ b/test-env-optimization/murally-ci-build-times-research.md @@ -0,0 +1,150 @@ +# Murally CI Build Times Research + +**Date:** 2026-02-06 +**Workflow:** `CI (if needed)` (ID: 33654084) +**Repository:** `tactivos/murally` +**Period:** 2026-01-12 to 2026-02-06 (~25 days) + +--- + +## Data Collection + +- **Source:** GitHub Actions API via `gh` CLI +- **Total unique runs collected:** 2,794 +- **Successful runs with valid duration:** 1,553 +- **Failed runs:** 773 +- **Cancelled runs:** 468 +- **Success rate:** 55.6% + +Duration is measured as `updated_at - run_started_at` for completed, successful runs only. + +--- + +## Summary Statistics + +### All Successful Runs (n=1,553) + +| Metric | Value | +|-------------|-----------| +| Min | 0.43 min | +| Max | 56.90 min | +| Mean | 7.90 min | +| Std Dev | 5.92 min | +| Median (P50) | 8.97 min | +| P75 | 11.15 min | +| P90 | 13.78 min | +| **P95** | **18.47 min** | +| P99 | 23.62 min | + +### Full CI Builds Only (>=2 min, n=1,107) + +446 runs (28.7%) completed in under 2 minutes (mean 0.67 min). These are short-circuit/skip runs where CI determined no build was needed. 
Excluding those:

| Metric | Value |
|-------------|-----------|
| Min | 2.12 min |
| Max | 56.90 min |
| **Mean** | **10.81 min** |
| **Std Dev** | **4.44 min** |
| Median (P50) | 10.10 min |
| P75 | 12.13 min |
| P90 | 15.12 min |
| **P95** | **21.60 min** |
| P99 | 24.67 min |

---

## Histogram — Full CI Builds (>=2 min)

```
  Duration   | Distribution                                                Count
 ------------|------------------------------------------------------------------
   2- 4 min  | ########                                                       48
   4- 6 min  | ########                                                       48
   6- 8 min  | #################                                              97
   8- 10 min | ############################################################  338
  10- 12 min | ###################################################           288
  12- 14 min | ##########################                                    147
  14- 16 min | #######                                                        43
  16- 18 min | ##                                                             16
  18- 20 min | ###                                                            22
  20- 22 min | ##                                                             16
  22- 24 min | #####                                                          30
  24- 26 min | #                                                               7
  26- 28 min |                                                                 3
  28- 30 min |                                                                 2
  30- 32 min |                                                                 1
  56- 58 min |                                                                 1  (outlier)
```

**Distribution shape:** The bulk of builds (~79%) fall in the 6–14 minute range, with a clear mode at 8–12 minutes. There is a long right tail, with occasional builds extending past 20 minutes.
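The duration and percentile figures above can be reproduced from the raw run data. A minimal sketch: the `run_started_at`/`updated_at` field names match the Actions API payload, but the nearest-rank percentile method is an assumption, since the exact percentile definition used for the tables isn't stated.

```python
import math
from datetime import datetime

ISO = "%Y-%m-%dT%H:%M:%SZ"

def duration_minutes(run: dict) -> float:
    """Build duration in minutes: updated_at - run_started_at."""
    start = datetime.strptime(run["run_started_at"], ISO)
    end = datetime.strptime(run["updated_at"], ISO)
    return (end - start).total_seconds() / 60.0

def percentile(values, p):
    """Nearest-rank percentile (no interpolation)."""
    ordered = sorted(values)
    rank = max(1, math.ceil(p / 100 * len(ordered)))
    return ordered[rank - 1]

# Two example runs; the real analysis feeds in ~1,500 successful runs.
runs = [
    {"run_started_at": "2026-01-15T10:00:00Z", "updated_at": "2026-01-15T10:10:06Z"},
    {"run_started_at": "2026-01-15T11:00:00Z", "updated_at": "2026-01-15T11:21:36Z"},
]
durations = [duration_minutes(r) for r in runs]
full_builds = [d for d in durations if d >= 2.0]  # drop short-circuit/skip runs
```

Filtering at the 2-minute threshold before computing percentiles is what separates the "All Successful Runs" table from the "Full CI Builds" table.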
+ +--- + +## Daily Average Build Time + +| Date | Avg Build Time | Run Count | +|-----------|---------------|-----------| +| 2026-01-12 | 7.78 min | 47 | +| 2026-01-13 | 8.65 min | 86 | +| 2026-01-14 | 8.13 min | 76 | +| 2026-01-15 | 9.66 min | 103 | +| 2026-01-16 | 9.01 min | 90 | +| 2026-01-17 | 10.29 min | 7 | +| 2026-01-19 | 10.45 min | 49 | +| 2026-01-20 | 9.66 min | 104 | +| 2026-01-21 | 8.60 min | 71 | +| 2026-01-22 | 7.78 min | 56 | +| 2026-01-23 | 7.84 min | 69 | +| 2026-01-26 | 8.04 min | 48 | +| 2026-01-27 | 8.70 min | 89 | +| 2026-01-28 | 6.48 min | 58 | +| 2026-01-29 | 5.42 min | 77 | +| 2026-01-30 | 6.63 min | 65 | +| 2026-01-31 | 5.51 min | 5 | +| 2026-02-01 | 7.19 min | 2 | +| 2026-02-02 | 6.23 min | 86 | +| 2026-02-03 | 8.25 min | 98 | +| 2026-02-04 | 6.24 min | 97 | +| 2026-02-05 | 7.62 min | 80 | +| 2026-02-06 | 6.32 min | 90 | + +**Trend:** Average build times appear to have decreased slightly from mid-January (~9-10 min) to early February (~6-8 min). Weekends (Jan 17-18, Jan 31-Feb 1) show very low volume. + +--- + +## Run Outcome Breakdown + +| Conclusion | Count | Percentage | +|------------|-------|------------| +| Success | 1,553 | 55.6% | +| Failure | 773 | 27.7% | +| Cancelled | 468 | 16.7% | + +--- + +## Key Findings + +1. **Typical build time is ~10 minutes.** The median full build takes 10.10 minutes, with most builds completing in the 8–12 minute range. + +2. **P95 is ~21.6 minutes for full builds.** 1 in 20 builds takes over 21 minutes. This is roughly 2x the median, suggesting queue contention or resource constraints on slower runs. + +3. **28.7% of CI runs are skipped** (< 2 min), meaning the "CI (if needed)" conditional logic is working — almost a third of pushes don't trigger a full build. + +4. **~45% of runs fail or get cancelled.** The 27.7% failure rate and 16.7% cancellation rate are notable and may warrant separate investigation. + +5. 
**The 22-24 min bucket has a small secondary bump (30 runs)**, which could indicate a subset of builds hitting a specific bottleneck (e.g., runner queuing, or a specific test suite timing out and retrying).

6. **One extreme outlier at ~57 minutes** — likely a runner issue or resource starvation event.

7. **Build times trended down ~20-30% from mid-January to early February**, from averages of 9-10 min to 6-8 min. This could reflect infrastructure improvements, caching improvements, or changes in PR volume/complexity.

---

## Methodology Notes

- Data was collected via the GitHub Actions API (`gh api repos/tactivos/murally/actions/workflows/...`), paginated in 100-run batches.
- The GitHub API caps results at 1,000 per query window; three overlapping queries were used to cover the full period, then deduplicated by run ID.
- Duration = `updated_at - run_started_at` (excludes queue wait time before the run starts).
- Only `conclusion: success` runs are included in the timing analysis. Failed/cancelled runs were excluded since their durations don't reflect full build time.
- Runs with duration > 180 minutes were filtered out as data anomalies.

diff --git a/test-env-optimization/pr-volume-risk-analysis.md b/test-env-optimization/pr-volume-risk-analysis.md
new file mode 100644
index 0000000..abf19a7
--- /dev/null
+++ b/test-env-optimization/pr-volume-risk-analysis.md
@@ -0,0 +1,279 @@
# Pull Request Volume & Auto-Provisioning Risk Analysis

**Author**: Willis Kirkham
**Analysis Date**: February 3, 2026
**Data Period**: February 2025 - February 2026 (52 weeks)

## Executive Summary

Auto-provisioning test environments for all PRs would result in approximately **41 concurrent environments on average**, with peaks reaching **69 environments**. This is well within current infrastructure capacity.
| Metric | Value |
|--------|-------|
| Average concurrent environments | 41 |
| Peak concurrent environments | 69 |
| P95 concurrent environments | 55 |
| Cross-repo deduplication savings | 9.4% |

**Key assumption**: Environments expire after 7 days of inactivity (current TTL policy). See [Appendix B](#appendix-b-ttl-configuration-and-impact) for details.

**Baseline validation**: The cluster currently has 32 legitimate environments (28 PR-linked + 4 pinned). Scaling the 28 PR-linked environments from today's ~50% opt-in to 100% implies ~56 concurrent environments, within the projected range (41 average, 69 peak).

**Recommendation**: Proceed with auto-provisioning. No infrastructure scaling required.

---

## The Question

**Decision**: Should we auto-provision test environments for all PRs, rather than requiring manual provisioning?

**Current state**: Approximately 50% of PRs receive test environments (those where developers manually request them).

**Proposed state**: 100% of PRs receive test environments automatically.

**Stakes**: If the concurrent environment count exceeded infrastructure capacity (~70-100 environments), we would face provisioning delays, increased costs, or service degradation.
+ +--- + +## Key Findings + +### Concurrent Environment Projections + +Analysis of 8,746 hourly data points projects the following concurrent environment counts: + +| Metric | Value | Interpretation | +|--------|-------|----------------| +| **Average** | 41 | Typical infrastructure load | +| **Median (P50)** | 42 | Half of all hours below this | +| **P95** | 55 | 95% of all hours below this | +| **Peak** | 69 | Maximum observed (July 29, 2025) | + +### Infrastructure Assessment + +| Resource | Current Capacity | Required (peak + 25% headroom) | Status | +|----------|------------------|--------------------------------|--------| +| K8s nodes | ~70-100 envs | ~85 envs | ✓ Sufficient | +| MongoDB | ~100 connections | ~85 connections | ✓ Sufficient | +| Azure resources | Current allocation | Minimal increase | ✓ Sufficient | + +Current infrastructure supports the projected load with adequate headroom. + +### Baseline Validation + +A point-in-time measurement (February 3, 2026) cross-referenced cluster namespaces and open PRs: + +| Category | Count | Notes | +|----------|-------|-------| +| PR-linked environments | 28 | Active development (~50% of open PRs) | +| Pinned environments | 4 | Intentionally kept (demos, fixtures) | +| **Total legitimate** | **32** | | + +**Validation**: With 28 PR-linked environments at ~50% opt-in, scaling to 100% opt-in implies ~56 concurrent environments. This is close to the projected average of 41 and well below the projected peak of 69, confirming the model's accuracy. + +**Note**: The cluster also contains 24 orphan namespaces from a cleanup bug that should be addressed separately. See [Appendix C](#appendix-c-orphan-namespace-cleanup) for details. 
+ +--- + +## Supporting Analysis + +### PR Volume + +Over 52 weeks, both repositories show consistent PR creation patterns: + +| Metric | murally | mural-api | Combined | +|--------|---------|-----------|----------| +| Total PRs | 4,304 | 2,390 | 6,694 | +| Unique branches | 4,230 | 2,335 | 6,135* | +| Weekly average | 81.3 | 44.9 | 118.0* | +| Cross-repo matches | - | - | 430 (6.5% of PRs) | + +*After cross-repo deduplication + +### PR Lifespan Distribution + +PR lifespan explains why 118 weekly branches result in only 41 average concurrent environments: + +| Duration | % of PRs | Cumulative | +|----------|----------|------------| +| 0-1 hour | 12.7% | 12.7% | +| 1-4 hours | 14.8% | 27.5% | +| 4-24 hours | 21.0% | 48.5% | +| 1-2 days | 11.5% | 60.0% | +| 2-7 days | 22.2% | 82.2% | +| **>7 days** | **17.8%** | 100% | + +82% of PRs close within 7 days. The remaining 18% have their environments capped by the TTL policy, preventing accumulation. + +### Cross-Repo Deduplication + +When the same branch exists in both murally and mural-api, they share a single environment: + +| Metric | Value | +|--------|-------| +| Max branches active in both repos simultaneously | 10 | +| Average branches active in both repos | 4.3 | +| Deduplication savings | 9.4% of concurrent environments | + +Cross-repo branches represent features spanning both repositories—typically larger changes that take longer to complete. Their longer lifespan means deduplication saves 9.4% of environments despite representing only 6.5% of PRs. + +--- + +## Risk Assessment + +### Capacity Exceeds Projections + +| | | +|---|---| +| **Severity** | Low | +| **Likelihood** | Low | + +Peak concurrent environments (69) are well within infrastructure capacity (~70-100 environments). + +**Monitoring recommendations**: +1. Track concurrent environment count in real-time +2. Alert at 60, 70, and 80 concurrent environments +3. 
No preemptive scaling required + +### TTL Bypass via User Interaction + +| | | +|---|---| +| **Severity** | Low | +| **Likelihood** | Low | + +Users could keep environments alive indefinitely by periodically interacting with them. Current data does not suggest this is a significant pattern. + +**Monitoring recommendations**: +1. Track environment age distribution +2. Consider 14-day absolute TTL cap if abuse is observed + +### Unexpected Cost Increase + +| | | +|---|---| +| **Severity** | Low | +| **Likelihood** | Low | + +With concurrent environments similar to current levels, cost increase is minimal. + +--- + +## Recommendations + +### Infrastructure + +1. **No scaling required** — current capacity supports projected load +2. **Add monitoring** — track concurrent environments and set alerts +3. **Review after 2 weeks** — validate projections against actual data + +### Policy + +1. **Proceed with auto-provisioning** — infrastructure risk is low +2. **Maintain current TTL** — 7-day inactivity TTL is effective +3. **Optional opt-out** — support `[skip-env]` label for PRs that don't need environments + +### Rollout + +1. Enable for both repos simultaneously +2. Monitor for 2 weeks to validate projections +3. Adjust only if needed + +--- + +## Success Criteria + +| Metric | Target | Confidence | +|--------|--------|------------| +| Peak concurrent envs | <85 | High | +| Average concurrent envs | <50 | High | +| Provisioning queue time | <5 min | High | +| Infrastructure scaling needed | No | High | + +--- + +## Appendix A: Methodology + +### Modeling Approach + +Environment lifespan is modeled as `min(PR_lifespan, 7_days)` to reflect the TTL policy. This simulates real-world behavior where environments are deleted after 7 days of inactivity, regardless of PR status. + +Concurrent environments at each hour are calculated by counting open PRs (with TTL-capped lifespans) across both repositories, deduplicating branches that exist in both. 
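The modeling approach above can be expressed as a small simulation. This is a hedged sketch under the stated assumptions, not the actual analysis script; PR records are assumed to be `(repo, branch, opened_at, closed_at)` tuples.

```python
from datetime import datetime, timedelta

TTL = timedelta(days=7)  # current inactivity TTL

def concurrent_envs(prs, at):
    """Count environments alive at time `at`.

    Each environment lives for min(PR lifespan, 7 days), and PRs with the
    same branch name in both repos share one environment (deduplication).
    """
    live = set()
    for _repo, branch, opened, closed in prs:
        env_end = min(closed, opened + TTL)  # TTL caps the lifespan
        if opened <= at < env_end:
            live.add(branch)  # dedup across repos by branch name
    return len(live)

# Toy data: a cross-repo branch counts once; a long-lived PR is TTL-capped.
t0 = datetime(2026, 1, 1)
prs = [
    ("murally",   "feature-x",  t0, t0 + timedelta(days=2)),
    ("mural-api", "feature-x",  t0, t0 + timedelta(days=3)),   # shared env
    ("murally",   "long-lived", t0, t0 + timedelta(days=30)),  # capped at 7d
]
samples = [concurrent_envs(prs, t0 + timedelta(hours=h)) for h in range(24 * 10)]
peak = max(samples)  # 2: feature-x (shared) + long-lived
```

Sampling the real PR data hourly in this way is what produces the average, P95, and peak projections in the Key Findings.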
### Data Sources

- GitHub API for PR creation and close timestamps
- 4,301 murally PRs (Feb 2025 - Feb 2026)
- 2,381 mural-api PRs (Feb 2025 - Feb 2026)
- 8,746 hourly data points generated

### Assumptions

1. Environments expire at the 7-day TTL (no user interaction modeled)
2. All PRs receive auto-provisioned environments
3. Cross-repo branches share a single environment
4. PR close triggers environment deletion

---

## Appendix B: TTL Configuration and Impact

### Current Configuration

| Setting | Value |
|---------|-------|
| TTL duration | 168 hours (7 days) |
| Trigger | Inactivity (time since `lastInteractionAt`) |
| Reset actions | Override secrets, extend environment |
| PR close behavior | Triggers deletion via webhook |
| Protection | `preventDeletion: true` flag exempts environments |

### Why TTL Is Effective

The 7-day TTL caps the "long tail" of PRs that stay open for extended periods:

| Metric | murally | mural-api |
|--------|---------|-----------|
| Raw P90 lifespan | 317 hrs (13 days) | 710 hrs (30 days) |
| Raw P95 lifespan | 1,058 hrs (44 days) | 1,660 hrs (69 days) |
| **Effective lifespan** | **168 hrs (7 days)** | **168 hrs (7 days)** |

### Impact of TTL on Projections

Without the TTL mechanism, concurrent environments would be significantly higher:

| Metric | With TTL | Without TTL |
|--------|----------|-------------|
| Average concurrent | 41 | ~132 |
| Peak concurrent | 69 | ~194 |

The TTL reduces concurrent environments by approximately 68%, making auto-provisioning feasible within current infrastructure capacity.

---

## Appendix C: Orphan Namespace Cleanup

### Problem

The baseline measurement identified 24 orphan namespaces: environments whose TestEnv CRD was cleaned up but whose namespace was never deleted.
+ +| Category | Count | +|----------|-------| +| Legitimate environments | 32 | +| Orphan namespaces | 24 | +| **Total namespaces observed** | **56** | + +### Impact + +- These namespaces consume cluster resources unnecessarily +- They do not affect the auto-provisioning analysis (projections are based on legitimate environments only) +- The discrepancy between namespace count (56) and CRD count (32) initially suggested the model might be underestimating, but investigation confirmed the model is accurate + +### Root Cause + +The `test-envs-operator` successfully deletes TestEnv CRDs when environments become stale, but namespace deletion is failing. This is a separate operational issue from auto-provisioning. + +### Recommendations + +1. **Investigate operator** — determine why namespace deletion fails after CRD cleanup +2. **Clean up orphans** — delete the 24 orphan namespaces to recover resources +3. **Add monitoring** — alert when namespace count diverges from active CRD count
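The monitoring recommendation above (alerting when namespace count diverges from active CRD count) amounts to a set difference. A sketch in which the `test-env-` naming prefix is a hypothetical convention; in practice both name lists would come from the Kubernetes API:

```python
def find_orphan_namespaces(namespaces, testenv_crds, prefix="test-env-"):
    """Return test-environment namespaces with no matching TestEnv CRD.

    These are cleanup candidates; alert if the list is non-empty.
    The `prefix` naming convention is a hypothetical placeholder.
    """
    env_namespaces = {ns for ns in namespaces if ns.startswith(prefix)}
    return sorted(env_namespaces - set(testenv_crds))

# Toy example: one namespace outlived its CRD and shows up as an orphan.
namespaces = ["test-env-feature-x", "test-env-old-branch", "kube-system"]
crds = ["test-env-feature-x"]
orphans = find_orphan_namespaces(namespaces, crds)
print(orphans)  # ['test-env-old-branch']
```

Running a check like this on a schedule would have surfaced the 24 orphans long before the baseline measurement did.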