7xuanlu · 7xuanlu · Apr 28, 2026 · Apr 27, 2026 · Apr 27, 2026 · Apr 27, 2026
diff --git a/.githooks/pre-push b/.githooks/pre-push
@@ -1,10 +1,9 @@
 #!/usr/bin/env bash
 set -euo pipefail
 
-# Skip the heavy workspace clippy + test + coverage gate when the push
-# touches only docs/markdown. The gate exists to catch broken Rust code;
-# docs-only changes can't trigger that, so the gate adds friction without
-# protection here.
+# Skip the workspace clippy + tests when the push touches only docs/markdown.
+# Those checks exist to catch broken Rust code; docs-only changes can't trigger
+# that, so the gate adds friction without protection here.
 ZERO_SHA="0000000000000000000000000000000000000000"
 DOCS_RE='^(README\.md|CHANGELOG\.md|RELEASING\.md|LICENSE|CODE_OF_CONDUCT\.md|CONTRIBUTING\.md|SECURITY\.md|CLAUDE\.md|docs/.*|\.github/.*\.md)$'
 
@@ -28,46 +27,29 @@ if [ -n "$all_changed" ]; then
     non_docs=$(printf '%s\n' "$all_changed" | grep -vE "$DOCS_RE" || true)
     if [ -z "$non_docs" ]; then
         count=$(printf '%s\n' "$all_changed" | wc -l | tr -d ' ')
-        echo "Pre-push: docs-only push ($count file(s)), skipping clippy + tests + coverage."
+        echo "Pre-push: docs-only push ($count file(s)), skipping clippy + tests."
         exit 0
     fi
 fi
 
-echo "Pre-push: running clippy + full test suite with coverage..."
+# Pre-push runs FAST checks: clippy + library tests only. Integration tests
+# (app/tests/eval_harness.rs) need the ONNX BGE model and run for minutes;
+# they belong in CI. Coverage gates are NOT enforced here either — the
+# instrumented `cargo llvm-cov` rebuild can take 5-15min and overload memory.
+# Frontend tests run in CI on every PR. Use `bash scripts/coverage.sh` for a
+# local coverage report. See CLAUDE.md "Local vs CI test responsibilities".
+echo "Pre-push: running fast checks..."
 
 echo "  Running Clippy..."
 cargo clippy --workspace --all-targets -- -D warnings 2>&1 || {
     echo "FAIL: Clippy warnings found. Fix before pushing."
     exit 1
 }
 
-if ! command -v cargo-llvm-cov &> /dev/null && ! cargo llvm-cov --version &> /dev/null 2>&1; then
-    echo "WARNING: cargo-llvm-cov not installed. Running tests without coverage gate."
-    echo "Install with: cargo install cargo-llvm-cov"
-    cargo test --workspace 2>&1 || {
-        echo "FAIL: Rust tests failed."
-        exit 1
-    }
-else
-    echo "  Running Rust tests with coverage..."
-    # Gate on origin-core + origin-server coverage (where testable logic lives).
-    # The app crate is Tauri command proxies — untestable without a GUI runtime.
-    # Tests still run from the app crate (eval_harness etc.) but coverage is
-    # scoped to the library crates via --package flags.
-    cargo llvm-cov \
-        --package origin-core --package origin-server --package origin \
-        --fail-under-lines 90 2>&1 || {
-        echo "FAIL: Rust tests failed or coverage below 90%."
-        echo "Run 'cargo llvm-cov --package origin-core --package origin-server --package origin --html && open target/llvm-cov/html/index.html' to see report."
-        exit 1
-    }
-fi
-
-echo "  Running frontend tests with coverage..."
-pnpm vitest run --coverage 2>&1 || {
-    echo "FAIL: Frontend tests failed or coverage below threshold."
-    echo "Run 'pnpm vitest run --coverage' to see report."
+echo "  Running library tests..."
+cargo test --workspace --lib --quiet 2>&1 || {
+    echo "FAIL: Library tests failed. Fix before pushing."
     exit 1
 }
 
-echo "Pre-push: all checks passed. Safe to push."
+echo "Pre-push: all fast checks passed. Safe to push (CI runs full suite + coverage)."
diff --git a/.github/workflows/coverage.yml b/.github/workflows/coverage.yml
@@ -0,0 +1,98 @@
+name: Coverage (informational)
+
+# Non-blocking coverage report posted to PRs. Pre-push and the main CI lane
+# both skip coverage; this workflow exists to give visibility without slowing
+# the merge gate. See CLAUDE.md "Local vs CI test responsibilities".
+
+on:
+  pull_request:
+    branches: [main]
+    paths-ignore:
+      - '**.md'
+      - 'docs/**'
+      - '.github/ISSUE_TEMPLATE/**'
+      - '.github/pull_request_template.md'
+      - 'LICENSE'
+  workflow_dispatch:
+
+# Don't run on every push — only PRs and manual triggers.
+# Cancel in-flight runs when a new commit lands on the PR.
+concurrency:
+  group: coverage-${{ github.ref }}
+  cancel-in-progress: true
+
+env:
+  CARGO_TERM_COLOR: always
+
+jobs:
+  coverage:
+    name: Coverage report
+    runs-on: macos-latest
+    if: >-
+      !startsWith(github.event.head_commit.message, 'chore(main): release')
+    # Informational: never blocks merges. Failing this job is a warning, not a
+    # gate. Required-status-checks should NOT include this job.
+    continue-on-error: true
+
+    steps:
+      - uses: actions/checkout@v4
+
+      - name: Install Rust stable
+        uses: dtolnay/rust-toolchain@stable
+        with:
+          components: llvm-tools-preview
+
+      - name: Cache Rust dependencies
+        uses: Swatinem/rust-cache@v2
+
+      - name: Cache FastEmbed ONNX model
+        uses: actions/cache@v4
+        with:
+          path: ~/.fastembed_cache
+          key: fastembed-bge-base-en-v1.5-q-v2
+          restore-keys: |
+            fastembed-bge-base-en-v1.5-q-
+
+      - name: Install pnpm
+        uses: pnpm/action-setup@v4
+
+      - name: Install Node.js
+        uses: actions/setup-node@v4
+        with:
+          node-version: 20
+          cache: pnpm
+
+      - name: Install frontend dependencies
+        run: pnpm install
+
+      - name: Create sidecar placeholders for Tauri build script
+        run: |
+          mkdir -p app/binaries
+          touch app/binaries/origin-server-aarch64-apple-darwin
+          touch app/binaries/origin-mcp-aarch64-apple-darwin
+          touch app/binaries/cloudflared-aarch64-apple-darwin
+
+      - name: Install cargo-llvm-cov
+        uses: taiki-e/install-action@cargo-llvm-cov
+
+      - name: Run Rust coverage (origin-core + origin-server)
+        # Skip --package origin (Tauri app); its crate is Tauri command proxies
+        # which can't be exercised meaningfully without a GUI runtime, and
+        # including it explodes memory on the runner.
+        run: |
+          cargo llvm-cov --package origin-core --package origin-server \
+            --summary-only --json --output-path rust-coverage.json
+          cargo llvm-cov --package origin-core --package origin-server \
+            --summary-only
+
+      - name: Run frontend coverage
+        run: pnpm vitest run --coverage
+
+      - name: Upload coverage artifacts
+        uses: actions/upload-artifact@v4
+        with:
+          name: coverage-reports
+          path: |
+            rust-coverage.json
+            coverage/
+          retention-days: 7
diff --git a/CLAUDE.md b/CLAUDE.md
@@ -71,7 +71,37 @@ cargo test -p origin --test eval_harness save_longmemeval_expanded_baseline -- -
 # Baselines saved to app/eval/baselines/*.json (gitignored)
 ```
 
-Frontend tests use Vitest + React Testing Library. Git hooks auto-activate on `pnpm install` -- pre-commit auto-formats and checks compilation, pre-push runs clippy + full tests with 90% coverage gate.
+Frontend tests use Vitest + React Testing Library. Git hooks auto-activate on `pnpm install` -- pre-commit auto-formats and checks compilation, pre-push runs clippy + workspace tests (no coverage gate, see below).
+
+## Local vs CI test responsibilities
+
+Origin runs across several layers. The split is driven by three questions: **(1) Can a hosted runner do this?** (no GPU, no API keys, no cost). **(2) Is it under 60s on cold cache?** **(3) Does it gate correctness or measure quality?** Quality measures never gate.
+
+| Layer | What runs | Where | When | Time | Blocks? |
+|---|---|---|---|---|---|
+| **L1 dev loop** | rust-analyzer / IDE | Local | Every save | <1s | No |
+| **L2 pre-commit** | `cargo fmt --all`, clippy on staged crates, vitest if FE staged | Local | `git commit` | ~5s | Yes |
+| **L3 pre-push** | `cargo clippy --workspace --all-targets`, `cargo test --workspace`, `pnpm vitest run --bail 1` | Local | `git push` | ~60-90s | Yes |
+| **L4 CI on PR** | Same checks workspace-wide, plus `cargo test -p origin --lib`, `pnpm test` | GitHub (`ci.yml`) | Every PR | ~10min | Yes (required) |
+| **L5 coverage on PR** | `cargo llvm-cov` on origin-core + origin-server only; vitest --coverage | GitHub (`coverage.yml`) | Every PR | ~10min | **No (informational)** |
+| **L6 main canary** | Embedding-only eval (`cargo test -p origin-core --lib eval::token_efficiency -- --ignored`) | GitHub (`ci.yml`) | Push to `main` | ~10min | No (post-merge) |
+| **L7 manual local** | `bash scripts/coverage.sh` (HTML coverage), GPU eval suite (`cargo test -- --ignored`), Anthropic batch judge (`ANTHROPIC_API_KEY=... cargo test ...`) | Your laptop | On demand | minutes-hours | No |
+| **L8 pre-release** | Full eval suite vs saved baseline. Record deltas in vault/memory **never git** (AGPL public-repo rule) | Your laptop | Per release | hours | Soft gate |
+
+### What does NOT run in CI and why
+
+- **GPU evals (LongMemEval / LoCoMo runner functions, Qwen3.5-9B inference)** — GitHub macOS runners have no Metal acceleration. The tests are `#[ignore]`d so they don't accidentally run.
+- **Anthropic API batch judge** — costs $0.35/run and requires `ANTHROPIC_API_KEY` which we don't expose to PR runs from forks.
+- **Tauri app coverage** — `--package origin` (the Tauri app crate) is mostly command proxies that can't be exercised without a GUI runtime, and instrumented compilation peaks at 8-16GB RSS. Coverage is scoped to `origin-core + origin-server`.
+
+### Why pre-push doesn't run coverage
+
+Earlier versions of `.githooks/pre-push` enforced a 90% `cargo llvm-cov` gate. That violated the principles above:
+- **Slow:** instrumented rebuild of the Tauri-app-pulling workspace took 5-15min and overloaded memory.
+- **Not mirrored in CI:** the main `ci.yml` lane doesn't run coverage at all, so the gate added local friction without upstream protection.
+- **Percentage gates rot:** any new untestable surface (Tauri commands, GPU-only eval) drops the percentage and forces busywork.
+
+The current pre-push runs only clippy + non-instrumented tests. Coverage is L5 (informational on PR) or L7 (manual command on laptop).
 
 ## Releasing (release-please)