187 changes: 187 additions & 0 deletions docs/product/e2e-attribution.md
@@ -0,0 +1,187 @@
# E2E-to-code attribution

How Terrain links e2e tests back to the source code they exercise —
and the explicit limit of that linking in 0.2.

## The problem

Unit and integration tests *import* the code they exercise. Their
imports become edges in Terrain's import graph, and from those edges
Terrain can answer: "if I change `src/auth/login.ts`, which tests
should I run?" The graph traversal is sound; the imports are
ground truth; the attribution is precise.
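
A minimal sketch of that traversal, assuming the import graph is held as a map from each file to the files it imports. The `ImportGraph` type, the `AffectedTests` function, and the `isTest` predicate are hypothetical names for illustration, not Terrain's internals.

```go
package main

import (
	"sort"
	"strings"
)

// ImportGraph maps a file to the files it imports (assumed shape).
type ImportGraph map[string][]string

// AffectedTests returns every test file that transitively imports changed.
func AffectedTests(g ImportGraph, changed string) []string {
	// Invert the edges once so we can walk from a source file to its importers.
	importers := make(map[string][]string)
	for file, deps := range g {
		for _, dep := range deps {
			importers[dep] = append(importers[dep], file)
		}
	}

	// Breadth-first walk over the reverse edges.
	seen := map[string]bool{changed: true}
	queue := []string{changed}
	var tests []string
	for len(queue) > 0 {
		cur := queue[0]
		queue = queue[1:]
		for _, imp := range importers[cur] {
			if seen[imp] {
				continue
			}
			seen[imp] = true
			if isTest(imp) {
				tests = append(tests, imp)
			}
			queue = append(queue, imp)
		}
	}
	sort.Strings(tests)
	return tests
}

// isTest is a stand-in for real test detection.
func isTest(path string) bool {
	return strings.Contains(path, ".test.") || strings.Contains(path, ".spec.")
}
```

A spec with no import edges, such as an e2e file, can never be reached by this walk no matter which source file changes.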

E2E tests don't work that way. A Playwright spec navigates to
`http://localhost:3000/login`, types into a form, clicks a button,
and asserts on rendered DOM. Nothing in that file imports
`src/auth/login.ts`. The test exercises the same code path that
unit tests exercise — but the link between the two is not in the
import graph.

This is a real limit, not a Terrain shortcoming. Every static-
analysis tool faces it: e2e tests are deliberately decoupled from
implementation so they survive refactors, and that decoupling
removes the import-graph signal Terrain relies on for unit /
integration attribution.

## What 0.2 does

Terrain attributes e2e tests to code units only via **structural**
signals. Adopters should read these as best-effort heuristics, not
as ground truth.

### 1. Path co-location

If `e2e/login.spec.ts` lives next to a feature directory whose
sibling unit tests link to specific code units, e2e attribution
borrows those links transitively. Confidence: **medium**. Common
case: a monorepo packages directory where each feature owns both
its unit tests and its e2e specs.

### 2. Declared associations in framework configs

Playwright and Cypress projects sometimes declare which routes /
features each spec exercises (via `testMatch` patterns in the config
or `describe()` titles in the spec). When parseable, Terrain folds those
declarations into the link set. Confidence: **medium-high** when
the config is explicit, **low** when only the test name carries
the signal.

### 3. Shared fixture paths

If an e2e spec imports a fixture file (page-object, test-data,
auth-helper), and that fixture is imported by other tests with
known code-unit links, Terrain transitively links the e2e spec to
those code units. Confidence: **low** — fixtures are often shared
across many features and the transitive link can be loose.
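
One way to picture the transitive step, as a sketch under assumptions: `importsOf` maps each test file to the fixtures it imports, and `knownLinks` maps tests to code units already established through the import graph. Both maps and the `FixtureLinks` function are hypothetical, not Terrain's data model.

```go
package main

import "sort"

// FixtureLinks returns the code units an e2e spec borrows from other tests
// that share at least one fixture with it.
func FixtureLinks(spec string, importsOf, knownLinks map[string][]string) []string {
	fixtures := make(map[string]bool)
	for _, f := range importsOf[spec] {
		fixtures[f] = true
	}

	units := make(map[string]bool)
	for test, imports := range importsOf {
		if test == spec {
			continue
		}
		for _, f := range imports {
			if fixtures[f] {
				// Shared fixture found: borrow every known link of that test.
				for _, u := range knownLinks[test] {
					units[u] = true
				}
				break
			}
		}
	}

	out := make([]string, 0, len(units))
	for u := range units {
		out = append(out, u)
	}
	sort.Strings(out)
	return out
}
```

The sketch also shows why confidence is low: a fixture imported by tests for many features pulls in all of their code units at once.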

### 4. Convention-based mapping (last resort)

For repos without explicit configs or co-location structure,
Terrain falls back to convention: the `e2e/auth/` directory is
assumed to exercise `src/auth/`. Confidence: **low**, marked as
`structural-only` in evidence.
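
A minimal sketch of such a convention, assuming specs live under a top-level `e2e/` directory and source under `src/`. The `ConventionTarget` function is hypothetical, and the file-name fallback is an assumption chosen to match the `e2e/checkout.spec.ts` example later on this page.

```go
package main

import (
	"path"
	"strings"
)

// ConventionTarget guesses the source directory an e2e spec exercises.
// It returns false when the path does not follow the assumed layout.
func ConventionTarget(specPath string) (string, bool) {
	parts := strings.Split(path.Clean(specPath), "/")
	if len(parts) < 2 || parts[0] != "e2e" {
		return "", false
	}
	feature := parts[1]
	if len(parts) == 2 {
		// e2e/checkout.spec.ts: use the file name up to the first dot.
		feature, _, _ = strings.Cut(feature, ".")
	}
	if feature == "" {
		return "", false
	}
	return "src/" + feature + "/", true
}
```

Nothing here inspects what the spec does, which is why the resulting link carries the lowest confidence.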

## What 0.2 explicitly does NOT do

These are out of scope and remain so until 0.3 or later. We
document them up front so adopters don't infer guarantees that
aren't there.

### Runtime trace ingestion

A natural way to close the gap is to run the e2e suite once with
coverage instrumentation, capture which source lines each spec
hits, and use that as ground truth. We don't do this in 0.2. It
requires running tests, which Terrain explicitly does not do (see
[`docs/product/ai-trust-boundary.md`](ai-trust-boundary.md)).
Adopters who want coverage-grade e2e attribution should run their
e2e suite with `--coverage` and feed the resulting LCOV / Istanbul
artifact through Terrain's coverage ingestion path. Terrain will
read it. Terrain will not produce it.
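
For orientation, an LCOV tracefile lists one record per source file: an `SF:` line with the path, `DA:<line>,<hits>` lines, and a closing `end_of_record`. The sketch below extracts the files with at least one executed line. It is not Terrain's ingestion code, `FilesHit` is a hypothetical name, and per-spec attribution would additionally assume one tracefile per spec run.

```go
package main

import (
	"bufio"
	"strconv"
	"strings"
)

// FilesHit returns the source files that have at least one executed line
// according to the given LCOV tracefile contents.
func FilesHit(tracefile string) (map[string]bool, error) {
	hit := make(map[string]bool)
	current := ""
	sc := bufio.NewScanner(strings.NewReader(tracefile))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		switch {
		case strings.HasPrefix(line, "SF:"):
			current = strings.TrimPrefix(line, "SF:")
		case strings.HasPrefix(line, "DA:") && current != "":
			// DA:<line number>,<execution count>[,<checksum>]
			fields := strings.Split(strings.TrimPrefix(line, "DA:"), ",")
			if len(fields) < 2 {
				continue
			}
			if count, err := strconv.Atoi(fields[1]); err == nil && count > 0 {
				hit[current] = true
			}
		case line == "end_of_record":
			current = ""
		}
	}
	return hit, sc.Err()
}
```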

### URL-to-route mapping

A web app's e2e spec navigates to `/login`. The route handler is
in `src/routes/auth.ts:loginHandler`. Linking the two requires
parsing the framework's router config (Express, Next.js, FastAPI,
Rails) and matching URL patterns to handler functions. We don't
do this in 0.2. The router-parsing surface area is large
(every framework has a different shape) and the precision /
recall trade-offs are not yet measured.

### DOM-selector to component mapping

A Playwright spec interacts with `page.locator('[data-testid="login-button"]')`.
The component that renders that button is `src/components/Login.tsx`.
Linking the two requires parsing the test for selectors, parsing
the source for component definitions, and matching them. We don't
do this in 0.2. Test selectors aren't standardized — `data-testid`,
`aria-label`, `id`, role-based, text-based — and matching them
correctly across rebuilt components is research-grade work.

### Cross-language e2e attribution

Consider a Playwright spec written in TypeScript that exercises a Go
backend service over HTTP. The backend handlers are in
`internal/handlers/users.go`. Cross-language linking is not in 0.2.
The import graph crosses ecosystems imperfectly even for unit
tests; for e2e, where the link goes through a network boundary,
we explicitly do not attempt it.

## How this surfaces in output

When Terrain reports impact analysis on a change to source code:

```
terrain report impact --base main
```

The output now distinguishes attribution confidence per test:

```
Recommended Tests
src/auth/__tests__/login.test.ts [exact] (import-graph)
Covers: src/auth/login.ts:loginUser
e2e/auth/login.spec.ts [structural-only] (path co-location)
Covers: src/auth/login.ts (file-level, not symbol-level)
e2e/checkout.spec.ts [convention] (low confidence)
Reason: directory mapping suggests this exercises src/checkout/
```

The `[exact]` / `[structural-only]` / `[convention]` tag is the
attribution-confidence signal. `--explain-selection` (added in
Track 3.2) renders the full reason chain — important for e2e
because the reasons are looser than for unit tests and adopters
need to inspect them.

When `terrain report posture` is invoked, the analysis-completeness
signal flags repos where e2e attribution is the only source of
coverage for a code unit:

```
Posture
coverage_diversity: ELEVATED
e2e/checkout.spec.ts is the only test linked to src/checkout/cart.ts
— but the link is structural-only. Treat this as suggestive,
not as proof of coverage.
```

## Why we ship structural-only attribution at all

The alternative — emitting *no* link for e2e specs — is worse than
shipping low-confidence links that adopters can inspect. With no
link, `terrain report impact` would silently exclude e2e specs
from the recommended-tests set whenever a source file changed,
even when the e2e spec is the only test exercising that path. With
low-confidence links plus an honest `[structural-only]` tag,
adopters see the link and can decide: trust it for now, or run all
e2e specs as a precaution, or invest in coverage instrumentation.

The same principle drives every limit on this page: visible
imperfection beats invisible omission.

## 0.3 roadmap

Order of likely investment:

1. **Coverage-artifact ingestion for e2e** — read LCOV / Istanbul
produced by a coverage-instrumented e2e run and use it as ground
truth, replacing the structural fallback whenever present.
2. **Router config parsing** — Express, Next.js, FastAPI, Rails.
Map URLs in test specs to handler functions in source.
3. **Selector-to-component mapping** — opt-in per repo via a
`.terrain/e2e-config.yaml` that declares the selector
convention used (e.g. `data-testid` only).

None of these are 0.2 work. None will block 0.2. The honest carve-
out documented here is the 0.2 contract.

## Related reading

- [`docs/product/test-type-classification.md`](test-type-classification.md)
— how Terrain decides a test is e2e in the first place
- [`docs/product/impact-analysis-model.md`](impact-analysis-model.md)
— full impact-analysis architecture
- [`docs/architecture/04-deterministic-test-identity.md`](../architecture/04-deterministic-test-identity.md)
— test-identity model that makes attribution stable across runs
184 changes: 184 additions & 0 deletions docs/product/test-type-classification.md
@@ -0,0 +1,184 @@
# Test-type classification

How Terrain decides whether a test is a unit test, an integration
test, or an end-to-end test — and the explicit limits of that
classification in 0.2.

## Why this matters

The pitch claims Terrain "maps how your unit, integration, e2e, and
AI tests actually relate to your code." That promise depends on
classifying test files into those four categories accurately. The
launch-readiness review flagged classification as the weakest link
in that promise: the path/suite/framework heuristics worked well for
repos that organize tests by directory, but missed the common case
of integration tests living alongside unit tests in a flat layout
and identifying themselves only through HTTP-testing imports.

Track 3.3 of the 0.2 release plan addressed this gap. This page
documents what now ships and what remains explicitly out of scope
for 0.2.

## What 0.2 detects

Terrain runs three classification passes on each test file, in
order, and merges the results.

### Pass 1 — path / framework / suite name (metadata)

The original heuristic, retained without change:

- Path components: `e2e/`, `integration/`, `unit/`, `__tests__/`,
`smoke/`, `component/`
- File name patterns: `.e2e.`, `.integration.`, `.cy.{js,ts}`
(Cypress), `.spec.{js,ts}` (ambiguous, low-confidence unit)
- Framework hints: `playwright`, `cypress`, `puppeteer`,
`webdriverio`, `testcafe` → e2e; `jest`, `vitest`, `mocha`,
`pytest`, `junit*` → unit (low confidence — these run integration
tests too)
- Suite hierarchy names containing "Integration", "E2E", "Smoke"

Confidence ranges from 0.4 (ambiguous `.spec` extension) to 0.9
(explicit e2e framework).
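
A rough sketch of the path and file-name part of this pass. The `Guess` type and `ClassifyByPath` function are hypothetical. Only the 0.4 and 0.9 confidence values come from the text above; the intermediate value is an assumption made for illustration.

```go
package main

import "strings"

// Guess is a hypothetical result shape for the metadata pass.
type Guess struct {
	TestType   string
	Confidence float64
	Evidence   string
}

// assumedPathConfidence is illustrative only; the documentation states just
// the 0.4 and 0.9 endpoints of the range.
const assumedPathConfidence = 0.7

// ClassifyByPath applies path and file-name heuristics, strongest first.
func ClassifyByPath(p string) Guess {
	lower := strings.ToLower(p)
	switch {
	case strings.Contains(lower, ".cy.js") || strings.Contains(lower, ".cy.ts"):
		return Guess{"e2e", 0.9, "file name: .cy.{js,ts} (Cypress)"}
	case hasSegment(lower, "e2e") || strings.Contains(lower, ".e2e."):
		return Guess{"e2e", assumedPathConfidence, "path or file name: e2e"}
	case hasSegment(lower, "integration") || strings.Contains(lower, ".integration."):
		return Guess{"integration", assumedPathConfidence, "path or file name: integration"}
	case strings.Contains(lower, ".spec.js") || strings.Contains(lower, ".spec.ts"):
		return Guess{"unit", 0.4, "file name: .spec.{js,ts} (ambiguous)"}
	default:
		return Guess{"unknown", 0, "no metadata signal"}
	}
}

// hasSegment reports whether seg is a whole path component of p.
func hasSegment(p, seg string) bool {
	for _, s := range strings.Split(p, "/") {
		if s == seg {
			return true
		}
	}
	return false
}
```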

### Pass 2 — content-based integration libraries (new in 0.2)

Terrain reads each test file once and checks for explicit imports
of HTTP-testing or contract-testing libraries that strongly signal
integration testing:

| Ecosystem | Libraries detected | Confidence |
|-----------|-------------------|------------|
| JS / TS | `supertest`, `nock`, `msw`, `pactum`, `testcontainers` | 0.85–0.9 |
| Go | `net/http/httptest` | 0.9 |
| Python | `requests` (call sites), `httpx`, `responses`, `pact` | 0.85–0.9 |
| Java | `MockMvc`, `RestAssured` | 0.9 |
| Ruby | `rack/test`, `webmock` | 0.85–0.9 |
| Tooling | `dredd`, `testcontainers` | 0.85–0.9 |

A match promotes the test from whatever Pass 1 said to
`integration` with the matched library cited in evidence. When
Pass 1 disagrees (e.g. path says `unit/`), the content-based signal
wins because explicit imports are harder to fake than directory
naming — but the conflict is preserved in evidence so downstream
consumers can see it.

The pattern allowlist lives in
`internal/testtype/integration_imports.go`. Adding a library is the
documented extension point.
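
The promotion and conflict-preservation rule can be sketched as follows. This is not the contents of `integration_imports.go`: the `Classification` and `contentPattern` types, the two sample patterns, and the evidence strings are all assumptions made for illustration.

```go
package main

import "strings"

// Classification is a hypothetical carrier for the merged result.
type Classification struct {
	TestType   string
	Confidence float64
	Evidence   []string
}

// contentPattern is an assumed shape for one allowlist entry.
type contentPattern struct {
	Substring  string
	Library    string
	Confidence float64
}

var contentPatterns = []contentPattern{
	{Substring: `"net/http/httptest"`, Library: "net/http/httptest", Confidence: 0.9},
	// 0.9 for supertest is an assumed point in the documented 0.85-0.9 range.
	{Substring: `require('supertest')`, Library: "supertest", Confidence: 0.9},
}

// ApplyContentPass promotes prev to integration when the source matches a
// pattern, and records any disagreement with the earlier pass in evidence.
func ApplyContentPass(prev Classification, source string) Classification {
	for _, p := range contentPatterns {
		if !strings.Contains(source, p.Substring) {
			continue
		}
		out := Classification{TestType: "integration", Confidence: p.Confidence}
		out.Evidence = append(out.Evidence, prev.Evidence...)
		out.Evidence = append(out.Evidence, "content: imports "+p.Library)
		if prev.TestType != "" && prev.TestType != "integration" {
			out.Evidence = append(out.Evidence, "conflict: metadata pass said "+prev.TestType)
		}
		return out
	}
	return prev
}
```

Keeping the earlier evidence and appending a conflict entry, rather than overwriting, is what lets a downstream consumer see that the path and the imports disagreed.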

### Pass 3 — e2e attribution (structural, see limits below)

E2E tests don't normally import the source units they exercise —
they hit a running browser or HTTP boundary. Terrain attributes e2e
tests to code units only via *structural* signals: shared fixture
paths, file co-location heuristics, and declared associations in
Playwright/Cypress configs. This is honestly weaker than the
import-graph attribution unit and integration tests get. See
[`docs/product/e2e-attribution.md`](e2e-attribution.md) for the
full carve-out.

## What 0.2 does NOT classify

These cases are deliberately out of scope. We document them up front
so adopters don't infer a guarantee that isn't there.

### Property-based tests as a separate category

Property-based tests (`fast-check`, `hypothesis`, `quickcheck`) are
classified as `unit` today even though they're a meaningfully
different shape. Adding `property-based` as a first-class type is
0.3 work — the framework type enum already has the slot
(`FrameworkTypePropertyBased`); the inference rules don't.

### Contract tests vs. integration tests

Pact and Dredd both surface as `integration` today even though
contract testing is a distinct discipline. Splitting them would
require reading the specific contract artifact (pact JSON,
OpenAPI spec) — 0.3 work.

### Mutation tests

`stryker`, `mutmut`, `mutpy` outputs are not classified at all
today. They aren't really tests in the same shape — they exercise
existing tests against mutated source. Out of scope for 0.2's
classification model.

### AI evals as a fourth pillar in classification

AI eval scenarios are tracked separately via the AI surface
inventory and eval ingestion path (`terrain ai list`,
`terrain ai run`); they aren't merged into the unit / integration /
e2e classification because they exercise a different surface
(prompts and tools, not code units). The PR comment surfaces both
sides as adjacent stanzas — see
[`docs/product/unified-pr-comment.md`](unified-pr-comment.md).

## Confidence and conflict reporting

Every classified test case carries:

- `testType` — `unit` / `integration` / `e2e` / `component` /
`smoke` / `unknown`
- `testTypeConfidence` — `[0.0, 1.0]`; values below `0.5` indicate
the inference disagrees with itself or has only weak signals
- `testTypeEvidence` — list of strings citing what fired

When `--explain-selection` (or `terrain explain`) is invoked, this
evidence is rendered alongside the test in the reason chain. False
positives in classification are visible by inspection rather than
hidden inside a black-box label.
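
For consumers reading these fields, a record could look like the sketch below. The field names come from the list above; the `TestCase` struct, its JSON tags, and the `LowConfidence` helper are assumptions, since this page does not specify the serialization.

```go
package main

// TestCase mirrors the three classification fields (assumed Go shape).
type TestCase struct {
	TestType           string   `json:"testType"`
	TestTypeConfidence float64  `json:"testTypeConfidence"`
	TestTypeEvidence   []string `json:"testTypeEvidence"`
}

// LowConfidence reports whether the classification falls below the 0.5
// threshold described above and deserves a look at its evidence.
func (t TestCase) LowConfidence() bool {
	return t.TestTypeConfidence < 0.5
}
```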

## Known false positives

These are the cases adopters are most likely to hit. They're not
bugs we plan to fix in 0.2 — they're the trade-offs of conservative
heuristics. Suppress per-file with a `.terrain/suppressions.yaml`
entry if needed.

- **Unit tests that import `nock` to mock outbound HTTP** — Terrain
classifies these as integration. The import alone signals "this test
cares about HTTP," which is an integration-shaped concern. If the
test is conceptually a unit test, the path/suite name overrides the
content signal only when the path/suite signal is itself highly confident.
- **Python unit tests that import `requests` for type hints only** —
The pattern requires a call site (`requests.get(`, etc.), not a bare
import, so this case should not over-fire. Report a false positive if
it does.
- **Go unit tests in the same package as integration tests** — If
the package has *any* file that imports `net/http/httptest`, that
file is classified integration; sibling unit tests in the same
package are not affected unless they too import httptest.
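
To illustrate the call-site requirement for Python's `requests`, here is a sketch using a regular expression. The exact pattern Terrain uses is not shown on this page; `requestsCallSite`, the method list, and `UsesRequests` are assumptions.

```go
package main

import "regexp"

// requestsCallSite matches calls such as requests.get( but not a bare
// "import requests" or a type hint like requests.Session.
var requestsCallSite = regexp.MustCompile(`\brequests\.(get|post|put|patch|delete|head|options|request)\(`)

// UsesRequests reports whether the Python source calls into requests.
func UsesRequests(source string) bool {
	return requestsCallSite.MatchString(source)
}
```

The leading `\b` keeps identifiers such as `my_requests.get(` from matching, because an underscore counts as a word character.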

## How to extend integration-library detection

If your stack uses a library not on the allowlist, the extension
shape is small:

1. Open `internal/testtype/integration_imports.go`.
2. Add an `integrationImportPattern` entry: substring (with quote /
paren context to avoid matching prose), library name,
confidence (0.85 default; 0.9 for libraries that are
integration-only).
3. Add at least one test in `integration_imports_test.go` that
exercises the new pattern and at least one negative case
(prose mention should not match).
4. Run `make calibrate` to ensure the addition doesn't shift any
existing fixture's classification unexpectedly.
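
As a sketch of step 2, an entry for `dockertest` (a Go library that is not on the allowlist above, used here purely as an example) might look like this. The field names on `integrationImportPattern` and the `Matches` helper are assumptions; check the real struct in `integration_imports.go` before copying.

```go
package main

import "strings"

// integrationImportPattern: assumed field layout, for illustration only.
type integrationImportPattern struct {
	Substring  string  // includes the opening quote so prose mentions don't match
	Library    string  // name cited in evidence
	Confidence float64 // 0.85 default; 0.9 for integration-only libraries
}

// dockertestPattern is a hypothetical new entry.
var dockertestPattern = integrationImportPattern{
	Substring:  `"github.com/ory/dockertest`,
	Library:    "dockertest",
	Confidence: 0.9,
}

// Matches reports whether the test file's source contains the pattern.
func (p integrationImportPattern) Matches(source string) bool {
	return strings.Contains(source, p.Substring)
}
```

The positive and negative cases from step 3 then reduce to one import block that should match and one comment mentioning the library that should not.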

The bar for adding a pattern: the library should be either
purpose-built for integration testing (supertest, httptest) or its
presence in a test file should overwhelmingly indicate the test
crosses a real HTTP / database boundary. Conservative is better
than aggressive — false-positive integration claims distort the
test-system inventory more than false negatives do.

## Related reading

- [`internal/testtype/integration_imports.go`](../../internal/testtype/integration_imports.go)
— the pattern allowlist
- [`docs/product/e2e-attribution.md`](e2e-attribution.md) —
honest carve-out for e2e-to-code-unit linking
- [`docs/release/feature-status.md`](../release/feature-status.md) —
which capabilities are publicly claimable