feat: backend API test harness — 36 tests across 6 trust boundaries #141

Open
MichaelCordner wants to merge 1 commit into main from feat/backend-test-harness

Contributor

MichaelCordner commented Apr 1, 2026

Summary

Adds the first backend test suite for Jentic Mini — a pytest-based test harness that verifies the API's trust boundaries at the HTTP level.

Testing strategy: harness engineering

This is not a traditional unit test suite. Jentic Mini is a harness — it constrains AI agents by injecting credentials they never see, enforcing policies they can't bypass, and tracing every API call they make. The tests verify that the harness itself is trustworthy.

The approach follows harness engineering principles:

  • Test at the HTTP boundary, not internal functions. Every test sends a real HTTP request and asserts on the response. This makes the tests portable — they verify the API contract regardless of the underlying implementation.
  • Real database, no mocks. Each test run creates a temp SQLite DB, runs real Alembic migrations, and exercises real encryption. The only thing skipped is network-dependent startup (catalog refresh, BM25 index, self-registration).
  • Organized by trust boundary, not by source file. Each test file covers one security perimeter: auth, policy, vault, broker, etc.
  • Invariant tests over coverage metrics. We don't aim for line coverage — we aim for "every trust boundary has a test that would catch a violation."
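As a rough illustration of the first two principles, here is a minimal stdlib-only sketch, not the PR's actual code. It assumes a toy WSGI app, a hypothetical `api_keys` table, and a hypothetical `X-Api-Key` header; Jentic Mini's real app, schema, and header names may differ. The test builds a temp SQLite file, drives the app only through request/response, and asserts on status codes.

```python
import json
import sqlite3
import tempfile
from pathlib import Path
from wsgiref.util import setup_testing_defaults


def make_app(db_path):
    """Toy stand-in for the API: every request needs a key stored in SQLite."""
    def app(environ, start_response):
        key = environ.get("HTTP_X_API_KEY", "")  # hypothetical header name
        conn = sqlite3.connect(db_path)
        try:
            known = conn.execute(
                "SELECT 1 FROM api_keys WHERE key = ?", (key,)
            ).fetchone()
        finally:
            conn.close()
        if known:
            status, body = "200 OK", {"results": []}
        else:
            status, body = "401 Unauthorized", {"detail": "unauthorized"}
        payload = json.dumps(body).encode()
        start_response(status, [("Content-Type", "application/json"),
                                ("Content-Length", str(len(payload)))])
        return [payload]
    return app


def request_status(app, path, api_key=None):
    """Send one in-process request and return only the numeric status code."""
    environ = {}
    setup_testing_defaults(environ)
    environ["PATH_INFO"] = path
    if api_key is not None:
        environ["HTTP_X_API_KEY"] = api_key
    captured = {}

    def start_response(status, headers, exc_info=None):
        captured["status"] = int(status.split()[0])

    b"".join(app(environ, start_response))  # drain the response body
    return captured["status"]


def run_boundary_checks():
    """Create a real temp database, then probe the auth perimeter."""
    with tempfile.TemporaryDirectory() as tmp:
        db_path = str(Path(tmp) / "test.db")
        conn = sqlite3.connect(db_path)
        conn.execute("CREATE TABLE api_keys (key TEXT PRIMARY KEY)")
        conn.execute("INSERT INTO api_keys VALUES ('agent-key')")
        conn.commit()
        conn.close()

        app = make_app(db_path)
        return {
            "no_key": request_status(app, "/search"),
            "bogus_key": request_status(app, "/search", "bogus"),
            "agent_key": request_status(app, "/search", "agent-key"),
        }
```

Because the assertions only look at the response, the same checks would keep working if the handler internals were rewritten, which is the portability argument made above.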

What's covered (36 tests, 6 files)

| File | Trust boundary | Key assertions |
| --- | --- | --- |
| test_health_and_meta.py | API contract | /health and /version return correct shapes, version string present, telemetry opt-out works |
| test_auth_boundary.py | Auth perimeter | 401 without key, 401 with bogus key, agent key works for search, agents blocked from credential writes, human session accesses protected endpoints, public paths need no auth |
| test_policy_engine.py | Policy enforcement | System safety rules deny all writes by default, deny sensitive paths, agent allow rules override system deny (first-match-wins), method and path matching semantics |
| test_credential_vault.py | Credential isolation | Write-only invariant: credential values are never returned on GET (single, list, or after create). Values encrypted and stored correctly, api_id binding works |
| test_toolkit_lifecycle.py | Toolkit management | Default toolkit exists, list returns counts (regression #60), key creation and listing, credential count matches total for default toolkit |
| test_broker_contracts.py | Broker routing | Dot-in-host requirement, error response on unknown host, unauthenticated passthrough behavior |
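To make the first-match-wins semantics in the policy row concrete, here is a small sketch under stated assumptions. The `Rule` shape, the sensitive-path pattern, the default-allow fallback for reads, and the name `check_policy_sketch` are all hypothetical and are not taken from the Jentic Mini source. The point is only the ordering: agent rules are evaluated before system safety rules, so an explicit agent allow can override a system deny.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    effect: str                # "allow" or "deny"
    methods: tuple = ()        # empty tuple means "any method"
    path_pattern: str = ".*"   # regular expression searched in the path

    def matches(self, method, path):
        method_ok = not self.methods or method.upper() in self.methods
        return method_ok and re.search(self.path_pattern, path) is not None


# Hypothetical system safety rules: deny sensitive paths, deny all writes.
SYSTEM_RULES = (
    Rule("deny", path_pattern=r"/(admin|secrets?)\b"),
    Rule("deny", methods=("POST", "PUT", "PATCH", "DELETE")),
)


def check_policy_sketch(agent_rules, method, path):
    """Return the effect of the first matching rule; agent rules come first."""
    for rule in (*agent_rules, *SYSTEM_RULES):
        if rule.matches(method, path):
            return rule.effect
    return "allow"  # assumption: unmatched requests (plain reads) are allowed


# An agent rule that allows one specific write.
allow_item_writes = Rule("allow", methods=("POST",), path_pattern=r"^/v1/items$")
```

With no agent rules, `check_policy_sketch([], "POST", "/v1/items")` hits the system write-deny rule; passing `[allow_item_writes]` makes the same call match the agent rule first.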

What's NOT yet covered

This is a starting point. The following trust boundaries need tests in follow-up PRs:

  • Broker credential injection — verifying that the right credential gets injected for the right host/toolkit combination (requires upstream mock or simulate mode)
  • Broker fail-closed behavior — verifying that exceptions during credential resolution or policy check result in denial, not passthrough (relates to phase 1 of #95, "Security: Broker catch-all proxies unregistered operations with credentials, bypassing RBAC policies")
  • Policy enforcement at the broker level — API-level tests where a credentialled broker call is denied by policy rules (vs the pure-function tests we have now)
  • IP allowlisting — per-key CIDR restriction enforcement
  • Key revocation — revoked keys immediately rejected
  • Credential update/delete — PATCH and DELETE contract tests
  • Toolkit CRUD — create custom toolkit, bind credentials, delete
  • Access request flow — agent requests permission, human approves, policy applied
  • Workflow execution — Arazzo workflow dispatch through the broker
  • Search ranking — BM25 results are relevant and ordered correctly
  • Rate limiting — when implemented

Infrastructure

  • tests/conftest.py — temp SQLite DB, minimal test lifespan (migrations only), admin session + agent key fixtures
  • requirements.txt — adds pytest>=8.0,<9 and pytest-asyncio>=0.24,<1
  • .github/workflows/ci-backend.yml — new CI job, path-filtered to src/, tests/, alembic/, requirements.txt
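The temp-database part of the fixture setup might look roughly like the sketch below. It is a stand-in, not the contents of tests/conftest.py: the `MIGRATIONS` tuple replaces the real Alembic migrations, and the table and column names are invented for illustration. In a pytest conftest.py the body of `temp_database` would typically sit under a `@pytest.fixture` decorator instead of `contextlib.contextmanager`.

```python
import contextlib
import sqlite3
import tempfile
from pathlib import Path

# Hypothetical stand-in for migrations; the PR runs real Alembic migrations.
MIGRATIONS = (
    "CREATE TABLE toolkits (id TEXT PRIMARY KEY, name TEXT NOT NULL)",
    "CREATE TABLE credentials ("
    "id TEXT PRIMARY KEY, "
    "toolkit_id TEXT REFERENCES toolkits(id), "
    "encrypted_value BLOB NOT NULL)",
    "INSERT INTO toolkits (id, name) VALUES ('default', 'Default')",
)


@contextlib.contextmanager
def temp_database():
    """Yield a connection to a fresh, migrated SQLite file; clean up afterwards."""
    with tempfile.TemporaryDirectory() as tmp:
        conn = sqlite3.connect(Path(tmp) / "test.db")
        try:
            for statement in MIGRATIONS:
                conn.execute(statement)
            conn.commit()
            yield conn
        finally:
            # Close before the temporary directory is removed.
            conn.close()
```

Each use gets its own file, so tests cannot leak state into one another through the database.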

How to run

pip install -r requirements.txt
python -m pytest tests/ -v

36 tests, 0.63 seconds, zero network calls.

Test plan

  • All 36 tests pass locally
  • CI backend-tests job passes on this PR
  • Docker build still works (test deps don't affect production image)

Adds a pytest-based test harness that verifies Jentic Mini's API
contracts at the HTTP boundary. Uses a real temp SQLite DB with real
Alembic migrations — no mocking of the database or vault.

Test files organized by trust boundary:
- test_health_and_meta: /health and /version contract shapes
- test_auth_boundary: 401/403 perimeter, agent vs human session
- test_policy_engine: _check_policy pure function + system safety rules
- test_credential_vault: write-only invariant (values never returned)
- test_toolkit_lifecycle: CRUD, key management, credential counts (#60)
- test_broker_contracts: dot-in-host routing, error responses

Infrastructure:
- tests/conftest.py: temp DB, minimal test lifespan, auth fixtures
- pytest + pytest-asyncio added to requirements.txt
- ci-backend.yml: runs on src/tests/alembic changes, path-filtered

36 tests, 0.63 seconds, zero network calls.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>