
feat: API key schema isolation — database-level tenant separation #855

Open
salvormallow wants to merge 1 commit into vectorize-io:main from salvormallow:feat/bank-scoped-access-control

@salvormallow

Summary

Adds ApiKeySchemaTenantExtension — a built-in tenant extension that maps API keys to isolated PostgreSQL schemas, providing database-level memory isolation between tenants. Follows the same pattern as SupabaseTenantExtension but uses static API key mapping instead of JWT auth.

Threat model: prompt injection against AI agents

AI agents execute tool calls — including Hindsight recall, retain, and reflect — based on conversation content. A prompt injection delivered via chat message, email, or web search result can trick an agent into querying another tenant's memory banks.

Example attack:

  1. Attacker sends a message to Agent A containing a crafted prompt injection
  2. The injection tricks Agent A into calling hindsight recall --bank tenant_b_bank --query "private data"
  3. Without schema isolation, this succeeds — both banks are in the same schema, and the single API key grants access to everything
  4. Agent A returns Tenant B's private data in its response

Why application-layer bank filtering isn't enough:

RequestContext.allowed_bank_ids exists on the model but is not enforced by the engine. An OperationValidatorExtension could check it, but:

  • Requires configuring two extensions correctly (tenant + validator) — configuring only one gives a false sense of security
  • Fail-open: if allowed_bank_ids is None (the default), all access is granted
  • Internal operations skip tenant auth, so allowed_bank_ids is never set for background tasks
  • A single missed code path in the engine bypasses the check entirely
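The fail-open behavior described above can be illustrated with a minimal sketch. The function below is hypothetical and is not the engine's or any validator's actual code; it only assumes that allowed_bank_ids is an optional list that defaults to None.

```python
from typing import List, Optional


def is_bank_allowed(bank_id: str, allowed_bank_ids: Optional[List[str]]) -> bool:
    """Hypothetical application-layer check (illustration only).

    Mirrors the fail-open default: when no allow-list was set on the
    request context, every bank is treated as accessible.
    """
    if allowed_bank_ids is None:
        # Default value: nothing was configured, so access is granted.
        return True
    return bank_id in allowed_bank_ids
```

With an allow-list of ["tenant_a_bank"], a request for tenant_b_bank is refused, but the same request with allowed_bank_ids=None is granted. Any code path that forgets to populate the list therefore silently grants full access.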

Why schema isolation works:

The API key determines the PostgreSQL schema at authentication time, before any bank lookup or query executes. The SQL itself is scoped via fully-qualified table names. Even a fully compromised agent can only access banks within its assigned schema. Banks from other schemas don't exist in its view of the database.

Attacker → prompt injection → Agent A → hindsight recall --bank tenant_b_bank
                                              ↓
                                API key resolves to schema "tenant_a"
                                              ↓
                                tenant_b_bank doesn't exist in this schema
                                              ↓
                                empty results → attack fails
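A toy model of the flow in the diagram, assuming an in-memory dictionary in place of PostgreSQL. The table name memories, the $1 placeholder style, and both function names are assumptions made for illustration, not the engine's actual SQL.

```python
from typing import Dict, List

# Toy stand-in for PostgreSQL: schema name -> bank id -> stored memories.
STORE: Dict[str, Dict[str, List[str]]] = {
    "tenant_a": {"tenant_a_bank": ["note from A"]},
    "tenant_b": {"tenant_b_bank": ["private data"]},
}


def scoped_sql(schema: str) -> str:
    """Build a schema-qualified query; `memories` is a hypothetical table name."""
    # The schema comes from the authenticated API key, never from request input.
    # The bank id is only ever a bound parameter ($1).
    return f'SELECT content FROM "{schema}".memories WHERE bank_id = $1'


def recall(schema: str, bank_id: str) -> List[str]:
    """Look up a bank inside one schema only, as the qualified SQL would."""
    return list(STORE.get(schema, {}).get(bank_id, []))
```

Calling recall("tenant_a", "tenant_b_bank") returns an empty list because the lookup never leaves the tenant_a schema, whatever bank name the injected prompt supplies.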

How it works

  1. Operator configures key-to-schema mapping via environment variable
  2. Each request authenticates by API key → resolves to a dedicated PostgreSQL schema
  3. All database operations are scoped to that schema
  4. Schemas are auto-created with full table migrations on first access (same as SupabaseTenantExtension)
  5. No separate validator extension needed — isolation is in the database
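The steps above could be sketched roughly as follows. SchemaResolver, AuthenticationError, and the migrate callback are hypothetical names; the real extension's class layout and its run_migration signature are not shown in this PR description, so treat this as a sketch under those assumptions.

```python
from typing import Callable, Dict, Optional, Set


class AuthenticationError(Exception):
    """Raised for a missing or unknown API key (would surface as HTTP 401)."""


class SchemaResolver:
    """Hypothetical sketch: API key -> schema, with lazy one-time migration."""

    def __init__(self, key_map: Dict[str, str], migrate: Callable[[str], None]) -> None:
        self._key_map = dict(key_map)
        self._migrate = migrate          # stand-in for the real migration routine
        self._ready: Set[str] = set()    # schemas already migrated in this process

    def resolve(self, api_key: Optional[str]) -> str:
        if not api_key or api_key not in self._key_map:
            raise AuthenticationError("missing or unknown API key")
        schema = self._key_map[api_key]
        if schema not in self._ready:
            # First authenticated request for this schema: create and migrate it.
            self._migrate(schema)
            self._ready.add(schema)
        return schema
```

The resolver returns exactly one schema per key and runs the migration callback only once per schema, which matches the "auto-created on first access" behavior described in step 4.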

Configuration

HINDSIGHT_API_TENANT_EXTENSION=hindsight_api.extensions.builtin.bank_scoped_tenant:ApiKeySchemaTenantExtension
HINDSIGHT_API_TENANT_KEY_MAP=team_a_key:team_a;team_b_key:team_b

# Optional: prefix all schema names
HINDSIGHT_API_TENANT_SCHEMA_PREFIX=hs    # team_a becomes hs_team_a

# Optional: disable auth for MCP endpoints (falls back to default schema)
HINDSIGHT_API_TENANT_MCP_AUTH_DISABLED=true
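For reference, a minimal sketch of how a key map in this format might be parsed. parse_key_map is a hypothetical helper, not the extension's actual parser; it assumes entries are separated by ";", that the first ":" splits key from schema, and that a non-empty prefix is joined with an underscore, as the comment above suggests.

```python
from typing import Dict


def parse_key_map(raw: str, prefix: str = "") -> Dict[str, str]:
    """Parse 'key_a:schema_a;key_b:schema_b' into {api_key: schema_name}."""
    mapping: Dict[str, str] = {}
    for entry in raw.split(";"):
        entry = entry.strip()
        if not entry:
            continue  # tolerate a trailing ';'
        key, sep, schema = entry.partition(":")
        key, schema = key.strip(), schema.strip()
        if not sep or not key or not schema:
            # Deliberately omit the entry from the message: it may contain a secret.
            raise ValueError("malformed key map entry, expected 'key:schema'")
        if key in mapping:
            raise ValueError("duplicate API key in key map")
        mapping[key] = f"{prefix}_{schema}" if prefix else schema
    return mapping
```

With the example values above, parse_key_map("team_a_key:team_a;team_b_key:team_b", prefix="hs") maps team_a_key to hs_team_a and team_b_key to hs_team_b.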

Design decisions

Opt-in, zero breaking changes. If HINDSIGHT_API_TENANT_EXTENSION is not set, Hindsight uses DefaultTenantExtension — identical to current behavior. Existing deployments are unaffected.

One key = one schema. Each API key maps to exactly one PostgreSQL schema, so a single key cannot access multiple schemas. This is intentional: a compromised key exposes one tenant and nothing else. The TenantContext returns a single schema_name, and the engine scopes all queries to it. Cross-schema queries are not possible without direct Postgres access.

Admin access. There is no "superuser key" that spans all schemas. Operators who need cross-tenant visibility should query Postgres directly or use separate keys per schema. This is a conscious trade-off: admin convenience vs. the guarantee that no single compromised key grants access to all tenants.

MCP auth disabled = default schema only. When mcp_auth_disabled=true (set via HINDSIGHT_API_TENANT_MCP_AUTH_DISABLED), MCP requests skip API key authentication and fall back to the default schema (from HINDSIGHT_API_DATABASE_SCHEMA), not a tenant schema.

Schema name validation. Schema names must be valid Postgres identifiers (letters, digits, underscores). Hyphens, spaces, and names starting with digits are rejected at startup.
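A sketch of a validation rule consistent with the description above. validate_schema_name is a hypothetical helper. Allowing a leading underscore and enforcing PostgreSQL's default 63-byte identifier limit are assumptions; neither is stated in this PR.

```python
import re

# Letters, digits, underscores; must not start with a digit.
_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")

# PostgreSQL truncates identifiers longer than 63 bytes by default.
MAX_IDENTIFIER_LENGTH = 63


def validate_schema_name(name: str) -> str:
    """Return the name unchanged if it is a safe identifier, else raise ValueError."""
    if not _IDENTIFIER.fullmatch(name):
        raise ValueError(f"invalid schema name: {name!r}")
    if len(name) > MAX_IDENTIFIER_LENGTH:
        raise ValueError(f"schema name too long: {name!r}")
    return name
```

Rejecting names at startup, rather than quoting arbitrary strings at query time, keeps the set of possible schema names small and makes it safe to interpolate them into fully-qualified table names.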

Why not allowed_bank_ids + OperationValidatorExtension? See threat model above. Application-layer checks are defense-in-depth, not a security boundary. Schema isolation moves the enforcement into the database where it can't be bypassed by missed code paths.

Files changed

| File | Description |
| --- | --- |
| hindsight-api-slim/.../builtin/bank_scoped_tenant.py | ApiKeySchemaTenantExtension (~170 lines) |
| hindsight-api-slim/tests/test_bank_scoped.py | 20 unit tests + prompt injection defense tests |

Test plan

  • 20 unit tests passing inside Hindsight 0.4.22 container (Python 3.11, pytest 9.0)
  • Integration tested against live Hindsight with real memory banks
  • Verified: schema auto-creation via run_migration works on first authenticated request
  • Verified: key A can retain and recall memories in its own schema
  • Verified: key A gets empty results when querying banks in key B's schema (bank doesn't exist)
  • Verified: key A cannot see data in the public schema (existing deployment data)
  • Verified: an admin key mapped to the public schema reads existing data correctly (like any other key, it sees only that one schema)
  • Verified: unknown keys rejected with 401
  • Verified: existing deployments unaffected (no env var = no change)
  • Verified: production data intact after test (stop/start cycle preserved all memories)

Adds ApiKeySchemaTenantExtension: maps API keys to isolated PostgreSQL
schemas, providing database-level memory isolation between tenants.

Threat model: prompt injection against AI agents. Agents execute tool
calls based on conversation content. A prompt injection can trick an
agent into querying another tenant's banks. Schema isolation scopes all
SQL to the authenticated schema — banks from other schemas don't exist.

Configuration:
  HINDSIGHT_API_TENANT_EXTENSION=...bank_scoped_tenant:ApiKeySchemaTenantExtension
  HINDSIGHT_API_TENANT_KEY_MAP=key_a:schema_a;key_b:schema_b

Follows the SupabaseTenantExtension pattern. Opt-in, zero breaking
changes. Includes 20 tests.
@salvormallow
Author

Dashboard caveat

When ApiKeySchemaTenantExtension (the bank_scoped_tenant module) is active, the control plane dashboard shows no banks because it calls the dataplane API without an API key.

Root cause: hindsight-client.ts reads HINDSIGHT_CP_DATAPLANE_API_KEY at startup. Without it, every SDK call returns Authentication failed: Missing API key. The dashboard's /api/banks route catches this and returns {"error":"Failed to fetch banks from API"}.

Workaround: Set HINDSIGHT_CP_DATAPLANE_API_KEY to one of the tenant API keys from your HINDSIGHT_API_TENANT_KEY_MAP. The dashboard will show that tenant's banks. Example:

HINDSIGHT_CP_DATAPLANE_API_KEY=<one-of-your-tenant-keys>

Longer-term: The dashboard should support multi-tenant awareness — a tenant selector that switches which API key is used for dataplane calls. I'm working on a follow-up PR for this.
