From 9df20b8c04f658311a05225f94768fbd6a6fb8f9 Mon Sep 17 00:00:00 2001 From: Mark Turansky Date: Fri, 24 Apr 2026 23:44:29 +0000 Subject: [PATCH] docs(iam): add IAM architecture map and consolidation plan Current-state map of all tokens, credentials, service accounts, and auth flows across frontend, backend, operator, control plane, and ambient-api-server. Consolidation plan covering three improvements: 1. Unify identity around RH SSO (token exchange for runners, Keycloak clients for access keys, elimination of RSA exchange hack) 2. DB RBAC as source of truth with K8s reconciliation (Option A) 3. Extend credentials table to replace scattered K8s OAuth secrets Co-Authored-By: Claude Sonnet 4.6 --- .../internal/architecture/iam-architecture.md | 382 ++++++++++++++++ .../proposals/iam-consolidation-plan.md | 409 ++++++++++++++++++ 2 files changed, 791 insertions(+) create mode 100644 docs/internal/architecture/iam-architecture.md create mode 100644 docs/internal/proposals/iam-consolidation-plan.md diff --git a/docs/internal/architecture/iam-architecture.md b/docs/internal/architecture/iam-architecture.md new file mode 100644 index 000000000..c98bd738e --- /dev/null +++ b/docs/internal/architecture/iam-architecture.md @@ -0,0 +1,382 @@ +# Ambient Platform — Full IAM Architecture + +## THE BIG PICTURE (End-to-End Flow) + +``` +╔══════════════════════════════════════════════════════════════════════════════════════╗ +║ IDENTITY ENTRY POINTS ║ +╠══════════════════╦════════════════════════╦═════════════════╦════════════════════════╣ +║ HUMAN (Browser) ║ CLI / SDK USER ║ BOT / API KEY ║ SERVICE (internal) ║ +║ ║ ║ ║ ║ +║ RH SSO / OCP ║ oc whoami -t ║ K8s SA JWT ║ OIDC client creds ║ +║ OAuth login ║ sha256~... 
token ║ (ambient-key-*)║ or AMBIENT_API_TOKEN ║ +╚══════╦═══════════╩═══════════╦════════════╩════════╦════════╩═══════════╦════════════╝ + │ │ │ │ + ▼ │ │ │ +╔══════════════╗ │ │ │ +║ OAuth Proxy ║ │ │ │ +║ (sidecar) ║ │ │ │ +║ ║ │ │ │ +║ Validates ║ │ │ │ +║ OCP token ║ │ │ │ +║ ║ │ │ │ +║ Injects: ║ │ │ │ +║ X-Forwarded-║ │ │ │ +║ User ║ │ │ │ +║ Email ║ │ │ │ +║ Groups ║ │ │ │ +║ Access- ║ │ │ │ +║ Token ║ │ │ │ +╚══════╦═══════╝ │ │ │ + │ │ │ │ + ▼ ▼ │ │ +╔══════════════════════════════════════════╗ │ │ +║ NEXT.JS FRONTEND ║ │ │ +║ (components/frontend) ║ │ │ +║ ║ │ │ +║ buildForwardHeadersAsync() ║ │ │ +║ ┌──────────────────────────────────┐ ║ │ │ +║ │ Reads incoming headers │ ║ │ │ +║ │ Passes through X-Forwarded-* │ ║ │ │ +║ │ Sets BOTH: │ ║ │ │ +║ │ Authorization: Bearer │ ║ │ │ +║ │ X-Forwarded-Access-Token: ... │ ║ │ │ +║ └──────────────────────────────────┘ ║ │ │ +║ ║ │ │ +║ /api/projects/[name]/* → proxy → ║ │ │ +╚══════════════════╦═══════════════════════╝ │ │ + │ │ │ + └───────────────────────────────────┘ │ + │ │ + ▼ ▼ +╔══════════════════════════════════════════════════════════════════════════════════════╗ +║ BACKEND API SERVER ║ +║ (components/backend) SA: backend-api ║ +║ ║ +║ MIDDLEWARE CHAIN (every request): ║ +║ 1. Logger (redacts tokens from logs) ║ +║ 2. forwardedIdentityMiddleware() ║ +║ ├─ X-Forwarded-User → ctx["userID"] ║ +║ ├─ X-Forwarded-Email → ctx["userEmail"] ║ +║ ├─ X-Forwarded-Groups → ctx["userGroups"] ║ +║ ├─ X-Forwarded-Access-Token → ctx["forwardedAccessToken"] ◄── PRIORITY 1 ║ +║ ├─ Authorization: Bearer → ctx["authorizationHeader"] ◄── PRIORITY 2 ║ +║ └─ if no X-Forwarded-User: resolveServiceAccountFromToken() ◄── BOT/API KEY path ║ +║ └─ TokenReview API → extracts SA name → reads annotation for userID ║ +║ 3. CORS ║ +║ 4. 
ValidateProjectContext() (per /api/projects/:name group) ║ +║ ├─ extractRequestToken() (X-Fwd-Access-Token > Bearer > ?token=) ║ +║ ├─ GetK8sClientsForRequest() → user-scoped K8s client ║ +║ ├─ check globalSSARCache (30s TTL, SHA256 keyed) ║ +║ └─ SelfSubjectAccessReview: can user LIST agenticsessions in namespace? ║ +║ └─ 403 if denied, cache result ║ +║ ║ +║ TOKEN TYPES VALIDATED: ║ +║ • sha256~... (OCP tokens) SSAR hit → K8s validates against cluster ║ +║ • eyJ... (K8s SA JWT) SSAR hit → K8s validates against cluster ║ +║ • ghp_... (GitHub PAT) SSAR hit → K8s validates against cluster ║ +║ • generic 20+ chars (bearer) SSAR hit → K8s validates against cluster ║ +║ ║ +║ TWO K8s CLIENTS: ║ +║ • K8sClient / DynamicClient (SA: backend-api) → privileged writes after RBAC check ║ +║ • reqK8s / reqDyn (user token) → SSAR checks, list operations ║ +╚═══════╦════════════════════════╦═════════════════════════════════╦════════════════════╝ + │ │ │ + │ create/update │ credentials stored │ OAuth token exchange + │ AgenticSession CR │ in K8s Secrets │ GitHub, Google, etc. 
+ ▼ ▼ ▼ +╔══════════════╗ ╔═════════════════════════╗ ╔═══════════════════════════════╗ +║ KUBERNETES ║ ║ K8s SECRETS ║ ║ EXTERNAL OAuth PROVIDERS ║ +║ API SERVER ║ ║ ║ ║ ║ +║ ║ ║ gitlab-user-tokens ║ ║ GitHub App → JWT minted ║ +║ Validates ║ ║ oauth-callbacks ║ ║ GitHub PAT → stored ║ +║ every token ║ ║ {session}-google-oauth ║ ║ Google Drive → refresh tok ║ +║ against ║ ║ google-creds-{userID} ║ ║ GitLab PAT → stored ║ +║ cluster JWKS ║ ║ jira-creds-{userID} ║ ║ Jira token → stored ║ +║ ║ ║ gerrit-creds-{userID} ║ ║ Gerrit HTTP → stored ║ +║ Enforces ║ ║ coderabbit-creds-{uid} ║ ║ Coderabbit → stored ║ +║ RBAC at K8s ║ ║ ambient-runner-token-* ║ ╚═══════════════════════════════╝ +║ level ║ ║ ambient-cp-token-keypair║ +╚═══════╦═══════╝ ╚═════════════════════════╝ + │ + │ (operator watches AgenticSession CRs) + ▼ +╔══════════════════════════════════════════════════════════════════════════════════════╗ +║ KUBERNETES OPERATOR ║ +║ (components/operator) SA: agentic-operator ║ +║ ║ +║ On new AgenticSession CR: ║ +║ 1. Create ServiceAccount: ambient-session- ║ +║ 2. Create Role + RoleBinding (least-privilege for runner) ║ +║ 3. Mint token: K8sClient.ServiceAccounts(ns).CreateToken(sa, ...) ← JWT ~1hr ║ +║ 4. Store in Secret: ambient-runner-token- key: k8s-token ║ +║ 5. Create Job/Pod with: ║ +║ • volumeMount: /var/run/secrets/ambient/bot-token (from above secret) ║ +║ • NetworkPolicy: ingress only from backend ║ +║ 6. 
Every 45min: regenerate token, update secret (kubelet refreshes mount) ║ +║ ║ +║ On StopSession: delete Pod, Secret, RoleBinding, SA ║ +╚═══════════════════════════════════╦════════════════════════════════════════════════════╝ + │ /var/run/secrets/ambient/bot-token (JWT) + │ CP_TOKEN_URL env var + ▼ +╔══════════════════════════════════════════════════════════════════════════════════════╗ +║ RUNNER POD ║ +║ SA: ambient-session- ║ +║ ║ +║ Two auth paths: ║ +║ ║ +║ PATH A: Call Backend API ║ +║ Authorization: Bearer ║ +║ → Backend validates via SSAR → handler executes ║ +║ ║ +║ PATH B: Call Control Plane for fresh API token ║ +║ POST /token ║ +║ Authorization: Bearer ║ +║ → Control plane decrypts with RSA private key ║ +║ → Returns OIDC/static API token for downstream calls ║ +╚═══════════════════════════════════╦════════════════════════════════════════════════════╝ + │ (calls for API token) + ▼ +╔══════════════════════════════════════════════════════════════════════════════════════╗ +║ AMBIENT CONTROL PLANE ║ +║ (components/ambient-control-plane) SA: ambient-control-plane ║ +║ ║ +║ Token Server (RSA-based exchange): ║ +║ • Keypair stored in Secret: ambient-cp-token-keypair ║ +║ • Receives: Bearer ║ +║ • Decrypts session ID → validates → returns API token ║ +║ ║ +║ Outbound auth to API server: ║ +║ Either: ║ +║ • StaticTokenProvider: reads AMBIENT_API_TOKEN env var ║ +║ • OIDCTokenProvider: client_credentials flow to RH SSO ║ +║ OIDC_TOKEN_URL = https://sso.redhat.com/auth/realms/redhat-external/... ║ +║ OIDC_CLIENT_ID + OIDC_CLIENT_SECRET ║ +║ Caches token with 30s refresh buffer ║ +╚═══════════════════════════════════╦════════════════════════════════════════════════════╝ + │ + ▼ +╔══════════════════════════════════════════════════════════════════════════════════════╗ +║ AMBIENT API SERVER (Database-backed RBAC) ║ +║ (components/ambient-api-server) ║ +║ ║ +║ Authentication: ║ +║ 1. 
ForwardedAccessToken middleware: X-Forwarded-Access-Token → Authorization header ║ +║ 2. JWT validation: signature verified against RH SSO JWKS ║ +║ • Dev: secrets/kind-jwks.json (local file) ║ +║ • Prod: https://sso.redhat.com/.../openid-connect/certs ║ +║ 3. gRPC: AMBIENT_API_TOKEN (static) or GRPC_SERVICE_ACCOUNT (JWT username match) ║ +║ ║ +║ Authorization (DB RBAC): ║ +║ DBAuthorizationMiddleware → queries PostgreSQL ║ +║ role_bindings → roles → permissions (resource:action JSON array) ║ +║ ║ +║ PostgreSQL Schema: ║ +║ users: id, username, name, email ║ +║ roles: id, name, permissions (JSON ["session:read", "credential:token", ...]) ║ +║ role_bindings: user_id, role_id, scope (platform|project), scope_id ║ +║ credentials: id, project_id, name, provider, token*, url, email ║ +║ *token stored plaintext — no DB-level encryption ║ +║ ║ +║ Built-in roles: ║ +║ platform:admin ["*:*"] ║ +║ platform:viewer [read-only subset] ║ +║ project:owner ["project:*", "agent:*", "session:*", ...] ║ +║ project:editor [create/update, not delete project] ║ +║ project:viewer [read-only] ║ +║ agent:runner [runtime identity for agent pods] ║ +║ credential:token-reader ["credential:token"] ║ +╚══════════════════════════════════════════════════════════════════════════════════════╝ +``` + +--- + +## ALL SERVICE ACCOUNTS + +### System Service Accounts (static, in manifests) + +| SA Name | Namespace | ClusterRole | Key Permissions | +|---|---|---|---| +| `agentic-operator` | ambient-code | `agentic-operator` | Create/delete pods, jobs, PVCs, SAs, RoleBindings; mint SA tokens (`serviceaccounts/token`) | +| `ambient-control-plane` | ambient-code | `ambient-control-plane` | Manage projects/namespaces/RBAC | +| `backend-api` | ambient-code | `backend-api` | Create/update AgenticSessions, mint tokens, manage access keys | +| `frontend` | ambient-code | `ambient-frontend-auth` | TokenReview, authz checks | +| `ambient-backend` | ambient-system | `ambient-backend-cluster-role` | Legacy/backup 
backend | + +### Dynamic Service Accounts (created at runtime) + +| SA Name Pattern | Created By | Purpose | Bound Role | +|---|---|---|---| +| `ambient-session-` | Operator | Runner pod identity | Least-privilege project Role (read ConfigMaps, etc.) | +| `ambient-key--` | Backend | User API access keys | `ambient-project-admin/edit/view` (user's choice) | + +--- + +## ALL TOKEN TYPES IN THE SYSTEM + +### User-Facing Tokens + +| Token | Format | Source | Used For | Validated By | +|---|---|---|---|---| +| OCP/RH SSO bearer | `sha256~...` | `oc whoami -t` or browser login | All user API calls | K8s API server (SSAR) | +| K8s SA JWT | `eyJ...` | TokenRequest API | Access keys, runner pods | K8s API server (SSAR) | +| GitHub PAT | `ghp_...` | User creates in GitHub | Git operations | GitHub API | +| Generic bearer | 20+ chars | Various | SDK/CLI access | K8s SSAR | + +### System Tokens + +| Token | Source | Used For | Lifetime | +|---|---|---|---| +| Runner pod JWT | Operator → TokenRequest on `ambient-session-*` SA | Runner → Backend auth | ~1hr, refreshed every 45min | +| Access key JWT | Backend → TokenRequest on `ambient-key-*` SA | CI/CD → Backend auth | User-specified, max 1yr | +| Control plane OIDC token | RH SSO client_credentials flow | Control plane → API server | Short-lived, auto-refreshed (30s buffer) | +| Control plane static token | `AMBIENT_API_TOKEN` env var | Dev/simple deployments | Static (until rotated) | +| GitHub App installation token | Backend mints via GitHub App JWT | Git clone in sessions | ~1hr (GitHub enforced) | + +### OAuth Integration Tokens + +| Token | Provider | Stored In | Keyed By | +|---|---|---|---| +| Google access + refresh token | Google OAuth | K8s Secret (backend ns) | userID | +| GitLab PAT | User provides | K8s Secret `gitlab-user-tokens` | userID | +| Jira API token | User provides | K8s Secret (backend ns) | userID | +| Gerrit HTTP/cookie | User provides | K8s Secret (backend ns) | userID | +| CodeRabbit API key | User 
provides | K8s Secret (backend ns) | userID | +| Session-specific OAuth creds | OAuth callback | K8s Secret `{session}-{provider}-oauth` | session-scoped | + +--- + +## ALL SECRETS IN THE SYSTEM + +| Secret Name | Namespace | Contents | Owner | +|---|---|---|---| +| `gitlab-user-tokens` | project | GitLab PATs keyed by userID | Backend writes, runner reads | +| `gitlab-connections` | project | GitLab connection metadata | Backend | +| `oauth-callbacks` | backend | Temporary OAuth state (UUID keyed) | Backend (TTL) | +| `{session}-{provider}-oauth` | project | Session-scoped OAuth creds | Backend (GC via OwnerRef) | +| `google-creds-{hash}` | backend | Google OAuth access+refresh token | Backend | +| `jira-creds-{hash}` | backend | Jira URL + email + token | Backend | +| `gerrit-creds-{hash}` | backend | Gerrit instance credentials | Backend | +| `coderabbit-creds-{hash}` | backend | CodeRabbit API key | Backend | +| `ambient-runner-token-{session}` | project | Runner pod K8s JWT (`k8s-token` key) | Operator creates, pod mounts | +| `ambient-cp-token-keypair` | ambient-code | RSA-4096 pub+priv key for runner↔CP auth | Control plane | +| Access key SA token secrets | project | (managed by K8s) | K8s auto-manages for SA JWTs | + +--- + +## HOW THE TOKEN HEADERS FLOW + +``` +Browser/User + │ + │ (browser session cookie, managed by OAuth proxy) + ▼ +OAuth Proxy sidecar + │ + │ Adds to every proxied request: + │ X-Forwarded-User: alice + │ X-Forwarded-Email: alice@example.com + │ X-Forwarded-Groups: platform-admins,dev-team + │ X-Forwarded-Access-Token: sha256~ + ▼ +Next.js Frontend API Route + │ + │ buildForwardHeadersAsync() extracts all X-Forwarded-* headers + │ Sets BOTH on backend call: + │ X-Forwarded-Access-Token: sha256~ + │ Authorization: Bearer sha256~ + │ Also forwards: X-Forwarded-User, Email, Groups + ▼ +Backend API Server + │ + │ forwardedIdentityMiddleware(): + │ ctx.userID ← X-Forwarded-User + │ ctx.userEmail ← X-Forwarded-Email + │ ctx.userGroups← 
X-Forwarded-Groups + │ ctx.token ← X-Forwarded-Access-Token (priority 1) + │ Authorization: Bearer (priority 2) + │ + │ ValidateProjectContext(): + │ GetK8sClientsForRequest(token) → user-scoped K8s client + │ SSAR: can user LIST agenticsessions in project namespace? + │ (cached 30s) + │ + │ Handler: + │ User token for READ operations (SSAR) + │ Backend SA for WRITE operations (after SSAR validates) + ▼ +Kubernetes API Server + (validates token signature against cluster JWKS) +``` + +--- + +## THE DUAL AUTHORIZATION MODEL + +``` +Every API request hits BOTH authorization layers: + +Layer 1: Kubernetes RBAC (components/backend) + ┌─────────────────────────────────────────────┐ + │ SelfSubjectAccessReview │ + │ "Can THIS TOKEN perform VERB on RESOURCE │ + │ in NAMESPACE?" │ + │ │ + │ Enforced by: K8s API server │ + │ ClusterRoles: ambient-project-admin/edit/view│ + │ Source of truth: K8s RBAC objects │ + └─────────────────────────────────────────────┘ + +Layer 2: Database RBAC (components/ambient-api-server) + ┌─────────────────────────────────────────────┐ + │ DBAuthorizationMiddleware │ + │ "Does this JWT username have a role_binding │ + │ granting RESOURCE:ACTION in this project?" │ + │ │ + │ Enforced by: PostgreSQL query │ + │ Roles: platform:admin, project:owner, etc. │ + │ Source of truth: PostgreSQL DB │ + └─────────────────────────────────────────────┘ + +These are INDEPENDENT systems. 
A user needs: +• K8s RBAC binding for backend operations +• DB role binding for ambient-api-server operations +``` + +--- + +## RH SSO / CLUSTER JWT — HOW THEY RELATE + +``` +Red Hat SSO (external OIDC provider) + URL: https://sso.redhat.com/auth/realms/redhat-external + │ + ├─► Issues user tokens for human login (browser OAuth flow) + │ → OAuth proxy validates these, injects X-Forwarded-* headers + │ + ├─► Issues service tokens for control plane (client_credentials flow) + │ OIDC_CLIENT_ID + OIDC_CLIENT_SECRET → AMBIENT_API_TOKEN equiv + │ + └─► JWKS endpoint used by ambient-api-server to verify JWT signatures + https://sso.redhat.com/.../openid-connect/certs + +OpenShift / Kubernetes API Server (cluster JWT issuer) + │ + ├─► Issues user tokens (sha256~...) — what you get from `oc whoami -t` + │ These are validated when backend does SSAR + │ + ├─► Issues SA tokens via TokenRequest API + │ Used for: runner pods, access keys + │ Operator mints these for runners + │ Backend mints these for user access keys + │ + └─► TokenReview API — validates any bearer token against cluster + Backend uses this to identify which SA called (for BOT_TOKEN path) + +Key distinction: + RH SSO tokens → validated by ambient-api-server (DB RBAC layer) + OCP/K8s tokens → validated by K8s API server via SSAR (K8s RBAC layer) + BOTH types accepted at backend — which layer you hit depends on which + component you're calling. +``` diff --git a/docs/internal/proposals/iam-consolidation-plan.md b/docs/internal/proposals/iam-consolidation-plan.md new file mode 100644 index 000000000..d38a3d0f5 --- /dev/null +++ b/docs/internal/proposals/iam-consolidation-plan.md @@ -0,0 +1,409 @@ +# Ambient IAM — Three Improvement Plans + +--- + +## 1. Consolidate Around RH SSO + +### The Goal + +One issuer. Every token in the system — user, runner, access key, service — comes from or is +validated by RH SSO. No RSA keypairs, no K8s SA minting loops, no two-step exchanges. 
+ +### What Changes (by identity type) + +#### A. Human users — no change +Human users already authenticate through the RH SSO OAuth proxy. OCP issues `sha256~` tokens; the proxy validates them and +injects `X-Forwarded-*` headers. The backend authorizes these tokens via SSAR against the cluster, and ambient-api-server already validates JWTs against +the RH SSO JWKS endpoint. This path is already right. + +#### B. Access keys — replace K8s SAs with RH SSO service accounts + +**Current:** Backend creates `ambient-key--` K8s ServiceAccount, creates RoleBinding, +calls `TokenRequest` API, returns JWT. User stores the JWT and sends it as Bearer on every call. +Tracking is done via a `last-used-at` annotation. + +**Target:** Backend calls the Keycloak Admin REST API to create a **confidential client** (service +account) in RH SSO. Client credentials (`client_id` / `client_secret`) are returned to the user +once. The user calls the RH SSO token endpoint to get a short-lived OIDC access token and sends it as Bearer. + +- Revocation: delete the client in Keycloak → all future token requests fail immediately (tokens already issued remain valid until they expire unless the verifier introspects them) +- Role assignment: Keycloak client roles map to `project:admin/edit/view` +- Token introspection: any component can call `/introspect` to verify a key is still active +- No K8s SA objects, no K8s RoleBindings for access keys, no `TokenRequest` calls + +#### C. Runner pods — replace K8s SA + RSA exchange with OIDC Token Exchange (RFC 8693) + +**Current:** +1. Operator creates `ambient-session-` SA +2. Operator calls `TokenRequest` → stores JWT in Secret `ambient-runner-token-` +3. Pod mounts the Secret, sends JWT to backend +4. Pod also calls control plane `/token` with RSA-encrypted session ID +5. Control plane decrypts with RSA-4096 private key, returns OIDC token +6. Operator refreshes the K8s JWT every 45 minutes + +**Target:** +1. OCP automatically projects a short-lived K8s SA token into every pod at + `/var/run/secrets/kubernetes.io/serviceaccount/token` (standard, no setup needed) +2. 
On startup, runner calls RH SSO token exchange endpoint: + ``` + POST /auth/realms/redhat-external/protocol/openid-connect/token + grant_type=urn:ietf:params:oauth:grant-type:token-exchange + subject_token= + subject_token_type=urn:ietf:params:oauth:token-type:jwt + client_id=ambient-runner-exchange + client_secret= + requested_token_type=urn:ietf:params:oauth:token-type:access_token + audience=ambient-platform + ``` +3. RH SSO validates the K8s JWT against the cluster JWKS, issues a scoped OIDC token with custom + claims: `session_id`, `project`, `role=agent:runner` +4. Runner uses this OIDC token for all downstream API calls (backend, ambient-api-server) +5. Token expiry is handled by standard OIDC refresh (no operator refresh loop needed) + +The entire control plane token server, RSA keypair bootstrap, and the 45-minute refresh loop in +the operator go away. + +#### D. Service-to-service (control plane, backend SA) — already right or align + +- Control plane already uses OIDC client credentials ✓ +- Backend SA (`backend-api`) should get its own Keycloak confidential client and use client + credentials for outbound calls to ambient-api-server (currently uses in-cluster SA token) + +--- + +### RH SSO: What Needs to Be Registered + +#### Clients (Confidential, Service Account enabled) + +| Client ID | Grant Type | Purpose | Roles Needed | +|---|---|---|---| +| `ambient-control-plane` | client_credentials | Already exists. 
Control plane → API server | `platform:admin` or equivalent | +| `ambient-backend` | client_credentials | Backend → ambient-api-server auth | `platform:admin` or equivalent | +| `ambient-runner-exchange` | token_exchange | Accept K8s JWT, issue scoped runner token | `token-exchange` permission on realm | +| `ambient-key-manager` | client_credentials | Keycloak Admin API — create/delete access key clients | `manage-clients`, `view-clients` realm roles | +| `ambient-key--` (dynamic) | client_credentials | Per user-created access key | Project-scoped role (admin/edit/view) | + +#### Realm Configuration + +| Setting | Value | Why | +|---|---|---| +| Token Exchange feature | Enabled | Required for RFC 8693 runner flow | +| K8s cluster as Identity Provider | Add cluster OIDC endpoint | So RH SSO can validate K8s-issued JWTs | +| Cluster JWKS URL | `https://api.:6443/openid/v1/jwks` | RH SSO fetches this to verify runner tokens | +| Client roles | `project:admin`, `project:editor`, `project:viewer` | Assigned to access key clients | +| Custom claim mapper | `session_id`, `project`, `role` on runner tokens | Downstream components read these claims | + +#### K8s Secrets Required (in `ambient-code` namespace) + +| Secret Name | Keys | Purpose | +|---|---|---| +| `ambient-sso-admin-credentials` | `client_id`, `client_secret` | Keycloak Admin API for access key lifecycle | +| `ambient-runner-exchange-credentials` | `client_id`, `client_secret` | Runner token exchange client | +| `ambient-backend-oidc` | `client_id`, `client_secret` | Backend service-to-service auth | + +#### Environment Variables (changes/additions) + +| Component | Variable | Value | +|---|---|---| +| Backend | `SSO_ADMIN_CLIENT_ID` | `ambient-key-manager` | +| Backend | `SSO_ADMIN_CLIENT_SECRET` | from Secret | +| Backend | `SSO_REALM_URL` | `https://sso.redhat.com/auth/realms/redhat-external` | +| Control plane | `OIDC_CLIENT_ID` | `ambient-control-plane` (existing) | +| Control plane | `OIDC_CLIENT_SECRET` 
| from Secret (existing) | +| Runner | `SSO_TOKEN_EXCHANGE_URL` | RH SSO token endpoint | +| Runner | `SSO_EXCHANGE_CLIENT_ID` | `ambient-runner-exchange` | +| Runner | `SSO_EXCHANGE_CLIENT_SECRET` | from Secret | + +--- + +### What Gets Deleted + +| Component | What Goes Away | +|---|---| +| Operator | SA creation code for `ambient-session-*` | +| Operator | `TokenRequest` minting code | +| Operator | 45-minute token refresh loop | +| Operator | Secret `ambient-runner-token-*` creation | +| Control plane | Entire `internal/tokenserver/` package | +| Control plane | Entire `internal/keypair/` package | +| Control plane | Secret `ambient-cp-token-keypair` | +| Control plane | `CPTokenListenAddr`, `CPTokenURL`, `ProjectKubeTokenFile` config fields | +| Backend | SA creation in `CreateProjectKey()` | +| Backend | `DeleteProjectKey()` SA/RoleBinding deletion | +| Backend | `ListProjectKeys()` SA label selector query | +| Backend | `updateAccessKeyLastUsedAnnotation()` | +| Manifests | All `ambient-key-*` ClusterRole bindings (no longer static) | + +### What Gets Added + +| Component | What's New | +|---|---| +| Backend | Keycloak Admin API client (`pkg/keycloak/`) | +| Backend | `CreateProjectKey()` → create Keycloak client, assign roles, return `client_id`+`client_secret` | +| Backend | `DeleteProjectKey()` → delete Keycloak client | +| Backend | `ListProjectKeys()` → list Keycloak clients with `ambient-key-` prefix | +| Runner | OIDC token exchange on startup (call SSO, cache token, refresh before expiry) | +| ambient-api-server | No change — already validates RH SSO JWTs | + +--- + +### Migration Path + +1. **Register all clients in RH SSO** and validate token exchange with the cluster +2. **Deploy runner with dual-mode**: try exchange first, fall back to RSA for existing sessions +3. **Deploy operator without SA creation** for new sessions only (existing sessions unaffected) +4. 
**Once all active sessions are on the new path**: remove RSA exchange from control plane +5. **Migrate access keys**: for each existing K8s SA access key, create Keycloak client, + notify users to re-issue credentials (old K8s SA tokens expire naturally) +6. **Remove old K8s SAs and Secrets**: `kubectl delete sa -l app=ambient-access-key -A` + +--- + +--- + +## 2. DB RBAC as Source of Truth — Options + +### The Problem to Solve + +Today a project admin must grant access in two independent systems: K8s RoleBindings (for the +backend/K8s API layer) and DB role_bindings (for ambient-api-server). They're not synced. You +can grant someone access in one and forget the other. Neither system knows the other exists. + +### Constraint You Can't Remove + +K8s enforces RBAC natively for K8s API operations. When the backend does `SSAR` to check if a +user can `list agenticsessions`, K8s itself makes that decision using RoleBindings. You cannot +bypass this without rewriting how K8s works. So K8s RBAC **for K8s operations** always exists. + +The question is: *who is the write plane* — where does an admin go to say "give Alice access to +project X", and how does that propagate? + +--- + +### Option A: DB Drives K8s (Reconciliation) — Recommended + +**DB is the write plane. K8s RoleBindings are a derived artifact.** + +When a user is added to a project in ambient-api-server's `role_bindings` table, a new reconciler +(in the control plane or operator) watches for those changes and creates/deletes the corresponding +K8s RoleBinding automatically. 
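A minimal sketch of the reconciliation step, written in Python for brevity (the real reconciler would presumably be controller-runtime Go). It derives the RoleBindings to create and delete from rows in `role_bindings`. The `DbBinding` and `K8sBinding` types and the `plan_reconcile` helper are hypothetical names for illustration; the role names come from the role mapping table in this section.

```python
from dataclasses import dataclass

# DB role -> K8s ClusterRole, per the role mapping table in this section.
ROLE_TO_CLUSTER_ROLE = {
    "project:owner": "ambient-project-admin",
    "project:editor": "ambient-project-edit",
    "project:viewer": "ambient-project-view",
}


@dataclass(frozen=True)
class DbBinding:
    """Hypothetical view of one row in the role_bindings table."""
    user_id: str
    role_id: str
    scope: str      # "platform" or "project"
    scope_id: str   # project name when scope == "project"


@dataclass(frozen=True)
class K8sBinding:
    """Hypothetical summary of a RoleBinding managed by the reconciler."""
    namespace: str
    user: str
    cluster_role: str


def desired_bindings(rows):
    """Translate project-scoped DB rows into the RoleBindings that should exist."""
    wanted = set()
    for row in rows:
        cluster_role = ROLE_TO_CLUSTER_ROLE.get(row.role_id)
        # Roles with no K8s equivalent (e.g. credential:token-reader) stay DB-only.
        if row.scope == "project" and cluster_role is not None:
            wanted.add(K8sBinding(row.scope_id, row.user_id, cluster_role))
    return wanted


def plan_reconcile(rows, actual):
    """Return (to_create, to_delete) so that K8s converges on the DB state."""
    wanted = desired_bindings(rows)
    existing = set(actual)
    return wanted - existing, existing - wanted
```

Running `plan_reconcile` on every poll or change event keeps the operation idempotent: once K8s matches the DB, both returned sets are empty. The sketch assumes `actual` contains only bindings the reconciler itself manages, so that it never deletes RoleBindings owned by other components.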
+ +``` +Admin calls: POST /api/ambient/v1/role_bindings + { user_id: "alice", role_id: "project:editor", scope: "project", scope_id: "my-project" } + ↓ +ambient-api-server writes to role_bindings table + ↓ +Reconciler watches role_bindings (polling or change-data-capture) + ↓ +Reconciler creates K8s RoleBinding in namespace "my-project": + subject: alice → ClusterRole: ambient-project-edit + ↓ +Backend SSAR continues to work unchanged +``` + +**Role mapping table** (DB role → K8s ClusterRole): + +| DB Role | K8s ClusterRole | +|---|---| +| `project:owner` | `ambient-project-admin` | +| `project:editor` | `ambient-project-edit` | +| `project:viewer` | `ambient-project-view` | +| `platform:admin` | cluster-admin or custom | +| `agent:runner` | (no K8s ClusterRole needed — runner uses token exchange) | +| Fine-grained (`credential:token-reader`, etc.) | (DB RBAC only, no K8s mapping needed) | + +**What changes:** +- New reconciler in control plane: watches `role_bindings` table, syncs K8s RoleBindings +- Backend permissions handler (`/api/projects/:name/permissions`) delegates writes to + ambient-api-server instead of directly creating K8s RoleBindings +- Frontend permissions UI calls ambient-api-server instead of backend +- Backend SSAR, middleware — no change + +**Tradeoffs:** +- Eventual consistency: DB write → K8s propagation has a lag (aim for < 5s) +- Reconciler needs K8s admin permissions to create RoleBindings +- Fine-grained DB permissions (`credential:token`) have no K8s equivalent — they're DB-only + and that's fine (ambient-api-server enforces them directly) + +--- + +### Option B: ambient-api-server as Authorization Service + +**DB is authoritative. Backend calls ambient-api-server for every authz decision.** + +Backend replaces `SelfSubjectAccessReview` calls with HTTP calls to a new ambient-api-server +endpoint: `POST /api/ambient/v1/authz/check`. 
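A rough Python sketch of how the backend side of this check might be wrapped in a cache that mirrors the current SSAR cache (30s TTL, SHA256-keyed). The `AuthzCache` class is hypothetical, and `check_fn` is a stand-in for the HTTP call to the proposed `/authz/check` endpoint, whose request and response shapes are not final.

```python
import hashlib
import time


class AuthzCache:
    """Caches allow/deny decisions for a fixed TTL (30 seconds by default)."""

    def __init__(self, check_fn, ttl_seconds=30.0, clock=time.monotonic):
        self._check_fn = check_fn   # stand-in for the call to /authz/check
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}          # cache key -> (allowed, expires_at)

    @staticmethod
    def _key(user, resource, action, project):
        # Hash the tuple so raw identities are not used as cache keys.
        raw = "\x00".join((user, resource, action, project))
        return hashlib.sha256(raw.encode("utf-8")).hexdigest()

    def is_allowed(self, user, resource, action, project):
        key = self._key(user, resource, action, project)
        now = self._clock()
        hit = self._entries.get(key)
        if hit is not None and hit[1] > now:
            return hit[0]
        # An exception here (authz service unreachable) propagates to the caller,
        # so nothing is cached and the request is not silently allowed.
        allowed = bool(self._check_fn(user, resource, action, project))
        self._entries[key] = (allowed, now + self._ttl)
        return allowed
```

Denials are cached as well as approvals, so a revoked binding can take up to one TTL to take effect, and a freshly granted one can take equally long to be honored.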
+ +``` +Backend request arrives with user token + ↓ +Backend extracts user identity from JWT claims (preferred_username) + ↓ +Backend calls: POST /api/ambient/v1/authz/check + { user: "alice", resource: "agenticsessions", action: "list", project: "my-project" } + ↓ +ambient-api-server queries role_bindings → roles → permissions +Returns: { allowed: true } + ↓ +Backend proceeds (or returns 403) +``` + +Backend caches results for 30 seconds (same as current SSAR cache). + +**What changes:** +- Backend: `globalSSARCache` logic remains, but calls ambient-api-server instead of K8s API +- ambient-api-server: new `/authz/check` endpoint +- K8s RoleBindings: can be removed for project-level user bindings (only system SAs need them) +- The K8s ClusterRoles `ambient-project-admin/edit/view` can be retired for user access + +**Tradeoffs:** +- Backend takes a **hard synchronous dependency** on ambient-api-server. If ambient-api-server + is down, the backend cannot authorize any request. +- Risk of circular dependency if ambient-api-server itself calls backend for anything. +- Eliminates K8s audit trail for user actions (SSAR no longer used). +- K8s RBAC for K8s operations still required for system SAs (operator, control plane, etc.) +- Net result: simpler for users, harder operationally. + +--- + +### Option C: Explicit Split (No Single Source) + +Accept that two systems exist but make the split **intentional and documented**: + +- **K8s RBAC** owns: "can this identity access this namespace at all" (coarse gate) +- **DB RBAC** owns: "can this identity do this specific action on this resource" (fine-grained) + +Both are authoritative for their domain. No overlap. Documented contract. + +The only change: everywhere a human admin today has to grant in both systems, replace with a +single API call that writes to both atomically. The backend's permissions handler writes a K8s +RoleBinding **and** a DB role_binding in the same request. 
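A small Python sketch of what that dual write could look like, assuming hypothetical writer objects for each system (`InMemoryK8s` and `InMemoryDb` are stand-ins, not real clients). It writes the K8s RoleBinding first and removes it again if the DB write fails.

```python
class DualWriteError(Exception):
    """Raised when the grant could not be applied to both systems."""


class InMemoryK8s:
    """Stand-in for a client that manages RoleBindings."""

    def __init__(self):
        self.bindings = set()

    def create_role_binding(self, namespace, user, cluster_role):
        self.bindings.add((namespace, user, cluster_role))

    def delete_role_binding(self, namespace, user, cluster_role):
        self.bindings.discard((namespace, user, cluster_role))


class InMemoryDb:
    """Stand-in for the role_bindings table; fail=True simulates an outage."""

    def __init__(self, fail=False):
        self.rows = []
        self.fail = fail

    def create_role_binding(self, user_id, role_id, scope, scope_id):
        if self.fail:
            raise RuntimeError("database unavailable")
        self.rows.append((user_id, role_id, scope, scope_id))


def grant_access(k8s, db, user, project, db_role, cluster_role):
    """Write the K8s RoleBinding, then the DB row; undo K8s if the DB write fails."""
    k8s.create_role_binding(project, user, cluster_role)
    try:
        db.create_role_binding(user, db_role, "project", project)
    except Exception as exc:
        # Compensating action: best-effort cleanup, not a real transaction.
        k8s.delete_role_binding(project, user, cluster_role)
        raise DualWriteError(f"DB write failed for {user} in {project}") from exc
```

K8s and PostgreSQL share no transaction, so this is compensation rather than true atomicity: if the cleanup call also fails, the two systems diverge until someone repairs them.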
+ +**Tradeoffs:** +- Still two systems, but the dual-write is explicit and visible +- No reconciler needed, no new service dependencies +- Easiest to implement +- Doesn't actually solve the sync problem — just moves the two-write burden to the backend + +--- + +### Recommendation + +**Option A** is the right call. It gives you a single human-facing write plane (the DB) while +keeping K8s RBAC functioning as it does today. The backend changes minimally. The reconciler is +a small, focused component (< 200 lines of controller-runtime code). + +The reconciler fits naturally in the control plane, which already has K8s admin permissions and +watches resources for reconciliation. Add it alongside the existing project namespace reconciler. + +**One thing to decide:** what to do with fine-grained permissions like `credential:token-reader` +that have no K8s equivalent. The answer is: leave them DB-only. K8s RBAC enforces the coarse +gate (can you access the project). DB RBAC enforces the fine-grained gate (can you read a token +within the project). This split is actually correct — they serve different enforcement points. + +--- + +--- + +## 3. Extend the Credentials Table + +### The Goal + +Move all provider OAuth tokens (GitHub, GitLab, Google, Jira, Gerrit, CodeRabbit) from the +scattered K8s Secrets in the backend namespace into the `credentials` table in ambient-api-server. +Single audit trail, single access control model, single API. 
+ +### Schema Change + +Add `user_id` and `scope` columns (one migration): + +```sql +ALTER TABLE credentials + ADD COLUMN user_id TEXT, + ADD COLUMN scope TEXT NOT NULL DEFAULT 'project'; + +-- scope = 'project': project_id set, user_id null (existing behavior) +-- scope = 'user': user_id set, project_id may be null +``` + +New unique index for user credentials: +```sql +CREATE UNIQUE INDEX credentials_user_provider_url + ON credentials (user_id, provider, url) + WHERE scope = 'user' AND deleted_at IS NULL; +``` + +### New Routes (ambient-api-server) + +``` +GET /api/ambient/v1/users/me/credentials +POST /api/ambient/v1/users/me/credentials +DELETE /api/ambient/v1/users/me/credentials/{id} +GET /api/ambient/v1/users/me/credentials/{id}/token +``` + +The `/me` route resolves `user_id` from the JWT `preferred_username` claim — no user ID in URL. + +### What Moves From K8s Secrets to DB + +| K8s Secret (backend namespace) | → DB credential | +|---|---| +| `gitlab-user-tokens` (key: userID) | `scope=user, provider=gitlab, token=, url=` | +| `google-creds-{hash}` | `scope=user, provider=google, token=`, refresh token in `annotations` JSON | +| `jira-creds-{hash}` | `scope=user, provider=jira, token=, url=, email=` | +| `gerrit-creds-{hash}` | `scope=user, provider=gerrit, token=, url=` (one row per instance) | +| `coderabbit-creds-{hash}` | `scope=user, provider=coderabbit, token=` | +| `{session}-{provider}-oauth` | `scope=project, provider=` + session ID in `labels` JSON | + +### What Stays Where It Is + +| Token | Stays Because | +|---|---| +| `oauth-callbacks` Secret | Transient state (UUID-keyed, short TTL) — a K8s Secret or Redis is fine | +| GitHub App installation tokens | Never stored; minted on demand from private key | +| Runner pod K8s JWT | Changes to OIDC exchange (see plan 1) — not a credential to store | + +### What Changes in the Backend + +Every `StoreX()` / `GetX()` / `DeleteX()` function in `handlers/oauth.go` and `handlers/secrets.go` +becomes an 
API call to ambient-api-server instead of a K8s Secret operation: + +``` +StoreGitLabToken(userID, token) → POST /users/me/credentials {provider: "gitlab", token: ...} +GetGitLabToken(userID) → GET /users/me/credentials?provider=gitlab + /token +DeleteGitLabToken(userID) → DELETE /users/me/credentials/{id} +``` + +The backend's K8s Secret operations for stored provider credentials reduce to zero; only the transient `oauth-callbacks` state may stay in a Secret. + +### RBAC for User Credentials (DB RBAC) + +New permission: `user_credential:token` (fetch raw token for my own credential) +New built-in role: `user:self` — every authenticated user gets this automatically (bound at login) + +Permissions: +```json +["user_credential:read", "user_credential:list", "user_credential:create", + "user_credential:update", "user_credential:delete", "user_credential:token"] +``` + +Users can only see and fetch their own credentials (enforced by a `user_id` filter on the JWT `preferred_username` claim, the same claim the `/me` routes resolve, +not just RBAC — defense in depth). + +### Encryption (later) + +When ready, add a `kek_id` column (key-encryption-key ID) and encrypt `token` with AES-256-GCM +using a DEK wrapped by the KEK. The KMS can be OCP's built-in etcd encryption, Vault, or RHKMS. +The schema is designed so this is an additive change — no routes change, only the service layer. + +--- + +### Migration Order + +1. Deploy ambient-api-server schema migration (additive — no downtime) +2. Deploy new `/users/me/credentials` routes +3. Deploy backend with dual-write: write to both K8s Secret AND DB (dark launch) +4. Validate reads from DB return correct data +5. Flip backend to read from DB and stop writing to the K8s Secret +6. Clean up orphaned K8s Secrets
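
Steps 3 to 5 of the migration order could be exercised with something like the Python sketch below. Plain dictionaries stand in for the K8s Secret store and the `credentials` table, and the helper names (`dual_write`, `find_mismatches`, `read_token`) are hypothetical.

```python
def dual_write(secrets, db, user_id, provider, token):
    """Step 3: write the credential to both stores during the dark launch."""
    key = (user_id, provider)
    secrets[key] = token
    db[key] = token


def find_mismatches(secrets, db):
    """Step 4: list keys whose DB value is missing or differs from the Secret."""
    return sorted(key for key, token in secrets.items() if db.get(key) != token)


def read_token(secrets, db, user_id, provider, read_from_db):
    """Step 5: a flag decides which store serves reads."""
    source = db if read_from_db else secrets
    return source.get((user_id, provider))
```

A credential stored before the dual-write deploy appears in `find_mismatches` until it is written again or copied over, so the read flip in step 5 is only safe once that list is empty.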