diff --git a/docs/internal/architecture/iam-architecture.md b/docs/internal/architecture/iam-architecture.md new file mode 100644 index 000000000..c98bd738e --- /dev/null +++ b/docs/internal/architecture/iam-architecture.md @@ -0,0 +1,382 @@ +# Ambient Platform — Full IAM Architecture + +## THE BIG PICTURE (End-to-End Flow) + +``` +╔══════════════════════════════════════════════════════════════════════════════════════╗ +║ IDENTITY ENTRY POINTS ║ +╠══════════════════╦════════════════════════╦═════════════════╦════════════════════════╣ +║ HUMAN (Browser) ║ CLI / SDK USER ║ BOT / API KEY ║ SERVICE (internal) ║ +║ ║ ║ ║ ║ +║ RH SSO / OCP ║ oc whoami -t ║ K8s SA JWT ║ OIDC client creds ║ +║ OAuth login ║ sha256~... token ║ (ambient-key-*)║ or AMBIENT_API_TOKEN ║ +╚══════╦═══════════╩═══════════╦════════════╩════════╦════════╩═══════════╦════════════╝ + │ │ │ │ + ▼ │ │ │ +╔══════════════╗ │ │ │ +║ OAuth Proxy ║ │ │ │ +║ (sidecar) ║ │ │ │ +║ ║ │ │ │ +║ Validates ║ │ │ │ +║ OCP token ║ │ │ │ +║ ║ │ │ │ +║ Injects: ║ │ │ │ +║ X-Forwarded-║ │ │ │ +║ User ║ │ │ │ +║ Email ║ │ │ │ +║ Groups ║ │ │ │ +║ Access- ║ │ │ │ +║ Token ║ │ │ │ +╚══════╦═══════╝ │ │ │ + │ │ │ │ + ▼ ▼ │ │ +╔══════════════════════════════════════════╗ │ │ +║ NEXT.JS FRONTEND ║ │ │ +║ (components/frontend) ║ │ │ +║ ║ │ │ +║ buildForwardHeadersAsync() ║ │ │ +║ ┌──────────────────────────────────┐ ║ │ │ +║ │ Reads incoming headers │ ║ │ │ +║ │ Passes through X-Forwarded-* │ ║ │ │ +║ │ Sets BOTH: │ ║ │ │ +║ │ Authorization: Bearer │ ║ │ │ +║ │ X-Forwarded-Access-Token: ... │ ║ │ │ +║ └──────────────────────────────────┘ ║ │ │ +║ ║ │ │ +║ /api/projects/[name]/* → proxy → ║ │ │ +╚══════════════════╦═══════════════════════╝ │ │ + │ │ │ + └───────────────────────────────────┘ │ + │ │ + ▼ ▼ +╔══════════════════════════════════════════════════════════════════════════════════════╗ +║ BACKEND API SERVER ║ +║ (components/backend) SA: backend-api ║ +║ ║ +║ MIDDLEWARE CHAIN (every request): ║ +║ 1. 
Logger (redacts tokens from logs) ║ +║ 2. forwardedIdentityMiddleware() ║ +║ ├─ X-Forwarded-User → ctx["userID"] ║ +║ ├─ X-Forwarded-Email → ctx["userEmail"] ║ +║ ├─ X-Forwarded-Groups → ctx["userGroups"] ║ +║ ├─ X-Forwarded-Access-Token → ctx["forwardedAccessToken"] ◄── PRIORITY 1 ║ +║ ├─ Authorization: Bearer → ctx["authorizationHeader"] ◄── PRIORITY 2 ║ +║ └─ if no X-Forwarded-User: resolveServiceAccountFromToken() ◄── BOT/API KEY path ║ +║ └─ TokenReview API → extracts SA name → reads annotation for userID ║ +║ 3. CORS ║ +║ 4. ValidateProjectContext() (per /api/projects/:name group) ║ +║ ├─ extractRequestToken() (X-Fwd-Access-Token > Bearer > ?token=) ║ +║ ├─ GetK8sClientsForRequest() → user-scoped K8s client ║ +║ ├─ check globalSSARCache (30s TTL, SHA256 keyed) ║ +║ └─ SelfSubjectAccessReview: can user LIST agenticsessions in namespace? ║ +║ └─ 403 if denied, cache result ║ +║ ║ +║ TOKEN TYPES VALIDATED: ║ +║ • sha256~... (OCP tokens) SSAR hit → K8s validates against cluster ║ +║ • eyJ... (K8s SA JWT) SSAR hit → K8s validates against cluster ║ +║ • ghp_... (GitHub PAT) SSAR hit → K8s validates against cluster ║ +║ • generic 20+ chars (bearer) SSAR hit → K8s validates against cluster ║ +║ ║ +║ TWO K8s CLIENTS: ║ +║ • K8sClient / DynamicClient (SA: backend-api) → privileged writes after RBAC check ║ +║ • reqK8s / reqDyn (user token) → SSAR checks, list operations ║ +╚═══════╦════════════════════════╦═════════════════════════════════╦════════════════════╝ + │ │ │ + │ create/update │ credentials stored │ OAuth token exchange + │ AgenticSession CR │ in K8s Secrets │ GitHub, Google, etc. 
+ ▼ ▼ ▼ +╔══════════════╗ ╔═════════════════════════╗ ╔═══════════════════════════════╗ +║ KUBERNETES ║ ║ K8s SECRETS ║ ║ EXTERNAL OAuth PROVIDERS ║ +║ API SERVER ║ ║ ║ ║ ║ +║ ║ ║ gitlab-user-tokens ║ ║ GitHub App → JWT minted ║ +║ Validates ║ ║ oauth-callbacks ║ ║ GitHub PAT → stored ║ +║ every token ║ ║ {session}-google-oauth ║ ║ Google Drive → refresh tok ║ +║ against ║ ║ google-creds-{userID} ║ ║ GitLab PAT → stored ║ +║ cluster JWKS ║ ║ jira-creds-{userID} ║ ║ Jira token → stored ║ +║ ║ ║ gerrit-creds-{userID} ║ ║ Gerrit HTTP → stored ║ +║ Enforces ║ ║ coderabbit-creds-{uid} ║ ║ Coderabbit → stored ║ +║ RBAC at K8s ║ ║ ambient-runner-token-* ║ ╚═══════════════════════════════╝ +║ level ║ ║ ambient-cp-token-keypair║ +╚═══════╦═══════╝ ╚═════════════════════════╝ + │ + │ (operator watches AgenticSession CRs) + ▼ +╔══════════════════════════════════════════════════════════════════════════════════════╗ +║ KUBERNETES OPERATOR ║ +║ (components/operator) SA: agentic-operator ║ +║ ║ +║ On new AgenticSession CR: ║ +║ 1. Create ServiceAccount: ambient-session- ║ +║ 2. Create Role + RoleBinding (least-privilege for runner) ║ +║ 3. Mint token: K8sClient.ServiceAccounts(ns).CreateToken(sa, ...) ← JWT ~1hr ║ +║ 4. Store in Secret: ambient-runner-token- key: k8s-token ║ +║ 5. Create Job/Pod with: ║ +║ • volumeMount: /var/run/secrets/ambient/bot-token (from above secret) ║ +║ • NetworkPolicy: ingress only from backend ║ +║ 6. 
Every 45min: regenerate token, update secret (kubelet refreshes mount) ║ +║ ║ +║ On StopSession: delete Pod, Secret, RoleBinding, SA ║ +╚═══════════════════════════════════╦════════════════════════════════════════════════════╝ + │ /var/run/secrets/ambient/bot-token (JWT) + │ CP_TOKEN_URL env var + ▼ +╔══════════════════════════════════════════════════════════════════════════════════════╗ +║ RUNNER POD ║ +║ SA: ambient-session- ║ +║ ║ +║ Two auth paths: ║ +║ ║ +║ PATH A: Call Backend API ║ +║ Authorization: Bearer ║ +║ → Backend validates via SSAR → handler executes ║ +║ ║ +║ PATH B: Call Control Plane for fresh API token ║ +║ POST /token ║ +║ Authorization: Bearer ║ +║ → Control plane decrypts with RSA private key ║ +║ → Returns OIDC/static API token for downstream calls ║ +╚═══════════════════════════════════╦════════════════════════════════════════════════════╝ + │ (calls for API token) + ▼ +╔══════════════════════════════════════════════════════════════════════════════════════╗ +║ AMBIENT CONTROL PLANE ║ +║ (components/ambient-control-plane) SA: ambient-control-plane ║ +║ ║ +║ Token Server (RSA-based exchange): ║ +║ • Keypair stored in Secret: ambient-cp-token-keypair ║ +║ • Receives: Bearer ║ +║ • Decrypts session ID → validates → returns API token ║ +║ ║ +║ Outbound auth to API server: ║ +║ Either: ║ +║ • StaticTokenProvider: reads AMBIENT_API_TOKEN env var ║ +║ • OIDCTokenProvider: client_credentials flow to RH SSO ║ +║ OIDC_TOKEN_URL = https://sso.redhat.com/auth/realms/redhat-external/... ║ +║ OIDC_CLIENT_ID + OIDC_CLIENT_SECRET ║ +║ Caches token with 30s refresh buffer ║ +╚═══════════════════════════════════╦════════════════════════════════════════════════════╝ + │ + ▼ +╔══════════════════════════════════════════════════════════════════════════════════════╗ +║ AMBIENT API SERVER (Database-backed RBAC) ║ +║ (components/ambient-api-server) ║ +║ ║ +║ Authentication: ║ +║ 1. 
ForwardedAccessToken middleware: X-Forwarded-Access-Token → Authorization header ║ +║ 2. JWT validation: signature verified against RH SSO JWKS ║ +║ • Dev: secrets/kind-jwks.json (local file) ║ +║ • Prod: https://sso.redhat.com/.../openid-connect/certs ║ +║ 3. gRPC: AMBIENT_API_TOKEN (static) or GRPC_SERVICE_ACCOUNT (JWT username match) ║ +║ ║ +║ Authorization (DB RBAC): ║ +║ DBAuthorizationMiddleware → queries PostgreSQL ║ +║ role_bindings → roles → permissions (resource:action JSON array) ║ +║ ║ +║ PostgreSQL Schema: ║ +║ users: id, username, name, email ║ +║ roles: id, name, permissions (JSON ["session:read", "credential:token", ...]) ║ +║ role_bindings: user_id, role_id, scope (platform|project), scope_id ║ +║ credentials: id, project_id, name, provider, token*, url, email ║ +║ *token stored plaintext — no DB-level encryption ║ +║ ║ +║ Built-in roles: ║ +║ platform:admin ["*:*"] ║ +║ platform:viewer [read-only subset] ║ +║ project:owner ["project:*", "agent:*", "session:*", ...] ║ +║ project:editor [create/update, not delete project] ║ +║ project:viewer [read-only] ║ +║ agent:runner [runtime identity for agent pods] ║ +║ credential:token-reader ["credential:token"] ║ +╚══════════════════════════════════════════════════════════════════════════════════════╝ +``` + +--- + +## ALL SERVICE ACCOUNTS + +### System Service Accounts (static, in manifests) + +| SA Name | Namespace | ClusterRole | Key Permissions | +|---|---|---|---| +| `agentic-operator` | ambient-code | `agentic-operator` | Create/delete pods, jobs, PVCs, SAs, RoleBindings; mint SA tokens (`serviceaccounts/token`) | +| `ambient-control-plane` | ambient-code | `ambient-control-plane` | Manage projects/namespaces/RBAC | +| `backend-api` | ambient-code | `backend-api` | Create/update AgenticSessions, mint tokens, manage access keys | +| `frontend` | ambient-code | `ambient-frontend-auth` | TokenReview, authz checks | +| `ambient-backend` | ambient-system | `ambient-backend-cluster-role` | Legacy/backup 
backend | + +### Dynamic Service Accounts (created at runtime) + +| SA Name Pattern | Created By | Purpose | Bound Role | +|---|---|---|---| +| `ambient-session-` | Operator | Runner pod identity | Least-privilege project Role (read ConfigMaps, etc.) | +| `ambient-key--` | Backend | User API access keys | `ambient-project-admin/edit/view` (user's choice) | + +--- + +## ALL TOKEN TYPES IN THE SYSTEM + +### User-Facing Tokens + +| Token | Format | Source | Used For | Validated By | +|---|---|---|---|---| +| OCP/RH SSO bearer | `sha256~...` | `oc whoami -t` or browser login | All user API calls | K8s API server (SSAR) | +| K8s SA JWT | `eyJ...` | TokenRequest API | Access keys, runner pods | K8s API server (SSAR) | +| GitHub PAT | `ghp_...` | User creates in GitHub | Git operations | GitHub API | +| Generic bearer | 20+ chars | Various | SDK/CLI access | K8s SSAR | + +### System Tokens + +| Token | Source | Used For | Lifetime | +|---|---|---|---| +| Runner pod JWT | Operator → TokenRequest on `ambient-session-*` SA | Runner → Backend auth | ~1hr, refreshed every 45min | +| Access key JWT | Backend → TokenRequest on `ambient-key-*` SA | CI/CD → Backend auth | User-specified, max 1yr | +| Control plane OIDC token | RH SSO client_credentials flow | Control plane → API server | Short-lived, auto-refreshed (30s buffer) | +| Control plane static token | `AMBIENT_API_TOKEN` env var | Dev/simple deployments | Static (until rotated) | +| GitHub App installation token | Backend mints via GitHub App JWT | Git clone in sessions | ~1hr (GitHub enforced) | + +### OAuth Integration Tokens + +| Token | Provider | Stored In | Keyed By | +|---|---|---|---| +| Google access + refresh token | Google OAuth | K8s Secret (backend ns) | userID | +| GitLab PAT | User provides | K8s Secret `gitlab-user-tokens` | userID | +| Jira API token | User provides | K8s Secret (backend ns) | userID | +| Gerrit HTTP/cookie | User provides | K8s Secret (backend ns) | userID | +| CodeRabbit API key | User 
provides | K8s Secret (backend ns) | userID | +| Session-specific OAuth creds | OAuth callback | K8s Secret `{session}-{provider}-oauth` | session-scoped | + +--- + +## ALL SECRETS IN THE SYSTEM + +| Secret Name | Namespace | Contents | Owner | +|---|---|---|---| +| `gitlab-user-tokens` | project | GitLab PATs keyed by userID | Backend writes, runner reads | +| `gitlab-connections` | project | GitLab connection metadata | Backend | +| `oauth-callbacks` | backend | Temporary OAuth state (UUID keyed) | Backend (TTL) | +| `{session}-{provider}-oauth` | project | Session-scoped OAuth creds | Backend (GC via OwnerRef) | +| `google-creds-{hash}` | backend | Google OAuth access+refresh token | Backend | +| `jira-creds-{hash}` | backend | Jira URL + email + token | Backend | +| `gerrit-creds-{hash}` | backend | Gerrit instance credentials | Backend | +| `coderabbit-creds-{hash}` | backend | CodeRabbit API key | Backend | +| `ambient-runner-token-{session}` | project | Runner pod K8s JWT (`k8s-token` key) | Operator creates, pod mounts | +| `ambient-cp-token-keypair` | ambient-code | RSA-4096 pub+priv key for runner↔CP auth | Control plane | +| Access key SA token secrets | project | (managed by K8s) | K8s auto-manages for SA JWTs | + +--- + +## HOW THE TOKEN HEADERS FLOW + +``` +Browser/User + │ + │ (browser session cookie, managed by OAuth proxy) + ▼ +OAuth Proxy sidecar + │ + │ Adds to every proxied request: + │ X-Forwarded-User: alice + │ X-Forwarded-Email: alice@example.com + │ X-Forwarded-Groups: platform-admins,dev-team + │ X-Forwarded-Access-Token: sha256~ + ▼ +Next.js Frontend API Route + │ + │ buildForwardHeadersAsync() extracts all X-Forwarded-* headers + │ Sets BOTH on backend call: + │ X-Forwarded-Access-Token: sha256~ + │ Authorization: Bearer sha256~ + │ Also forwards: X-Forwarded-User, Email, Groups + ▼ +Backend API Server + │ + │ forwardedIdentityMiddleware(): + │ ctx.userID ← X-Forwarded-User + │ ctx.userEmail ← X-Forwarded-Email + │ ctx.userGroups← 
X-Forwarded-Groups + │ ctx.token ← X-Forwarded-Access-Token (priority 1) + │ Authorization: Bearer (priority 2) + │ + │ ValidateProjectContext(): + │ GetK8sClientsForRequest(token) → user-scoped K8s client + │ SSAR: can user LIST agenticsessions in project namespace? + │ (cached 30s) + │ + │ Handler: + │ User token for READ operations (SSAR) + │ Backend SA for WRITE operations (after SSAR validates) + ▼ +Kubernetes API Server + (validates token signature against cluster JWKS) +``` + +--- + +## THE DUAL AUTHORIZATION MODEL + +``` +Every API request hits BOTH authorization layers: + +Layer 1: Kubernetes RBAC (components/backend) + ┌─────────────────────────────────────────────┐ + │ SelfSubjectAccessReview │ + │ "Can THIS TOKEN perform VERB on RESOURCE │ + │ in NAMESPACE?" │ + │ │ + │ Enforced by: K8s API server │ + │ ClusterRoles: ambient-project-admin/edit/view│ + │ Source of truth: K8s RBAC objects │ + └─────────────────────────────────────────────┘ + +Layer 2: Database RBAC (components/ambient-api-server) + ┌─────────────────────────────────────────────┐ + │ DBAuthorizationMiddleware │ + │ "Does this JWT username have a role_binding │ + │ granting RESOURCE:ACTION in this project?" │ + │ │ + │ Enforced by: PostgreSQL query │ + │ Roles: platform:admin, project:owner, etc. │ + │ Source of truth: PostgreSQL DB │ + └─────────────────────────────────────────────┘ + +These are INDEPENDENT systems. 
A user needs: +• K8s RBAC binding for backend operations +• DB role binding for ambient-api-server operations +``` + +--- + +## RH SSO / CLUSTER JWT — HOW THEY RELATE + +``` +Red Hat SSO (external OIDC provider) + URL: https://sso.redhat.com/auth/realms/redhat-external + │ + ├─► Issues user tokens for human login (browser OAuth flow) + │ → OAuth proxy validates these, injects X-Forwarded-* headers + │ + ├─► Issues service tokens for control plane (client_credentials flow) + │ OIDC_CLIENT_ID + OIDC_CLIENT_SECRET → AMBIENT_API_TOKEN equiv + │ + └─► JWKS endpoint used by ambient-api-server to verify JWT signatures + https://sso.redhat.com/.../openid-connect/certs + +OpenShift / Kubernetes API Server (cluster JWT issuer) + │ + ├─► Issues user tokens (sha256~...) — what you get from `oc whoami -t` + │ These are validated when backend does SSAR + │ + ├─► Issues SA tokens via TokenRequest API + │ Used for: runner pods, access keys + │ Operator mints these for runners + │ Backend mints these for user access keys + │ + └─► TokenReview API — validates any bearer token against cluster + Backend uses this to identify which SA called (for BOT_TOKEN path) + +Key distinction: + RH SSO tokens → validated by ambient-api-server (DB RBAC layer) + OCP/K8s tokens → validated by K8s API server via SSAR (K8s RBAC layer) + BOTH types accepted at backend — which layer you hit depends on which + component you're calling. +``` diff --git a/docs/internal/proposals/iam-consolidation-plan.md b/docs/internal/proposals/iam-consolidation-plan.md new file mode 100644 index 000000000..d38a3d0f5 --- /dev/null +++ b/docs/internal/proposals/iam-consolidation-plan.md @@ -0,0 +1,409 @@ +# Ambient IAM — Three Improvement Plans + +--- + +## 1. Consolidate Around RH SSO + +### The Goal + +One issuer. Every token in the system — user, runner, access key, service — comes from or is +validated by RH SSO. No RSA keypairs, no K8s SA minting loops, no two-step exchanges. 
+ +### What Changes (by identity type) + +#### A. Human users — no change +Human users already authenticate through the RH SSO OAuth proxy. OCP issues `sha256~` tokens; the proxy validates them and +injects `X-Forwarded-*` headers. The backend authorizes the forwarded token with an SSAR, and ambient-api-server validates JWTs against +the RH SSO JWKS endpoint. This path is already right. + +#### B. Access keys — replace K8s SAs with RH SSO service accounts + +**Current:** The backend creates an `ambient-key--` K8s ServiceAccount, creates a RoleBinding, +calls the `TokenRequest` API, and returns the JWT. The user stores the JWT and sends it as a Bearer token on every call. +Usage is tracked via a `last-used-at` annotation. + +**Target:** The backend calls the Keycloak Admin REST API to create a **confidential client** (service +account) in RH SSO. The client credentials (`client_id` / `client_secret`) are returned to the user +once. The user calls the RH SSO token endpoint to get a short-lived OIDC access token and sends it as a Bearer token. + +- Revocation: delete the client in Keycloak → all future token requests fail immediately (access tokens already issued stay usable until they expire, so keep lifetimes short) +- Role assignment: each access key client is assigned one of the Keycloak client roles `project:admin`, `project:editor`, or `project:viewer` +- Token introspection: any component can call `/introspect` to verify a key is still active +- No K8s SA objects, no K8s RoleBindings for access keys, no `TokenRequest` calls + +#### C. Runner pods — replace K8s SA + RSA exchange with OIDC Token Exchange (RFC 8693) + +**Current:** +1. Operator creates `ambient-session-` SA +2. Operator calls `TokenRequest` → stores JWT in Secret `ambient-runner-token-` +3. Pod mounts the Secret, sends JWT to backend +4. Pod also calls control plane `/token` with RSA-encrypted session ID +5. Control plane decrypts with RSA-4096 private key, returns OIDC token +6. Operator refreshes the K8s JWT every 45 minutes + +**Target:** +1. OCP automatically projects a short-lived K8s SA token into every pod at + `/var/run/secrets/kubernetes.io/serviceaccount/token` (standard, no setup needed) +2. 
On startup, runner calls RH SSO token exchange endpoint: + ``` + POST /auth/realms/redhat-external/protocol/openid-connect/token + grant_type=urn:ietf:params:oauth:grant-type:token-exchange + subject_token= + subject_token_type=urn:ietf:params:oauth:token-type:jwt + client_id=ambient-runner-exchange + client_secret= + requested_token_type=urn:ietf:params:oauth:token-type:access_token + audience=ambient-platform + ``` +3. RH SSO validates the K8s JWT against the cluster JWKS, issues a scoped OIDC token with custom + claims: `session_id`, `project`, `role=agent:runner` +4. Runner uses this OIDC token for all downstream API calls (backend, ambient-api-server) +5. Token expiry is handled by standard OIDC refresh (no operator refresh loop needed) + +The entire control plane token server, RSA keypair bootstrap, and the 45-minute refresh loop in +the operator go away. + +#### D. Service-to-service (control plane, backend SA) — already right or align + +- Control plane already uses OIDC client credentials ✓ +- Backend SA (`backend-api`) should get its own Keycloak confidential client and use client + credentials for outbound calls to ambient-api-server (currently uses in-cluster SA token) + +--- + +### RH SSO: What Needs to Be Registered + +#### Clients (Confidential, Service Account enabled) + +| Client ID | Grant Type | Purpose | Roles Needed | +|---|---|---|---| +| `ambient-control-plane` | client_credentials | Already exists. 
Control plane → API server | `platform:admin` or equivalent | +| `ambient-backend` | client_credentials | Backend → ambient-api-server auth | `platform:admin` or equivalent | +| `ambient-runner-exchange` | token_exchange | Accept K8s JWT, issue scoped runner token | `token-exchange` permission on realm | +| `ambient-key-manager` | client_credentials | Keycloak Admin API — create/delete access key clients | `manage-clients`, `view-clients` realm roles | +| `ambient-key--` (dynamic) | client_credentials | Per user-created access key | Project-scoped role (admin/edit/view) | + +#### Realm Configuration + +| Setting | Value | Why | +|---|---|---| +| Token Exchange feature | Enabled | Required for RFC 8693 runner flow | +| K8s cluster as Identity Provider | Add cluster OIDC endpoint | So RH SSO can validate K8s-issued JWTs | +| Cluster JWKS URL | `https://api.:6443/openid/v1/jwks` | RH SSO fetches this to verify runner tokens | +| Client roles | `project:admin`, `project:editor`, `project:viewer` | Assigned to access key clients | +| Custom claim mapper | `session_id`, `project`, `role` on runner tokens | Downstream components read these claims | + +#### K8s Secrets Required (in `ambient-code` namespace) + +| Secret Name | Keys | Purpose | +|---|---|---| +| `ambient-sso-admin-credentials` | `client_id`, `client_secret` | Keycloak Admin API for access key lifecycle | +| `ambient-runner-exchange-credentials` | `client_id`, `client_secret` | Runner token exchange client | +| `ambient-backend-oidc` | `client_id`, `client_secret` | Backend service-to-service auth | + +#### Environment Variables (changes/additions) + +| Component | Variable | Value | +|---|---|---| +| Backend | `SSO_ADMIN_CLIENT_ID` | `ambient-key-manager` | +| Backend | `SSO_ADMIN_CLIENT_SECRET` | from Secret | +| Backend | `SSO_REALM_URL` | `https://sso.redhat.com/auth/realms/redhat-external` | +| Control plane | `OIDC_CLIENT_ID` | `ambient-control-plane` (existing) | +| Control plane | `OIDC_CLIENT_SECRET` 
| from Secret (existing) | +| Runner | `SSO_TOKEN_EXCHANGE_URL` | RH SSO token endpoint | +| Runner | `SSO_EXCHANGE_CLIENT_ID` | `ambient-runner-exchange` | +| Runner | `SSO_EXCHANGE_CLIENT_SECRET` | from Secret | + +--- + +### What Gets Deleted + +| Component | What Goes Away | +|---|---| +| Operator | SA creation code for `ambient-session-*` | +| Operator | `TokenRequest` minting code | +| Operator | 45-minute token refresh loop | +| Operator | Secret `ambient-runner-token-*` creation | +| Control plane | Entire `internal/tokenserver/` package | +| Control plane | Entire `internal/keypair/` package | +| Control plane | Secret `ambient-cp-token-keypair` | +| Control plane | `CPTokenListenAddr`, `CPTokenURL`, `ProjectKubeTokenFile` config fields | +| Backend | SA creation in `CreateProjectKey()` | +| Backend | `DeleteProjectKey()` SA/RoleBinding deletion | +| Backend | `ListProjectKeys()` SA label selector query | +| Backend | `updateAccessKeyLastUsedAnnotation()` | +| Manifests | All `ambient-key-*` ClusterRole bindings (no longer static) | + +### What Gets Added + +| Component | What's New | +|---|---| +| Backend | Keycloak Admin API client (`pkg/keycloak/`) | +| Backend | `CreateProjectKey()` → create Keycloak client, assign roles, return `client_id`+`client_secret` | +| Backend | `DeleteProjectKey()` → delete Keycloak client | +| Backend | `ListProjectKeys()` → list Keycloak clients with `ambient-key-` prefix | +| Runner | OIDC token exchange on startup (call SSO, cache token, refresh before expiry) | +| ambient-api-server | No change — already validates RH SSO JWTs | + +--- + +### Migration Path + +1. **Register all clients in RH SSO** and validate token exchange with the cluster +2. **Deploy runner with dual-mode**: try exchange first, fall back to RSA for existing sessions +3. **Deploy operator without SA creation** for new sessions only (existing sessions unaffected) +4. 
**Once all active sessions are on the new path**: remove the RSA exchange from the control plane +5. **Migrate access keys**: for each existing K8s SA access key, create a Keycloak client and + notify users to re-issue credentials (old K8s SA tokens expire on their own, within their requested lifetime of up to 1 year) +6. **Remove old K8s SAs and Secrets**: `kubectl delete sa -l app=ambient-access-key -A` + +--- + +--- + +## 2. DB RBAC as Source of Truth — Options + +### The Problem to Solve + +Today a project admin must grant access in two independent systems: K8s RoleBindings (for the +backend/K8s API layer) and DB role_bindings (for ambient-api-server). They're not synced. You +can grant someone access in one and forget the other. Neither system knows the other exists. + +### Constraint You Can't Remove + +K8s enforces RBAC natively for K8s API operations. When the backend issues an SSAR to check if a +user can `list agenticsessions`, K8s itself makes that decision using RoleBindings. You cannot +bypass this without rewriting how K8s works. So K8s RBAC **for K8s operations** always exists. + +The question is *which system is the write plane*: where does an admin go to say "give Alice access to +project X", and how does that grant propagate? + +--- + +### Option A: DB Drives K8s (Reconciliation) — Recommended + +**DB is the write plane. K8s RoleBindings are a derived artifact.** + +When a user is added to a project in ambient-api-server's `role_bindings` table, a new reconciler +(in the control plane or operator) watches for those changes and creates/deletes the corresponding +K8s RoleBinding automatically. 
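A minimal sketch of the reconciler's core decision, assuming the role mapping table in this proposal and a hypothetical `Binding` record. It only computes which K8s RoleBindings to create and delete from project-scoped `role_bindings` rows. Reading the table and calling the K8s API are left out, and `existing` is assumed to hold only bindings this reconciler manages.

```python
from dataclasses import dataclass

# DB role -> K8s ClusterRole, copied from the role mapping table in this proposal.
ROLE_TO_CLUSTER_ROLE = {
    "project:owner": "ambient-project-admin",
    "project:editor": "ambient-project-edit",
    "project:viewer": "ambient-project-view",
}


@dataclass(frozen=True)
class Binding:
    """Hypothetical record for one K8s RoleBinding (namespace, subject, ClusterRole)."""
    namespace: str
    user: str
    cluster_role: str


def desired_bindings(rows):
    """Translate project-scoped role_bindings rows into the RoleBindings they imply."""
    desired = set()
    for row in rows:
        cluster_role = ROLE_TO_CLUSTER_ROLE.get(row["role_id"])
        if row["scope"] != "project" or cluster_role is None:
            continue  # DB-only roles (e.g. credential:token-reader) have no K8s mapping
        desired.add(Binding(row["scope_id"], row["user_id"], cluster_role))
    return desired


def plan(rows, existing):
    """Return (to_create, to_delete) so that K8s converges on the DB state."""
    desired = desired_bindings(rows)
    return desired - existing, existing - desired
```

Because `plan` is a pure function of the two inputs, running it on every poll is idempotent: once K8s matches the DB, both returned sets are empty.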
+ +``` +Admin calls: POST /api/ambient/v1/role_bindings + { user_id: "alice", role_id: "project:editor", scope: "project", scope_id: "my-project" } + ↓ +ambient-api-server writes to role_bindings table + ↓ +Reconciler watches role_bindings (polling or change-data-capture) + ↓ +Reconciler creates K8s RoleBinding in namespace "my-project": + subject: alice → ClusterRole: ambient-project-edit + ↓ +Backend SSAR continues to work unchanged +``` + +**Role mapping table** (DB role → K8s ClusterRole): + +| DB Role | K8s ClusterRole | +|---|---| +| `project:owner` | `ambient-project-admin` | +| `project:editor` | `ambient-project-edit` | +| `project:viewer` | `ambient-project-view` | +| `platform:admin` | cluster-admin or custom | +| `agent:runner` | (no K8s ClusterRole needed — runner uses token exchange) | +| Fine-grained (`credential:token-reader`, etc.) | (DB RBAC only, no K8s mapping needed) | + +**What changes:** +- New reconciler in control plane: watches `role_bindings` table, syncs K8s RoleBindings +- Backend permissions handler (`/api/projects/:name/permissions`) delegates writes to + ambient-api-server instead of directly creating K8s RoleBindings +- Frontend permissions UI calls ambient-api-server instead of backend +- Backend SSAR, middleware — no change + +**Tradeoffs:** +- Eventual consistency: DB write → K8s propagation has a lag (aim for < 5s) +- Reconciler needs K8s admin permissions to create RoleBindings +- Fine-grained DB permissions (`credential:token`) have no K8s equivalent — they're DB-only + and that's fine (ambient-api-server enforces them directly) + +--- + +### Option B: ambient-api-server as Authorization Service + +**DB is authoritative. Backend calls ambient-api-server for every authz decision.** + +Backend replaces `SelfSubjectAccessReview` calls with HTTP calls to a new ambient-api-server +endpoint: `POST /api/ambient/v1/authz/check`. 
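On the backend side, the call could sit behind the same kind of short-lived cache the current SSAR path uses (30s TTL, SHA-256 keyed). The sketch below is illustrative only: `check` is a hypothetical callable that would POST to `/api/ambient/v1/authz/check`, and failing closed on errors is an assumption, not a decided behavior.

```python
import hashlib
import time


class AuthzCache:
    """TTL cache in front of a remote authorization check (hypothetical helper)."""

    def __init__(self, check, ttl_seconds=30.0, clock=time.monotonic):
        self._check = check      # callable(user, resource, action, project) -> bool
        self._ttl = ttl_seconds
        self._clock = clock      # injectable so tests can control time
        self._entries = {}       # cache key -> (decision, expires_at)

    def allowed(self, user, resource, action, project):
        raw = "\x00".join((user, resource, action, project))
        key = hashlib.sha256(raw.encode()).hexdigest()
        now = self._clock()
        cached = self._entries.get(key)
        if cached is not None and cached[1] > now:
            return cached[0]
        try:
            decision = bool(self._check(user, resource, action, project))
        except Exception:
            # Assumption: deny when the authz service is unreachable, and do not
            # cache the failure so the next request retries.
            return False
        self._entries[key] = (decision, now + self._ttl)
        return decision
```

Denials are cached for the same 30 seconds as grants, so a newly granted role can take up to one TTL to become visible.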
+ +``` +Backend request arrives with user token + ↓ +Backend extracts user identity from JWT claims (preferred_username) + ↓ +Backend calls: POST /api/ambient/v1/authz/check + { user: "alice", resource: "agenticsessions", action: "list", project: "my-project" } + ↓ +ambient-api-server queries role_bindings → roles → permissions +Returns: { allowed: true } + ↓ +Backend proceeds (or returns 403) +``` + +Backend caches results for 30 seconds (same as current SSAR cache). + +**What changes:** +- Backend: `globalSSARCache` logic remains, but calls ambient-api-server instead of K8s API +- ambient-api-server: new `/authz/check` endpoint +- K8s RoleBindings: can be removed for project-level user bindings (only system SAs need them) +- The K8s ClusterRoles `ambient-project-admin/edit/view` can be retired for user access + +**Tradeoffs:** +- Backend takes a **hard synchronous dependency** on ambient-api-server. If ambient-api-server + is down, the backend cannot authorize any request. +- Risk of circular dependency if ambient-api-server itself calls backend for anything. +- Eliminates K8s audit trail for user actions (SSAR no longer used). +- K8s RBAC for K8s operations still required for system SAs (operator, control plane, etc.) +- Net result: simpler for users, harder operationally. + +--- + +### Option C: Explicit Split (No Single Source) + +Accept that two systems exist but make the split **intentional and documented**: + +- **K8s RBAC** owns: "can this identity access this namespace at all" (coarse gate) +- **DB RBAC** owns: "can this identity do this specific action on this resource" (fine-grained) + +Both are authoritative for their domain. No overlap. Documented contract. + +The only change: everywhere a human admin today has to grant in both systems, replace with a +single API call that writes to both atomically. The backend's permissions handler writes a K8s +RoleBinding **and** a DB role_binding in the same request. 
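The K8s API and PostgreSQL share no transaction, so "atomically" here can at best mean a compensating write. A rough sketch of that shape, assuming hypothetical `k8s` and `db` client objects with the method names shown (the in-memory classes are stand-ins for illustration only):

```python
class FakeK8s:
    """In-memory stand-in for a K8s RoleBinding client (illustration only)."""

    def __init__(self):
        self.bindings = set()

    def create_role_binding(self, namespace, user, cluster_role):
        self.bindings.add((namespace, user, cluster_role))

    def delete_role_binding(self, namespace, user, cluster_role):
        self.bindings.discard((namespace, user, cluster_role))


class FakeDB:
    """In-memory stand-in for the role_bindings table (illustration only)."""

    def __init__(self, fail=False):
        self.rows = []
        self.fail = fail

    def insert_role_binding(self, user_id, role_id, scope, scope_id):
        if self.fail:
            raise RuntimeError("database unavailable")
        self.rows.append((user_id, role_id, scope, scope_id))


def grant_access(k8s, db, user, project, cluster_role, db_role):
    """Write both bindings; undo the K8s write if the DB write fails."""
    k8s.create_role_binding(project, user, cluster_role)
    try:
        db.insert_role_binding(user, db_role, "project", project)
    except Exception:
        # Compensate so the two systems do not drift apart, then surface the error.
        k8s.delete_role_binding(project, user, cluster_role)
        raise
```

If the compensating delete itself fails, the two systems still diverge, which is the sync problem this option leaves open.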
+ +**Tradeoffs:** +- Still two systems, but the dual-write is explicit and visible +- No reconciler needed, no new service dependencies +- Easiest to implement +- Doesn't actually solve the sync problem; it only moves the two-write burden to the backend + +--- + +### Recommendation + +**Option A** is the right call. It gives you a single human-facing write plane (the DB) while +keeping K8s RBAC functioning as it does today. The backend changes minimally. The reconciler is +a small, focused component (< 200 lines of controller-runtime code). + +The reconciler fits naturally in the control plane, which already has K8s admin permissions and +watches resources for reconciliation. Add it alongside the existing project namespace reconciler. + +**One open point:** fine-grained roles like `credential:token-reader` +have no K8s equivalent. Leave them DB-only. K8s RBAC enforces the coarse +gate (can you access the project). DB RBAC enforces the fine-grained gate (can you read a token +within the project). This split is correct because the two layers serve different enforcement points. + +--- + +--- + +## 3. Extend the Credentials Table + +### The Goal + +Move all provider OAuth tokens (GitHub, GitLab, Google, Jira, Gerrit, CodeRabbit) from the +scattered K8s Secrets in the backend namespace into the `credentials` table in ambient-api-server. +The result is a single audit trail, a single access control model, and a single API. 
+ +### Schema Change + +Add `user_id` and `scope` columns (one migration): + +```sql +ALTER TABLE credentials + ADD COLUMN user_id TEXT, + ADD COLUMN scope TEXT NOT NULL DEFAULT 'project'; + +-- scope = 'project': project_id set, user_id null (existing behavior) +-- scope = 'user': user_id set, project_id may be null +``` + +New unique index for user credentials: +```sql +CREATE UNIQUE INDEX credentials_user_provider_url + ON credentials (user_id, provider, url) + WHERE scope = 'user' AND deleted_at IS NULL; +``` + +### New Routes (ambient-api-server) + +``` +GET /api/ambient/v1/users/me/credentials +POST /api/ambient/v1/users/me/credentials +DELETE /api/ambient/v1/users/me/credentials/{id} +GET /api/ambient/v1/users/me/credentials/{id}/token +``` + +The `/me` route resolves `user_id` from the JWT `preferred_username` claim — no user ID in URL. + +### What Moves From K8s Secrets to DB + +| K8s Secret (backend namespace) | → DB credential | +|---|---| +| `gitlab-user-tokens` (key: userID) | `scope=user, provider=gitlab, token=, url=` | +| `google-creds-{hash}` | `scope=user, provider=google, token=`, refresh token in `annotations` JSON | +| `jira-creds-{hash}` | `scope=user, provider=jira, token=, url=, email=` | +| `gerrit-creds-{hash}` | `scope=user, provider=gerrit, token=, url=` (one row per instance) | +| `coderabbit-creds-{hash}` | `scope=user, provider=coderabbit, token=` | +| `{session}-{provider}-oauth` | `scope=project, provider=` + session ID in `labels` JSON | + +### What Stays Where It Is + +| Token | Stays Because | +|---|---| +| `oauth-callbacks` Secret | Transient state (UUID-keyed, short TTL) — a K8s Secret or Redis is fine | +| GitHub App installation tokens | Never stored; minted on demand from private key | +| Runner pod K8s JWT | Changes to OIDC exchange (see plan 1) — not a credential to store | + +### What Changes in the Backend + +Every `StoreX()` / `GetX()` / `DeleteX()` function in `handlers/oauth.go` and `handlers/secrets.go` +becomes an 
API call to ambient-api-server instead of a K8s Secret operation: + +``` +StoreGitLabToken(userID, token) → POST /users/me/credentials {provider: "gitlab", token: ...} +GetGitLabToken(userID) → GET /users/me/credentials?provider=gitlab + /token +DeleteGitLabToken(userID) → DELETE /users/me/credentials/{id} +``` + +The backend's K8s Secret operations for OAuth credentials reduce to zero. + +### RBAC for User Credentials (DB RBAC) + +- New permission: `user_credential:token` (fetch the raw token for my own credential) +- New built-in role: `user:self`, which every authenticated user gets automatically (bound at login) + +Permissions: +```json +["user_credential:read", "user_credential:list", "user_credential:create", + "user_credential:update", "user_credential:delete", "user_credential:token"] +``` + +Users can only see and fetch their own credentials. This is enforced by a `user_id` filter derived from the JWT +`preferred_username` claim (the same claim the `/me` routes use) in addition to RBAC, as defense in depth. + +### Encryption (later) + +When ready, add a `kek_id` column (key-encryption-key ID) and encrypt `token` with AES-256-GCM +using a DEK wrapped by the KEK. The KMS can be OCP's built-in etcd encryption, Vault, or RHKMS. +The schema is designed so this is an additive change — no routes change, only the service layer. + +--- + +### Migration Order + +1. Deploy ambient-api-server schema migration (additive — no downtime) +2. Deploy new `/users/me/credentials` routes +3. Deploy backend with dual-write: write to both K8s Secret AND DB (dark launch) +4. Validate reads from DB return correct data +5. Flip the backend to read from DB and stop writing to the K8s Secret +6. Clean up orphaned K8s Secrets
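For step 4, a comparison along these lines could run while dual-write is active. This is a sketch under assumptions: both stores are presumed already loaded into dicts keyed by `(user_id, provider)`, and `find_mismatches` is a hypothetical helper, not existing backend code. It reports keys only, so raw tokens never reach logs.

```python
import hmac


def find_mismatches(secret_tokens, db_tokens):
    """Compare tokens read from K8s Secrets with tokens read from the DB.

    Both arguments map (user_id, provider) -> token. Returns a sorted list of
    ((user_id, provider), problem) tuples; an empty list means the stores agree.
    """
    problems = []
    for key in sorted(set(secret_tokens) | set(db_tokens)):
        old = secret_tokens.get(key)
        new = db_tokens.get(key)
        if new is None:
            problems.append((key, "missing from DB"))
        elif old is None:
            problems.append((key, "only in DB"))
        elif not hmac.compare_digest(old.encode(), new.encode()):
            # Constant-time comparison; the token values are never returned.
            problems.append((key, "token differs"))
    return problems
```

An empty result over a full pass is a reasonable precondition for step 5.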