
[PROPOSAL]RFC: OAuth 2.0 Support for OpenSearch #491

@seraphjiang

Description


Status: Proposed


What/Why

What are you proposing?

Add OAuth 2.0 support to OpenSearch via an authentication proxy, enabling secure machine-to-machine access, scoped API tokens, and third-party integrations. The proxy validates OAuth tokens (JWT), maps scopes to OpenSearch security roles, and forwards requests to the engine and Dashboards — with zero changes to existing components. This is the foundational layer that unlocks AI agent access, CI/CD automation, collaboration tool integration, and multi-tenant SaaS patterns for OpenSearch.

What users have asked for this feature?

OpenSearch has foundational auth primitives but lacks the developer experience layer that competitors offer for programmatic and machine-to-machine access:

| Platform | OIDC/SSO | API Keys | Service Accounts | OAuth Apps / Scoped Tokens | Token Governance UI |
|---|---|---|---|---|---|
| Grafana | ✅ | ✅ | ✅ | ✅ | ✅ |
| Datadog | ✅ | ✅ | ✅ | ✅ | ✅ |
| Splunk | ✅ | ✅ | ✅ | ✅ | ✅ |
| Elastic | ✅ | ✅ | ✅ | ✅ | ✅ |
| OpenSearch | ✅ | 🔄 In progress (#5443) | ✅ (limited) | ❌ | ❌ |

OpenSearch already supports OIDC authentication and has a Service Account primitive (originally built for the extensions project). API Keys with direct permission scoping are in active development targeting 3.7. The gap is not in authentication primitives — it's in the developer experience layer: OAuth app registration, scoped token issuance via standard flows, unified auth across engine and Dashboards, and token governance at enterprise scale.

Relationship to Existing Capabilities

This RFC builds on — not replaces — existing OpenSearch security features:

| Existing Capability | What it does | What's missing |
|---|---|---|
| OIDC Authenticator (docs) | Validates OIDC tokens with role claims for engine auth | Engine-only; doesn't cover Dashboards APIs. Requires roles in token claims — no scope-to-role mapping. No token lifecycle management. |
| Service Accounts (docs) | Scoped tokens for extensions to access system indices | Built for the extensions project. Not designed for external clients, AI agents, or CI/CD pipelines. |
| API Keys (PR #5443) | Opaque tokens with cluster_permissions and index_permissions directly on the token | 🔄 In progress for 3.7. Engine-only — doesn't cover Dashboards. Admin-only issuance in V1. No OIDC federation, no governance UI, no Cedar policies. |

The OAuth proxy is the layer that connects these primitives to the outside world — it federates external OIDC tokens, provides unified auth across engine and Dashboards, and adds the governance/management layer that enterprises need. With API Keys landing in 3.7, the proxy becomes thinner: instead of mapping to pre-created backend users, it can programmatically issue API Keys via /_plugins/_security/api/apitokens with the exact permissions needed.
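Since PR #5443 is still in flight, the exact request body isn't final. The Python sketch below shows how the proxy might translate a validated OAuth scope into an API Key creation payload; the `name` and `expiration` fields and the action strings are assumptions, while `cluster_permissions`/`index_permissions` come from the PR description above.

```python
# Hypothetical payload builder for POST /_plugins/_security/api/apitokens.
# Field names are assumptions; PR #5443 may change them before release.

def build_api_token_request(client_id: str, scopes: list,
                            expires_in_seconds: int = 3600) -> dict:
    """Translate OAuth scopes like 'read:logs-*' into index permissions."""
    action_map = {
        "read": ["indices:data/read/*"],
        "write": ["indices:data/write/*"],
    }
    index_permissions = []
    for scope in scopes:
        action, _, pattern = scope.partition(":")
        if action not in action_map or not pattern:
            raise ValueError(f"unsupported scope: {scope}")
        index_permissions.append({
            "index_pattern": [pattern],
            "allowed_actions": action_map[action],
        })
    return {
        "name": f"oauth-{client_id}",      # ties the key back to the OAuth client
        "expiration": expires_in_seconds,
        "cluster_permissions": [],
        "index_permissions": index_permissions,
    }
```

The proxy would POST this body with its own admin credential, then hand the returned key to the OAuth client for the lifetime of the grant.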

Community signals:

What problems are you trying to solve?

  1. When building an AI agent that queries logs, a developer wants to grant the agent read-only access to specific indices with an expiring token, so they don't have to share admin credentials that could be leaked or misused.

  2. When deploying dashboards across environments, a DevOps engineer wants to authenticate CI/CD pipelines with scoped service accounts, so they don't have to store admin passwords in CI secrets with full cluster access.

  3. When integrating OpenSearch alerts with Slack, a platform engineer wants to use standard OAuth to connect services, so they don't have to build fragile webhook workarounds with embedded credentials.

  4. When building a multi-tenant SaaS application, a backend developer wants to issue per-customer scoped tokens, so they can isolate tenant data access and revoke individual customers without affecting others.

  5. When querying OpenSearch from an IDE or CLI, a developer wants to authenticate once via browser-based OAuth flow and get an auto-expiring token, so they don't have to copy-paste credentials that never expire.

  6. When forwarding logs via Fluent Bit or OpenTelemetry, an infrastructure engineer wants to give each pipeline a write-only token scoped to its target index, so they limit blast radius if a pipeline credential is compromised.

  7. When managing multiple OpenSearch clusters, a platform team wants to use one identity provider with different scoped tokens per cluster, so they don't have to manage separate credentials for each environment.

What is the developer experience going to be?

REST API

The OAuth proxy introduces the following endpoints:

Token management:

POST /oauth/token              — Issue a new token (client credentials flow)
DELETE /oauth/token/{token_id} — Revoke a token
GET /oauth/tokens              — List active tokens
GET /oauth/token/{token_id}    — Get token details

Token issuance example:

# Request a scoped token
curl -X POST https://opensearch.example.com:8443/oauth/token \
  -d "grant_type=client_credentials" \
  -d "client_id=my-agent" \
  -d "client_secret=..." \
  -d "scope=read:logs-*"

# Response
{
  "access_token": "eyJhbGciOi...",
  "token_type": "Bearer",
  "expires_in": 3600,
  "scope": "read:logs-*"
}

Using the token:

# Search via OAuth proxy — token scoped to read:logs-* only
curl -H "Authorization: Bearer eyJhbGciOi..." \
  "https://opensearch.example.com:8443/logs-*/_search" \
  -d '{"query": {"match": {"level": "error"}}}'

CLI tool:

# Browser-based login (authorization code flow)
$ opensearch-auth login --provider keycloak
🌐 Opening browser for authentication...
✅ Authenticated as developer@example.com
   Token stored in ~/.opensearch/token (expires in 8h)
   Scopes: read:logs-*, read:metrics-*

# Create a service account token (client credentials)
$ opensearch-auth create-token --scopes "read:logs-*" --expires 24h
✅ Token created: tok_abc123 (expires 2024-01-16T10:00:00Z)

# Revoke a token
$ opensearch-auth revoke-token --token-id tok_abc123
✅ Token revoked

# Check status
$ opensearch-auth status
✅ Authenticated as developer@example.com
   Provider: keycloak | Expires: 6h remaining
   Scopes: read:logs-*, read:metrics-*

Impact to existing APIs: None. All existing OpenSearch and Dashboards APIs continue to work unchanged. The proxy is an additive component — clients that don't use OAuth bypass it entirely.

Configuration (YAML):

# opensearch-oauth-proxy.yaml
upstream:
  engine: https://opensearch:9200
  dashboards: https://opensearch-dashboards:5601

providers:
  - name: keycloak
    issuer: https://keycloak.example.com/realms/opensearch
    jwks_uri: auto  # auto-discover from issuer
  - name: auth0
    issuer: https://mycompany.auth0.com
    jwks_uri: auto

scope_mapping:
  "read:logs-*":
    backend_user: agent-logs-reader
    backend_roles: [logs_read_access]
  "write:dashboards":
    backend_user: agent-dashboard-writer
    backend_roles: [dashboard_write_access]
  "admin":
    backend_user: agent-admin
    backend_roles: [all_access]

listen: :8443
tls:
  cert: /path/to/cert.pem
  key: /path/to/key.pem
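As a sketch of how the proxy could apply this configuration, the snippet below resolves a token scope to its backend identity and denies anything unmapped. A dict literal stands in for the parsed YAML; `resolve_scope` is an illustrative name, not an existing API.

```python
# Sketch of the proxy's scope resolution; the dict mirrors the YAML above.
SCOPE_MAPPING = {
    "read:logs-*": {"backend_user": "agent-logs-reader",
                    "backend_roles": ["logs_read_access"]},
    "write:dashboards": {"backend_user": "agent-dashboard-writer",
                         "backend_roles": ["dashboard_write_access"]},
    "admin": {"backend_user": "agent-admin",
              "backend_roles": ["all_access"]},
}

def resolve_scope(scope: str) -> dict:
    """Look up the backend identity for a token scope. Unmapped scopes
    are rejected outright, so the proxy can never grant more access
    than the configuration allows."""
    mapping = SCOPE_MAPPING.get(scope)
    if mapping is None:
        raise PermissionError(f"scope not mapped: {scope}")
    return mapping
```

Exact-match lookup keeps the security reasoning simple: the wildcard in `read:logs-*` constrains indices on the OpenSearch side, not which scopes a token may present.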

Are there any security considerations?

Yes — this feature is fundamentally about security:

  1. Token security — JWT tokens are signed by the OIDC provider and validated by the proxy via JWKS. Tokens have expiry, scopes, and can be individually revoked.
  2. Scope enforcement — OAuth scopes are mapped to OpenSearch security roles. The proxy never grants more access than the mapped role allows.
  3. No bypass — clients going through the proxy cannot access the engine directly (network policy enforced). Clients not using OAuth continue to authenticate directly with existing methods.
  4. Audit trail — every proxied request is logged with: client_id, user_id (if delegated), scopes, action, target index, timestamp.
  5. Integration with security plugin — the proxy maps tokens to existing security plugin users/roles. FGAC (field-level, document-level security) and workspace ACL continue to apply as additive restrictions.
  6. Token storage — the proxy itself holds no token state; tokens are persisted by the engine's API token system index (.opensearch_security_api_tokens) with encryption at rest.

How the authorization layers compose:

graph TB
    subgraph "Authorization Layers (each narrows access)"
        A["OAuth Scope<br/>Token can access logs-*"] --> B["FGAC (Security Plugin)<br/>User can read logs-*, field masking on PII"]
        B --> C["Workspace ACL<br/>User sees 'observability' workspace only"]
        C --> D["✅ Final: Read logs-*, PII masked,<br/>observability workspace only"]
    end

    style A fill:#e1f5fe
    style B fill:#fff3e0
    style C fill:#f3e5f5
    style D fill:#e8f5e9
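Item 1 above relies on validating the JWT and reading its scopes. The following minimal Python sketch shows only the claim-extraction half; signature verification against the provider's JWKS is deliberately omitted, and a real proxy must verify the signature before trusting any claim.

```python
import base64
import json

def _b64url_decode(segment: str) -> bytes:
    # JWT segments are base64url without padding; restore padding first.
    return base64.urlsafe_b64decode(segment + "=" * (-len(segment) % 4))

def extract_scopes(jwt_token: str) -> list:
    """Read the space-delimited 'scope' claim from a JWT payload.
    NOTE: no signature check here -- a real proxy must verify the
    token against the provider's JWKS before trusting any claim."""
    _header, payload, _signature = jwt_token.split(".")
    claims = json.loads(_b64url_decode(payload))
    return claims.get("scope", "").split()

def make_demo_token(claims: dict) -> str:
    # Build an unsigned demo token purely to exercise the decoder.
    enc = lambda obj: base64.urlsafe_b64encode(
        json.dumps(obj).encode()).rstrip(b"=").decode()
    return f"{enc({'alg': 'none'})}.{enc(claims)}.sig"

token = make_demo_token({"sub": "my-agent", "scope": "read:logs-* read:metrics-*"})
print(extract_scopes(token))  # ['read:logs-*', 'read:metrics-*']
```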

Are there any breaking changes to the API?

No. Zero breaking changes.

  • All existing OpenSearch REST APIs remain unchanged
  • All existing Dashboards APIs remain unchanged
  • All existing authentication methods (basic auth, SAML, OIDC) continue to work
  • The OAuth proxy is a new, optional component — it does not modify or replace any existing functionality
  • Clients that don't use OAuth are completely unaffected

What is the user experience going to be?

Use Case 1: AI Agents (Claude Code, Cursor, Cody, Custom Agents)

AI coding agents need to query OpenSearch for log analysis, search, and observability. OAuth provides scoped, auditable, revocable access.

Example: LangChain agent with OAuth

from langchain.agents import Tool
from opensearchpy import OpenSearch
import requests

# Get scoped OAuth token
token_response = requests.post("https://keycloak.example.com/token", data={
    "grant_type": "client_credentials",
    "client_id": "langchain-agent",
    "client_secret": "...",
    "scope": "read:logs-*"
})
token = token_response.json()["access_token"]

# Connect to OpenSearch via OAuth proxy
client = OpenSearch(
    hosts=[{"host": "opensearch.example.com", "port": 8443}],
    headers={"Authorization": f"Bearer {token}"},
    use_ssl=True
)

# Agent can search logs but CANNOT delete indices or access other data
def search_logs(query: str) -> str:
    results = client.search(index="logs-*", body={
        "query": {"query_string": {"query": query}},
        "size": 10
    })
    return str(results["hits"]["hits"])

tools = [Tool(name="search_logs", func=search_logs, description="Search OpenSearch logs")]

Example: Claude Code / Cursor MCP config

{
  "mcpServers": {
    "opensearch": {
      "command": "opensearch-mcp-server",
      "env": {
        "OPENSEARCH_URL": "https://opensearch.example.com:8443",
        "OPENSEARCH_OAUTH_TOKEN": "eyJhbGciOi..."
      }
    }
  }
}

Sequence diagram:

sequenceDiagram
    participant Dev as Developer
    participant Agent as AI Agent<br/>(Claude Code)
    participant MCP as OpenSearch<br/>MCP Server
    participant Proxy as OAuth Proxy
    participant OIDC as OIDC Provider<br/>(Keycloak)
    participant OS as OpenSearch<br/>Engine

    Dev->>Agent: "Find error patterns in logs"
    Agent->>MCP: search(index="logs-*", query="level:error")
    
    Note over MCP,OIDC: First request — get token
    MCP->>OIDC: POST /token (client_credentials, scope=read:logs-*)
    OIDC-->>MCP: access_token (JWT, expires 1h)
    
    MCP->>Proxy: GET /logs-*/_search<br/>Authorization: Bearer <token>
    Proxy->>OIDC: Fetch JWKS (cached)
    Proxy->>Proxy: Validate JWT + extract scopes
    Proxy->>Proxy: Map scope "read:logs-*" → role "logs_reader"
    Proxy->>OS: GET /logs-*/_search<br/>(as internal user "logs_reader")
    OS-->>Proxy: Search results (247 hits)
    Proxy-->>MCP: Search results
    MCP-->>Agent: Formatted results
    Agent-->>Dev: "Found 247 errors. Top services:<br/>payment-api (102), auth-service (89)..."

Compatible agents:

| Agent / Framework | Integration method |
|---|---|
| Claude Code / Kiro | MCP server with OAuth token |
| Cursor | Custom tool with OAuth bearer token |
| GitHub Copilot Extensions | OAuth app authorization |
| Sourcegraph Cody | Context provider with OAuth |
| LangChain / LlamaIndex | OpenSearch tool with OAuth credentials |
| CrewAI / AutoGen | Custom tool with OAuth bearer |

Use Case 2: Model Context Protocol (MCP)

Example: OpenSearch MCP server (Python)

from mcp.server.fastmcp import FastMCP
from opensearchpy import OpenSearch

app = FastMCP("opensearch-mcp")

def get_client(oauth_token: str) -> OpenSearch:
    # How the token reaches the server (env var, request context) is
    # deployment-specific; `app.context.oauth_token` below is illustrative.
    return OpenSearch(
        hosts=[{"host": "opensearch.example.com", "port": 8443}],
        headers={"Authorization": f"Bearer {oauth_token}"},
        use_ssl=True
    )

@app.tool()
async def search(index: str, query: str, size: int = 10) -> str:
    """Search an OpenSearch index with a query string."""
    client = get_client(app.context.oauth_token)
    results = client.search(index=index, body={
        "query": {"query_string": {"query": query}}, "size": size
    })
    return "\n".join(f"[{h['_index']}] {h['_source']}" for h in results["hits"]["hits"])

@app.tool()
async def get_mappings(index: str) -> str:
    """Get the field mappings for an index."""
    client = get_client(app.context.oauth_token)
    return str(client.indices.get_mapping(index=index))

@app.tool()
async def list_indices(pattern: str = "*") -> str:
    """List available indices matching a pattern."""
    client = get_client(app.context.oauth_token)
    return str(client.cat.indices(index=pattern, format="json"))

Use Case 3: Enterprise CI/CD Pipelines

Example: GitHub Actions workflow

name: Deploy OpenSearch Dashboards
on:
  push:
    branches: [main]
    paths: ['dashboards/**']

jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Get OAuth token
        id: auth
        run: |
          TOKEN=$(curl -s -X POST ${{ secrets.OAUTH_TOKEN_URL }} \
            -d "grant_type=client_credentials" \
            -d "client_id=${{ secrets.OAUTH_CLIENT_ID }}" \
            -d "client_secret=${{ secrets.OAUTH_CLIENT_SECRET }}" \
            -d "scope=write:dashboards" | jq -r '.access_token')
          echo "token=$TOKEN" >> $GITHUB_OUTPUT

      - name: Deploy to staging
        run: |
          curl -X POST "https://staging-opensearch.example.com:8443/api/saved_objects/_import" \
            -H "Authorization: Bearer ${{ steps.auth.outputs.token }}" \
            -H "osd-xsrf: true" \
            --form file=@dashboards/web-monitoring.ndjson

      - name: Deploy to production
        if: success()
        run: |
          curl -X POST "https://prod-opensearch.example.com:8443/api/saved_objects/_import" \
            -H "Authorization: Bearer ${{ steps.auth.outputs.token }}" \
            -H "osd-xsrf: true" \
            --form file=@dashboards/web-monitoring.ndjson

Example: Terraform

provider "opensearch" {
  url         = "https://opensearch.example.com:8443"
  oauth_token = var.opensearch_oauth_token  # scoped: write:dashboards
}

resource "opensearch_index_template" "logs" {
  name = "logs-template"
  body = jsonencode({
    index_patterns = ["logs-*"]
    template = { settings = { number_of_shards = 3 } }
  })
}

Sequence diagram:

sequenceDiagram
    participant Dev as Developer
    participant GH as GitHub
    participant CI as GitHub Actions
    participant Proxy as OAuth Proxy
    participant OIDC as OIDC Provider
    participant Stage as Staging OpenSearch
    participant Prod as Production OpenSearch

    Dev->>GH: git push (dashboards/*.ndjson)
    GH->>CI: Trigger workflow
    CI->>OIDC: POST /token (scope=write:dashboards)
    OIDC-->>CI: access_token
    CI->>Proxy: POST /api/saved_objects/_import (staging)
    Proxy->>Stage: Import dashboard
    Stage-->>CI: 200 OK ✅
    CI->>Proxy: POST /api/saved_objects/_import (prod)
    Proxy->>Prod: Import dashboard
    Prod-->>CI: 200 OK ✅
    CI->>Dev: Slack: "Dashboard deployed to prod"

Use Case 4: Collaboration & ChatOps

Example: Slack bot (Node.js)

const { App } = require('@slack/bolt');
const { Client } = require('@opensearch-project/opensearch');

const opensearch = new Client({
  node: 'https://opensearch.example.com:8443',
  auth: { bearer: process.env.OPENSEARCH_OAUTH_TOKEN } // scoped: read:logs-*
});

const app = new App({ token: process.env.SLACK_BOT_TOKEN, signingSecret: process.env.SLACK_SIGNING_SECRET });

// /opensearch query "level:error AND service:payment-api"
app.command('/opensearch', async ({ command, ack, respond }) => {
  await ack();
  const query = command.text.replace('query ', '');
  const results = await opensearch.search({
    index: 'logs-*',
    body: { query: { query_string: { query } }, size: 5 }
  });
  const hits = results.body.hits.hits;
  await respond({
    blocks: [{
      type: 'section',
      text: { type: 'mrkdwn', text: `*Found ${results.body.hits.total.value} results:*` }
    },
    ...hits.map(h => ({
      type: 'section',
      text: { type: 'mrkdwn', text: `\`${h._source.timestamp}\` [${h._source.level}] ${h._source.message}` }
    }))]
  });
});

Use Case 5: Multi-Tenant SaaS

Example: Per-tenant scoped tokens (Python/FastAPI)

from fastapi import FastAPI, Request
from opensearchpy import OpenSearch
import requests

app = FastAPI()

def get_tenant_client(tenant_id: str) -> OpenSearch:
    token = requests.post("https://oauth-proxy.internal:8443/oauth/token", data={
        "grant_type": "client_credentials",
        "client_id": "saas-backend",
        "client_secret": "...",
        "scope": f"read:tenant-{tenant_id}-*"
    }).json()["access_token"]
    return OpenSearch(
        hosts=[{"host": "opensearch.internal", "port": 8443}],
        headers={"Authorization": f"Bearer {token}"}, use_ssl=True
    )

@app.get("/api/search")
async def search(query: str, request: Request):
    tenant_id = request.state.tenant_id
    client = get_tenant_client(tenant_id)
    # Can ONLY access tenant-specific indices
    results = client.search(index=f"tenant-{tenant_id}-*", body={
        "query": {"query_string": {"query": query}}
    })
    return results["hits"]["hits"]
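One practical refinement to the sketch above: `get_tenant_client` fetches a fresh token on every request. A small per-tenant cache that reuses each token until shortly before expiry keeps load off the token endpoint. `TenantTokenCache` and its `fetch` callable are illustrative names, not an existing API.

```python
import time

class TenantTokenCache:
    """Cache client-credentials tokens per tenant until shortly before
    expiry. `fetch` is any callable(tenant_id) -> (token, expires_in_seconds);
    in the service above it would wrap the POST to /oauth/token."""

    def __init__(self, fetch, skew_seconds: int = 30, clock=time.monotonic):
        self._fetch = fetch
        self._skew = skew_seconds
        self._clock = clock
        self._cache = {}  # tenant_id -> (token, refresh_deadline)

    def get(self, tenant_id: str) -> str:
        entry = self._cache.get(tenant_id)
        if entry and self._clock() < entry[1]:
            return entry[0]  # still fresh, reuse it
        token, expires_in = self._fetch(tenant_id)
        # Refresh a little early so a token never expires mid-request.
        self._cache[tenant_id] = (token, self._clock() + expires_in - self._skew)
        return token
```

The injectable `clock` also makes the expiry logic unit-testable without sleeping.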

Use Case 6: Observability Pipelines

Example: Fluent Bit with scoped write token

[OUTPUT]
    Name            opensearch
    Match           *
    Host            opensearch.example.com
    Port            8443
    TLS             On
    Header          Authorization Bearer eyJhbGciOi...
    Index           app-logs
    # Token scope: write:app-logs-* — cannot read, cannot access other indices

Example: OpenTelemetry Collector

exporters:
  opensearch:
    http:
      endpoint: "https://opensearch.example.com:8443"
      headers:
        Authorization: "Bearer ${OPENSEARCH_OAUTH_TOKEN}"
    traces_index: "otel-traces"
    logs_index: "otel-logs"
    # Token scope: write:otel-* — isolated from application indices

Before vs after:

Before: Every pipeline uses admin basic auth → full access to everything
After:  Fluent Bit  → write:app-logs-*    (can only write app logs)
        OTel        → write:otel-*        (can only write traces/metrics)
        Logstash    → write:logstash-*    (can only write its indices)
        Each pipeline is isolated. Compromised Fluent Bit can't read OTel data.

Are there breaking changes to the User Experience?

No. The OAuth proxy is entirely opt-in:

| Scenario | Impact |
|---|---|
| Existing users hitting OpenSearch directly | ❌ No change |
| Existing OSD users logging in via SAML/OIDC | ❌ No change |
| Existing basic auth scripts | ❌ No change |
| New agent/CI/CD wanting OAuth | ✅ Point at proxy endpoint |

Why should it be built? Any reason not to?

Why build it:

  1. Dashboards auth gap — The engine security plugin (including the upcoming API Keys) only covers the OpenSearch engine. OpenSearch Dashboards has its own API surface — saved objects, workspace management, UI settings, visualization export/import — that sits outside the engine's security scope. An AI agent or CI/CD pipeline that needs to import dashboards, manage workspaces, or interact with Dashboards APIs cannot authenticate with engine-level API Keys alone. The proxy provides a single authenticated entry point for the full platform. This gap widens as Dashboards evolves into a richer application layer.
  2. Token governance at enterprise scale — Engine API Keys (PR #5443) provide the primitive to create and delete tokens, but V1 is admin-only issuance with no governance layer. Enterprises need: delegated issuance (team leads creating tokens for their scope), consent/approval workflows, centralized visibility across all tokens org-wide, automated rotation, bulk revocation per compromised client, and a full audit trail. This is critical for regulated industries (finance, healthcare, government) where "who has access to what and when was it granted" must be auditable.
  3. OIDC federation — Enterprises with existing Keycloak/Auth0/Okta deployments want to use their existing identity infrastructure to access OpenSearch without creating separate API Keys. The proxy bridges external OIDC JWTs to engine-native credentials.
  4. AI agent enablement — the AI agent ecosystem (Claude Code, Cursor, MCP, LangChain) is exploding. These tools need secure, scoped, machine-to-machine auth. Building OAuth makes OpenSearch AI-agent-ready with standard protocols.
  5. Competitive parity — Grafana, Datadog, Splunk, and Elastic all have the full stack: API keys + OAuth apps + governance UI. OpenSearch has the engine primitive coming (API Keys) but not the developer experience layer.
  6. Foundation for everything else — Slack integration, GitHub integration, CI/CD automation, multi-tenant SaaS — all converge on OAuth. Building this once unlocks all of them.

Reasons not to build:

  1. Basic auth workaround exists — users can technically use basic auth for programmatic access, though it's insecure and unscoped.
  2. Proxy adds complexity — another component to deploy and monitor. Mitigated by making it optional and lightweight.
  3. Scope-to-role mapping is coarse — without native engine support, the proxy maps tokens to pre-defined backend users. Fine-grained per-token permissions require future engine integration.

Impact if not built: OpenSearch continues to fall behind competitors in enterprise and AI agent adoption. Users choose Grafana or Elastic for programmatic workflows. The gap widens as AI agent usage grows.

What will it take to execute?

Architecture

graph TB
    subgraph Clients
        A[AI Agent<br/>Claude Code / Cursor]
        B[CI/CD Pipeline<br/>GitHub Actions / Terraform]
        C[Slack Bot]
        D[CLI / IDE Plugin]
    end

    subgraph "OAuth Proxy (Go)"
        E[JWT Validation<br/>JWKS Auto-Discovery]
        F[Scope → Role Mapping]
        G[Cedar Policy Engine<br/>Optional]
        H[API Key Lifecycle<br/>via Engine REST API]
        I[Prometheus /metrics]
    end

    subgraph "OpenSearch Engine"
        J[Engine API<br/>Search / Index / Admin]
        L[Security Plugin<br/>FGAC / Workspace ACL]
        P[API Keys<br/>PR #5443 — 3.7]
    end

    subgraph "OpenSearch Dashboards"
        K[Dashboards API<br/>Saved Objects / Workspaces / UI]
    end

    subgraph "OIDC Providers"
        M[Keycloak]
        N[Auth0 / Okta]
        O[Dex]
    end

    A -->|Bearer Token| E
    B -->|Bearer Token| E
    C -->|Bearer Token| E
    D -->|Bearer Token| E
    E --> F
    F --> G
    F -->|Mapped Credentials<br/>or API Key| J
    F -->|Mapped Credentials| K
    J --> L
    L --> P
    E -.->|JWKS Fetch| M
    E -.->|JWKS Fetch| N
    E -.->|JWKS Fetch| O
    H -->|POST /_plugins/_security/api/apitokens| P

Key change from original design: The proxy delegates token issuance to the engine's native API Keys (PR #5443) rather than maintaining its own token store. This makes the proxy stateless and thinner. The proxy's unique value is: (1) sitting in front of both engine and Dashboards, and (2) bridging external OIDC tokens to engine-native API Keys.

Implementation stack

| Layer | Choice | Why |
|---|---|---|
| Language | Go | Fast, small binary, standard for proxies (Envoy, Traefik, CoreDNS) |
| OAuth/JWT | golang-jwt/jwt + OIDC discovery | Standard JWT validation, auto-fetch JWKS |
| Cedar | cedar-go | Local fine-grained policy evaluation (Apache 2.0) |
| Config | YAML | Scope-to-role mappings, upstream endpoints |
| Deployment | Docker container / sidecar | Runs alongside OpenSearch |
| Distribution | GitHub repo + Docker Hub | Standard open source |

Phased delivery

| Phase | Scope | Effort (AI-first) | Effort (traditional) |
|---|---|---|---|
| Phase 0: API Keys (prerequisite) | Engine-native API Keys with direct permission scoping (PR #5443) | Owned by @cwperks — targeting 3.7 | |
| Phase 1: OAuth proxy MVP | Go proxy, JWT validation, scope mapping, API Key integration, CLI, Docker | 2 engineers, 6 weeks | 2 engineers, 1 quarter |
| Phase 2: OSD plugin | Token management UI, consent screen, scope admin, governance dashboard | 1 engineer, 6 weeks | 1 engineer, 1 quarter |
| Phase 3: Cedar policies | Fine-grained local policy evaluation, caching | 1 engineer, 4 weeks | 1-2 engineers, 1 quarter |

Total: ~3.5 months (AI-first) vs ~9 months (traditional) — Phase 0 runs in parallel.

Phase 0:     API Keys lands in OpenSearch 3.7 (parallel, owned by security plugin team)
Weeks 1-2:   OAuth proxy MVP (JWT validation, OIDC federation, forwarding to engine + Dashboards)
Weeks 3-4:   CLI tool + API Key lifecycle integration (create/revoke via engine API)
Weeks 5-6:   Testing, docs, Docker, open source release
Weeks 7-10:  OSD plugin (token management UI, consent screen, governance dashboard)
Weeks 11-14: Cedar integration, policy evaluation

Assumptions and constraints

  • Proxy approach chosen over engine modification — faster to ship, no engine changes, independent release cycle. The proxy is complementary to the engine's API Keys (PR #5443) — it consumes API Keys as the engine-level primitive and adds OIDC federation, Dashboards coverage, and governance on top.
  • API Keys as the engine primitive — with API Keys landing in 3.7, the proxy delegates token creation to /_plugins/_security/api/apitokens rather than maintaining its own token store. This makes the proxy stateless. Before API Keys land, the proxy falls back to mapping OAuth scopes to pre-created backend users/roles.
  • Dashboards API coverage — the engine security plugin (including API Keys) only covers the OpenSearch engine. Dashboards has its own API surface (saved objects, workspaces, UI settings) that is not covered by engine-level auth. The proxy sits in front of both, providing unified auth for the full platform.
  • OIDC provider required — users must run an OIDC-compliant provider (Keycloak, Auth0, Okta, Dex). The proxy does not include a built-in identity provider.
  • Extra network hop — the proxy adds latency (~1-5ms per request). Acceptable for most use cases; high-throughput search workloads may want to bypass the proxy for internal traffic.

Deployment

# docker-compose.yaml
services:
  opensearch:
    image: opensearchproject/opensearch:latest
    ports:
      - "9200:9200"

  opensearch-dashboards:
    image: opensearchproject/opensearch-dashboards:latest
    ports:
      - "5601:5601"

  oauth-proxy:
    image: opensearch-oauth-proxy:latest
    ports:
      - "8443:8443"
    volumes:
      - ./config.yaml:/etc/opensearch-oauth/config.yaml

| Adoption scenario | Action required |
|---|---|
| Don't want OAuth | Nothing. Zero changes. |
| Self-hosted, want OAuth | Deploy proxy container. Add YAML config. |
| Already have a reverse proxy | Chain proxies, or embed token validation library. |

Any remaining open questions?

  1. Built-in token issuer vs external-only? — Resolved: with API Keys (PR #5443) landing in the engine, the proxy delegates token issuance to the engine's /_plugins/_security/api/apitokens endpoint. No built-in issuer needed.

  2. Scope naming convention — What should the standard scope format be? Options: read:index-pattern, action:resource, role:role-name. Need community input.

  3. Token storage backend — Resolved: API Keys are stored in the engine's .opensearch_security_api_tokens system index. The proxy is stateless — no separate token store needed.

  4. WebSocket support — Dashboards uses WebSockets for real-time features. How does the proxy handle WebSocket upgrade with OAuth tokens?

  5. Rate limiting — Should the proxy include built-in rate limiting per token/client, or defer to external tools (Envoy, API Gateway)?

  6. API Keys API surface alignment — Does the API Keys REST API support filtering by created_by? The proxy needs this to manage tokens per OAuth client. Does it support custom metadata fields for audit trail (e.g., client_id, oauth_provider)?

  7. Cedar policy management UX — How do users author and test Cedar policies? CLI-only, or a visual editor in Dashboards?

  8. Multi-cluster token federation — Can a token issued for one cluster be used across multiple clusters? What's the trust model?

  9. Dashboards auth integration — How does the proxy authenticate requests to Dashboards APIs that expect session cookies? Does it need to establish a Dashboards session on behalf of the OAuth client, or can Dashboards be extended to accept bearer tokens directly?



Appendix

A. Additional use cases

IDE and Developer Tool Integration

VS Code extension (TypeScript):

import * as vscode from 'vscode';
import { Client } from '@opensearch-project/opensearch';

export async function activate(context: vscode.ExtensionContext) {
  const token = await context.secrets.get('opensearch-oauth-token');
  const client = new Client({
    node: 'https://opensearch.example.com:8443',
    auth: { bearer: token }
  });

  const searchCommand = vscode.commands.registerCommand('opensearch.searchLogs', async () => {
    const query = await vscode.window.showInputBox({ prompt: 'Search query' });
    const results = await client.search({
      index: 'logs-*',
      body: { query: { query_string: { query } }, size: 20 }
    });
    const panel = vscode.window.createWebviewPanel('opensearch', 'Search Results', vscode.ViewColumn.Two);
    panel.webview.html = formatResults(results.body.hits.hits);
  });
  context.subscriptions.push(searchCommand);
}

// Minimal renderer for the webview (escape untrusted fields in production)
function formatResults(hits: any[]): string {
  return '<pre>' + hits.map(h => JSON.stringify(h._source, null, 2)).join('\n') + '</pre>';
}

Cross-Cluster and Federation

Multi-cluster management (Python):

import requests
from opensearchpy import OpenSearch

CLUSTERS = {
    "production":  {"url": "https://prod-opensearch.example.com:8443",  "scope": "admin:prod"},
    "staging":     {"url": "https://stage-opensearch.example.com:8443", "scope": "admin:staging"},
    "analytics":   {"url": "https://analytics-opensearch.example.com:8443", "scope": "read:analytics"},
}

def get_client(cluster_name: str) -> OpenSearch:
    cluster = CLUSTERS[cluster_name]
    token = requests.post("https://auth.example.com/oauth/token", data={
        "grant_type": "client_credentials",
        "client_id": "cluster-manager",
        "client_secret": "...",
        "scope": cluster["scope"]
    }).json()["access_token"]
    return OpenSearch(
        hosts=[cluster["url"]],
        headers={"Authorization": f"Bearer {token}"}, use_ssl=True
    )

for name in CLUSTERS:
    client = get_client(name)
    health = client.cluster.health()
    print(f"{name}: {health['status']} ({health['number_of_nodes']} nodes)")

Ansible Playbook

- name: Deploy OpenSearch configurations
  hosts: localhost
  vars:
    opensearch_url: "https://opensearch.example.com:8443"
    oauth_token: "{{ lookup('env', 'OPENSEARCH_OAUTH_TOKEN') }}"
  tasks:
    - name: Create index template
      uri:
        url: "{{ opensearch_url }}/_index_template/logs-template"
        method: PUT
        headers:
          Authorization: "Bearer {{ oauth_token }}"
        body_format: json
        body:
          index_patterns: ["logs-*"]
          template:
            settings:
              number_of_shards: 3
              number_of_replicas: 1

    - name: Import dashboards
      uri:
        url: "{{ opensearch_url }}/api/saved_objects/_import"
        method: POST
        headers:
          Authorization: "Bearer {{ oauth_token }}"
          osd-xsrf: "true"
        src: "dashboards/web-monitoring.ndjson"

Slack Alert Notification (Python)

import os
import requests

SLACK_WEBHOOK_URL = os.environ["SLACK_WEBHOOK_URL"]  # Slack incoming webhook

def send_alert_to_slack(alert):
    slack_message = {
        "blocks": [
            {"type": "header", "text": {"type": "plain_text", "text": f"🔴 Alert: {alert['name']}"}},
            {"type": "section", "fields": [
                {"type": "mrkdwn", "text": f"*Index:* `{alert['index']}`"},
                {"type": "mrkdwn", "text": f"*Condition:* {alert['condition']}"},
                {"type": "mrkdwn", "text": f"*Current value:* {alert['value']}"},
                {"type": "mrkdwn", "text": f"*Severity:* {alert['severity']}"}
            ]},
            {"type": "actions", "elements": [
                {"type": "button", "text": {"type": "plain_text", "text": "View Dashboard"}, 
                 "url": alert["dashboard_url"]},
                {"type": "button", "text": {"type": "plain_text", "text": "Acknowledge"},
                 "action_id": "ack_alert", "value": alert["id"]},
                {"type": "button", "text": {"type": "plain_text", "text": "Mute 1h"},
                 "action_id": "mute_alert", "value": alert["id"]}
            ]}
        ]
    }
    requests.post(SLACK_WEBHOOK_URL, json=slack_message)
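The "Acknowledge" and "Mute 1h" buttons imply an interaction handler that turns Slack action callbacks into OpenSearch calls made with the bot's scoped OAuth token. A hedged sketch of that routing step — the endpoint paths below are illustrative placeholders, not the alerting plugin's actual API:

```python
# Map Slack block-action IDs onto hypothetical alerting endpoints; the bot
# would issue the returned request with its own scoped bearer token.
def action_to_request(action_id, alert_id):
    routes = {
        "ack_alert":  ("POST", f"/_plugins/_alerting/alerts/{alert_id}/_acknowledge"),
        "mute_alert": ("POST", f"/_plugins/_alerting/alerts/{alert_id}/_mute?duration=1h"),
    }
    if action_id not in routes:
        raise ValueError(f"unknown action: {action_id}")
    return routes[action_id]
```

Because the bot authenticates with its own token, the audit log records "slack-bot acknowledged alert X" rather than an anonymous webhook call.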

Logstash Output with OAuth

output {
  opensearch {
    hosts => ["https://opensearch.example.com:8443"]
    auth_type => {
      type => "bearer"
      token => "${OPENSEARCH_OAUTH_TOKEN}"
    }
    index => "logstash-%{+YYYY.MM.dd}"
  }
}

B. What this blocks today

| Scenario | Current workaround | Risk |
|---|---|---|
| AI agent queries OpenSearch | Hardcoded basic auth credentials | No scoping, no revocation, credential leaks |
| CI/CD deploys dashboards | Shared admin password in CI secrets | Full admin access, no audit trail |
| Slack bot sends alerts | Webhook with embedded credentials | No identity, no scoping |
| Terraform manages configs | Basic auth in state files | Credentials in plaintext |
| Multi-tenant SaaS queries per-customer data | Separate users per tenant, manual management | Doesn't scale, error-prone |
| Developer tools (IDE plugins, CLI) | Copy-paste credentials | No expiry, no revocation |

C. Design principles

  1. Zero breaking changes — existing auth methods continue to work. OAuth is opt-in.
  2. No engine modifications — implemented as a proxy, not a security plugin change.
  3. Works with any OIDC provider — Keycloak, Auth0, Okta, Dex, or any compliant provider.
  4. Open source — Apache 2.0 licensed, community-driven.
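To make principle 3 concrete: after the proxy has verified a token's signature against the provider's JWKS, reading scopes is provider-agnostic because most OAuth 2.0 providers emit a space-delimited `scope` claim. A minimal sketch of the parsing step only — signature verification is assumed to have happened already and is deliberately not shown:

```python
import base64
import json

def extract_scopes(jwt_token):
    """Read the 'scope' claim from a JWT whose signature has ALREADY been
    verified against the provider's JWKS. Decoding alone proves nothing;
    this helper only parses the payload segment."""
    payload_b64 = jwt_token.split(".")[1]
    # Restore the base64url padding that JWTs strip
    payload_b64 += "=" * (-len(payload_b64) % 4)
    claims = json.loads(base64.urlsafe_b64decode(payload_b64))
    # OAuth providers commonly emit scopes as a space-delimited string
    return claims.get("scope", "").split()
```

The same function works for tokens from Keycloak, Auth0, Okta, or Dex, since the `scope` claim shape is standardized by RFC 8693/OAuth conventions even though role claims differ per provider.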

D. Phase 2: OSD Plugin details

| Component | Description |
|---|---|
| Token management | Create, revoke, list tokens in Dashboards UI |
| Consent screen | "Grant Agent X access to `logs-*`?" approval flow |
| Scope/role mapping admin | Visual editor for scope-to-role configuration |
| Config storage | Stored in OpenSearch system index, proxy reads dynamically |
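The scope-to-role mapping the proxy reads from the system index could be applied with a simple union lookup. A hedged sketch — the scope and role names below are illustrative, not a proposed schema:

```python
# Hypothetical scope-to-role mapping document, as the proxy might load it
# from the system index; names are illustrative only.
SCOPE_ROLE_MAP = {
    "logs:read":        ["logs_read_role"],
    "logs:write":       ["logs_write_role"],
    "dashboards:write": ["dashboards_editor_role"],
}

def roles_for_scopes(scopes, mapping=SCOPE_ROLE_MAP):
    """Union of OpenSearch roles granted by a token's scopes.
    Unknown scopes grant nothing, so the mapping fails closed."""
    roles = set()
    for scope in scopes:
        roles.update(mapping.get(scope, []))
    return sorted(roles)
```

Failing closed on unrecognized scopes keeps a misconfigured or over-scoped token from silently acquiring roles.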

E. Phase 3: Cedar Policy Engine details

Replace the static scope-to-role mapping with fine-grained Cedar policies, evaluated in-process (Cedar is Apache 2.0 licensed; no external service dependency):

// Agent can read logs indices
permit(
  principal == Agent::"my-ai-agent",
  action in [Action::"search", Action::"get"],
  resource in Index::"logs-*"
);

// CI/CD can manage dashboards but not delete indices
permit(
  principal in Group::"cicd-service-accounts",
  action in [Action::"create", Action::"update"],
  resource in ResourceType::"dashboard"
);

// Deny all agents from accessing PII indices
forbid(
  principal in Group::"agents",
  action,
  resource in Index::"pii-*"
);
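For the policies above to apply, the proxy must first derive a Cedar `(principal, action, resource)` triple from each incoming request. A sketch of that translation step; the entity-type names mirror the example policies, but the method-to-action mapping is an assumption, not a settled design:

```python
# Assumed mapping from HTTP verbs to Cedar actions; a real proxy would also
# inspect the request path (e.g. _search vs _doc) to pick the action.
METHOD_ACTIONS = {"GET": "get", "POST": "search", "PUT": "update", "DELETE": "delete"}

def cedar_request(client_id, method, index):
    """Build the authorization triple handed to the Cedar evaluator."""
    action = METHOD_ACTIONS.get(method)
    if action is None:
        raise ValueError(f"unsupported method: {method}")
    return {
        "principal": f'Agent::"{client_id}"',
        "action":    f'Action::"{action}"',
        "resource":  f'Index::"{index}"',
    }
```

The evaluator then checks this triple against the permit/forbid policies, with `forbid` taking precedence per Cedar semantics.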

F. Enterprise benefits summary

| Benefit | Without OAuth | With OAuth |
|---|---|---|
| Security | Shared passwords, no scoping | Per-client scoped tokens with expiry |
| Compliance | "admin logged in" | Full audit: who, what, when, which app |
| Scalability | Manual user management | Federated identity, automated token lifecycle |
| AI readiness | Agents can't connect securely | First-class agent support via MCP/OAuth |
| DevOps maturity | Manual dashboard management | GitOps, Terraform, CI/CD pipelines |
| Multi-tenancy | One credential for all tenants | Per-tenant scoped tokens |
| Incident response | Revoke password = break everything | Revoke one token, everything else works |
| Developer experience | Copy-paste credentials | `opensearch-auth login` → done |

G. Success metrics

  • OAuth tokens issued per month
  • Third-party integrations using OAuth
  • Reduction in basic auth usage
  • AI agents connected via OAuth
  • Community contributions and adoption

H. AI-first development approach

The implementation will use AI coding agents to accelerate development. What AI handles versus what requires human review:

| AI handles (fast) | Human reviews (critical) |
|---|---|
| HTTP proxy boilerplate | Token security model design |
| JWT parsing and validation | Scope-to-role mapping strategy |
| YAML config parsing | Edge cases: token replay, expiry race conditions |
| Unit and integration tests | Threat modeling |
| Docker/CI/CD setup | Performance under load |
| Documentation and examples | API design decisions |
| OSD React components | UX flow for consent screen |
