
[PROPOSAL]RFC: OAuth 2.0 Support for OpenSearch #491

@seraphjiang

Description


Status: Proposed


What/Why

What are you proposing?

Add OAuth 2.0 support to OpenSearch via an authentication proxy, enabling secure machine-to-machine access, scoped API tokens, and third-party integrations. The proxy validates OAuth tokens (JWT), maps scopes to OpenSearch security roles, and forwards requests to the engine and Dashboards — with zero changes to existing components. This is the foundational layer that unlocks AI agent access, CI/CD automation, collaboration tool integration, and multi-tenant SaaS patterns for OpenSearch.

What users have asked for this feature?

OpenSearch has foundational auth primitives but lacks the developer experience layer that competitors offer for programmatic and machine-to-machine access:

| Platform | OIDC/SSO | API Keys | Service Accounts | OAuth Apps / Scoped Tokens | Token Governance UI |
|---|---|---|---|---|---|
| Grafana | ✅ | ✅ | ✅ | ✅ | ✅ |
| Datadog | ✅ | ✅ | ✅ | ✅ | ✅ |
| Splunk | ✅ | ✅ | ✅ | ✅ | ✅ |
| Elastic | ✅ | ✅ | ✅ | ✅ | ✅ |
| OpenSearch | ✅ | 🔄 In progress (#5443) | ✅ (limited) | ❌ | ❌ |

OpenSearch already supports OIDC authentication and has a Service Account primitive (originally built for the extensions project). API Keys with direct permission scoping are in active development targeting 3.7. The gap is not in authentication primitives — it's in the developer experience layer: OAuth app registration, scoped token issuance via standard flows, unified auth across engine and Dashboards, and token governance at enterprise scale.

Relationship to Existing Capabilities

This RFC builds on — not replaces — existing OpenSearch security features:

| Existing Capability | What it does | What's missing |
|---|---|---|
| OIDC Authenticator (docs) | Validates OIDC tokens with role claims for engine auth | Engine-only; doesn't cover Dashboards APIs. Requires roles in token claims — no scope-to-role mapping. No token lifecycle management. |
| Service Accounts (docs) | Scoped tokens for extensions to access system indices | Built for the extensions project. Not designed for external clients, AI agents, or CI/CD pipelines. |
| API Keys (PR #5443) | Opaque tokens with cluster_permissions and index_permissions directly on the token | 🔄 In progress for 3.7. Engine-only — doesn't cover Dashboards. Admin-only issuance in V1. No OIDC federation, no governance UI, no Cedar policies. |

The OAuth proxy is the layer that connects these primitives to the outside world — it federates external OIDC tokens, provides unified auth across engine and Dashboards, and adds the governance/management layer that enterprises need. With API Keys landing in 3.7, the proxy becomes thinner: instead of mapping to pre-created backend users, it can programmatically issue API Keys via /_plugins/_security/api/apitokens with the exact permissions needed.
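Since PR #5443 is still in flight, the exact request body isn't final. The Python sketch below shows how the proxy might translate a validated OAuth scope into an API Key creation payload; the `name` and `expiration` fields and the action strings are assumptions, while `cluster_permissions`/`index_permissions` come from the PR description above.

```python
# Hypothetical payload builder for POST /_plugins/_security/api/apitokens.
# Field names are assumptions; PR #5443 may change them before release.

def build_api_token_request(client_id: str, scopes: list,
                            expires_in_seconds: int = 3600) -> dict:
    """Translate OAuth scopes like 'read:logs-*' into index permissions."""
    action_map = {
        "read": ["indices:data/read/*"],
        "write": ["indices:data/write/*"],
    }
    index_permissions = []
    for scope in scopes:
        action, _, pattern = scope.partition(":")
        if action not in action_map or not pattern:
            raise ValueError(f"unsupported scope: {scope}")
        index_permissions.append({
            "index_pattern": [pattern],
            "allowed_actions": action_map[action],
        })
    return {
        "name": f"oauth-{client_id}",      # ties the key back to the OAuth client
        "expiration": expires_in_seconds,
        "cluster_permissions": [],
        "index_permissions": index_permissions,
    }
```

The proxy would POST this body with its own admin credential, then hand the returned key to the OAuth client for the lifetime of the grant.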

Community signals:

What problems are you trying to solve?

  1. When building an AI agent that queries logs, a developer wants to grant the agent read-only access to specific indices with an expiring token, so they don't have to share admin credentials that could be leaked or misused.

  2. When deploying dashboards across environments, a DevOps engineer wants to authenticate CI/CD pipelines with scoped service accounts, so they don't have to store admin passwords in CI secrets with full cluster access.

  3. When integrating OpenSearch alerts with Slack, a platform engineer wants to use standard OAuth to connect services, so they don't have to build fragile webhook workarounds with embedded credentials.

  4. When building a multi-tenant SaaS application, a backend developer wants to issue per-customer scoped tokens, so they can isolate tenant data access and revoke individual customers without affecting others.

  5. When querying OpenSearch from an IDE or CLI, a developer wants to authenticate once via browser-based OAuth flow and get an auto-expiring token, so they don't have to copy-paste credentials that never expire.

  6. When forwarding logs via Fluent Bit or OpenTelemetry, an infrastructure engineer wants to give each pipeline a write-only token scoped to its target index, so they limit blast radius if a pipeline credential is compromised.

  7. When managing multiple OpenSearch clusters, a platform team wants to use one identity provider with different scoped tokens per cluster, so they don't have to manage separate credentials for each environment.

What is the developer experience going to be?

REST API

The OAuth proxy introduces the following endpoints:

Token management:

POST /oauth/token              — Issue a new token (client credentials flow)
DELETE /oauth/token/{token_id} — Revoke a token
GET /oauth/tokens              — List active tokens
GET /oauth/token/{token_id}    — Get token details

Token issuance example:

# Request a scoped token
curl -X POST https://opensearch.example.com:8443/oauth/token \
  -d "grant_type=client_credentials" \
  -d "client_id=my-agent" \
  -d "client_secret=..." \
  -d "scope=read:logs-*"

# Response
{
  "access_token": "eyJhbGciOi...",
  "token_type": "Bearer",
  "expires_in": 3600,
  "scope": "read:logs-*"
}

Using the token:

# Search via OAuth proxy — token scoped to read:logs-* only
curl -H "Authorization: Bearer eyJhbGciOi..." \
  "https://opensearch.example.com:8443/logs-*/_search" \
  -d '{"query": {"match": {"level": "error"}}}'

CLI tool:

# Browser-based login (authorization code flow)
$ opensearch-auth login --provider keycloak
🌐 Opening browser for authentication...
✅ Authenticated as developer@example.com
   Token stored in ~/.opensearch/token (expires in 8h)
   Scopes: read:logs-*, read:metrics-*

# Create a service account token (client credentials)
$ opensearch-auth create-token --scopes "read:logs-*" --expires 24h
✅ Token created: tok_abc123 (expires 2024-01-16T10:00:00Z)

# Revoke a token
$ opensearch-auth revoke-token --token-id tok_abc123
✅ Token revoked

# Check status
$ opensearch-auth status
✅ Authenticated as developer@example.com
   Provider: keycloak | Expires: 6h remaining
   Scopes: read:logs-*, read:metrics-*

Impact to existing APIs: None. All existing OpenSearch and Dashboards APIs continue to work unchanged. The proxy is an additive component — clients that don't use OAuth bypass it entirely.

Configuration (YAML):

# opensearch-oauth-proxy.yaml
upstream:
  engine: https://opensearch:9200
  dashboards: https://opensearch-dashboards:5601

providers:
  - name: keycloak
    issuer: https://keycloak.example.com/realms/opensearch
    jwks_uri: auto  # auto-discover from issuer
  - name: auth0
    issuer: https://mycompany.auth0.com
    jwks_uri: auto

scope_mapping:
  "read:logs-*":
    backend_user: agent-logs-reader
    backend_roles: [logs_read_access]
  "write:dashboards":
    backend_user: agent-dashboard-writer
    backend_roles: [dashboard_write_access]
  "admin":
    backend_user: agent-admin
    backend_roles: [all_access]

listen: :8443
tls:
  cert: /path/to/cert.pem
  key: /path/to/key.pem
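As a sketch of how the proxy could apply this configuration, the snippet below resolves a token scope to its backend identity and denies anything unmapped. A dict literal stands in for the parsed YAML; `resolve_scope` is an illustrative name, not an existing API.

```python
# Sketch of the proxy's scope resolution; the dict mirrors the YAML above.
SCOPE_MAPPING = {
    "read:logs-*": {"backend_user": "agent-logs-reader",
                    "backend_roles": ["logs_read_access"]},
    "write:dashboards": {"backend_user": "agent-dashboard-writer",
                         "backend_roles": ["dashboard_write_access"]},
    "admin": {"backend_user": "agent-admin",
              "backend_roles": ["all_access"]},
}

def resolve_scope(scope: str) -> dict:
    """Look up the backend identity for a token scope. Unmapped scopes
    are rejected outright, so the proxy can never grant more access
    than the configuration allows."""
    mapping = SCOPE_MAPPING.get(scope)
    if mapping is None:
        raise PermissionError(f"scope not mapped: {scope}")
    return mapping
```

Exact-match lookup keeps the security reasoning simple: the wildcard in `read:logs-*` constrains indices on the OpenSearch side, not which scopes a token may present.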

Are there any security considerations?

Yes — this feature is fundamentally about security:

  1. Token security — JWT tokens are signed by the OIDC provider and validated by the proxy via JWKS. Tokens have expiry, scopes, and can be individually revoked.
  2. Scope enforcement — OAuth scopes are mapped to OpenSearch security roles. The proxy never grants more access than the mapped role allows.
  3. No bypass — clients going through the proxy cannot access the engine directly (network policy enforced). Clients not using OAuth continue to authenticate directly with existing methods.
  4. Audit trail — every proxied request is logged with: client_id, user_id (if delegated), scopes, action, target index, timestamp.
  5. Integration with security plugin — the proxy maps tokens to existing security plugin users/roles. FGAC (field-level, document-level security) and workspace ACL continue to apply as additive restrictions.
  6. Token storage — the proxy itself holds no token state; tokens are persisted by the engine's API token system index (.opensearch_security_api_tokens) with encryption at rest.

How the authorization layers compose:

graph TB
    subgraph "Authorization Layers (each narrows access)"
        A["OAuth Scope<br/>Token can access logs-*"] --> B["FGAC (Security Plugin)<br/>User can read logs-*, field masking on PII"]
        B --> C["Workspace ACL<br/>User sees 'observability' workspace only"]
        C --> D["✅ Final: Read logs-*, PII masked,<br/>observability workspace only"]
    end

    style A fill:#e1f5fe
    style B fill:#fff3e0
    style C fill:#f3e5f5
    style D fill:#e8f5e9
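Item 1 above relies on validating the JWT and reading its scopes. The following minimal Python sketch shows only the claim-extraction half; signature verification against the provider's JWKS is deliberately omitted, and a real proxy must verify the signature before trusting any claim.

```python
import base64
import json

def _b64url_decode(segment: str) -> bytes:
    # JWT segments are base64url without padding; restore padding first.
    return base64.urlsafe_b64decode(segment + "=" * (-len(segment) % 4))

def extract_scopes(jwt_token: str) -> list:
    """Read the space-delimited 'scope' claim from a JWT payload.
    NOTE: no signature check here -- a real proxy must verify the
    token against the provider's JWKS before trusting any claim."""
    _header, payload, _signature = jwt_token.split(".")
    claims = json.loads(_b64url_decode(payload))
    return claims.get("scope", "").split()

def make_demo_token(claims: dict) -> str:
    # Build an unsigned demo token purely to exercise the decoder.
    enc = lambda obj: base64.urlsafe_b64encode(
        json.dumps(obj).encode()).rstrip(b"=").decode()
    return f"{enc({'alg': 'none'})}.{enc(claims)}.sig"

token = make_demo_token({"sub": "my-agent", "scope": "read:logs-* read:metrics-*"})
print(extract_scopes(token))  # ['read:logs-*', 'read:metrics-*']
```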

Are there any breaking changes to the API?

No. Zero breaking changes.

  • All existing OpenSearch REST APIs remain unchanged
  • All existing Dashboards APIs remain unchanged
  • All existing authentication methods (basic auth, SAML, OIDC) continue to work
  • The OAuth proxy is a new, optional component — it does not modify or replace any existing functionality
  • Clients that don't use OAuth are completely unaffected

What is the user experience going to be?

Use Case 1: AI Agents (Claude Code, Cursor, Cody, Custom Agents)

AI coding agents need to query OpenSearch for log analysis, search, and observability. OAuth provides scoped, auditable, revocable access.

Example: LangChain agent with OAuth

from langchain.agents import Tool
from opensearchpy import OpenSearch
import requests

# Get scoped OAuth token
token_response = requests.post("https://keycloak.example.com/token", data={
    "grant_type": "client_credentials",
    "client_id": "langchain-agent",
    "client_secret": "...",
    "scope": "read:logs-*"
})
token = token_response.json()["access_token"]

# Connect to OpenSearch via OAuth proxy
client = OpenSearch(
    hosts=[{"host": "opensearch.example.com", "port": 8443}],
    headers={"Authorization": f"Bearer {token}"},
    use_ssl=True
)

# Agent can search logs but CANNOT delete indices or access other data
def search_logs(query: str) -> str:
    results = client.search(index="logs-*", body={
        "query": {"query_string": {"query": query}},
        "size": 10
    })
    return str(results["hits"]["hits"])

tools = [Tool(name="search_logs", func=search_logs, description="Search OpenSearch logs")]

Example: Claude Code / Cursor MCP config

{
  "mcpServers": {
    "opensearch": {
      "command": "opensearch-mcp-server",
      "env": {
        "OPENSEARCH_URL": "https://opensearch.example.com:8443",
        "OPENSEARCH_OAUTH_TOKEN": "eyJhbGciOi..."
      }
    }
  }
}

Sequence diagram:

sequenceDiagram
    participant Dev as Developer
    participant Agent as AI Agent<br/>(Claude Code)
    participant MCP as OpenSearch<br/>MCP Server
    participant Proxy as OAuth Proxy
    participant OIDC as OIDC Provider<br/>(Keycloak)
    participant OS as OpenSearch<br/>Engine

    Dev->>Agent: "Find error patterns in logs"
    Agent->>MCP: search(index="logs-*", query="level:error")
    
    Note over MCP,OIDC: First request — get token
    MCP->>OIDC: POST /token (client_credentials, scope=read:logs-*)
    OIDC-->>MCP: access_token (JWT, expires 1h)
    
    MCP->>Proxy: GET /logs-*/_search<br/>Authorization: Bearer <token>
    Proxy->>OIDC: Fetch JWKS (cached)
    Proxy->>Proxy: Validate JWT + extract scopes
    Proxy->>Proxy: Map scope "read:logs-*" → role "logs_reader"
    Proxy->>OS: GET /logs-*/_search<br/>(as internal user "logs_reader")
    OS-->>Proxy: Search results (247 hits)
    Proxy-->>MCP: Search results
    MCP-->>Agent: Formatted results
    Agent-->>Dev: "Found 247 errors. Top services:<br/>payment-api (102), auth-service (89)..."

Compatible agents:

| Agent / Framework | Integration method |
|---|---|
| Claude Code / Kiro | MCP server with OAuth token |
| Cursor | Custom tool with OAuth bearer token |
| GitHub Copilot Extensions | OAuth app authorization |
| Sourcegraph Cody | Context provider with OAuth |
| LangChain / LlamaIndex | OpenSearch tool with OAuth credentials |
| CrewAI / AutoGen | Custom tool with OAuth bearer |

Use Case 2: Model Context Protocol (MCP)

Example: OpenSearch MCP server (Python)

from mcp.server.fastmcp import FastMCP
from opensearchpy import OpenSearch

app = FastMCP("opensearch-mcp")

def get_client(oauth_token: str) -> OpenSearch:
    # How the token reaches the server (env var, request context) is
    # deployment-specific; `app.context.oauth_token` below is illustrative.
    return OpenSearch(
        hosts=[{"host": "opensearch.example.com", "port": 8443}],
        headers={"Authorization": f"Bearer {oauth_token}"},
        use_ssl=True
    )

@app.tool()
async def search(index: str, query: str, size: int = 10) -> str:
    """Search an OpenSearch index with a query string."""
    client = get_client(app.context.oauth_token)
    results = client.search(index=index, body={
        "query": {"query_string": {"query": query}}, "size": size
    })
    return "\n".join(f"[{h['_index']}] {h['_source']}" for h in results["hits"]["hits"])

@app.tool()
async def get_mappings(index: str) -> str:
    """Get the field mappings for an index."""
    client = get_client(app.context.oauth_token)
    return str(client.indices.get_mapping(index=index))

@app.tool()
async def list_indices(pattern: str = "*") -> str:
    """List available indices matching a pattern."""
    client = get_client(app.context.oauth_token)
    return str(client.cat.indices(index=pattern, format="json"))

Use Case 3: Enterprise CI/CD Pipelines

Example: GitHub Actions workflow

name: Deploy OpenSearch Dashboards
on:
  push:
    branches: [main]
    paths: ['dashboards/**']

jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Get OAuth token
        id: auth
        run: |
          TOKEN=$(curl -s -X POST ${{ secrets.OAUTH_TOKEN_URL }} \
            -d "grant_type=client_credentials" \
            -d "client_id=${{ secrets.OAUTH_CLIENT_ID }}" \
            -d "client_secret=${{ secrets.OAUTH_CLIENT_SECRET }}" \
            -d "scope=write:dashboards" | jq -r '.access_token')
          echo "token=$TOKEN" >> $GITHUB_OUTPUT

      - name: Deploy to staging
        run: |
          curl -X POST "https://staging-opensearch.example.com:8443/api/saved_objects/_import" \
            -H "Authorization: Bearer ${{ steps.auth.outputs.token }}" \
            -H "osd-xsrf: true" \
            --form file=@dashboards/web-monitoring.ndjson

      - name: Deploy to production
        if: success()
        run: |
          curl -X POST "https://prod-opensearch.example.com:8443/api/saved_objects/_import" \
            -H "Authorization: Bearer ${{ steps.auth.outputs.token }}" \
            -H "osd-xsrf: true" \
            --form file=@dashboards/web-monitoring.ndjson

Example: Terraform

provider "opensearch" {
  url         = "https://opensearch.example.com:8443"
  oauth_token = var.opensearch_oauth_token  # scoped: write:dashboards
}

resource "opensearch_index_template" "logs" {
  name = "logs-template"
  body = jsonencode({
    index_patterns = ["logs-*"]
    template = { settings = { number_of_shards = 3 } }
  })
}

Sequence diagram:

sequenceDiagram
    participant Dev as Developer
    participant GH as GitHub
    participant CI as GitHub Actions
    participant Proxy as OAuth Proxy
    participant OIDC as OIDC Provider
    participant Stage as Staging OpenSearch
    participant Prod as Production OpenSearch

    Dev->>GH: git push (dashboards/*.ndjson)
    GH->>CI: Trigger workflow
    CI->>OIDC: POST /token (scope=write:dashboards)
    OIDC-->>CI: access_token
    CI->>Proxy: POST /api/saved_objects/_import (staging)
    Proxy->>Stage: Import dashboard
    Stage-->>CI: 200 OK ✅
    CI->>Proxy: POST /api/saved_objects/_import (prod)
    Proxy->>Prod: Import dashboard
    Prod-->>CI: 200 OK ✅
    CI->>Dev: Slack: "Dashboard deployed to prod"

Use Case 4: Collaboration & ChatOps

Example: Slack bot (Node.js)

const { App } = require('@slack/bolt');
const { Client } = require('@opensearch-project/opensearch');

const opensearch = new Client({
  node: 'https://opensearch.example.com:8443',
  auth: { bearer: process.env.OPENSEARCH_OAUTH_TOKEN } // scoped: read:logs-*
});

const app = new App({ token: process.env.SLACK_BOT_TOKEN, signingSecret: process.env.SLACK_SIGNING_SECRET });

// /opensearch query "level:error AND service:payment-api"
app.command('/opensearch', async ({ command, ack, respond }) => {
  await ack();
  const query = command.text.replace('query ', '');
  const results = await opensearch.search({
    index: 'logs-*',
    body: { query: { query_string: { query } }, size: 5 }
  });
  const hits = results.body.hits.hits;
  await respond({
    blocks: [{
      type: 'section',
      text: { type: 'mrkdwn', text: `*Found ${results.body.hits.total.value} results:*` }
    },
    ...hits.map(h => ({
      type: 'section',
      text: { type: 'mrkdwn', text: `\`${h._source.timestamp}\` [${h._source.level}] ${h._source.message}` }
    }))]
  });
});

Use Case 5: Multi-Tenant SaaS

Example: Per-tenant scoped tokens (Python/FastAPI)

from fastapi import FastAPI, Request
from opensearchpy import OpenSearch
import requests

app = FastAPI()

def get_tenant_client(tenant_id: str) -> OpenSearch:
    token = requests.post("https://oauth-proxy.internal:8443/oauth/token", data={
        "grant_type": "client_credentials",
        "client_id": "saas-backend",
        "client_secret": "...",
        "scope": f"read:tenant-{tenant_id}-*"
    }).json()["access_token"]
    return OpenSearch(
        hosts=[{"host": "opensearch.internal", "port": 8443}],
        headers={"Authorization": f"Bearer {token}"}, use_ssl=True
    )

@app.get("/api/search")
async def search(query: str, request: Request):
    tenant_id = request.state.tenant_id
    client = get_tenant_client(tenant_id)
    # Can ONLY access tenant-specific indices
    results = client.search(index=f"tenant-{tenant_id}-*", body={
        "query": {"query_string": {"query": query}}
    })
    return results["hits"]["hits"]
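One practical refinement to the sketch above: `get_tenant_client` fetches a fresh token on every request. A small per-tenant cache that reuses each token until shortly before expiry keeps load off the token endpoint. `TenantTokenCache` and its `fetch` callable are illustrative names, not an existing API.

```python
import time

class TenantTokenCache:
    """Cache client-credentials tokens per tenant until shortly before
    expiry. `fetch` is any callable(tenant_id) -> (token, expires_in_seconds);
    in the service above it would wrap the POST to /oauth/token."""

    def __init__(self, fetch, skew_seconds: int = 30, clock=time.monotonic):
        self._fetch = fetch
        self._skew = skew_seconds
        self._clock = clock
        self._cache = {}  # tenant_id -> (token, refresh_deadline)

    def get(self, tenant_id: str) -> str:
        entry = self._cache.get(tenant_id)
        if entry and self._clock() < entry[1]:
            return entry[0]  # still fresh, reuse it
        token, expires_in = self._fetch(tenant_id)
        # Refresh a little early so a token never expires mid-request.
        self._cache[tenant_id] = (token, self._clock() + expires_in - self._skew)
        return token
```

The injectable `clock` also makes the expiry logic unit-testable without sleeping.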

Use Case 6: Observability Pipelines

Example: Fluent Bit with scoped write token

[OUTPUT]
    Name            opensearch
    Match           *
    Host            opensearch.example.com
    Port            8443
    TLS             On
    Header          Authorization Bearer eyJhbGciOi...
    Index           app-logs
    # Token scope: write:app-logs-* — cannot read, cannot access other indices

Example: OpenTelemetry Collector

exporters:
  opensearch:
    http:
      endpoint: "https://opensearch.example.com:8443"
      headers:
        Authorization: "Bearer ${OPENSEARCH_OAUTH_TOKEN}"
    traces_index: "otel-traces"
    logs_index: "otel-logs"
    # Token scope: write:otel-* — isolated from application indices

Before vs after:

Before: Every pipeline uses admin basic auth → full access to everything
After:  Fluent Bit  → write:app-logs-*    (can only write app logs)
        OTel        → write:otel-*        (can only write traces/metrics)
        Logstash    → write:logstash-*    (can only write its indices)
        Each pipeline is isolated. Compromised Fluent Bit can't read OTel data.

Are there breaking changes to the User Experience?

No. The OAuth proxy is entirely opt-in:

| Scenario | Impact |
|---|---|
| Existing users hitting OpenSearch directly | ❌ No change |
| Existing OSD users logging in via SAML/OIDC | ❌ No change |
| Existing basic auth scripts | ❌ No change |
| New agent/CI/CD wanting OAuth | ✅ Point at proxy endpoint |

Why should it be built? Any reason not to?

Why build it:

  1. Dashboards auth gap — The engine security plugin (including the upcoming API Keys) only covers the OpenSearch engine. OpenSearch Dashboards has its own API surface — saved objects, workspace management, UI settings, visualization export/import — that sits outside the engine's security scope. An AI agent or CI/CD pipeline that needs to import dashboards, manage workspaces, or interact with Dashboards APIs cannot authenticate with engine-level API Keys alone. The proxy provides a single authenticated entry point for the full platform. This gap widens as Dashboards evolves into a richer application layer.
  2. Token governance at enterprise scale — Engine API Keys (PR #5443) provide the primitive to create and delete tokens, but V1 is admin-only issuance with no governance layer. Enterprises need: delegated issuance (team leads creating tokens for their scope), consent/approval workflows, centralized visibility across all tokens org-wide, automated rotation, bulk revocation per compromised client, and a full audit trail. This is critical for regulated industries (finance, healthcare, government) where "who has access to what and when was it granted" must be auditable.
  3. OIDC federation — Enterprises with existing Keycloak/Auth0/Okta deployments want to use their existing identity infrastructure to access OpenSearch without creating separate API Keys. The proxy bridges external OIDC JWTs to engine-native credentials.
  4. AI agent enablement — the AI agent ecosystem (Claude Code, Cursor, MCP, LangChain) is exploding. These tools need secure, scoped, machine-to-machine auth. Building OAuth makes OpenSearch AI-agent-ready with standard protocols.
  5. Competitive parity — Grafana, Datadog, Splunk, and Elastic all have the full stack: API keys + OAuth apps + governance UI. OpenSearch has the engine primitive coming (API Keys) but not the developer experience layer.
  6. Foundation for everything else — Slack integration, GitHub integration, CI/CD automation, multi-tenant SaaS — all converge on OAuth. Building this once unlocks all of them.

Reasons not to build:

  1. Basic auth workaround exists — users can technically use basic auth for programmatic access, though it's insecure and unscoped.
  2. Proxy adds complexity — another component to deploy and monitor. Mitigated by making it optional and lightweight.
  3. Scope-to-role mapping is coarse — without native engine support, the proxy maps tokens to pre-defined backend users. Fine-grained per-token permissions require future engine integration.

Impact if not built: OpenSearch continues to fall behind competitors in enterprise and AI agent adoption. Users choose Grafana or Elastic for programmatic workflows. The gap widens as AI agent usage grows.

What will it take to execute?

Architecture

graph TB
    subgraph Clients
        A[AI Agent<br/>Claude Code / Cursor]
        B[CI/CD Pipeline<br/>GitHub Actions / Terraform]
        C[Slack Bot]
        D[CLI / IDE Plugin]
    end

    subgraph "OAuth Proxy (Go)"
        E[JWT Validation<br/>JWKS Auto-Discovery]
        F[Scope → Role Mapping]
        G[Cedar Policy Engine<br/>Optional]
        H[API Key Lifecycle<br/>via Engine REST API]
        I[Prometheus /metrics]
    end

    subgraph "OpenSearch Engine"
        J[Engine API<br/>Search / Index / Admin]
        L[Security Plugin<br/>FGAC / Workspace ACL]
        P[API Keys<br/>PR #5443 — 3.7]
    end

    subgraph "OpenSearch Dashboards"
        K[Dashboards API<br/>Saved Objects / Workspaces / UI]
    end

    subgraph "OIDC Providers"
        M[Keycloak]
        N[Auth0 / Okta]
        O[Dex]
    end

    A -->|Bearer Token| E
    B -->|Bearer Token| E
    C -->|Bearer Token| E
    D -->|Bearer Token| E
    E --> F
    F --> G
    F -->|Mapped Credentials<br/>or API Key| J
    F -->|Mapped Credentials| K
    J --> L
    L --> P
    E -.->|JWKS Fetch| M
    E -.->|JWKS Fetch| N
    E -.->|JWKS Fetch| O
    H -->|POST /_plugins/_security/api/apitokens| P

Key change from original design: The proxy delegates token issuance to the engine's native API Keys (PR #5443) rather than maintaining its own token store. This makes the proxy stateless and thinner. The proxy's unique value is: (1) sitting in front of both engine and Dashboards, and (2) bridging external OIDC tokens to engine-native API Keys.

Implementation stack

| Layer | Choice | Why |
|---|---|---|
| Language | Go | Fast, small binary, standard for proxies (Envoy, Traefik, CoreDNS) |
| OAuth/JWT | golang-jwt/jwt + OIDC discovery | Standard JWT validation, auto-fetch JWKS |
| Cedar | cedar-go | Local fine-grained policy evaluation (Apache 2.0) |
| Config | YAML | Scope-to-role mappings, upstream endpoints |
| Deployment | Docker container / sidecar | Runs alongside OpenSearch |
| Distribution | GitHub repo + Docker Hub | Standard open source |

Phased delivery

| Phase | Scope | Effort (AI-first) | Effort (traditional) |
|---|---|---|---|
| Phase 0: API Keys (prerequisite) | Engine-native API Keys with direct permission scoping (PR #5443) | Owned by @cwperks — targeting 3.7 | |
| Phase 1: OAuth proxy MVP | Go proxy, JWT validation, scope mapping, API Key integration, CLI, Docker | 2 engineers, 6 weeks | 2 engineers, 1 quarter |
| Phase 2: OSD plugin | Token management UI, consent screen, scope admin, governance dashboard | 1 engineer, 6 weeks | 1 engineer, 1 quarter |
| Phase 3: Cedar policies | Fine-grained local policy evaluation, caching | 1 engineer, 4 weeks | 1-2 engineers, 1 quarter |

Total: ~3.5 months (AI-first) vs ~9 months (traditional) — Phase 0 runs in parallel.

Phase 0:     API Keys lands in OpenSearch 3.7 (parallel, owned by security plugin team)
Weeks 1-2:   OAuth proxy MVP (JWT validation, OIDC federation, forwarding to engine + Dashboards)
Weeks 3-4:   CLI tool + API Key lifecycle integration (create/revoke via engine API)
Weeks 5-6:   Testing, docs, Docker, open source release
Weeks 7-10:  OSD plugin (token management UI, consent screen, governance dashboard)
Weeks 11-14: Cedar integration, policy evaluation

Assumptions and constraints

  • Proxy approach chosen over engine modification — faster to ship, no engine changes, independent release cycle. The proxy is complementary to the engine's API Keys (PR #5443) — it consumes API Keys as the engine-level primitive and adds OIDC federation, Dashboards coverage, and governance on top.
  • API Keys as the engine primitive — with API Keys landing in 3.7, the proxy delegates token creation to /_plugins/_security/api/apitokens rather than maintaining its own token store. This makes the proxy stateless. Before API Keys land, the proxy falls back to mapping OAuth scopes to pre-created backend users/roles.
  • Dashboards API coverage — the engine security plugin (including API Keys) only covers the OpenSearch engine. Dashboards has its own API surface (saved objects, workspaces, UI settings) that is not covered by engine-level auth. The proxy sits in front of both, providing unified auth for the full platform.
  • OIDC provider required — users must run an OIDC-compliant provider (Keycloak, Auth0, Okta, Dex). The proxy does not include a built-in identity provider.
  • Extra network hop — the proxy adds latency (~1-5ms per request). Acceptable for most use cases; high-throughput search workloads may want to bypass the proxy for internal traffic.

Deployment

# docker-compose.yaml
services:
  opensearch:
    image: opensearchproject/opensearch:latest
    ports:
      - "9200:9200"

  opensearch-dashboards:
    image: opensearchproject/opensearch-dashboards:latest
    ports:
      - "5601:5601"

  oauth-proxy:
    image: opensearch-oauth-proxy:latest
    ports:
      - "8443:8443"
    volumes:
      - ./config.yaml:/etc/opensearch-oauth/config.yaml

| Adoption scenario | Action required |
|---|---|
| Don't want OAuth | Nothing. Zero changes. |
| Self-hosted, want OAuth | Deploy proxy container. Add YAML config. |
| Already have a reverse proxy | Chain proxies, or embed token validation library. |

Any remaining open questions?

  1. Built-in token issuer vs external-only? — Resolved: with API Keys (PR #5443) landing in the engine, the proxy delegates token issuance to the engine's /_plugins/_security/api/apitokens endpoint. No built-in issuer needed.

  2. Scope naming convention — What should the standard scope format be? Options: read:index-pattern, action:resource, role:role-name. Need community input.

  3. Token storage backend — Resolved: API Keys are stored in the engine's .opensearch_security_api_tokens system index. The proxy is stateless — no separate token store needed.

  4. WebSocket support — Dashboards uses WebSockets for real-time features. How does the proxy handle WebSocket upgrade with OAuth tokens?

  5. Rate limiting — Should the proxy include built-in rate limiting per token/client, or defer to external tools (Envoy, API Gateway)?

  6. API Keys API surface alignment — Does the API Keys REST API support filtering by created_by? The proxy needs this to manage tokens per OAuth client. Does it support custom metadata fields for audit trail (e.g., client_id, oauth_provider)?

  7. Cedar policy management UX — How do users author and test Cedar policies? CLI-only, or a visual editor in Dashboards?

  8. Multi-cluster token federation — Can a token issued for one cluster be used across multiple clusters? What's the trust model?

  9. Dashboards auth integration — How does the proxy authenticate requests to Dashboards APIs that expect session cookies? Does it need to establish a Dashboards session on behalf of the OAuth client, or can Dashboards be extended to accept bearer tokens directly?



Appendix

A. Additional use cases

IDE and Developer Tool Integration

VS Code extension (TypeScript):

import * as vscode from 'vscode';
import { Client } from '@opensearch-project/opensearch';

export async function activate(context: vscode.ExtensionContext) {
  const token = await context.secrets.get('opensearch-oauth-token');
  const client = new Client({
    node: 'https://opensearch.example.com:8443',
    auth: { bearer: token }
  });

  const searchCommand = vscode.commands.registerCommand('opensearch.searchLogs', async () => {
    const query = await vscode.window.showInputBox({ prompt: 'Search query' });
    const results = await client.search({
      index: 'logs-*',
      body: { query: { query_string: { query } }, size: 20 }
    });
    const panel = vscode.window.createWebviewPanel('opensearch', 'Search Results', vscode.ViewColumn.Two);
    panel.webview.html = formatResults(results.body.hits.hits);
  });
  context.subscriptions.push(searchCommand);
}

// Minimal renderer for the webview (escape untrusted fields in production)
function formatResults(hits: any[]): string {
  return '<pre>' + hits.map(h => JSON.stringify(h._source, null, 2)).join('\n') + '</pre>';
}

Cross-Cluster and Federation

Multi-cluster management (Python):

import requests
from opensearchpy import OpenSearch

CLUSTERS = {
    "production":  {"url": "https://prod-opensearch.example.com:8443",  "scope": "admin:prod"},
    "staging":     {"url": "https://stage-opensearch.example.com:8443", "scope": "admin:staging"},
    "analytics":   {"url": "https://analytics-opensearch.example.com:8443", "scope": "read:analytics"},
}

def get_client(cluster_name: str) -> OpenSearch:
    cluster = CLUSTERS[cluster_name]
    token = requests.post("https://auth.example.com/oauth/token", data={
        "grant_type": "client_credentials",
        "client_id": "cluster-manager",
        "client_secret": "...",
        "scope": cluster["scope"]
    }).json()["access_token"]
    return OpenSearch(
        hosts=[cluster["url"]],
        headers={"Authorization": f"Bearer {token}"}, use_ssl=True
    )

for name in CLUSTERS:
    client = get_client(name)
    health = client.cluster.health()
    print(f"{name}: {health['status']} ({health['number_of_nodes']} nodes)")

Ansible Playbook

- name: Deploy OpenSearch configurations
  hosts: localhost
  vars:
    opensearch_url: "https://opensearch.example.com:8443"
    oauth_token: "{{ lookup('env', 'OPENSEARCH_OAUTH_TOKEN') }}"
  tasks:
    - name: Create index template
      uri:
        url: "{{ opensearch_url }}/_index_template/logs-template"
        method: PUT
        headers:
          Authorization: "Bearer {{ oauth_token }}"
        body_format: json
        body:
          index_patterns: ["logs-*"]
          template:
            settings:
              number_of_shards: 3
              number_of_replicas: 1

    - name: Import dashboards
      uri:
        url: "{{ opensearch_url }}/api/saved_objects/_import"
        method: POST
        headers:
          Authorization: "Bearer {{ oauth_token }}"
          osd-xsrf: "true"
        src: "dashboards/web-monitoring.ndjson"

Slack Alert Notification (Python)

import os
import requests

SLACK_WEBHOOK_URL = os.environ["SLACK_WEBHOOK_URL"]  # Slack incoming webhook

def send_alert_to_slack(alert):
    slack_message = {
        "blocks": [
            {"type": "header", "text": {"type": "plain_text", "text": f"🔴 Alert: {alert['name']}"}},
            {"type": "section", "fields": [
                {"type": "mrkdwn", "text": f"*Index:* `{alert['index']}`"},
                {"type": "mrkdwn", "text": f"*Condition:* {alert['condition']}"},
                {"type": "mrkdwn", "text": f"*Current value:* {alert['value']}"},
                {"type": "mrkdwn", "text": f"*Severity:* {alert['severity']}"}
            ]},
            {"type": "actions", "elements": [
                {"type": "button", "text": {"type": "plain_text", "text": "View Dashboard"}, 
                 "url": alert["dashboard_url"]},
                {"type": "button", "text": {"type": "plain_text", "text": "Acknowledge"},
                 "action_id": "ack_alert", "value": alert["id"]},
                {"type": "button", "text": {"type": "plain_text", "text": "Mute 1h"},
                 "action_id": "mute_alert", "value": alert["id"]}
            ]}
        ]
    }
    requests.post(SLACK_WEBHOOK_URL, json=slack_message)
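The "Acknowledge" and "Mute 1h" buttons imply an interaction handler that turns Slack action callbacks into OpenSearch calls made with the bot's scoped OAuth token. A hedged sketch of that routing step — the endpoint paths below are illustrative placeholders, not the alerting plugin's actual API:

```python
# Map Slack block-action IDs onto hypothetical alerting endpoints; the bot
# would issue the returned request with its own scoped bearer token.
def action_to_request(action_id, alert_id):
    routes = {
        "ack_alert":  ("POST", f"/_plugins/_alerting/alerts/{alert_id}/_acknowledge"),
        "mute_alert": ("POST", f"/_plugins/_alerting/alerts/{alert_id}/_mute?duration=1h"),
    }
    if action_id not in routes:
        raise ValueError(f"unknown action: {action_id}")
    return routes[action_id]
```

Because the bot authenticates with its own token, the audit log records "slack-bot acknowledged alert X" rather than an anonymous webhook call.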

Logstash Output with OAuth

output {
  opensearch {
    hosts => ["https://opensearch.example.com:8443"]
    auth_type => {
      type => "bearer"
      token => "${OPENSEARCH_OAUTH_TOKEN}"
    }
    index => "logstash-%{+YYYY.MM.dd}"
  }
}

B. What this blocks today

| Scenario | Current workaround | Risk |
|---|---|---|
| AI agent queries OpenSearch | Hardcoded basic auth credentials | No scoping, no revocation, credential leaks |
| CI/CD deploys dashboards | Shared admin password in CI secrets | Full admin access, no audit trail |
| Slack bot sends alerts | Webhook with embedded credentials | No identity, no scoping |
| Terraform manages configs | Basic auth in state files | Credentials in plaintext |
| Multi-tenant SaaS queries per-customer data | Separate users per tenant, manual management | Doesn't scale, error-prone |
| Developer tools (IDE plugins, CLI) | Copy-paste credentials | No expiry, no revocation |

C. Design principles

  1. Zero breaking changes — existing auth methods continue to work. OAuth is opt-in.
  2. No engine modifications — implemented as a proxy, not a security plugin change.
  3. Works with any OIDC provider — Keycloak, Auth0, Okta, Dex, or any compliant provider.
  4. Open source — Apache 2.0 licensed, community-driven.
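To make principle 3 concrete: after the proxy has verified a token's signature against the provider's JWKS, reading scopes is provider-agnostic because most OAuth 2.0 providers emit a space-delimited `scope` claim. A minimal sketch of the parsing step only — signature verification is assumed to have happened already and is deliberately not shown:

```python
import base64
import json

def extract_scopes(jwt_token):
    """Read the 'scope' claim from a JWT whose signature has ALREADY been
    verified against the provider's JWKS. Decoding alone proves nothing;
    this helper only parses the payload segment."""
    payload_b64 = jwt_token.split(".")[1]
    # Restore the base64url padding that JWTs strip
    payload_b64 += "=" * (-len(payload_b64) % 4)
    claims = json.loads(base64.urlsafe_b64decode(payload_b64))
    # OAuth providers commonly emit scopes as a space-delimited string
    return claims.get("scope", "").split()
```

The same function works for tokens from Keycloak, Auth0, Okta, or Dex, since the `scope` claim shape is standardized by RFC 8693/OAuth conventions even though role claims differ per provider.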

D. Phase 2: OSD Plugin details

| Component | Description |
|---|---|
| Token management | Create, revoke, list tokens in Dashboards UI |
| Consent screen | "Grant Agent X access to `logs-*`?" approval flow |
| Scope/role mapping admin | Visual editor for scope-to-role configuration |
| Config storage | Stored in OpenSearch system index, proxy reads dynamically |
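The scope-to-role mapping the proxy reads from the system index could be applied with a simple union lookup. A hedged sketch — the scope and role names below are illustrative, not a proposed schema:

```python
# Hypothetical scope-to-role mapping document, as the proxy might load it
# from the system index; names are illustrative only.
SCOPE_ROLE_MAP = {
    "logs:read":        ["logs_read_role"],
    "logs:write":       ["logs_write_role"],
    "dashboards:write": ["dashboards_editor_role"],
}

def roles_for_scopes(scopes, mapping=SCOPE_ROLE_MAP):
    """Union of OpenSearch roles granted by a token's scopes.
    Unknown scopes grant nothing, so the mapping fails closed."""
    roles = set()
    for scope in scopes:
        roles.update(mapping.get(scope, []))
    return sorted(roles)
```

Failing closed on unrecognized scopes keeps a misconfigured or over-scoped token from silently acquiring roles.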

E. Phase 3: Cedar Policy Engine details

Replace the static scope-to-role mapping with fine-grained Cedar policies, evaluated in-process (Cedar is Apache 2.0 licensed; no external service dependency):

// Agent can read logs indices
permit(
  principal == Agent::"my-ai-agent",
  action in [Action::"search", Action::"get"],
  resource in Index::"logs-*"
);

// CI/CD can manage dashboards but not delete indices
permit(
  principal in Group::"cicd-service-accounts",
  action in [Action::"create", Action::"update"],
  resource in ResourceType::"dashboard"
);

// Deny all agents from accessing PII indices
forbid(
  principal in Group::"agents",
  action,
  resource in Index::"pii-*"
);
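For the policies above to apply, the proxy must first derive a Cedar `(principal, action, resource)` triple from each incoming request. A sketch of that translation step; the entity-type names mirror the example policies, but the method-to-action mapping is an assumption, not a settled design:

```python
# Assumed mapping from HTTP verbs to Cedar actions; a real proxy would also
# inspect the request path (e.g. _search vs _doc) to pick the action.
METHOD_ACTIONS = {"GET": "get", "POST": "search", "PUT": "update", "DELETE": "delete"}

def cedar_request(client_id, method, index):
    """Build the authorization triple handed to the Cedar evaluator."""
    action = METHOD_ACTIONS.get(method)
    if action is None:
        raise ValueError(f"unsupported method: {method}")
    return {
        "principal": f'Agent::"{client_id}"',
        "action":    f'Action::"{action}"',
        "resource":  f'Index::"{index}"',
    }
```

The evaluator then checks this triple against the permit/forbid policies, with `forbid` taking precedence per Cedar semantics.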

F. Enterprise benefits summary

| Benefit | Without OAuth | With OAuth |
|---|---|---|
| Security | Shared passwords, no scoping | Per-client scoped tokens with expiry |
| Compliance | "admin logged in" | Full audit: who, what, when, which app |
| Scalability | Manual user management | Federated identity, automated token lifecycle |
| AI readiness | Agents can't connect securely | First-class agent support via MCP/OAuth |
| DevOps maturity | Manual dashboard management | GitOps, Terraform, CI/CD pipelines |
| Multi-tenancy | One credential for all tenants | Per-tenant scoped tokens |
| Incident response | Revoke password = break everything | Revoke one token, everything else works |
| Developer experience | Copy-paste credentials | `opensearch-auth login` → done |

G. Success metrics

  • OAuth tokens issued per month
  • Third-party integrations using OAuth
  • Reduction in basic auth usage
  • AI agents connected via OAuth
  • Community contributions and adoption

H. AI-first development approach

The implementation will use AI coding agents to accelerate development. What AI handles versus what requires human review:

| AI handles (fast) | Human reviews (critical) |
|---|---|
| HTTP proxy boilerplate | Token security model design |
| JWT parsing and validation | Scope-to-role mapping strategy |
| YAML config parsing | Edge cases: token replay, expiry race conditions |
| Unit and integration tests | Threat modeling |
| Docker/CI/CD setup | Performance under load |
| Documentation and examples | API design decisions |
| OSD React components | UX flow for consent screen |
