Implement Tests/HA #17

Merged
swalkerhppr merged 29 commits into main from kube-testing
Feb 22, 2026

@swalkerhppr
Contributor

High availability and federated keystore

Summary

Adds support for running multiple Stoke replicas behind a load balancer with a shared database and federated keystore. When HA is enabled, each replica keeps signing keys in memory only (no key persistence). Peers discover each other via a static list, merge their public keys, and serve a single logical JWKS at /api/pkeys so tokens issued by any replica can be verified by any replica and by resource servers.

Goals

  • HA without DB key writes: Multiple replicas share the same claims DB; each has its own in-memory signing keys. Key persistence is disabled when cluster.enabled is true.
  • Federated JWKS: Each replica discovers peers (static list), fetches each peer's JWKS from /api/pkeys?local=true, merges with its own keys (deduplicated by kid), and serves the merged JWKS at /api/pkeys. Verification uses this merged set so tokens from any replica are accepted.
  • Join/leave at runtime: Discovery is refreshed periodically (cluster.refresh_sec, default 30s); new or removed replicas are picked up once static_peers is updated and the replica is restarted (or its config is reloaded).
  • Documented path: HA requirements (shared Postgres/MySQL, no SQLite), configuration, and Helm scaling are documented.

Implemented changes

Configuration

  • internal/cfg/cluster.go – New cluster config: enabled, discovery (e.g. "static"), static_peers (base URLs), refresh_sec (default 30), instance_id (optional, for distinct kids in merged JWKS). Cluster is attached to context via WithContext; ClusterFromContext(ctx) available where the issuer is built.
  • internal/cfg/config.go – Config extended with Cluster; cluster wired into WithContext.
  • internal/cfg/tokens.go – When ClusterFromContext(ctx).Enabled is true: PersistKeys is forced to false, and the issuer is wrapped in key.NewFederatedTokenIssuer(...) so the token issuer in context is the federated one.

Discovery and merge

  • internal/cluster/discovery.go – Discoverer interface: Peers(ctx) ([]string, error).
  • internal/cluster/static.go – StaticDiscoverer implementing Discoverer from cluster.static_peers. Tests in static_test.go.
  • internal/cluster/federated.go – MergeJWKS(localJWKS, peerURLs, httpClient) parses the local JWKS, GETs each peer at {base}/api/pkeys?local=true, merges keys by kid, and returns the combined JWKSet JSON. A failed peer fetch skips that peer; the merge still succeeds. Tests in federated_test.go.

Federated issuer

  • internal/key/federated_issuer.go – FederatedTokenIssuer wraps a TokenIssuer: it delegates IssueToken, RefreshToken, and WithContext to the inner issuer; PublicKeys(ctx) returns the merged JWKS (cached until the merged JWKS exp minus a short buffer, or RefreshSec). ParseClaims uses the merged key set so tokens from any replica are valid. When LocalKeysOnly(ctx) is set (e.g. a request with ?local=true), PublicKeys returns only the inner issuer's keys, which avoids recursion when peers fetch this replica's keys. Tests in federated_issuer_test.go.

Web

  • internal/web/ctx.go – When handling /api/pkeys, requests with ?local=true get key.WithLocalKeysOnly(ctx) so peer fetches receive only that replica's keys.
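
The request-side half of that guard can be sketched as a small piece of Go middleware. The names below (markLocal, pkeys, serve, and the lower-case context helpers) are assumptions for this example, and the handler is a stand-in that only reports which key set it would serve.

```go
package main

import (
	"context"
	"fmt"
	"net/http"
	"net/http/httptest"
)

type localOnlyKey struct{}

// withLocalKeysOnly marks ctx as a peer fetch that wants local keys only.
func withLocalKeysOnly(ctx context.Context) context.Context {
	return context.WithValue(ctx, localOnlyKey{}, true)
}

// localKeysOnly reports whether the mark is present on ctx.
func localKeysOnly(ctx context.Context) bool {
	v, _ := ctx.Value(localOnlyKey{}).(bool)
	return v
}

// markLocal wraps a handler and tags requests that carry ?local=true.
func markLocal(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if r.URL.Query().Get("local") == "true" {
			r = r.WithContext(withLocalKeysOnly(r.Context()))
		}
		next.ServeHTTP(w, r)
	})
}

// pkeys is a stand-in handler that reports which key set it would serve.
func pkeys(w http.ResponseWriter, r *http.Request) {
	if localKeysOnly(r.Context()) {
		fmt.Fprint(w, "local")
		return
	}
	fmt.Fprint(w, "merged")
}

// serve runs one in-memory request through the middleware and handler.
func serve(target string) string {
	rec := httptest.NewRecorder()
	req := httptest.NewRequest(http.MethodGet, target, nil)
	markLocal(http.HandlerFunc(pkeys)).ServeHTTP(rec, req)
	return rec.Body.String()
}

func main() {
	fmt.Println(serve("/api/pkeys"), serve("/api/pkeys?local=true")) // prints "merged local"
}
```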

Helm

  • helm/values.yaml – server.replicaCount: 1.
  • helm/templates/stoke-server-deployment.yaml – replicas: {{ .Values.server.replicaCount | default 1 }}.
  • helm/README.md – HA section: use shared DB and cluster.enabled + cluster.static_peers; link to docs/high-availability.md.

Documentation

  • docs/high-availability.md – Requirements (shared Postgres/MySQL, no key persistence in HA), how to enable HA (cluster config, static_peers, instance_id), behaviour (issuance, verification, DB), config reference table, and pointer to Helm/values.

Testing

  • test/e2e/specs/ha.spec.ts – E2E: GET /api/pkeys returns 200 and valid JWKS; with HA profile, replica 2's /api/pkeys returns merged JWKS with at least two keys. HA run uses docker-compose.e2e-ha.yaml and configs config-ha-replica1.yaml / config-ha-replica2.yaml.

Configuration example

cluster:
  enabled: true
  discovery: static
  static_peers:
    - https://stoke-0:8080
    - https://stoke-1:8080
  refresh_sec: 30
  instance_id: "stoke-0" # optional; keeps kids distinct in merged JWKS

swalkerhppr and others added 28 commits March 30, 2025 17:26
…onment variable to point to server. Add task to manage local kubernetes tests. Update server-config-map, stoke-server-deployment to fix references. Have sqlite database be in memory by default.
… Use local version of container. Update admin interface to try to remove deprecation warning. Add db init config map to helm chart.
…elm values.yaml with new user and group configurations; create .env.production for admin UI
…d introduce Helm chart documentation. Enhance admin UI with base admin path support and fix related capabilities handling. Add integration tests for capabilities endpoint.
…pdate Dockerfile to streamline build steps and adjust paths for assets. Modify .gitignore to exclude new build directories. Introduce base path configuration for improved proxy support and update related API responses. Adjust tests to reflect changes in capabilities and available providers endpoints.
Co-authored-by: Cursor <cursoragent@cursor.com>
…umentation

- Added Playwright E2E tests in `test/e2e/` to validate admin UI and API against a running Stoke server.
- Updated `.tasks.sh` to include an `e2e` subcommand for running Playwright tests.
- Enhanced `README.md` and `playwright.md` with instructions on running E2E tests.
- Created new test specifications for admin UI and login API.
- Documented user stories and test coverage in `test/stories/`.

Co-authored-by: Cursor <cursoragent@cursor.com>
…@US-010

Co-authored-by: Cursor <cursoragent@cursor.com>
- Updated `.tasks.sh` to improve E2E test execution, including options for starting/stopping the Stoke server.
- Added new configuration files for E2E testing, including `configs/config.yaml` and `dbinit.yaml`.
- Enhanced documentation in `playwright.md` to clarify server startup options and test execution.
- Updated test specifications to reflect changes in user credentials and navigation elements.
- Introduced new helper functions for managing E2E database initialization.

Co-authored-by: Cursor <cursoragent@cursor.com>
…Name, etc.

Co-authored-by: Cursor <cursoragent@cursor.com>
…Name, etc.

Co-authored-by: Cursor <cursoragent@cursor.com>
docs: add High Availability documentation and update README

- Removed the High Availability bullet point from the Road Map in README.md.
- Added a new document on High Availability detailing requirements, configuration, and behavior for multi-replica setups.
- Updated helm/README.md to reference the new High Availability documentation.
- Set default replica count in helm/values.yaml to 1 and updated deployment template to use this value.
- Enhanced context handling in cluster configuration to ensure proper defaults for refresh intervals.
- Added support for High Availability (HA) in the E2E testing framework, allowing for multiple Stoke replicas to serve a merged JWKS.
- Introduced new Docker Compose configurations for HA setups, including two replicas with a shared Postgres database.
- Updated the `.tasks.sh` script to include an option for running E2E tests with HA.
- Enhanced the configuration structure to support instance IDs for replicas, ensuring unique key IDs in merged JWKS.
- Updated documentation to reflect new HA features and usage instructions for E2E tests.
- Added tests to verify that tokens issued by one replica can be validated using the merged JWKS from another replica.
Implement HA with federated public key stores.
@swalkerhppr swalkerhppr marked this pull request as draft February 22, 2026 23:06
@swalkerhppr swalkerhppr marked this pull request as ready for review February 22, 2026 23:06
@swalkerhppr swalkerhppr self-assigned this Feb 22, 2026
@swalkerhppr swalkerhppr merged commit dc2211e into main Feb 22, 2026
3 checks passed