
[DRAFT] Add OIDC Support to Aerie-UI#1746

Open
pranav-super wants to merge 22 commits into develop from feature/oidc-support

Contributor

@pranav-super pranav-super commented Aug 12, 2025

This is the main pull request relating to OIDC support, although there are very small accompanying pull requests on Aerie (main) and Aerie-Gateway that go with this.

OIDC Overview

OIDC (OpenID Connect) adds an authentication layer on top of OAuth2.0's authorization mechanisms. As used here and throughout this PR, it generally entails authenticating against an Identity Provider (IdP) and using the permissions that it provides to authorize what a given user can and cannot do.

While Aerie and Aerie-UI have existing authorization mechanisms, through Aerie-UI's robust permissions.ts and Aerie's Hasura metadata, which specifies permissions for each table, there isn't much support for authenticating users against an external IdP. Authenticating against an external, likely organizational, IdP is a fairly general need, and it is common for that IdP to use the OIDC framework. As such, the focus of this PR was to add this capability, specifically following the OAuth2.0 specification.

Details on the specifics of OIDC and OAuth2.0 are left out, but can be found here. The OAuth2.0 specification defines a variety of "authorization flows", where the end goal for a user, after submitting their credentials, is to obtain an access token with which they can access the application. This PR adds support to Aerie-UI for the Authorization Code Flow with PKCE, which appears to be the most widely recommended and best supported flow for our use case. Supporting other authorization flows would not be especially difficult, but that was not the focus of this PR; it is listed as a discussion item below.
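As a rough illustration of the PKCE part of this flow (not the code in this PR), a code verifier and its S256 code challenge can be derived as described in RFC 7636. The function names below are hypothetical, and the sketch assumes a Node.js runtime.

```typescript
import { createHash, randomBytes } from 'node:crypto';

// Hypothetical helper names; a sketch of RFC 7636 (PKCE) with the S256 method.

/** Generates a high-entropy code verifier: 32 random bytes as 43 base64url characters. */
function generateCodeVerifier(): string {
  return randomBytes(32).toString('base64url');
}

/** Derives the code challenge: BASE64URL(SHA-256(ASCII(verifier))), without padding. */
function deriveCodeChallenge(verifier: string): string {
  return createHash('sha256').update(verifier, 'ascii').digest('base64url');
}
```

The challenge is sent with the authorization request and the verifier is sent later with the token request, so the IdP can confirm that both requests came from the same client.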

How Do I Run This?

This PR introduces a few new pieces to the .env puzzle. Within aerie-ui, the following have been introduced:

  • PUBLIC_AUTH_OIDC_ENABLED: must be set to true to enable OIDC functionality.
  • OIDC_WELL_KNOWN_URL: the "well-known" discovery endpoint of your IdP. The URL usually contains the string .well-known.
  • OIDC_AUTHORIZATION_URL: the endpoint of the IdP where the authorization code is obtained. Typically ends in /auth.
  • OIDC_TOKEN_URL: the endpoint of the IdP where the authorization code is exchanged for tokens. Typically ends in /token.
  • OIDC_LOGOUT_URL: the endpoint of the IdP where the user's session is ended. Typically ends in /logout.
  • OIDC_JWKS_URL: the endpoint of the IdP that publishes the public keys (JWKS) used to verify token signatures. Typically ends in /certs.
  • OIDC_SCOPES: the "scopes" that the token should provide. These are usually details about the user and can depend on the specific IdP being used. They are space separated. Generally, defaulting to some superset of openid profile email, if not just that, is sufficient.
  • OIDC_CLIENT_ID: the ID in the IdP assigned to your client/application. This could be something like "aerie".
  • OIDC_CLIENT_SECRET: presently not used, but may become relevant in future implementations if a flow other than Authorization Code with PKCE is supported.
  • OIDC_REDIRECT_URI: the URI to redirect to after obtaining the authorization code.
  • OIDC_AUDIENCE: the audience for the token. Typically matches OIDC_CLIENT_ID.
  • OIDC_ISSUER: the base URL of the IdP for your given "realm".
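
A minimal sketch of a startup check for these variables follows. The variable names come from the list above, but which of them are strictly required is an assumption made for illustration, and the helper name is hypothetical.

```typescript
// Assumption: these are treated as required whenever OIDC is enabled.
// The PR description does not state which variables are mandatory.
const REQUIRED_OIDC_VARS = [
  'OIDC_WELL_KNOWN_URL',
  'OIDC_SCOPES',
  'OIDC_CLIENT_ID',
  'OIDC_REDIRECT_URI',
  'OIDC_ISSUER',
] as const;

type Env = Record<string, string | undefined>;

/** Returns the names of required OIDC variables that are unset or blank. */
function missingOidcVars(env: Env): string[] {
  // Nothing is required unless OIDC has been switched on.
  if (env.PUBLIC_AUTH_OIDC_ENABLED !== 'true') {
    return [];
  }
  return REQUIRED_OIDC_VARS.filter(name => !env[name]?.trim());
}
```

In a Node.js process this could be called as `missingOidcVars(process.env)` and the result logged before the server starts.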

Finally, if using OIDC, you will likely also need to change your HASURA_GRAPHQL_JWT_SECRET in Aerie's .env to use RS256 as its type, and replace the key with a jwk_url matching the above configuration. If running Aerie-Gateway locally instead of with the rest of Aerie, this environment variable should be updated there as well.
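
For illustration, the JSON value of HASURA_GRAPHQL_JWT_SECRET could be assembled as below. `type` and `jwk_url` are Hasura's JWT configuration keys; the helper name and the example URL are placeholders, not values from this PR.

```typescript
/** Builds the JSON string for HASURA_GRAPHQL_JWT_SECRET from a JWKS endpoint URL. */
function buildHasuraJwtSecret(jwksUrl: string): string {
  // The JWKS URL should be the same endpoint configured as OIDC_JWKS_URL.
  return JSON.stringify({ type: 'RS256', jwk_url: jwksUrl });
}

// Placeholder URL for illustration only.
const exampleSecret = buildHasuraJwtSecret('https://idp.example.com/certs');
// exampleSecret === '{"type":"RS256","jwk_url":"https://idp.example.com/certs"}'
```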

Our Goal Here

Generally, our goal here is to be as minimally invasive as we can. While some pretty big changes are made, we tried to make sure too much wasn't altered. For example, all OIDC functionality is sectioned off to its own folder to ensure there is as little interference with existing auth functionality as possible.

There are also things we considered but ultimately descoped, namely phasing out the user cookie. A review of the code suggests that this cookie could logically be replaced with an access token, leading to a more uniform implementation overall, but doing so would require more sweeping changes than we would like to introduce in this PR. For the no-auth option, it seems to entail mirroring what is done in the SSO/CAM implementation, which would also require modifying Aerie-Gateway. As such, a cohesive fix should be pushed to a separate PR, especially since there may be other things we would like to consider when refactoring the existing auth code and potentially integrating it with, or mirroring it more closely to, the OIDC functionality introduced here. This also means that any refactor of the CAM/SSO flow is descoped from this PR.

What is Worth Knowing When Reviewing?

OIDC authentication in Aerie-UI does not rely as heavily on Aerie-Gateway as the other options do. It uses the gateway only briefly, for token/session validation, which required a small update to the gateway to support JWKS-based decoding. Otherwise, the UI performs authentication largely independently of the gateway.

We also tried to reduce the reliance on redirects and silent logouts/errors, which were present in a lot of error cases. For example, if a JWT expired, the previous behavior was that the user would be logged out as soon as the expiry was caught in a Hasura request or subscription. This led to some very difficult-to-trace errors during development. We thought it would be more intuitive to make the errors explicit and suggest that the user log out, rather than logging them out automatically.

Discussion Items

A large unknown here is testing! Testing has proven quite difficult with typical mock-IdP frameworks like Dex (which doesn't seem to let users configure custom token refresh intervals, allowing only a default of 24 hours), while a custom setup with a real IdP like Keycloak seems far too heavyweight. As such, comprehensive end-to-end testing is not yet included in this PR, and we welcome any suggestions. We posted this PR without those tests because there is a fair amount to review and we didn't want to delay review until testing was figured out, especially since any suggestions for additional sweeping changes would quickly make those tests obsolete!

The idea was floated in development to introduce a (protected) folder to wrap all non-authentication-related routes. This would be purely organizational and wouldn't affect any URLs. It isn't included in this PR because it would make any future rebases extremely difficult, so that was left as a potential final change.

As mentioned before, we might want to consider supporting other authorization flows. While that is not in scope for this PR, we would like to raise it here as a discussion item.

Finally, we mentioned earlier a potential refactor of the existing SSO/CAM and no-auth flows. While it isn't clear when we would do that or exactly what it would look like, we would like to make sure the interest in doing so is documented here. Such a change would also affect Aerie-Gateway fairly significantly, so we would probably want to make a separate discussion post and meet about it following this PR!

WebSocket Lifecycle & Token Refresh

Important Note: Hasura validates the JWT not only at connection_init but also continuously monitors token expiration, closing WebSocket connections when the JWT expires (observed in Hasura logs with the error "Could not verify JWT: JWTExpired", close codes 1006/4400).

Implementation:

  • Proactive WebSocket restart: When accessToken cookie updates after a successful refresh, the WebSocket connection gracefully restarts (close code 4205) with fresh credentials, preventing Hasura from abruptly terminating it
  • Automatic token refresh: Tokens refresh ~10 seconds before expiration using Cookie Store API to monitor cookie changes
  • Hybrid auto-recovery: Subscriptions auto-recover from connection errors using two strategies:
    1. Fast path: connectionState listener for ~100ms recovery when graphql-ws auto-reconnects
    2. Fallback: 5-second timer to kick graphql-ws out of lazy mode when all subscriptions are terminated
  • Offline resilience: Token refresh retries every 5 seconds on network failure until successful
  • HMR resilience: Cookie store listeners and WebSocket client state persist across Hot Module Replacement during development
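
The refresh timing described in the list above can be sketched as follows. This is an illustration under stated assumptions rather than the PR's implementation: the constant and function names are hypothetical, and the token's `exp` claim is assumed to be in seconds since the epoch, as is standard for JWTs.

```typescript
const REFRESH_LEAD_MS = 10_000; // refresh ~10 seconds before expiration
const RETRY_INTERVAL_MS = 5_000; // retry interval after a network failure

/** Milliseconds to wait before refreshing a token whose `exp` claim is `expSeconds`. */
function computeRefreshDelay(expSeconds: number, nowMs: number = Date.now()): number {
  // Clamp at 0 so an expired (or nearly expired) token refreshes immediately.
  return Math.max(0, expSeconds * 1000 - nowMs - REFRESH_LEAD_MS);
}

/** Delay before the next attempt, given whether the previous refresh succeeded. */
function nextAttemptDelay(previousSucceeded: boolean, expSeconds: number, nowMs: number = Date.now()): number {
  return previousSucceeded ? computeRefreshDelay(expSeconds, nowMs) : RETRY_INTERVAL_MS;
}
```

Note that a delay of 0 is a valid result, so callers should compare against `undefined` rather than testing the delay for truthiness.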

Testing Instructions

Prerequisites

  1. Set up an OIDC-compatible Identity Provider (tested with Keycloak)
  2. Configure .env with required OIDC variables (see above)
  3. Update Hasura JWT secret to use RS256 with jwk_url
  4. Ensure browser supports Cookie Store API (Chrome/Edge) for automatic token refresh

Manual Test Scenarios

Normal Operation (Happy Path)

  1. Navigate to app → should redirect to IdP login
  2. Log in with valid credentials → redirected back to app
  3. Verify user is authenticated and can access protected pages
  4. Open Browser DevTools Network tab
  5. Wait ~10 seconds → /oidc/refresh request should appear
  6. Verify WebSocket connection restarts cleanly (look for new graphql WebSocket with fresh connection_init)
  7. Verify no "Reconnecting..." banner appears during normal refresh
  8. Switch user role via dropdown → WebSocket should restart, data should reload

Offline Resilience

  1. While logged in with valid tokens, open DevTools Network tab
  2. Enable "Offline" mode in DevTools
  3. Wait for next scheduled token refresh (~10s)
  4. Observe token refresh fails in console
  5. Disable "Offline" mode (go back online)
  6. Verify token refresh retries automatically and succeeds within 5 seconds
  7. Verify WebSocket reconnects and subscriptions resume
  8. Verify no data loss or UI errors

Role Switching

  1. Log in and navigate to any page with subscription data
  2. Open Browser DevTools and watch WebSocket messages
  3. Switch to a different role using the role dropdown
  4. Verify WebSocket closes and reopens with new role in headers
  5. Verify subscriptions re-register and data updates for new role permissions

Logout Flow

  1. Click logout button
  2. Verify cookies are cleared (check Application tab in DevTools)
  3. Verify redirected to IdP logout endpoint
  4. Verify IdP redirects back to app
  5. Verify app redirects to login page
  6. Check browser console and server logs - should be no errors

Long-Running Session

  1. Log in and leave the app running for >1 hour
  2. Periodically check Network tab to verify token refresh continues
  3. Verify WebSocket connection stays healthy (restarts every ~10s with fresh tokens)
  4. Interact with the app (create/edit plans, etc.) to verify subscriptions work correctly
  5. Verify no unexpected logouts or connection errors

HMR During Development (Dev Only)

  1. Log in and navigate to a page with subscriptions
  2. Make a code change to trigger Hot Module Replacement
  3. Verify subscriptions reconnect seamlessly (no errors if tokens are fresh)
  4. If tokens have expired during idle time: verify auto-recovery kicks in and subscriptions eventually reconnect once tokens refresh

TODO:

  • Move any remnants of socket consolidation user fixes/cleanup to a branch off of develop that we merge in first
  • Sort out e2e testing wrt new changes
  • Continue testing socket connection + auth edge cases and add more comments in code to document what we are doing
  • Investigate if number prefixing on aerie db users is actually necessary
  • Deeper review of OIDC svelte server code
  • Verify existing authentication methods function normally

@sonarqubecloud

Quality Gate failed

Failed conditions
6 Security Hotspots

See analysis details on SonarQube Cloud

@jmorton jmorton force-pushed the feature/oidc-support branch from 6d605b8 to 95ca8ef Compare October 10, 2025 23:42
@jmorton jmorton force-pushed the feature/oidc-support branch 2 times, most recently from 63e9fe2 to 981b01a Compare December 9, 2025 16:57
@AaronPlave
Contributor

@jmorton @pranav-super We should tag up about this PR in relation to my new changes in #1741 where I've added socket consolidation + user store defined in a client-side store passed around via svelte context.

@AaronPlave AaronPlave self-assigned this Jan 20, 2026
Implement complete OIDC authentication flow including:
- Server-side OIDC client using Arctic library with PKCE support
- Login, callback, logout, and token refresh endpoints
- Updated hooks.server.ts to handle OIDC authentication mode
- Modified subscribable stores and effects for token management
- Request utilities updated for authenticated API calls
Migrate from passing user through PageData to using a centralized
userStore for authentication state. This change:
- Removes user parameter threading through page components
- Updates all stores to access user from centralized auth store
- Refactors route layouts to use reactive auth state
- Removes unnecessary +page.ts files that only passed user data
- Enables role changes to propagate without full page reload

# Conflicts:
#	src/components/plan/PlanMergeReview.svelte
#	src/routes/+layout.server.ts
#	src/routes/+layout.svelte
#	src/routes/constraints/+layout.svelte
#	src/routes/constraints/+page.svelte
#	src/routes/constraints/edit/[id]/+page.svelte
#	src/routes/constraints/new/+page.svelte
#	src/routes/dictionaries/+page.svelte
#	src/routes/expansion/+layout.svelte
#	src/routes/expansion/rules/+page.svelte
#	src/routes/expansion/rules/edit/[id]/+page.svelte
#	src/routes/expansion/rules/new/+page.svelte
#	src/routes/expansion/runs/+page.svelte
#	src/routes/expansion/sets/+page.svelte
#	src/routes/expansion/sets/new/+page.svelte
#	src/routes/external-sources/+layout.svelte
#	src/routes/external-sources/sources/+page.svelte
#	src/routes/external-sources/types/+page.svelte
#	src/routes/models/+layout.svelte
#	src/routes/models/+page.svelte
#	src/routes/models/[id]/+page.svelte
#	src/routes/parcels/+layout.svelte
#	src/routes/parcels/+page.svelte
#	src/routes/parcels/edit/[id]/+page.svelte
#	src/routes/parcels/new/+page.svelte
#	src/routes/plans/+page.svelte
#	src/routes/plans/[id]/+page.svelte
#	src/routes/plans/[id]/merge/+page.svelte
#	src/routes/scheduling/+layout.svelte
#	src/routes/scheduling/+page.svelte
#	src/routes/scheduling/conditions/edit/[id]/+page.svelte
#	src/routes/scheduling/conditions/new/+page.svelte
#	src/routes/scheduling/goals/edit/[id]/+page.svelte
#	src/routes/scheduling/goals/new/+page.svelte
#	src/routes/sequence-templates/+layout.svelte
#	src/routes/sequence-templates/+page.svelte
#	src/routes/tags/+page.svelte
#	src/routes/workspaces/+layout.svelte
#	src/routes/workspaces/+page.svelte
#	src/routes/workspaces/[workspaceId]/+layout@.svelte
#	src/routes/workspaces/[workspaceId]/actions/+layout@.svelte
#	src/routes/workspaces/[workspaceId]/actions/runs/[runId]/+layout@.svelte
#	src/routes/workspaces/[workspaceId]/actions/runs/[runId]/+page.svelte
#	src/stores/sequencing.ts
#	src/stores/tags.ts
Create rule.ts module for enforcing authentication requirements
at the route level, enabling consistent access control across
the application.
Implement Playwright tests for OIDC authentication including:
- OIDC fixture for handling auth flows in tests
- Login/logout test scenarios
- Token refresh verification
- Updated helpers and AppNav fixture for OIDC support
Align legacy auth routes (login, logout, changeRole) with OIDC
implementation by using SvelteKit's event-based cookie API
instead of manual header manipulation. Reduces code duplication
and improves consistency.
The Client singleton was initiating a fetch for the well-known
configuration but not awaiting it, causing endpoint values to be
undefined when the constructor completed before the fetch.

Changes:
- Convert Client.instance to async getter returning Promise<Client>
- Move initialization to async init() method that awaits well-known fetch
- Update all call sites to await Client.instance
- Uncomment OIDC_CLIENT_SECRET env var to fix type error
jmorton and others added 15 commits February 12, 2026 08:30
The nonce parameter binds the ID token to a specific authentication
request, preventing attackers from reusing previously issued tokens.

Changes:
- Add generateNonce() function using crypto.randomBytes
- Include nonce in authorization URL via createAuthorizationURLWithPKCE()
- Store nonce in httpOnly cookie during login
- Add verifyNonce() to validate ID token nonce claim on callback
- Update callback check() to require nonce presence
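
A minimal sketch of the nonce handling described in this commit, assuming a Node.js runtime. The function names match the commit message, but the signatures and bodies here are assumptions, not the PR's code.

```typescript
import { randomBytes, timingSafeEqual } from 'node:crypto';

/** Generates a random nonce to include in the authorization request. */
function generateNonce(): string {
  return randomBytes(32).toString('base64url');
}

/** Returns true only when the ID token's nonce claim matches the stored nonce. */
function verifyNonce(claim: unknown, stored: string | undefined): boolean {
  if (typeof claim !== 'string' || !stored) {
    return false;
  }
  const a = Buffer.from(claim);
  const b = Buffer.from(stored);
  // timingSafeEqual throws on length mismatch, so compare lengths first.
  return a.length === b.length && timingSafeEqual(a, b);
}
```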
Validate the 'aud' claim in tokens to prevent confused deputy attacks
where tokens issued for other applications could be accepted.

Set OIDC_AUDIENCE env var to enable validation (recommended for production).
- Split JWT verification options: BASE_VERIFY_OPTS (no audience) for
  access tokens, ID_TOKEN_VERIFY_OPTS (with audience) for ID tokens
- Access tokens are treated as opaque per OIDC spec - audience
  validation is the resource server's responsibility
- ID tokens require audience validation to prevent confused deputy attacks
- Also fixes secure cookie flag for local development (secure: !dev)
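
To illustrate the audience check applied to ID tokens: the `aud` claim may be a single string or an array of strings, and the expected audience must appear in it. The helper below is a hypothetical sketch; a JWT verification library would normally perform this check.

```typescript
/** Returns true when `expected` appears in the token's `aud` claim (string or array form). */
function audienceMatches(aud: unknown, expected: string): boolean {
  if (typeof aud === 'string') {
    return aud === expected;
  }
  return Array.isArray(aud) && aud.includes(expected);
}
```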
Validate the `back` query parameter in the OIDC login route to prevent
open redirect attacks. Only allow relative paths starting with '/'
but not '//' (protocol-relative URLs).

Rejected examples: 'https://evil.com', '//evil.com', 'javascript:alert(1)'
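
The rule can be sketched as a small predicate. The function name is hypothetical, and the extra backslash check is an assumption that goes beyond the commit message (browsers commonly treat '\' like '/' in URLs).

```typescript
/** Accepts only same-site relative paths: must start with '/' but not '//'. */
function isSafeBackPath(back: string | null): back is string {
  return (
    back !== null &&
    back.startsWith('/') &&
    !back.startsWith('//') &&
    !back.startsWith('/\\') // assumption: also reject '/\host' style values
  );
}
```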
Ensures auth cookies are only transmitted over HTTPS in production.
Uses `secure: !dev` pattern to allow HTTP in local development.

Updated files:
- oidc.ts: accessToken, idToken, refreshToken
- hooks.server.ts: activeRole (OIDC and SSO handlers)
- oidc/login/+page.server.ts: back cookie
- auth/login/+server.ts: activeRole, user
- auth/changeRole/+server.ts: activeRole
Using 'lax' instead of 'strict' because:
- OIDC flow requires cookies to persist across the redirect back from Keycloak
- 'lax' allows cookies on top-level GET navigations (safe for redirects)
- 'lax' still blocks cross-site POST requests (CSRF protection)

Note: SSO handler keeps sameSite='none' for cross-site SSO compatibility.
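
The cookie settings from the last two commits can be summarized in one hypothetical helper. Whether a given cookie is httpOnly is left to the caller, since the description indicates the access token is read client-side while the nonce cookie is httpOnly; the `path` value is an assumption.

```typescript
interface AuthCookieOptions {
  httpOnly: boolean;
  path: string;
  sameSite: 'lax' | 'strict' | 'none';
  secure: boolean;
}

/** Cookie options for OIDC cookies: 'lax' same-site, secure outside development. */
function authCookieOptions(dev: boolean, httpOnly: boolean): AuthCookieOptions {
  return {
    httpOnly,
    path: '/', // assumption: cookies apply to the whole app
    sameSite: 'lax', // survives the top-level GET redirect back from the IdP
    secure: !dev, // allow plain HTTP only in local development
  };
}
```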
Allows operators to specify supported JWT signing algorithms via
space-separated OIDC_ALGORITHMS environment variable (e.g., "RS256 RS384").
Defaults to RS256, the most common algorithm for OIDC providers.
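
Parsing the variable could look like the sketch below; the function and constant names are hypothetical.

```typescript
const DEFAULT_ALGORITHMS = ['RS256'];

/** Parses a space-separated OIDC_ALGORITHMS value, falling back to RS256. */
function parseAlgorithms(raw: string | undefined): string[] {
  const parsed = (raw ?? '').split(/\s+/).filter(Boolean);
  return parsed.length > 0 ? parsed : [...DEFAULT_ALGORITHMS];
}
```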
- Remove token values from user registration logs (only log username)
- Remove event object from cookie change log (contained token values)
- Remove token from error messages in callback
- Log only error messages, not full error objects that may contain tokens
- Downgrade verbose logs to console.debug
- Change informational logs from console.log to console.debug for consistency
- Remove response body from refresh error (may contain sensitive details)
- Only log error messages, not full error objects in scheduled refresh
- Remove timeout ID from log output (not useful for debugging)
- Improve log message clarity ("Scheduling token refresh" vs "Delay changed")
Adds CSP and other security headers to all responses:
- Content-Security-Policy-Report-Only: monitors violations without blocking
- X-Content-Type-Options: nosniff
- X-Frame-Options: DENY
- Referrer-Policy: strict-origin-when-cross-origin

CSP directives configured for:
- Scripts: self + unsafe-inline + unsafe-eval (Monaco editor requires these)
- Styles: self + unsafe-inline (Svelte scoped styles)
- Connect: self + Hasura + Gateway + Action + Workspace URLs
- Workers: self + blob (Monaco editor)
- Images/Fonts: self + data + blob

Change header to 'Content-Security-Policy' to enforce after testing.
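
As an illustration of the header set described in this commit, the sketch below assembles the headers from a list of connect sources. The function name and parameters are hypothetical; only the directives listed in the commit message are included.

```typescript
/** Builds the security headers; `enforce` switches CSP from report-only to enforcing. */
function buildSecurityHeaders(connectSources: string[], enforce = false): Record<string, string> {
  const csp = [
    "script-src 'self' 'unsafe-inline' 'unsafe-eval'",
    "style-src 'self' 'unsafe-inline'",
    `connect-src ${["'self'", ...connectSources].join(' ')}`,
    "worker-src 'self' blob:",
    "img-src 'self' data: blob:",
    "font-src 'self' data: blob:",
  ].join('; ');
  return {
    [enforce ? 'Content-Security-Policy' : 'Content-Security-Policy-Report-Only']: csp,
    'Referrer-Policy': 'strict-origin-when-cross-origin',
    'X-Content-Type-Options': 'nosniff',
    'X-Frame-Options': 'DENY',
  };
}
```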
Add verifyIdToken() function to validate ID tokens with full OIDC-compliant
checks (signature, issuer, expiration, audience) before passing to IdP.

If verification fails during logout, proceed without the id_token_hint
parameter rather than sending an unverified token.
Add environment variables to configure JWT claim namespace and paths:
- OIDC_CLAIMS_NAMESPACE (server) / PUBLIC_OIDC_CLAIMS_NAMESPACE (client)
- OIDC_CLAIMS_USER_ID / PUBLIC_OIDC_CLAIMS_USER_ID
- OIDC_CLAIMS_ALLOWED_ROLES / PUBLIC_OIDC_CLAIMS_ALLOWED_ROLES
- OIDC_CLAIMS_DEFAULT_ROLE / PUBLIC_OIDC_CLAIMS_DEFAULT_ROLE

Defaults to Hasura's standard claim structure:
  https://hasura.io/jwt/claims -> x-hasura-user-id, x-hasura-allowed-roles, x-hasura-default-role

Add extractClaims() helper functions in both oidc.ts (server) and auth.ts (client)
to centralize claim extraction with proper validation.

IMPORTANT: These settings must match:
- Hasura's HASURA_GRAPHQL_JWT_SECRET claims_map
- Aerie Gateway's JWT parsing logic
- Your IdP's token mapper configuration
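
A rough sketch of claim extraction with the Hasura defaults follows. The function name matches the commit message, but the signature, types, and validation shown here are assumptions rather than the PR's code.

```typescript
interface HasuraClaims {
  userId: string;
  allowedRoles: string[];
  defaultRole: string;
}

interface ClaimsConfig {
  namespace: string;
  userIdKey: string;
  allowedRolesKey: string;
  defaultRoleKey: string;
}

// Defaults mirror Hasura's standard claim structure.
const DEFAULT_CLAIMS_CONFIG: ClaimsConfig = {
  namespace: 'https://hasura.io/jwt/claims',
  userIdKey: 'x-hasura-user-id',
  allowedRolesKey: 'x-hasura-allowed-roles',
  defaultRoleKey: 'x-hasura-default-role',
};

/** Extracts and validates claims from a decoded JWT payload; returns null if malformed. */
function extractClaims(
  payload: Record<string, unknown>,
  config: ClaimsConfig = DEFAULT_CLAIMS_CONFIG,
): HasuraClaims | null {
  const namespaced = payload[config.namespace];
  if (typeof namespaced !== 'object' || namespaced === null) {
    return null;
  }
  const claims = namespaced as Record<string, unknown>;
  const userId = claims[config.userIdKey];
  const allowedRoles = claims[config.allowedRolesKey];
  const defaultRole = claims[config.defaultRoleKey];
  if (
    typeof userId !== 'string' ||
    typeof defaultRole !== 'string' ||
    !Array.isArray(allowedRoles) ||
    !allowedRoles.every((role): role is string => typeof role === 'string')
  ) {
    return null;
  }
  return { userId, allowedRoles, defaultRole };
}
```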
Key changes:
- Add proactive WebSocket restart on token refresh (Hasura monitors JWT expiration and kills connections)
- Implement hybrid auto-recovery for subscription errors (connectionState listener + fallback timer)
- Add token refresh retry on failure (offline resilience)
- Fix HMR resilience for cookie store listeners
- Prevent token refresh on login page
- Fix logout to use window.location.href instead of goto() for server-only routes
- Remove noisy auth error logging during normal logout flow
- Add getCookieValue utility for reading cookies
- Return 401 from /oidc/refresh instead of throwing on expired tokens
- Catch refresh failures on page load, clear cookies, redirect to login
- Fix delay=0 being falsy so expired tokens trigger immediate refresh
- On 401, log out immediately instead of retrying
- Send raw id_token as logout hint (IdPs accept expired tokens for this)
- Handle missing id_token in multi-tab logout scenarios


3 participants