diff --git a/fastmcp-migration/execution-strategy-research/00-migration-execution-strategy.md b/fastmcp-migration/execution-strategy-research/00-migration-execution-strategy.md index d6daa3e..36ab7a0 100644 --- a/fastmcp-migration/execution-strategy-research/00-migration-execution-strategy.md +++ b/fastmcp-migration/execution-strategy-research/00-migration-execution-strategy.md @@ -1,9 +1,17 @@ - # Banksy xmcp-to-FastMCP Migration +## How to use this roadmap + +This is a **living document**. It reflects the intended final state of the migration and should be updated in-place as implementation reveals better approaches, new constraints, or scope changes. + +- When deviating from the roadmap during implementation, **update the roadmap first** before proceeding. It should always describe what we're actually building, not what we originally thought we'd build. +- Mark revisions inline with a brief **`Revised:`** annotation so readers can tell what changed and why (e.g. *"Revised: switched from X to Y because Z"*). Don't silently overwrite — the revision trail is useful context. +- Implementation is driven by **phase-specific `.plan.md` files** created from this roadmap when starting each phase. The roadmap defines what to build; phase plans define how to execute each chunk. +- Research documents linked in the Deep Research Index are **not revised** — they capture point-in-time analysis. If a research finding turns out to be wrong, note it in the roadmap rather than editing the research doc. + ## Summary -Rewrite banksy from a 3-process TypeScript/xmcp architecture to a Python/FastMCP server. `BANKSY_MODE` (internal/public/dev) selects the auth provider and tool set at runtime — one Docker image, multiple deployments. Two `FastMCP.from_openapi()` calls replace both `banksy-mural-api` (internal API, 39 tools) and `banksy-public-api` (Public API, 87 tools) code-gen pipelines. 
Auth uses FastMCP's built-in OAuth with Google as the initial IdP for Layer 1 (IDE to banksy) plus custom Python for Layer 2 (banksy to Mural API) token management. Database is a fresh PostgreSQL schema (no data migration). A React SPA is preserved for browser-facing pages (home, Session Activation, error) and served from the same process via Starlette's `StaticFiles`. +Rewrite banksy from a 3-process TypeScript/xmcp architecture to a Python/FastMCP server. `AUTH_MODE` (sso-proxy/mural-oauth/dev) selects the auth provider and tool set at runtime — one Docker image, multiple deployments. Two `FastMCP.from_openapi()` calls replace both `banksy-mural-api` (internal API, 39 tools) and `banksy-public-api` (Public API, 87 tools) code-gen pipelines. Auth uses FastMCP's built-in OAuth with Google as the initial IdP for Layer 1 (IDE to banksy) plus custom Python for Layer 2 (banksy to Mural API) token management. Database is a fresh PostgreSQL schema (no data migration). A React SPA is preserved for browser-facing pages (home, Session Activation, error) and served from the same process via Starlette's `StaticFiles`. The repo uses a uv workspace structure under `pypackages/` — only `banksy-server` is created now. The workspace is ready to expand with `banksy-shared` (extracted shared code) and `banksy-harness` (agent orchestration) when those consumers are needed. Existing TS code in `packages/` stays as read-only reference until the final cleanup removes all TypeScript artifacts. 
@@ -19,9 +27,9 @@ graph TD Core -.->|"code-gen at build time"| PublicAPI end - subgraph after ["Target (FastMCP, 1 image, BANKSY_MODE per deploy)"] - ClientInt["LLM Client"] -->|"MCP HTTP"| InternalDeploy["banksy BANKSY_MODE=internal"] - ClientPub["LLM Client"] -->|"MCP HTTP"| PublicDeploy["banksy BANKSY_MODE=public"] + subgraph after ["Target (FastMCP, 1 image, AUTH_MODE per deploy)"] + ClientInt["LLM Client"] -->|"MCP HTTP"| InternalDeploy["banksy AUTH_MODE=sso-proxy"] + ClientPub["LLM Client"] -->|"MCP HTTP"| PublicDeploy["banksy AUTH_MODE=mural-oauth"] InternalDeploy -->|REST| MURAL2I["Mural API (internal)"] PublicDeploy -->|REST| MURAL2P["Mural API (public)"] Browser["Browser"] -->|"SPA + auth routes"| PublicDeploy @@ -53,7 +61,7 @@ graph LR | Phase | What It Delivers | Depends On | Parallelism | |-------|-----------------|------------|-------------| -| 1 Bootstrap | uv workspace skeleton (root + `banksy-server` under `pypackages/`), echo tool, health endpoint, `BANKSY_MODE` config, CI | Nothing | -- | +| 1 Bootstrap | uv workspace skeleton (root + `banksy-server` under `pypackages/`), echo tool, health endpoint, `AUTH_MODE` config, CI | Nothing | -- | | 2 OpenAPI Tools | `from_openapi()` integration, Mural API tools | 1 | Parallel with 4, 5 | | 3 Tool Curation | LLM-friendly names, descriptions, transforms, composites | 2 | -- | | 4 Database | PostgreSQL schema, Alembic migrations, token storage | 1 | Parallel with 2, 5 | @@ -76,12 +84,12 @@ Two hard constraints shape the FastMCP migration: ### Deployment Mode Selection (Option E) -Build one Docker image. At runtime, `BANKSY_MODE` selects the auth provider and tool set. Within each mode, tags provide finer-grained client-side filtering. +Build one Docker image. At runtime, `AUTH_MODE` selects the auth provider and tool set. Within each mode, tags provide finer-grained client-side filtering. 
``` -BANKSY_MODE=internal -> FastMCP(auth=SSOProxyAuth) + internal tools + internal tags -BANKSY_MODE=public -> FastMCP(auth=MuralOAuthAuth) + public tools + public tags -BANKSY_MODE=dev -> FastMCP(auth=None) + all tools + all tags +AUTH_MODE=sso-proxy -> FastMCP(auth=SSOProxyAuth) + internal tools + internal tags +AUTH_MODE=mural-oauth -> FastMCP(auth=MuralOAuthAuth) + public tools + public tags +AUTH_MODE=dev -> FastMCP(auth=None) + all tools + all tags ``` ### Startup Flow @@ -96,10 +104,10 @@ def create_server() -> FastMCP: register_common_routes(mcp) # /health, /version match settings.banksy_mode: - case "internal": + case "sso-proxy": register_internal_tools(mcp) register_session_activation_routes(mcp) - case "public": + case "mural-oauth": register_public_tools(mcp) register_mural_oauth_routes(mcp) case "dev": @@ -116,9 +124,9 @@ def create_server() -> FastMCP: ### Auth Provider per Mode -**Internal mode (`sso-proxy`):** Layer 1 uses `OAuthProxy` or `RemoteAuthProvider` with Google IdP via SSO proxy. Layer 2 stores session JWTs in `mural_tokens`. Tools call `banksy-mural-api` (internal REST) with session JWTs. +**sso-proxy mode:** Layer 1 uses `OAuthProxy` or `RemoteAuthProvider` with Google IdP via SSO proxy. Layer 2 stores session JWTs in `mural_tokens`. Tools call `banksy-mural-api` (internal REST) with session JWTs. -**Public mode (`mural-oauth`):** Layer 1 uses `OAuthProxy` wrapping Mural's OAuth authorization server. Layer 2 stores Mural OAuth access/refresh tokens in `mural_tokens`. Tools call mural-api's public API with OAuth access tokens. +**mural-oauth mode:** Layer 1 uses `OAuthProxy` wrapping Mural's OAuth authorization server. Layer 2 stores Mural OAuth access/refresh tokens in `mural_tokens`. Tools call mural-api's public API with OAuth access tokens. **Dev mode:** Layer 1 has no auth (`auth=None` or `StaticTokenVerifier`). Layer 2 tokens loaded from dev seed data. Both tool sets registered; backend URLs configurable. 
@@ -163,8 +171,8 @@ banksy/ │ ├── src/ │ │ └── banksy_server/ │ │ ├── __init__.py -│ │ ├── server.py # Entry point: reads BANKSY_MODE, wires auth + domains -│ │ ├── config.py # pydantic-settings with BANKSY_MODE, DB URLs, auth +│ │ ├── server.py # Entry point: reads AUTH_MODE, wires auth + domains +│ │ ├── config.py # pydantic-settings with AUTH_MODE, DB URLs, auth │ │ ├── mural_api.py # FastMCP.from_openapi() integration │ │ ├── spa.py # SpaStaticFiles class │ │ ├── auth/ # providers.py, sso_proxy.py, mural_oauth.py, token_manager.py @@ -339,12 +347,12 @@ Two separate `from_openapi()` sub-servers, one per API spec: - Filter to the operation IDs currently exposed by `banksy-public-api` - Uses standard OAuth tokens for all operations -Both use `RouteMap`: GET → RESOURCE, POST/PUT/DELETE → TOOL, deprecated/internal → EXCLUDE. Each mounts onto the server within its respective `BANKSY_MODE` — `mount()` organizes tools by namespace within a single mode, not across auth modes (see Server Topology). +Both use `RouteMap`: GET → RESOURCE, POST/PUT/DELETE → TOOL, deprecated/internal → EXCLUDE. Each mounts onto the server within its respective `AUTH_MODE` — `mount()` organizes tools by namespace within a single mode, not across auth modes (see Server Topology). -**Phasing**: Start with the Public API spec in Phase 2 (when `BANKSY_MODE=public` or `dev`). Add the internal API spec as a follow-on (when `BANKSY_MODE=internal` or `dev`). The plumbing is identical — `from_openapi()` is called with different specs and different httpx clients (different base URLs, different auth injection per mode). +**Phasing**: Start with the Public API spec in Phase 2 (when `AUTH_MODE=mural-oauth` or `dev`). Add the internal API spec as a follow-on (when `AUTH_MODE=sso-proxy` or `dev`). The plumbing is identical — `from_openapi()` is called with different specs and different httpx clients (different base URLs, different auth injection per mode). 
```python -# In BANKSY_MODE=public (or dev) +# In AUTH_MODE=mural-oauth (or dev) public_api = FastMCP.from_openapi( openapi_spec=public_spec, client=public_http_client, @@ -353,7 +361,7 @@ public_api = FastMCP.from_openapi( ) mcp.mount(public_api, namespace="mural") -# In BANKSY_MODE=internal (or dev) +# In AUTH_MODE=sso-proxy (or dev) internal_api = FastMCP.from_openapi( openapi_spec=internal_spec, client=internal_http_client, @@ -427,9 +435,9 @@ mcp.enable(tags={"murals"}, only=True) # Mural-focused deployment ### Deployment Modes (Resolved) -Mode merging is not recommended. `BANKSY_MODE` is preserved as a runtime configuration flag. Auth modes are capability constraints — internal and public tools call different APIs with incompatible token types. FastMCP's one-auth-per-server constraint means a single server cannot cleanly handle multiple auth strategies. MCP clients support multiple servers, so separate deployments per auth mode is transparent to users. +Mode merging is not recommended. `AUTH_MODE` is preserved as a runtime configuration flag. Auth modes are capability constraints — internal and public tools call different APIs with incompatible token types. FastMCP's one-auth-per-server constraint means a single server cannot cleanly handle multiple auth strategies. MCP clients support multiple servers, so separate deployments per auth mode is transparent to users. -The current two TS Dockerfiles (`Dockerfile` for sso-proxy, `Dockerfile.mural-oauth` for mural-oauth) are replaced by a single `Dockerfile.server` — the mode is runtime config (`BANKSY_MODE` env var), not build-time. See Server Topology for the full design. +The current two TS Dockerfiles (`Dockerfile` for sso-proxy, `Dockerfile.mural-oauth` for mural-oauth) are replaced by a single `Dockerfile.server` — the mode is runtime config (`AUTH_MODE` env var), not build-time. See Server Topology for the full design. 
--- @@ -612,7 +620,7 @@ This `get_authenticated_user` helper belongs in the auth module and is reused ac | (new) `IDP_ISSUER` | Expected JWT issuer | | (new) `IDP_AUDIENCE` | Expected JWT audience | | (new) `IDP_AUTHORIZATION_SERVER` | IdP URL for PRM metadata | -| (new) `BANKSY_MODE` | `internal`, `public`, or `dev` — selects auth provider and tool set (see Server Topology) | +| (new) `AUTH_MODE` | `sso-proxy`, `mural-oauth`, or `dev` — selects auth provider and tool set (see Server Topology) | | (new) `ENABLED_TAGS` | Optional comma-separated tag filter for specialized deployments (e.g., `read`) | **Key TS reference**: @@ -765,7 +773,7 @@ pypackages/server/src/banksy_server/domains/ └── tools.py ``` -Each domain's `register_*_tools(mcp)` function takes a `FastMCP` instance and registers all tools for that domain, including tags and metadata. The domain owns its tool definitions, schemas, and any domain-specific helpers. `server.py` calls the appropriate registration functions based on `BANKSY_MODE` (see Server Topology). +Each domain's `register_*_tools(mcp)` function takes a `FastMCP` instance and registers all tools for that domain, including tags and metadata. The domain owns its tool definitions, schemas, and any domain-specific helpers. `server.py` calls the appropriate registration functions based on `AUTH_MODE` (see Server Topology). ### from_openapi() in Domain Context @@ -789,7 +797,7 @@ def register_public_tools(mcp: FastMCP) -> None: ### Routes by Concern -Non-MCP HTTP routes (`routes/`) are organized by concern, not by mode. Mode-specific routes are registered conditionally in `server.py` based on `BANKSY_MODE` — for example, Session Activation routes are only registered in `internal` and `dev` modes. +Non-MCP HTTP routes (`routes/`) are organized by concern, not by mode. Mode-specific routes are registered conditionally in `server.py` based on `AUTH_MODE` — for example, Session Activation routes are only registered in `sso-proxy` and `dev` modes. 
### Canvas-MCP Absorption @@ -908,7 +916,7 @@ pypackages/server/tests/ ├── test_token_refresh.py # Token refresh logic ├── test_auth_flow.py # OAuth flow (HeadlessOAuth) ├── test_session_activation.py # Session Activation routes -├── test_mode_selection.py # BANKSY_MODE startup paths +├── test_mode_selection.py # AUTH_MODE startup paths └── test_integration/ # End-to-end tests ``` @@ -926,7 +934,7 @@ The TS codebase has ~15 Vitest test files. These should be reviewed as reference ### Dockerfile.server -Workspace-aware multi-stage build using uv. One Docker image serves all modes — `BANKSY_MODE` is a runtime env var, not build-time. This replaces the two current TS Dockerfiles (`Dockerfile` for sso-proxy, `Dockerfile.mural-oauth` for mural-oauth). +Workspace-aware multi-stage build using uv. One Docker image serves all modes — `AUTH_MODE` is a runtime env var, not build-time. This replaces the two current TS Dockerfiles (`Dockerfile` for sso-proxy, `Dockerfile.mural-oauth` for mural-oauth). ```dockerfile # Stage 1: Build SPA @@ -1124,7 +1132,7 @@ banksy-shared = { workspace = true } - **Single workspace member → multi-member**: When a second Python service is needed (e.g., agent harness), add a directory under `pypackages/` with its own `pyproject.toml`. Extract shared code into `banksy-shared` at that time. The workspace glob auto-discovers new members. - **pre-commit → CI only**: If hooks cause friction during rapid iteration and the team is 1–2 developers, rely on CI alone. - **custom_route() → raw Starlette routing**: If HTTP routes grow complex, use `starlette.routing.Router` for grouping, `Mount` for sub-apps, or Starlette middleware wrappers for per-route concerns. FastAPI is a last resort — Starlette is already underneath FastMCP. 
-- **`BANKSY_MODE` per-deployment → mode merging**: If a future need requires multi-auth in a single process, revisit Option B (protocol-level routing) or Option D (middleware-based auth) from the [server topology analysis](../banksy-research/tool-visibility-server-topology-research.md). +- **`AUTH_MODE` per-deployment → mode merging**: If a future need requires multi-auth in a single process, revisit Option B (protocol-level routing) or Option D (middleware-based auth) from the [server topology analysis](../banksy-research/tool-visibility-server-topology-research.md). --- @@ -1144,6 +1152,8 @@ Items from the canvas-mcp alignment assessment and architecture research that ar | 12 | When to extract `banksy-shared` | Trigger: when a second consumer (agent harness) needs shared code (models, auth utils, Mural client) | Open (deferred) | | 13 | When to create `banksy-harness` | Trigger: when agent orchestration work begins | Open (deferred) | +**Future naming consideration:** The research documents proposed renaming `AUTH_MODE` to something like `BANKSY_MODE` with more semantic values (`internal`/`public`/`dev`). That has merit for clarity, but we keep the existing naming (`AUTH_MODE` with `sso-proxy`/`mural-oauth`/`dev`) for migration simplicity. Consider revisiting the rename once the migration stabilizes. 
+ --- ## Deep Research Index diff --git a/fastmcp-migration/security-oauthproxy-tradeoff-review.md b/fastmcp-migration/security-oauthproxy-tradeoff-review.md new file mode 100644 index 0000000..ab8d09c --- /dev/null +++ b/fastmcp-migration/security-oauthproxy-tradeoff-review.md @@ -0,0 +1,73 @@ +# MCP Server Auth Migration: Tradeoff Review for Security Team + +**Author:** Willis Kirkham | **Date:** March 2026 | **Status:** Seeking Security Team Input + +--- + +## Context + +In early 2026, the security team audited our MCP server and recommended that it stop acting as an OAuth Authorization Server (AS) and become a pure Resource Server (RS) that validates externally-issued tokens. We agree with the direction — eliminating AS surface reduces attack vectors (confused deputy, DCR abuse, code interception). + +Before we commit to the implementation path, we want to make sure the full cost picture is visible — particularly around user coverage. Today, the MCP server supports all Mural users regardless of how they authenticate (email/password, Google, Microsoft, SAML SSO, OAuth2 SSO). The pure RS migration puts this at risk: enterprise users with SSO can authenticate through their existing IdP, but users without an external identity provider (primarily email/password) have no path unless we add one. Miro's MCP integration supports all their users today, so a coverage regression carries competitive weight. + +We could move ahead with pure RS and limit access to enterprise SSO users, accepting the regression. Or we could pursue a middle ground. We've identified three paths and we'd like your assessment to help inform this business and engineering decision. + +--- + +## The Three Paths + +### Path 1: Pure RS with External IdP + +The audit recommendation fully implemented — zero authorization endpoints, JWTs validated from an external IdP. 
+ +- **Requires purchasing an external IdP.** Mural's auth infrastructure doesn't currently meet the RS model's requirements (HS256 tokens, no JWKS, no OAuth discovery). A third-party IdP (Auth0, Descope) is needed. Pricing requires PoC work to determine, but ranges from free tiers at low usage to potentially significant costs at scale. +- **User coverage gap.** Without configuring Mural as a custom social connection in the IdP, email/password users lose access entirely (see Context above). This configuration has been validated in concept but still requires a PoC. +- **Infrastructure complexity.** IdP configuration, custom social connection, possibly a token vault for single-step UX, ongoing operational responsibility. + +### Path 2: Pure RS with Mural-API as the Authorization Server + +Mural's API evolves to issue RS256 JWTs, expose JWKS, and serve OAuth discovery. The MCP server becomes a pure RS. + +- **Non-trivial changes to mural-api.** HS256 → RS256 migration, key management, JWKS and discovery endpoints. Touches every token validation path. +- **Not planned or prioritized.** This is a new requirement that needs planning and prioritization by R&D. +- **Does not eliminate the AS risk; it moves it.** The same attack surface would exist in mural-api instead. Arguably better managed there, but not eliminated from the system. + +### Path 3: OAuthProxy (Middle Ground) + +The MCP server proxies OAuth flows to Mural's existing login endpoints via FastMCP's OAuthProxy. It serves Protected Resource Metadata, issues its own RS256 JWTs, and stores Mural tokens server-side. All current login methods continue to work. + +- **Retains AS-like surface.** `/authorize`, `/token`, DCR, and token issuance remain present, so the audit recommendation is only partially satisfied. 
+- **No vendor cost, no cross-team dependency, no user access regression.** + +### Comparison + +| | External IdP | Mural-API as AS | OAuthProxy | +|---|---|---|---| +| AS surface on MCP server | Eliminated | Eliminated | Reduced | +| AS surface in system | Eliminated | Moved to mural-api | Remains | +| Vendor cost | Ongoing (TBD) | None | None | +| Cross-team dependency | None | Mural platform team | None | +| User coverage | SSO only without custom social config | All users | All users | +| Timeline | PoC needed (weeks) | Uncertain (months+) | Near-term | + +--- + +## OAuthProxy: What It Improves and What It Doesn't + +The current system uses Better Auth with server-side sessions and HTTP-only cookies. OAuthProxy improves this in three ways: + +1. **Stateless cryptographic tokens replace sessions.** RS256 JWTs validated by signature, not database lookup. Eliminates cookie theft/replay/fixation, session storage bugs, and the database as a runtime auth dependency. +2. **Identity verification delegated.** Token issuance tied to a successful upstream code exchange with Mural — no independent identity decisions. +3. **PKCE and CIMD support.** PKCE mitigates code interception. CIMD reduces DCR abuse risk via URL-based client identity verification. + +What remains: DCR endpoint still exists (though CIMD reduces its necessity), and `/authorize` and `/token` endpoints are proxied but present. An attacker can still register clients and initiate authorization flows. + +--- + +## The Question + +Given the costs of full RS migration — vendor dependency and user coverage risk (Path 1), or unplanned cross-team work with uncertain timeline (Path 2) — **we'd like your assessment of OAuthProxy as an intermediate posture.** It addresses the most concrete risks from the audit (session management, independent identity decisions, lack of PKCE) while preserving full user coverage and avoiding external dependencies. 
+ +Does this provide enough of a security improvement to proceed with while we evaluate the long-term IdP path? If so, are there specific residual risks you'd want us to mitigate, or additional hardening measures we should apply to the OAuthProxy surface? + +**Note on MCP spec compliance:** OAuthProxy satisfies the MCP spec (Protected Resource Metadata, audience-bound tokens, no token passthrough). The spec allows the AS to be colocated with the RS. The full RS migration is driven by the audit, not by spec compliance. The spec timeline is the more urgent of the two (AI tools are dropping legacy auth support), while the audit's AS-elimination recommendation can be phased.
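
Reviewer sketch (not part of the patch): the two spec properties named in the note, Protected Resource Metadata and audience-bound tokens, can be illustrated in a few lines. This is a minimal sketch under stated assumptions, not how FastMCP's OAuthProxy implements them. The URLs and both function names are hypothetical, and the claims are assumed to come from a JWT whose RS256 signature a JWT library has already verified.

```python
from __future__ import annotations

import time
from typing import Any

# Hypothetical values for illustration only; not taken from the review.
RESOURCE_URL = "https://mcp.example.com/mcp"
ISSUER_URL = "https://mcp.example.com"


def protected_resource_metadata(resource: str, issuer: str) -> dict[str, Any]:
    """Build an RFC 9728 Protected Resource Metadata document.

    A server would typically serve this JSON at
    /.well-known/oauth-protected-resource.
    """
    return {
        "resource": resource,
        "authorization_servers": [issuer],
        "bearer_methods_supported": ["header"],
    }


def is_audience_bound(
    claims: dict[str, Any],
    resource: str,
    issuer: str,
    now: float | None = None,
) -> bool:
    """Check iss, aud, and exp on claims whose signature was verified elsewhere."""
    now = time.time() if now is None else now
    aud = claims.get("aud")
    # "aud" may be a single string or a list of strings.
    audiences = [aud] if isinstance(aud, str) else list(aud or [])
    exp = claims.get("exp")
    return (
        claims.get("iss") == issuer
        and resource in audiences
        and isinstance(exp, (int, float))
        and exp > now
    )
```

A token minted for a different resource fails the `aud` check even if its signature is valid, which is the property that rules out token passthrough to other services.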