docs(spec): external MCP client (RFC) #99

Open
sdc53 wants to merge 1 commit into jackmusick:main from sdc53:spec/external-mcp-client


sdc53 (Contributor) commented Apr 26, 2026

Design doc for adding MCP-client capability to Bifrost — the symmetric counterpart to the existing mcp_server/ package.

Filing as docs-only so the conversation has a stable URL to anchor on. Companion feature issue will be filed referencing this doc.

TL;DR: Reuse integrations + oauth_providers + configs, add a kind discriminator and one new external_mcp_tools cache table. resolve_agent_tools() gains a fourth source. Two-layer access control: Bifrost grants AND remote scopes.
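The two-layer access control in the TL;DR can be sketched roughly as below. This is only an illustration, not the spec's implementation: `ExternalTool`, `required_scopes`, `bifrost_grants`, and `is_tool_allowed` are hypothetical names, and the real `external_mcp_tools` schema may differ.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ExternalTool:
    """One row of a hypothetical external_mcp_tools cache (assumed shape)."""
    integration_id: str
    name: str
    required_scopes: frozenset = field(default_factory=frozenset)


def is_tool_allowed(tool, bifrost_grants, remote_scopes):
    """Both layers must pass: a Bifrost-side grant AND the remote scopes."""
    # Layer 1: Bifrost has granted this agent access to the tool.
    granted = (tool.integration_id, tool.name) in bifrost_grants
    # Layer 2: the remote token carries every scope the tool needs.
    scoped = tool.required_scopes <= set(remote_scopes)
    return granted and scoped
```

A tool that is granted in Bifrost but lacks a remote scope is rejected, and so is a tool with the right remote scopes but no Bifrost grant.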

First question in the spec is whether this is already on your roadmap — answer that first if so and I'll align rather than duplicate.

If green-lit I'd build the whole thing as a single feature PR (open question if you'd prefer it split, noted at the end of the spec).

Design doc for adding MCP-client capability to Bifrost. Filing as
docs-only so the discussion has a stable URL. Implementation follows
once direction is confirmed by upstream maintainer.
sdc53 requested a review from jackmusick as a code owner on April 26, 2026, 03:48
sdc53 (Contributor, Author) commented Apr 26, 2026

Companion feature request: #100

jackmusick (Owner) commented Apr 26, 2026

Edit (cleaning up my stream-of-consciousness reply): Trying to land the actual architectural concern more clearly.

This was on my roadmap and I like the data-model approach. Before I green-light, I want to surface a risk that I think is the real blocker, and ask for your take on it.

What I think the actual value is

Reading this back, I don't think the value of this RFC is "MCP-the-protocol" — it's catalog reuse. Mirroring 77 HaloPSA tools as 77 workflows doesn't scale, and an MCP-aware client lets us inherit a vendor's catalog without that toil. That part I want.

The risk I keep landing on

Bifrost integrations are system-first: org-scoped or global, headless agents, scheduled runs, no UI primitive for "your token expired, please re-consent." That model assumes credentials are configured once and refresh in the background.

Most useful vendor MCP servers (Slack, GitHub, M365 Admin, Atlassian, Notion) are user-first: per-user OAuth, interactive consent, the protocol leans into elicitation/sampling/etc.

Looking at how Copilot Studio, M365 Copilot Chat, Claude.ai, Cursor, and VS Code handle this — there's a remarkably consistent pattern: admin enables the MCP server at the org level, then each user OAuths individually. None of them have a "this tool is interactive-only" flag; tools are uniformly callable and the runtime just 401s if the wrong auth is attached. Headless only works with service-principal/managed-identity credentials.

So the architectural question I keep running into: if we build the service-account-only version of MCP-as-client now, how do we extend it to per-user later without a refactor? Specifically:

  1. Single point of configuration. We'd want admins to configure external MCP servers once (URL, scopes, allowed tools), the same way they configure integrations today. How do you envision a user-auth tier overlaying that config without a separate registration path?

  2. Gating user-MCP from system contexts. If a tool requires user auth, what happens when a scheduled agent or webhook-triggered agent tries to use it? Is the right answer "tools have an auth-context tag and the dispatcher refuses to bind them in non-interactive flows" — or do we just expect operators to know the rules and accept runtime 401s?

  3. Or: a third scoping layer. Today integrations scope is global → org. A future where it becomes global → org → user would let a user-scoped MCP tool override an org-scoped one in chat contexts only. That feels right philosophically (matches the two-stage gate pattern in Copilot/Claude/Cursor) but it's a meaningful refactor of the integration model. I'd genuinely value your thoughts on this — does that scoping model make sense as a forward path, and what would the disposition rules be?
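For concreteness, one possible disposition rule for a global → org → user model is sketched below. It assumes a user-scoped binding is only eligible in interactive (chat) contexts. `Binding`, `resolve_binding`, and the `interactive` flag are hypothetical names, not part of the existing integration model.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Binding:
    scope: str                  # "global", "org", or "user"
    owner_id: Optional[str]     # None for global; org id or user id otherwise
    credential_ref: str


def resolve_binding(bindings, org_id, user_id=None, interactive=False):
    """Pick the most specific binding: user (chat only) > org > global."""
    by_key = {(b.scope, b.owner_id): b for b in bindings}
    candidates = []
    # A user-scoped binding may override the org one only in chat contexts.
    if interactive and user_id is not None:
        candidates.append(("user", user_id))
    candidates += [("org", org_id), ("global", None)]
    for key in candidates:
        if key in by_key:
            return by_key[key]
    return None
```

Under this rule a scheduled or webhook run never sees the user binding, so it falls back to the org or global credential, or to nothing at all.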

What I'm asking

If you can sketch even a rough answer to (1) and (2) — or push back on (3) — I'd green-light. The thing I want to avoid is building a system-first MCP client now that we'd architecturally regret when (not if) the per-user MCP requirement shows up. If we agree on the auth-tier model upfront, the implementation can ship in stages.

Genuinely want to collaborate — the design itself is solid, I just want to make sure we build it on a foundation that survives the user-auth requirement we're definitely going to face.

sdc53 (Contributor, Author) commented Apr 26, 2026

Thanks — these are genuinely the right questions and I had to back up and rethink. You're right that lumping MCP into "always service-account" hides a real use case I have on the roadmap.

Real forcing example: SpireTech's Tech Support agent needs to combine HaloPSA results (service-account — halopsa-mcp's job) with SharePoint Copilot Graph search results, where Microsoft Graph filters results by the calling user's permissions. Using a service principal there is either useless (no SharePoint membership) or scary (Application permissions that bypass per-document ACLs). The whole point of Copilot search is the personalized result ranking — it needs the calling user's identity. So per-user delegation isn't a v2 nice-to-have; it's required for the integration to be correct.

Proposed: admit both auth modes as configuration on the integration, not as separate features:

  • auth_mode = 'client_credentials' — admin sets up one credential per integration; every agent run uses it. Works in chat / scheduled / webhook contexts equally. (halopsa-mcp case.)
  • auth_mode = 'authorization_code' — admin sets up the OAuth app once; each user does a one-time consent at /integrations/{id}/connect; per-user refresh tokens stored in a new user_external_mcp_credentials table. (SharePoint case.)
  • Optional, opt-in per integration: service_user_id — a designated Bifrost user whose tokens are used as the fallback when the integration's auth_mode=authorization_code but the agent run has no caller (schedule/webhook). Disabled by default to avoid the "use the admin's super-token" foot-gun. Useful when you want an automated agent to query M365 as a dedicated service identity that has its own license + scoped permissions, rather than re-using a real user's token.
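A rough sketch of how the three options above could combine when picking a credential for one agent run. The `Integration` dataclass and `select_credential` are hypothetical; only `auth_mode` and `service_user_id` come from the proposal itself.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Integration:
    id: str
    auth_mode: str                         # 'client_credentials' or 'authorization_code'
    service_user_id: Optional[str] = None  # opt-in fallback, disabled by default


def select_credential(integration, caller_user_id=None):
    """Return (kind, owner_id) for the token to use, or None if none applies."""
    if integration.auth_mode == "client_credentials":
        # One admin-configured credential, usable in any context.
        return ("integration", integration.id)
    if integration.auth_mode == "authorization_code":
        # Prefer the caller; use the service user only when there is no caller.
        user_id = caller_user_id or integration.service_user_id
        return ("user", user_id) if user_id else None
    raise ValueError(f"unknown auth_mode: {integration.auth_mode!r}")
```

With `service_user_id` left unset, a schedule or webhook run against an `authorization_code` integration gets `None`, which is the signal to hide or refuse the tool instead of borrowing someone's token.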

The four concerns become:

1 & 2 (admin-vs-user, scheduled agents): solved at the integration level. Service-mode tools always work. Per-user-mode tools require user context: resolve_agent_tools() filters them out when the agent run has no caller, unless the integration has a service_user_id configured. This avoids the anti-pattern of advertising tools that then fail with a runtime 401; the agent's tool list reflects the auth context that is actually available.
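The availability filter might look something like the sketch below. `ToolSource` and `filter_external_tools` are hypothetical helpers that resolve_agent_tools() could call; the actual signature of resolve_agent_tools() is not shown in this thread.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ToolSource:
    tool_name: str
    auth_mode: str                         # 'client_credentials' or 'authorization_code'
    service_user_id: Optional[str] = None


def filter_external_tools(tools, caller_user_id=None):
    """Keep only tools whose auth context is actually available for this run."""
    available = []
    for tool in tools:
        if tool.auth_mode == "client_credentials":
            # Service-mode tools work in chat, scheduled, and webhook runs.
            available.append(tool)
        elif caller_user_id is not None or tool.service_user_id is not None:
            # Per-user tools need a caller or an explicit service-user fallback.
            available.append(tool)
    return available
```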

3 (mid-chat re-auth): solvable when Bifrost owns the chat UI. Dispatch returns a structured needs_reauth error with a reauth_url; the chat UI renders an inline reconnect button. Microsoft's refresh-token flow handles the silent path for non-expired refresh tokens. You're right this does NOT solve the case where a third-party chat client uses Bifrost as an MCP server and Bifrost is delegating to another external server — the third-party chat has no language for "this tool's downstream needs reauth." That genuinely is a v2 problem (or maybe never), but it's a separate deferral from the per-user delegation itself: per-user works fine for Bifrost-owned chat surfaces and for Bifrost agents driven from any context with a known user identity.
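As an illustration of the structured error, a dispatch wrapper could return something like the following. The field names other than `needs_reauth` and `reauth_url` are assumptions, as are the `token_valid` and `call_tool` parameters. Pointing `reauth_url` at the `/integrations/{id}/connect` consent path is also an assumption.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class NeedsReauth:
    integration_id: str
    reauth_url: str
    error: str = "needs_reauth"


def dispatch_result(integration_id, token_valid, call_tool):
    """Call the tool, or return a structured needs_reauth payload for the chat UI."""
    if not token_valid:
        # Assumed: the reconnect link reuses the one-time consent route.
        url = f"/integrations/{integration_id}/connect"
        return asdict(NeedsReauth(integration_id, url))
    return {"error": None, "result": call_tool()}
```

A Bifrost-owned chat UI could check for `error == "needs_reauth"` and render the inline reconnect button from `reauth_url`.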

4 (philosophical): the SharePoint case is concrete proof that there's value beyond chat. A scheduled-or-chat-driven agent that needs to query M365 (or Salesforce, or any other vendor that ships an MCP server with delegated auth) gets to do it without us reimplementing search-with-permissions ourselves. That's not a "chat-client niche."

I'd plan to update the spec PR with this — the auth_mode column, the user_external_mcp_credentials table, the optional service_user_id, the executor's user-context plumbing, and the tool-availability filter for non-chat agent runs. The non-trivial bit is making agent_executor pass current_user_id through to dispatch — that touches the existing tool-dispatch contract, so worth flagging as a sensitive-paths change up front.

Per-user delegation needs to be in v1 in some form (full per-user, service-user fallback, or both — happy to follow your preference on which). Deferring it to v2 would mean the SharePoint integration class can't ship at all, and from what I'm seeing across the vendor MCP ecosystem (M365, Atlassian, Salesforce — all delegated by default) that's a significant chunk of the value.

Open to:

  • Both per-user AND service-user (most flexible; users get personalized results in chat, schedules still work)
  • Per-user only (cleanest; scheduled agents simply can't use SharePoint-class tools)
  • Service-user only (simpler implementation; loses the personalized-results benefit in chat)

Do you have a strong preference in any direction?
