From 1659d3dc46532633416eeb48b2836440a3bc21c2 Mon Sep 17 00:00:00 2001
From: Juan Antonio Osorio <ozz@stacklok.com>
Date: Mon, 15 Sep 2025 13:32:34 +0300
Subject: [PATCH] docs: add Runtime Authoring Guide; document
 MCP_TRANSPORT/MCP_PORT contract

Signed-off-by: Juan Antonio Osorio <ozz@stacklok.com>
---
 docs/runtime-implementation-guide.md | 451 +++++++++++++++++++++++++++
 1 file changed, 451 insertions(+)
 create mode 100644 docs/runtime-implementation-guide.md

diff --git a/docs/runtime-implementation-guide.md b/docs/runtime-implementation-guide.md
new file mode 100644
index 000000000..72a6ce709
--- /dev/null
+++ b/docs/runtime-implementation-guide.md
@@ -0,0 +1,451 @@
+# ToolHive Runtime Authoring Guide
+
+This guide defines a stable, implementation-agnostic contract for adding new ToolHive runtimes.
+
+Contents
+- Scope and glossary
+- Runtime contract (capabilities and API shape)
+- Workload lifecycle (deploy, list, info, logs, stop, remove, attach)
+- Transports and port exposure
+- Network isolation reference design
+- Permissions and security mapping
+- Secrets handling
+- Labeling and discoverability
+- Idempotency and reconciliation
+- Error handling, logging, and monitoring
+- Observability and telemetry
+- Testing and conformance
+- Security posture hardening
+- Performance and scalability
+- Compatibility and portability
+- Implementation checklist
+- Acceptance criteria
+- Appendix: MCP_TRANSPORT and MCP_PORT contract
+
+## 1. Scope and glossary
+
+- Runtime: A backend that materializes an MCP server as a managed “workload” on a given platform (e.g., Docker, Kubernetes, future platforms).
+- Workload: The process/container/pod that runs the MCP server.
+- Auxiliary components: Supporting processes/containers (DNS, egress proxy, ingress proxy) created to implement network isolation and ingress exposure.
+- Transport: How ToolHive proxies communicate with the MCP server:
+  - stdio (no network exposure)
+  - SSE
+  - Streamable HTTP
+- Permission profile: A JSON-level description of allowed file-system access, process privileges, and network policy for a workload. The CLI resolves profiles and passes an effective configuration to the runtime.
+- Isolation: When enabled, ToolHive enforces outbound network ACLs via an egress proxy, restricts DNS via a DNS service, and, for non-stdio transports, exposes ingress only through a controlled proxy.
+
+## 2. Runtime contract
+
+A runtime must implement the following capabilities with consistent semantics:
+
+- Deploy workload
+  - Inputs: See `RunConfig` struct in `pkg/runner/config.go` for the complete set of parameters including image reference, workload name, command/args, environment variables, labels, permission profile, transport type, deploy options, and network isolation flag.
+  - Output: an integer host port when the transport requires ingress exposure; otherwise 0 (e.g., stdio).
+  - Constraints:
+    - **Note on current implementation**: As of this writing, `thv run` returns an error if a workload with the same name already exists. The desired behavior described below represents the target state for runtime implementations.
+    - Idempotent (target behavior): If the same workload (by name) already exists with the same effective configuration, reuse it and start if stopped.
+    - Reconcile differences: If configuration diverges, replace the workload accordingly.
+- List workloads
+  - Return a list of managed workloads, excluding auxiliary components used for isolation.
+  - Include human-readable status string, normalized WorkloadStatus enum, labels, created time, and port mappings.
+- Get workload info
+  - Return a detailed view for a single workload, including normalized state, labels, created time, and port mappings.
+- Get workload logs
+  - Return combined stdout/stderr, optionally following.
+- Stop workload
+  - Idempotent: Success if already stopped or missing.
+  - If isolated, attempt to stop auxiliary components (best-effort).
+- Remove workload
+  - Idempotent: Success if already removed.
+  - Remove auxiliary components and internal networks for isolated workloads (best-effort).
+- Attach (optional, platform-dependent)
+  - Provide an interactive stdio attach for platforms that support it (e.g., Kubernetes exec/attach semantics).
+
+Data model expectations (conceptual, not code):
+- ContainerInfo:
+  - name: unique workload name
+  - image: original image string
+  - status: human-readable (e.g., “Up 1m”, “Pending”)
+  - state: normalized enum (Running, Starting, Stopped, Removing, Unknown)
+  - created: timestamp
+  - labels: map[string]string
+  - ports: list of {containerPort, hostPort, protocol}
+- DeployWorkloadOptions (conceptual):
+  - attachStdio: bool (attach stdin/stdout/stderr; typically true for stdio transport, false for HTTP-based transports)
+  - exposedPorts: map of “port/proto” -> empty struct (e.g., “8080/tcp”)
+  - portBindings: map of “port/proto” -> list of {hostIP, hostPort}
+  - platform-specific extension fields (e.g., Kubernetes pod template patch) must be optional and ignored by other runtimes.
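+
+The conceptual data model above could be written down as Go types roughly as follows. This is a minimal sketch for orientation only: every type and field name here (`WorkloadState`, `PortMapping`, `PortBinding`, and the exact field spellings) is an assumption made for illustration and may not match the actual ToolHive definitions.
+
+```go
+package main
+
+import (
+	"fmt"
+	"time"
+)
+
+// WorkloadState is the normalized state enum (hypothetical name).
+type WorkloadState string
+
+const (
+	StateRunning  WorkloadState = "Running"
+	StateStarting WorkloadState = "Starting"
+	StateStopped  WorkloadState = "Stopped"
+	StateRemoving WorkloadState = "Removing"
+	StateUnknown  WorkloadState = "Unknown"
+)
+
+// PortMapping is one {containerPort, hostPort, protocol} entry.
+type PortMapping struct {
+	ContainerPort int
+	HostPort      int
+	Protocol      string
+}
+
+// ContainerInfo mirrors the conceptual fields listed in the guide.
+type ContainerInfo struct {
+	Name    string
+	Image   string
+	Status  string        // human-readable, e.g. "Up 1m"
+	State   WorkloadState // normalized
+	Created time.Time
+	Labels  map[string]string
+	Ports   []PortMapping
+}
+
+// PortBinding is one {hostIP, hostPort} entry.
+type PortBinding struct {
+	HostIP   string
+	HostPort string
+}
+
+// DeployWorkloadOptions mirrors the conceptual deploy options.
+type DeployWorkloadOptions struct {
+	AttachStdio  bool
+	ExposedPorts map[string]struct{}      // "port/proto" -> empty struct
+	PortBindings map[string][]PortBinding // "port/proto" -> bindings
+}
+
+func main() {
+	opts := DeployWorkloadOptions{
+		ExposedPorts: map[string]struct{}{"8080/tcp": {}},
+		PortBindings: map[string][]PortBinding{
+			"8080/tcp": {{HostIP: "127.0.0.1", HostPort: "18080"}},
+		},
+	}
+	info := ContainerInfo{Name: "example", State: StateRunning, Created: time.Now()}
+	fmt.Println(info.Name, info.State, len(opts.ExposedPorts))
+}
+```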
+
+## 3. Workload lifecycle
+
+Deploy
+- Resolve and validate the effective permission configuration and deploy options.
+- Ensure the image is available: attempt a pull, and continue if the pull fails but the image is already present locally.
+- If isolateNetwork=false:
+  - Configure filesystem and process security from the permission config.
+  - Configure exposed ports and host port bindings if the transport needs ingress.
+- If isolateNetwork=true:
+  - Build the isolation topology (see Network isolation reference design).
+  - Inject proxy environment variables (HTTP_PROXY, HTTPS_PROXY, NO_PROXY) into the workload.
+  - For non-stdio transports, publish a host port via an ingress proxy and return the assigned port.
+- Apply standard labels (see Labeling and discoverability).
+- If attachStdio=true, enable interactive session wiring where the platform supports it (this does not affect return semantics).
+- Return 0 for stdio transport, or the published host port for SSE/Streamable HTTP.
+
+Info
+- Provide the same normalization guarantees as List but for a single workload.
+- Do not assume the workload is running; report current state.
+
+Logs
+- Provide combined stdout/stderr, with follow semantics if requested.
+- Never include secrets in logs; redact or avoid printing environment variable values.
+
+Stop
+- If the workload is running, request graceful termination with a reasonable timeout.
+- If the workload was deployed with isolation, stop its auxiliary components (best-effort).
+- If not found, success (idempotency).
+
+Remove
+- Remove workload and auxiliary resources; clean up isolation networks when orphaned.
+- If not found, success (idempotency).
+
+## 4. Transports and port exposure
+
+- stdio
+  - No network exposure.
+  - Deploy returns hostPort=0.
+  - Communication runs over stdio via the ToolHive proxy process.
+- SSE and Streamable HTTP
+  - The MCP server exposes an HTTP endpoint.
+  - Non-isolated: publish a host port with a deterministic or random binding (respect input mappings).
+  - Isolated: front with an ingress HTTP proxy that publishes a host port and reverse-proxies to the internal service.
+
+Port binding policy
+- When the caller supplies an explicit host port mapping for a user-facing workload, honor it (except when isolation forces ingress proxy ownership of the host port).
+- For automatic/random port assignment, assign exactly one host port per deployment for the primary exposed service.
+
+## 5. Network isolation reference design
+
+When isolateNetwork=true, instantiate the following topology:
+
+- Networks
+  - “External” network: shared link to host networking.
+  - “Internal” per-workload network: private segment named by workload; accessible only to the workload and auxiliary components.
+- Components
+  - Egress proxy (HTTP/HTTPS)
+    - Enforces outbound ACLs from the permission profile.
+    - Termination point for all outbound HTTP/HTTPS; other protocols are not guaranteed and should be blocked by default.
+    - Inject HTTP(S)_PROXY and NO_PROXY environment variables into the workload.
+  - DNS
+    - Provide controlled name resolution, ensuring outbound destinations match permitted hosts.
+  - Ingress proxy (HTTP)
+    - Only for SSE/Streamable HTTP.
+    - Publishes a host port on the external network and reverse-proxies to the workload on the internal network.
+- Traffic flow
+  - Workload → DNS/Egress proxy → External destinations (HTTP/HTTPS).
+  - External client → Ingress proxy (host port) → Workload service (internal network).
+- Limitations
+  - Isolation is defined for HTTP/HTTPS through the egress proxy and domain-based ACLs.
+  - If a server must use arbitrary TCP protocols, recommend running without isolation; rely on the platform’s default container isolation.
+- Clean-up
+  - Stop/remove auxiliary components when stopping/removing the workload.
+  - Remove per-workload internal networks when not referenced by other live components.
+
+## 6. Permissions and security mapping
+
+A runtime must map effective permission configuration into platform-native primitives:
+
+- Filesystem
+  - Mounts:
+    - Bind host paths into the workload with read-only/read-write per profile.
+    - Fail fast if requested mounts cannot be honored.
+- Process privileges
+  - Capabilities:
+    - Drop all by default; selectively add minimal required capabilities.
+  - Privileged:
+    - Strongly discouraged; allow only when explicitly requested by the profile.
+  - Security options:
+    - Apply platform-appropriate confinement (e.g., seccomp/AppArmor; read-only root filesystem when possible).
+  - User:
+    - Run as non-root by default; enable configurable user/group when supported.
+- Network mode (non-isolated runs)
+  - Respect configured network mode as supported by the platform (e.g., bridge/none/host semantics).
+- Restart policy
+  - Use a safe, non-aggressive default (e.g., on-failure, or unless-stopped for long-lived proxies), with platform-specific tuning.
+
+Platform guidance examples
+- Kubernetes-style platforms
+  - Prefer pod/container security contexts that enforce:
+    - Non-root execution
+    - No privilege escalation
+    - Read-only root filesystem (unless explicitly required)
+    - Capability drops (“ALL” by default)
+  - For OpenShift-like environments:
+    - Allow platform to assign UID/GID/FSGroup when required by security constraints.
+    - Set the seccomp profile to `RuntimeDefault` (`runtime/default` in the legacy annotation form) where appropriate.
+
+## 7. Secrets handling
+
+- Secrets are injected as environment variables at deploy time by the CLI and passed through verbatim by the runtime.
+- Do not log secret values. Avoid printing full environment vectors.
+- When isolation is enabled (isolateNetwork=true), overlay proxy-related environment variables:
+  - HTTP_PROXY, HTTPS_PROXY, http_proxy, https_proxy (pointing to the egress proxy)
+  - NO_PROXY, no_proxy (including loopback addresses and internal network ranges)
+  - Preserve pre-existing keys by overriding only the proxy variables and leaving other keys unchanged.
+- Runtimes must treat secrets as opaque; they are not stored by the runtime.
+
+## 8. Labeling and discoverability
+
+Apply consistent labels to all resources:
+- toolhive=true on all primary workloads.
+- Name labels:
+  - Use the workload name (and “app” on orchestrators that prefer it).
+- Tool type:
+  - Label the main MCP server workload to distinguish it from auxiliary components.
+- Auxiliary flag:
+  - Mark isolation components (ingress/egress/DNS) as auxiliary so they can be excluded from List.
+- Isolation flag:
+  - Mark primary workloads that were deployed with isolation; lifecycle operations should use this to decide whether auxiliary clean-up is required.
+
+List/Info behavior:
+- Exclude auxiliary components.
+- Surface labels to help operators and other ToolHive components reason about inventory.
+
+## 9. Idempotency and reconciliation
+
+Deploy must:
+- Determine if a workload with the requested name already exists.
+- Compare effective configuration (image, command, env, labels, mount set, privilege set, security options, exposed ports/bindings, and, when isolated, presence of proxy/DNS wiring).
+- If equal: start if stopped and return success.
+- If different: replace the workload; ensure minimal downtime and consistent labels.
+
+Stop/Remove must:
+- Treat missing workloads as success.
+- For isolated workloads, stop/remove auxiliary components and remove unused per-workload internal networks.
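+
+The Deploy decision described above can be sketched as a small pure function. This is an illustrative sketch, not the ToolHive implementation: `EffectiveConfig`, `Action`, and `Decide` are hypothetical names, and the struct carries only a subset of the fields that a real comparison would cover.
+
+```go
+package main
+
+import (
+	"fmt"
+	"reflect"
+)
+
+// EffectiveConfig is a hypothetical subset of the compared configuration.
+type EffectiveConfig struct {
+	Image   string
+	Command []string
+	Env     map[string]string
+	Labels  map[string]string
+}
+
+// Action is what Deploy should do for a given workload name.
+type Action string
+
+const (
+	ActionCreate  Action = "create"  // no workload with this name exists
+	ActionReuse   Action = "reuse"   // same config, already running
+	ActionStart   Action = "start"   // same config, currently stopped
+	ActionReplace Action = "replace" // config diverged
+)
+
+// Decide compares the existing workload (nil if absent) with the desired
+// configuration. Note that reflect.DeepEqual treats a nil map and an empty
+// map as different, so callers should normalize both sides first.
+func Decide(existing *EffectiveConfig, running bool, desired EffectiveConfig) Action {
+	if existing == nil {
+		return ActionCreate
+	}
+	if !reflect.DeepEqual(*existing, desired) {
+		return ActionReplace
+	}
+	if running {
+		return ActionReuse
+	}
+	return ActionStart
+}
+
+func main() {
+	want := EffectiveConfig{Image: "example/server:1.0"}
+	same := want
+	other := EffectiveConfig{Image: "example/server:2.0"}
+
+	fmt.Println(Decide(nil, false, want))    // create
+	fmt.Println(Decide(&same, true, want))   // reuse
+	fmt.Println(Decide(&same, false, want))  // start
+	fmt.Println(Decide(&other, true, want))  // replace
+}
+```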
+
+## 10. Error handling, logging, and monitoring
+
+- Wrap platform errors with context that includes workload name or resource identity.
+- Classify “not found” conditions as non-fatal in stop/remove paths.
+- When a workload exits unexpectedly, report a clear message that includes the last known logs and the reported status.
+- Implement a monitor that periodically checks “is running” state and reports an error when the workload disappears or stops unexpectedly, including a short log excerpt.
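+
+The first two rules above (wrap with context, treat "not found" as non-fatal on stop/remove) might look like this in Go. It is a sketch under stated assumptions: `ErrNotFound`, `wrap`, and `stopWorkload` are hypothetical names, and a real runtime would map its platform client's own not-found error onto the sentinel.
+
+```go
+package main
+
+import (
+	"errors"
+	"fmt"
+)
+
+// ErrNotFound is a hypothetical sentinel for a platform "not found" result.
+var ErrNotFound = errors.New("workload not found")
+
+// wrap adds the operation and workload name while keeping the cause
+// inspectable through errors.Is / errors.As.
+func wrap(op, name string, err error) error {
+	if err == nil {
+		return nil
+	}
+	return fmt.Errorf("%s workload %q: %w", op, name, err)
+}
+
+// stopWorkload treats a missing workload as success (idempotent stop) and
+// wraps every other platform error with context.
+func stopWorkload(name string, platformStop func(string) error) error {
+	err := platformStop(name)
+	if errors.Is(err, ErrNotFound) {
+		return nil
+	}
+	return wrap("stop", name, err)
+}
+
+func main() {
+	missing := func(string) error { return ErrNotFound }
+	broken := func(string) error { return errors.New("connection refused") }
+
+	fmt.Println(stopWorkload("fetch", missing)) // <nil>
+	fmt.Println(stopWorkload("fetch", broken))  // stop workload "fetch": connection refused
+}
+```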
+
+## 11. Observability and telemetry
+
+- Emit structured logs with clear operation names (deploy, list, info, logs, stop, remove, attach).
+- Include correlation identifiers (workload name) and outcome (success/failure with reason).
+- Optionally expose metrics for:
+  - Deploy durations and outcomes
+  - Running workload count
+  - Proxy start failures
+  - Image pull outcomes
+- Avoid logging environment variables or sensitive values.
+
+## 12. Testing and conformance
+
+Unit-test matrix (minimum):
+**Note**: The following test requirements represent the target state. Current runtime implementations may not yet meet all these requirements.
+
+- Deploy stdio (isolated and non-isolated) – returns port 0; no ingress proxy.
+- Deploy SSE/Streamable HTTP (isolated and non-isolated) – returns published host port.
+- Port-binding behaviors:
+  - Honor explicit bindings; assign exactly one random host port when requested.
+- Isolation topology:
+  - Creation of internal network, DNS, egress proxy, ingress proxy (where applicable).
+  - Proxy env injection and DNS configuration passed to the workload.
+- Labeling:
+  - Primary workloads labeled; auxiliary flagged and filtered from listings.
+- List/Info:
+  - State normalization; port mapping extraction; created time handling.
+- Stop/Remove:
+  - Idempotent when missing.
+  - Auxiliary clean-up and network teardown (best-effort).
+- Errors:
+  - Propagate platform API errors; wrap with context.
+- Permissions:
+  - Mounts, capabilities, privileged, security options applied as requested.
+- Platform-specific extensions (where applicable):
+  - Security contexts are shaped correctly for the detected platform.
+
+Conformance guidance:
+- Provide a black-box conformance suite that deploys representative MCP servers across transports, toggles isolation, and asserts runtime-invariant behavior (ports, labels, state machine, idempotency).
+- Include regression tests for common edge cases (e.g., invalid port mapping keys, bad time formats, non-numeric port parsing).
+
+## 13. Security posture hardening
+
+Defaults
+- Run as non-root.
+- Read-only root filesystem where possible.
+- Drop all capabilities; add only the minimal set required.
+- Disallow privilege escalation.
+- Disable container device access unless explicitly required.
+- Avoid host network, host PID/IPC, or other host-level sharing by default.
+
+Isolation
+- Enforce egress policy via HTTP/HTTPS proxy and DNS control.
+- Ensure the proxy images are pulled from trusted registries and are version-pinned where feasible.
+- Consider name-resolution bypass mitigations (e.g., prevent /etc/hosts injection by workloads if supported by the platform).
+
+Secrets
+- Treat all secrets as opaque environment variables; do not persist, print, or export them.
+- Recommend short-lived tokens or centralized providers (e.g., 1Password) for operators.
+
+## 14. Performance and scalability
+
+- Cache/pull optimization:
+  - Attempt to pull images; if pull fails but image exists locally, continue.
+- Reuse shared external network constructs where possible.
+- Create per-workload internal networks only when isolation is enabled.
+- Use exponential backoff and timeouts for platform API calls.
+- Avoid tight polling in monitors; prefer modest intervals and backoff on errors.
+
+## 15. Compatibility and portability
+
+- Names:
+  - Sanitize workload names to meet platform-specific constraints (length, allowed characters).
+- Ports:
+  - Detect collisions; provide actionable errors or retry randomized host ports when safe.
+- OS/Kernel features:
+  - Be resilient to missing features (cgroups, seccomp); degrade gracefully and warn.
+- Network drivers:
+  - Work with common defaults; document requirements for custom drivers.
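+
+As one illustration of name sanitization, the sketch below maps an arbitrary workload name onto an RFC 1123 DNS label (lowercase alphanumerics and `-`, starting and ending with an alphanumeric, at most 63 characters), the constraint that many Kubernetes resource names follow. `SanitizeName` is a hypothetical helper; other platforms have different rules, and a real implementation would also need to handle empty results and collisions between names that sanitize to the same value.
+
+```go
+package main
+
+import (
+	"fmt"
+	"strings"
+)
+
+// maxLabelLen is the RFC 1123 DNS label length limit.
+const maxLabelLen = 63
+
+// SanitizeName lowercases the name, replaces every character outside
+// [a-z0-9] with '-', trims leading/trailing dashes, and truncates.
+// The result may be empty if the input has no usable characters.
+func SanitizeName(name string) string {
+	var b strings.Builder
+	for _, r := range strings.ToLower(name) {
+		if (r >= 'a' && r <= 'z') || (r >= '0' && r <= '9') {
+			b.WriteRune(r)
+		} else {
+			b.WriteByte('-')
+		}
+	}
+	s := strings.Trim(b.String(), "-")
+	if len(s) > maxLabelLen {
+		// Output is pure ASCII here, so byte slicing is safe.
+		s = strings.TrimRight(s[:maxLabelLen], "-")
+	}
+	return s
+}
+
+func main() {
+	fmt.Println(SanitizeName("My_Server.v2")) // my-server-v2
+	fmt.Println(SanitizeName("--fetch--"))    // fetch
+}
+```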
+
+## 16. Implementation checklist
+
+- Initialization
+  - Implement IsAvailable by creating a platform client with a short timeout.
+- Deploy
+  - Resolve permission configuration and deploy options.
+  - Ensure image availability (pull with local fallback).
+  - Map permission config to platform mounts, capabilities, privilege, and security options.
+  - If isolateNetwork:
+    - Create internal per-workload network.
+    - Start DNS and egress proxy; inject proxy envs.
+    - For non-stdio, start ingress proxy; publish host port and return it.
+  - Else:
+    - Expose ports directly with host bindings as requested.
+  - Apply standard labels (primary workload vs auxiliary; isolation flag).
+  - Attach stdio if requested (platform permitting).
+- List/Info
+  - Exclude auxiliary components; normalize status and ports; include created time and labels.
+- Logs
+  - Combined stdout/stderr; follow option.
+- Stop/Remove
+  - Idempotent; best-effort auxiliary/network cleanup.
+- Errors
+  - Wrap platform errors with workload identity; treat not-found as success on stop/remove.
+- Tests
+  - Cover success paths, mismatches, isolation, labeling, ports, and error propagation.
+
+## 17. Acceptance criteria
+
+A runtime implementation is considered conformant when the following are satisfied:
+
+- Deploy (stdio)
+  - Returns 0 host port; no ingress proxy created; isolation components created only if isolateNetwork=true.
+- Deploy (SSE/Streamable HTTP)
+  - Non-isolated: host port exposed by direct binding; the MCP endpoint is reachable through it.
+  - Isolated: host port exposed via ingress proxy; internal service not directly routable.
+- Isolation
+  - Outbound HTTP/HTTPS routes only via egress proxy; DNS queries resolved via controlled DNS.
+  - Proxy env vars present in the workload; NO_PROXY includes loopback addresses at minimum.
+- Permissions
+  - Mounts, capabilities, privileged, security options mapped correctly per profile.
+- Labels and listing
+  - Primary workloads have toolhive=true (and analogous “tool-type” labels); auxiliary components flagged and excluded from List.
+- Idempotency
+  - Re-deploy with same configuration reuses existing workload (starts if stopped).
+  - Re-deploy with different configuration replaces the workload and applies new config.
+- Stop/Remove
+  - No error on missing workloads; auxiliary and internal networks cleaned up when isolated.
+- Errors and logs
+  - Errors include workload identity and context; logs retrievable and followable.
+- Conformance tests
+  - Passes the conformance suite across transports and isolation modes.
+
+---
+
+This document is the source of truth for runtime behavior. New runtimes should use it as a checklist to ensure consistent UX, security posture, and operational characteristics across platforms while allowing platform-specific optimizations and extensions.
+
+## Appendix: MCP_TRANSPORT and MCP_PORT contract (runtime obligations)
+
+Goal
+- Ensure every workload receives canonical transport-related environment variables in a way that remains stable across platforms and isolation modes.
+
+Authoritative variables
+- MCP_TRANSPORT: One of stdio, sse, streamable-http. This tells the MCP server how to expose itself.
+- MCP_PORT: The TCP port inside the workload where the MCP server should bind (only for sse or streamable-http).
+- FASTMCP_PORT (optional): Mirror of MCP_PORT for servers that also read FASTMCP_PORT.
+- MCP_HOST (optional): The host interface the server should bind to; defaults to 0.0.0.0 when omitted.
+
+Runtime requirements
+- Always ensure MCP_TRANSPORT is present in the workload environment and matches the selected transport.
+- For sse and streamable-http:
+  - Ensure MCP_PORT is present and corresponds to the internal “target” port that the MCP server should bind to within the workload’s network namespace.
+  - Optionally set FASTMCP_PORT to the same value as MCP_PORT for compatibility with servers that use it.
+  - Optionally set MCP_HOST when the platform requires an explicit bind address (e.g., inside some orchestrators). Default assumed by servers should be 0.0.0.0.
+- For stdio:
+  - Do not set MCP_PORT; only MCP_TRANSPORT=stdio is required.
+
+Precedence and merge strategy
+- If MCP_TRANSPORT and/or MCP_PORT are already present in the caller-provided env, do not override them.
+- Only inject defaults when absent.
+- When network isolation is enabled and HTTP(S) proxy env vars are injected, overlay only proxy-related variables; avoid mutating MCP_* variables that already exist.
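+
+The precedence rules above can be sketched as a single merge function. This is a minimal illustration, assuming environment variables are held in a `map[string]string`; `MergeEnv` and its parameters are hypothetical names, and the `proxy` argument is assumed to contain only proxy-related keys supplied by the runtime when isolation is enabled.
+
+```go
+package main
+
+import (
+	"fmt"
+	"strconv"
+	"strings"
+)
+
+// MergeEnv builds the workload environment in three steps:
+//  1. start from the caller-provided variables,
+//  2. inject MCP_TRANSPORT (and MCP_PORT for non-stdio) only if absent,
+//  3. overlay proxy variables, never touching MCP_* keys.
+func MergeEnv(user map[string]string, transport string, port int, proxy map[string]string) map[string]string {
+	out := make(map[string]string, len(user)+len(proxy)+2)
+	for k, v := range user {
+		out[k] = v
+	}
+	if _, ok := out["MCP_TRANSPORT"]; !ok {
+		out["MCP_TRANSPORT"] = transport
+	}
+	if transport != "stdio" {
+		if _, ok := out["MCP_PORT"]; !ok {
+			out["MCP_PORT"] = strconv.Itoa(port)
+		}
+	}
+	for k, v := range proxy {
+		if strings.HasPrefix(k, "MCP_") {
+			continue // guard: the proxy overlay must not mutate MCP_* variables
+		}
+		out[k] = v
+	}
+	return out
+}
+
+func main() {
+	user := map[string]string{"MCP_PORT": "9000", "HTTP_PROXY": "http://old:3128"}
+	proxy := map[string]string{"HTTP_PROXY": "http://egress:3128"}
+
+	env := MergeEnv(user, "sse", 8080, proxy)
+	fmt.Println(env["MCP_TRANSPORT"]) // sse (injected because it was absent)
+	fmt.Println(env["MCP_PORT"])      // 9000 (caller value preserved)
+	fmt.Println(env["HTTP_PROXY"])    // http://egress:3128 (proxy overlay)
+}
+```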
+
+Determining MCP_PORT (sse/streamable-http)
+- Single target port:
+  - If the deploy options define a single clearly intended container service port (e.g., via exposedPorts), use that port for MCP_PORT.
+- Multiple target ports:
+  - Select a primary application port deterministically (e.g., the first “port/proto” entry after sorting the declared entries, since map iteration order is not stable) and document that policy.
+- No explicit port provided:
+  - Use a runtime-wide default (for example, 8080) that is documented and consistently applied.
+  - The default should be overridable by the caller via env or options.
+- Important: MCP_PORT represents the in-container binding port for the MCP server. It is not the host/ingress port. The runtime may allocate/publish a host port (directly or through an ingress proxy), but MCP_PORT must remain the workload’s internal port so the process knows where to listen.
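+
+One way to implement the selection rules above is shown below. It is a sketch, not the ToolHive policy: `TargetPort` and `defaultMCPPort` are hypothetical names, the default of 8080 is the example value from the text, and "lowest-numbered port wins" is just one deterministic policy chosen here for illustration.
+
+```go
+package main
+
+import (
+	"fmt"
+	"sort"
+	"strconv"
+	"strings"
+)
+
+// defaultMCPPort is the runtime-wide fallback (example value from the guide).
+const defaultMCPPort = 8080
+
+// TargetPort picks the in-container port for MCP_PORT from exposedPorts
+// keys of the form "port/proto". With no entries it returns the default;
+// with several it returns the numerically lowest port.
+func TargetPort(exposed map[string]struct{}) (int, error) {
+	if len(exposed) == 0 {
+		return defaultMCPPort, nil
+	}
+	ports := make([]int, 0, len(exposed))
+	for key := range exposed {
+		portStr, _, _ := strings.Cut(key, "/")
+		port, err := strconv.Atoi(portStr)
+		if err != nil || port < 1 || port > 65535 {
+			return 0, fmt.Errorf("invalid exposed port key %q", key)
+		}
+		ports = append(ports, port)
+	}
+	sort.Ints(ports) // numeric sort, so 9000 sorts before 10000
+	return ports[0], nil
+}
+
+func main() {
+	port, err := TargetPort(nil)
+	fmt.Println(port, err) // 8080 <nil>
+
+	port, err = TargetPort(map[string]struct{}{"10000/tcp": {}, "9000/tcp": {}})
+	fmt.Println(port, err) // 9000 <nil>
+}
+```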
+
+Interaction with host/ingress ports
+- Non-isolated:
+  - The runtime may bind hostPort → containerPort; return the selected host port from Deploy.
+  - The workload receives MCP_PORT=containerPort. The caller-facing port (host) is distinct and is not injected as MCP_PORT.
+- Isolated:
+  - The runtime creates an ingress proxy that publishes hostPort and forwards to the workload’s MCP_PORT on the internal network.
+  - Return the published hostPort from Deploy.
+  - The workload still receives MCP_PORT=containerPort (internal target port).
+  - Do not inject hostPort as MCP_PORT.
+
+MCP_HOST (optional)
+- Runtimes should default the server bind host to 0.0.0.0 when not set (or omit MCP_HOST if servers already default correctly).
+- If set, MCP_HOST should typically be 0.0.0.0 for containerized environments unless the platform dictates a specific interface.
+
+Examples
+- stdio
+  - Inject MCP_TRANSPORT=stdio
+  - Do not set MCP_PORT
+  - Deploy returns 0
+- sse (non-isolated)
+  - Inject MCP_TRANSPORT=sse, MCP_PORT=8080 (or chosen/declared container target port)
+  - Publish a host port binding (random or requested)
+  - Deploy returns hostPort (e.g., 18080)
+- sse (isolated)
+  - Inject MCP_TRANSPORT=sse, MCP_PORT=8080 (or chosen target port)
+  - Ingress proxy publishes hostPort (e.g., 18080) and forwards to 8080 inside the internal network
+  - Deploy returns hostPort (18080)
+- streamable-http
+  - Same as sse in terms of MCP_TRANSPORT/MCP_PORT
+  - Optionally add FASTMCP_PORT=MCP_PORT and MCP_HOST=0.0.0.0 if the target server expects them
+
+Security and logging
+- Treat MCP_* variables as non-secret but avoid dumping complete environment sets in logs.
+- Never log user-provided env var values verbatim.
+
+Portability notes
+- Do not rely on host networking details inside the workload; MCP_PORT is always the internal port.
+- If the higher-level toolchain injects MCP_* already, the runtime must not override them; the runtime’s job is to guarantee presence when absent and to return the published hostPort (when applicable) to the caller.
+
+Cross-cutting consistency
+- The Deploy return value for non-stdio transports is the externally reachable host port (direct binding or via ingress proxy).
+- The MCP_PORT env value is the internal service port used by the MCP server process.
+- This separation allows upper layers to route traffic correctly while keeping server configuration consistent.
+
+Implementation guidance (non-normative)
+- Determine target container port from deploy options (exposed ports, pod template extension, or defaults).
+- Before container/pod creation, merge env:
+  - Respect user vars → overlay MCP_TRANSPORT/MCP_PORT only if missing → overlay proxy envs (when isolated).
+- Avoid platform-specific leakage into MCP_PORT semantics (e.g., do not pass NodePort/LoadBalancer ports to the workload).
\ No newline at end of file