This document is the dense transport reference for SpeakSwiftlyServer. Keep the operator-facing summary in README.md concise and move detailed contract inventory here instead.
The server exposes one shared localhost host process with:
- an HTTP surface
- an optional MCP surface
- shared retained request, artifact, playback, and runtime snapshots behind both transports
When the same host is embedded through EmbeddedServerSession, the transport process runs inside a single outer, service-owned lifecycle group. That group also owns package-level host startup, the config-watch lifetime, and optional MCP readiness and drain. Embedding does not change the HTTP and MCP contracts described below; it only makes the ownership story flatter and more explicit for app hosts and maintainers.
When APP_CONFIG_FILE is set, the server watches that YAML file through ReloadingFileProvider<YAMLSnapshot>. The optional APP_CONFIG_RELOAD_INTERVAL_SECONDS environment variable controls the polling interval and defaults to 2 seconds.
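
As a rough illustration of the polling-interval setting described above, the sketch below resolves APP_CONFIG_RELOAD_INTERVAL_SECONDS with the documented 2-second default. The fallback for malformed or non-positive values is an assumption made for this example; the server's actual handling of bad input is not described here.

```python
import math
import os

DEFAULT_RELOAD_INTERVAL_SECONDS = 2.0  # default stated in this document


def reload_interval_seconds(env=None):
    """Return the config polling interval in seconds.

    Assumption: malformed, non-finite, or non-positive values fall back to
    the default. The real server may reject them instead.
    """
    env = os.environ if env is None else env
    raw = env.get("APP_CONFIG_RELOAD_INTERVAL_SECONDS")
    if raw is None or raw.strip() == "":
        return DEFAULT_RELOAD_INTERVAL_SECONDS
    try:
        value = float(raw)
    except ValueError:
        return DEFAULT_RELOAD_INTERVAL_SECONDS
    if not math.isfinite(value) or value <= 0:
        return DEFAULT_RELOAD_INTERVAL_SECONDS
    return value
```

Passing a plain dictionary instead of the process environment keeps the helper easy to exercise in isolation.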
Only the host-safe subset reloads live today:
- app.name
- app.environment
- app.sseHeartbeatSeconds
- app.completedJobTTLSeconds
- app.completedJobMaxCount
- app.jobPruneIntervalSeconds
Changes to bind addresses, ports, HTTP enablement, MCP enablement, MCP path, or MCP server metadata are detected and reported, but they still require a process restart before they can take effect.
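
The split between live-reloadable and restart-only settings can be sketched as a small classifier over two parsed config snapshots. This is only an illustration of the rule stated above, not the server's implementation: the live-reloadable key names come from this document, while the nested dictionary shape and any restart-only key names used with it (for example `app.http.port`) are hypothetical.

```python
# Keys this document lists as safe to reload without a restart.
LIVE_RELOADABLE_KEYS = frozenset({
    "app.name",
    "app.environment",
    "app.sseHeartbeatSeconds",
    "app.completedJobTTLSeconds",
    "app.completedJobMaxCount",
    "app.jobPruneIntervalSeconds",
})

_MISSING = object()  # sentinel for keys present in only one snapshot


def flatten(mapping, prefix=""):
    """Flatten nested dictionaries into dotted key paths."""
    flat = {}
    for key, value in mapping.items():
        path = f"{prefix}.{key}" if prefix else str(key)
        if isinstance(value, dict):
            flat.update(flatten(value, path))
        else:
            flat[path] = value
    return flat


def classify_changes(old, new):
    """Return (live_keys, restart_keys) for keys whose values differ."""
    old_flat, new_flat = flatten(old), flatten(new)
    changed = sorted(
        key
        for key in old_flat.keys() | new_flat.keys()
        if old_flat.get(key, _MISSING) != new_flat.get(key, _MISSING)
    )
    live = [key for key in changed if key in LIVE_RELOADABLE_KEYS]
    restart = [key for key in changed if key not in LIVE_RELOADABLE_KEYS]
    return live, restart
```

Treating every key outside the allow-list as restart-only is the conservative choice: an unknown setting is reported rather than silently applied.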
SPEAKSWIFTLY_PROFILE_ROOT is also a startup-only setting. It points at the runtime profile root directory the server should own, and the server threads that same root through both its own runtime-configuration persistence and the underlying SpeakSwiftly profile and artifact persistence. Because that setting changes filesystem ownership rather than hot runtime state, it is intentionally not part of the live-reloaded YAML surface.
The HTTP surface exposes the following routes, grouped by area.
Health and runtime routes:
- GET /healthz
- GET /readyz
- GET /runtime/host
- GET /runtime/status
- GET /runtime/configuration
- POST /runtime/backend
- POST /runtime/models/reload
- POST /runtime/models/unload
- PUT /runtime/configuration

Voice profile routes:
- GET /voices
- POST /voices/from-description
- POST /voices/from-audio
- POST /voices/{profile_name}/reroll
- PUT /voices/{profile_name}/name
- DELETE /voices/{profile_name}

Text profile routes:
- GET /text-profiles
- GET /text-profiles/style
- GET /text-profiles/base
- GET /text-profiles/active
- GET /text-profiles/effective
- GET /text-profiles/effective/{profile_id}
- GET /text-profiles/stored/{profile_id}
- POST /text-profiles/stored
- POST /text-profiles/load
- POST /text-profiles/save
- POST /text-profiles/active/reset
- POST /text-profiles/active/replacements
- POST /text-profiles/stored/{profile_id}/replacements
- PUT /text-profiles/stored/{profile_id}
- PUT /text-profiles/style
- PUT /text-profiles/active
- PUT /text-profiles/active/replacements/{replacement_id}
- PUT /text-profiles/stored/{profile_id}/replacements/{replacement_id}
- DELETE /text-profiles/stored/{profile_id}
- DELETE /text-profiles/active/replacements/{replacement_id}
- DELETE /text-profiles/stored/{profile_id}/replacements/{replacement_id}

Speech, request, and generation routes:
- POST /speech/live
- POST /speech/files
- POST /speech/batches
- GET /requests
- GET /requests/{request_id}
- GET /requests/{request_id}/events
- GET /generation/queue
- GET /generation/jobs
- GET /generation/jobs/{job_id}
- GET /generation/files
- GET /generation/files/{artifact_id}
- GET /generation/batches
- GET /generation/batches/{batch_id}

Playback routes:
- GET /playback/state
- GET /playback/queue
- POST /playback/pause
- POST /playback/resume
- DELETE /playback/queue
- DELETE /playback/requests/{request_id}
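
Several routes in the inventory above carry placeholders such as {profile_name} or {request_id}. A client might fill them with a small helper like the one below. The helper name and its error handling are illustrative; percent-encoding each value is ordinary URL practice rather than a requirement stated in this document.

```python
from urllib.parse import quote


def expand_route(template, **params):
    """Fill `{name}` placeholders in a route template with encoded values."""
    path = template
    for name, value in params.items():
        placeholder = "{" + name + "}"
        if placeholder not in path:
            raise KeyError(f"route {template!r} has no placeholder {name!r}")
        # safe="" also encodes "/" so a value can never add path segments.
        path = path.replace(placeholder, quote(str(value), safe=""))
    if "{" in path:
        raise ValueError(f"unfilled placeholder remains in {path!r}")
    return path
```

For example, `expand_route("/voices/{profile_name}/name", profile_name="studio voice")` yields `/voices/studio%20voice/name`.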
POST /speech/live, POST /voices/from-description, POST /voices/from-audio, PUT /voices/{profile_name}/name, POST /voices/{profile_name}/reroll, and DELETE /voices/{profile_name} all return accepted-request metadata immediately.
Those responses use request_id, request_url, and events_url so ordinary HTTP clients can follow one tracked request cleanly without having to learn the MCP resource model first.
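
A client could capture the accepted-request metadata with a small typed wrapper, sketched below. The three field names come from this document. The assumption that they arrive as top-level keys of a JSON object, and the sample values used to exercise the parser, are illustrative only.

```python
import json
from dataclasses import dataclass

ACCEPTED_FIELDS = ("request_id", "request_url", "events_url")


@dataclass(frozen=True)
class AcceptedRequest:
    request_id: str
    request_url: str
    events_url: str


def parse_accepted(body):
    """Parse an accepted-request response body (assumed to be a JSON object)."""
    data = json.loads(body)
    if not isinstance(data, dict):
        raise ValueError("accepted response is not a JSON object")
    missing = [name for name in ACCEPTED_FIELDS if name not in data]
    if missing:
        raise ValueError(f"accepted response is missing fields: {missing}")
    return AcceptedRequest(*(data[name] for name in ACCEPTED_FIELDS))
```

With the parsed value in hand, a client can read `request_url` for the tracked snapshot or open `events_url` for the event stream without constructing either URL itself.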
POST /speech/live mirrors the current public live-speech queue lane and accepts optional cwd, repo_root, text_profile_name, text_format, nested_source_format, and source_format fields so callers can pass path-aware and normalization-aware context explicitly.
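
The optional context fields can be assembled into a request body along these lines. The optional field names are taken from this document. The required `text` key and the builder itself are assumptions made for the sketch, so check the actual request schema before relying on them.

```python
# Optional POST /speech/live fields named in this document.
OPTIONAL_LIVE_SPEECH_FIELDS = (
    "cwd",
    "repo_root",
    "text_profile_name",
    "text_format",
    "nested_source_format",
    "source_format",
)


def build_live_speech_body(text, **optional):
    """Build a JSON-ready body, dropping optional fields left as None.

    Assumption: the spoken text travels under a "text" key.
    """
    unknown = sorted(set(optional) - set(OPTIONAL_LIVE_SPEECH_FIELDS))
    if unknown:
        raise TypeError(f"unsupported live-speech fields: {unknown}")
    body = {"text": text}
    body.update({key: value for key, value in optional.items() if value is not None})
    return body
```

Rejecting unknown keyword arguments locally catches typos such as `repo_path` before a request is ever sent.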
The /text-profiles route family is synchronous and state-oriented rather than request-oriented. It exposes the current built-in style plus base, active, stored, and effective TextForSpeech.Profile state, along with replacement editing and profile persistence paths for downstream apps or agents that need to shape normalization deliberately.
GET /text-profiles/style and PUT /text-profiles/style mirror the built-in normalization-style control that now participates in effective normalization alongside custom profiles.
POST /text-profiles/load and POST /text-profiles/save map directly to the public text-profile persistence calls so operators can refresh or flush stored normalization state without reaching into the runtime process manually.
The queue and playback control routes are immediate control operations rather than long-running requests.
- GET /generation/queue and GET /playback/queue expose the generation and playback queues separately so the HTTP layer matches the runtime's split control surface.
- GET /playback/state, POST /playback/pause, and POST /playback/resume expose the current playback state and let clients control it directly.
- DELETE /playback/queue clears queued playback work and returns the number of cancelled queued requests.
- DELETE /playback/requests/{request_id} cancels one active or queued request and returns the cancelled request ID.
The runtime routes are also state-oriented.
- GET /runtime/host returns the shared-host overview with readiness, queues, transports, cached profiles, and recent errors.
- GET /runtime/status returns the underlying SpeakSwiftly.StatusEvent.
- GET /runtime/configuration and PUT /runtime/configuration expose the saved next-start backend configuration.
- POST /runtime/backend hot-switches the active backend.
- POST /runtime/models/reload and POST /runtime/models/unload follow the current runtime-control verbs directly.
The current HTTP SSE route remains intentionally job-specific at the route boundary, but it now rides the same host-owned event backbone used by other non-UI consumers instead of keeping a separate per-job subscriber registry inside ServerHost.
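
A client reading the per-request event stream needs only a small Server-Sent Events parser. The sketch below follows the standard SSE framing (`event:` and `data:` fields, blank-line dispatch, `:` comment lines) and ignores the `id` and `retry` fields. The event names and payloads this server emits are not listed here, so the names used with the parser are placeholders, and treating heartbeats as comment lines is an assumption.

```python
def parse_sse(lines):
    """Yield (event_name, data) pairs from an iterable of SSE text lines."""
    event, data = "message", []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line dispatches the buffered event, if it has data.
            if data:
                yield event, "\n".join(data)
            event, data = "message", []
            continue
        if line.startswith(":"):
            continue  # comment line, commonly used for keep-alive heartbeats
        field, _, value = line.partition(":")
        if value.startswith(" "):
            value = value[1:]  # the format strips one leading space
        if field == "event":
            event = value
        elif field == "data":
            data.append(value)
```

The generator accepts any line iterator, so it can wrap a streaming HTTP response body or a list of strings captured for testing.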
The MCP surface is optional and mounts on the same shared Hummingbird process at APP_MCP_PATH when APP_MCP_ENABLED=true.
The MCP tools, grouped by area, are listed below.
Speech generation and request tools:
- generate_speech
- generate_audio_file
- generate_batch
- list_active_requests
- list_generation_jobs
- get_generation_job
- expire_generation_job
- list_generated_files
- get_generated_file
- list_generated_batches
- get_generated_batch

Voice profile tools:
- create_voice_profile_from_description
- create_voice_profile_from_audio
- update_voice_profile_name
- reroll_voice_profile
- list_voice_profiles
- delete_voice_profile

Text profile tools:
- get_text_normalizer_snapshot
- get_text_profile_style
- set_text_profile_style
- load_text_profiles
- save_text_profiles
- create_text_profile
- store_text_profile
- use_text_profile
- delete_text_profile
- reset_active_text_profile
- add_text_replacement
- replace_text_replacement
- remove_text_replacement

Runtime, queue, and playback tools:
- get_runtime_overview
- get_runtime_status
- get_staged_runtime_config
- set_staged_config
- switch_speech_backend
- reload_models
- unload_models
- list_generation_queue
- list_playback_queue
- pause_playback
- resume_playback
- get_playback_state
- clear_playback_queue
- cancel_request
The MCP resources and resource templates, grouped by area, are listed below.
Runtime resources:
- speak://runtime/overview
- speak://runtime/status
- speak://runtime/configuration

Voice profile resources:
- speak://voices
- speak://voices/guide
- speak://voices/{profile_name}

Text profile resources:
- speak://text-profiles
- speak://text-profiles/style
- speak://text-profiles/base
- speak://text-profiles/active
- speak://text-profiles/effective
- speak://text-profiles/effective/{profile_id}
- speak://text-profiles/stored/{profile_id}
- speak://text-profiles/guide

Request, generation, and playback resources:
- speak://requests
- speak://requests/{request_id}
- speak://generation/jobs
- speak://generation/jobs/{job_id}
- speak://generation/files
- speak://generation/files/{artifact_id}
- speak://generation/batches
- speak://generation/batches/{batch_id}
- speak://playback/guide
Those MCP tools and resources are intentionally thin adapters over the same ServerHost snapshots and mutations used by the HTTP API and the app-facing ServerState.
Accepted-request MCP tool results return request_id, request_resource_uri, and status_resource_uri so coding agents can follow one tracked request immediately while still having an obvious top-level status resource for orientation.
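
On the MCP side, a tool invocation is an ordinary JSON-RPC `tools/call` request, and the accepted-request fields can be pulled out of the decoded result. The sketch below is illustrative only: the `text` argument for generate_speech is hypothetical, and how the three fields are wrapped inside the MCP tool result (text content versus structured content) is not specified in this document, so `follow_up_fields` assumes the payload has already been decoded into a dictionary.

```python
FOLLOW_UP_FIELDS = ("request_id", "request_resource_uri", "status_resource_uri")


def build_tool_call(call_id, tool_name, arguments):
    """Build a JSON-RPC 2.0 `tools/call` request for an MCP tool."""
    return {
        "jsonrpc": "2.0",
        "id": call_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": dict(arguments)},
    }


def follow_up_fields(payload):
    """Extract the accepted-request fields from an already-decoded payload."""
    missing = [name for name in FOLLOW_UP_FIELDS if name not in payload]
    if missing:
        raise ValueError(f"tool result is missing fields: {missing}")
    return {name: payload[name] for name in FOLLOW_UP_FIELDS}
```

An agent would typically read `request_resource_uri` to follow the single request and fall back to `status_resource_uri` for overall orientation.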
The embedded MCP prompt catalog currently includes:
- draft_profile_voice_description
- draft_profile_source_text
- draft_voice_design_instruction
- draft_queue_playback_notice
- draft_text_profile
- draft_text_replacement
- choose_surface_action
The text-profile prompts and the speak://text-profiles/guide resource are there so an app-hosted or MCP-hosted agent can help a user author replacements deliberately instead of treating normalization rules like hidden implementation detail.
The embedded MCP surface supports resource subscriptions for the live state resources and templates backed by shared host updates.
Clients connected to the standalone MCP event stream can subscribe to:
- speak://runtime/overview
- speak://runtime/status
- speak://runtime/configuration
- speak://voices
- speak://voices/{profile_name}
- speak://requests
- speak://requests/{request_id}
- speak://generation/jobs
- speak://generation/jobs/{job_id}
- speak://generation/files
- speak://generation/files/{artifact_id}
- speak://generation/batches
- speak://generation/batches/{batch_id}
- speak://text-profiles
- speak://text-profiles/style
- speak://text-profiles/base
- speak://text-profiles/active
- speak://text-profiles/effective
- speak://text-profiles/effective/{profile_id}
- speak://text-profiles/stored/{profile_id}
Subscribed clients receive notifications/resources/updated when shared host events change the underlying state.
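
The subscription flow can be sketched with plain JSON-RPC messages: a `resources/subscribe` request per URI, then matching each `notifications/resources/updated` notification back to a resource template. The method names follow the MCP specification. The helper names and the single-segment matching rule for placeholders are assumptions made for this example.

```python
import re


def build_subscribe(call_id, uri):
    """Build a JSON-RPC 2.0 `resources/subscribe` request for one URI."""
    return {
        "jsonrpc": "2.0",
        "id": call_id,
        "method": "resources/subscribe",
        "params": {"uri": uri},
    }


def updated_uri(message):
    """Return the URI from a resources/updated notification, else None."""
    if message.get("method") == "notifications/resources/updated":
        return message.get("params", {}).get("uri")
    return None


def template_to_regex(template):
    """Compile a template so each `{name}` matches one non-empty path segment."""
    literals = re.split(r"\{[^{}]+\}", template)
    pattern = "[^/]+".join(re.escape(part) for part in literals)
    return re.compile("^" + pattern + "$")


def matching_template(uri, templates):
    """Return the first template that the concrete URI matches, else None."""
    for template in templates:
        if template_to_regex(template).match(uri):
            return template
    return None
```

Listing fixed URIs before their templated siblings keeps a URI such as `speak://requests` from ever being tested against `speak://requests/{request_id}` first, although the anchored pattern would reject it either way.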
Transport lifecycle snapshots are intentionally tied to the shared Hummingbird process rather than static config alone. listening means the shared HTTP host has actually reached Hummingbird's onServerRunning boundary, so HTTP and MCP surface status describe real network availability instead of only configuration intent.