This document describes the WebSocket and REST API endpoints provided by Active Call.
All API endpoints are relative to the server base URL.
Most endpoints require WebSocket upgrade for real-time communication.
The following three endpoints establish WebSocket connections for different voice communication protocols:
Endpoint: GET /call
Description: Establishes a WebSocket connection for voice call handling with audio stream transmitted via WebSocket.
Parameters:
- id (optional, string): Session ID. If not provided, a new UUID will be generated (prefixed with "s.").
- dump_events (optional, boolean): Enable event dumping to file. Default: true.
- ping_interval (optional, number): Interval in seconds to send Ping events. Default: 20. Set to 0 to disable.
- server_side_track (optional, string): Override server-side track ID.
Response: WebSocket connection upgrade
Usage:
const ws = new WebSocket('ws://localhost:8080/call?id=session123&dump_events=true&ping_interval=20');
Endpoint: GET /call/webrtc
Description: Establishes a WebSocket connection for WebRTC call handling with audio stream transmitted via WebRTC RTP.
Note: WebRTC requires a Secure Context. Ensure you are accessing your web client via HTTPS or 127.0.0.1, otherwise the browser will not enable WebRTC functionality.
Parameters:
- id (optional, string): Session ID. If not provided, a new UUID will be generated (prefixed with "s.").
- dump_events (optional, boolean): Enable event dumping to file. Default: true.
- ping_interval (optional, number): Interval in seconds to send Ping events. Default: 20. Set to 0 to disable.
- server_side_track (optional, string): Override server-side track ID.
Response: WebSocket connection upgrade
Usage:
const ws = new WebSocket('ws://localhost:8080/call/webrtc?id=session123&dump_events=true');
Endpoint: GET /call/sip
Description: Establishes a WebSocket connection for SIP call handling with audio stream transmitted via SIP/RTP.
Parameters:
- id (optional, string): Session ID. If not provided, a new UUID will be generated (prefixed with "s.").
- dump_events (optional, boolean): Enable event dumping to file. Default: true.
- ping_interval (optional, number): Interval in seconds to send Ping events. Default: 20. Set to 0 to disable.
- server_side_track (optional, string): Override server-side track ID.
Response: WebSocket connection upgrade
Usage:
const ws = new WebSocket('ws://localhost:8080/call/sip?id=session123&dump_events=true');
sequenceDiagram
participant Client
participant RustPBX
participant MediaEngine
participant ASR/TTS
Client->>RustPBX: WebSocket Connect
RustPBX->>Client: Connection Established
Client->>RustPBX: Send Command (JSON)
RustPBX->>MediaEngine: Process Command
MediaEngine->>ASR/TTS: Audio Processing
ASR/TTS->>MediaEngine: Processing Results
MediaEngine->>RustPBX: Generate Events
RustPBX->>Client: Send Events (JSON)
Note over Client,RustPBX: Audio Stream Flow
Client->>RustPBX: Audio Data (Binary/WebRTC/SIP)
RustPBX->>MediaEngine: Process Audio
MediaEngine->>Client: Audio Response
sequenceDiagram
participant Client
participant RustPBX
participant WebRTC Engine
participant ICE Servers
Client->>RustPBX: WebSocket Connect (/call/webrtc)
RustPBX->>Client: Connection Established
Client->>RustPBX: Send Invite Command with SDP Offer
RustPBX->>WebRTC Engine: Create PeerConnection
RustPBX->>ICE Servers: Get ICE Servers
WebRTC Engine->>RustPBX: Generate SDP Answer
RustPBX->>Client: Send Answer Event with SDP
Client->>RustPBX: Set Remote Description
Note over Client,RustPBX: WebRTC Media Flow
Client->>RustPBX: RTP Audio Packets (PCM/PCMA/PCMU/G722)
RustPBX->>Client: RTP Audio Response
Client->>RustPBX: Send TTS/Play Commands
RustPBX->>Client: Send Audio Events
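The WebRTC signaling steps in the diagram above can be sketched in JavaScript as follows. This is a minimal illustration, not the project's client: the helper names buildWebrtcInvite, extractAnswerSdp, and startWebrtcCall are hypothetical, and the sketch assumes the SDP offer travels in the offer field of the invite command and the answer arrives in the sdp field of the answer event, as documented in the command and event descriptions.

```javascript
// Build the invite command that carries the client's SDP offer.
// extraOptions is a hypothetical pass-through for other CallOption fields.
function buildWebrtcInvite(offerSdp, extraOptions = {}) {
  return JSON.stringify({
    command: "invite",
    option: { ...extraOptions, offer: offerSdp },
  });
}

// Return the SDP carried by an "answer" event, or null for any other event.
function extractAnswerSdp(message) {
  const event = JSON.parse(message);
  return event.event === "answer" ? event.sdp : null;
}

// Browser-only wiring: ws is an open WebSocket to /call/webrtc and
// pc is an RTCPeerConnection that already has a local audio track.
async function startWebrtcCall(ws, pc) {
  const offer = await pc.createOffer();
  await pc.setLocalDescription(offer);
  ws.send(buildWebrtcInvite(offer.sdp));
  ws.addEventListener("message", async (msg) => {
    if (typeof msg.data !== "string") return; // skip binary frames
    const sdp = extractAnswerSdp(msg.data);
    if (sdp) await pc.setRemoteDescription({ type: "answer", sdp });
  });
}
```

The two pure helpers can be exercised without a browser; only startWebrtcCall depends on the WebSocket and RTCPeerConnection browser APIs.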
sequenceDiagram
participant Client
participant RustPBX
participant SIP UA
participant SIP Server
Client->>RustPBX: WebSocket Connect (/call/sip)
RustPBX->>Client: Connection Established
Client->>RustPBX: Send Invite Command with Caller/Callee
RustPBX->>SIP UA: Create SIP Dialog
SIP UA->>SIP Server: Send INVITE Request
SIP Server->>SIP UA: Send 200 OK with SDP Answer
RustPBX->>Client: Send Answer Event with SDP
Client->>RustPBX: Set Remote Description
Note over SIP UA,SIP Server: SIP/RTP Media Flow
SIP UA->>SIP Server: RTP Audio Packets (PCM/PCMA/PCMU/G722)
SIP Server->>SIP UA: RTP Audio Response
Client->>RustPBX: Send TTS/Play Commands
RustPBX->>Client: Send Audio Events
- Audio Format: PCM, PCMA, PCMU, G722
- Transport: WebSocket binary messages
- Usage: Direct audio streaming over WebSocket connection
- Advantages: Simple, low latency, works through firewalls
- Audio Format: PCM, PCMA, PCMU, G722
- Transport: WebRTC RTP over UDP
- Usage: Browser-compatible, NAT traversal
- Advantages: Browser native support, adaptive bitrate
- Audio Format: PCM, PCMA, PCMU, G722
- Transport: SIP/RTP over UDP
- Usage: Traditional telephony integration
- Advantages: Standard telephony protocol, PBX integration
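The three transports above correspond to the /call, /call/webrtc, and /call/sip endpoints, which share the same query parameters. A small helper for building the connection URL might look like the sketch below. The names ENDPOINTS and buildCallUrl are illustrative, and the sketch assumes the server base URL has no path prefix of its own.

```javascript
// Map each transport to its documented endpoint path.
const ENDPOINTS = {
  websocket: "/call",
  webrtc: "/call/webrtc",
  sip: "/call/sip",
};

// Build a connection URL from a base such as "ws://localhost:8080".
// params may contain id, dump_events, ping_interval, server_side_track.
function buildCallUrl(baseUrl, transport, params = {}) {
  const path = ENDPOINTS[transport];
  if (!path) throw new Error(`unknown transport: ${transport}`);
  const url = new URL(path, baseUrl);
  for (const [key, value] of Object.entries(params)) {
    // Skip undefined so omitted parameters keep their server defaults.
    if (value !== undefined) url.searchParams.set(key, String(value));
  }
  return url.toString();
}
```

For example, buildCallUrl("ws://localhost:8080", "sip", { id: "session123", ping_interval: 0 }) produces a /call/sip URL with Ping events disabled.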
MediaPass allows bidirectional audio streaming between RustPBX and an external WebSocket server. It lets an external service receive and send audio streams during a call.
The mediaPass option in CallOption configures the WebSocket connection for audio streaming:
{
"mediaPass": {
"url": "ws://localhost:9090/media",
"inputSampleRate": 16000,
"outputSampleRate": 16000,
"packetSize": 2560
}
}
MediaPass Fields:
- url (string): WebSocket URL to connect to for media streaming
- inputSampleRate (number): Sample rate of audio received from the WebSocket server (also the sample rate of the track)
- outputSampleRate (number): Sample rate of audio sent to the WebSocket server
- packetSize (number, optional): Size in bytes of each packet sent to the WebSocket server. Default: 2560.
- ptime (number, optional): If set, the server buffers the input audio and plays it out with a period of ptime
{
"command": "invite",
"option": {
"caller": "sip:alice@rustpbx.com",
"callee": "sip:bob@rustpbx.com",
"codec": "g722",
"mediaPass": {
"url": "ws://ai-server.rustpbx.com:9090/audio",
"inputSampleRate": 16000,
"outputSampleRate": 16000,
"packetSize": 1280
},
"asr": {
"provider": "tencent",
"language": "zh-CN",
"secretId": "your_secret_id",
"secretKey": "your_secret_key",
"modelType": "16k_zh",
"samplerate": 16000
}
}
}
{
"command": "accept",
"option": {
"caller": "sip:caller@rustpbx.com",
"callee": "sip:agent@rustpbx.com",
"codec": "pcmu",
"denoise": true,
"mediaPass": {
"url": "ws://ai-voice-processor.rustpbx.com:8090/stream",
"inputSampleRate": 8000,
"outputSampleRate": 16000,
"packetSize": 2560
},
"vad": {
"type": "webrtc",
"samplerate": 16000,
"speechPadding": 250,
"silencePadding": 100,
"voiceThreshold": 0.5
},
"recorder": {
"recorderFile": "/recordings/call_with_ai.wav",
"samplerate": 16000,
"ptime": 200
}
}
}
The external WebSocket server should handle binary audio data in PCM format:
- Receiving Audio: RustPBX sends PCM audio data as binary WebSocket messages at the configured outputSampleRate
- Sending Audio: The WebSocket server can send PCM audio data back to RustPBX at the configured inputSampleRate
- Audio Format: Raw PCM data, signed 16-bit little-endian
- Packet Size: Configurable via the packetSize parameter (default: 2560 bytes)
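To make the packet arithmetic concrete, the sketch below computes how much audio one packet holds and decodes a binary message into samples. It assumes mono audio, which the text above does not state, and the function names are illustrative. Under that assumption, the default 2560-byte packet at 16000 Hz carries 1280 samples, or 80 ms of audio.

```javascript
const BYTES_PER_SAMPLE = 2; // signed 16-bit PCM

// Duration in milliseconds of one packet (assumption: mono audio).
function packetDurationMs(packetSize, sampleRate) {
  return (packetSize * 1000) / (BYTES_PER_SAMPLE * sampleRate);
}

// Decode a binary message (Uint8Array or Node Buffer) into Int16 samples,
// reading each sample explicitly as little-endian.
function decodePcm16le(bytes) {
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  const samples = new Int16Array(Math.floor(bytes.byteLength / BYTES_PER_SAMPLE));
  for (let i = 0; i < samples.length; i++) {
    samples[i] = view.getInt16(i * BYTES_PER_SAMPLE, true);
  }
  return samples;
}

// Encode Int16 samples back into little-endian bytes for sending.
function encodePcm16le(samples) {
  const bytes = new Uint8Array(samples.length * BYTES_PER_SAMPLE);
  const view = new DataView(bytes.buffer);
  samples.forEach((s, i) => view.setInt16(i * BYTES_PER_SAMPLE, s, true));
  return bytes;
}
```

In a browser, a binary message arrives as an ArrayBuffer when binaryType is "arraybuffer", so it would need to be wrapped with new Uint8Array(data) before decoding.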
sequenceDiagram
participant Caller
participant RustPBX
participant AI_Server
participant Callee
Caller->>RustPBX: Audio Stream
RustPBX->>AI_Server: PCM Audio (WebSocket)
AI_Server->>AI_Server: Process Audio (ASR/AI/TTS)
AI_Server->>RustPBX: Processed Audio (WebSocket)
RustPBX->>Callee: Processed Audio Stream
Note over Caller,Callee: Bidirectional AI-enhanced communication
Commands are sent as JSON messages through the WebSocket connection. All timestamps are in milliseconds. Each command follows a common structure with the command field indicating the operation type.
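Because every command shares this shape, a client can serialize them through one small helper. The sketch below is an illustration under that assumption; encodeCommand, ttsCommand, and hangupCommand are hypothetical names, and the field names they emit are the ones documented for each command in this section.

```javascript
// Serialize a command: the "command" field plus any command-specific fields.
function encodeCommand(command, fields = {}) {
  if (typeof command !== "string" || command.length === 0) {
    throw new TypeError("command must be a non-empty string");
  }
  if (Object.prototype.hasOwnProperty.call(fields, "command")) {
    throw new TypeError("fields must not redefine 'command'");
  }
  return JSON.stringify({ command, ...fields });
}

// Convenience wrappers for two of the documented commands.
const ttsCommand = (text, options = {}) =>
  encodeCommand("tts", { text, ...options });

const hangupCommand = (reason) =>
  encodeCommand("hangup", reason ? { reason } : {});
```

With an open connection, a call such as ws.send(ttsCommand("Hello", { playId: "greeting" })) would send the resulting JSON text frame.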
Purpose: Initiates a new outbound call.
Fields:
- command (string): Always "invite"
- option (CallOption): Call configuration parameters
{
"command": "invite",
"option": {
"caller": "sip:alice@rustpbx.com",
"callee": "sip:bob@rustpbx.com",
"offer": "v=0\r\no=- 1234567890 2 IN IP4 127.0.0.1\r\n...",
"codec": "g722",
"denoise": true,
"asr": {
"provider": "tencent",
"language": "zh-CN",
"appId": "app_id",
"secretId": "your_secret_id",
"secretKey": "your_secret_key",
"modelType": "16k_zh",
"samplerate": 16000,
"startWhenAnswer": true
},
"tts": {
"provider": "tencent",
"speaker": "xiaoyan",
"volume": 5,
"speed": 1.0,
"emotion": "neutral"
}
}
}
Purpose: Accepts an incoming call.
Fields:
- command (string): Always "accept"
- option (CallOption): Call configuration parameters
{
"command": "accept",
"option": {
"caller": "sip:alice@rustpbx.com",
"callee": "sip:bob@rustpbx.com",
"codec": "g722",
"recorder": {
"recorderFile": "/path/to/recording.wav",
"samplerate": 16000,
"ptime": 200
}
}
}
Purpose: Rejects an incoming call.
Fields:
- command (string): Always "reject"
- reason (string): Reason for rejection
- code (number, optional): SIP response code
{
"command": "reject",
"reason": "Busy",
"code": 486
}
Purpose: Sends a ringing response for an incoming call.
Note: If a recorder is set in the ringing command, the recorder option in the subsequent accept command will not override the recorder settings from the ringing phase.
Fields:
- command (string): Always "ringing"
- recorder (RecorderOption, optional): Call recording configuration
  - recorderFile (string): Path to the recording file
  - samplerate (number): Recording sample rate in Hz (default: 16000)
  - ptime (number): Packet time in milliseconds (default: 200)
- earlyMedia (boolean): Enable early media during ringing
- ringtone (string, optional): Custom ringtone URL
{
"command": "ringing",
"recorder": {
"recorderFile": "/path/to/recording.wav",
"samplerate": 16000,
"ptime": 200
},
"earlyMedia": true,
"ringtone": "http://rustpbx.com/ringtone.wav"
}
Purpose: Converts text to speech and plays audio.
Fields:
- command (string): Always "tts"
- text (string): Text to synthesize
- speaker (string, optional): Speaker voice name
- playId (string, optional): Unique identifier for this TTS session. If the same playId is used, it will not interrupt the previous playback.
- autoHangup (boolean, optional): If true, the call will be automatically hung up after TTS playback is finished.
- streaming (boolean, optional): If true, indicates streaming text input (like LLM streaming output).
- endOfStream (boolean, optional): If true, indicates the input text is finished (used with streaming).
- waitInputTimeout (number, optional): Maximum time to wait for user input, in seconds
- option (SynthesisOption, optional): TTS provider-specific options
- base64 (boolean, optional): If true, text contains base64-encoded PCM samples at a 16000 Hz sample rate. Do not use this with streaming TTS.
{
"command": "tts",
"text": "Hello, this is a test message",
"speaker": "xiaoyan",
"playId": "unique_play_id",
"autoHangup": false,
"streaming": false,
"endOfStream": false,
"waitInputTimeout": 30,
"option": {
"provider": "tencent",
"speaker": "xiaoyan",
"volume": 5,
"speed": 1.0
}
}
Purpose: Plays audio from a URL.
Fields:
- command (string): Always "play"
- url (string): URL of audio file to play (supports HTTP/HTTPS URLs). This URL will be returned as playId in the trackEnd event.
- autoHangup (boolean, optional): If true, the call will be automatically hung up after playback is finished.
- waitInputTimeout (number, optional): Maximum time to wait for user input, in seconds
{
"command": "play",
"url": "http://rustpbx.com/audio.mp3",
"autoHangup": false,
"waitInputTimeout": 30
}
Purpose: Interrupts current TTS or audio playback.
Fields:
- command (string): Always "interrupt"
- graceful (boolean, optional): If true, waits for the current TTS command to finish playing before stopping. Default: false.
- fadeOutMs (number, optional): Fade-out duration in milliseconds before stopping playback.
{
"command": "interrupt",
"graceful": false
}
Purpose: Pauses current playback.
{
"command": "pause"
}
Purpose: Resumes paused playback.
{
"command": "resume"
}
Purpose: Transfers the call to another party (SIP REFER).
Fields:
- command (string): Always "refer"
- caller (string): Caller identity for the transfer
- callee (string): Address of Record (AOR) of the transfer target (e.g., sip:bob@rustpbx.com)
- options (ReferOption, optional): Transfer configuration
  - denoise (boolean, optional): Enable noise reduction
  - timeout (number, optional): Transfer timeout in seconds
  - moh (string, optional): Music on hold URL to play during transfer
  - asr (TranscriptionOption, optional): Automatic Speech Recognition configuration
    - provider (string): ASR provider (e.g., "tencent", "aliyun", "openai")
    - secretId (string): Provider secret ID
    - secretKey (string): Provider secret key
    - region (string, optional): Provider region
    - model (string, optional): ASR model to use
  - autoHangup (boolean, optional): Automatically hang up after transfer completion
  - sip (SipOption, optional): SIP configuration
    - username (string): SIP username
    - password (string): SIP password
    - realm (string): SIP realm/domain
    - headers (object, optional): Additional SIP headers
{
"command": "refer",
"caller": "sip:alice@rustpbx.com",
"callee": "sip:charlie@rustpbx.com",
"options": {
"denoise": true,
"timeout": 30,
"moh": "http://rustpbx.com/hold_music.wav",
"asr": {
"provider": "tencent",
"language": "zh-CN",
"appId": "app_id",
"secretId": "your_secret_id",
"secretKey": "your_secret_key",
"modelType": "16k_zh",
"bufferSize": 4000,
"samplerate": 16000,
"endpoint": "https://api.rustpbx.com",
"extra": {
"custom_param": "value"
},
"startWhenAnswer": true
},
"autoHangup": true,
"sip": {
"username": "transfer_user",
"password": "transfer_password",
"realm": "rustpbx.com",
"headers": {
"X-Transfer-Source": "pbx"
}
}
}
}
Purpose: Mutes a specific audio track, or all tracks if no trackId is given.
Fields:
- command (string): Always "mute"
- trackId (string, optional): Track ID to mute (if not specified, mutes all tracks)
{
"command": "mute",
"trackId": "track-123"
}
Purpose: Unmutes a specific audio track, or all tracks if no trackId is given.
Fields:
- command (string): Always "unmute"
- trackId (string, optional): Track ID to unmute (if not specified, unmutes all tracks)
{
"command": "unmute",
"trackId": "track-123"
}
Purpose: Ends the call.
Fields:
- command (string): Always "hangup"
- reason (string, optional): Reason for hanging up
- initiator (string, optional): Who initiated the hangup (user, system, etc.)
- headers (object, optional): Additional SIP headers to include in the BYE request (SIP calls only)
{
"command": "hangup",
"reason": "user_requested",
"initiator": "user",
"headers": {
"X-Hangup-Cause": "normal"
}
}
Purpose: Adds a conversation history entry.
Fields:
- command (string): Always "history"
- speaker (string): Speaker identifier
- text (string): Conversation text
{
"command": "history",
"speaker": "user",
"text": "Hello, I need help with my account"
}
The CallOption object is used in invite and accept commands and contains the following fields:
{
"denoise": true,
"offer": "SDP offer string",
"callee": "sip:callee@rustpbx.com",
"caller": "sip:caller@rustpbx.com",
"recorder": {
"recorderFile": "/path/to/recording.wav",
"samplerate": 16000,
"ptime": 200
},
"asr": {
"provider": "tencent",
"language": "zh-CN",
"appId": "app_id",
"secretId": "your_secret_id",
"secretKey": "your_secret_key",
"modelType": "16k_zh",
"bufferSize": 4000,
"samplerate": 16000,
"endpoint": "https://api.rustpbx.com",
"extra": {
"custom_param": "value"
},
"startWhenAnswer": true
},
"vad": {
"type": "webrtc",
"samplerate": 16000,
"speechPadding": 250,
"silencePadding": 100,
"ratio": 0.5,
"voiceThreshold": 0.5,
"maxBufferDurationSecs": 50,
"silenceTimeout": null,
"endpoint": null,
"secretKey": null,
"secretId": null
},
"tts": {
"samplerate": 16000,
"provider": "tencent",
"speed": 1.0,
"appId": "app_id",
"secretId": "your_secret_id",
"secretKey": "your_secret_key",
"volume": 5,
"speaker": "1345",
"codec": "pcm",
"subtitle": true,
"emotion": "neutral",
"endpoint": "https://api.rustpbx.com",
"extra": {
"custom_param": "value"
},
"cacheKey": "cache_key_example"
},
"mediaPass": {
"url": "ws://localhost:9090/media",
"inputSampleRate": 16000,
"outputSampleRate": 16000,
"packetSize": 2560
},
"handshakeTimeout": 30,
"enableIpv6": false,
"inactivityTimeout": 50,
"sip": {
"username": "user",
"password": "password",
"realm": "rustpbx.com",
"headers": {
"X-Custom-Header": "value"
}
},
"extra": {
"custom_field": "custom_value"
},
"codec": "g722",
"eou": {
"type": "tencent",
"endpoint": "https://api.rustpbx.com",
"secretKey": "your_secret_key",
"secretId": "your_secret_id",
"timeout": 5000
}
}
CallOption Fields:
- denoise (boolean, optional): Enable noise reduction for audio processing
- offer (string, optional): SDP offer string for WebRTC/SIP negotiation
- callee (string, optional): Callee's SIP URI or phone number (e.g., "sip:bob@rustpbx.com")
- caller (string, optional): Caller's SIP URI or phone number (e.g., "sip:alice@rustpbx.com")
- recorder (RecorderOption, optional): Call recording configuration
  - recorderFile (string): Path to the recording file
  - samplerate (number): Recording sample rate in Hz (default: 16000)
  - ptime (number): Packet time in milliseconds (default: 200)
- asr (TranscriptionOption, optional): Automatic Speech Recognition configuration
  - provider (string): ASR provider ("tencent", "aliyun", "voiceapi")
  - language (string, optional): Language code (e.g., "zh-CN", "en-US")
  - appId (string, optional): Application ID for the ASR service
  - secretId (string, optional): Secret ID for authentication
  - secretKey (string, optional): Secret key for authentication
  - modelType (string, optional): ASR model type (e.g., "16k_zh", "8k_en")
  - bufferSize (number, optional): Audio buffer size in bytes
  - samplerate (number, optional): Audio sample rate for ASR processing
  - endpoint (string, optional): Custom ASR service endpoint URL
  - extra (object, optional): Additional provider-specific parameters
  - startWhenAnswer (boolean, optional): Start ASR when call is answered
- vad (VADOption, optional): Voice Activity Detection configuration
  - type (string): VAD algorithm type ("silero")
  - samplerate (number): Audio sample rate for VAD processing (default: 16000)
  - speechPadding (number): Padding before speech detection in milliseconds (default: 250)
  - silencePadding (number): Padding after silence detection in milliseconds (default: 100)
  - ratio (number): Voice detection ratio threshold (default: 0.5)
  - voiceThreshold (number): Voice energy threshold (default: 0.5)
  - maxBufferDurationSecs (number): Maximum buffer duration in seconds (default: 50)
  - silenceTimeout (number, optional): Timeout for silence detection in milliseconds
  - endpoint (string, optional): Custom VAD service endpoint
  - secretKey (string, optional): Secret key for VAD service authentication
  - secretId (string, optional): Secret ID for VAD service authentication
- tts (SynthesisOption, optional): Text-to-Speech configuration
  - samplerate (number, optional): TTS output sample rate in Hz
  - provider (string, optional): TTS provider ("tencent", "aliyun", "deepgram", "supertonic"). Default: "aliyun" for Chinese (zh), "supertonic" for English (en).
  - speed (number, optional): Speech speed multiplier (default: 1.0)
  - appId (string, optional): Application ID for TTS service
  - secretId (string, optional): Secret ID for authentication
  - secretKey (string, optional): Secret key for authentication
  - volume (number, optional): Speech volume level (1-10)
  - speaker (string, optional): Voice speaker name (e.g., "xiaoyan", "xiaoyun")
  - codec (string, optional): Audio codec for TTS output
  - subtitle (boolean, optional): Enable subtitle generation
  - emotion (string, optional): Speech emotion ("neutral", "sad", "happy", "angry", "fear", "news", "story", "radio", "poetry", "call", "sajiao", "disgusted", "amaze", "peaceful", "exciting", "aojiao", "jieshuo")
  - endpoint (string, optional): Custom TTS service endpoint URL
  - extra (object, optional): Additional provider-specific parameters
  - maxConcurrentTasks (number, optional): Maximum number of concurrent tasks for non-streaming TTS commands
- mediaPass (MediaPassOption, optional): Media pass-through configuration for external audio processing
  - url (string): WebSocket URL for media streaming
  - inputSampleRate (number): Sample rate of audio received from WebSocket server
  - outputSampleRate (number): Sample rate of audio sent to WebSocket server
  - packetSize (number, optional): Packet size sent to WebSocket server in bytes (default: 2560)
- subscribe (boolean, optional): Enable real-time audio subscription for non-WebSocket calls (SIP/WebRTC). If true, audio will be pushed via the control WebSocket using binary frames with a 1-byte track header (0x00 for caller, 0x01 for callee).
- handshakeTimeout (number, optional): Timeout for connection handshake in seconds (e.g., 30)
- enableIpv6 (boolean, optional): Enable IPv6 support for networking
- inactivityTimeout (number, optional): Timeout for audio inactivity in seconds
- sip (SipOption, optional): SIP protocol configuration
  - username (string): SIP username for authentication
  - password (string): SIP password for authentication
  - realm (string): SIP realm/domain
  - headers (object, optional): Additional SIP headers as key-value pairs
- extra (object, optional): Additional custom parameters as key-value pairs
- codec (string, optional): Audio codec for WebSocket calls ("pcmu", "pcma", "g722", "pcm")
- eou (EouOption, optional): End of Utterance detection configuration
  - type (string, optional): EOU detection provider
  - endpoint (string, optional): Custom EOU service endpoint URL
  - secretKey (string, optional): Secret key for EOU service authentication
  - secretId (string, optional): Secret ID for EOU service authentication
  - timeout (number, optional): Maximum timeout for EOU detection in milliseconds
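When subscribe is enabled, each binary frame on the control WebSocket starts with the 1-byte track header described above. A client could split such frames as in the sketch below. The names TRACK_BY_HEADER and parseSubscribedFrame are illustrative, and the encoding of the payload after the header is not specified here, so the sketch returns it as raw bytes.

```javascript
// Documented header values: 0x00 for caller, 0x01 for callee.
const TRACK_BY_HEADER = { 0x00: "caller", 0x01: "callee" };

// Split a subscribed audio frame (Uint8Array or Node Buffer)
// into its track label and the remaining audio bytes.
function parseSubscribedFrame(frame) {
  if (frame.byteLength < 1) throw new RangeError("empty frame");
  const track = TRACK_BY_HEADER[frame[0]] ?? "unknown";
  // subarray creates a view, so the audio bytes are not copied.
  return { track, payload: frame.subarray(1) };
}
```

Unrecognized header bytes are labeled "unknown" rather than rejected, which is a design choice of this sketch and not documented server behavior.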
The ReferOption object is used in the refer command and contains the following fields:
{
"denoise": true,
"timeout": 30,
"moh": "http://rustpbx.com/hold_music.wav",
"asr": {
"provider": "tencent",
"language": "zh-CN",
"appId": "app_id",
"secretId": "your_secret_id",
"secretKey": "your_secret_key",
"modelType": "16k_zh",
"bufferSize": 4000,
"samplerate": 16000,
"endpoint": "https://api.rustpbx.com",
"extra": {
"custom_param": "value"
},
"startWhenAnswer": true
},
"autoHangup": true,
"sip": {
"username": "transfer_user",
"password": "transfer_password",
"realm": "rustpbx.com",
"headers": {
"X-Transfer-Source": "pbx"
}
}
}
Fields:
- denoise (boolean, optional): Enable noise reduction during transfer
- timeout (number, optional): Transfer timeout in seconds
- moh (string, optional): Music on hold URL to play during transfer
- asr (TranscriptionOption, optional): Automatic Speech Recognition configuration
- autoHangup (boolean, optional): Automatically hang up after transfer completion
- sip (SipOption, optional): SIP configuration for the transfer
Events are received as JSON messages from the server. All timestamps are in milliseconds. Each event contains an event field that indicates the event type, and most events include a trackId field to identify the associated audio track.
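Since every event carries an event field, a client can route incoming messages through a small dispatcher. The sketch below shows one possible shape; createEventRouter is a hypothetical helper, and the handler keys are the event names documented in this section.

```javascript
// Create a router that parses a JSON event and calls the matching handler.
// handlers maps event names (e.g. "asrFinal") to functions;
// fallback receives any event that has no dedicated handler.
function createEventRouter(handlers, fallback = () => {}) {
  return function route(message) {
    const event = JSON.parse(message);
    const known = Object.prototype.hasOwnProperty.call(handlers, event.event);
    const handler = known ? handlers[event.event] : fallback;
    return handler(event);
  };
}

// Example handlers: timestamps are milliseconds since the Unix epoch,
// so they can be passed directly to the Date constructor.
const route = createEventRouter({
  asrFinal: (e) => `final[${e.index}]: ${e.text}`,
  hangup: (e) => `hangup at ${new Date(e.timestamp).toISOString()}`,
});
```

On a live connection, text frames would be passed to route(msg.data), while binary frames (audio) would be handled separately.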
Triggered when: An incoming call is received (SIP calls only).
Fields:
- event (string): Always "incoming"
- trackId (string): Unique identifier for the audio track. Used to identify which track generated this event.
- timestamp (number): Event timestamp in milliseconds since Unix epoch
- caller (string): Caller's SIP URI or phone number
- callee (string): Callee's SIP URI or phone number
- sdp (string): SDP offer from the caller
{
"event": "incoming",
"trackId": "track-abc123",
"timestamp": 1640995200000,
"caller": "sip:alice@rustpbx.com",
"callee": "sip:bob@rustpbx.com",
"sdp": "v=0\r\no=- 1234567890 2 IN IP4 127.0.0.1\r\n..."
}
Triggered when: Call is answered and SDP negotiation is complete.
Fields:
- event (string): Always "answer"
- trackId (string): Unique identifier for the audio track.
- timestamp (number): Event timestamp in milliseconds since Unix epoch
- sdp (string): SDP answer from the server
{
"event": "answer",
"trackId": "track-abc123",
"timestamp": 1640995200000,
"sdp": "v=0\r\no=- 1234567890 2 IN IP4 127.0.0.1\r\n..."
}
Triggered when: Call is rejected.
Fields:
- event (string): Always "reject"
- trackId (string): Unique identifier for the audio track.
- timestamp (number): Event timestamp in milliseconds since Unix epoch
- reason (string): Reason for rejection
- code (number, optional): SIP response code
{
"event": "reject",
"trackId": "track-abc123",
"timestamp": 1640995200000,
"reason": "Busy",
"code": 486
}
Triggered when: Call is ringing (SIP calls only).
Fields:
- event (string): Always "ringing"
- trackId (string): Unique identifier for the audio track.
- timestamp (number): Event timestamp in milliseconds since Unix epoch
- earlyMedia (boolean): Whether early media is available
{
"event": "ringing",
"trackId": "track-abc123",
"timestamp": 1640995200000,
"earlyMedia": false
}
Triggered when: Call is ended.
Fields:
- event (string): Always "hangup"
- timestamp (number): Event timestamp in milliseconds since Unix epoch
- reason (string, optional): Reason for hangup
- initiator (string, optional): Who initiated the hangup (user, system, etc.)
- startTime (string): ISO 8601 timestamp when call started
- hangupTime (string): ISO 8601 timestamp when call ended
- answerTime (string, optional): ISO 8601 timestamp when call was answered
- ringingTime (string, optional): ISO 8601 timestamp when call started ringing
- from (Attendee, optional): Information about the caller
- to (Attendee, optional): Information about the callee
- extra (object, optional): Additional call metadata
{
"event": "hangup",
"timestamp": 1640995200000,
"reason": "user_requested",
"initiator": "user",
"startTime": "2024-01-01T12:00:00Z",
"hangupTime": "2024-01-01T12:05:30Z",
"answerTime": "2024-01-01T12:00:05Z",
"ringingTime": "2024-01-01T12:00:02Z",
"from": {
"username": "alice",
"realm": "rustpbx.com",
"source": "sip:alice@rustpbx.com"
},
"to": {
"username": "bob",
"realm": "rustpbx.com",
"source": "sip:bob@rustpbx.com"
},
"extra": {
"call_quality": "good",
"network_type": "wifi"
}
}
Triggered when: Voice activity detection detects speech start.
Fields:
- event (string): Always "speaking"
- trackId (string): Unique identifier for the audio track.
- timestamp (number): Event timestamp in milliseconds since Unix epoch
- startTime (number): When speech started in milliseconds since Unix epoch
- isFiller (boolean, optional): Whether this speech segment is a filler word
- confidence (number, optional): Confidence score of the voice detection (0.0–1.0)
{
"event": "speaking",
"trackId": "track-abc123",
"timestamp": 1640995200000,
"startTime": 1640995200000,
"isFiller": false,
"confidence": 0.95
}
Triggered when: Voice activity detection detects silence.
Fields:
- event (string): Always "silence"
- trackId (string): Unique identifier for the audio track.
- timestamp (number): Event timestamp in milliseconds since Unix epoch
- startTime (number): When silence started in milliseconds since Unix epoch
- duration (number): Duration of silence in milliseconds
{
"event": "silence",
"trackId": "track-abc123",
"timestamp": 1640995200000,
"startTime": 1640995195000,
"duration": 5000
}
Triggered when: Answering machine detection identifies an automated response.
Fields:
- event (string): Always "answerMachineDetection"
- timestamp (number): Event timestamp in milliseconds since Unix epoch
- startTime (number): Detection window start time in milliseconds since Unix epoch
- endTime (number): Detection window end time in milliseconds since Unix epoch
- text (string): Detected automated message text
{
"event": "answerMachineDetection",
"timestamp": 1640995200000,
"startTime": 1640995200000,
"endTime": 1640995205000,
"text": "Hello, you have reached ABC Company. Please leave a message..."
}
Triggered when: End of utterance detection determines that the user has finished speaking.
Fields:
- event (string): Always "eou"
- trackId (string): Unique identifier for the audio track.
- timestamp (number): Event timestamp in milliseconds since Unix epoch
- completed (boolean): Whether the utterance was completed normally
- interruptPoint (string, optional): Position in TTS subtitle text where the interruption occurred
{
"event": "eou",
"trackId": "track-abc123",
"timestamp": 1640995200000,
"completed": true,
"interruptPoint": null
}
Triggered when: ASR provides final transcription result.
Fields:
- event (string): Always "asrFinal"
- trackId (string): Unique identifier for the audio track.
- timestamp (number): Event timestamp in milliseconds since Unix epoch
- index (number): ASR result sequence number
- startTime (number, optional): Start time of speech in milliseconds since Unix epoch
- endTime (number, optional): End time of speech in milliseconds since Unix epoch
- text (string): Final transcribed text
- isFiller (boolean, optional): Whether this result is a filler word
- confidence (number, optional): Confidence score (0.0–1.0)
- taskId (string, optional): ASR provider task identifier
{
"event": "asrFinal",
"trackId": "track-abc123",
"timestamp": 1640995200000,
"index": 1,
"startTime": 1640995200000,
"endTime": 1640995205000,
"text": "Hello, how can I help you today?",
"isFiller": false,
"confidence": 0.98,
"taskId": "asr-task-001"
}
Triggered when: ASR provides partial transcription result (streaming mode).
Fields:
- event (string): Always "asrDelta"
- trackId (string): Unique identifier for the audio track.
- index (number): ASR result sequence number
- timestamp (number): Event timestamp in milliseconds since Unix epoch
- startTime (number, optional): Start time of speech in milliseconds since Unix epoch
- endTime (number, optional): End time of speech in milliseconds since Unix epoch
- text (string): Partial transcribed text
- isFiller (boolean, optional): Whether this result is a filler word
- confidence (number, optional): Confidence score (0.0–1.0)
- taskId (string, optional): ASR provider task identifier
{
"event": "asrDelta",
"trackId": "track-abc123",
"index": 1,
"timestamp": 1640995200000,
"startTime": 1640995200000,
"endTime": 1640995203000,
"text": "Hello, how can",
"isFiller": false,
"confidence": 0.85
}
Triggered when: Audio track starts (TTS, file playback, etc.).
Fields:
- event (string): Always "trackStart"
- trackId (string): Unique identifier for the audio track.
- timestamp (number): Event timestamp in milliseconds since Unix epoch
- playId (string, optional): For TTS command, this is the playId from the TTS command. For Play command, this is the URL from the Play command.
{
"event": "trackStart",
"trackId": "track-tts-456",
"timestamp": 1640995200000,
"playId": "llm-001"
}
Triggered when: Audio track ends (TTS finished, file playback finished, etc.).
Fields:
- event (string): Always "trackEnd"
- trackId (string): Unique identifier for the audio track.
- timestamp (number): Event timestamp in milliseconds since Unix epoch
- duration (number): Duration of track in milliseconds
- ssrc (number): RTP Synchronization Source identifier
- playId (string, optional): For TTS command, this is the playId from the TTS command. For Play command, this is the URL from the Play command.
{
"event": "trackEnd",
"trackId": "track-tts-456",
"timestamp": 1640995230000,
"duration": 30000,
"ssrc": 1234567890,
"playId": "llm-001"
}
Triggered when: Current playback is interrupted by user input or another command.
Fields:
- event (string): Always "interruption"
- trackId (string): Unique identifier for the audio track.
- timestamp (number): Event timestamp in milliseconds since Unix epoch
- playId (string, optional): For TTS command, this is the playId from the TTS command. For Play command, this is the URL from the Play command.
- subtitle (string, optional): Current TTS text being played when interrupted
- position (number, optional): Word index position in the subtitle when interrupted
- totalDuration (number): Total duration of the TTS content in milliseconds
- current (number): Elapsed time since start of TTS when interrupted, in milliseconds
{
"event": "interruption",
"trackId": "track-tts-456",
"timestamp": 1640995215000,
"playId": "llm-001",
"subtitle": "Hello, this is a long message that was interrupted",
"position": 5,
"totalDuration": 30000,
"current": 15000
}
Triggered when: DTMF tone is detected.
Fields:
- event (string): Always "dtmf"
- trackId (string): Unique identifier for the audio track.
- timestamp (number): Event timestamp in milliseconds since Unix epoch
- digit (string): DTMF digit (0-9, *, #, A-D)
{
"event": "dtmf",
"trackId": "track-abc123",
"timestamp": 1640995200000,
"digit": "1"
}
Triggered when: Server sends a periodic heartbeat to keep the connection alive.
Fields:
- event (string): Always "ping"
- timestamp (number): Event timestamp in milliseconds since Unix epoch
- payload (string, optional): ISO 8601 timestamp of the ping
The client should respond with a WebSocket Pong frame (this is handled automatically by most WebSocket clients). The server sends a Ping every ping_interval seconds (default: 20). Set ping_interval=0 to disable.
{
"event": "ping",
"timestamp": 1640995200000,
"payload": "2024-01-01T12:00:00Z"
}
Triggered when: A call is placed on hold or taken off hold.
Fields:
- event (string): Always "hold"
- trackId (string): Unique identifier for the audio track.
- timestamp (number): Event timestamp in milliseconds since Unix epoch
- onHold (boolean): true if the call is now on hold, false if taken off hold
{
"event": "hold",
"trackId": "track-abc123",
"timestamp": 1640995200000,
"onHold": true
}
Triggered when: Audio inactivity timeout expires (no audio activity detected for inactivityTimeout seconds).
Fields:
- event (string): Always "inactivity"
- trackId (string): Unique identifier for the audio track.
- timestamp (number): Event timestamp in milliseconds since Unix epoch
{
"event": "inactivity",
"trackId": "track-abc123",
"timestamp": 1640995200000
}
Triggered when: A function/tool call is made by the AI agent (Playbook mode).
Fields:
- event (string): Always "functionCall"
- trackId (string): Unique identifier for the audio track.
- callId (string): Unique identifier for this function call
- name (string): Name of the function being called
- arguments (string): JSON-encoded arguments string for the function
- timestamp (number): Event timestamp in milliseconds since Unix epoch
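Because arguments is a JSON-encoded string rather than an object, a client has to decode it before invoking a handler. The sketch below shows one way to do that. The handlers registry, the handleFunctionCall name, and the get_weather handler body are all illustrative assumptions, and how a result is returned to the agent is not covered in this section.

```javascript
// Hypothetical handler registry; names and return values are illustrative only.
const handlers = {
  get_weather: (args) => `Weather requested for ${args.city}`,
};

// Decode a functionCall event and route it to the matching handler.
function handleFunctionCall(event, registry = handlers) {
  if (event.event !== 'functionCall') {
    throw new Error(`expected functionCall, got ${event.event}`);
  }
  let args;
  try {
    // arguments is itself JSON text, so it needs its own parse step.
    args = JSON.parse(event.arguments);
  } catch (err) {
    return { callId: event.callId, ok: false, error: 'invalid arguments JSON' };
  }
  // Only accept names defined directly on the registry (not inherited ones).
  const known = Object.prototype.hasOwnProperty.call(registry, event.name);
  if (!known || typeof registry[event.name] !== 'function') {
    return { callId: event.callId, ok: false, error: `unknown function ${event.name}` };
  }
  return { callId: event.callId, ok: true, result: registry[event.name](args) };
}
```

Keeping callId in the returned object lets the caller correlate a result with the originating event.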
{
"event": "functionCall",
"trackId": "track-abc123",
"callId": "call-uuid-123",
"name": "get_weather",
"arguments": "{\"city\": \"Beijing\"}",
"timestamp": 1640995200000
}
Triggered when: Performance metrics are available.
Fields:
- event (string): Always "metrics"
- timestamp (number): Event timestamp in milliseconds since Unix epoch
- key (string): Metric key (e.g., "ttfb.asr.tencent", "completed.asr.tencent")
- duration (number): Duration in milliseconds
- data (object): Additional metric data
{
"event": "metrics",
"timestamp": 1640995200000,
"key": "ttfb.asr.tencent",
"duration": 150,
"data": {
"index": 1,
"provider": "tencent",
"model": "16k_zh"
}
}
Triggered when: An error occurs during processing.
Fields:
- event (string): Always "error"
- trackId (string): Unique identifier for the audio track.
- timestamp (number): Event timestamp in milliseconds since Unix epoch
- sender (string): Component that generated the error (asr, tts, media, etc.)
- error (string): Error message description
- code (number, optional): Error code
{
"event": "error",
"trackId": "track-abc123",
"timestamp": 1640995200000,
"sender": "asr",
"error": "Connection timeout to ASR service",
"code": 408
}
Triggered when: A conversation history entry is added.
Fields:
- event (string): Always "addHistory"
- sender (string, optional): Component that added the history entry
- timestamp (number): Event timestamp in milliseconds since Unix epoch
- speaker (string): Speaker identifier (user, assistant, system, etc.)
- text (string): Conversation text
{
"event": "addHistory",
"sender": "system",
"timestamp": 1640995200000,
"speaker": "user",
"text": "Hello, I need help with my account"
}
Triggered when: Binary audio data is sent (WebSocket calls or calls with subscribe: true).
Fields:
- event (string): Always "binary"
- trackId (string): Unique identifier for the audio track. For subscribed SIP/WebRTC calls, the Caller uses the server-side track ID and the Callee uses the session ID.
- timestamp (number): Event timestamp in milliseconds since Unix epoch
- data (array): Binary audio data bytes. In subscribe mode, the first byte is the track index (0 for Caller, 1 for Callee), followed by the original PCM data.
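In subscribe mode the leading track-index byte has to be stripped before the audio is used. The sketch below does that split; splitSubscribedAudio is a hypothetical helper name. The PCM sample format is not stated here, so the audio is returned as raw bytes without further decoding.

```javascript
// Split a subscribe-mode payload into its track index and audio bytes.
// Assumes `data` holds the byte values from a "binary" event.
function splitSubscribedAudio(data) {
  if (!Array.isArray(data) && !(data instanceof Uint8Array)) {
    throw new TypeError('data must be an array of bytes');
  }
  if (data.length < 1) {
    throw new RangeError('payload is empty; expected a track index byte');
  }
  const bytes = Uint8Array.from(data);
  const trackIndex = bytes[0];
  if (trackIndex !== 0 && trackIndex !== 1) {
    throw new RangeError(`unexpected track index ${trackIndex}`);
  }
  return {
    trackIndex,
    role: trackIndex === 0 ? 'caller' : 'callee',
    // Everything after the first byte is the original PCM data, left undecoded.
    pcm: bytes.subarray(1),
  };
}
```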
{
"event": "binary",
"trackId": "track-abc123",
"timestamp": 1640995200000,
"data": [/* binary audio data array */]
}
Triggered when: Custom or extension events are generated.
Fields:
- event (string): Always "other"
- trackId (string): Unique identifier for the audio track.
- timestamp (number): Event timestamp in milliseconds since Unix epoch
- sender (string): Component that generated the event
- extra (object, optional): Additional event data as key-value pairs
{
"event": "other",
"trackId": "track-abc123",
"timestamp": 1640995200000,
"sender": "custom_plugin",
"extra": {
"custom_field": "custom_value",
"plugin_version": "1.0.0"
}
}
The Attendee object appears in call events and contains participant information:
{
"username": "alice",
"realm": "rustpbx.com",
"source": "sip:alice@rustpbx.com"
}
Fields:
- username (string): Username portion of the SIP URI
- realm (string): Domain/realm portion of the SIP URI
- source (string): Full SIP URI or phone number
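Every event documented above carries an event field naming its type, so a client can route incoming messages with a small lookup table. The sketch below assumes events arrive as JSON text (or as already-parsed objects). createEventRouter is a hypothetical helper, not part of Active Call.

```javascript
// Build a router that dispatches events to handlers keyed by event type.
function createEventRouter(handlers, onUnknown = () => {}) {
  return function routeEvent(raw) {
    // Accept either JSON text (as received on the socket) or a parsed object.
    const event = typeof raw === 'string' ? JSON.parse(raw) : raw;
    const known = Object.prototype.hasOwnProperty.call(handlers, event.event);
    if (known) {
      handlers[event.event](event);
      return true;
    }
    // Unrecognized types are passed to the fallback instead of being dropped.
    onUnknown(event);
    return false;
  };
}

// Example wiring (not executed here):
//   const routeEvent = createEventRouter({
//     dtmf: (e) => console.log('digit', e.digit),
//     error: (e) => console.error(e.sender, e.error),
//   });
//   ws.onmessage = (msg) => routeEvent(msg.data);
```

Routing unknown types to a fallback keeps the client tolerant of event types it does not handle yet.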
Endpoint: GET /list
Description: Returns a list of all currently active calls.
Parameters: None
Response:
{
"active_calls": [
{
"id": "s.session-id",
"callType": "webrtc",
"cs.option": { ... },
"ringTime": "2024-01-01T12:00:02Z",
"startTime": "2024-01-01T12:00:05Z"
}
]
}
Usage:
curl http://localhost:8080/list
Endpoint: GET /kill/{id}
Description: Terminates a specific active call by its session ID.
Parameters:
- id (path parameter, string): The session ID of the call to terminate.
Response:
{ "status": "killed", "id": "s.session123" }
If the session is not found:
{ "status": "not_found", "id": "s.session123" }
Usage:
curl http://localhost:8080/kill/s.session123
Endpoint: POST /command/{id}
Description: Sends a command to a specific active call by its session ID. Accepts the same command objects as the WebSocket command interface.
Parameters:
- id (path parameter, string): The session ID of the target call.
Request Body: A command object (see WebSocket Commands for the full list).
{ "command": "tts", "text": "Hello, how can I help you?" }
Response:
{ "status": "sent", "id": "s.session123" }
If the session is not found:
{ "status": "not_found", "id": "s.session123" }
Usage:
curl -X POST http://localhost:8080/command/s.session123 \
-H "Content-Type: application/json" \
-d '{"command": "hangup", "reason": "normal", "initiator": "server"}'
Endpoint: GET /iceservers
Description: Returns ICE servers configuration for WebRTC connections.
Parameters: None
Response:
[
{
"urls": ["stun:stun.l.google.com:19302"],
"username": null,
"credential": null
},
{
"urls": ["turn:restsend.com:3478"],
"username": "username",
"credential": "password"
}
]
Usage:
curl http://localhost:8080/iceservers
Endpoint: GET /events/{id}
Description: Opens a Server-Sent Events (SSE) stream for a specific active call, delivering real-time session events and commands as they occur.
Path Parameters:
| Parameter | Type | Description |
|---|---|---|
| id | string | Active call/track ID |
Response: text/event-stream;charset=utf-8
The stream emits two SSE event types:
| SSE Event | Data |
|---|---|
| event | JSON-serialized SessionEvent (same as WebSocket events) |
| command | JSON-serialized command sent to the session |
The stream closes when the call ends (channel closed). Lagged messages are silently skipped.
Errors:
| Status | Description |
|---|---|
| 404 | No active call found for given id |
Usage:
curl -N http://localhost:8080/events/{id}
Example output:
event: event
data: {"event":"answer","trackId":"track-abc","timestamp":1700000000}

event: command
data: {"command":"tts","text":"Hello, how can I help you?"}
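A client that reads this stream without a built-in SSE implementation has to split it into events itself. The sketch below parses complete SSE text into typed records, relying on the SSE convention that a blank line ends each event. parseSseBlocks is a hypothetical helper, and it assumes the text contains only whole events (no buffering of partial chunks).

```javascript
// Parse SSE text into { type, data } records.
// Assumes every data payload is JSON, as described for this endpoint.
function parseSseBlocks(text) {
  const records = [];
  for (const block of text.split(/\r?\n\r?\n/)) {
    let type = 'message'; // SSE default when no "event:" line is present
    const dataLines = [];
    for (const line of block.split(/\r?\n/)) {
      if (line.startsWith('event:')) {
        type = line.slice('event:'.length).trim();
      } else if (line.startsWith('data:')) {
        // A single leading space after the colon is part of the framing.
        dataLines.push(line.slice('data:'.length).replace(/^ /, ''));
      }
    }
    if (dataLines.length > 0) {
      records.push({ type, data: JSON.parse(dataLines.join('\n')) });
    }
  }
  return records;
}
```

In a browser, the built-in EventSource handles this framing automatically; listeners registered with addEventListener('event', ...) and addEventListener('command', ...) receive the two event types separately.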
Endpoint: GET /api/playbooks
Description: Returns a list of all available playbook files in config/playbook/.
Response:
[
{ "name": "demo.md", "updated": "2024-01-01T12:00:00Z" },
{ "name": "simple-demo-en.md", "updated": "2024-01-02T08:00:00Z" }
]
Usage:
curl http://localhost:8080/api/playbooks
Endpoint: GET /api/playbooks/{name}
Description: Returns the content of a specific playbook file.
Parameters:
- name (path parameter, string): Playbook filename (e.g., demo.md)
Response: Plain text content of the playbook file.
Usage:
curl http://localhost:8080/api/playbooks/demo.md
Endpoint: POST /api/playbooks/{name}
Description: Creates or updates a playbook file.
Parameters:
- name (path parameter, string): Playbook filename (e.g., my-playbook.md)
- Body: Plain text playbook content
Response: 200 OK on success.
Usage:
curl -X POST http://localhost:8080/api/playbooks/my-playbook.md \
-H "Content-Type: text/plain" \
--data-binary @my-playbook.md
Endpoint: POST /api/playbook/run
Description: Associates a playbook with a future WebSocket session. When the session connects, the playbook will automatically be loaded.
Request Body (JSON):
{
"playbook": "demo.md",
"type": "webrtc",
"to": "sip:bob@example.com"
}
Or with inline content:
{
"content": "---\nname: inline-demo\n...",
"type": "webrtc"
}
Fields:
- playbook (string): Playbook filename to load from config/playbook/
- content (string): Inline YAML playbook content (alternative to playbook)
- type (string, optional): Call type hint
- to (string, optional): Callee address
Response:
{ "session_id": "s.uuid-here" }
Use the returned session_id as the id parameter when connecting the WebSocket.
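The two-step flow (reserve a session, then connect with its ID) can be sketched as follows. The helper names buildCallUrl and startPlaybookSession are hypothetical, and connecting to /call/webrtc is an assumption for illustration; use whichever call endpoint matches your setup. The sketch needs a runtime with a global fetch (browsers, Node 18+).

```javascript
// Build the WebSocket URL for a reserved session.
function buildCallUrl(wsBase, path, sessionId) {
  const url = new URL(path, wsBase);
  url.searchParams.set('id', sessionId);
  return url.toString();
}

// Reserve a session for a playbook, then return the URL to connect to.
async function startPlaybookSession(httpBase, wsBase, playbook) {
  const res = await fetch(new URL('/api/playbook/run', httpBase), {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ playbook }),
  });
  if (!res.ok) {
    throw new Error(`playbook run failed: HTTP ${res.status}`);
  }
  const { session_id: sessionId } = await res.json();
  // Assumption: a WebRTC call; swap the path for /call or /call/sip as needed.
  return buildCallUrl(wsBase, '/call/webrtc', sessionId);
}

// Example (not executed here):
//   const url = await startPlaybookSession(
//     'http://localhost:8080', 'ws://localhost:8080', 'demo.md');
//   const ws = new WebSocket(url);
```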
Usage:
curl -X POST http://localhost:8080/api/playbook/run \
-H "Content-Type: application/json" \
-d '{"playbook": "demo.md"}'
Endpoint: GET /api/records
Description: Returns a list of call event records (.events.jsonl files in the recorder directory).
Response:
[
{ "id": "s.session-uuid", "date": "2024-01-01T12:00:00Z", "duration": "0s", "status": "completed" }
]
Usage:
curl http://localhost:8080/api/records
All endpoints return appropriate HTTP status codes:
- 200 OK: Success
- 400 Bad Request: Invalid parameters
- 404 Not Found: Resource not found
- 500 Internal Server Error: Server error
WebSocket connections may be closed with specific close codes indicating the reason for disconnection.
- All WebSocket endpoints support real-time bidirectional communication
- Call sessions are automatically cleaned up when the WebSocket connection is closed
- Event dumping to file can be disabled by setting the dump_events=false query parameter
- ICE servers are automatically configured based on server configuration
- Audio codecs are automatically negotiated based on capabilities
- VAD (Voice Activity Detection) events are sent for speech detection
- ASR (Automatic Speech Recognition) provides real-time transcription
- TTS (Text-to-Speech) supports streaming synthesis
- All event timestamp fields are in milliseconds since the Unix epoch
- trackId is used to identify which audio track generated an event
- playId prevents interruption of previous TTS playback when the same ID is used. For TTS commands, playId is the specified identifier; for Play commands, playId is the URL
- Session IDs generated by the server are prefixed with s. (WebSocket sessions) or c. (CLI outbound calls)
- The ping_interval parameter controls heartbeat frequency (default 20s). Set to 0 to disable
- autoHangup automatically ends the call after TTS/playback completion