Active Call API Documentation

This document describes the WebSocket and REST API endpoints provided by Active Call.

Base URL

All API endpoints are relative to the server base URL.

Authentication

Most endpoints require WebSocket upgrade for real-time communication.

WebSocket Call Endpoints

The following three endpoints establish WebSocket connections for different voice communication protocols:

1. WebSocket Call Handler

Endpoint: GET /call

Description: Establishes a WebSocket connection for voice call handling with audio stream transmitted via WebSocket.

Parameters:

id (optional, string): Session ID. If not provided, a new UUID will be generated (prefixed with s.).
dump_events (optional, boolean): Enable event dumping to file. Default: true.
ping_interval (optional, number): Interval in seconds to send Ping events. Default: 20. Set to 0 to disable.
server_side_track (optional, string): Override server-side track ID.

Response: WebSocket connection upgrade

Usage:

const ws = new WebSocket('ws://localhost:8080/call?id=session123&dump_events=true&ping_interval=20');

2. WebRTC Call Handler

Endpoint: GET /call/webrtc

Description: Establishes a WebSocket connection for WebRTC call handling with audio stream transmitted via WebRTC RTP.

Note: WebRTC requires a Secure Context. Ensure you are accessing your web client via HTTPS or 127.0.0.1, otherwise the browser will not enable WebRTC functionality.

Parameters:

id (optional, string): Session ID. If not provided, a new UUID will be generated (prefixed with s.).
dump_events (optional, boolean): Enable event dumping to file. Default: true.
ping_interval (optional, number): Interval in seconds to send Ping events. Default: 20. Set to 0 to disable.
server_side_track (optional, string): Override server-side track ID.

Response: WebSocket connection upgrade

Usage:

const ws = new WebSocket('ws://localhost:8080/call/webrtc?id=session123&dump_events=true');

3. SIP Call Handler

Endpoint: GET /call/sip

Description: Establishes a WebSocket connection for SIP call handling with audio stream transmitted via SIP/RTP.

Parameters:

id (optional, string): Session ID. If not provided, a new UUID will be generated (prefixed with s.).
dump_events (optional, boolean): Enable event dumping to file. Default: true.
ping_interval (optional, number): Interval in seconds to send Ping events. Default: 20. Set to 0 to disable.
server_side_track (optional, string): Override server-side track ID.

Response: WebSocket connection upgrade

Usage:

const ws = new WebSocket('ws://localhost:8080/call/sip?id=session123&dump_events=true');

WebSocket Communication Flow

sequenceDiagram
    participant Client
    participant RustPBX
    participant MediaEngine
    participant ASR/TTS

    Client->>RustPBX: WebSocket Connect
    RustPBX->>Client: Connection Established
    
    Client->>RustPBX: Send Command (JSON)
    RustPBX->>MediaEngine: Process Command
    MediaEngine->>ASR/TTS: Audio Processing
    
    ASR/TTS->>MediaEngine: Processing Results
    MediaEngine->>RustPBX: Generate Events
    RustPBX->>Client: Send Events (JSON)
    
    Note over Client,RustPBX: Audio Stream Flow
    Client->>RustPBX: Audio Data (Binary/WebRTC/SIP)
    RustPBX->>MediaEngine: Process Audio
    MediaEngine->>Client: Audio Response

WebRTC Call Flow

sequenceDiagram
    participant Client
    participant RustPBX
    participant WebRTC Engine
    participant ICE Servers

    Client->>RustPBX: WebSocket Connect (/call/webrtc)
    RustPBX->>Client: Connection Established
    
    Client->>RustPBX: Send Invite Command with SDP Offer
    RustPBX->>WebRTC Engine: Create PeerConnection
    RustPBX->>ICE Servers: Get ICE Servers
    WebRTC Engine->>RustPBX: Generate SDP Answer
    
    RustPBX->>Client: Send Answer Event with SDP
    Client->>RustPBX: Set Remote Description
    
    Note over Client,RustPBX: WebRTC Media Flow
    Client->>RustPBX: RTP Audio Packets (PCM/PCMA/PCMU/G722)
    RustPBX->>Client: RTP Audio Response
    
    Client->>RustPBX: Send TTS/Play Commands
    RustPBX->>Client: Send Audio Events

SIP Call Flow

sequenceDiagram
    participant Client
    participant RustPBX
    participant SIP UA
    participant SIP Server

    Client->>RustPBX: WebSocket Connect (/call/sip)
    RustPBX->>Client: Connection Established
    
    Client->>RustPBX: Send Invite Command with Caller/Callee
    RustPBX->>SIP UA: Create SIP Dialog
    SIP UA->>SIP Server: Send INVITE Request
    SIP Server->>SIP UA: Send 200 OK with SDP Answer
    
    RustPBX->>Client: Send Answer Event with SDP
    Client->>RustPBX: Set Remote Description
    
    Note over SIP UA,SIP Server: SIP/RTP Media Flow
    SIP UA->>SIP Server: RTP Audio Packets (PCM/PCMA/PCMU/G722)
    SIP Server->>SIP UA: RTP Audio Response
    
    Client->>RustPBX: Send TTS/Play Commands
    RustPBX->>Client: Send Audio Events

Voice Stream Communication Methods

1. WebSocket Audio Stream (`/call`)

Audio Format: PCM, PCMA, PCMU, G722
Transport: WebSocket binary messages
Usage: Direct audio streaming over WebSocket connection
Advantages: Simple, low latency, works through firewalls

2. WebRTC Audio Stream (`/call/webrtc`)

Audio Format: PCM, PCMA, PCMU, G722
Transport: WebRTC RTP over UDP
Usage: Browser-compatible, NAT traversal
Advantages: Browser native support, adaptive bitrate

3. SIP Audio Stream (`/call/sip`)

Audio Format: PCM, PCMA, PCMU, G722
Transport: SIP/RTP over UDP
Usage: Traditional telephony integration
Advantages: Standard telephony protocol, PBX integration

MediaPass Feature

MediaPass allows for bidirectional audio streaming between RustPBX and an external WebSocket server. This feature enables another side to receive and send audio streams during a call.

MediaPass Configuration

The mediaPass option in CallOption configures the WebSocket connection for audio streaming:

{
  "mediaPass": {
    "url": "ws://localhost:9090/media",
    "inputSampleRate": 16000,
    "outputSampleRate": 16000,
    "packetSize": 2560
  }
}

MediaPass Fields:

url (string): WebSocket URL to connect to for media streaming
inputSampleRate (number): Sample rate of audio received from the WebSocket server (also the sample rate of the track)
outputSampleRate (number): Sample rate of audio sent to the WebSocket server
packetSize (number, optional): Packet size sent to WebSocket server, default is 2560 bytes
ptime (numer, optional): if ptime is set, server will buffering the input audio, and playing it with ptime period

MediaPass Example Usage

Example 1: Basic MediaPass Setup

{
  "command": "invite",
  "option": {
    "caller": "sip:alice@rustpbx.com",
    "callee": "sip:bob@rustpbx.com",
    "codec": "g722",
    "mediaPass": {
      "url": "ws://ai-server.rustpbx.com:9090/audio",
      "inputSampleRate": 16000,
      "outputSampleRate": 16000,
      "packetSize": 1280
    },
    "asr": {
      "provider": "tencent",
      "language": "zh-CN",
      "secretId": "your_secret_id",
      "secretKey": "your_secret_key",
      "modelType": "16k_zh",
      "samplerate": 16000
    }
  }
}

Example 2: MediaPass with AI Voice Processing

{
  "command": "accept",
  "option": {
    "caller": "sip:caller@rustpbx.com",
    "callee": "sip:agent@rustpbx.com",
    "codec": "pcmu",
    "denoise": true,
    "mediaPass": {
      "url": "ws://ai-voice-processor.rustpbx.com:8090/stream",
      "inputSampleRate": 8000,
      "outputSampleRate": 16000,
      "packetSize": 2560
    },
    "vad": {
      "type": "webrtc",
      "samplerate": 16000,
      "speechPadding": 250,
      "silencePadding": 100,
      "voiceThreshold": 0.5
    },
    "recorder": {
      "recorderFile": "/recordings/call_with_ai.wav",
      "samplerate": 16000,
      "ptime": 200
    }
  }
}

MediaPass WebSocket Protocol

The external WebSocket server should handle binary audio data in PCM format:

Receiving Audio: RustPBX sends PCM audio data as binary WebSocket messages at the configured outputSampleRate
Sending Audio: The WebSocket server can send PCM audio data back to RustPBX at the configured inputSampleRate
Audio Format: Raw PCM data, signed 16-bit little-endian
Packet Size: Configurable via packetSize parameter (default: 2560 bytes)

MediaPass Flow Diagram

sequenceDiagram
    participant Caller
    participant RustPBX
    participant AI_Server
    participant Callee

    Caller->>RustPBX: Audio Stream
    RustPBX->>AI_Server: PCM Audio (WebSocket)
    AI_Server->>AI_Server: Process Audio (ASR/AI/TTS)
    AI_Server->>RustPBX: Processed Audio (WebSocket)
    RustPBX->>Callee: Processed Audio Stream
    
    Note over Caller,Callee: Bidirectional AI-enhanced communication

WebSocket Commands

Commands are sent as JSON messages through the WebSocket connection. All timestamps are in milliseconds. Each command follows a common structure with the command field indicating the operation type.

Core Call Management Commands

Invite Command

Purpose: Initiates a new outbound call.

Fields:

command (string): Always "invite"
option (CallOption): Call configuration parameters

{
  "command": "invite",
  "option": {
    "caller": "sip:alice@rustpbx.com",
    "callee": "sip:bob@rustpbx.com",
    "offer": "v=0\r\no=- 1234567890 2 IN IP4 127.0.0.1\r\n...",
    "codec": "g722",
    "denoise": true,
    "asr": {
      "provider": "tencent",
      "language": "zh-CN",
      "appId": "app_id",
      "secretId": "your_secret_id",
      "secretKey": "your_secret_key",
      "modelType": "16k_zh",
      "samplerate": 16000,
      "startWhenAnswer": true
    },
    "tts": {
      "provider": "tencent",
      "speaker": "xiaoyan",
      "volume": 5,
      "speed": 1.0,
      "emotion": "neutral"
    }
  }
}

Accept Command

Purpose: Accepts an incoming call.

Fields:

command (string): Always "accept"
option (CallOption): Call configuration parameters

{
  "command": "accept",
  "option": {
    "caller": "sip:alice@rustpbx.com",
    "callee": "sip:bob@rustpbx.com",
    "codec": "g722",
    "recorder": {
      "recorderFile": "/path/to/recording.wav",
      "samplerate": 16000,
      "ptime": 200
    }
  }
}

Reject Command

Purpose: Rejects an incoming call.

Fields:

command (string): Always "reject"
reason (string): Reason for rejection
code (number, optional): SIP response code

{
  "command": "reject",
  "reason": "Busy",
  "code": 486
}

Ringing Command

Purpose: Sends ringing response for incoming call.

Note: If a recorder is set in the ringing command, the recorder option in the subsequent accept command will not override the recorder settings from the ringing phase.

Fields:

command (string): Always "ringing"
recorder (RecorderOption, optional): Call recording configuration
- recorderFile (string): Path to the recording file
- samplerate (number): Recording sample rate in Hz (default: 16000)
- ptime (number): Packet time in milliseconds (default: 200)
earlyMedia (boolean): Enable early media during ringing
ringtone (string, optional): Custom ringtone URL

{
  "command": "ringing",
  "recorder": {
      "recorderFile": "/path/to/recording.wav",
      "samplerate": 16000,
      "ptime": 200
  },
  "earlyMedia": true,
  "ringtone": "http://rustpbx.com/ringtone.wav"
}

Media Control Commands

TTS Command

Purpose: Converts text to speech and plays audio.

Fields:

command (string): Always "tts"
text (string): Text to synthesize
speaker (string, optional): Speaker voice name
playId (string, optional): Unique identifier for this TTS session. If the same playId is used, it will not interrupt the previous playback.
autoHangup (boolean, optional): If true, the call will be automatically hung up after TTS playback is finished.
streaming (boolean, optional): If true, indicates streaming text input (like LLM streaming output).
endOfStream (boolean, optional): If true, indicates the input text is finished (used with streaming).
waitInputTimeout (number, optional): Maximum time to wait for user input in seconds
option (SynthesisOption, optional): TTS provider specific options
base64 (bool, optional): If true, text is base64 encoded PCM samples of sample rate 16000 hz, DO NOT use this feature in Streaming TTS

{
  "command": "tts",
  "text": "Hello, this is a test message",
  "speaker": "xiaoyan",
  "playId": "unique_play_id",
  "autoHangup": false,
  "streaming": false,
  "endOfStream": false,
  "waitInputTimeout": 30,
  "option": {
    "provider": "tencent",
    "speaker": "xiaoyan",
    "volume": 5,
    "speed": 1.0
  }
}

Play Command

Purpose: Plays audio from a URL.

Fields:

command (string): Always "play"
url (string): URL of audio file to play (supports HTTP/HTTPS URLs). This URL will be returned as playId in the trackEnd event.
autoHangup (boolean, optional): If true, the call will be automatically hung up after playback is finished.
waitInputTimeout (number, optional): Maximum time to wait for user input in seconds

{
  "command": "play",
  "url": "http://rustpbx.com/audio.mp3",
  "autoHangup": false,
  "waitInputTimeout": 30
}

Interrupt Command

Purpose: Interrupts current TTS or audio playback.

Fields:

command (string): Always "interrupt"
graceful (boolean, optional): If true, waits for the current TTS command to finish playing before stopping. Default: false.
fadeOutMs (number, optional): Fade-out duration in milliseconds before stopping playback.

{
  "command": "interrupt",
  "graceful": false
}

Pause Command

Purpose: Pauses current playback.

{
  "command": "pause"
}

Resume Command

Purpose: Resumes paused playback.

{
  "command": "resume"
}

Call Transfer Commands

Refer Command

Purpose: Transfers the call to another party (SIP REFER).

Fields:

command (string): Always "refer"
caller (string): Caller identity for the transfer
callee (string): Address of Record (AOR) of the transfer target (e.g., sip:bob@rustpbx.com)
options (ReferOption, optional): Transfer configuration
- denoise (boolean, optional): Enable noise reduction
- timeout (number, optional): Transfer timeout in seconds
- moh (string, optional): Music on hold URL to play during transfer
- asr (TranscriptionOption, optional): Automatic Speech Recognition configuration
  - provider (string): ASR provider (e.g., "tencent", "aliyun", "openai")
  - secretId (string): Provider secret ID
  - secretKey (string): Provider secret key
  - region (string, optional): Provider region
  - model (string, optional): ASR model to use
- autoHangup (boolean, optional): Automatically hang up after transfer completion
- sip (SipOption, optional): SIP configuration
  - username (string): SIP username
  - password (string): SIP password
  - realm (string): SIP realm/domain
  - headers (object, optional): Additional SIP headers

{
  "command": "refer",
  "caller": "sip:alice@rustpbx.com",
  "callee": "sip:charlie@rustpbx.com",
  "options": {
    "denoise": true,
    "timeout": 30,
    "moh": "http://rustpbx.com/hold_music.wav",
    "asr": {
      "provider": "tencent",
      "language": "zh-CN",
      "appId": "app_id",
      "secretId": "your_secret_id",
      "secretKey": "your_secret_key",
      "modelType": "16k_zh",
      "bufferSize": 4000,
      "samplerate": 16000,
      "endpoint": "https://api.rustpbx.com",
      "extra": {
        "custom_param": "value"
      },
      "startWhenAnswer": true
    },
    "autoHangup": true,
    "sip": {
      "username": "transfer_user",
      "password": "transfer_password",
      "realm": "rustpbx.com",
      "headers": {
        "X-Transfer-Source": "pbx"
      }
    }
  }
}

Audio Track Control Commands

Mute Command

Purpose: Mutes a specific audio track.

Fields:

command (string): Always "mute"
trackId (string, optional): Track ID to mute (if not specified, mutes all tracks)

{
  "command": "mute",
  "trackId": "track-123"
}

Unmute Command

Purpose: Unmutes a specific audio track.

Fields:

command (string): Always "unmute"
trackId (string, optional): Track ID to unmute (if not specified, unmutes all tracks)

{
  "command": "unmute",
  "trackId": "track-123"
}

Session Management Commands

Hangup Command

Purpose: Ends the call.

Fields:

command (string): Always "hangup"
reason (string, optional): Reason for hanging up
initiator (string, optional): Who initiated the hangup (user, system, etc.)
headers (object, optional): Additional SIP headers to include in the BYE request (SIP calls only)

{
  "command": "hangup",
  "reason": "user_requested",
  "initiator": "user",
  "headers": {
    "X-Hangup-Cause": "normal"
  }
}

History Command

Purpose: Adds a conversation history entry.

Fields:

command (string): Always "history"
speaker (string): Speaker identifier
text (string): Conversation text

{
  "command": "history",
  "speaker": "user",
  "text": "Hello, I need help with my account"
}

CallOption Object Structure

The CallOption object is used in invite and accept commands and contains the following fields:

{
  "denoise": true,
  "offer": "SDP offer string",
  "callee": "sip:callee@rustpbx.com",
  "caller": "sip:caller@rustpbx.com",
  "recorder": {
    "recorderFile": "/path/to/recording.wav",
    "samplerate": 16000,
    "ptime": 200
  },
  "asr": {
    "provider": "tencent",
    "language": "zh-CN",
    "appId": "app_id",
    "secretId": "your_secret_id",
    "secretKey": "your_secret_key",
    "modelType": "16k_zh",
    "bufferSize": 4000,
    "samplerate": 16000,
    "endpoint": "https://api.rustpbx.com",
    "extra": {
      "custom_param": "value"
    },
    "startWhenAnswer": true
  },
  "vad": {
    "type": "webrtc",
    "samplerate": 16000,
    "speechPadding": 250,
    "silencePadding": 100,
    "ratio": 0.5,
    "voiceThreshold": 0.5,
    "maxBufferDurationSecs": 50,
    "silenceTimeout": null,
    "endpoint": null,
    "secretKey": null,
    "secretId": null
  },
  "tts": {
    "samplerate": 16000,
    "provider": "tencent",
    "speed": 1.0,
    "appId": "app_id",
    "secretId": "your_secret_id",
    "secretKey": "your_secret_key",
    "volume": 5,
    "speaker": "1345",
    "codec": "pcm",
    "subtitle": true,
    "emotion": "neutral",
    "endpoint": "https://api.rustpbx.com",
    "extra": {
      "custom_param": "value"
    },
    "cacheKey": "cache_key_example"
  },
  "mediaPass": {
    "url": "ws://localhost:9090/media",
    "inputSampleRate": 16000,
    "outputSampleRate": 16000,
    "packetSize": 2560
  },
  "handshakeTimeout": 30,
  "enableIpv6": false,
  "inactivityTimeout": 50,
  "sip": {
    "username": "user",
    "password": "password",
    "realm": "rustpbx.com",
    "headers": {
      "X-Custom-Header": "value"
    }
  },
  "extra": {
    "custom_field": "custom_value"
  },
  "codec": "g722",
  "eou": {
    "type": "tencent",
    "endpoint": "https://api.rustpbx.com",
    "secretKey": "your_secret_key",
    "secretId": "your_secret_id",
    "timeout": 5000
  }
}

CallOption Fields:

denoise (boolean, optional): Enable noise reduction for audio processing
offer (string, optional): SDP offer string for WebRTC/SIP negotiation
callee (string, optional): Callee's SIP URI or phone number (e.g., "sip:bob@rustpbx.com")
caller (string, optional): Caller's SIP URI or phone number (e.g., "sip:alice@rustpbx.com")
recorder (RecorderOption, optional): Call recording configuration
- recorderFile (string): Path to the recording file
- samplerate (number): Recording sample rate in Hz (default: 16000)
- ptime (number): Packet time in milliseconds (default: 200)
asr (TranscriptionOption, optional): Automatic Speech Recognition configuration
- provider (string): ASR provider ("tencent", "aliyun", "voiceapi")
- language (string, optional): Language code (e.g., "zh-CN", "en-US")
- appId (string, optional): Application ID for the ASR service
- secretId (string, optional): Secret ID for authentication
- secretKey (string, optional): Secret key for authentication
- modelType (string, optional): ASR model type (e.g., "16k_zh", "8k_en")
- bufferSize (number, optional): Audio buffer size in bytes
- samplerate (number, optional): Audio sample rate for ASR processing
- endpoint (string, optional): Custom ASR service endpoint URL
- extra (object, optional): Additional provider-specific parameters
- startWhenAnswer (boolean, optional): Start ASR when call is answered
vad (VADOption, optional): Voice Activity Detection configuration
- type (string): VAD algorithm type ("silero")
- samplerate (number): Audio sample rate for VAD processing (default: 16000)
- speechPadding (number): Padding before speech detection in milliseconds (default: 250)
- silencePadding (number): Padding after silence detection in milliseconds (default: 100)
- ratio (number): Voice detection ratio threshold (default: 0.5)
- voiceThreshold (number): Voice energy threshold (default: 0.5)
- maxBufferDurationSecs (number): Maximum buffer duration in seconds (default: 50)
- silenceTimeout (number, optional): Timeout for silence detection in milliseconds
- endpoint (string, optional): Custom VAD service endpoint
- secretKey (string, optional): Secret key for VAD service authentication
- secretId (string, optional): Secret ID for VAD service authentication
tts (SynthesisOption, optional): Text-to-Speech configuration
- samplerate (number, optional): TTS output sample rate in Hz
- provider (string, optional): TTS provider ("tencent", "aliyun", "deepgram", "supertonic"). Default: "aliyun" for Chinese (zh), "supertonic" for English (en).
- speed (number, optional): Speech speed multiplier (default: 1.0)
- appId (string, optional): Application ID for TTS service
- secretId (string, optional): Secret ID for authentication
- secretKey (string, optional): Secret key for authentication
- volume (number, optional): Speech volume level (1-10)
- speaker (string, optional): Voice speaker name (e.g., "xiaoyan", "xiaoyun")
- codec (string, optional): Audio codec for TTS output
- subtitle (boolean, optional): Enable subtitle generation
- emotion (string, optional): Speech emotion ("neutral", "sad", "happy", "angry", "fear", "news", "story", "radio", "poetry", "call", "sajiao", "disgusted", "amaze", "peaceful", "exciting", "aojiao", "jieshuo")
- endpoint (string, optional): Custom TTS service endpoint URL
- extra (object, optional): Additional provider-specific parameters
- maxConcurrentTasks (number,optional): Max Concurrent tasks for non streaming tts cmd
mediaPass (MediaPassOption, optional): Media pass-through configuration for external audio processing
- url (string): WebSocket URL for media streaming
- inputSampleRate (number): Sample rate of audio received from WebSocket server
- outputSampleRate (number): Sample rate of audio sent to WebSocket server
- packetSize (number, optional): Packet size sent to WebSocket server in bytes (default: 2560)
subscribe (boolean, optional): Enable real-time audio subscription for non-WebSocket calls (SIP/WebRTC). If true, audio will be pushed via the control WebSocket using binary frames with a 1-byte track header (0x00 for caller, 0x01 for callee).
handshakeTimeout (number, optional): Timeout for connection handshake in seconds (e.g., 30)
enableIpv6 (boolean, optional): Enable IPv6 support for networking
inactivityTimeout (number, optional): Timeout for audio inactivity in seconds
sip (SipOption, optional): SIP protocol configuration
- username (string): SIP username for authentication
- password (string): SIP password for authentication
- realm (string): SIP realm/domain
- headers (object, optional): Additional SIP headers as key-value pairs
extra (object, optional): Additional custom parameters as key-value pairs
codec (string, optional): Audio codec for WebSocket calls ("pcmu", "pcma", "g722", "pcm")
eou (EouOption, optional): End of Utterance detection configuration
- type (string, optional): EOU detection provider
- endpoint (string, optional): Custom EOU service endpoint URL
- secretKey (string, optional): Secret key for EOU service authentication
- secretId (string, optional): Secret ID for EOU service authentication
- timeout (number, optional): Maximum timeout for EOU detection in milliseconds

ReferOption Object Structure

The ReferOption object is used in the refer command and contains the following fields:

{
  "denoise": true,
  "timeout": 30,
  "moh": "http://rustpbx.com/hold_music.wav",
  "asr": {
    "provider": "tencent",
    "language": "zh-CN",
    "appId": "app_id",
    "secretId": "your_secret_id",
    "secretKey": "your_secret_key",
    "modelType": "16k_zh",
    "bufferSize": 4000,
    "samplerate": 16000,
    "endpoint": "https://api.rustpbx.com",
    "extra": {
      "custom_param": "value"
    },
    "startWhenAnswer": true
  },
  "autoHangup": true,
  "sip": {
    "username": "transfer_user",
    "password": "transfer_password",
    "realm": "rustpbx.com",
    "headers": {
      "X-Transfer-Source": "pbx"
    }
  }
}

Fields:

denoise (boolean, optional): Enable noise reduction during transfer
timeout (number, optional): Transfer timeout in seconds
moh (string, optional): Music on hold URL to play during transfer
asr (TranscriptionOption, optional): Automatic Speech Recognition configuration
autoHangup (boolean, optional): Automatically hang up after transfer completion
sip (SipOption, optional): SIP configuration for the transfer

WebSocket Events

Events are received as JSON messages from the server. All timestamps are in milliseconds. Each event contains an event field that indicates the event type, and most events include a trackId field to identify the associated audio track.

Call Lifecycle Events

Incoming Event

Triggered when: An incoming call is received (SIP calls only).

Fields:

event (string): Always "incoming"
trackId (string): Unique identifier for the audio track. Used to identify which track generated this event.
timestamp (number): Event timestamp in milliseconds since Unix epoch
caller (string): Caller's SIP URI or phone number
callee (string): Callee's SIP URI or phone number
sdp (string): SDP offer from the caller

{
  "event": "incoming",
  "trackId": "track-abc123",
  "timestamp": 1640995200000,
  "caller": "sip:alice@rustpbx.com",
  "callee": "sip:bob@rustpbx.com",
  "sdp": "v=0\r\no=- 1234567890 2 IN IP4 127.0.0.1\r\n..."
}

Answer Event

Triggered when: Call is answered and SDP negotiation is complete.

Fields:

event (string): Always "answer"
trackId (string): Unique identifier for the audio track.
timestamp (number): Event timestamp in milliseconds since Unix epoch
sdp (string): SDP answer from the server

{
  "event": "answer",
  "trackId": "track-abc123",
  "timestamp": 1640995200000,
  "sdp": "v=0\r\no=- 1234567890 2 IN IP4 127.0.0.1\r\n..."
}

Reject Event

Triggered when: Call is rejected.

Fields:

event (string): Always "reject"
trackId (string): Unique identifier for the audio track.
timestamp (number): Event timestamp in milliseconds since Unix epoch
reason (string): Reason for rejection
code (number, optional): SIP response code

{
  "event": "reject",
  "trackId": "track-abc123",
  "timestamp": 1640995200000,
  "reason": "Busy",
  "code": 486
}

Ringing Event

Triggered when: Call is ringing (SIP calls only).

Fields:

event (string): Always "ringing"
trackId (string): Unique identifier for the audio track.
timestamp (number): Event timestamp in milliseconds since Unix epoch
earlyMedia (boolean): Whether early media is available

{
  "event": "ringing",
  "trackId": "track-abc123",
  "timestamp": 1640995200000,
  "earlyMedia": false
}

Hangup Event

Triggered when: Call is ended.

Fields:

event (string): Always "hangup"
timestamp (number): Event timestamp in milliseconds since Unix epoch
reason (string, optional): Reason for hangup
initiator (string, optional): Who initiated the hangup (user, system, etc.)
startTime (string): ISO 8601 timestamp when call started
hangupTime (string): ISO 8601 timestamp when call ended
answerTime (string, optional): ISO 8601 timestamp when call was answered
ringingTime (string, optional): ISO 8601 timestamp when call started ringing
from (Attendee, optional): Information about the caller
to (Attendee, optional): Information about the callee
extra (object, optional): Additional call metadata

{
  "event": "hangup",
  "timestamp": 1640995200000,
  "reason": "user_requested",
  "initiator": "user",
  "startTime": "2024-01-01T12:00:00Z",
  "hangupTime": "2024-01-01T12:05:30Z",
  "answerTime": "2024-01-01T12:00:05Z",
  "ringingTime": "2024-01-01T12:00:02Z",
  "from": {
    "username": "alice",
    "realm": "rustpbx.com",
    "source": "sip:alice@rustpbx.com"
  },
  "to": {
    "username": "bob",
    "realm": "rustpbx.com", 
    "source": "sip:bob@rustpbx.com"
  },
  "extra": {
    "call_quality": "good",
    "network_type": "wifi"
  }
}

Voice Activity Detection Events

Speaking Event

Triggered when: Voice activity detection detects speech start.

Fields:

event (string): Always "speaking"
trackId (string): Unique identifier for the audio track.
timestamp (number): Event timestamp in milliseconds since Unix epoch
startTime (number): When speech started in milliseconds since Unix epoch
isFiller (boolean, optional): Whether this speech segment is a filler word
confidence (number, optional): Confidence score of the voice detection (0.0–1.0)

{
  "event": "speaking",
  "trackId": "track-abc123",
  "timestamp": 1640995200000,
  "startTime": 1640995200000,
  "isFiller": false,
  "confidence": 0.95
}

Silence Event

Triggered when: Voice activity detection detects silence.

Fields:

event (string): Always "silence"
trackId (string): Unique identifier for the audio track.
timestamp (number): Event timestamp in milliseconds since Unix epoch
startTime (number): When silence started in milliseconds since Unix epoch
duration (number): Duration of silence in milliseconds

{
  "event": "silence",
  "trackId": "track-abc123",
  "timestamp": 1640995200000,
  "startTime": 1640995195000,
  "duration": 5000
}

AI and Speech Processing Events

Answer Machine Detection Event

Triggered when: Answer machine detection algorithm identifies automated response.

Fields:

event (string): Always "answerMachineDetection"
timestamp (number): Event timestamp in milliseconds since Unix epoch
startTime (number): Detection window start time in milliseconds since Unix epoch
endTime (number): Detection window end time in milliseconds since Unix epoch
text (string): Detected automated message text

{
  "event": "answerMachineDetection",
  "timestamp": 1640995200000,
  "startTime": 1640995200000,
  "endTime": 1640995205000,
  "text": "Hello, you have reached ABC Company. Please leave a message..."
}

EOU (End of Utterance) Event

Triggered when: End of utterance detection identifies when user has finished speaking.

Fields:

event (string): Always "eou"
trackId (string): Unique identifier for the audio track.
timestamp (number): Event timestamp in milliseconds since Unix epoch
completed (boolean): Whether the utterance was completed normally
interruptPoint (string, optional): Position in TTS subtitle text where the interruption occurred

{
  "event": "eou",
  "trackId": "track-abc123",
  "timestamp": 1640995200000,
  "completed": true,
  "interruptPoint": null
}

ASR Final Event

Triggered when: ASR provides final transcription result.

Fields:

event (string): Always "asrFinal"
trackId (string): Unique identifier for the audio track.
timestamp (number): Event timestamp in milliseconds since Unix epoch
index (number): ASR result sequence number
startTime (number, optional): Start time of speech in milliseconds since Unix epoch
endTime (number, optional): End time of speech in milliseconds since Unix epoch
text (string): Final transcribed text
isFiller (boolean, optional): Whether this result is a filler word
confidence (number, optional): Confidence score (0.0–1.0)
taskId (string, optional): ASR provider task identifier

{
  "event": "asrFinal",
  "trackId": "track-abc123",
  "timestamp": 1640995200000,
  "index": 1,
  "startTime": 1640995200000,
  "endTime": 1640995205000,
  "text": "Hello, how can I help you today?",
  "isFiller": false,
  "confidence": 0.98,
  "taskId": "asr-task-001"
}

ASR Delta Event

Triggered when: ASR provides partial transcription result (streaming mode).

Fields:

event (string): Always "asrDelta"
trackId (string): Unique identifier for the audio track.
index (number): ASR result sequence number
timestamp (number): Event timestamp in milliseconds since Unix epoch
startTime (number, optional): Start time of speech in milliseconds since Unix epoch
endTime (number, optional): End time of speech in milliseconds since Unix epoch
text (string): Partial transcribed text
isFiller (boolean, optional): Whether this result is a filler word
confidence (number, optional): Confidence score (0.0–1.0)
taskId (string, optional): ASR provider task identifier

{
  "event": "asrDelta",
  "trackId": "track-abc123",
  "index": 1,
  "timestamp": 1640995200000,
  "startTime": 1640995200000,
  "endTime": 1640995203000,
  "text": "Hello, how can",
  "isFiller": false,
  "confidence": 0.85
}

Audio Track Events

Track Start Event

Triggered when: Audio track starts (TTS, file playback, etc.).

Fields:

event (string): Always "trackStart"
trackId (string): Unique identifier for the audio track.
timestamp (number): Event timestamp in milliseconds since Unix epoch
playId (string, optional): For TTS command, this is the playId from the TTS command. For Play command, this is the URL from the Play command.

{
  "event": "trackStart",
  "trackId": "track-tts-456",
  "timestamp": 1640995200000,
  "playId": "llm-001"
}

Track End Event

Triggered when: Audio track ends (TTS finished, file playback finished, etc.).

Fields:

event (string): Always "trackEnd"
trackId (string): Unique identifier for the audio track.
timestamp (number): Event timestamp in milliseconds since Unix epoch
duration (number): Duration of track in milliseconds
ssrc (number): RTP Synchronization Source identifier
playId (string, optional): For TTS command, this is the playId from the TTS command. For Play command, this is the URL from the Play command.

{
  "event": "trackEnd",
  "trackId": "track-tts-456",
  "timestamp": 1640995230000,
  "duration": 30000,
  "ssrc": 1234567890,
  "playId": "llm-001"
}

Interruption Event

Triggered when: Current playback is interrupted by user input or another command.

Fields:

event (string): Always "interruption"
trackId (string): Unique identifier for the audio track.
timestamp (number): Event timestamp in milliseconds since Unix epoch
playId (string, optional): For TTS command, this is the playId from the TTS command. For Play command, this is the URL from the Play command.
subtitle (string, optional): Current TTS text being played when interrupted
position (number, optional): Word index position in the subtitle when interrupted
totalDuration (number): Total duration of the TTS content in milliseconds
current (number): Elapsed time since start of TTS when interrupted in milliseconds

{
  "event": "interruption",
  "trackId": "track-tts-456",
  "timestamp": 1640995215000,
  "playId": "llm-001",
  "subtitle": "Hello, this is a long message that was interrupted",
  "position": 5,
  "totalDuration": 30000,
  "current": 15000
}

User Input Events

DTMF Event

Triggered when: DTMF tone is detected.

Fields:

event (string): Always "dtmf"
trackId (string): Unique identifier for the audio track.
timestamp (number): Event timestamp in milliseconds since Unix epoch
digit (string): DTMF digit (0-9, *, #, A-D)

{
  "event": "dtmf",
  "trackId": "track-abc123",
  "timestamp": 1640995200000,
  "digit": "1"
}

System Events

Ping Event

Triggered when: Server sends a periodic heartbeat to keep the connection alive.

Fields:

event (string): Always "ping"
timestamp (number): Event timestamp in milliseconds since Unix epoch
payload (string, optional): ISO 8601 timestamp of the ping

The client should respond with a WebSocket Pong frame (this is handled automatically by most WebSocket clients). The server sends a Ping every ping_interval seconds (default: 20). Set ping_interval=0 to disable.

{
  "event": "ping",
  "timestamp": 1640995200000,
  "payload": "2024-01-01T12:00:00Z"
}

Hold Event

Triggered when: A call is placed on hold or taken off hold.

Fields:

event (string): Always "hold"
trackId (string): Unique identifier for the audio track.
timestamp (number): Event timestamp in milliseconds since Unix epoch
onHold (boolean): true if call is now on hold, false if taken off hold

{
  "event": "hold",
  "trackId": "track-abc123",
  "timestamp": 1640995200000,
  "onHold": true
}

Inactivity Event

Triggered when: Audio inactivity timeout expires (no audio activity detected for inactivityTimeout seconds).

Fields:

event (string): Always "inactivity"
trackId (string): Unique identifier for the audio track.
timestamp (number): Event timestamp in milliseconds since Unix epoch

{
  "event": "inactivity",
  "trackId": "track-abc123",
  "timestamp": 1640995200000
}

FunctionCall Event

Triggered when: A function/tool call is made by the AI agent (Playbook mode).

Fields:

event (string): Always "functionCall"
trackId (string): Unique identifier for the audio track.
callId (string): Unique identifier for this function call
name (string): Name of the function being called
arguments (string): JSON-encoded arguments string for the function
timestamp (number): Event timestamp in milliseconds since Unix epoch

{
  "event": "functionCall",
  "trackId": "track-abc123",
  "callId": "call-uuid-123",
  "name": "get_weather",
  "arguments": "{\"city\": \"Beijing\"}",
  "timestamp": 1640995200000
}

Metrics Event

Triggered when: Performance metrics are available.

Fields:

event (string): Always "metrics"
timestamp (number): Event timestamp in milliseconds since Unix epoch
key (string): Metric key (e.g., "ttfb.asr.tencent", "completed.asr.tencent")
duration (number): Duration in milliseconds
data (object): Additional metric data

{
  "event": "metrics",
  "timestamp": 1640995200000,
  "key": "ttfb.asr.tencent",
  "duration": 150,
  "data": {
    "index": 1,
    "provider": "tencent",
    "model": "16k_zh"
  }
}

Error Event

Triggered when: An error occurs during processing.

Fields:

event (string): Always "error"
trackId (string): Unique identifier for the audio track.
timestamp (number): Event timestamp in milliseconds since Unix epoch
sender (string): Component that generated the error (asr, tts, media, etc.)
error (string): Error message description
code (number, optional): Error code

{
  "event": "error",
  "trackId": "track-abc123",
  "timestamp": 1640995200000,
  "sender": "asr",
  "error": "Connection timeout to ASR service",
  "code": 408
}

Add History Event

Triggered when: A conversation history entry is added.

Fields:

event (string): Always "addHistory"
sender (string, optional): Component that added the history entry
timestamp (number): Event timestamp in milliseconds since Unix epoch
speaker (string): Speaker identifier (user, assistant, system, etc.)
text (string): Conversation text

{
  "event": "addHistory",
  "sender": "system",
  "timestamp": 1640995200000,
  "speaker": "user",
  "text": "Hello, I need help with my account"
}

Binary Event (Audio Data)

Triggered when: Binary audio data is sent (WebSocket calls or calls with subscribe: true).

Fields:

event (string): Always "binary"
trackId (string): Unique identifier for the audio track. For subscribed SIP/WebRTC calls, Caller uses server-side-trackid, Callee uses the session ID.
timestamp (number): Event timestamp in milliseconds since Unix epoch
data (array): Binary audio data bytes. In subscribe mode, the first byte is the track index (0 for Caller, 1 for Callee) followed by original PCM data.

{
  "event": "binary",
  "trackId": "track-abc123",
  "timestamp": 1640995200000,
  "data": [/* binary audio data array */]
}

Other Event

Triggered when: Custom or extension events are generated.

Fields:

event (string): Always "other"
trackId (string): Unique identifier for the audio track.
timestamp (number): Event timestamp in milliseconds since Unix epoch
sender (string): Component that generated the event
extra (object, optional): Additional event data as key-value pairs

{
  "event": "other",
  "trackId": "track-abc123",
  "timestamp": 1640995200000,
  "sender": "custom_plugin",
  "extra": {
    "custom_field": "custom_value",
    "plugin_version": "1.0.0"
  }
}

Attendee Object Structure

The Attendee object appears in call events and contains participant information:

{
  "username": "alice",
  "realm": "rustpbx.com",
  "source": "sip:alice@rustpbx.com"
}

Fields:

username (string): Username portion of the SIP URI
realm (string): Domain/realm portion of the SIP URI
source (string): Full SIP URI or phone number

REST API Endpoints

4. List Active Calls

Endpoint: GET /list

Description: Returns a list of all currently active calls.

Parameters: None

Response:

{
  "active_calls": [
    {
      "id": "s.session-id",
      "callType": "webrtc",
      "cs.option": { ... },
      "ringTime": "2024-01-01T12:00:02Z",
      "startTime": "2024-01-01T12:00:05Z"
    }
  ]
}

Usage:

curl http://localhost:8080/list

5. Kill Call

Endpoint: GET /kill/{id}

Description: Terminates a specific active call by its session ID.

Parameters:

id (path parameter, string): The session ID of the call to terminate.

Response:

{ "status": "killed", "id": "s.session123" }

If the session is not found:

{ "status": "not_found", "id": "s.session123" }

Usage:

curl http://localhost:8080/kill/s.session123

6. Send Command

Endpoint: POST /command/{id}

Description: Sends a command to a specific active call by its session ID. Accepts the same command objects as the WebSocket command interface.

Parameters:

id (path parameter, string): The session ID of the target call.

Request Body: A command object (see WebSocket Commands for the full list).

{ "command": "tts", "text": "Hello, how can I help you?" }

Response:

{ "status": "sent", "id": "s.session123" }

If the session is not found:

{ "status": "not_found", "id": "s.session123" }

Usage:

curl -X POST http://localhost:8080/command/s.session123 \
  -H "Content-Type: application/json" \
  -d '{"command": "hangup", "reason": "normal", "initiator": "server"}'

7. Get ICE Servers

Endpoint: GET /iceservers

Description: Returns ICE servers configuration for WebRTC connections.

Parameters: None

Response:

[
  {
    "urls": ["stun:stun.l.google.com:19302"],
    "username": null,
    "credential": null
  },
  {
    "urls": ["turn:restsend.com:3478"],
    "username": "username",
    "credential": "password"
  }
]

Usage:

curl http://localhost:8080/iceservers

8. Stream Events

Endpoint: GET /events/{id}

Description: Opens a Server-Sent Events (SSE) stream for a specific active call, delivering real-time session events and commands as they occur.

Path Parameters:

Parameter	Type	Description
`id`	string	Active call/track ID

Response: text/event-stream;charset=utf-8

The stream emits two SSE event types:

SSE Event	Data
`event`	JSON-serialized `SessionEvent` (same as WebSocket events)
`command`	JSON-serialized command sent to the session

The stream closes when the call ends (channel closed). Lagged messages are silently skipped.

Errors:

Status	Description
404	No active call found for given `id`

Usage:

curl -N http://localhost:8080/events/{id}

Example output:

event: event
data: {"event":"answer","trackId":"track-abc","timestamp":1700000000}

event: command
data: {"command":"tts","text":"Hello, how can I help you?"}

9. Playbook API

List Playbooks

Endpoint: GET /api/playbooks

Description: Returns a list of all available playbook files in config/playbook/.

Response:

[
  { "name": "demo.md", "updated": "2024-01-01T12:00:00Z" },
  { "name": "simple-demo-en.md", "updated": "2024-01-02T08:00:00Z" }
]

Usage:

curl http://localhost:8080/api/playbooks

Get Playbook

Endpoint: GET /api/playbooks/{name}

Description: Returns the content of a specific playbook file.

Parameters:

name (path parameter, string): Playbook filename (e.g., demo.md)

Response: Plain text content of the playbook file.

Usage:

curl http://localhost:8080/api/playbooks/demo.md

Save Playbook

Endpoint: POST /api/playbooks/{name}

Description: Creates or updates a playbook file.

Parameters:

name (path parameter, string): Playbook filename (e.g., my-playbook.md)
Body: Plain text playbook content

Response: 200 OK on success.

Usage:

curl -X POST http://localhost:8080/api/playbooks/my-playbook.md \
  -H "Content-Type: text/plain" \
  --data-binary @my-playbook.md

Run Playbook

Endpoint: POST /api/playbook/run

Description: Associates a playbook with a future WebSocket session. When the session connects, the playbook will automatically be loaded.

Request Body (JSON):

{
  "playbook": "demo.md",
  "type": "webrtc",
  "to": "sip:bob@example.com"
}

Or with inline content:

{
  "content": "---\nname: inline-demo\n...",
  "type": "webrtc"
}

Fields:

playbook (string): Playbook filename to load from config/playbook/
content (string): Inline YAML playbook content (alternative to playbook)
type (string, optional): Call type hint
to (string, optional): Callee address

Response:

{ "session_id": "s.uuid-here" }

Use the returned session_id as the id parameter when connecting the WebSocket.

Usage:

curl -X POST http://localhost:8080/api/playbook/run \
  -H "Content-Type: application/json" \
  -d '{"playbook": "demo.md"}'

List Records

Endpoint: GET /api/records

Description: Returns a list of call event records (.events.jsonl files in the recorder directory).

Response:

[
  { "id": "s.session-uuid", "date": "2024-01-01T12:00:00Z", "duration": "0s", "status": "completed" }
]

Usage:

curl http://localhost:8080/api/records

Error Handling

All endpoints return appropriate HTTP status codes:

200 OK: Success
400 Bad Request: Invalid parameters
404 Not Found: Resource not found
500 Internal Server Error: Server error

WebSocket connections may be closed with specific close codes indicating the reason for disconnection.

Notes

All WebSocket endpoints support real-time bidirectional communication
Call sessions are automatically cleaned up when the WebSocket connection is closed
Event dumping to file can be disabled by setting dump_events=false query parameter
ICE servers are automatically configured based on server configuration
Audio codecs are automatically negotiated based on capabilities
VAD (Voice Activity Detection) events are sent for speech detection
ASR (Automatic Speech Recognition) provides real-time transcription
TTS (Text-to-Speech) supports streaming synthesis
All timestamps are in milliseconds
trackId is used to identify which audio track generated an event
playId prevents interruption of previous TTS playback when the same ID is used. For TTS commands, playId is the specified identifier; for Play commands, playId is the URL
Session IDs generated by the server are prefixed with s. (WebSocket sessions) or c. (CLI outbound calls)
The ping_interval parameter controls heartbeat frequency (default 20s). Set to 0 to disable
autoHangup automatically ends the call after TTS/playback completion

FilesExpand file tree

api.md

Latest commit

History

api.md

File metadata and controls

Active Call API Documentation

Base URL

Authentication

WebSocket Call Endpoints

1. WebSocket Call Handler

2. WebRTC Call Handler

3. SIP Call Handler

WebSocket Communication Flow

WebRTC Call Flow

SIP Call Flow

Voice Stream Communication Methods

1. WebSocket Audio Stream (/call)

2. WebRTC Audio Stream (/call/webrtc)

3. SIP Audio Stream (/call/sip)

MediaPass Feature

MediaPass Configuration

MediaPass Example Usage

Example 1: Basic MediaPass Setup

Example 2: MediaPass with AI Voice Processing

MediaPass WebSocket Protocol

MediaPass Flow Diagram

WebSocket Commands

Core Call Management Commands

Invite Command

Accept Command

Reject Command

Ringing Command

Media Control Commands

TTS Command

Play Command

Interrupt Command

Pause Command

Resume Command

Call Transfer Commands

Refer Command

Audio Track Control Commands

Mute Command

Unmute Command

Session Management Commands

Hangup Command

History Command

CallOption Object Structure

ReferOption Object Structure

WebSocket Events

Call Lifecycle Events

Incoming Event

Answer Event

Reject Event

Ringing Event

Hangup Event

Voice Activity Detection Events

Speaking Event

Silence Event

AI and Speech Processing Events

Answer Machine Detection Event

EOU (End of Utterance) Event

ASR Final Event

ASR Delta Event

Audio Track Events

Track Start Event

Track End Event

Interruption Event

User Input Events

DTMF Event

System Events

Ping Event

Hold Event

Inactivity Event

FunctionCall Event

Metrics Event

Error Event

Add History Event

Binary Event (Audio Data)

Other Event

1. WebSocket Audio Stream (`/call`)

2. WebRTC Audio Stream (`/call/webrtc`)

3. SIP Audio Stream (`/call/sip`)