[Feature Request] Real-time Audio Streaming from HFP to Web Client #42

@albal

Description

Summary

This is a feature request to explore the possibility of streaming live audio from a phone call (received via the Bluetooth HFP profile) through the ESP32 and out to a connected web client for real-time playback in a web browser.

This would allow a user to monitor or participate in a phone call directly from a web interface, effectively turning the project into a remote audio gateway or a simple web-based softphone.

Proposed Architecture

The end-to-end data flow for this feature would look like this:

  1. Phone Call Audio Source: A live phone call on a paired mobile device.
  2. Bluetooth HFP Link: The phone streams the call audio to the ESP32. The audio data is mono, 16-bit PCM, at 8 kHz for narrow-band calls (CVSD) or 16 kHz for wide-band calls (mSBC).
  3. ESP32 Firmware (The Bridge):
    • Capture HFP Audio: The firmware would need to use the esp_hf_client_register_data_callback() function to register a callback. This callback would receive raw PCM audio buffers from the Bluetooth stack in real-time.
    • WebSocket Server: The existing HTTP server would be augmented with a WebSocket endpoint (e.g., /ws).
    • Real-time Relay: The HFP audio data callback would immediately take the received PCM data and forward it over the established WebSocket connection to any listening clients.
  4. Web Client (Browser):
    • A WebSocket connection is established to the ESP32.
    • The JavaScript front-end uses the Web Audio API to process the incoming raw PCM data. It buffers these small chunks and schedules them for seamless playback, creating a continuous audio stream.
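The data flow above implies a modest but sustained bitrate. A quick back-of-the-envelope estimate (the 120-sample chunk size is illustrative, not a guaranteed callback size):

```javascript
// Rough throughput estimate for relaying narrow-band HFP audio
// (mono 16-bit PCM at 8 kHz).
const sampleRate = 8000;      // samples per second
const bytesPerSample = 2;     // 16-bit PCM
const bytesPerSecond = sampleRate * bytesPerSample;  // 16,000 B/s
const kbps = (bytesPerSecond * 8) / 1000;            // 128 kbps

// A hypothetical 120-sample chunk corresponds to 15 ms of audio --
// roughly the latency budget for relaying each buffer end-to-end.
const chunkSamples = 120;
const chunkMs = (chunkSamples / sampleRate) * 1000;

console.log(bytesPerSecond, kbps, chunkMs);
```

128 kbps is trivial for Wi-Fi in isolation; the difficulty (discussed below) is sustaining it while Classic Bluetooth is also active.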

Implementation Sketch

A proof-of-concept would require significant changes to both the firmware and the web front-end.

1. ESP32 Firmware Changes:

// 1. A new callback function to handle incoming audio data
void hfp_audio_data_callback(const uint8_t *data, uint32_t len)
{
    // Called by the BT stack with decoded PCM data. The `data` buffer
    // of `len` bytes must be forwarded to any active WebSocket clients.
    // Note: this runs in a Bluetooth stack context, so it should only
    // queue the data (e.g. into a FreeRTOS ring buffer) for a separate
    // task to transmit; blocking here would stall the audio path.
    // The broadcast helper below does not exist in ESP-IDF and would
    // need to be implemented:
    // httpd_ws_send_frame_to_all_clients(data, len, HTTPD_WS_TYPE_BINARY);
}

// 2. In app_main(), after initializing the HFP client.
// Note: in ESP-IDF, esp_hf_client_register_data_callback() takes both an
// incoming and an outgoing callback, and is only used when the audio data
// path is routed over HCI (configured via menuconfig).

// Outgoing callback: supplies microphone samples to the phone.
// A listen-only gateway can simply report no data.
static uint32_t hfp_outgoing_data_callback(uint8_t *buf, uint32_t len)
{
    return 0; // no uplink audio in this sketch
}

void app_main(void)
{
    // ... existing HFP init ...
    ret = esp_hf_client_register_callback(esp_hf_client_cb);

    // Register the data callbacks to capture audio
    ret = esp_hf_client_register_data_callback(hfp_audio_data_callback,
                                               hfp_outgoing_data_callback);

    // ... rest of app_main ...
}

// 3. A WebSocket handler needs to be added to the httpd_server setup.
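The WebSocket handler mentioned in point 3 could be built on ESP-IDF's `esp_http_server`, which has native WebSocket support. A minimal sketch, assuming `CONFIG_HTTPD_WS_SUPPORT=y`; the single-client bookkeeping and the `ws_send_audio` helper are illustrative simplifications (a real implementation must track multiple sockets and handle disconnects):

```c
#include "esp_http_server.h"

static httpd_handle_t server = NULL;  // set during server startup
static int ws_client_fd = -1;         // single client, for simplicity

static esp_err_t ws_handler(httpd_req_t *req)
{
    if (req->method == HTTP_GET) {
        // WebSocket handshake complete; remember the socket so the
        // audio relay task can push frames to it later.
        ws_client_fd = httpd_req_to_sockfd(req);
        return ESP_OK;
    }
    return ESP_OK;  // incoming frames are ignored in this sketch
}

static const httpd_uri_t ws_uri = {
    .uri          = "/ws",
    .method       = HTTP_GET,
    .handler      = ws_handler,
    .is_websocket = true,
};

// Called from the relay task (fed by the HFP callback's queue,
// never directly from the Bluetooth stack context):
static void ws_send_audio(const uint8_t *data, size_t len)
{
    if (ws_client_fd < 0) return;
    httpd_ws_frame_t frame = {
        .type    = HTTPD_WS_TYPE_BINARY,
        .payload = (uint8_t *)data,
        .len     = len,
    };
    httpd_ws_send_frame_async(server, ws_client_fd, &frame);
}

// During server setup: httpd_register_uri_handler(server, &ws_uri);
```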

2. Web Client (JavaScript) Changes:

// This is a simplified example. A robust solution needs a proper jitter buffer.

// Connect to the ESP32's WebSocket endpoint
const socket = new WebSocket('ws://' + window.location.host + '/ws');
socket.binaryType = 'arraybuffer';

// Initialize the Web Audio API with the correct sample rate from HFP (e.g., 8000Hz)
const audioContext = new AudioContext({ sampleRate: 8000 });
let nextPlayTime = 0;

// Note: browsers keep an AudioContext suspended until a user gesture,
// so call audioContext.resume() from e.g. a click handler first.
socket.onmessage = (event) => {
    // 1. Get raw PCM data (Int16) from the ArrayBuffer
    const pcmData = new Int16Array(event.data);

    // 2. Create an AudioBuffer
    const audioBuffer = audioContext.createBuffer(
        1, // Number of channels (mono)
        pcmData.length, // Buffer length
        audioContext.sampleRate // Sample rate
    );

    // 3. Convert Int16 data to Float32 and copy to the buffer
    const float32Data = new Float32Array(pcmData.length);
    for (let i = 0; i < pcmData.length; i++) {
        float32Data[i] = pcmData[i] / 32768.0; // Convert 16-bit PCM to float
    }
    audioBuffer.copyToChannel(float32Data, 0);

    // 4. Schedule the buffer for seamless playback
    const source = audioContext.createBufferSource();
    source.buffer = audioBuffer;
    source.connect(audioContext.destination);

    // Simple scheduling to play buffers back-to-back
    if (audioContext.currentTime > nextPlayTime) {
        nextPlayTime = audioContext.currentTime;
    }
    source.start(nextPlayTime);
    nextPlayTime += audioBuffer.duration;
};
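As the comment above notes, a robust client needs a proper jitter buffer rather than playing chunks the instant they arrive. One common approach is to prebuffer a short window of audio before starting playback; a minimal sketch of that idea (the class name and 100 ms threshold are illustrative):

```javascript
// Minimal prebuffering jitter buffer: hold incoming PCM chunks until
// `minBufferedSec` of audio has accumulated, then release them in
// order. Playback code would pull chunks and schedule them as above.
class JitterBuffer {
    constructor(sampleRate, minBufferedSec = 0.1) {
        this.minSamples = Math.round(minBufferedSec * sampleRate);
        this.chunks = [];
        this.buffered = 0;   // total samples currently held
        this.primed = false; // true once the prebuffer has filled
    }

    push(int16Chunk) {
        this.chunks.push(int16Chunk);
        this.buffered += int16Chunk.length;
        if (this.buffered >= this.minSamples) this.primed = true;
    }

    // Returns the next chunk to play, or null while still prebuffering.
    // An underrun (empty buffer) re-arms the prebuffer.
    pull() {
        if (!this.primed || this.chunks.length === 0) {
            if (this.chunks.length === 0) this.primed = false;
            return null;
        }
        const chunk = this.chunks.shift();
        this.buffered -= chunk.length;
        return chunk;
    }
}
```

The trade-off is direct: a larger prebuffer tolerates more network jitter but adds that much latency to the conversation, which matters for a two-way call.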

Key Challenges & Considerations

This feature is very demanding for ESP32-class hardware, for several reasons:

  • Real-time Constraints: The entire pipeline must operate with minimal latency to be usable for conversation.
  • Radio Coexistence: Classic Bluetooth and Wi-Fi share the ESP32's single 2.4 GHz radio, which the coexistence scheduler time-multiplexes between them. Sustaining simultaneous HFP audio and Wi-Fi streaming is a significant performance challenge and can lead to packet loss and audio stuttering.
  • CPU & Memory Load: The ESP32 must handle the HFP stack, a Wi-Fi TCP/IP stack, a WebSocket server, and the real-time data relay logic. This will put a heavy load on the CPU and requires careful memory management.
  • Network Jitter: Wi-Fi is not a real-time protocol. The web client would need a sophisticated jitter buffer to provide smooth audio playback despite variations in network packet arrival times.

Conclusion

While this feature is technically feasible, it represents a massive increase in project complexity. It would require deep expertise in real-time embedded programming, network streaming, and advanced web development. It should be considered a major undertaking rather than a simple addition.
