Feature Proposal: “Hold Music” — Expectation-Setting Voice Prompts During High Latency

When a response takes a long time (e.g. 10+ seconds while GPT is “thinking”), users currently sit in silence until the first audio stream begins. The silence can feel awkward, or as if the system didn’t hear them.

Instead of filler sounds (“um,” “ah”), we can use this gap to *set expectations* in a natural, human way. For example, play a short cached line in the selected voice, such as:

* “Got it, let me think about that for a moment.”
* “Hmm, that’s a big one. Give me a second.”
* “Okay, I’m working on it.”

This reassures the user that they’ve been heard and prepares them for a thoughtful response, much as a person might say “Let me think about that” in conversation.

---

**Proposed Approach**

1. **Latency detection**

   * Detect when a model’s expected response time exceeds ~10s (e.g. GPT-5 thinking mode).
   * Only trigger “Hold Music” prompts in these cases, not for instant responses.

2. **Prompt playback**

   * Short audio snippets in the **selected voice + language**.
   * Can be **cached** phrases or **generated on the fly** (TTS).
   * Randomize from a small pool for variety.

3. **User experience goals**

   * Communicate “you’ve been heard.”
   * Reduce awkward silence.
   * Maintain **tone consistency** (voice, language, pacing).
   * Feel natural and optional (can be toggled off).
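
Steps 1 and 2 could be gated roughly as follows. This is a minimal sketch rather than a proposed implementation: `HoldPromptConfig`, `choose_hold_prompt`, and the `expected_latency_s` input are hypothetical names, and how the expected latency is obtained is left open. The phrases are the examples from the top of this issue.

```python
import random
from dataclasses import dataclass
from typing import Optional

# Example phrases taken from this proposal; the real pool is still to be decided.
HOLD_PROMPTS = (
    "Got it, let me think about that for a moment.",
    "Hmm, that’s a big one. Give me a second.",
    "Okay, I’m working on it.",
)


@dataclass
class HoldPromptConfig:
    """Hypothetical settings object for the feature."""
    enabled: bool = True        # user toggle ("can be toggled off")
    threshold_s: float = 10.0   # the ~10s cutoff from the proposal


def choose_hold_prompt(
    expected_latency_s: float,
    config: HoldPromptConfig,
    last_prompt: Optional[str] = None,
    rng: Optional[random.Random] = None,
) -> Optional[str]:
    """Return a phrase to play, or None when no prompt should be played."""
    # Never play when the user has turned the feature off.
    if not config.enabled:
        return None
    # Only trigger for slow responses, not for instant ones.
    if expected_latency_s <= config.threshold_s:
        return None
    # Randomize from the pool, avoiding an immediate repeat where possible.
    candidates = [p for p in HOLD_PROMPTS if p != last_prompt] or list(HOLD_PROMPTS)
    chooser = rng if rng is not None else random
    return chooser.choice(candidates)
```

For example, `choose_hold_prompt(2.0, HoldPromptConfig())` returns `None`, while an expected latency of 15 seconds returns one of the three phrases.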

---

**Technical Options**

* **Cached audio prompts** per supported voice → cheap, fast, reliable.
* **Dynamic generation** with a lightweight LLM → flexible, more variety, but more expensive.

Fallback: If no audio asset is available in the current voice/language, skip playback.
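
The cached option and its fallback might look like the sketch below. It assumes, purely for illustration, that cached clips are keyed by `(voice, language, prompt_id)`; the key shape, the `find_cached_prompt` name, and the sample voice and prompt identifiers are all hypothetical.

```python
from typing import Dict, Iterable, Optional, Tuple

# Hypothetical cache key: (voice, language, prompt_id).
AssetKey = Tuple[str, str, str]


def find_cached_prompt(
    cache: Dict[AssetKey, bytes],
    voice: str,
    language: str,
    prompt_ids: Iterable[str],
) -> Optional[bytes]:
    """Return the first cached clip for this voice/language, or None.

    None means "skip playback": there is deliberately no fallback to a
    different voice or language, which would break tone consistency.
    """
    for prompt_id in prompt_ids:
        audio = cache.get((voice, language, prompt_id))
        if audio is not None:
            return audio
    return None


# Usage with made-up identifiers and placeholder audio bytes.
cache = {("voice_a", "en", "working_on_it"): b"<audio bytes>"}
clip = find_cached_prompt(cache, "voice_a", "en", ["let_me_think", "working_on_it"])
missing = find_cached_prompt(cache, "voice_a", "fr", ["let_me_think", "working_on_it"])
```

Here `clip` holds the cached audio, and `missing` is `None`, so the client would stay silent for the French request.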

---

**Open Questions**

* Threshold: is 10s the right cutoff? Should it be adaptive?
* Should this live entirely client-side (pre-recorded prompts) or server-side (generated per request)?
* Do we expose a user setting: “Play expectation-setting prompts when response time is high”?
* How do we avoid confusion between “Hold Music” and actual AI content (e.g. by making it clearly a system/acknowledgement message)?
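
On the adaptive question, one possible reading is to keep the cutoff fixed but estimate the expected response time from recently observed latencies. The sketch below uses an exponential moving average for that; the `LatencyEstimator` class, the smoothing factor `alpha`, and the choice of an EMA are assumptions for discussion, not something this proposal has settled on.

```python
from typing import Optional


class LatencyEstimator:
    """Exponential moving average over observed response times (seconds)."""

    def __init__(self, alpha: float = 0.3) -> None:
        if not 0.0 < alpha <= 1.0:
            raise ValueError("alpha must be in (0, 1]")
        self.alpha = alpha
        self._estimate: Optional[float] = None

    def observe(self, latency_s: float) -> None:
        """Fold one measured response time into the running estimate."""
        if self._estimate is None:
            # The first observation seeds the estimate.
            self._estimate = latency_s
        else:
            self._estimate = self.alpha * latency_s + (1 - self.alpha) * self._estimate

    def expected(self, default_s: float = 0.0) -> float:
        """Current estimate, or default_s before any observation."""
        return default_s if self._estimate is None else self._estimate

    def should_play(self, threshold_s: float = 10.0) -> bool:
        """True when the estimate exceeds the cutoff."""
        return self.expected() > threshold_s


est = LatencyEstimator(alpha=0.5)
est.observe(4.0)
est.observe(12.0)   # estimate is now 0.5 * 12.0 + 0.5 * 4.0 = 8.0
```

With these two observations the estimate is 8.0 seconds, so `est.should_play()` is `False` at the 10s cutoff. A larger `alpha` would react faster to a run of slow responses.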

---

**Why It Matters**

* Makes conversations feel smoother and more human.
* Sets the right expectations instead of leaving users in silence.
* Reinforces trust that the assistant is listening, even when it needs time to think.
