
fix(inbound): preserve quoted context for voice messages with ref_msg (#48) #63

Open: draix wants to merge 1 commit into Tencent:main from draix:fix/48-voice-quoted-context

draix commented Apr 14, 2026

Summary

When a user sends a voice message that quotes (replies to) a previous text message, the agent received the transcribed text but lost the quoted context. Text replies with ref_msg already had the quote prepended as [引用: ...] ("引用" means "quote"), but voice replies did not, so the agent could not tell what the user was responding to.

Root cause

bodyFromItemList() in src/messaging/inbound.ts handled TEXT + ref_msg with full quoted-context logic, but the VOICE branch returned voice_item.text unconditionally without checking ref_msg:

```typescript
// Before: ref_msg silently ignored for voice
if (item.type === MessageItemType.VOICE && item.voice_item?.text) {
  return item.voice_item.text;  // ← ref_msg dropped entirely
}
```

Fix

Apply the TEXT branch's quoted-context logic to the VOICE branch as well. The new code is structurally identical to the TEXT branch above it: same null checks, same isMediaItem guard, same parts accumulation, same output format.

| Scenario | Before | After |
|---|---|---|
| Voice replies to text (title) | `yes please` | `[引用: Can you schedule a meeting?]\nyes please` |
| Voice replies to text (content) | `agreed` | `[引用: Let's meet at 3pm]\nagreed` |
| Voice replies to image/video/file/voice | `nice photo` | `nice photo` (unchanged; media can't be quoted as text) |
| Standalone voice (no ref_msg) | `schedule meeting` | `schedule meeting` (unchanged) |
| Untranscribed voice | `""` | `""` (unchanged) |
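For illustration, here is a minimal sketch of the fixed VOICE branch that reproduces the scenarios above. It is not the actual code from src/messaging/inbound.ts: `ref_msg`, `title`, `message_item`, `voice_item.text`, `MessageItemType`, and `isMediaItem` are names this PR uses, but their shapes here are assumptions, as are the `text_item.text` field, the string enum values, and the hypothetical function name `voiceBody`.

```typescript
// Hypothetical stand-ins for the real types in src/messaging/inbound.ts.
// ref_msg, title, message_item, voice_item.text, MessageItemType and isMediaItem
// are names from this PR; text_item.text and the string values are assumptions.
const MessageItemType = {
  TEXT: "TEXT",
  IMAGE: "IMAGE",
  VIDEO: "VIDEO",
  FILE: "FILE",
  VOICE: "VOICE",
} as const;
type MessageItemType = (typeof MessageItemType)[keyof typeof MessageItemType];

interface MessageItem {
  type: MessageItemType;
  text_item?: { text?: string };
  voice_item?: { text?: string };
  ref_msg?: { title?: string; message_item?: MessageItem };
}

const MEDIA_TYPES: readonly MessageItemType[] = [
  MessageItemType.IMAGE,
  MessageItemType.VIDEO,
  MessageItemType.FILE,
  MessageItemType.VOICE,
];

function isMediaItem(item: MessageItem): boolean {
  return MEDIA_TYPES.includes(item.type);
}

// Sketch of the fixed VOICE branch (voiceBody is a hypothetical name).
function voiceBody(item: MessageItem): string {
  const text = item.voice_item?.text;
  if (!text) return ""; // untranscribed: nothing to prepend the quote to
  const ref = item.ref_msg;
  if (!ref) return text; // standalone voice: unchanged
  // Quoted media cannot be expressed as text, so keep the voice text only.
  if (ref.message_item && isMediaItem(ref.message_item)) return text;
  const parts: string[] = [];
  if (ref.title) parts.push(ref.title);
  const quoted = ref.message_item?.text_item?.text;
  if (quoted) parts.push(quoted);
  // An empty ref_msg yields no parts, so the body stays unchanged.
  return parts.length > 0 ? `[引用: ${parts.join(" | ")}]\n${text}` : text;
}

const reply: MessageItem = {
  type: MessageItemType.VOICE,
  voice_item: { text: "agreed" },
  ref_msg: {
    message_item: { type: MessageItemType.TEXT, text_item: { text: "Let's meet at 3pm" } },
  },
};
console.log(JSON.stringify(voiceBody(reply))); // "[引用: Let's meet at 3pm]\nagreed"
```

The media guard runs before any parts are collected, so a reply to an image yields the voice text alone even if `ref_msg.title` is set, matching the "media can't be quoted as text" rule.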

Testing

11 new tests in describe("voice messages with quoted context (#48)"):

Quoted context included:

  • title-only ref_msg → [引用: title]\n<voice text>
  • message_item-only ref_msg → [引用: content]\n<voice text>
  • title + message_item → [引用: title | content]\n<voice text>

Quoted context omitted (media replies — can't express as text):

  • ref_msg is IMAGE → voice text only
  • ref_msg is VIDEO → voice text only
  • ref_msg is FILE → voice text only
  • ref_msg is VOICE → voice text only

No-change regression guards:

  • empty ref_msg → voice text only
  • standalone voice (no ref_msg) → voice text only
  • untranscribed voice, no ref_msg → empty body
  • untranscribed voice, with ref_msg → empty body
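The eleven cases above fit naturally into a table-driven test. The sketch below is a standalone approximation, not the PR's test file: it re-declares minimal, hypothetical types and a compact `voiceBody` that follows the rules stated in this PR (rather than importing the real `bodyFromItemList`), and it uses plain `throw`s instead of the project's `describe` runner so it runs without dependencies. The `voice` and `quotedText` builders are hypothetical helpers.

```typescript
// Hypothetical minimal types; only the field names are taken from the PR.
type ItemType = "TEXT" | "IMAGE" | "VIDEO" | "FILE" | "VOICE";
interface Item {
  type: ItemType;
  text_item?: { text?: string };
  voice_item?: { text?: string };
  ref_msg?: { title?: string; message_item?: Item };
}

const MEDIA: readonly ItemType[] = ["IMAGE", "VIDEO", "FILE", "VOICE"];

// Compact stand-in for the fixed VOICE branch, following the rules in this PR.
function voiceBody(item: Item): string {
  const text = item.voice_item?.text;
  if (!text) return "";
  const ref = item.ref_msg;
  if (!ref || (ref.message_item && MEDIA.includes(ref.message_item.type))) return text;
  const parts = [ref.title, ref.message_item?.text_item?.text].filter(
    (p): p is string => Boolean(p),
  );
  return parts.length > 0 ? `[引用: ${parts.join(" | ")}]\n${text}` : text;
}

// Builders for test inputs (hypothetical helpers).
const voice = (text?: string, ref_msg?: Item["ref_msg"]): Item => ({
  type: "VOICE",
  voice_item: text === undefined ? {} : { text },
  ref_msg,
});
const quotedText = (text: string): Item => ({ type: "TEXT", text_item: { text } });

// [name, input, expected body]: 3 quote cases + 4 media cases + 4 regression guards.
const cases: Array<[string, Item, string]> = [
  ["title-only ref_msg", voice("ok", { title: "T" }), "[引用: T]\nok"],
  ["message_item-only ref_msg", voice("ok", { message_item: quotedText("C") }), "[引用: C]\nok"],
  ["title + message_item", voice("ok", { title: "T", message_item: quotedText("C") }), "[引用: T | C]\nok"],
  ...MEDIA.map((type): [string, Item, string] => [
    `ref_msg is ${type}`,
    voice("ok", { message_item: { type } }),
    "ok",
  ]),
  ["empty ref_msg", voice("ok", {}), "ok"],
  ["standalone voice", voice("ok"), "ok"],
  ["untranscribed, no ref_msg", voice(), ""],
  ["untranscribed, with ref_msg", voice(undefined, { title: "T" }), ""],
];

for (const [name, input, expected] of cases) {
  const actual = voiceBody(input);
  if (actual !== expected) {
    throw new Error(`${name}: expected ${JSON.stringify(expected)}, got ${JSON.stringify(actual)}`);
  }
}
console.log(`${cases.length} cases passed`); // 11 cases passed
```

Generating the four media cases from the same `MEDIA` list the function uses keeps the guard and its tests in sync; the trade-off is that a type accidentally dropped from `MEDIA` would also drop its test.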

tsc --noEmit passes. All 36 inbound tests pass.

Note on src/auth/pairing.test.ts: the one failing test in the suite (which uses withFileLock for concurrency safety) also fails on main without this change, so it is pre-existing and unrelated to this PR.

Fixes #48

…Tencent#48)

When a user sends a voice message that quotes/replies to a previous text
message, the transcribed text was returned without the quoted context.
Text replies with ref_msg already prepended '[引用: ...]', but voice
replies did not, leaving the agent unaware of what the user was
responding to.

Root cause
----------
bodyFromItemList() handled TEXT + ref_msg with full quoted-context logic
but the VOICE branch returned voice_item.text unconditionally, ignoring
ref_msg entirely.

Fix
---
Apply the same quoted-context logic to VOICE as TEXT:
- If ref_msg is absent: return transcribed text as-is (unchanged).
- If ref_msg.message_item is a media type (IMAGE/VIDEO/FILE/VOICE):
  return transcribed text only — media cannot be quoted as text.
- If ref_msg has title and/or a TEXT message_item: prepend
  '[引用: <title> | <text>]' before the transcribed voice text.
- If voice has no transcription (voice_item.text absent): return ''
  regardless of ref_msg (nothing to prepend the quote to).

The fix is structurally identical to the TEXT branch above it:
same null checks, same isMediaItem guard, same parts accumulation,
same output format.

Testing
-------
11 new tests added to 'voice messages with quoted context (#48)':

Quoted-context included:
  - title-only ref_msg        → '[引用: title]\n<voice text>'
  - message_item-only ref_msg → '[引用: content]\n<voice text>'
  - title + message_item      → '[引用: title | content]\n<voice text>'

Quoted-context omitted (media replies):
  - ref_msg is IMAGE    → voice text only
  - ref_msg is VIDEO    → voice text only
  - ref_msg is FILE     → voice text only
  - ref_msg is VOICE    → voice text only

No-change cases (regression guards):
  - empty ref_msg           → voice text only
  - standalone voice        → voice text only
  - untranscribed, no ref   → empty body
  - untranscribed, with ref → empty body

TypeScript: tsc --noEmit passes.
All 36 inbound tests pass.
The pre-existing failure in src/auth/pairing.test.ts is unrelated: it
reproduces on main without this change (confirmed).

draix force-pushed the fix/48-voice-quoted-context branch from d03b9f1 to 0bb27ec (April 14, 2026 04:35)

Development

Successfully merging this pull request may close these issues.

Voice messages with quote (ref_msg) lose quoted context

2 participants