fix(inbound): preserve quoted context for voice messages with ref_msg (#48) #63
Open
draix wants to merge 1 commit into Tencent:main from …
…Tencent#48)

When a user sends a voice message that quotes/replies to a previous text message, the transcribed text was returned without the quoted context. Text replies with ref_msg already prepended '[引用: ...]'; voice replies did not, making the agent unaware of what the user was responding to.

Root cause
----------
bodyFromItemList() handled TEXT + ref_msg with full quoted-context logic, but the VOICE branch returned voice_item.text unconditionally, ignoring ref_msg entirely.

Fix
---
Apply the same quoted-context logic to VOICE as TEXT:

- If ref_msg is absent: return transcribed text as-is (unchanged).
- If ref_msg.message_item is a media type (IMAGE/VIDEO/FILE/VOICE): return transcribed text only, since media cannot be quoted as text.
- If ref_msg has title and/or a TEXT message_item: prepend '[引用: <title> | <text>]' before the transcribed voice text.
- If voice has no transcription (voice_item.text absent): return '' regardless of ref_msg (nothing to prepend the quote to).

The fix is structurally identical to the TEXT branch above it: same null checks, same isMediaItem guard, same parts accumulation, same output format.

Testing
-------
11 new tests added to 'voice messages with quoted context (Tencent#48)':

Quoted-context included:
- title-only ref_msg → '[引用: title]\n<voice text>'
- message_item-only ref_msg → '[引用: content]\n<voice text>'
- title + message_item → '[引用: title | content]\n<voice text>'

Quoted-context omitted (media replies):
- ref_msg is IMAGE → voice text only
- ref_msg is VIDEO → voice text only
- ref_msg is FILE → voice text only
- ref_msg is VOICE → voice text only

No-change cases (regression guards):
- empty ref_msg → voice text only
- standalone voice → voice text only
- untranscribed, no ref → empty body
- untranscribed, with ref → empty body

TypeScript: tsc --noEmit passes. All 36 inbound tests pass. The pre-existing failure in src/auth/pairing.test.ts is unrelated: it reproduces on main without this change (confirmed).
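The four rules in the commit message can be sketched roughly as follows. This is an illustration only, not the code from the PR: the type shapes, the `voiceBody` helper name, and the body of `isMediaItem` are assumptions made so the example is self-contained, and the real definitions in `src/messaging/inbound.ts` may differ.

```typescript
// Hypothetical, simplified shapes; the real types in the repository may differ.
type ItemType = "TEXT" | "IMAGE" | "VIDEO" | "FILE" | "VOICE";

interface RefMessage {
  title?: string;
  message_item?: MessageItem;
}

interface MessageItem {
  type: ItemType;
  text_item?: { text?: string };
  voice_item?: { text?: string };
  ref_msg?: RefMessage;
}

// Assumed guard: these item types cannot be rendered as quoted text.
function isMediaItem(item: MessageItem): boolean {
  return (
    item.type === "IMAGE" ||
    item.type === "VIDEO" ||
    item.type === "FILE" ||
    item.type === "VOICE"
  );
}

// Sketch of the VOICE branch only (hypothetical helper name).
function voiceBody(item: MessageItem): string {
  const text = item.voice_item?.text;
  if (!text) return ""; // untranscribed: nothing to prepend a quote to

  const ref = item.ref_msg;
  if (!ref) return text; // standalone voice message

  // A quoted media item cannot be expressed as text, so skip the quote.
  if (ref.message_item && isMediaItem(ref.message_item)) return text;

  // Accumulate whichever of title / quoted text is present.
  const parts: string[] = [];
  if (ref.title) parts.push(ref.title);
  const quoted = ref.message_item?.text_item?.text;
  if (quoted) parts.push(quoted);

  if (parts.length === 0) return text; // empty ref_msg
  return `[引用: ${parts.join(" | ")}]\n${text}`;
}

const reply: MessageItem = {
  type: "VOICE",
  voice_item: { text: "yes please" },
  ref_msg: { title: "Can you schedule a meeting?" },
};
console.log(voiceBody(reply));
```

Under these assumptions, the sample reply yields the quoted title on the first line and `yes please` on the second, matching the title-only case listed under Testing.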
d03b9f1 to 0bb27ec
Summary
When a user sent a voice message that quoted or replied to a previous text message, the agent received the transcribed text but lost the quoted context. Text replies with `ref_msg` already prepended `[引用: ...]` (引用 means "quote") correctly; voice replies did not, leaving the agent unable to tell what the user was responding to.

Root cause
`bodyFromItemList()` in `src/messaging/inbound.ts` handled `TEXT` + `ref_msg` with full quoted-context logic, but the `VOICE` branch returned `voice_item.text` unconditionally, without checking `ref_msg`.

Fix
Apply the same quoted-context logic to `VOICE` as `TEXT`. The fix is structurally identical to the `TEXT` branch above it: same null checks, same `isMediaItem` guard, same `parts` accumulation, same output format.

Example voice texts and the resulting message bodies:

- `yes please` → `[引用: Can you schedule a meeting?]\nyes please`
- `agreed` → `[引用: Let's meet at 3pm]\nagreed`
- `nice photo` → `nice photo` (unchanged; media can't be quoted as text)
- standalone voice (no `ref_msg`): `schedule meeting` → `schedule meeting` (unchanged)

Testing
11 new tests in `describe("voice messages with quoted context (#48)")`:

Quoted context included:

- title-only `ref_msg` → `[引用: title]\n<voice text>`
- `message_item`-only `ref_msg` → `[引用: content]\n<voice text>`
- `title` + `message_item` → `[引用: title | content]\n<voice text>`

Quoted context omitted (media replies, which can't be expressed as text):

- `ref_msg` is `IMAGE` → voice text only
- `ref_msg` is `VIDEO` → voice text only
- `ref_msg` is `FILE` → voice text only
- `ref_msg` is `VOICE` → voice text only

No-change regression guards:

- empty `ref_msg` → voice text only
- standalone voice (no `ref_msg`) → voice text only
- untranscribed, no `ref_msg` → empty body
- untranscribed, with `ref_msg` → empty body

`tsc --noEmit` passes. All 36 inbound tests pass.

Fixes #48
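The eleven cases listed under Testing could be written as a table-driven check. The sketch below is not the PR's test file: it uses no test framework, the `Voice` and `Ref` shapes are flattened and hypothetical, and `voiceBody` is a compact stand-in for the `VOICE` branch rather than the real `bodyFromItemList()`.

```typescript
// Hypothetical, flattened shapes used only for this illustration.
type Kind = "TEXT" | "IMAGE" | "VIDEO" | "FILE" | "VOICE";
interface Ref {
  title?: string;
  message_item?: { type: Kind; text?: string };
}
interface Voice {
  text?: string;
  ref_msg?: Ref;
}

// Compact stand-in for the VOICE branch. Every non-TEXT kind in this
// simplified model is a media type, so "not TEXT" acts as the media guard.
function voiceBody(v: Voice): string {
  if (!v.text) return "";
  const ref = v.ref_msg;
  const item = ref?.message_item;
  if (!ref || (item && item.type !== "TEXT")) return v.text;
  const parts = [ref.title, item?.text].filter((p): p is string => Boolean(p));
  return parts.length > 0 ? `[引用: ${parts.join(" | ")}]\n${v.text}` : v.text;
}

// One row per case: [name, input, expected body].
const cases: Array<[string, Voice, string]> = [
  ["title only", { text: "hi", ref_msg: { title: "T" } }, "[引用: T]\nhi"],
  ["message_item only", { text: "hi", ref_msg: { message_item: { type: "TEXT", text: "C" } } }, "[引用: C]\nhi"],
  ["title + message_item", { text: "hi", ref_msg: { title: "T", message_item: { type: "TEXT", text: "C" } } }, "[引用: T | C]\nhi"],
  ["ref is IMAGE", { text: "hi", ref_msg: { message_item: { type: "IMAGE" } } }, "hi"],
  ["ref is VIDEO", { text: "hi", ref_msg: { message_item: { type: "VIDEO" } } }, "hi"],
  ["ref is FILE", { text: "hi", ref_msg: { message_item: { type: "FILE" } } }, "hi"],
  ["ref is VOICE", { text: "hi", ref_msg: { message_item: { type: "VOICE" } } }, "hi"],
  ["empty ref_msg", { text: "hi", ref_msg: {} }, "hi"],
  ["standalone voice", { text: "hi" }, "hi"],
  ["untranscribed, no ref", {}, ""],
  ["untranscribed, with ref", { ref_msg: { title: "T" } }, ""],
];

// Collect the names of any cases whose output differs from the expectation.
const failures = cases
  .filter(([, input, expected]) => voiceBody(input) !== expected)
  .map(([name]) => name);

console.log(failures.length === 0 ? "all cases pass" : `failed: ${failures.join(", ")}`);
```

Keeping the cases in a single table makes it easy to see that the media and untranscribed rows act as regression guards: their expected bodies are the same with or without the quoted-context logic.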