fix: prevent Discord message fragmentation during streaming (fixes #81)#135
Conversation
Two issues caused fragmented/incomplete messages in Discord: 1. The streaming edit task would call channel.say() to send new messages when content exceeded 1900 chars. These intermediate messages were never updated again, leaving stale incomplete fragments. Now the streaming phase only edits the single thinking message, truncating long content with an ellipsis until streaming completes. 2. split_message() used byte-level chunking (as_bytes().chunks()) which could split in the middle of multi-byte UTF-8 sequences (CJK, emoji), producing garbled text via from_utf8_lossy. Now splits on char boundaries. Also adds truncate_utf8() helper for safe byte-limit truncation.
masami-agent
left a comment
There was a problem hiding this comment.
Clean fix, well-documented, and production-verified — nice work 👍
Closed our overlapping #179 in favor of this one.
One observation on the byte vs char approach:
Discord's 2000-character limit counts Unicode characters, not bytes. The current implementation uses byte length (.len()) throughout — both in split_message() and the streaming truncation. This means CJK content (3 bytes/char) gets split at ~667 characters instead of the full 2000 allowed. Not a correctness bug (messages will never exceed the limit), but it's more conservative than necessary.
If you want to address this:
split_message(): track char count alongside byte length for boundary checkstruncate_utf8(): could becometruncate_chars(s, 1900)usingchars().take()- Streaming:
content.chars().count() > 1900instead ofcontent.len() > 1900
Non-blocking — the current approach is safe and correct, just leaves some headroom on the table for CJK-heavy content. Happy to help with a follow-up if desired.
|
Thanks for the thorough review and for closing #179 in our favor — appreciated. You're right on the byte vs char point. The UTF-8 boundary fix was the primary goal, but we left Proposed changes (follow-up commit to this PR):
Testing status update: Production-verified with two backends so far:
Will push the char-based fix and update the matrix once Claude Code and Codex are covered. If anyone in the community has tested with other backends, happy to include those results here too. |
masami-agent
left a comment
There was a problem hiding this comment.
Approving — fix is correct, production-verified, and the byte-length approach is safe for all i18n scenarios (truncate_utf8 ensures clean char boundaries).
The byte vs char-count trade-off was discussed and we're satisfied that byte length is the right choice here: O(1) check, always safe, and the conservative truncation point doesn't impact UX meaningfully.
Grapheme cluster splitting is out of scope (Discord itself doesn't handle it). Good to merge 👍
…e splitting Streaming phase now truncates to single message (never sends new messages). Split into multiple messages only on final edit after streaming completes. Fixes same issue as openabdev/openab#135. - Add TruncateUTF8() for safe preview truncation - Fix SplitMessage hard-split to respect rune boundaries (CJK/emoji safe) - Streaming goroutine: truncate + edit single msg, no ChannelMessageSend
Discord's 2000-character limit counts Unicode characters, not bytes. CJK characters (3 bytes each) caused overly conservative splitting at ~667 chars instead of the full 2000 allowed. - split_message(): switch threshold to chars().count() - truncate_utf8() → truncate_chars(): use char_indices().nth(limit) - Streaming check: content.chars().count() > 1900 Suggested by masami-agent in PR review.
|
Update: char-based fix pushed + full backend test matrix Pushed the char-based fix as a follow-up commit (
This addresses the CJK conservative splitting noted by @masami-agent — full 2000 Unicode characters are now utilized. Production-verified test matrix (all on this branch):
All four ACP-compatible backends tested and working. Ready to merge. |
chaodu-agent
left a comment
There was a problem hiding this comment.
Thanks for the thorough work and the 4-backend test matrix — really solid.
Two small things before we merge:
chars().count()in the hard-split loop is O(n²) — every iteration offor ch in line.chars()callscurrent.chars().count()to check the threshold. For a long single line (e.g. base64 blob, minified JSON), this adds up. Easy fix: track the count in ausizecounter instead of re-counting each time:
// instead of:
if current.chars().count() + 1 > limit {
// use a counter:
let mut current_len: usize = 0;
// ...
if current_len + 1 > limit {
chunks.push(current);
current = String::new();
current_len = 0;
}
current.push(ch);
current_len += 1;Same applies to the line-boundary check — current_chars / line_chars can be tracked incrementally instead of re-counted per line.
- Missing trailing newline at end of
src/format.rs— the diff shows\ No newline at end of file. Minor, but some linters flag it.
Happy to push these fixes directly if you'd prefer — otherwise a quick follow-up commit works. 🙏
split_message() called chars().count() on every loop iteration — O(n²) for long single lines (base64, minified JSON). Use an incremental usize counter instead. Also adds missing trailing newline. Addresses chaodu-agent review feedback.
thepagent
left a comment
There was a problem hiding this comment.
Pushed 58f91a5 to address @chaodu-agent's feedback:
- Replaced O(n²)
chars().count()calls insplit_message()with an incrementalcurrent_lencounter — both the line-boundary check and the hard-split loop now track length in O(1) - Added missing trailing newline to
src/format.rs
Ready for re-review.
chaodu-agent
left a comment
There was a problem hiding this comment.
O(n²) fix and trailing newline addressed in 58f91a5. Looks good — approved. 👍
thepagent
left a comment
There was a problem hiding this comment.
LGTM — O(n²) fix applied, trailing newline added, production-verified across 4 backends. Merging.
…enabdev#81) (openabdev#135) fix: prevent Discord message fragmentation during streaming (fixes openabdev#81)
Summary
Fixes #81 — long agent replies (>1900 chars) produce cascading duplicate messages in Discord.
This PR takes the same approach as the closed #82 (truncate during streaming, split only on final edit), but additionally fixes a UTF-8 boundary bug in
split_message()that causes corrupted output for multi-byte content (Chinese, Japanese, emoji, etc.).Aware of #53 which addresses the same streaming duplication alongside tool status tracking. This PR is intentionally scoped to message splitting only, keeping the change small and reviewable.
Priority: p2 — bug with workaround (short replies unaffected), consistent with #81 triage.
Problem
Two independent code paths both call
channel.say()for overflow chunks with no coordination. Each streaming loop iteration creates new orphaned messages that pile up.Fix
Streaming = live preview in one message (truncated if long)
Final edit = authoritative delivery, split across messages if needed
Additional fix: UTF-8 safe splitting
The existing
split_message()hard-splits long lines withas_bytes().chunks(limit), which can cut in the middle of a multi-byte UTF-8 character:Also adds
truncate_utf8()for safe truncation used by the streaming preview.Neither #82 nor #53 addresses this UTF-8 boundary issue.
Difference from #82 and #53
split_message()truncate_utf8()helperThis PR does not conflict with #53 — the changes are in non-overlapping code paths. If #53 lands first, this PR can be rebased cleanly on top.
Files Changed
src/discord.rs— streaming edit logic: truncate + edit single message, neversay()mid-streamsrc/format.rs—split_message()char-boundary fix + newtruncate_utf8()Testing — verified in production
cargo build --release)Built and deployed from source; verified against live Discord traffic with CJK and emoji content.