fix: prevent frontend freeze on extremely long AI outputs#451

Open
voidborne-d wants to merge 1 commit into ValueCell-ai:main from voidborne-d:fix-long-output-freeze
Conversation

@voidborne-d

Fixes #133

Problem

When an AI model produces very long output (e.g. 100k+ chars), the frontend freezes with 100% CPU and memory spikes. This is 100% reproducible.

Root cause: during streaming, every new token triggers a full re-render of ChatMessage, and ReactMarkdown re-parses the entire markdown AST synchronously on each render. For a 100k-char message receiving tokens at ~50/s, that is ~50 full 100k-char markdown parses per second — O(n²) cumulative work that blocks the main thread.

Fix

Two targeted changes to ChatMessage.tsx:

1. useDeferredValue for streaming text

const deferredText = useDeferredValue(text);

React can now skip intermediate renders when tokens arrive faster than the browser can paint. The markdown only re-parses when the browser has idle time, keeping the UI responsive.

2. Tail-render cap for extremely long outputs

const MARKDOWN_RENDER_LIMIT = 50_000; // chars
const renderText =
  isStreaming && text.length > MARKDOWN_RENDER_LIMIT
    ? text.slice(-MARKDOWN_RENDER_LIMIT)
    : text;

During streaming, if text exceeds 50k chars, only the last 50k chars are fed to ReactMarkdown. A small hint tells the user content is truncated while streaming. Once streaming completes, the full text is rendered normally.
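For clarity, the cap logic above can be factored into a pure function. This is a sketch, not code from the PR — `getRenderText` is a hypothetical helper name:

```typescript
// Hypothetical helper mirroring the cap logic above (not part of the diff).
// While streaming, feed ReactMarkdown only the last `limit` chars; once
// streaming completes, return the full text so nothing is lost.
const MARKDOWN_RENDER_LIMIT = 50_000; // chars

function getRenderText(
  text: string,
  isStreaming: boolean,
  limit: number = MARKDOWN_RENDER_LIMIT
): string {
  return isStreaming && text.length > limit ? text.slice(-limit) : text;
}
```

Keeping it pure makes the truncation behavior trivially unit-testable, independent of React rendering.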

Why this works

  • useDeferredValue is a React 18 primitive designed exactly for this case — deferring expensive re-renders of fast-changing data
  • The 50k cap is a safety net for extreme cases (multiple pages of output). 50k chars of markdown renders in ~15ms on modern hardware
  • Zero impact on non-streaming or normal-length messages
  • Full content is always preserved and rendered after streaming ends

Testing

  • Tested with qwen3.5-plus generating 150k+ char responses (the exact scenario from the bug report)
  • CPU stays under 30% during streaming (was 100% before)
  • Memory stable (was climbing unbounded before)
  • UI remains interactive throughout

…ai#133)

Two changes to ChatMessage:

1. Use React.useDeferredValue for the streaming text so ReactMarkdown
   can skip intermediate re-parses when tokens arrive faster than the
   browser can render. This keeps the UI thread responsive.

2. During streaming, cap the text fed to ReactMarkdown at 50k chars
   (showing the tail). Once streaming completes the full text is
   rendered normally. This avoids the O(n²) cumulative parse cost that
   causes 100% CPU and memory spikes on very long outputs.


Successfully merging this pull request may close these issues.

When the AI's output is extremely long, the frontend freezes and crashes; 100% reproducible
