Add browser-side Speech-to-Text with Web Worker architecture by Copilot · Pull Request #3 · Ryan-Guo123/ThinkFirst

Copilot · 2025-11-28T00:54:01Z

Summary

This PR implements browser-side Speech-to-Text using the Whisper model (Xenova/whisper-tiny.en) running entirely on the client. Key features:

Zero server cost: All processing happens in the browser
100% privacy: Audio never leaves the user's device
Web Worker architecture: Model loading and inference run in isolated worker thread
Auto-resampling: Audio automatically resampled to 16kHz for Whisper model

Bug Fixes (Latest)

Fixed race condition where mediaRecorderRef was cleared before onstop handler completed
Added validation for empty audio chunks and short recordings
Improved error handling with proper user feedback
UI now properly resets and shows errors
Fixed auto-restart recording bug
NEW: Fixed stale callback closure that prevented transcription text from appearing in the input box

New Voice Input UI Design

The voice recording UI now features:

Animated visualizer bars on the left side
Recording timer in the center
Stop/Mic button on the right side
Error message display for failed recordings

Original prompt

在现有的项目中，集成基于浏览器端侧计算的语音转文字（Speech-to-Text）功能。
目标是实现“零服务器成本”、“100% 隐私保护”的语音输入。

Requirements (必须满足)

核心架构：Web Worker 隔离

必须将 AI 模型的加载和推理逻辑放入单独的 worker.js 文件中。

禁止在主线程运行模型，以免造成 UI 冻结（Blocking UI）。

主线程与 Worker 之间通过 postMessage 通信。

模型选择

使用 Xenova/whisper-tiny.en (Quantized 版本)。

理由：平衡下载体积 (~40MB) 和识别准确率。

音频处理 (关键)

使用 Web Audio API (AudioContext) 捕获麦克风输入。

必须将音频自动重采样（Resample）到 Whisper 模型要求的 16000Hz 采样率。

音频数据需转换为模型可接受的 Float32Array 格式。

UI/UX 交互状态

初始态：显示“点击启用语音输入（首次需下载模型）”。

加载态：显示模型下载进度百分比（Progress Bar）。

录音态：显示“正在聆听...”，并提供停止按钮。

处理态：显示“正在转译...”。

完成态：将转译出的文字自动填充到输入框（callback function）。

参考之前原本的UI/UX 设计。和动画效果。

配置调整
直接修改代码库

✨ Let Copilot coding agent set things up for you — coding agent works faster and does higher quality work when set up for your repo.

vercel · 2025-11-28T00:54:07Z

The latest updates on your projects. Learn more about Vercel for GitHub.

Project	Deployment	Preview	Comments	Updated (UTC)
think-first	Ready	Preview	Comment	Dec 1, 2025 11:36am

- Create whisper-worker.js for isolated model loading and inference - Add useSpeechToText hook for audio capture and processing - Create SpeechToTextButton component with multiple UI states - Integrate STT button into PromptInputBox - Update vite.config.ts for Web Worker support - Add @xenova/transformers dependency for Whisper model Co-authored-by: Ryan-Guo123 <169242962+Ryan-Guo123@users.noreply.github.com>

- Fix progress value conversion (handle both 0-1 and 0-100 formats) - Remove hardcoded sample rate constraint for better device compatibility - Fix race condition by setting onstop handler before starting MediaRecorder Co-authored-by: Ryan-Guo123 <169242962+Ryan-Guo123@users.noreply.github.com>

Ryan-Guo123 · 2025-11-28T02:30:01Z

@copilot

可以参考以下样式来优化。按键放右边, 时间和声音条放在左边。

You are given a task to integrate an existing React component in the codebase

The codebase should support:

shadcn project structure
Tailwind CSS
Typescript

If it doesn't, provide instructions on how to setup project via shadcn CLI, install Tailwind or Typescript.

Determine the default path for components and styles.
If default path for components is not /components/ui, provide instructions on why it's important to create this folder
Copy-paste this component to /components/ui folder:

ai-voice-input.tsx
"use client";

import { Mic } from "lucide-react";
import { useState, useEffect } from "react";
import { cn } from "@/lib/utils";

interface AIVoiceInputProps {
  onStart?: () => void;
  onStop?: (duration: number) => void;
  visualizerBars?: number;
  demoMode?: boolean;
  demoInterval?: number;
  className?: string;
}

export function AIVoiceInput({
  onStart,
  onStop,
  visualizerBars = 48,
  demoMode = false,
  demoInterval = 3000,
  className
}: AIVoiceInputProps) {
  const [submitted, setSubmitted] = useState(false);
  const [time, setTime] = useState(0);
  const [isClient, setIsClient] = useState(false);
  const [isDemo, setIsDemo] = useState(demoMode);

  useEffect(() => {
    setIsClient(true);
  }, []);

  useEffect(() => {
    let intervalId: NodeJS.Timeout;

    if (submitted) {
      onStart?.();
      intervalId = setInterval(() => {
        setTime((t) => t + 1);
      }, 1000);
    } else {
      onStop?.(time);
      setTime(0);
    }

    return () => clearInterval(intervalId);
  }, [submitted, time, onStart, onStop]);

  useEffect(() => {
    if (!isDemo) return;

    let timeoutId: NodeJS.Timeout;
    const runAnimation = () => {
      setSubmitted(true);
      timeoutId = setTimeout(() => {
        setSubmitted(false);
        timeoutId = setTimeout(runAnimation, 1000);
      }, demoInterval);
    };

    const initialTimeout = setTimeout(runAnimation, 100);
    return () => {
      clearTimeout(timeoutId);
      clearTimeout(initialTimeout);
    };
  }, [isDemo, demoInterval]);

  const formatTime = (seconds: number) => {
    const mins = Math.floor(seconds / 60);
    const secs = seconds % 60;
    return `${mins.toString().padStart(2, "0")}:${secs.toString().padStart(2, "0")}`;
  };

  const handleClick = () => {
    if (isDemo) {
      setIsDemo(false);
      setSubmitted(false);
    } else {
      setSubmitted((prev) => !prev);
    }
  };

  return (
    <div className={cn("w-full py-4", className)}>
      <div className="relative max-w-xl w-full mx-auto flex items-center flex-col gap-2">
        <button
          className={cn(
            "group w-16 h-16 rounded-xl flex items-center justify-center transition-colors",
            submitted
              ? "bg-none"
              : "bg-none hover:bg-black/10 dark:hover:bg-white/10"
          )}
          type="button"
          onClick={handleClick}
        >
          {submitted ? (
            <div
              className="w-6 h-6 rounded-sm animate-spin bg-black dark:bg-white cursor-pointer pointer-events-auto"
              style={{ animationDuration: "3s" }}
            />
          ) : (
            <Mic className="w-6 h-6 text-black/70 dark:text-white/70" />
          )}
        </button>

        <span
          className={cn(
            "font-mono text-sm transition-opacity duration-300",
            submitted
              ? "text-black/70 dark:text-white/70"
              : "text-black/30 dark:text-white/30"
          )}
        >
          {formatTime(time)}
        </span>

        <div className="h-4 w-64 flex items-center justify-center gap-0.5">
          {[...Array(visualizerBars)].map((_, i) => (
            <div
              key={i}
              className={cn(
                "w-0.5 rounded-full transition-all duration-300",
                submitted
                  ? "bg-black/50 dark:bg-white/50 animate-pulse"
                  : "bg-black/10 dark:bg-white/10 h-1"
              )}
              style={
                submitted && isClient
                  ? {
                      height: `${20 + Math.random() * 80}%`,
                      animationDelay: `${i * 0.05}s`,
                    }
                  : undefined
              }
            />
          ))}
        </div>

        <p className="h-4 text-xs text-black/70 dark:text-white/70">
          {submitted ? "Listening..." : "Click to speak"}
        </p>
      </div>
    </div>
  );
}

demo.tsx
import { AIVoiceInput } from "@/components/ui/ai-voice-input";
import { useState } from "react";

export function AIVoiceInputDemo() {
  const [recordings, setRecordings] = useState<{ duration: number; timestamp: Date }[]>([]);

  const handleStop = (duration: number) => {
    setRecordings(prev => [...prev.slice(-4), { duration, timestamp: new Date() }]);
  };

  return (
    <div className="space-y-8">
        <div className="space-y-4">
          <AIVoiceInput 
            onStart={() => console.log('Recording started')}
            onStop={handleStop}
          />   
      </div>
    </div>
  );
}

Install NPM dependencies:

lucide-react

Implementation Guidelines

Analyze the component structure and identify all required dependencies
Review the component's argumens and state
Identify any required context providers or hooks and install them
Questions to Ask

What data/props will be passed to this component?
Are there any specific state management requirements?
Are there any required assets (images, icons, etc.)?
What is the expected responsive behavior?
What is the best place to use this component in the app?

Steps to integrate
0. Copy paste all the code above in the correct directories

Install external dependencies
Fill image assets with Unsplash stock images you know exist
Use lucide-react icons for svgs or logos if component requires them

- Add inline voice recording bar with visualizer bars on left - Show recording timer in center - Place stop/mic button on right side - Add animated visualizer bars during recording - Extract magic numbers to named constants - Remove unused SpeechToTextButton and AIVoiceInput components Co-authored-by: Ryan-Guo123 <169242962+Ryan-Guo123@users.noreply.github.com>

Copilot · 2025-11-28T02:39:58Z

@copilot

可以参考以下样式来优化。按键放右边, 时间和声音条放在左边。

...

Redesigned the voice input UI in commit e11a88a. The new layout shows:

Visualizer bars on the left
Recording timer in the center
Stop/Mic button on the right

Ryan-Guo123 · 2025-11-29T02:42:04Z

@copilot

按停止录音的时候, 有的时候不转录
有很多bug

… display - Fix race condition in stopRecording by clearing mediaRecorderRef in onstop handler - Add validation for empty audio chunks and recordings that are too short - Add better error handling for audio decode failures and worker readiness - Update UI to properly show error messages and reset state on errors - Simplify worker transcription result logic Co-authored-by: Ryan-Guo123 <169242962+Ryan-Guo123@users.noreply.github.com>

Copilot · 2025-11-29T02:50:24Z

@copilot

按停止录音的时候, 有的时候不转录
...

Fixed in commit f06f24b. The issues were:

Race condition: The mediaRecorderRef was being cleared before the onstop handler could process audio - now cleared inside the handler
Empty audio handling: Added validation for recordings that are too short (<0.5s) or have no audio data
Error display: UI now shows error messages and properly resets state

The changes include better error handling for audio decode failures, worker readiness checks, and user-friendly error messages.

Ryan-Guo123 · 2025-12-01T11:07:03Z

@copilot
When you press the stop recording button, a "Transcribing..." message appears briefly, but after a few seconds it automatically switches back to recording mode without transcribing any text.

The issue was that after transcription, the status changes to 'ready', which triggered the auto-start effect to restart recording before the handleSpeechTranscript callback could set voiceInputActive to false. Added justFinishedProcessingRef to track when processing just finished, preventing the auto-start effect from re-triggering recording after transcription or error states. Co-authored-by: Ryan-Guo123 <169242962+Ryan-Guo123@users.noreply.github.com>

Copilot · 2025-12-01T11:13:38Z

@copilot
When you press the stop recording button, a "Transcribing..." message appears briefly, but after a few seconds it automatically switches back to recording mode without transcribing any text.

Fixed in commit 98854f2. The issue was that the auto-start recording effect was triggering after transcription completed because the voiceStatus changed to 'ready' before the callback could set voiceInputActive to false. Added a ref to track when processing just finished and prevent auto-restart in that case.

Ryan-Guo123 · 2025-12-01T11:17:10Z

@copilot
it only work on first time, then it has the Same problem!!

The issue was that justFinishedProcessingRef was being cleared unconditionally at the start of the effect, even when the auto-start conditions weren't met. This caused the flag to be cleared before it could prevent auto-restart on subsequent recordings. Fix: - Only clear the flag when auto-start conditions are actually met - Clear the flag when user manually clicks to start new recording Co-authored-by: Ryan-Guo123 <169242962+Ryan-Guo123@users.noreply.github.com>

Copilot · 2025-12-01T11:21:33Z

@copilot
it only work on first time, then it has the Same problem!!

Fixed in commit 24c3a5b. The issue was that justFinishedProcessingRef was being cleared unconditionally when the effect ran, even when the auto-start conditions weren't met. This caused the flag to be cleared before it could prevent auto-restart on subsequent recordings.

Changes:

Only clear the flag when all auto-start conditions are met
Also clear the flag when user manually clicks to start a new recording session

Ryan-Guo123 · 2025-12-01T11:24:11Z

@copilot Same problem! When I press the stop recording button, a "Transcribing..." message appears briefly, but after a few seconds it automatically switches back to recording mode without transcribing any text.

Previous approach used justFinishedProcessingRef flag which was too complex and prone to race conditions. New approach: - Use waitingForModelRef that is ONLY set when user clicks mic while model not loaded - Auto-start ONLY triggers when this flag is set AND model becomes ready - After transcription, the flag is already false so no restart happens This removes the problematic voiceInputActive dependency for auto-start. Co-authored-by: Ryan-Guo123 <169242962+Ryan-Guo123@users.noreply.github.com>

Copilot · 2025-12-01T11:28:27Z

@copilot Same problem! When I press the stop recording button, a "Transcribing..." message appears briefly, but after a few seconds it automatically switches back to recording mode without transcribin...

Fixed in commit fb9c75d. Completely rewrote the auto-start logic with a simpler approach:

The waitingForModelRef flag is now ONLY set when user clicks mic while model is not loaded
Auto-start ONLY happens when this flag is true AND model becomes ready
After transcription, the flag is already false so recording cannot restart

The previous approach with justFinishedProcessingRef was too complex and had race conditions. The new approach eliminates the problematic dependency on voiceInputActive for auto-start decisions.

Ryan-Guo123 · 2025-12-01T11:31:24Z

@copilot When I press the stop recording button, a "Transcribing..." message appears briefly, but after a few seconds it didn't transcribing any text!

The onTranscript callback was captured in the initWorker useCallback closure. When the component re-rendered, the worker's onmessage handler still had the old callback reference that didn't update the UI. Fix: Use a ref (onTranscriptRef) that is updated via useEffect whenever onTranscript changes. The worker's onmessage handler now uses this ref to always call the latest callback. Co-authored-by: Ryan-Guo123 <169242962+Ryan-Guo123@users.noreply.github.com>

Copilot · 2025-12-01T11:36:00Z

@copilot When I press the stop recording button, a "Transcribing..." message appears briefly, but after a few seconds it didn't transcribing any text!

Fixed in commit ca502c5. The issue was a stale callback closure - the onTranscript callback was captured in the worker's onmessage handler when the worker was initialized, but React re-renders could create a new callback reference that the handler didn't see.

Fix: Use a ref (onTranscriptRef) that is updated via useEffect whenever onTranscript changes. The worker's handler now uses this ref to always call the latest callback, ensuring the transcription text appears in the input box.

Initial plan

a7a77f8

Copilot AI assigned Copilot and Ryan-Guo123 Nov 28, 2025

vercel bot deployed to Preview November 28, 2025 00:54 View deployment

Copilot started work on behalf of Ryan-Guo123 November 28, 2025 00:54 View session

vercel bot deployed to Preview November 28, 2025 01:00 View deployment

vercel bot deployed to Preview November 28, 2025 01:04 View deployment

Copilot AI changed the title ~~[WIP] Add browser-based speech-to-text integration~~ Add browser-side Speech-to-Text with Web Worker architecture Nov 28, 2025

Copilot AI requested a review from Ryan-Guo123 November 28, 2025 01:06

Copilot finished work on behalf of Ryan-Guo123 November 28, 2025 01:06

Copilot started work on behalf of Ryan-Guo123 November 28, 2025 02:30 View session

vercel bot deployed to Preview November 28, 2025 02:40 View deployment

Copilot finished work on behalf of Ryan-Guo123 November 28, 2025 02:41

Copilot started work on behalf of Ryan-Guo123 November 29, 2025 02:42 View session

vercel bot deployed to Preview November 29, 2025 02:50 View deployment

Copilot finished work on behalf of Ryan-Guo123 November 29, 2025 02:51

Ryan-Guo123 marked this pull request as ready for review December 1, 2025 11:03

Copilot started work on behalf of Ryan-Guo123 December 1, 2025 11:07 View session

vercel bot deployed to Preview December 1, 2025 11:13 View deployment

Copilot finished work on behalf of Ryan-Guo123 December 1, 2025 11:14

Copilot started work on behalf of Ryan-Guo123 December 1, 2025 11:17 View session

vercel bot deployed to Preview December 1, 2025 11:21 View deployment

Copilot finished work on behalf of Ryan-Guo123 December 1, 2025 11:22

Copilot started work on behalf of Ryan-Guo123 December 1, 2025 11:24 View session

vercel bot deployed to Preview December 1, 2025 11:28 View deployment

Copilot finished work on behalf of Ryan-Guo123 December 1, 2025 11:29

Copilot started work on behalf of Ryan-Guo123 December 1, 2025 11:31 View session

vercel bot deployed to Preview December 1, 2025 11:36 View deployment

Copilot finished work on behalf of Ryan-Guo123 December 1, 2025 11:37

Conversation

Copilot AI commented Nov 28, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Bug Fixes (Latest)

New Voice Input UI Design

Requirements (必须满足)

Uh oh!

vercel bot commented Nov 28, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Ryan-Guo123 commented Nov 28, 2025

可以参考以下样式来优化。按键放右边, 时间和声音条放在左边。

Uh oh!

Copilot AI commented Nov 28, 2025

可以参考以下样式来优化。按键放右边, 时间和声音条放在左边。

Uh oh!

Ryan-Guo123 commented Nov 29, 2025

Uh oh!

Copilot AI commented Nov 29, 2025

Uh oh!

Ryan-Guo123 commented Dec 1, 2025

Uh oh!

Copilot AI commented Dec 1, 2025

Uh oh!

Ryan-Guo123 commented Dec 1, 2025

Uh oh!

Copilot AI commented Dec 1, 2025

Uh oh!

Ryan-Guo123 commented Dec 1, 2025

Uh oh!

Copilot AI commented Dec 1, 2025

Uh oh!

Ryan-Guo123 commented Dec 1, 2025

Uh oh!

Copilot AI commented Dec 1, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Copilot AI commented Nov 28, 2025 •

edited

Loading

vercel bot commented Nov 28, 2025 •

edited

Loading