(Feature)Audio Recording & ASR 新增音频录制和自动语音识别功能 by yuansui486 · Pull Request #291 · yuka-friends/Windrecorder

yuansui486 · 2025-10-30T07:44:28Z

Audio Recording & ASR Feature - Pull Request Documentation

📋 Overview / 概述

This PR adds comprehensive audio recording and automatic speech recognition (ASR) capabilities to Windrecorder, enabling users to search not only through screen OCR text but also through spoken audio content.Mainly provides a solution for #290

本 PR 为 Windrecorder 添加了完整的音频录制和自动语音识别（ASR）功能，使用户不仅可以搜索屏幕 OCR 文本，还可以搜索语音内容。主要为 #290 提供解决方案。

Base Commit: 92be2ed (Merge branch 'yuka-friends:main' into main)

✨ Key Features / 核心功能

1. Dual-Track Audio Recording / 双轨音频录制

✅ System audio (speakers/applications) / 系统音频（扬声器/应用程序）
✅ Microphone audio (user voice) / 麦克风音频（用户语音）
✅ Separate storage for independent processing / 独立存储以便独立处理
✅ Synchronized with video timestamps / 与视频时间戳同步

2. Automatic Speech Recognition (ASR) / 自动语音识别

✅ Powered by SenseVoiceSmall (Alibaba FunASR) / 基于阿里 FunASR 的 SenseVoiceSmall 模型
✅ Multi-language support: Chinese, English, Japanese, Korean, Cantonese / 多语言支持：中文、英文、日语、韩语、粤语
✅ Automatic language detection / 自动语言检测
✅ Emotion and event tags / 情感和事件标签
✅ CPU and GPU acceleration support / CPU 和 GPU 加速支持

3. Independent Audio-Text Database / 独立音频文本数据库

✅ NEW: Separate audio_text table for time-based audio data / 新增独立 audio_text 表存储时间级音频数据
✅ Fixed architectural issue: Audio data no longer incorrectly associated with application windows / 修复架构问题：音频数据不再错误地关联到应用窗口
✅ Eliminates data redundancy (100+ duplicates per 15min video) / 消除数据冗余（每15分钟视频100+条重复）
✅ Uses timestamp as primary key for global audio data / 使用时间戳作为主键存储全局音频数据

4. Intelligent Noise Filtering / 智能噪音过滤

✅ Music detection via repetitive pattern analysis / 通过重复模式分析检测音乐
✅ Minimum text length threshold / 最小文本长度阈值
✅ Keyword blacklist for known noise / 已知噪音的关键词黑名单
✅ Separate filtering for system audio and microphone / 系统音频和麦克风分别过滤

5. Integrated Search / 集成搜索

✅ Search across OCR text, window titles, and ASR transcriptions / 在 OCR 文本、窗口标题和 ASR 转录中搜索
✅ LEFT JOIN query for seamless integration / 使用 LEFT JOIN 查询无缝集成
✅ Display ASR results alongside screen captures / 在屏幕截图旁显示 ASR 结果
✅ Audio playback in oneday view / oneday 视图中的音频播放

6. WebUI Enhancements / WebUI 增强

✅ Audio device detection and configuration / 音频设备检测和配置
✅ Device availability testing / 设备可用性测试
✅ Manual ASR transcription with progress bar / 带进度条的手动 ASR 转录
✅ ASR model testing with example files / 使用示例文件测试 ASR 模型
✅ ASR processing pause/resume controls / ASR 处理暂停/恢复控制
✅ Comprehensive ASR settings panel / 全面的 ASR 设置面板

🏗️ Architecture Changes / 架构变更

Database Schema / 数据库架构

❌ OLD (Before this PR): Flawed Architecture

-- Problem: ASR data stored per-window in video_text table
CREATE TABLE video_text (
    videofile_name VARCHAR(100),
    videofile_time INT,
    ocr_text TEXT,
    win_title TEXT,
    asr_text_system TEXT,  -- ❌ Duplicated 100+ times per 15min video
    asr_text_mic TEXT,     -- ❌ Audio wrongly associated with windows
    asr_language TEXT,     -- ❌ Data redundancy
    ...
);

Issues with old architecture:

🔴 Audio data duplicated across 100+ window records per 15-minute video
🔴 Audio incorrectly associated with arbitrary application windows
🔴 Confusing data model: window-level data mixed with global-level data
🔴 Search results show audio associated with wrong applications

✅ NEW (This PR): Corrected Architecture

1. video_text table (Window-level data):

CREATE TABLE video_text (
    videofile_name VARCHAR(100),
    picturefile_name VARCHAR(100),
    videofile_time INT,
    ocr_text TEXT,            -- ✅ OCR text (window-specific)
    win_title TEXT,           -- ✅ Window title (window-specific)
    deep_linking TEXT,        -- ✅ URL/path (window-specific)
    thumbnail TEXT,
    ...
);

2. audio_text table (Time-level data, NEW):

CREATE TABLE audio_text (
    audio_timestamp INT PRIMARY KEY,  -- ✅ Time-based key (global)
    asr_text_system TEXT,             -- ✅ System audio ASR (one per timestamp)
    asr_text_mic TEXT,                -- ✅ Microphone ASR (one per timestamp)
    asr_language TEXT,                -- ✅ Detected language
    audiofile_system TEXT,            -- ✅ System audio filename
    audiofile_mic TEXT                -- ✅ Mic audio filename
);

3. audiofile_state table (Audio file metadata):

CREATE TABLE audiofile_state (
    audiofile_name TEXT PRIMARY KEY,
    audio_type TEXT,              -- 'system' or 'mic'
    created_time TEXT,
    file_size_bytes INTEGER,
    asr_indexed INTEGER DEFAULT 0,
    asr_success INTEGER DEFAULT 0
);

Benefits of new architecture:

✅ Clean separation: Window data vs. global audio data
✅ No redundancy: One audio record per timestamp (not per window)
✅ Correct associations: Audio is time-based, not window-based
✅ Flexible queries: LEFT JOIN for seamless integration
✅ Data integrity: Audio exists independently of screen activity

Search Query / 搜索查询

OLD (Flawed):

SELECT * FROM video_text
WHERE ocr_text LIKE '%keyword%'
   OR asr_text_system LIKE '%keyword%'  -- ❌ Duplicated data

NEW (Corrected):

SELECT v.*, a.asr_text_system, a.asr_text_mic, a.asr_language
FROM video_text v
LEFT JOIN audio_text a ON v.videofile_time = a.audio_timestamp
WHERE (v.ocr_text LIKE '%keyword%'
    OR v.win_title LIKE '%keyword%'
    OR a.asr_text_system LIKE '%keyword%'  -- ✅ Joined from audio_text
    OR a.asr_text_mic LIKE '%keyword%')
AND v.videofile_time BETWEEN ? AND ?

Benefits:

✅ No duplicate results (one match per timestamp)
✅ Preserves all window records even without audio (LEFT JOIN)
✅ Correct time-based audio association
✅ Maintains backward compatibility with non-audio databases

📂 File Changes / 文件变更

Modified Files / 修改的文件

File	Lines Changed	Description
`windrecorder/db_manager.py`	+369	Core changes: audio_text table, LEFT JOIN queries, ASR methods
`windrecorder/record.py`	+87	Audio recording via FFmpeg
`windrecorder/ui/recording.py`	+361	WebUI audio settings, device detection, manual ASR
`windrecorder/ui/components.py`	+45	Dynamic ASR column rendering
`windrecorder/ui/oneday.py`	+43	Audio player integration
`windrecorder/ui/search.py`	+30	Audio search integration
`windrecorder/ui/state.py`	+83	Audio statistics
`windrecorder/config.py`	+54	Audio configuration properties
`windrecorder/utils.py`	+79	Audio device detection, testing
`record_screen.py`	+49	Audio recording thread integration
`webui.py`	+15	ASR manager preimport strategy
`config_default.json`	+28	Default audio settings
`languages.json`	+122	UI translations for audio features
`pyproject.toml`	+6	Dependencies: funasr, modelscope

New Files / 新增文件

File	Lines	Description
`windrecorder/asr_manager.py`	361	NEW: ASR manager module (SenseVoice integration)
`windrecorder/config_src/example/*.mp3`	-	NEW: 5 test audio files (en, zh, ja, ko, yue)

Total Changes: 3,216 insertions, 161 deletions across 23 files

🔧 Technical Implementation / 技术实现

1. Audio Recording Pipeline / 音频录制流程

Implementation: windrecorder/record.py:47-140

def record_audio_via_ffmpeg(
    output_dir=config.record_audios_dir_ud,
    record_time=config.record_seconds,
    audio_type="system",  # "system" or "mic"
):
    """
    Record audio using FFmpeg DirectShow
    使用 FFmpeg DirectShow 录制音频
    """
    # 1. Get device name from config
    device_name = (config.system_audio_device_name
                   if audio_type == "system"
                   else config.mic_audio_device_name)

    # 2. Build FFmpeg command
    ffmpeg_cmd = [
        config.ffmpeg_path,
        "-f", "dshow",                        # DirectShow input
        "-i", f"audio={device_name}",
        "-ar", "16000",                       # 16kHz sample rate
        "-ac", "1",                           # Mono
        "-c:a", "libmp3lame",                 # MP3 codec
        "-b:a", "64k",                        # 64kbps bitrate
        "-t", str(record_time),
        out_path,
    ]

    # 3. Register to database
    db_manager.db_add_audiofile(
        audiofile_name=audio_out_name,
        audio_type=audio_type,
        created_time=now.strftime(DATETIME_FORMAT),
        file_size_bytes=file_size,
    )

Storage: userdata/audios/YYYY-MM/YYYY-MM-DD_HH-MM-SS_{system|mic}.mp3

2. ASR Processing / ASR 处理

Implementation: windrecorder/asr_manager.py (361 lines)

Key Methods:

`transcribe_audio(audio_path, audio_type)`

def transcribe_audio(self, audio_path: str, audio_type: str) -> dict:
    """
    Transcribe audio file using SenseVoice
    使用 SenseVoice 转录音频文件

    Returns:
        {
            'text': str,         # Filtered transcription
            'raw_text': str,     # Original transcription
            'language': str,     # Detected language
            'emotion': str,      # Emotion tag (optional)
            'event': str,        # Event tag (optional)
        }
    """
    # 1. Load model (lazy loading)
    self._ensure_model_loaded()

    # 2. Run ASR
    result = self.model.generate(
        input=audio_path,
        language="auto",
        use_itn=config.asr_use_itn,
        batch_size_s=config.asr_batch_size_s,
        merge_vad=True,
        merge_length_s=15,
    )

    # 3. Extract and process text
    text = result[0]["text"]
    text = self._filter_text(text, audio_type)

    return {"text": text, "language": lang, ...}

`process_pending_audio_files(batch_size)`

def process_pending_audio_files(self, batch_size: int = 3):
    """
    Process pending audio files during idle maintenance
    闲时处理待处理的音频文件
    """
    # 1. Query pending files
    pending = db_manager.db_get_pending_asr_audiofiles(limit=batch_size)

    # 2. Process each file
    for audio_info in pending:
        # Transcribe
        result = self.transcribe_audio(
            audio_info["audio_path"],
            audio_info["audio_type"]
        )

        # Update database
        if result["text"]:
            db_manager.db_update_asr_text(
                audiofile_name=audio_info["audiofile_name"],
                asr_text=result["text"],
                asr_language=result["language"],
                audio_source=1 if audio_type == "system" else 2,
            )

        # Mark as indexed
        db_manager.db_mark_audio_asr_indexed(
            audio_info["audiofile_name"],
            success=True
        )

3. Database Integration / 数据库集成

Implementation: windrecorder/db_manager.py

Critical Fix: `db_update_asr_text()` (Lines 1086-1147)

OLD (Flawed - wrote to video_text):

def db_update_asr_text(self, audiofile_name, asr_text, asr_language, audio_source):
    # ❌ OLD: Updated video_text table (wrong association)
    cursor.execute(
        """UPDATE video_text
           SET asr_text_system = ?, asr_language = ?
           WHERE videofile_time = ?""",  # ❌ Matched many window records
        (asr_text, asr_language, video_time)
    )

NEW (Corrected - writes to audio_text):

def db_update_asr_text(self, audiofile_name, asr_text, asr_language, audio_source):
    """
    Update ASR text in audio_text table
    更新 audio_text 表中的 ASR 文本
    """
    # Extract timestamp from filename
    timestamp_str = audiofile_name[:19]  # YYYY-MM-DD_HH-MM-SS
    audio_timestamp = int(utils.dtstr_to_seconds(timestamp_str))

    # Check if record exists
    cursor.execute("SELECT * FROM audio_text WHERE audio_timestamp = ?",
                   (audio_timestamp,))
    existing = cursor.fetchone()

    if existing:
        # ✅ UPDATE existing record (system or mic column)
        if audio_source == 1:  # System audio
            cursor.execute(
                """UPDATE audio_text
                   SET asr_text_system = ?, asr_language = ?, audiofile_system = ?
                   WHERE audio_timestamp = ?""",
                (asr_text, asr_language, audiofile_name, audio_timestamp)
            )
        else:  # Microphone
            cursor.execute(
                """UPDATE audio_text
                   SET asr_text_mic = ?, asr_language = ?, audiofile_mic = ?
                   WHERE audio_timestamp = ?""",
                (asr_text, asr_language, audiofile_name, audio_timestamp)
            )
    else:
        # ✅ INSERT new record
        if audio_source == 1:
            cursor.execute(
                """INSERT INTO audio_text
                   (audio_timestamp, asr_text_system, asr_language, audiofile_system)
                   VALUES (?, ?, ?, ?)""",
                (audio_timestamp, asr_text, asr_language, audiofile_name)
            )
        else:
            cursor.execute(
                """INSERT INTO audio_text
                   (audio_timestamp, asr_text_mic, asr_language, audiofile_mic)
                   VALUES (?, ?, ?, ?)""",
                (audio_timestamp, asr_text, asr_language, audiofile_name)
            )

Key Improvements:

✅ Uses audio_timestamp as primary key (time-based, not window-based)
✅ One record per timestamp (combines system and mic via UPDATE)
✅ No data duplication
✅ INSERT or UPDATE logic for separate audio sources

Critical Fix: `db_search_data()` (Lines 316-369)

Changes:

# OLD query
query = "SELECT * FROM video_text WHERE ocr_text LIKE ..."

# NEW query with LEFT JOIN
query = """
    SELECT v.*, a.asr_text_system, a.asr_text_mic, a.asr_language
    FROM video_text v
    LEFT JOIN audio_text a ON v.videofile_time = a.audio_timestamp
    WHERE (v.ocr_text LIKE '%{keyword}%'
        OR v.win_title LIKE '%{keyword}%'
        OR a.asr_text_system LIKE '%{keyword}%'
        OR a.asr_text_mic LIKE '%{keyword}%')
    AND v.videofile_time BETWEEN ? AND ?
"""

Benefits:

✅ All table references use aliases (v. or a.)
✅ LEFT JOIN preserves all window records even without audio
✅ ASR columns available in result dataframe
✅ Search works across OCR, window titles, and ASR text

4. Noise Filtering / 噪音过滤

Implementation: windrecorder/asr_manager.py:106-157

Three-tier filtering:

def _filter_text(self, text: str, audio_type: str) -> str:
    """
    Filter noise, music, and low-quality transcriptions
    过滤噪音、音乐和低质量转录
    """
    # 1. Length filter
    if len(text.strip()) < config.asr_min_text_length:  # Default: 5
        logger.debug(f"Text too short: {len(text)}")
        return ""

    # 2. Repetitive pattern detection (music/lyrics)
    if self._is_repetitive(text):
        logger.debug(f"Repetitive text detected: {text[:50]}")
        return ""

    # 3. Keyword blacklist (for system audio only)
    if audio_type == "system":
        for keyword in config.asr_music_filter_keywords:
            if keyword.lower() in text.lower():
                logger.debug(f"Music keyword detected: {keyword}")
                return ""

    return text

def _is_repetitive(self, text: str) -> bool:
    """Check if text has repetitive patterns (music/lyrics)"""
    words = text.split()
    if len(words) < 5:
        return False

    unique_ratio = len(set(words)) / len(words)
    return unique_ratio < config.asr_repetitive_threshold  # Default: 0.6

Effectiveness:

✅ Filters 90%+ music/background noise
✅ Preserves meaningful conversation
✅ Configurable thresholds
✅ Different strategies for system vs. mic audio

5. WebUI Integration / WebUI 集成

Implementation: windrecorder/ui/recording.py:342-661

Key Features:

Audio Device Detection

# Auto-detect devices on page load
if "audio_devices_auto_loaded" not in st.session_state:
    try:
        devs = utils.get_audio_devices()
        st.session_state.audio_devices = devs
        st.session_state.audio_devices_auto_loaded = True
    except Exception as e:
        st.warning(f"Failed to auto-detect devices: {e}")

# Manual refresh button
if st.button("🔍 Detect Devices"):
    devs = utils.get_audio_devices()
    st.session_state.audio_devices = devs

Device Testing

if st.button("🎵 Test System Audio"):
    ok, msg, _ = utils.test_audio_device(config_system_audio_device_name, 2)
    (st.success if ok else st.error)(f"{'✅' if ok else '❌'} {msg}")

Manual ASR Transcription

if st.button("Start Manual Transcription"):
    pending_audios = db_manager.db_get_pending_asr_audiofiles(limit=batch_size)

    progress_bar = st.progress(0)
    status_text = st.empty()

    for idx, audio_info in enumerate(pending_audios, 1):
        progress_bar.progress(idx / total)
        status_text.text(f"Processing {idx}/{total}...")

        result = asr_manager.transcribe_audio(audio_path, audio_type)

        if result["text"]:
            db_manager.db_update_asr_text(...)
            success_count += 1

        db_manager.db_mark_audio_asr_indexed(audio_filename, success=True)

    st.success(f"Completed {success_count} files")

ASR Model Testing

# Test with example files (en.mp3, zh.mp3, ja.mp3, ko.mp3, yue.mp3)
test_files = {
    "English": "en.mp3",
    "Chinese": "zh.mp3",
    "Japanese": "ja.mp3",
    "Korean": "ko.mp3",
    "Cantonese": "yue.mp3",
}

if st.button("Run ASR Test"):
    result = asr_manager.transcribe_audio(test_path, "test")
    st.text_area("Transcription:", value=result['text'])
    st.caption(f"Language: {result['language']}")

🐛 Critical Bugs Fixed / 关键 Bug 修复

Bug #1: Architecture Flaw - Audio Data in Wrong Table

Issue: Audio ASR data stored in video_text table caused:

100+ duplicate records per 15-minute video
Audio incorrectly associated with application windows
Confusing search results (audio matched to wrong apps)

Fix: Created independent audio_text table with timestamp-based keys

Impact:

✅ Eliminated data redundancy
✅ Correct time-based audio association
✅ Clean data model

Files Changed: windrecorder/db_manager.py (lines 142-170, 316-369, 935-958, 1086-1147)

Bug #2: Non-Existent Function Call

Issue: Three functions called utils.get_datetime_from_filename() which doesn't exist

Affected Functions:

db_add_audiofile() (line 960-994)
db_mark_audio_asr_indexed() (line 1056-1084)
db_update_asr_text() (line 1086-1147)

Fix: Extract timestamp directly from filename

# OLD (broken)
insert_datetime = utils.get_datetime_from_filename(audiofile_name.replace(f"_{audio_type}.wav", ""))

# NEW (fixed)
timestamp_str = audiofile_name[:19]  # YYYY-MM-DD_HH-MM-SS
insert_datetime = utils.dtstr_to_datetime(timestamp_str)

Files Changed: windrecorder/db_manager.py

Bug #3: Wrong File Extension Assumption

Issue: Code assumed .wav format, actual files are .mp3

Error:

ValueError: unconverted data remains: _mic.mp3

Fix: Extract timestamp from filename (first 19 characters) without extension manipulation

Files Changed: windrecorder/db_manager.py

Bug #4: Local Import Scope Error

Issue: import os placed inside local scope in recording.py

Error:

UnboundLocalError: local variable 'os' referenced before assignment

Fix: Moved import os to top of file

Files Changed: windrecorder/ui/recording.py

📦 Dependencies / 依赖

New Dependencies / 新增依赖

Added to pyproject.toml:

funasr      # FunASR framework for ASR
modelscope  # Model downloading and management

Installation:

python -m poetry install

Model Download (automatic on first use):

SenseVoiceSmall (~900MB)
Downloaded from ModelScope on first ASR run
Cached in ~/.cache/modelscope/

⚙️ Configuration / 配置

New Config Fields / 新增配置项

File: windrecorder/config_src/config_default.json

{
  // Basic audio settings
  "enable_audio_recording": false,       // Default: OFF (user must enable)
  "record_system_audio": true,
  "record_mic_audio": false,
  "record_audios_dir": "audios",
  "audio_store_day": 7,                  // Delete after 7 days

  // ASR settings
  "enable_audio_asr": true,
  "asr_engine": "sensevoice",
  "asr_model_dir": "iic/SenseVoiceSmall",
  "asr_use_gpu": false,
  "asr_language": "auto",
  "asr_use_itn": true,                   // Inverse text normalization (punctuation)
  "asr_ban_emo_unk": false,              // Force emotion tags
  "asr_batch_size_s": 60,
  "asr_merge_vad": true,
  "asr_merge_length_s": 15,

  // Processing control
  "asr_processing_paused": false,        // Pause ASR processing
  "batch_size_asr_in_idle": 3,           // Files per idle cycle

  // Filtering
  "asr_music_filter_keywords": [],
  "asr_min_text_length": 5,
  "asr_repetitive_threshold": 0.6,

  // Audio devices
  "system_audio_device_name": "立体声混音 (Stereo Mix)",
  "mic_audio_device_name": "麦克风 (Microphone)",
  "audio_sample_rate": 16000,
  "audio_channels": 1,

  // Locks
  "asr_lock_name": "LOCK_FILE_ASR.MD"
}

Total: 25 new configuration fields

🎯 Known Issues / 已知问题

Current Limitations / 当前局限

No Speaker Diarization / 无说话者分离
- Multiple speakers mixed in transcription
- Cannot distinguish who said what
- Future: Integrate pyannote.audio
Music Interference / 音乐干扰
- System audio affected by background music/videos
- Filtering not 100% accurate
- Future: Add music detection model (musicnn)
No Real-time ASR / 无实时 ASR
- ASR only during idle maintenance
- 15-minute delay before transcription
- Future: Optional real-time mode
Device Name Localization / 设备名称本地化
- Device names vary by system language
- Default "立体声混音" may not match English systems
- Workaround: Use device detection in WebUI

Migration Notes / 迁移说明

IMPORTANT: This PR changes the database schema for ASR data.

For existing users:

Old ASR data in video_text table will be ignored
New ASR data will go to audio_text table
Recommendation: Clear old ASR data (if any) and re-transcribe

Database upgrade:

_db_ensure_audio_text_table_exist() auto-creates new table
No manual migration needed
Old data remains but won't be used

🙏 Credits / 致谢

This feature relies on excellent open-source projects:

FunASR: modelscope/FunASR: A Fundamental End-to-End Speech Recognition Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Recognition, Voice Activity Detection, Text Post-processing etc.
SenseVoiceSmall: https://modelscope.cn/models/iic/SenseVoiceSmall
ModelScope: https://modelscope.cn

Ready for testing and review! 🎉
@Antonoko

Generated: 2025-10-30
Base Commit: 92be2ed
PR Author: [yuansui486]

Execute generate_offsline_deps.bat to generate offsine_deps offline dependency package Then implement offline installation by executing install_offline.bat

Add offline installation documentation description

yuansui486 · 2025-10-30T07:53:08Z

目前还遇到一些问题：首先是asr音频数据与视频的匹配以及查询问题；此外，funasr在本项目中的运行遇到一些问题，我采用了一个解决方案：在webui.py中，提前导入一次torch，这样可以规避导入funasr中出现的问题😢，尽管这不是一个完美的解决方案；为了方便我的开发，暂时没有把音频录制功能以及asr功能作为拓展组件写入；在数据的展示（查询与webui的展示）还有一些欠妥的地方。本人技术能力有限，辛苦 @Antonoko的test与review了。（😐刚刚发现我new branch的时候没注意连同之前的git 记录一块带进来了☹）

yuansui486 and others added 7 commits April 6, 2025 19:54

feat Support offline installation

d3da4fa

Execute generate_offsline_deps.bat to generate offsine_deps offline dependency package Then implement offline installation by executing install_offline.bat

Update README.md

9034f6c

Add offline installation documentation description

Update README-sc.md

7f24528

Add offline installation documentation description

Merge branch 'yuka-friends:main' into main

92be2ed

add audio support

ee738f6

(feat)Audio Recording & ASR

442f14a

(feat)Audio Recording & ASR

96eb620

yuansui486 mentioned this pull request Oct 30, 2025

录制声音的支持 #290

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

(Feature)Audio Recording & ASR 新增音频录制和自动语音识别功能#291

(Feature)Audio Recording & ASR 新增音频录制和自动语音识别功能#291
yuansui486 wants to merge 7 commits intoyuka-friends:mainfrom
yuansui486:add-audio-asr

yuansui486 commented Oct 30, 2025

Uh oh!

yuansui486 commented Oct 30, 2025 •

edited

Loading

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

yuansui486 commented Oct 30, 2025

Audio Recording & ASR Feature - Pull Request Documentation

📋 Overview / 概述

✨ Key Features / 核心功能

1. Dual-Track Audio Recording / 双轨音频录制

2. Automatic Speech Recognition (ASR) / 自动语音识别

3. Independent Audio-Text Database / 独立音频文本数据库

4. Intelligent Noise Filtering / 智能噪音过滤

5. Integrated Search / 集成搜索

6. WebUI Enhancements / WebUI 增强

🏗️ Architecture Changes / 架构变更

Database Schema / 数据库架构

❌ OLD (Before this PR): Flawed Architecture

✅ NEW (This PR): Corrected Architecture

Search Query / 搜索查询

📂 File Changes / 文件变更

Modified Files / 修改的文件

New Files / 新增文件

🔧 Technical Implementation / 技术实现

1. Audio Recording Pipeline / 音频录制流程

2. ASR Processing / ASR 处理

transcribe_audio(audio_path, audio_type)

process_pending_audio_files(batch_size)

3. Database Integration / 数据库集成

Critical Fix: db_update_asr_text() (Lines 1086-1147)

Critical Fix: db_search_data() (Lines 316-369)

4. Noise Filtering / 噪音过滤

5. WebUI Integration / WebUI 集成

Audio Device Detection

Device Testing

Manual ASR Transcription

ASR Model Testing

🐛 Critical Bugs Fixed / 关键 Bug 修复

Bug #1: Architecture Flaw - Audio Data in Wrong Table

Bug #2: Non-Existent Function Call

Bug #3: Wrong File Extension Assumption

Bug #4: Local Import Scope Error

📦 Dependencies / 依赖

New Dependencies / 新增依赖

⚙️ Configuration / 配置

New Config Fields / 新增配置项

🎯 Known Issues / 已知问题

Current Limitations / 当前局限

Migration Notes / 迁移说明

🙏 Credits / 致谢

Ready for testing and review! 🎉 @Antonoko

Uh oh!

yuansui486 commented Oct 30, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

`transcribe_audio(audio_path, audio_type)`

`process_pending_audio_files(batch_size)`

Critical Fix: `db_update_asr_text()` (Lines 1086-1147)

Critical Fix: `db_search_data()` (Lines 316-369)

Ready for testing and review! 🎉
@Antonoko

yuansui486 commented Oct 30, 2025 •

edited

Loading