(Feature) Audio Recording & ASR 新增音频录制和自动语音识别功能 #291
Open
yuansui486 wants to merge 7 commits into yuka-friends:main from
Execute generate_offsline_deps.bat to generate the offsine_deps offline dependency package, then perform the offline installation by executing install_offline.bat
Add offline installation documentation description
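A few problems remain. First, matching ASR audio data to the video data, and querying it, is still unresolved. Second, funasr ran into problems when running inside this project; my workaround is to import torch once, ahead of time, in webui.py, which avoids the error raised when importing funasr 😢, although this is not a perfect solution. To make development easier, I have not yet packaged the audio recording and ASR features as extension components. Some aspects of data presentation (querying and display in the WebUI) are also still unsatisfactory. My technical skills are limited, so thank you @Antonoko for testing and reviewing. (😐 I just noticed that when I created the new branch I accidentally carried the earlier git history along with it ☹)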
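The import-order workaround mentioned in this comment could look roughly like the sketch below. `preimport` and `preload_asr_stack` are hypothetical helper names used only for illustration; the PR itself is only described as importing torch early in webui.py.

```python
import importlib


def preimport(module_name: str) -> bool:
    """Try to import a module early; return True on success, False if it is missing."""
    try:
        importlib.import_module(module_name)
        return True
    except ImportError:
        return False


def preload_asr_stack() -> dict:
    """Import torch before funasr, mirroring the ordering described in the comment.

    The returned dict reports which modules could be loaded, so the UI can
    disable ASR features instead of crashing when a dependency is absent.
    """
    status = {}
    for name in ("torch", "funasr"):  # order matters: torch first
        status[name] = preimport(name)
    return status
```

Calling `preload_asr_stack()` once at the top of the entry script keeps the ordering explicit and in one place, rather than relying on a bare `import torch` that a linter may flag as unused.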
Audio Recording & ASR Feature - Pull Request Documentation
📋 Overview / 概述
This PR adds audio recording and automatic speech recognition (ASR) to Windrecorder, so users can search spoken audio content in addition to screen OCR text. It is mainly intended as a solution for #290.
本 PR 为 Windrecorder 添加了完整的音频录制和自动语音识别(ASR)功能,使用户不仅可以搜索屏幕 OCR 文本,还可以搜索语音内容。主要为 #290 提供解决方案 。
Base Commit: `92be2ed` (Merge branch 'yuka-friends:main' into main)
✨ Key Features / 核心功能
1. Dual-Track Audio Recording / 双轨音频录制
2. Automatic Speech Recognition (ASR) / 自动语音识别
3. Independent Audio-Text Database / 独立音频文本数据库
New independent `audio_text` table for time-based audio data / 新增独立 audio_text 表存储时间级音频数据
4. Intelligent Noise Filtering / 智能噪音过滤
5. Integrated Search / 集成搜索
6. WebUI Enhancements / WebUI 增强
🏗️ Architecture Changes / 架构变更
Database Schema / 数据库架构
❌ OLD (Before this PR): Flawed Architecture
Issues with old architecture:
✅ NEW (This PR): Corrected Architecture
1. `video_text` table (Window-level data):
2. `audio_text` table (Time-level data, NEW):
3. `audiofile_state` table (Audio file metadata):
Benefits of new architecture:
Search Query / 搜索查询
OLD (Flawed):
NEW (Corrected):
Benefits:
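As an illustration of how a search can cover both tables once audio text lives in its own table, here is a minimal sketch using `sqlite3`. The column names (`videofile_time`, `ocr_text`, `audio_time`, `asr_text`) and the `UNION ALL` shape are assumptions made for this example; the actual schema and query in `db_search_data()` may differ.

```python
import sqlite3


def create_schema(conn: sqlite3.Connection) -> None:
    """Create simplified stand-ins for the two text tables (hypothetical columns)."""
    conn.executescript(
        """
        CREATE TABLE IF NOT EXISTS video_text (
            videofile_time INTEGER,
            ocr_text TEXT
        );
        CREATE TABLE IF NOT EXISTS audio_text (
            audio_time INTEGER,
            audio_type TEXT,
            asr_text TEXT
        );
        """
    )


def search_all(conn: sqlite3.Connection, keyword: str) -> list:
    """Return OCR and ASR hits for a keyword, merged and ordered by time."""
    pattern = f"%{keyword}%"
    query = """
        SELECT 'ocr' AS source, v.videofile_time AS ts, v.ocr_text AS text
        FROM video_text AS v
        WHERE v.ocr_text LIKE ?
        UNION ALL
        SELECT 'asr' AS source, a.audio_time AS ts, a.asr_text AS text
        FROM audio_text AS a
        WHERE a.asr_text LIKE ?
        ORDER BY ts
    """
    return conn.execute(query, (pattern, pattern)).fetchall()
```

Tagging each row with its `source` lets the WebUI decide how to render a hit without guessing which table it came from.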
📂 File Changes / 文件变更
Modified Files / 修改的文件
- `windrecorder/db_manager.py`
- `windrecorder/record.py`
- `windrecorder/ui/recording.py`
- `windrecorder/ui/components.py`
- `windrecorder/ui/oneday.py`
- `windrecorder/ui/search.py`
- `windrecorder/ui/state.py`
- `windrecorder/config.py`
- `windrecorder/utils.py`
- `record_screen.py`
- `webui.py`
- `config_default.json`
- `languages.json`
- `pyproject.toml`
New Files / 新增文件
- `windrecorder/asr_manager.py`
- `windrecorder/config_src/example/*.mp3`
Total Changes: 3,216 insertions, 161 deletions across 23 files
🔧 Technical Implementation / 技术实现
1. Audio Recording Pipeline / 音频录制流程
Implementation:
`windrecorder/record.py:47-140`
Storage:
`userdata/audios/YYYY-MM/YYYY-MM-DD_HH-MM-SS_{system|mic}.mp3`
2. ASR Processing / ASR 处理
Implementation:
`windrecorder/asr_manager.py` (361 lines)
Key Methods:
- `transcribe_audio(audio_path, audio_type)`
- `process_pending_audio_files(batch_size)`
3. Database Integration / 数据库集成
Implementation:
`windrecorder/db_manager.py`
Critical Fix: `db_update_asr_text()` (Lines 1086-1147)
OLD (Flawed - wrote to video_text):
NEW (Corrected - writes to audio_text):
Key Improvements:
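The corrected write path (ASR text stored in `audio_text`, keyed by time rather than attached to a `video_text` row) can be sketched as follows. The column names, the composite primary key, and the `update_asr_text` signature are assumptions for illustration, not the PR's actual `db_update_asr_text()` implementation.

```python
import sqlite3
from datetime import datetime


def ensure_audio_text_table(conn: sqlite3.Connection) -> None:
    """Create the audio text table if it is missing (hypothetical columns)."""
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS audio_text (
            recorded_at TEXT NOT NULL,
            audio_type TEXT NOT NULL,
            asr_text TEXT,
            PRIMARY KEY (recorded_at, audio_type)
        )
        """
    )


def update_asr_text(
    conn: sqlite3.Connection,
    recorded_at: datetime,
    audio_type: str,
    asr_text: str,
) -> None:
    """Insert or overwrite the transcript for one recording.

    The row is identified by recording time and track type, so
    re-transcribing the same file replaces its text instead of duplicating it.
    """
    key = recorded_at.isoformat(timespec="seconds")
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT OR REPLACE INTO audio_text (recorded_at, audio_type, asr_text) "
            "VALUES (?, ?, ?)",
            (key, audio_type, asr_text),
        )
```

Keying on `(recorded_at, audio_type)` keeps the system and microphone tracks of the same moment as separate rows, which is one way to avoid the cross-contamination that a shared table invites.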
Critical Fix:
`db_search_data()` (Lines 316-369)
Changes:
Benefits:
`v.` or `a.`)
4. Noise Filtering / 噪音过滤
Implementation:
`windrecorder/asr_manager.py:106-157`
Three-tier filtering:
Effectiveness:
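A three-tier filter of this kind might be structured like the sketch below. The tier order and the repetition metric (share of the single most frequent non-whitespace character) are assumptions for illustration; only the parameter names echo the config fields `asr_music_filter_keywords`, `asr_min_text_length`, and `asr_repetitive_threshold`.

```python
from collections import Counter


def is_noise(
    text: str,
    music_keywords=(),
    min_text_length: int = 5,
    repetitive_threshold: float = 0.6,
) -> bool:
    """Return True if a transcript looks like noise and should be discarded."""
    cleaned = text.strip()
    lowered = cleaned.lower()

    # Tier 1: keyword match (e.g. tags the ASR model emits for background music).
    if any(keyword.lower() in lowered for keyword in music_keywords if keyword):
        return True

    # Tier 2: transcripts shorter than the minimum length carry little information.
    if len(cleaned) < min_text_length:
        return True

    # Tier 3: highly repetitive text, measured here as the share of the
    # most frequent non-whitespace character (an assumed metric).
    chars = [c for c in cleaned if not c.isspace()]
    if not chars:
        return True
    most_common_count = Counter(chars).most_common(1)[0][1]
    return most_common_count / len(chars) >= repetitive_threshold
```

For example, `is_noise("aaaaaaaab")` is rejected by the repetition tier, while an ordinary sentence such as `"hello, how are you today?"` passes all three checks.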
5. WebUI Integration / WebUI 集成
Implementation:
`windrecorder/ui/recording.py:342-661`
Key Features:
Audio Device Detection
Device Testing
Manual ASR Transcription
ASR Model Testing
🐛 Critical Bugs Fixed / 关键 Bug 修复
Bug #1: Architecture Flaw - Audio Data in Wrong Table
Issue: Audio ASR data stored in the `video_text` table caused:
Fix: Created an independent `audio_text` table with timestamp-based keys
Impact:
Files Changed:
`windrecorder/db_manager.py` (lines 142-170, 316-369, 935-958, 1086-1147)
Bug #2: Non-Existent Function Call
Issue: Three functions called `utils.get_datetime_from_filename()`, which doesn't exist
Affected Functions:
- `db_add_audiofile()` (lines 960-994)
- `db_mark_audio_asr_indexed()` (lines 1056-1084)
- `db_update_asr_text()` (lines 1086-1147)
Fix: Extract timestamp directly from filename
Files Changed:
`windrecorder/db_manager.py`
Bug #3: Wrong File Extension Assumption
Issue: Code assumed `.wav` format, but the actual files are `.mp3`
Error:
Fix: Extract timestamp from filename (first 19 characters) without extension manipulation
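A minimal sketch of this extension-independent approach, assuming filenames follow the `YYYY-MM-DD_HH-MM-SS_{system|mic}.mp3` pattern listed under Storage. The function name `datetime_from_audio_filename` is hypothetical and is not the helper used in the PR.

```python
import os
from datetime import datetime

TIMESTAMP_FORMAT = "%Y-%m-%d_%H-%M-%S"
TIMESTAMP_LENGTH = 19  # len("YYYY-MM-DD_HH-MM-SS")


def datetime_from_audio_filename(path: str) -> datetime:
    """Parse the recording time from the first 19 characters of the file name.

    Only the leading timestamp is read, so the result does not depend on the
    track suffix or on whether the file ends in .mp3 or .wav.
    Raises ValueError if the name does not start with a valid timestamp.
    """
    name = os.path.basename(path)
    return datetime.strptime(name[:TIMESTAMP_LENGTH], TIMESTAMP_FORMAT)
```

Because nothing is stripped or replaced at the end of the name, a future change of audio format would not break timestamp parsing.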
Files Changed:
`windrecorder/db_manager.py`
Bug #4: Local Import Scope Error
Issue: `import os` placed inside local scope in `recording.py`
Error:
Fix: Moved `import os` to top of file
Files Changed:
`windrecorder/ui/recording.py`
📦 Dependencies / 依赖
New Dependencies / 新增依赖
Added to `pyproject.toml`:
Installation:
Model Download (automatic on first use):
`~/.cache/modelscope/`
⚙️ Configuration / 配置
New Config Fields / 新增配置项
File: `windrecorder/config_src/config_default.json`

```json
{
  // Basic audio settings
  "enable_audio_recording": false,  // Default: OFF (user must enable)
  "record_system_audio": true,
  "record_mic_audio": false,
  "record_audios_dir": "audios",
  "audio_store_day": 7,  // Delete after 7 days

  // ASR settings
  "enable_audio_asr": true,
  "asr_engine": "sensevoice",
  "asr_model_dir": "iic/SenseVoiceSmall",
  "asr_use_gpu": false,
  "asr_language": "auto",
  "asr_use_itn": true,  // Inverse text normalization (punctuation)
  "asr_ban_emo_unk": false,  // Force emotion tags
  "asr_batch_size_s": 60,
  "asr_merge_vad": true,
  "asr_merge_length_s": 15,

  // Processing control
  "asr_processing_paused": false,  // Pause ASR processing
  "batch_size_asr_in_idle": 3,  // Files per idle cycle

  // Filtering
  "asr_music_filter_keywords": [],
  "asr_min_text_length": 5,
  "asr_repetitive_threshold": 0.6,

  // Audio devices
  "system_audio_device_name": "立体声混音 (Stereo Mix)",
  "mic_audio_device_name": "麦克风 (Microphone)",
  "audio_sample_rate": 16000,
  "audio_channels": 1,

  // Locks
  "asr_lock_name": "LOCK_FILE_ASR.MD"
}
```

Total: 25 new configuration fields
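To show how the ASR settings above might feed into the recognizer, the sketch below turns a config dict into keyword arguments. The keyword names follow the publicly documented SenseVoice example for FunASR's `AutoModel.generate`; whether the PR maps the fields exactly this way is an assumption, and `build_generate_kwargs` and `pick_device` are hypothetical helper names.

```python
def pick_device(config: dict) -> str:
    """Choose the inference device string from the asr_use_gpu flag."""
    return "cuda:0" if config.get("asr_use_gpu", False) else "cpu"


def build_generate_kwargs(config: dict) -> dict:
    """Translate config fields into keyword arguments for a generate() call.

    Defaults mirror the values listed in config_default.json.
    """
    return {
        "language": config.get("asr_language", "auto"),
        "use_itn": config.get("asr_use_itn", True),
        "ban_emo_unk": config.get("asr_ban_emo_unk", False),
        "batch_size_s": config.get("asr_batch_size_s", 60),
        "merge_vad": config.get("asr_merge_vad", True),
        "merge_length_s": config.get("asr_merge_length_s", 15),
    }


# With funasr installed, the result would be used roughly like this
# (not executed here, since it downloads the model on first use):
#
#   from funasr import AutoModel
#   model = AutoModel(model=config["asr_model_dir"], device=pick_device(config))
#   result = model.generate(input=audio_path, cache={}, **build_generate_kwargs(config))
```

Keeping the mapping in one pure function makes it easy to unit-test config handling without loading the model.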
🎯 Known Issues / 已知问题
Current Limitations / 当前局限
No Speaker Diarization / 无说话者分离
Music Interference / 音乐干扰
No Real-time ASR / 无实时 ASR
Device Name Localization / 设备名称本地化
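One way to soften the device-name localization issue is to fall back to keyword matching when the configured name is not present on the machine. This is a hypothetical sketch, not part of the PR; `find_device` and its parameters are illustrative names.

```python
from typing import Iterable, Optional


def find_device(
    available: Iterable[str],
    preferred: str,
    fallback_keywords: Iterable[str],
) -> Optional[str]:
    """Pick an audio device by exact name, else by case-insensitive keyword.

    Returns None when nothing matches, so the caller can ask the user
    to choose a device manually.
    """
    names = list(available)
    if preferred in names:
        return preferred
    keywords = [k.lower() for k in fallback_keywords if k]
    for name in names:
        lowered = name.lower()
        if any(k in lowered for k in keywords):
            return name
    return None
```

With `fallback_keywords=["stereo mix", "立体声混音"]`, a system that reports the device as "Stereo Mix (Realtek Audio)" would still be matched even though the default config uses the Chinese display name.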
Migration Notes / 迁移说明
IMPORTANT: This PR changes the database schema for ASR data.
For existing users:
`video_text` table will be ignored
`audio_text` table
Database upgrade:
`_db_ensure_audio_text_table_exist()` auto-creates new table
🙏 Credits / 致谢
This feature relies on excellent open-source projects:
Ready for testing and review! 🎉
@Antonoko
Generated: 2025-10-30
Base Commit: 92be2ed
PR Author: [yuansui486]