
Conversation

@mirai-gpro

No description provided.

claude added 30 commits February 5, 2026 11:26
…n testing

Cloned official repositories to analyze expression data format and WebGL renderer implementation for TTS lip-sync integration.

https://claude.ai/code/session_01JWNLRvnwsRuDRVFzGGC37z
Key changes:
1. Disable expressionUpdateInterval timer to avoid race condition with renderer's 60fps getExpressionData() calls
2. Move all sync logic to getExpressionData() - now uses TTS player's currentTime for frame selection
3. Add frames directly to frameBuffer in queueExpressionFrames() instead of using frameQueue
4. Clear frameBuffer when starting new speech to prevent stale frame accumulation
5. Add clearFrameBuffer() public method for external control

Root cause identified:
- Two mechanisms (30fps timer + 60fps renderer calls) were competing to update expressionData
- frameQueue wasn't populated before TTS play event fired due to async REST API timing

Also added OpenAvatarChat reference for official WebGL SDK investigation.
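
For reference, the frame-selection arithmetic is simple. A minimal Python sketch of the idea (the real logic is TypeScript inside getExpressionData(); the 30fps rate and the frame_buffer list are assumptions taken from this PR's description):

```python
# Illustrative sketch only: the actual implementation is TypeScript in getExpressionData().
# Assumes expression frames are generated at 30 fps.
FPS = 30

def select_frame(frame_buffer, tts_current_time):
    """Pick the expression frame matching the TTS player's currentTime."""
    if not frame_buffer:
        return None
    index = int(tts_current_time * FPS)          # e.g. t = 1.00s -> frame 30
    index = min(index, len(frame_buffer) - 1)    # hold the last frame if TTS outruns the buffer
    return frame_buffer[index]
```

Clamping to the last frame matches the buffer-exhaustion fix described in a later commit.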

https://claude.ai/code/session_01JWNLRvnwsRuDRVFzGGC37z
The browser's audio.paused property was unreliable because pause events
can fire immediately after play events (buffering/browser quirks). This
caused the lip sync to fall back to legacy timing mode instead of using
TTS currentTime for frame selection.

Changes:
- Add ttsActive flag to track play/ended state ourselves
- Don't reset ttsActive on pause events (only on ended)
- Use ttsActive flag instead of paused property in getExpressionData()
- Allow sync to start even when currentTime is 0 (just started)
- Add stopTtsSync() method for manual interruption
- Reset ttsActive in clearFrameBuffer()

https://claude.ai/code/session_01JWNLRvnwsRuDRVFzGGC37z
Two issues fixed:
1. Legacy mode was running simultaneously with TTS-Sync mode, causing
   lip sync to stop prematurely when legacy timer completed (~1.5s)
   even though TTS audio was still playing (~2.4s).
   Fix: Skip legacy mode when ttsActive is true in external sync mode.

2. When frame buffer exhausted before TTS ended, expression would
   reset because code fell through without returning.
   Fix: Return last frame expression while TTS is still playing.

https://claude.ai/code/session_01JWNLRvnwsRuDRVFzGGC37z
In external TTS sync mode, the play/ended handlers no longer call
startFramePlaybackFromQueue() or stopFramePlayback(). These legacy
methods were causing issues with multiple sequential TTS playbacks:

1. startFramePlaybackFromQueue() clears frameQueue after flattening,
   so subsequent TTS plays would find an empty queue even though
   frameBuffer had frames added by queueExpressionFrames().

2. stopFramePlayback() resets state that could interfere with the
   next TTS play starting immediately after.

Now TTS-Sync mode relies purely on:
- ttsActive flag for tracking play/ended state
- frameBuffer populated by queueExpressionFrames()
- getExpressionData() using currentTime to select frames

https://claude.ai/code/session_01JWNLRvnwsRuDRVFzGGC37z
- Update app.py to detect LAM_Audio2Expression at multiple locations
  (env var, sibling dir, OpenAvatarChat submodule)
- Add model path auto-detection for wav2vec2 and LAM weights
- Update requirements.txt with PyTorch and transformers dependencies
- Add run_local.sh for easy local testing with proper env setup

The service now runs with the real Audio2Expression model in inference
mode (not mock mode) for proper lip sync based on phoneme analysis.
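
A hedged sketch of the multi-location detection; the candidate paths below are illustrative and the real list in app.py may differ (only LAM_A2E_PATH is named elsewhere in this PR):

```python
import os
from pathlib import Path

def find_lam_a2e_root():
    """Return the first existing LAM_Audio2Expression checkout, or None (falls back to mock mode)."""
    here = Path(__file__).resolve().parent
    candidates = [
        os.environ.get("LAM_A2E_PATH"),                     # explicit env override
        here.parent / "LAM_Audio2Expression",               # sibling directory
        here / "OpenAvatarChat" / "LAM_Audio2Expression",   # hypothetical submodule layout
    ]
    for cand in candidates:
        if cand and Path(cand).is_dir():
            return Path(cand)
    return None
```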

https://claude.ai/code/session_01JWNLRvnwsRuDRVFzGGC37z
Issues fixed:
1. Frame buffer was being cleared in getExpressionData() when TTS ended,
   causing subsequent segments to have no frames
2. Expression API was called in parallel for all segments, causing frame
   buffer to contain mixed frames from multiple segments

Changes:
- LAMAvatar.astro: Remove buffer clearing on TTS ended (let controller
  manage buffer lifecycle)
- concierge-controller.ts:
  - Only clear buffer on isStart=true (new speech start)
  - For remaining sentences, call expression API right before playback
    (not in parallel) to ensure each segment has its own frames
  - Clear buffer before each subsequent segment plays

This ensures each audio segment has the correct frames in the buffer
when it plays, regardless of how many segments are in the response.
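
An asyncio analogue of the sequencing described above (the actual controller is TypeScript in concierge-controller.ts; the avatar object and the two awaited helpers are hypothetical stand-ins):

```python
async def play_segments(segments, avatar, fetch_expression_frames, play_audio):
    """Fetch expression frames for each TTS segment right before it plays, never in parallel."""
    for segment in segments:
        avatar.clear_frame_buffer()                      # fresh buffer per segment
        frames = await fetch_expression_frames(segment)  # expression API call just-in-time
        avatar.queue_expression_frames(frames)
        await play_audio(segment)                        # resolves on the 'ended' event
```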

https://claude.ai/code/session_01JWNLRvnwsRuDRVFzGGC37z
The LAM Audio2Expression model can only process ~64 frames (~2 seconds)
per inference call. For longer audio, we need to use streaming mode
properly:

1. Added process_full_audio() method that:
   - Splits audio into 1-second chunks
   - Calls infer_streaming_audio() for each chunk with context
   - Maintains streaming context between calls
   - Concatenates all expression outputs

2. Updated API endpoint to use process_full_audio() instead of
   process_audio() for single-chunk processing

This ensures that a 19.7s audio clip generates ~591 frames (at 30fps)
instead of only 73 frames from a single inference call.
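
A sketch of the chunk-and-carry-context pattern; the infer_streaming_audio argument order and return shape are assumptions, only the pattern itself comes from this commit:

```python
import numpy as np

def process_full_audio(model, audio, sample_rate, chunk_seconds=1.0):
    """Run streaming inference over long audio by feeding 1-second chunks with carried context."""
    chunk_len = int(sample_rate * chunk_seconds)
    context = None
    outputs = []
    for start in range(0, len(audio), chunk_len):
        chunk = audio[start:start + chunk_len]
        # Assumed signature: returns (expression frames, updated streaming context).
        expressions, context = model.infer_streaming_audio(chunk, sample_rate, context)
        outputs.append(expressions)
    return np.concatenate(outputs, axis=0)   # ~30 expression frames per second of audio
```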

https://claude.ai/code/session_01JWNLRvnwsRuDRVFzGGC37z
- Update sample rate handling: 24000Hz API input (official AvatarLAMConfig default);
  the model internally resamples to 16000Hz via infer_streaming_audio
- Pass ssr (source sample rate) properly through process_audio -> infer_streaming_audio
- Auto-detect MP3 native sample rate instead of forcing resampling
- Update process_full_audio to accept and propagate sample_rate parameter
- Update WebSocket handler to support sample_rate in messages
- Update mock expression to use correct sample rate for frame calculation

Reference: OpenAvatarChat/src/handlers/avatar/lam/avatar_handler_lam_audio2expression.py
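
pydub exposes the decoded MP3's native rate directly; a sketch of how the detection might look (the surrounding request handling is assumed):

```python
import io
import numpy as np
from pydub import AudioSegment

def decode_mp3(mp3_bytes):
    """Decode MP3 and report its native sample rate instead of forcing a resample."""
    seg = AudioSegment.from_file(io.BytesIO(mp3_bytes), format="mp3")
    samples = np.array(seg.get_array_of_samples(), dtype=np.float32)
    samples /= float(1 << (8 * seg.sample_width - 1))              # normalize PCM to [-1, 1]
    if seg.channels > 1:
        samples = samples.reshape(-1, seg.channels).mean(axis=1)   # downmix to mono
    return samples, seg.frame_rate   # frame_rate is the MP3's native sample rate (ssr)
```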

https://claude.ai/code/session_01JWNLRvnwsRuDRVFzGGC37z
Includes environment variables for model paths:
- LAM_A2E_PATH: LAM_Audio2Expression code
- LAM_WEIGHT_PATH: lam_audio2exp_streaming.tar weights
- WAV2VEC_PATH: wav2vec2-base-960h encoder

Requires 4Gi memory and 2 CPU for PyTorch inference.

https://claude.ai/code/session_01JWNLRvnwsRuDRVFzGGC37z
Following official OpenAvatarChat documentation:
- "只有vad和asr运行在本地gpu,对机器性能依赖很轻"
- Audio2Expression runs on CPU (light resource usage)

Includes:
- LAM_Audio2Expression code
- Model files (wav2vec2-base-960h, lam_audio2exp_streaming.tar)
- pydub for MP3 decoding
- All required dependencies for CPU inference

https://claude.ai/code/session_01JWNLRvnwsRuDRVFzGGC37z
- Add /app/LAM_Audio2Expression to path candidates
- Add /app/models/ paths for model weights and wav2vec2
- Update .gitignore to exclude large model files

https://claude.ai/code/session_01JWNLRvnwsRuDRVFzGGC37z
- Add start.sh to download models from GCS on startup
- Update Dockerfile to install google-cloud-cli
- Remove COPY models/ from Dockerfile (models loaded at runtime)

https://claude.ai/code/session_01JWNLRvnwsRuDRVFzGGC37z
Cloud Build needs LAM_Audio2Expression for the Docker image, but it's
in .gitignore for development. The .gcloudignore allows Cloud Build to
upload the directory while keeping it out of git.

https://claude.ai/code/session_01JWNLRvnwsRuDRVFzGGC37z
- Include LAM_Audio2Expression directory in git (removed from .gitignore)
- CPU modifications in engines/infer.py:
  - Added get_device() function for CPU/GPU detection
  - Changed model.cuda() to model.to(self.device)
  - Changed torch.load() to use map_location=self.device
  - Changed all .cuda(non_blocking=True) to .to(self.device)

This allows Cloud Build to include the modified code directly.
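
The modifications follow the standard torch device pattern; a minimal sketch (method and attribute names paraphrased from the list above):

```python
import torch

def get_device():
    """Prefer CUDA when available, otherwise fall back to CPU (the Cloud Run case)."""
    return torch.device("cuda" if torch.cuda.is_available() else "cpu")

device = get_device()
# model.cuda()                      ->  model.to(device)
# torch.load(path)                  ->  torch.load(path, map_location=device)
# tensor.cuda(non_blocking=True)    ->  tensor.to(device, non_blocking=True)
```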

https://claude.ai/code/session_01JWNLRvnwsRuDRVFzGGC37z
PyTorch 2.6 changed the default value of torch.load's weights_only argument from False to True.
This fix allows loading model checkpoints without the weights_only error.
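
Presumably the fix passes weights_only=False explicitly; a one-line sketch with a placeholder path:

```python
import torch

# PyTorch 2.6 defaults weights_only=True, which rejects pickled objects in older
# checkpoints; pass weights_only=False explicitly for the trusted LAM checkpoint.
checkpoint = torch.load("lam_audio2exp_streaming.tar", map_location="cpu", weights_only=False)
```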

https://claude.ai/code/session_01JWNLRvnwsRuDRVFzGGC37z
The lam_audio2exp_streaming.tar in GCS is a regular tar file, not gzipped.
The previous code incorrectly:
1. Saved it as .tar.gz
2. Tried to gunzip it (which fails)
3. Resulted in missing model file -> mock mode

https://claude.ai/code/session_01JWNLRvnwsRuDRVFzGGC37z
Root cause analysis:
- GCS model file (356MB) differs from correct local file (390MB)
- PyTorch fails with "filename 'storages' not found" when loading corrupted checkpoint
- save_path directory may not exist in Docker container

Fixes:
- Add model file size verification in start.sh (warns if <350MB)
- Ensure save_path directory exists before model initialization
- Add fix_gcs_model.sh script to re-upload correct model to GCS

https://claude.ai/code/session_01JWNLRvnwsRuDRVFzGGC37z
The LAM_audio2exp_streaming.tar from HuggingFace/OSS is gzip compressed
(~356MB) and needs decompression to ~390MB for torch.load to work.

Changes:
- Auto-detect gzip compression using 'file' command
- Decompress in-place if needed
- Verify final size is ~390MB after decompression
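
start.sh relies on the `file` command; an equivalent check in Python would look at the gzip magic bytes (the path handling here is illustrative):

```python
import gzip
import shutil

GZIP_MAGIC = b"\x1f\x8b"

def ensure_decompressed(path):
    """Decompress the checkpoint in place if it is gzip-compressed (~356MB -> ~390MB)."""
    with open(path, "rb") as f:
        if f.read(2) != GZIP_MAGIC:
            return  # already uncompressed; torch.load can read it as-is
    tmp = path + ".decompressed"
    with gzip.open(path, "rb") as src, open(tmp, "wb") as dst:
        shutil.copyfileobj(src, dst)
    shutil.move(tmp, path)
```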

https://claude.ai/code/session_01JWNLRvnwsRuDRVFzGGC37z
- Increase timeout from 1200s to 3600s (60 minutes)
- Add E2_HIGHCPU_8 machine type for faster builds and network
- Add 100GB disk size for large Docker image
- Consolidate push steps using --all-tags

https://claude.ai/code/session_01JWNLRvnwsRuDRVFzGGC37z
- Modified start.sh to start uvicorn immediately while downloading models in background
- Modified app.py to initialize model asynchronously after models are ready
- Health endpoint always returns 'ok' for Cloud Run liveness probe
- Model downloads and initialization happen in parallel with server startup
- Server starts in mock mode and switches to inference mode when model is ready
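
A minimal FastAPI sketch of the pattern, with a stand-in for the real download-and-initialize work:

```python
import threading
import time
from fastapi import FastAPI

app = FastAPI()
state = {"mode": "mock"}                   # served while models are still initializing

def init_model():
    # Placeholder for the real work: download weights, build the Audio2Expression model.
    time.sleep(1)
    state["mode"] = "inference"

@app.on_event("startup")
def start_background_init():
    # Kick off initialization without blocking port binding, so Cloud Run's probe passes immediately.
    threading.Thread(target=init_model, daemon=True).start()

@app.get("/health")
def health():
    # Always report ok for the liveness probe; the mode field shows readiness.
    return {"status": "ok", "mode": state["mode"]}
```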

https://claude.ai/code/session_01JWNLRvnwsRuDRVFzGGC37z
- Start uvicorn in background first for immediate port binding
- Run download_models in foreground so Cloud Run captures all logs
- Add error handling for gsutil failures
- Wait for uvicorn process at the end

https://claude.ai/code/session_01JWNLRvnwsRuDRVFzGGC37z
Replaced gsutil bash commands with google-cloud-storage Python library
for more reliable model downloading in Cloud Run environment:
- Added google-cloud-storage to requirements.txt
- Added download_models_from_gcs() function in app.py
- Simplified start.sh to just start uvicorn
- Python GCS client uses Application Default Credentials automatically
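
The google-cloud-storage calls involved are few; a sketch of what download_models_from_gcs() might look like, with a placeholder bucket and prefix:

```python
from pathlib import Path
from google.cloud import storage

def download_models_from_gcs(bucket_name, prefix, dest_dir):
    """Download every object under prefix into dest_dir using Application Default Credentials."""
    client = storage.Client()                        # picks up ADC on Cloud Run automatically
    dest = Path(dest_dir)
    for blob in client.list_blobs(bucket_name, prefix=prefix):
        if blob.name.endswith("/"):
            continue                                 # skip folder placeholder objects
        target = dest / Path(blob.name).relative_to(prefix)
        target.parent.mkdir(parents=True, exist_ok=True)
        blob.download_to_filename(str(target))

# Example (bucket and prefix are placeholders, not values from this PR):
# download_models_from_gcs("my-models-bucket", "audio2exp/", "/app/models")
```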

https://claude.ai/code/session_01JWNLRvnwsRuDRVFzGGC37z
Major refactoring to use Cloud Run Gen 2 with GCS FUSE mount:
- app.py: Use lifespan context, read models from /mnt/models/audio2exp
- cloudbuild.yaml: Add --execution-environment gen2, --cpu-boost,
  --add-volume for GCS bucket mount, increase memory to 8Gi
- requirements.txt: Remove google-cloud-storage (not needed with FUSE)

Models are mounted as filesystem at /mnt/models, eliminating
download time and authentication issues.
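
A minimal sketch of the lifespan pattern with the FUSE-mounted model directory (the model construction itself is omitted):

```python
from contextlib import asynccontextmanager
from pathlib import Path
from fastapi import FastAPI

MODEL_DIR = Path("/mnt/models/audio2exp")   # GCS bucket mounted by Cloud Run gen2

@asynccontextmanager
async def lifespan(app: FastAPI):
    # Startup: the mount already exposes the weights as ordinary files.
    app.state.model_available = MODEL_DIR.exists()
    # A real implementation would construct the Audio2Expression model here.
    yield
    # Shutdown: nothing to release in this sketch.

app = FastAPI(lifespan=lifespan)
```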

https://claude.ai/code/session_01JWNLRvnwsRuDRVFzGGC37z
Cloud Run memory quota is 40GB per region. With 8Gi memory per instance,
10 instances would require 80GB, exceeding the quota.
Reduced to 5 instances (40GB) to comply.

https://claude.ai/code/session_01JWNLRvnwsRuDRVFzGGC37z
8Gi × 4 = 32GB, well under the 40GB quota limit.
Following Gemini's recommendation to avoid edge cases.

https://claude.ai/code/session_01JWNLRvnwsRuDRVFzGGC37z
- Dockerfile: Download models during Cloud Build, COPY into image
- cloudbuild.yaml: Add GCS download step, remove FUSE configuration
- app.py: Simplify with step-by-step initialization error tracking

This eliminates FUSE complexity and runtime model loading issues.

https://claude.ai/code/session_01JWNLRvnwsRuDRVFzGGC37z
- LAM_Audio2Expression_HANDOFF.md: Complete project overview and technical architecture
- ANALYSIS_REQUEST.md: Specific analysis requests with reflection on previous AI's mistakes

These documents are for handing off the task to another model for deeper analysis.

https://claude.ai/code/session_01JWNLRvnwsRuDRVFzGGC37z