Conversation

@mirai-gpro

No description provided.

claude added 30 commits on February 2, 2026 at 04:13
- Add pytest configuration and test infrastructure
- Add unit tests for utils (Registry, CosineWarmupScheduler)
- Add unit tests for losses (PixelLoss, TVLoss)
- Add unit tests for models (BasicBlock, ConditionBlock, TransformerDecoder)
- Add unit tests for datasets (camera utilities)
- Support graceful test skipping when PyTorch is not installed

https://claude.ai/code/session_01BNt1bYnL3hbBtuKwaoSypY
- Add tmp/ directory for HuggingFace downloads
- Add *.tar for asset archives
- Add .pytest_cache/ for pytest

https://claude.ai/code/session_01BNt1bYnL3hbBtuKwaoSypY
- Copy src/ and public/ from production gourmet-sp repository
- Prepare for LAM_WebRender 3D avatar integration
- Keep separate from production environment for safe testing

https://claude.ai/code/session_01BNt1bYnL3hbBtuKwaoSypY
- Create LAMAvatar.astro component for WebGL Gaussian Splatting
- Integrate LAMAvatar into Concierge.astro with toggle option
- Add setup documentation in README.md
- Support graceful fallback to 2D image when WebGL unavailable

Features:
- Dynamic import of gaussian-splat-renderer-for-lam NPM package
- Expression data API for Audio2Expression integration
- Chat state management (Idle/Listening/Thinking/Responding)

https://claude.ai/code/session_01BNt1bYnL3hbBtuKwaoSypY
- Add lam-websocket-manager.ts with JBIN binary parser
- Update LAMAvatar.astro with WebSocket connection management
- Add audio playback support for TTS output
- Update README with WebSocket integration documentation

The WebSocket manager parses the JBIN format (4-byte magic + JSON header + binary data)
containing ARKit 52-channel expression data from the OpenAvatarChat backend.
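For illustration, a minimal parsing sketch of the JBIN layout described above, assuming a 4-byte magic string, a 4-byte little-endian header-length field, a UTF-8 JSON header, and the rest as binary payload; the field layout and names are assumptions, not the actual lam-websocket-manager.ts code:

```typescript
interface JbinMessage {
  header: Record<string, unknown>; // parsed JSON header (frame metadata)
  payload: Uint8Array;             // raw binary section (audio / expression data)
}

function parseJbin(buffer: ArrayBuffer): JbinMessage {
  const view = new DataView(buffer);
  const magic = new TextDecoder().decode(new Uint8Array(buffer, 0, 4));
  if (magic !== "JBIN") {
    throw new Error(`Unexpected magic bytes: ${magic}`);
  }
  // Assumed: 4-byte little-endian header length immediately after the magic.
  const headerLength = view.getUint32(4, true);
  const headerBytes = new Uint8Array(buffer, 8, headerLength);
  const header = JSON.parse(new TextDecoder().decode(headerBytes));
  const payload = new Uint8Array(buffer, 8 + headerLength);
  return { header, payload };
}
```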

https://claude.ai/code/session_01BNt1bYnL3hbBtuKwaoSypY
Standalone service that receives TTS audio from gourmet-sp and generates
ARKit expression data (52 channels) for LAMAvatar lip sync.

Features:
- REST API: POST /api/audio2expression
- WebSocket: /ws/{session_id} for real-time streaming
- Mock mode when Audio2Expression model not available
- Minimal changes required to gourmet-sp

Architecture:
  gourmet-sp (TTS) → REST API → Audio2Expression → WebSocket → Browser
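As a rough sketch of the client side of this flow, here is how a browser might call the two endpoints listed above; the request and response shapes (session_id, audio fields) are assumptions for illustration only:

```typescript
// POST audio to the REST endpoint and receive expression data back.
async function requestExpressions(baseUrl: string, sessionId: string, audioBase64: string) {
  const res = await fetch(`${baseUrl}/api/audio2expression`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ session_id: sessionId, audio: audioBase64 }),
  });
  return res.json(); // expression frames for lip sync (shape assumed)
}

// Open the real-time streaming path: /ws/{session_id}.
function openExpressionStream(wsBaseUrl: string, sessionId: string): WebSocket {
  const ws = new WebSocket(`${wsBaseUrl}/ws/${sessionId}`);
  ws.binaryType = "arraybuffer";
  ws.onmessage = (event) => {
    console.log("expression message", event.data);
  };
  return ws;
}
```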

https://claude.ai/code/session_01BNt1bYnL3hbBtuKwaoSypY
- Dockerfile for containerization
- cloudbuild.yaml for CI/CD deployment
- gourmet_support_integration.py - sample code for backend integration
- Updated README with deployment instructions

Deploy with: gcloud builds submit --config cloudbuild.yaml

https://claude.ai/code/session_01BNt1bYnL3hbBtuKwaoSypY
Detailed patch showing how to modify app_customer_support.py:
- Add session_id parameter to TTS endpoint
- Generate LINEAR16 audio for Audio2Expression (in addition to MP3)
- Send audio to audio2exp-service asynchronously via threading

Also includes frontend change for core-controller.ts to pass session_id.

https://claude.ai/code/session_01BNt1bYnL3hbBtuKwaoSypY
- Add deploy.ps1 matching gourmet-support pattern
- Update cloudbuild.yaml region to us-central1
- Update README with PowerShell deployment instructions

Deploy with: ./deploy.ps1

https://claude.ai/code/session_01BNt1bYnL3hbBtuKwaoSypY
Full modified version of gourmet-support/app_customer_support.py with:
- AUDIO2EXP_SERVICE_URL environment variable
- send_to_audio2exp() function for async audio sending
- Modified synthesize_speech() to accept session_id and send PCM audio
- Health check includes audio2exp status

Changes from original:
- Line 11: Added 'import requests'
- Lines 51-56: Added AUDIO2EXP_SERVICE_URL configuration
- Lines 118-137: Added send_to_audio2exp() function
- Lines 445-496: Modified synthesize_speech() for Audio2Expression
- Line 664: Added audio2exp to health check

https://claude.ai/code/session_01BNt1bYnL3hbBtuKwaoSypY
- Use uvicorn directly in Dockerfile CMD
- Disable model initialization on startup (use mock mode)
- Fix PORT environment variable handling

https://claude.ai/code/session_01BNt1bYnL3hbBtuKwaoSypY
- Add session_id parameter to all 5 TTS fetch calls
- Enables Audio2Expression service to receive session context
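Illustrative shape of the change, with a hypothetical /api/tts path and body fields standing in for the real gourmet-sp request:

```typescript
// Hypothetical TTS call in core-controller.ts, now carrying the session_id so
// the Audio2Expression service can associate audio with the right session.
async function synthesizeSpeech(text: string, sessionId: string): Promise<Blob> {
  const res = await fetch("/api/tts", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ text, session_id: sessionId }),
  });
  return res.blob(); // MP3 audio for playback
}
```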

https://claude.ai/code/session_01BNt1bYnL3hbBtuKwaoSypY
- Update lam-websocket-manager.ts to handle JSON format from audio2exp-service
- Add audio2expWsUrl property to ConciergeController
- Add connectLAMAvatarWebSocket method to connect after session init
- Enable real-time expression data streaming from backend to frontend

https://claude.ai/code/session_01BNt1bYnL3hbBtuKwaoSypY
- Handle case where shops is None from JSON parsing failure
- Use 'or []' pattern instead of default value to handle both None and missing key

https://claude.ai/code/session_01BNt1bYnL3hbBtuKwaoSypY
- Add pingInterval and currentWsUrl properties
- Send ping every 5 seconds to keep Cloud Run WebSocket alive
- Stop ping on disconnect and reconnect
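A minimal keep-alive sketch, assuming the server accepts a simple JSON ping message; the class and message shapes are illustrative rather than the exact lam-websocket-manager.ts code:

```typescript
class KeepAliveSocket {
  private pingInterval: ReturnType<typeof setInterval> | null = null;
  private currentWsUrl = "";
  private ws: WebSocket | null = null;

  connect(url: string): void {
    this.currentWsUrl = url;
    this.ws = new WebSocket(url);
    this.ws.onopen = () => {
      // Cloud Run closes idle WebSockets, so ping every 5 seconds.
      this.pingInterval = setInterval(() => {
        this.ws?.send(JSON.stringify({ type: "ping" }));
      }, 5000);
    };
    this.ws.onclose = () => this.stopPing();
  }

  reconnect(): void {
    // Stop pinging before tearing down, then reconnect to the saved URL.
    this.stopPing();
    this.ws?.close();
    this.connect(this.currentWsUrl);
  }

  disconnect(): void {
    this.stopPing();
    this.ws?.close();
  }

  private stopPing(): void {
    if (this.pingInterval !== null) {
      clearInterval(this.pingInterval);
      this.pingInterval = null;
    }
  }
}
```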

https://claude.ai/code/session_01BNt1bYnL3hbBtuKwaoSypY
…r disappearing

- Create createDefaultExpressionData() function with all channels set to 0
- Initialize expressionData with default values instead of empty object
- Re-enable expression updates in lam-websocket-manager
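A sketch of such a default map with every channel at 0; only a few of the 52 ARKit blendshape names are listed here for brevity:

```typescript
const ARKIT_CHANNELS = [
  "jawOpen", "mouthFunnel", "mouthPucker",
  "mouthLowerDownLeft", "mouthLowerDownRight",
  // ...remaining ARKit blendshape channels
] as const;

// Build a complete expression object with all channels zeroed, so the renderer
// never receives an empty object.
function createDefaultExpressionData(): Record<string, number> {
  const data: Record<string, number> = {};
  for (const channel of ARKIT_CHANNELS) {
    data[channel] = 0;
  }
  return data;
}
```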

https://claude.ai/code/session_01BNt1bYnL3hbBtuKwaoSypY
This preserves the original object reference, which may help with renderer stability.

https://claude.ai/code/session_01BNt1bYnL3hbBtuKwaoSypY
- Log expression data receipt with key count and jawOpen value
- Add health check timer every 2 seconds to monitor renderer state
- Add error handling to expression update callback

https://claude.ai/code/session_01BNt1bYnL3hbBtuKwaoSypY
…reference

The renderer was modifying the expressionData object directly, causing values
to explode exponentially (jawOpen went from 1.0 to 10^75). Now getExpressionData()
returns a shallow copy to prevent the renderer's internal modifications from
affecting the source data.
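A minimal illustration of the fix, with simplified class and field names:

```typescript
class ExpressionSource {
  private expressionData: Record<string, number> = { jawOpen: 0 };

  getExpressionData(): Record<string, number> {
    // Spread creates a new object each call, so the renderer mutates its copy
    // only and the source data stays intact.
    return { ...this.expressionData };
  }
}
```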

https://claude.ai/code/session_01BNt1bYnL3hbBtuKwaoSypY
Added 'or []' protection to handle the case where enrich_shops_with_photos
returns None, preventing the 'object of type NoneType has no len()' error.

https://claude.ai/code/session_01BNt1bYnL3hbBtuKwaoSypY
- Reduce jaw scaling from rms*10 to rms*2.0 with max 0.7
- Add mouthFunnel and mouthPucker for more natural mouth shapes
- Previous formula always returned 1.0 (max), causing no visible variation

https://claude.ai/code/session_01BNt1bYnL3hbBtuKwaoSypY
The gaussian-splat-renderer-for-lam only applies lip sync when
getChatState() returns 'Responding'. Added setChatState calls to
speakTextGCP() to enable lip sync during TTS playback.

https://claude.ai/code/session_01BNt1bYnL3hbBtuKwaoSypY
- Log state transitions in setChatState()
- Add current state to Health check log
- Helps diagnose if Responding state is being set during TTS

https://claude.ai/code/session_01BNt1bYnL3hbBtuKwaoSypY
The previous approach set the Responding state before the TTS fetch completed,
causing it to immediately switch back to Idle. We now use the ttsPlayer's
play/ended/pause events to accurately track audio playback state.

- Add event listeners in init() for play, ended, pause
- Remove setChatState calls from speakTextGCP/stopAvatarAnimation
- State changes now tied to actual audio playback, not API calls
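A sketch of the event wiring, assuming an HTMLAudioElement TTS player and a setChatState() helper; names follow the description above but are not the verbatim code:

```typescript
type ChatState = "Idle" | "Listening" | "Thinking" | "Responding";

function bindTtsPlayer(
  ttsPlayer: HTMLAudioElement,
  setChatState: (state: ChatState) => void,
): void {
  // State follows the audio element, not the TTS API call.
  ttsPlayer.addEventListener("play", () => setChatState("Responding"));
  ttsPlayer.addEventListener("ended", () => setChatState("Idle"));
  ttsPlayer.addEventListener("pause", () => setChatState("Idle"));
}
```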

https://claude.ai/code/session_01BNt1bYnL3hbBtuKwaoSypY
Instead of relying on TTS player events (which may not fire correctly
due to browser autoplay restrictions), the state is now set to Responding
directly when expression data with jawOpen > 0.01 is received.

Uses a 500ms timeout to reset to Idle when expression data stops arriving.
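A sketch of this expression-driven state change; the 0.01 threshold and 500ms timeout come from the description, while the surrounding structure is illustrative:

```typescript
let idleTimeout: ReturnType<typeof setTimeout> | null = null;

function onExpressionData(
  data: Record<string, number>,
  setChatState: (state: string) => void,
): void {
  if ((data.jawOpen ?? 0) > 0.01) {
    setChatState("Responding");
    if (idleTimeout !== null) clearTimeout(idleTimeout);
    // Fall back to Idle once expression data stops arriving.
    idleTimeout = setTimeout(() => setChatState("Idle"), 500);
  }
}
```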

https://claude.ai/code/session_01BNt1bYnL3hbBtuKwaoSypY
The LAM renderer uses mouthLowerDownLeft/Right as the PRIMARY drivers
of mouth movement, not jawOpen. This is based on analysis of the official
demo expression data, where mouthLowerDown values are ~0.46 while jawOpen
is only ~0.06.

https://claude.ai/code/session_01BNt1bYnL3hbBtuKwaoSypY
- Log mouthLowerDownLeft/Right values in expression updates
- Log morphTargetDictionary keys to verify avatar support

https://claude.ai/code/session_01BNt1bYnL3hbBtuKwaoSypY
- audio2exp-service: Generate multiple frames based on audio duration
  (30fps = 1 frame per ~533 audio samples at 16kHz)
- lam-websocket-manager: Add ExpressionFrameData type and
  onExpressionFrames callback for multi-frame support
- LAMAvatar: Buffer frames and play back at correct frame rate
  with startFramePlayback/stopFramePlayback methods

This fixes the timing issue where expression data was received all
at once before TTS playback started, causing no visible lip sync.
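A sketch of the frontend side under these assumptions: frames are buffered as they arrive and stepped through on a timer at the service's frame rate. The applyFrame() callback and field names are simplified stand-ins for the LAMAvatar implementation:

```typescript
type ExpressionFrame = Record<string, number>;

class FramePlayer {
  private frames: ExpressionFrame[] = [];
  private timer: ReturnType<typeof setInterval> | null = null;

  bufferFrames(frames: ExpressionFrame[]): void {
    this.frames.push(...frames);
  }

  startFramePlayback(frameRate: number, applyFrame: (f: ExpressionFrame) => void): void {
    let index = 0;
    this.timer = setInterval(() => {
      if (index >= this.frames.length) return this.stopFramePlayback();
      applyFrame(this.frames[index++]); // e.g. 30 fps => one frame every ~33 ms
    }, 1000 / frameRate);
  }

  stopFramePlayback(): void {
    if (this.timer !== null) {
      clearInterval(this.timer);
      this.timer = null;
    }
    this.frames = [];
  }
}
```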

https://claude.ai/code/session_01BNt1bYnL3hbBtuKwaoSypY
- Queue expression frames instead of playing immediately
- Add startFramePlaybackFromQueue() public method
- Call frame playback from TTS play event handler
- This ensures lip sync starts when audio actually plays,
  not when expression data is received

https://claude.ai/code/session_01BNt1bYnL3hbBtuKwaoSypY
claude added 30 commits on February 4, 2026 at 23:47
When the TTS play event fires but no frames are in the queue yet,
retry up to 3 times with a 200ms delay between retries.
This handles the latency difference between direct TTS audio
and expression data going through the audio2exp WebSocket.
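A sketch of the retry loop; the limits (3 retries, 200ms) come from this message, and the queue accessor is a hypothetical stand-in:

```typescript
async function startPlaybackWithRetry(
  hasQueuedFrames: () => boolean,
  startFramePlayback: () => void,
): Promise<void> {
  for (let attempt = 0; attempt <= 3; attempt++) {
    if (hasQueuedFrames()) {
      startFramePlayback();
      return;
    }
    // Expression data lags the direct TTS audio, so wait 200 ms and re-check.
    await new Promise((resolve) => setTimeout(resolve, 200));
  }
  console.warn("No expression frames arrived; playing audio without lip sync");
}
```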

https://claude.ai/code/session_01BNt1bYnL3hbBtuKwaoSypY
- Make stopFramePlayback() public in LAMAvatarController
- Call stopFramePlayback() on TTS ended/pause events in concierge-controller
- Fixes timing mismatch where frame playback continued after audio ended

https://claude.ai/code/session_01BNt1bYnL3hbBtuKwaoSypY
- Update path to use cloned LAM_Audio2Expression directory
- Enable model initialization on startup (CPU mode)
- Add proper weight and config path configuration
- Add detailed logging for initialization process

https://claude.ai/code/session_01BNt1bYnL3hbBtuKwaoSypY
This patch modifies engines/infer.py to:
- Detect available device (CUDA or CPU)
- Use .to(device) instead of .cuda()
- Add map_location to torch.load()

Apply with: cd LAM_Audio2Expression && git apply ../audio2exp-service/cpu_support.patch

https://claude.ai/code/session_01BNt1bYnL3hbBtuKwaoSypY
This implements the official synchronization approach from OpenAvatarChat:

1. audio2exp-service (app.py):
   - Add JBIN format serializer to bundle audio + expression data
   - Add batch tracking per session for speech segment handling
   - Support MP3 audio input (Google Cloud TTS format)
   - Send bundled JBIN data via WebSocket

2. AudioSyncPlayer (new file):
   - Web Audio API player with precise timing tracking
   - Tracks playback position for expression frame sync
   - Supports batch-based speech segments

3. LAMWebSocketManager:
   - Parse JBIN bundled data from WebSocket
   - Store expression frames indexed by batch
   - Integrate with AudioSyncPlayer for playback
   - Add getCurrentExpressionFrame() for audio-based sync

4. LAMAvatar.astro:
   - Support both 'bundled' and 'external' sync modes
   - External mode: sync with gourmet-sp TTS player
   - Expression sync timer at 30fps
   - setExternalTtsPlayer() to link with external audio

5. concierge-controller.ts:
   - Link external TTS player with LAMAvatar
   - Send TTS audio to audio2exp-service for expression
   - Parallel TTS + expression generation

The key improvement is that audio and expression are now properly
synchronized at the protocol level, following the official approach.

https://claude.ai/code/session_01BNt1bYnL3hbBtuKwaoSypY
Use a separate `import type { AudioSample }` to fix a module resolution
error in the Vite dev server. Interfaces should be imported as types
when they are only used for type checking.
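For example (the module path and the sampleRate field are assumptions for illustration):

```typescript
// Vite's dev server transpiles files in isolation, so a plain value import of
// an interface can be kept in the output and fail at runtime; a type-only
// import is erased by the compiler and never reaches the bundler.
import type { AudioSample } from "./audio-sync-player"; // path is illustrative

function describeSample(sample: AudioSample): string {
  return `sample rate: ${sample.sampleRate}`; // sampleRate assumed for the example
}
```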

https://claude.ai/code/session_01BNt1bYnL3hbBtuKwaoSypY
- Add queueExpressionFrames() public method to LAMAvatar for external
  sources to push expression frames
- Update sendAudioToExpression() in concierge-controller to pass
  expression data from REST API to LAMAvatar's frame queue
- This connects the audio2exp-service response to the avatar lip sync

The flow is now:
1. TTS audio sent to audio2exp-service REST API
2. API returns expression frames
3. Frames queued to LAMAvatar
4. TTS play event triggers frame playback

https://claude.ai/code/session_01BNt1bYnL3hbBtuKwaoSypY
The gaussian-splat-renderer doesn't call setExpression() in FLAME mode,
so expression data was being ignored. Added applyExpressionToMesh()
to directly set splatMesh.bsWeight, bypassing the FLAME animation
system.

Called from:
- applyFrame(): Apply expression during frame playback
- resetExpression(): Reset expression when playback ends

https://claude.ai/code/session_01BNt1bYnL3hbBtuKwaoSypY
The renderer's FLAME mode uses pre-recorded animation from flame_params,
which overwrites our live expression data. Now we:

1. Disable FLAME mode when frame playback starts (setFlameMode(false))
2. Re-enable FLAME mode when playback stops (setFlameMode(true))
3. Force texture update via updateBoneMatrixTexture() after setting bsWeight

This allows our live expression data to be applied to the mesh.

https://claude.ai/code/session_01BNt1bYnL3hbBtuKwaoSypY
The previous approach of disabling FLAME mode during lip sync caused
renderer crashes because updateFlameBones() still runs and needs
flame_params['expr']. Instead, we now inject our expression data
directly into flame_params['expr'] at the current frame, so the
renderer reads our lip sync data during its normal update loop.

https://claude.ai/code/session_01BNt1bYnL3hbBtuKwaoSypY
Based on official OpenAvatarChat-WebUI implementation research:
- The official API uses getExpressionData callback for expression control
- Add useFlame: false option to force callback usage instead of FLAME data
- Add comprehensive debug logging to understand renderer state:
  - FLAME mode status (viewer.useFlame, renderer.useFlame)
  - flame_params structure and format (array vs object)
  - bsWeight format and current values
  - morphTargetDictionary channel names
- Update applyExpressionToMesh to detect array vs object format

Reference: OpenAvatarChat-WebUI/src/utils/gaussianAvatar.ts

https://claude.ai/code/session_01BNt1bYnL3hbBtuKwaoSypY
This will help determine whether the renderer is actually calling our
getExpressionData callback during its render loop. A log line is emitted
every 30 calls (~1 second at 30fps), showing the current expression values.

https://claude.ai/code/session_01BNt1bYnL3hbBtuKwaoSypY
When responses are long, they get split into firstSentence and
remainingSentences with separate TTS synthesis. These parallel TTS
calls were not calling sendAudioToExpression(), so lip sync frames
were generated only for the first TTS response.

Added sendAudioToExpression() calls for both firstSentence and
remainingSentences TTS audio.

https://claude.ai/code/session_01BNt1bYnL3hbBtuKwaoSypY
- Check if renderer has setExpression method
- Log morphTargetInfluences on splatMesh
- Log all methods on renderer and viewer objects
- Check jawOpen and mouthLowerDownLeft indices

https://claude.ai/code/session_01BNt1bYnL3hbBtuKwaoSypY
Try multiple methods to apply expression data to the mesh:
1. renderer.setExpression() - if renderer has this method
2. viewer.setExpression() - if viewer has this method
3. splatMesh.bsWeight - direct property setting
4. splatMesh.morphTargetInfluences - Three.js standard approach
5. renderer.expressionData - for callback consistency

Removed complex FLAME mode injection logic that wasn't working.

https://claude.ai/code/session_01BNt1bYnL3hbBtuKwaoSypY
Simplified to match official gaussianAvatar.ts implementation:
- Removed useFlame: false option (let renderer decide)
- Removed applyExpressionToMesh() method (not in official)
- Removed setFlameMode() method (not in official)
- Removed debug logging in getExpressionData()
- applyFrame() now just updates expressionData (like official)
- getExpressionData() just returns expressionData (like official)

Official approach: renderer calls getExpressionData() callback
during its render loop and handles expression application internally.
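A rough sketch of the simplified wiring, following the official pattern described above; the class and option names are illustrative, not the exact gaussian-splat-renderer-for-lam API:

```typescript
class AvatarController {
  private expressionData: Record<string, number> = {};

  getExpressionData(): Record<string, number> {
    // The renderer polls this during its render loop and applies the values.
    return this.expressionData;
  }

  applyFrame(frame: Record<string, number>): void {
    // Just update the data; no direct mesh manipulation.
    this.expressionData = frame;
  }

  attachTo(renderer: { getExpressionData?: () => Record<string, number> }): void {
    // Bound so `this` still points at the controller when the renderer calls it.
    renderer.getExpressionData = this.getExpressionData.bind(this);
  }
}
```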

https://claude.ai/code/session_01BNt1bYnL3hbBtuKwaoSypY
Match official processor behavior:
- Return undefined when not playing (like processor?.getArkitFaceFrame()?.arkitFace)
- Return actual expression data only during playback

The renderer may differentiate between undefined and empty data.

https://claude.ai/code/session_01BNt1bYnL3hbBtuKwaoSypY
Changed sendAudioToExpression calls from fire-and-forget to awaited:
- Main speakTextGCP: await expression before setting audio src
- Split sentence TTS: await expression within each promise

This ensures expression data is queued before audio starts playing,
fixing the synchronization issue between mouth movement and speech.

https://claude.ai/code/session_01BNt1bYnL3hbBtuKwaoSypY
Follow official OpenAvatarChat approach:
- Remove setInterval timer for frame playback
- Calculate frame index from elapsed time using performance.now()
- Record playbackStartTime when playback starts
- getExpressionData() returns frame based on elapsed time
- Fix undefined useBundledSync -> syncMode check
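A sketch of the time-based frame selection under these assumptions (30fps default, simplified names):

```typescript
class TimeBasedFramePlayer {
  private frames: Record<string, number>[] = [];
  private playbackStartTime = 0;
  private playing = false;

  start(frames: Record<string, number>[]): void {
    this.frames = frames;
    this.playbackStartTime = performance.now();
    this.playing = true;
  }

  getExpressionData(frameRate = 30): Record<string, number> | undefined {
    if (!this.playing || this.frames.length === 0) return undefined;
    // Derive the frame index from elapsed wall-clock time, not a timer.
    const elapsedMs = performance.now() - this.playbackStartTime;
    const frameIndex = Math.floor(elapsedMs / (1000 / frameRate));
    if (frameIndex >= this.frames.length) {
      this.playing = false;
      return undefined;
    }
    return this.frames[frameIndex];
  }
}
```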

https://claude.ai/code/session_01BNt1bYnL3hbBtuKwaoSypY
The renderer expects all 52 ARKit channels, but audio2exp only returns
non-zero channels. Merge frame data with this.expressionData to ensure
the complete expression is returned.
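A minimal sketch of the merge, where later keys overwrite the zeroed defaults so the renderer always sees all channels:

```typescript
function mergeFrame(
  base: Record<string, number>,
  frame: Record<string, number>,
): Record<string, number> {
  // Spread order matters: frame values overwrite the defaults in base.
  return { ...base, ...frame };
}

// Usage (illustrative): this.expressionData = mergeFrame(this.expressionData, incomingFrame);
```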

https://claude.ai/code/session_01BNt1bYnL3hbBtuKwaoSypY
Copy official getArkitFaceFrame() logic:
- Return expressionData always (not undefined)
- Create new object each frame: this.expressionData = {}
- Calculate frameIndex: Math.floor(calcDelta / frameInfoInternal)

https://claude.ai/code/session_01BNt1bYnL3hbBtuKwaoSypY
Backend:
- Change response format: channels→names, weights→frames[{weights}]
- Add frame_rate field to response

Frontend:
- Update conversion to match official gaussianAvatar.ts logic

https://claude.ai/code/session_01BNt1bYnL3hbBtuKwaoSypY
- Use .bind(this) like official code
- Add call count logging to getExpressionData()
- Log when renderer calls the callback

https://claude.ai/code/session_01BNt1bYnL3hbBtuKwaoSypY
- Download official test_expression_1s.json
- Add testMode flag (enabled by default)
- Implement exact same loop playback logic as official gaussianAvatar.ts
- This will verify if renderer works with official data

https://claude.ai/code/session_01BNt1bYnL3hbBtuKwaoSypY
Test with the official sample avatar to determine whether the issue
is with concierge.zip or the renderer/expression logic.

https://claude.ai/code/session_01BNt1bYnL3hbBtuKwaoSypY
- Normalize RMS values to use full dynamic range
- Add speech-like variation using sine waves at syllable frequencies
- Make jaw and mouth move independently with different timing
- Add more expression channels (pucker, smile)

https://claude.ai/code/session_01BNt1bYnL3hbBtuKwaoSypY
- Track if external player already linked
- Remove old listeners when switching players
- Prevent duplicate play/pause/ended events

https://claude.ai/code/session_01BNt1bYnL3hbBtuKwaoSypY