Claude/implement repo tests m2 smn #80
Open · mirai-gpro wants to merge 60 commits into aigc3d:master from mirai-gpro:claude/implement-repo-tests-m2SMN
Conversation
- Add pytest configuration and test infrastructure
- Add unit tests for utils (Registry, CosineWarmupScheduler)
- Add unit tests for losses (PixelLoss, TVLoss)
- Add unit tests for models (BasicBlock, ConditionBlock, TransformerDecoder)
- Add unit tests for datasets (camera utilities)
- Support graceful test skipping when PyTorch is not installed
https://claude.ai/code/session_01BNt1bYnL3hbBtuKwaoSypY
- Add tmp/ directory for HuggingFace downloads
- Add *.tar for asset archives
- Add .pytest_cache/ for pytest
https://claude.ai/code/session_01BNt1bYnL3hbBtuKwaoSypY
- Copy src/ and public/ from production gourmet-sp repository
- Prepare for LAM_WebRender 3D avatar integration
- Keep separate from production environment for safe testing
https://claude.ai/code/session_01BNt1bYnL3hbBtuKwaoSypY
- Create LAMAvatar.astro component for WebGL Gaussian Splatting
- Integrate LAMAvatar into Concierge.astro with toggle option
- Add setup documentation in README.md
- Support graceful fallback to 2D image when WebGL unavailable
Features:
- Dynamic import of gaussian-splat-renderer-for-lam NPM package
- Expression data API for Audio2Expression integration
- Chat state management (Idle/Listening/Thinking/Responding)
https://claude.ai/code/session_01BNt1bYnL3hbBtuKwaoSypY
- Add lam-websocket-manager.ts with JBIN binary parser
- Update LAMAvatar.astro with WebSocket connection management
- Add audio playback support for TTS output
- Update README with WebSocket integration documentation
The WebSocket manager parses JBIN format (4-byte magic + JSON header + binary data) containing ARKit 52-channel expression data from OpenAvatarChat backend.
https://claude.ai/code/session_01BNt1bYnL3hbBtuKwaoSypY
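For reference, a minimal TypeScript sketch of a parser for the JBIN layout described above. The commit only specifies "4-byte magic + JSON header + binary data", so the "JBIN" magic string and the 4-byte little-endian header-length field used here are assumptions, not the confirmed wire format.

```ts
// Hypothetical JBIN parser. Assumed layout (not confirmed by the commit):
// [4-byte magic "JBIN"][uint32 LE header length][JSON header][binary payload].
function parseJbin(buffer: ArrayBuffer): { header: unknown; payload: Uint8Array } {
  const magic = new TextDecoder().decode(new Uint8Array(buffer, 0, 4));
  if (magic !== "JBIN") throw new Error(`Unexpected magic: ${magic}`);
  const headerLen = new DataView(buffer).getUint32(4, true); // assumed LE length field
  const headerText = new TextDecoder().decode(new Uint8Array(buffer, 8, headerLen));
  const header = JSON.parse(headerText); // e.g. ARKit channel names + frame metadata
  const payload = new Uint8Array(buffer, 8 + headerLen); // raw expression/audio bytes
  return { header, payload };
}
```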
Standalone service that receives TTS audio from gourmet-sp and generates
ARKit expression data (52 channels) for LAMAvatar lip sync.
Features:
- REST API: POST /api/audio2expression
- WebSocket: /ws/{session_id} for real-time streaming
- Mock mode when Audio2Expression model not available
- Minimal changes required to gourmet-sp
Architecture:
gourmet-sp (TTS) → REST API → Audio2Expression → WebSocket → Browser
https://claude.ai/code/session_01BNt1bYnL3hbBtuKwaoSypY
- Dockerfile for containerization
- cloudbuild.yaml for CI/CD deployment
- gourmet_support_integration.py - sample code for backend integration
- Updated README with deployment instructions
Deploy with: gcloud builds submit --config cloudbuild.yaml
https://claude.ai/code/session_01BNt1bYnL3hbBtuKwaoSypY
Detailed patch showing how to modify app_customer_support.py:
- Add session_id parameter to TTS endpoint
- Generate LINEAR16 audio for Audio2Expression (in addition to MP3)
- Send audio to audio2exp-service asynchronously via threading
Also includes frontend change for core-controller.ts to pass session_id.
https://claude.ai/code/session_01BNt1bYnL3hbBtuKwaoSypY
- Add deploy.ps1 matching gourmet-support pattern
- Update cloudbuild.yaml region to us-central1
- Update README with PowerShell deployment instructions
Deploy with: ./deploy.ps1
https://claude.ai/code/session_01BNt1bYnL3hbBtuKwaoSypY
Full modified version of gourmet-support/app_customer_support.py with:
- AUDIO2EXP_SERVICE_URL environment variable
- send_to_audio2exp() function for async audio sending
- Modified synthesize_speech() to accept session_id and send PCM audio
- Health check includes audio2exp status
Changes from original:
- Line 11: Added 'import requests'
- Lines 51-56: Added AUDIO2EXP_SERVICE_URL configuration
- Lines 118-137: Added send_to_audio2exp() function
- Lines 445-496: Modified synthesize_speech() for Audio2Expression
- Line 664: Added audio2exp to health check
https://claude.ai/code/session_01BNt1bYnL3hbBtuKwaoSypY
- Use uvicorn directly in Dockerfile CMD
- Disable model initialization on startup (use mock mode)
- Fix PORT environment variable handling
https://claude.ai/code/session_01BNt1bYnL3hbBtuKwaoSypY
- Add session_id parameter to all 5 TTS fetch calls
- Enables Audio2Expression service to receive session context
https://claude.ai/code/session_01BNt1bYnL3hbBtuKwaoSypY
- Update lam-websocket-manager.ts to handle JSON format from audio2exp-service
- Add audio2expWsUrl property to ConciergeController
- Add connectLAMAvatarWebSocket method to connect after session init
- Enable real-time expression data streaming from backend to frontend
https://claude.ai/code/session_01BNt1bYnL3hbBtuKwaoSypY
- Handle case where shops is None from JSON parsing failure
- Use 'or []' pattern instead of default value to handle both None and missing key
https://claude.ai/code/session_01BNt1bYnL3hbBtuKwaoSypY
- Add pingInterval and currentWsUrl properties
- Send ping every 5 seconds to keep Cloud Run WebSocket alive
- Stop ping on disconnect and reconnect
https://claude.ai/code/session_01BNt1bYnL3hbBtuKwaoSypY
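A minimal sketch of that keepalive, assuming the backend simply ignores or echoes a JSON ping message (the exact ping payload is not specified in the commit):

```ts
// Keepalive sketch: Cloud Run closes idle WebSockets, so send a ping every 5 s.
let pingInterval: ReturnType<typeof setInterval> | null = null;

function startPing(ws: WebSocket): void {
  stopPing();
  pingInterval = setInterval(() => {
    if (ws.readyState === WebSocket.OPEN) {
      ws.send(JSON.stringify({ type: "ping" })); // payload shape is an assumption
    }
  }, 5000);
}

function stopPing(): void {
  if (pingInterval !== null) {
    clearInterval(pingInterval);
    pingInterval = null;
  }
}
```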
…r disappearing
- Create createDefaultExpressionData() function with all channels set to 0
- Initialize expressionData with default values instead of empty object
- Re-enable expression updates in lam-websocket-manager
https://claude.ai/code/session_01BNt1bYnL3hbBtuKwaoSypY
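A sketch of what createDefaultExpressionData() plausibly looks like; only a handful of the 52 ARKit channel names are listed here for brevity.

```ts
// A few of the 52 ARKit blendshape channels; the real list is longer.
const ARKIT_CHANNELS = [
  "jawOpen", "mouthFunnel", "mouthPucker",
  "mouthLowerDownLeft", "mouthLowerDownRight",
  // ...remaining ARKit channels
];

// Zero-initialize every channel so the renderer never sees an empty object.
function createDefaultExpressionData(): Record<string, number> {
  const data: Record<string, number> = {};
  for (const name of ARKIT_CHANNELS) data[name] = 0;
  return data;
}
```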
This preserves the original object reference, which may help with renderer stability.
https://claude.ai/code/session_01BNt1bYnL3hbBtuKwaoSypY
- Log expression data receipt with key count and jawOpen value
- Add health check timer every 2 seconds to monitor renderer state
- Add error handling to expression update callback
https://claude.ai/code/session_01BNt1bYnL3hbBtuKwaoSypY
…reference
The renderer was modifying the expressionData object directly, causing values to explode exponentially (jawOpen went from 1.0 to 10^75). Now getExpressionData() returns a shallow copy to prevent the renderer's internal modifications from affecting the source data.
https://claude.ai/code/session_01BNt1bYnL3hbBtuKwaoSypY
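The fix boils down to a one-line defensive copy; a minimal sketch:

```ts
class ExpressionSource {
  private expressionData: Record<string, number> = {};

  // Return a shallow copy so the renderer's in-place writes (which compounded
  // jawOpen from 1.0 up to ~1e75) can never touch the source object.
  getExpressionData(): Record<string, number> {
    return { ...this.expressionData };
  }
}
```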
Added 'or []' protection to handle the case where enrich_shops_with_photos returns None, preventing the 'object of type NoneType has no len()' error.
https://claude.ai/code/session_01BNt1bYnL3hbBtuKwaoSypY
- Reduce jaw scaling from rms*10 to rms*2.0 with max 0.7
- Add mouthFunnel and mouthPucker for more natural mouth shapes
- Previous formula always returned 1.0 (max), causing no visible variation
https://claude.ai/code/session_01BNt1bYnL3hbBtuKwaoSypY
The gaussian-splat-renderer-for-lam only applies lip sync when getChatState() returns 'Responding'. Added setChatState calls to speakTextGCP() to enable lip sync during TTS playback.
https://claude.ai/code/session_01BNt1bYnL3hbBtuKwaoSypY
- Log state transitions in setChatState()
- Add current state to health check log
- Helps diagnose if Responding state is being set during TTS
https://claude.ai/code/session_01BNt1bYnL3hbBtuKwaoSypY
The previous approach set Responding state before TTS fetch completed, causing it to immediately switch back to Idle. Now using ttsPlayer's play/ended/pause events to accurately track audio playback state.
- Add event listeners in init() for play, ended, pause
- Remove setChatState calls from speakTextGCP/stopAvatarAnimation
- State changes now tied to actual audio playback, not API calls
https://claude.ai/code/session_01BNt1bYnL3hbBtuKwaoSypY
Instead of relying on TTS player events (which may not fire correctly due to browser autoplay restrictions), now set state to Responding directly when expression data with jawOpen > 0.01 is received. Uses a 500ms timeout to reset to Idle when expression data stops arriving.
https://claude.ai/code/session_01BNt1bYnL3hbBtuKwaoSypY
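A sketch of that activity-driven state machine, with setChatState standing in for the controller's real method:

```ts
declare function setChatState(state: "Idle" | "Listening" | "Thinking" | "Responding"): void;

let idleTimer: ReturnType<typeof setTimeout> | null = null;

function onExpressionData(frame: Record<string, number>): void {
  if ((frame["jawOpen"] ?? 0) > 0.01) {
    setChatState("Responding");
    if (idleTimer !== null) clearTimeout(idleTimer);
    // No new data for 500 ms means the utterance is over: fall back to Idle.
    idleTimer = setTimeout(() => setChatState("Idle"), 500);
  }
}
```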
The LAM renderer uses mouthLowerDownLeft/Right as the PRIMARY drivers of mouth movement, not jawOpen. This is based on analysis of official demo expression data, where mouthLowerDown values are ~0.46 while jawOpen is only ~0.06.
https://claude.ai/code/session_01BNt1bYnL3hbBtuKwaoSypY
- Log mouthLowerDownLeft/Right values in expression updates
- Log morphTargetDictionary keys to verify avatar support
https://claude.ai/code/session_01BNt1bYnL3hbBtuKwaoSypY
- audio2exp-service: Generate multiple frames based on audio duration (30fps = 1 frame per ~533 audio samples at 16kHz; see the sketch below)
- lam-websocket-manager: Add ExpressionFrameData type and onExpressionFrames callback for multi-frame support
- LAMAvatar: Buffer frames and play back at correct frame rate with startFramePlayback/stopFramePlayback methods
This fixes the timing issue where expression data was received all at once before TTS playback started, causing no visible lip sync.
https://claude.ai/code/session_01BNt1bYnL3hbBtuKwaoSypY
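The frame-count arithmetic behind the first bullet, as a small helper (the function name is illustrative): 16000 samples/s ÷ 30 frames/s ≈ 533 samples per frame.

```ts
// How many 30 fps expression frames a chunk of 16 kHz audio should yield.
function frameCount(totalSamples: number, sampleRate = 16000, fps = 30): number {
  const samplesPerFrame = sampleRate / fps; // 16000 / 30 ≈ 533.3
  return Math.ceil(totalSamples / samplesPerFrame);
}
```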
- Queue expression frames instead of playing immediately
- Add startFramePlaybackFromQueue() public method
- Call frame playback from TTS play event handler
This ensures lip sync starts when audio actually plays, not when expression data is received.
https://claude.ai/code/session_01BNt1bYnL3hbBtuKwaoSypY
When the TTS play event fires but no frames are in the queue yet, retry up to 3 times with a 200ms delay between attempts. This handles the latency difference between direct TTS audio and expression data going through the audio2exp WebSocket.
https://claude.ai/code/session_01BNt1bYnL3hbBtuKwaoSypY
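A sketch of the bounded retry; hasQueuedFrames() is a hypothetical accessor standing in for however the controller actually checks the queue:

```ts
interface AvatarLike {
  hasQueuedFrames(): boolean;            // hypothetical queue check
  startFramePlaybackFromQueue(): void;   // public method added earlier
}

async function startPlaybackWithRetry(avatar: AvatarLike): Promise<void> {
  for (let attempt = 0; attempt < 3; attempt++) {
    if (avatar.hasQueuedFrames()) {
      avatar.startFramePlaybackFromQueue();
      return;
    }
    // Expression data travels via the audio2exp WebSocket and may lag the
    // direct TTS audio; give it up to 3 × 200 ms to arrive.
    await new Promise((resolve) => setTimeout(resolve, 200));
  }
  console.warn("No expression frames arrived; audio will play without lip sync");
}
```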
- Make stopFramePlayback() public in LAMAvatarController
- Call stopFramePlayback() on TTS ended/pause events in concierge-controller
- Fixes timing mismatch where frame playback continued after audio ended
https://claude.ai/code/session_01BNt1bYnL3hbBtuKwaoSypY
External dependency cloned for lip sync integration.
https://claude.ai/code/session_01BNt1bYnL3hbBtuKwaoSypY
- Update path to use cloned LAM_Audio2Expression directory
- Enable model initialization on startup (CPU mode)
- Add proper weight and config path configuration
- Add detailed logging for initialization process
https://claude.ai/code/session_01BNt1bYnL3hbBtuKwaoSypY
This patch modifies engines/infer.py to:
- Detect available device (CUDA or CPU)
- Use .to(device) instead of .cuda()
- Add map_location to torch.load()
Apply with: cd LAM_Audio2Expression && git apply ../audio2exp-service/cpu_support.patch
https://claude.ai/code/session_01BNt1bYnL3hbBtuKwaoSypY
This implements the official synchronization approach from OpenAvatarChat:
1. audio2exp-service (app.py):
   - Add JBIN format serializer to bundle audio + expression data
   - Add batch tracking per session for speech segment handling
   - Support MP3 audio input (Google Cloud TTS format)
   - Send bundled JBIN data via WebSocket
2. AudioSyncPlayer (new file):
   - Web Audio API player with precise timing tracking
   - Tracks playback position for expression frame sync
   - Supports batch-based speech segments
3. LAMWebSocketManager:
   - Parse JBIN bundled data from WebSocket
   - Store expression frames indexed by batch
   - Integrate with AudioSyncPlayer for playback
   - Add getCurrentExpressionFrame() for audio-based sync
4. LAMAvatar.astro:
   - Support both 'bundled' and 'external' sync modes
   - External mode: sync with gourmet-sp TTS player
   - Expression sync timer at 30fps
   - setExternalTtsPlayer() to link with external audio
5. concierge-controller.ts:
   - Link external TTS player with LAMAvatar
   - Send TTS audio to audio2exp-service for expression
   - Parallel TTS + expression generation
The key improvement is that audio and expression are now properly synchronized at the protocol level, following the official approach.
https://claude.ai/code/session_01BNt1bYnL3hbBtuKwaoSypY
Use a separate `import type { AudioSample }` to fix a module resolution
error in the Vite dev server. Interfaces should be imported as types
when only used for type checking.
https://claude.ai/code/session_01BNt1bYnL3hbBtuKwaoSypY
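The fix itself is a one-liner (module path illustrative); a type-only import is erased at build time, so Vite never looks for a runtime export:

```ts
// Erased during compilation: no runtime binding for Vite to resolve.
import type { AudioSample } from "./AudioSyncPlayer";
```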
- Add queueExpressionFrames() public method to LAMAvatar for external sources to push expression frames
- Update sendAudioToExpression() in concierge-controller to pass expression data from REST API to LAMAvatar's frame queue
- This connects the audio2exp-service response to the avatar lip sync
The flow is now:
1. TTS audio sent to audio2exp-service REST API
2. API returns expression frames
3. Frames queued to LAMAvatar
4. TTS play event triggers frame playback
https://claude.ai/code/session_01BNt1bYnL3hbBtuKwaoSypY
The gaussian-splat-renderer doesn't call setExpression() in FLAME mode, so expression data was being ignored. Added applyExpressionToMesh() to directly set splatMesh.bsWeight, bypassing the FLAME animation system.
Called from:
- applyFrame(): Apply expression during frame playback
- resetExpression(): Reset expression when playback ends
https://claude.ai/code/session_01BNt1bYnL3hbBtuKwaoSypY
The renderer's FLAME mode uses pre-recorded animation from flame_params, which overwrites our live expression data. Now we:
1. Disable FLAME mode when frame playback starts (setFlameMode(false))
2. Re-enable FLAME mode when playback stops (setFlameMode(true))
3. Force texture update via updateBoneMatrixTexture() after setting bsWeight
This allows our live expression data to be applied to the mesh.
https://claude.ai/code/session_01BNt1bYnL3hbBtuKwaoSypY
The previous approach of disabling FLAME mode during lip sync caused renderer crashes because updateFlameBones() still runs and needs flame_params['expr']. Instead, we now inject our expression data directly into flame_params['expr'] at the current frame, so the renderer reads our lip sync data during its normal update loop.
https://claude.ai/code/session_01BNt1bYnL3hbBtuKwaoSypY
Based on official OpenAvatarChat-WebUI implementation research:
- The official API uses getExpressionData callback for expression control
- Add useFlame: false option to force callback usage instead of FLAME data
- Add comprehensive debug logging to understand renderer state:
  - FLAME mode status (viewer.useFlame, renderer.useFlame)
  - flame_params structure and format (array vs object)
  - bsWeight format and current values
  - morphTargetDictionary channel names
- Update applyExpressionToMesh to detect array vs object format
Reference: OpenAvatarChat-WebUI/src/utils/gaussianAvatar.ts
https://claude.ai/code/session_01BNt1bYnL3hbBtuKwaoSypY
This will help determine if the renderer is actually calling our getExpressionData callback during its render loop. The log outputs every 30 calls (~1 second at 30fps) showing current expression values.
https://claude.ai/code/session_01BNt1bYnL3hbBtuKwaoSypY
When responses are long, they get split into firstSentence and remainingSentences with separate TTS synthesis. These parallel TTS calls were not calling sendAudioToExpression(), causing lip sync frames to only be generated for the first TTS response. Added sendAudioToExpression() calls for both firstSentence and remainingSentences TTS audio.
https://claude.ai/code/session_01BNt1bYnL3hbBtuKwaoSypY
- Check if renderer has setExpression method
- Log morphTargetInfluences on splatMesh
- Log all methods on renderer and viewer objects
- Check jawOpen and mouthLowerDownLeft indices
https://claude.ai/code/session_01BNt1bYnL3hbBtuKwaoSypY
Try multiple methods to apply expression data to the mesh:
1. renderer.setExpression() - if renderer has this method
2. viewer.setExpression() - if viewer has this method
3. splatMesh.bsWeight - direct property setting
4. splatMesh.morphTargetInfluences - Three.js standard approach
5. renderer.expressionData - for callback consistency
Removed complex FLAME mode injection logic that wasn't working.
https://claude.ai/code/session_01BNt1bYnL3hbBtuKwaoSypY
Simplified to match official gaussianAvatar.ts implementation:
- Removed useFlame: false option (let renderer decide)
- Removed applyExpressionToMesh() method (not in official)
- Removed setFlameMode() method (not in official)
- Removed debug logging in getExpressionData()
- applyFrame() now just updates expressionData (like official)
- getExpressionData() just returns expressionData (like official)
Official approach: renderer calls getExpressionData() callback during its render loop and handles expression application internally.
https://claude.ai/code/session_01BNt1bYnL3hbBtuKwaoSypY
Match official processor behavior:
- Return undefined when not playing (like processor?.getArkitFaceFrame()?.arkitFace)
- Return actual expression data only during playback
The renderer may differentiate between undefined and empty data.
https://claude.ai/code/session_01BNt1bYnL3hbBtuKwaoSypY
Changed sendAudioToExpression calls from fire-and-forget to awaited:
- Main speakTextGCP: await expression before setting audio src
- Split sentence TTS: await expression within each promise
This ensures expression data is queued before audio starts playing, fixing the synchronization issue between mouth movement and speech.
https://claude.ai/code/session_01BNt1bYnL3hbBtuKwaoSypY
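A sketch of the reordered flow; fetchTts, sendAudioToExpression, sessionId, and ttsPlayer are stand-ins for the controller's real members:

```ts
declare function fetchTts(text: string): Promise<Blob>;
declare function sendAudioToExpression(audio: Blob, sessionId: string): Promise<void>;
declare const sessionId: string;
declare const ttsPlayer: HTMLAudioElement;

async function speakTextGCP(text: string): Promise<void> {
  const audio = await fetchTts(text);
  await sendAudioToExpression(audio, sessionId); // frames are queued first...
  ttsPlayer.src = URL.createObjectURL(audio);    // ...and only then does audio start
  await ttsPlayer.play();
}
```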
Follow official OpenAvatarChat approach:
- Remove setInterval timer for frame playback
- Calculate frame index from elapsed time using performance.now()
- Record playbackStartTime when playback starts
- getExpressionData() returns frame based on elapsed time
- Fix undefined useBundledSync -> syncMode check
https://claude.ai/code/session_01BNt1bYnL3hbBtuKwaoSypY
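A minimal sketch of that timer-free playback, assuming a fixed 30 fps frame rate:

```ts
const FRAME_MS = 1000 / 30; // one expression frame every ~33.3 ms

let playbackStartTime = 0;
let frames: Record<string, number>[] = [];

function startFramePlayback(queued: Record<string, number>[]): void {
  frames = queued;
  playbackStartTime = performance.now(); // anchor all frame indexing to this instant
}

// Called by the renderer each tick: pick the frame the wall clock says is current.
function getExpressionData(): Record<string, number> | undefined {
  const elapsed = performance.now() - playbackStartTime;
  const index = Math.floor(elapsed / FRAME_MS);
  return index < frames.length ? frames[index] : undefined;
}
```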
The renderer expects all 52 ARKit channels but audio2exp only returns non-zero channels. Merge frame data with this.expressionData to ensure complete expression is returned.
https://claude.ai/code/session_01BNt1bYnL3hbBtuKwaoSypY
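The merge is a simple spread overlay; a sketch assuming a zero-initialized base object like the one from createDefaultExpressionData():

```ts
// Overlay the sparse audio2exp frame (non-zero channels only) onto a complete
// 52-channel base so the renderer always sees every channel.
function mergeFrame(
  base: Record<string, number>,
  sparse: Record<string, number>,
): Record<string, number> {
  return { ...base, ...sparse };
}
```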
Copy official getArkitFaceFrame() logic:
- Return expressionData always (not undefined)
- Create new object each frame: this.expressionData = {}
- Calculate frameIndex: Math.floor(calcDelta / frameInfoInternal)
https://claude.ai/code/session_01BNt1bYnL3hbBtuKwaoSypY
Backend:
- Change response format: channels→names, weights→frames[{weights}]
- Add frame_rate field to response
Frontend:
- Update conversion to match official gaussianAvatar.ts logic
https://claude.ai/code/session_01BNt1bYnL3hbBtuKwaoSypY
- Use .bind(this) like official code
- Add call count logging to getExpressionData()
- Log when renderer calls the callback
https://claude.ai/code/session_01BNt1bYnL3hbBtuKwaoSypY
- Download official test_expression_1s.json
- Add testMode flag (enabled by default)
- Implement exact same loop playback logic as official gaussianAvatar.ts
- This will verify if renderer works with official data
https://claude.ai/code/session_01BNt1bYnL3hbBtuKwaoSypY
Test with official sample avatar to determine if the issue is with concierge.zip or the renderer/expression logic.
https://claude.ai/code/session_01BNt1bYnL3hbBtuKwaoSypY
- Normalize RMS values to use full dynamic range
- Add speech-like variation using sine waves at syllable frequencies
- Make jaw and mouth move independently with different timing
- Add more expression channels (pucker, smile)
https://claude.ai/code/session_01BNt1bYnL3hbBtuKwaoSypY
- Track if external player already linked
- Remove old listeners when switching players
- Prevent duplicate play/pause/ended events
https://claude.ai/code/session_01BNt1bYnL3hbBtuKwaoSypY
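A sketch of the listener hygiene described above (handler bodies elided):

```ts
let linkedPlayer: HTMLAudioElement | null = null;
const onPlay = (): void => { /* start frame playback */ };
const onPause = (): void => { /* stop frame playback */ };
const onEnded = (): void => { /* stop frame playback and reset */ };

function setExternalTtsPlayer(player: HTMLAudioElement): void {
  if (linkedPlayer === player) return; // already linked: nothing to do
  if (linkedPlayer) {
    // Detach the old player first so events never fire twice.
    linkedPlayer.removeEventListener("play", onPlay);
    linkedPlayer.removeEventListener("pause", onPause);
    linkedPlayer.removeEventListener("ended", onEnded);
  }
  player.addEventListener("play", onPlay);
  player.addEventListener("pause", onPause);
  player.addEventListener("ended", onEnded);
  linkedPlayer = player;
}
```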