feat: add Phantom Motion V8.0 engine skill #415

pixelxzen wants to merge 1 commit into nexu-io:main from
Conversation
Hi @pixelxzen! 🎉 I'll run a deep review and get back to you within 24 hours. Thanks for making open-design better!
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 26b24dbe2a
```python
# 5. FFmpeg mux
print("🚀 Compositing audio and video...")
tts_path = Path(root_dir) / "audio" / "merged_tts.wav"
bgm_path = Path(root_dir) / "audio" / "bgm.mp3"
```
Handle silent-BGM fallback when composing MP4
render-mp4.py always loads audio/bgm.mp3, but bgm-generate.py writes audio/bgm.wav when both music APIs are unavailable or fail. In that common fallback path (e.g., missing API keys), FFmpeg receives a nonexistent input and the final export aborts, so the advertised “silent audio safety” path cannot actually produce a video.
```python
API_URL = "https://generativelanguage.googleapis.com/v1alpha/models/gemini-3.1-flash-tts-preview:generateContent"
SAMPLE_RATE = 24000  # Gemini TTS output sample rate
```
Align WAV sample rate with requested PCM encoding
This script requests audioEncoding: "PCM_48000" but writes the returned PCM into WAV files with SAMPLE_RATE = 24000. If the API honors the requested 48k PCM (which this code explicitly asks for), the WAV headers will be wrong, causing audio to play at the wrong speed and making timings.json durations drift from real playback, which breaks subtitle sync and render timing.
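One way to keep the header honest is to derive the WAV rate from whatever encoding the request actually asks for, so the two can never silently disagree. A minimal sketch (the `REQUESTED_ENCODING` constant is illustrative, assuming raw 16-bit mono PCM comes back):

```python
import wave

def write_wav(path, pcm_bytes, sample_rate):
    """Write raw 16-bit mono PCM with a header that matches the real rate."""
    with wave.open(path, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)          # 16-bit samples
        w.setframerate(sample_rate)
        w.writeframes(pcm_bytes)

# If the request asks for 48 kHz PCM, the header must say 48000 too;
# a 24000 header makes playback speed and timings.json durations drift by 2x.
REQUESTED_ENCODING = "PCM_48000"  # assumption: mirrors the request body
SAMPLE_RATE = int(REQUESTED_ENCODING.rsplit("_", 1)[1])
```

With this, changing the requested encoding in one place keeps the headers and the derived durations consistent.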
```shell
--subtitles subtitles.json \
--voice {male|female} \
--style {documentary|educational|passionate} \
--speed {1.0|1.2|1.3|1.5} \
```
Remove unsupported --speed from documented TTS command
The primary workflow instructs users/agents to call tts-generate.py with --speed, but the parser in scripts/tts-generate.py does not define that option. Running the documented command terminates with argparse "unrecognized arguments: --speed", so the default Phase 2 command in this skill is not executable as written.
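If --speed is meant to stay in the documentation, the smaller fix is registering it in the parser. A sketch with the documented value sets (whether the TTS backend actually honors the speed is a separate question):

```python
import argparse

parser = argparse.ArgumentParser(description="TTS generation (sketch)")
parser.add_argument("--subtitles", required=True)
parser.add_argument("--voice", choices=["male", "female"], default="female")
parser.add_argument("--style",
                    choices=["documentary", "educational", "passionate"],
                    default="documentary")
# The documented flag, currently missing from scripts/tts-generate.py:
parser.add_argument("--speed", type=float,
                    choices=[1.0, 1.2, 1.3, 1.5], default=1.0)

# The documented Phase 2 invocation now parses instead of aborting:
args = parser.parse_args(["--subtitles", "subtitles.json", "--speed", "1.2"])
```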
mrcfps left a comment
I found a few blocking issues in the new Phantom Motion pipeline that will prevent the documented generation/render path from working reliably. The comments below are focused on the TTS request shape and MP4/audio assembly behavior.
Generated by Looper 0.0.0-dev · runner=reviewer · agent=opencode

```python
"contents": [{"role": "user", "parts": [{"text": prompt}]}],
"generationConfig": {
    "responseModalities": ["AUDIO"],
    "audioConfig": {
```
This request body uses generationConfig.audioConfig.voice/audioEncoding, but Gemini's TTS generateContent schema expects voice selection under generationConfig.speechConfig.voiceConfig.prebuiltVoiceConfig.voiceName (the bundled tests/xingji/generate_tts.py uses that shape too). Because this script posts the payload directly and then calls raise_for_status(), the documented Phase 2 TTS command can fail with a 400 before producing merged_tts.wav or timings.json, which blocks the rest of the pipeline. Please switch this block to the supported speechConfig structure and, if a specific sample rate is needed, derive it from the returned audio/metadata rather than sending the unsupported audioEncoding field.
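A sketch of the corrected request body, assuming the speechConfig shape that the bundled tests/xingji/generate_tts.py uses; the voice name here is purely illustrative:

```python
# Hedged sketch: voice selection via speechConfig rather than the
# unsupported generationConfig.audioConfig block.
def build_tts_payload(prompt: str, voice_name: str = "Kore") -> dict:
    return {
        "contents": [{"role": "user", "parts": [{"text": prompt}]}],
        "generationConfig": {
            "responseModalities": ["AUDIO"],
            "speechConfig": {
                "voiceConfig": {
                    "prebuiltVoiceConfig": {"voiceName": voice_name}
                }
            },
            # No audioEncoding field: derive the sample rate from the
            # returned audio/metadata instead of requesting one.
        },
    }

payload = build_tts_payload("Hello world")
```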
```python
"-filter_complex", "[1:a]volume=1.0[v1]; [2:a]volume=0.4[v2]; [v1][v2]amix=inputs=2:duration=first[a]",
"-map", "0:v", "-map", "[a]",
"-c:v", "libx264", "-pix_fmt", "yuv420p", "-crf", "18",
"-shortest",
```
The final mux drops the timeline offsets that the generator just calculated. timings.json and the skill docs define the video duration as 3s intro + TTS + 3s outro, but this filter starts TTS at timestamp 0 and then amix=duration=first plus -shortest makes ffmpeg stop at the raw TTS length. The exported MP4 therefore loses the 3-second pre-roll alignment and truncates the outro frames. Please pad/delay the narration by 3000 ms (for example with adelay/apad or an explicit silent track), mix for the full computed duration, and avoid -shortest truncating the video below total_animation_duration.
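One way to sketch the fix, assuming a 3 s intro and a precomputed total duration; the helper name and argument layout are illustrative, not the script's actual structure:

```python
INTRO_MS = 3000  # documented 3-second pre-roll

def build_mix_args(total_duration_s: float) -> list:
    """Delay narration by the intro, mix the full timeline, avoid -shortest."""
    filter_complex = (
        f"[1:a]adelay={INTRO_MS}|{INTRO_MS},apad[v1];"  # shift TTS, pad through outro
        "[2:a]volume=0.4[v2];"
        "[v1][v2]amix=inputs=2:duration=longest[a]"
    )
    return [
        "-filter_complex", filter_complex,
        "-map", "0:v", "-map", "[a]",
        "-c:v", "libx264", "-pix_fmt", "yuv420p", "-crf", "18",
        "-t", f"{total_duration_s:.3f}",  # cut at the computed total, not the TTS end
    ]

# 3s intro + 42.5s narration + 3s outro, as an example timeline:
args = build_mix_args(3 + 42.5 + 3)
```

`-t` with the computed `total_animation_duration` keeps the outro frames that `-shortest` would truncate.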
```python
        f.write(audio_bytes)
    print(f"✅ BGM ready: {bgm_path.name}")
else:
    bgm_path = out / "bgm.wav"
```
The advertised safe fallback writes bgm.wav, but the rest of the changed pipeline is hard-coded to consume audio/bgm.mp3 (assemble.py usage and render-mp4.py both point at that name). On machines without valid Lyria/MiniMax credentials—the exact case this fallback is meant to handle—the BGM step succeeds while the subsequent assemble/render step fails because the expected MP3 does not exist. Please either always emit a compatible bgm.mp3 fallback (for example via ffmpeg) or propagate the actual fallback filename through the generated metadata and have assemble/render read that value.
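A minimal sketch of the metadata approach (the file names mirror the diff; the meta.json hand-off between the two scripts is an assumption):

```python
import json
from pathlib import Path

def record_bgm(out: Path, bgm_path: Path) -> Path:
    """Generator side: record the actual BGM filename for later stages."""
    meta = {"bgm_file": bgm_path.name}  # may be bgm.mp3 or the bgm.wav fallback
    (out / "meta.json").write_text(json.dumps(meta), encoding="utf-8")
    return bgm_path

def resolve_bgm(out: Path) -> Path:
    """Render side: trust the recorded name instead of hard-coding bgm.mp3."""
    meta = json.loads((out / "meta.json").read_text(encoding="utf-8"))
    return out / meta["bgm_file"]
```

Alternatively, the fallback branch could always transcode to a real bgm.mp3 via ffmpeg so the downstream contract never changes.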
lefarcen left a comment
Hi @pixelxzen, I am following up on the review I promised earlier.
mrcfps already covered the TTS schema, mux timing, and BGM fallback blockers, so I will not repeat those. I found a few non-overlapping blockers that can affect security and deterministic rendering; please address these along with the existing mrcfps comments, then I can take another pass.
The direction is ambitious and interesting. These fixes should make the Phantom Motion pipeline much safer to run and easier to validate.
```python
)

# Build the player script
player = PLAYER_JS.replace("__TIMINGS_JSON__", json.dumps(timings, ensure_ascii=False))
```
P1: timings comes from generated/user-controlled JSON, but this embeds json.dumps(timings) directly into a <script> replacement. A value containing `</script><script>...` breaks out of the script element before any later DOM textContent handling can help. Please put the data in a non-executable JSON script tag and parse it, or at minimum escape `</script`, U+2028, and U+2029 before embedding.
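The minimal-escaping option can be sketched as a small helper applied to the JSON string before the template replacement (the function name is illustrative):

```python
def escape_for_inline_script(json_text: str) -> str:
    """Make a JSON string safe to embed inside a <script> element.

    Escaping "</" blocks the </script> breakout; U+2028/U+2029 are valid in
    JSON strings but are line terminators in JavaScript source.
    """
    return (
        json_text
        .replace("</", "<\\/")
        .replace("\u2028", "\\u2028")
        .replace("\u2029", "\\u2029")
    )
```

Because `\/` is a legal JSON string escape, the escaped text still parses back to the same value on the JavaScript side.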
```python
output_path = os.path.abspath(a.output)
temp_dir = Path(output_path).parent / "frames"
if temp_dir.exists():
    subprocess.run(["rm", "-rf", str(temp_dir)])
```
P1: --output controls Path(output_path).parent, and this deletes a generic sibling frames directory with rm -rf. If someone points output at an existing project directory, unrelated files can be removed. Please use tempfile.TemporaryDirectory() or a unique skill-owned temp directory, and remove only that exact path.
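The `tempfile` variant can be sketched in a few lines; the prefix is illustrative, and cleanup is scoped to exactly the directory the skill created:

```python
import tempfile
from pathlib import Path

# A unique, skill-owned scratch dir instead of rm -rf on a shared
# "frames" sibling of the user's output path.
with tempfile.TemporaryDirectory(prefix="phantom-frames-") as tmp:
    frames_dir = Path(tmp)
    (frames_dir / "frame_000001.png").write_bytes(b"")  # stand-in for a real frame
    captured = frames_dir
# On exit only that exact directory is removed; files next to the
# user's --output path are never touched.
```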
```javascript
for (let i = 0; i < totalFrames; i++) {
  const currentTime = i / fps;
  await page.evaluate((t) => {
    if (window.renderOneFrame) window.renderOneFrame(t);
```
P2: The recorder only calls window.renderOneFrame(t), but the generated examples/docs register GSAP/Hyperframes timelines under window.__timelines instead of that hook. Most pages will therefore record static or wall-clock-dependent frames. Please seek the registered timelines to t before each screenshot, or fail loudly when no deterministic frame driver exists.
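A deterministic driver could try renderOneFrame first, fall back to seeking the registered timelines, and otherwise fail loudly. A sketch of the JavaScript that a `page.evaluate` call would receive, assuming GSAP-style timelines under `window.__timelines` (the `.seek(t)` usage follows GSAP's timeline API):

```python
# Hedged sketch: the JS snippet the recorder would pass to page.evaluate
# for each frame timestamp t.
SEEK_FRAME_JS = """
(t) => {
  if (window.renderOneFrame) { window.renderOneFrame(t); return; }
  if (window.__timelines && window.__timelines.length) {
    for (const tl of window.__timelines) tl.seek(t);  // GSAP timeline seek
    return;
  }
  throw new Error("no deterministic frame driver (renderOneFrame/__timelines)");
}
"""

def frame_times(total_frames: int, fps: int) -> list:
    """Timestamps fed to page.evaluate(SEEK_FRAME_JS, t), one per frame."""
    return [i / fps for i in range(total_frames)]
```

Throwing when neither hook exists turns a silently static recording into an immediate, debuggable failure.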
New: Phantom Motion Ultimate Digital Media Engine V8.0
This is a full-dimensional code-animation rendering engine (HTML to MP4) built for top-tier science, intangible cultural heritage, and data-storytelling content.
Core capabilities include:
Requesting an official review. All large test assets and core interaction logic are in place.
Here's a boost for OpenDesign! Yeah!