
feat: add Phantom Motion V8.0 engine skill#415

Open
pixelxzen wants to merge 1 commit into nexu-io:main from pixelxzen:feat/phantom-motion-skill

Conversation

@pixelxzen

New: Phantom Motion Ultimate Digital Media Engine V8.0

A full-spectrum code-animation rendering engine (HTML to MP4) built for top-tier science, intangible cultural heritage, and data-storytelling content.

Core capabilities:

  • ⚛️ Full-dimensional digital museum: integrates GLTFLoader with holographic mesh (Hologram Mode) rendering.
  • 📈 Advanced data charts: D3.js + GSAP deliver frame-synchronized, smooth vector curves.
  • ♟️ Eastern matrix engine: pure native SVG plays out Go games and I Ching formations in real time.
  • 🎙️ Multimodal narration pipeline: fully wired to Gemini 3.1 Flash TTS (Charon epic male voice / Erinome refined female voice) and Minimax symphonic BGM.
  • 🛡️ Security isolation: strict data-masking logic and a minimalist 15-character typography aesthetic.

Requesting an official review. All large test assets and core interaction logic are complete.
Go OpenDesign! Yeah!

@lefarcen added the feature (New feature or enhancement) label on May 4, 2026
Contributor

lefarcen commented May 4, 2026

Hi @pixelxzen! 🎉
Thanks for the PR! The Phantom Motion V8.0 engine sounds like a very ambitious multimedia rendering system, and the combination of a holographic museum, data charts, and a multimodal narration pipeline is intriguing.

I will run a deep review and report back within 24 hours.

Thanks for making open-design better!
— open-design team


@chatgpt-codex-connector (Bot) left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 26b24dbe2a


# 5. FFmpeg composition
print("🚀 Compositing audio and video...")
tts_path = Path(root_dir) / "audio" / "merged_tts.wav"
bgm_path = Path(root_dir) / "audio" / "bgm.mp3"

P1: Handle silent-BGM fallback when composing MP4

render-mp4.py always loads audio/bgm.mp3, but bgm-generate.py writes audio/bgm.wav when both music APIs are unavailable or fail. In that common fallback path (e.g., missing API keys), FFmpeg receives a nonexistent input and the final export aborts, so the advertised “silent audio safety” path cannot actually produce a video.
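The mismatch could be absorbed with a small lookup that accepts either file the generator may have written. A minimal sketch; the helper name `resolve_bgm_path` is hypothetical, the file names are those the scripts under review use:

```python
from pathlib import Path

def resolve_bgm_path(audio_dir: Path) -> Path:
    """Prefer bgm.mp3, but accept the bgm.wav that the silent fallback writes."""
    for name in ("bgm.mp3", "bgm.wav"):
        candidate = audio_dir / name
        if candidate.exists():
            return candidate
    # Fail before FFmpeg does, with a message that names the real problem.
    raise FileNotFoundError(f"no BGM file found in {audio_dir}")
```

render-mp4.py could call this instead of hard-coding `audio/bgm.mp3`, so FFmpeg always receives an input that actually exists.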


}

API_URL = "https://generativelanguage.googleapis.com/v1alpha/models/gemini-3.1-flash-tts-preview:generateContent"
SAMPLE_RATE = 24000  # Gemini TTS output sample rate

P1: Align WAV sample rate with requested PCM encoding

This script requests audioEncoding: "PCM_48000" but writes the returned PCM into WAV files with SAMPLE_RATE = 24000. If the API honors the requested 48k PCM (which this code explicitly asks for), the WAV headers will be wrong, causing audio to play at the wrong speed and making timings.json durations drift from real playback, which breaks subtitle sync and render timing.
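One way to keep the WAV headers honest is to derive the rate from the `mimeType` string the API returns alongside the audio (e.g. `audio/L16;codec=pcm;rate=24000`) instead of hard-coding a constant. A sketch; the `rate=` field in the mime type is an assumption about the response shape:

```python
import re

def sample_rate_from_mime(mime_type: str, default: int = 24000) -> int:
    """Read the PCM rate out of the returned mimeType; fall back to a default."""
    match = re.search(r"rate=(\d+)", mime_type)
    return int(match.group(1)) if match else default
```

The WAV writer and the `timings.json` duration math would then both use the rate the API actually delivered.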


--subtitles subtitles.json \
--voice {male|female} \
--style {documentary|educational|passionate} \
--speed {1.0|1.2|1.3|1.5} \

P2: Remove unsupported --speed from documented TTS command

The primary workflow instructs users/agents to call tts-generate.py with --speed, but the parser in scripts/tts-generate.py does not define that option. Running the documented command will terminate with argparse “unrecognized arguments: --speed”, so the default Phase 2 command in this skill is not executable as written.
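Either the docs or the parser has to change. If `--speed` is meant to be supported, this is a sketch of the missing argparse option, restricted to the values the docs advertise; the defaults are assumptions:

```python
import argparse

parser = argparse.ArgumentParser(prog="tts-generate.py")
parser.add_argument("--subtitles", required=True)
parser.add_argument("--voice", choices=["male", "female"], default="male")
parser.add_argument("--style", choices=["documentary", "educational", "passionate"],
                    default="documentary")
# The flag the skill docs already promise, so the documented command parses:
parser.add_argument("--speed", type=float, choices=[1.0, 1.2, 1.3, 1.5], default=1.0)

args = parser.parse_args(["--subtitles", "subtitles.json", "--speed", "1.3"])
```

With this in place, the Phase 2 command from the docs no longer dies with "unrecognized arguments: --speed".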


Contributor

@mrcfps left a comment


I found a few blocking issues in the new Phantom Motion pipeline that will prevent the documented generation/render path from working reliably. The comments below are focused on the TTS request shape and MP4/audio assembly behavior.

Generated by Looper 0.0.0-dev · runner=reviewer · agent=opencode

"contents": [{"role": "user", "parts": [{"text": prompt}]}],
"generationConfig": {
"responseModalities": ["AUDIO"],
"audioConfig": {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This request body uses generationConfig.audioConfig.voice/audioEncoding, but Gemini's TTS generateContent schema expects voice selection under generationConfig.speechConfig.voiceConfig.prebuiltVoiceConfig.voiceName (the bundled tests/xingji/generate_tts.py uses that shape too). Because this script posts the payload directly and then calls raise_for_status(), the documented Phase 2 TTS command can fail with a 400 before producing merged_tts.wav or timings.json, which blocks the rest of the pipeline. Please switch this block to the supported speechConfig structure and, if a specific sample rate is needed, derive it from the returned audio/metadata rather than sending the unsupported audioEncoding field.
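A sketch of the request body using the `speechConfig` shape described above (the prompt text is a placeholder; the voice name matches the Charon voice the PR advertises):

```python
# Voice selection lives under generationConfig.speechConfig, not the
# unsupported audioConfig/audioEncoding fields this script currently sends.
payload = {
    "contents": [{"role": "user", "parts": [{"text": "Narration text here"}]}],
    "generationConfig": {
        "responseModalities": ["AUDIO"],
        "speechConfig": {
            "voiceConfig": {
                "prebuiltVoiceConfig": {"voiceName": "Charon"}
            }
        },
    },
}
```

The sample rate is then taken from the returned audio metadata rather than requested up front.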

Comment on lines +131 to +134
"-filter_complex", "[1:a]volume=1.0[v1]; [2:a]volume=0.4[v2]; [v1][v2]amix=inputs=2:duration=first[a]",
"-map", "0:v", "-map", "[a]",
"-c:v", "libx264", "-pix_fmt", "yuv420p", "-crf", "18",
"-shortest",

The final mux drops the timeline offsets that the generator just calculated. timings.json and the skill docs define the video duration as 3s intro + TTS + 3s outro, but this filter starts TTS at timestamp 0 and then amix=duration=first plus -shortest makes ffmpeg stop at the raw TTS length. The exported MP4 therefore loses the 3-second pre-roll alignment and truncates the outro frames. Please pad/delay the narration by 3000 ms (for example with adelay/apad or an explicit silent track), mix for the full computed duration, and avoid -shortest truncating the video below total_animation_duration.
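A sketch of a mux command that keeps the 3 s pre-roll and the full computed duration. File names and the 42 s TTS length are placeholders; `adelay` shifts the narration, `apad` plus `duration=longest` keep the mix alive through the outro, and an explicit `-t` replaces `-shortest`:

```python
INTRO_MS = 3000                            # 3 s intro pad, per timings.json
tts_seconds = 42.0                         # placeholder: measured TTS length
total_seconds = 3.0 + tts_seconds + 3.0    # intro + TTS + outro

filter_complex = (
    f"[1:a]adelay={INTRO_MS}|{INTRO_MS},apad,volume=1.0[v1];"
    "[2:a]volume=0.4[v2];"
    "[v1][v2]amix=inputs=2:duration=longest[a]"
)
cmd = [
    "ffmpeg", "-y",
    "-i", "video.mp4", "-i", "merged_tts.wav", "-i", "bgm.mp3",
    "-filter_complex", filter_complex,
    "-map", "0:v", "-map", "[a]",
    "-c:v", "libx264", "-pix_fmt", "yuv420p", "-crf", "18",
    "-t", str(total_seconds),  # cut at the computed total, not the raw TTS length
    "output.mp4",
]
```

This keeps the exported MP4 aligned with `total_animation_duration` instead of truncating at the narration's end.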

f.write(audio_bytes)
print(f"✅ BGM ready: {bgm_path.name}")
else:
bgm_path = out / "bgm.wav"

The advertised safe fallback writes bgm.wav, but the rest of the changed pipeline is hard-coded to consume audio/bgm.mp3 (assemble.py usage and render-mp4.py both point at that name). On machines without valid Lyria/MiniMax credentials—the exact case this fallback is meant to handle—the BGM step succeeds while the subsequent assemble/render step fails because the expected MP3 does not exist. Please either always emit a compatible bgm.mp3 fallback (for example via ffmpeg) or propagate the actual fallback filename through the generated metadata and have assemble/render read that value.
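The first option could look like the sketch below: always normalize the fallback to `bgm.mp3` so downstream consumers never guess. The helper name is hypothetical and an `ffmpeg` binary with `libmp3lame` is assumed to be on PATH:

```python
import subprocess
from pathlib import Path

def ensure_bgm_mp3(audio_dir: Path) -> Path:
    """Guarantee bgm.mp3 exists, transcoding the silent bgm.wav fallback if needed."""
    mp3 = audio_dir / "bgm.mp3"
    wav = audio_dir / "bgm.wav"
    if not mp3.exists():
        if not wav.exists():
            raise FileNotFoundError(f"neither bgm.mp3 nor bgm.wav in {audio_dir}")
        # Transcode the WAV fallback so assemble.py / render-mp4.py find what they expect.
        subprocess.run(
            ["ffmpeg", "-y", "-i", str(wav), "-codec:a", "libmp3lame", str(mp3)],
            check=True,
        )
    return mp3
```

Propagating the actual filename through the generated metadata would work too; either way the fallback path must end with a file the renderer can open.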


@lefarcen left a comment


Hi @pixelxzen, I am following up on the review I promised earlier.

mrcfps already covered the TTS schema, mux timing, and BGM fallback blockers, so I will not repeat those. I found a few non-overlapping blockers that can affect security and deterministic rendering; please address these along with the existing mrcfps comments, then I can take another pass.

The direction is ambitious and interesting. These fixes should make the Phantom Motion pipeline much safer to run and easier to validate.

)

# Build the player script
player = PLAYER_JS.replace("__TIMINGS_JSON__", json.dumps(timings, ensure_ascii=False))

P1: timings comes from generated/user-controlled JSON, but this embeds json.dumps(timings) directly into a <script> replacement. A value containing </script><script>... can break out of the script element before any later DOM textContent handling can protect it. Please put the data in a non-executable JSON script tag and parse it, or at minimum escape </script, U+2028, and U+2029 before embedding.
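A sketch of the escaping variant (the function name is hypothetical; embedding in a `type="application/json"` script tag and `JSON.parse`-ing it is the sturdier alternative):

```python
import json

def embed_json_in_script(data) -> str:
    """Serialize data so it cannot terminate the surrounding <script> element."""
    text = json.dumps(data, ensure_ascii=False)
    return (
        text.replace("</", "<\\/")         # blocks </script> breakout; \/ is valid JSON
            .replace("\u2028", "\\u2028")  # line separators that break inline JS
            .replace("\u2029", "\\u2029")
    )
```

The output is still valid JSON, so the player can parse it unchanged.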

output_path = os.path.abspath(a.output)
temp_dir = Path(output_path).parent / "frames"
if temp_dir.exists():
subprocess.run(["rm", "-rf", str(temp_dir)])

P1: --output controls Path(output_path).parent, and this deletes a generic sibling frames directory with rm -rf. If someone points output at an existing project directory, unrelated files can be removed. Please use tempfile.TemporaryDirectory() or a unique skill-owned temp directory, and remove only that exact path.
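A sketch using a tempfile-owned directory, so cleanup can only ever delete a path this skill created:

```python
import tempfile
from pathlib import Path

# Let tempfile own a unique directory; only this exact path is ever removed.
with tempfile.TemporaryDirectory(prefix="phantom-motion-frames-") as tmp:
    frames_dir = Path(tmp)
    # ... render frame_00001.png etc. into frames_dir and run ffmpeg over them ...
    (frames_dir / "frame_00001.png").touch()
# The directory and its contents are gone here; nothing next to --output was touched.
```

This removes both the `rm -rf` shell-out and the risk of clobbering a user's pre-existing `frames` directory.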

for (let i = 0; i < totalFrames; i++) {
const currentTime = i / fps;
await page.evaluate((t) => {
if (window.renderOneFrame) window.renderOneFrame(t);

P2: The recorder only calls window.renderOneFrame(t), but the generated examples/docs register GSAP/Hyperframes timelines under window.__timelines instead of that hook. Most pages will therefore record static or wall-clock-dependent frames. Please seek the registered timelines to t before each screenshot, or fail loudly when no deterministic frame driver exists.
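A sketch of a page-side frame driver the recorder could evaluate instead, held here as a Python string for the Node/Puppeteer `page.evaluate` call. That `window.__timelines` holds seekable GSAP timelines is an assumption taken from the skill's docs:

```python
# Hypothetical frame driver: prefer renderOneFrame, else seek registered GSAP
# timelines to t, else fail loudly instead of recording wall-clock frames.
SEEK_FRAME_JS = """
(t) => {
  if (window.renderOneFrame) { window.renderOneFrame(t); return; }
  if (window.__timelines && window.__timelines.length) {
    window.__timelines.forEach((tl) => tl.seek(t));  // GSAP seek() is deterministic
    return;
  }
  throw new Error('no deterministic frame driver: expose renderOneFrame or __timelines');
}
"""
# In the recorder loop this replaces the current evaluate call, e.g.:
#   await page.evaluate(SEEK_FRAME_JS, currentTime)
```

Failing loudly is the important part: a silent fallthrough is what currently yields static or wall-clock-dependent frames.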


Labels

feature New feature or enhancement


3 participants