The AI-Powered Geometry Dash Layout Critic. Using Multimodal LLMs to judge Composition, Representation, and Player Movement in gameplay layouts.
OvertureGPT is an AI that watches Geometry Dash layouts and gives harsh, honest feedback.
It doesn't just "look" at the screen: it listens to the music and watches the player's movement at the same time. It can tell if your gameplay is too slow for a drop, if your clicks are off-sync, or if your structures are unreadable.
Why? Because getting good feedback on a layout is hard. OvertureGPT gives you an instant second opinion based on competitive gameplay theory.
- 🔊 True Audio-Visual Sync: It uses a specialized AI (Video-LLaMA 2) that can "hear" the beat and "see" the gameplay at the same time.
- 🌊 Intensity Matching: Detects if you are using 1x speed during a dubstep drop (a major crime).
- 🤖 No Decoration Bias: It ignores art and effects, focusing 100% on the raw layout.
- 📄 Theory-Driven: Critiques are based on real principles of gameplay theory and layout design.
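To make "off-sync" concrete, here is a minimal hand-written sketch. It is not how OvertureGPT works internally (the model judges sync from the raw audio and video); it only illustrates the idea, assuming you already have beat and click timestamps in seconds. The function name `off_sync_clicks` and the 50 ms tolerance are illustrative assumptions.

```python
from bisect import bisect_left

def off_sync_clicks(beats, clicks, tolerance=0.05):
    """Return (click_time, offset) pairs for clicks that land more than
    `tolerance` seconds away from the nearest beat.

    Illustrative only: the timestamps and the tolerance are assumptions,
    not values taken from OvertureGPT.
    """
    if not beats:
        raise ValueError("need at least one beat timestamp")
    beats = sorted(beats)
    flagged = []
    for t in clicks:
        i = bisect_left(beats, t)
        # Only the beat just before and just after the click can be nearest.
        candidates = beats[max(i - 1, 0):i + 1]
        nearest = min(candidates, key=lambda b: abs(b - t))
        offset = t - nearest  # positive = late, negative = early
        if abs(offset) > tolerance:
            flagged.append((t, round(offset, 3)))
    return flagged

# Beats every 0.5 s (120 BPM); the second click is 120 ms late.
late = off_sync_clicks([0.0, 0.5, 1.0, 1.5], [0.02, 0.62, 1.0])
```

A positive offset means the click came after the beat, a negative one means it came early.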
OvertureGPT is a Multimodal RAG (Retrieval-Augmented Generation) Pipeline running on a quantized Large Language Model.
- Model: `Video-LLaMA 2.1 (7B-AV)`
- Architecture: Qwen-2.5 Backbone + ImageBind (Audio Encoder) + SigLIP (Visual Encoder).
- Optimization: 8-bit quantization (via bitsandbytes) so the model fits on T4 GPUs (Colab Free Tier).
- Input: Raw `.mp4` video files (processed as interleaved audio/visual tokens).
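Why 8-bit? A rough back-of-the-envelope estimate shows the motivation. The sketch below counts language-model weights only and assumes a nominal 7 billion parameters and a nominal 16 GB of T4 memory. The encoders, activations, and cache all need extra room, so treat the numbers as a lower bound, not a measurement.

```python
T4_VRAM_GB = 16.0  # nominal T4 memory; the usable amount is somewhat lower

def weight_memory_gb(n_params: float, bits_per_param: int) -> float:
    """Memory needed to hold the weights alone, in GB (1 GB = 1e9 bytes)."""
    return n_params * bits_per_param / 8 / 1e9

N_PARAMS = 7e9  # nominal "7B"; an assumption, not an exact parameter count

fp16_gb = weight_memory_gb(N_PARAMS, 16)  # 14.0 GB
int8_gb = weight_memory_gb(N_PARAMS, 8)   # 7.0 GB

print(f"fp16 weights: {fp16_gb:.1f} GB, headroom {T4_VRAM_GB - fp16_gb:.1f} GB")
print(f"int8 weights: {int8_gb:.1f} GB, headroom {T4_VRAM_GB - int8_gb:.1f} GB")
```

At 16-bit precision the weights alone nearly fill the card, while 8-bit leaves roughly half of it free for everything else.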
- Ingestion: The user uploads a layout video.
- Perception: The `ImageBind` encoder extracts audio spectrograms, while `SigLIP` extracts visual frames.
- Binding: These modalities are projected into a shared latent space (the AI "connects" the sound of a beat to the visual of a jump).
- Inference: The LLM applies "Gameplay Theory" rules to the multimodal context to generate a critique.
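The retrieval half of the final step can be sketched in a few lines. This is a toy stand-in, not the project's actual code: the rule texts, `retrieve_rules`, and `build_prompt` are hypothetical, and plain keyword overlap replaces whatever retriever the real pipeline uses. In the real system the observation comes from the audio/visual tokens, not a hand-typed string.

```python
import re

# Hypothetical rule snippets, paraphrasing the examples in this README.
THEORY_RULES = [
    "Speed should match song intensity: avoid 1x speed during a drop.",
    "Clicks should land on the beat of the music.",
    "Structures must be readable: the player needs to see where to go.",
]

def tokenize(text):
    """Lowercase a string and return its set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def retrieve_rules(query, rules, k=2):
    """Return the k rules sharing the most words with the query.

    Keyword overlap is an assumption made for illustration only.
    """
    q = tokenize(query)
    ranked = sorted(rules, key=lambda r: len(q & tokenize(r)), reverse=True)
    return ranked[:k]

def build_prompt(observation, rules):
    """Combine the retrieved rules and the observation into one prompt."""
    rule_lines = "\n".join(f"- {r}" for r in rules)
    return (
        "You are a strict Geometry Dash layout critic.\n"
        f"Relevant gameplay theory:\n{rule_lines}\n"
        f"Observation: {observation}\n"
        "Critique:"
    )

observation = "slow speed during the drop"
prompt = build_prompt(observation, retrieve_rules(observation, THEORY_RULES))
```

The point of the retrieval step is that the model critiques against written-down theory rather than relying only on what it absorbed during training.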
You don't need a powerful PC. The AI runs on Google's cloud.
- Open the Colab Notebook.
- Run the Setup cells to install the AI.
- Upload your layout video (
layout.mp4). - Run the Critique cell and get roasted.
- Short Clips Only: Currently optimized for 10-30 second clips (due to context window limits).
- Lyric Deafness: It hears energy and beats, but it cannot transcribe lyrics.
- Hallucinations: Occasionally, it might invent "blind jumps" that aren't there. It's an assistant, not a god.
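If your layout is longer than 30 seconds, one possible workaround is to cut it into windows and critique each one separately. The helper below is a sketch under that assumption: `clip_windows` is a hypothetical name, it only computes the cut points (you still need a video tool to do the actual trimming), and separate windows will not share context with each other.

```python
import math

def clip_windows(duration_s, max_len_s=30.0):
    """Split a video of `duration_s` seconds into equal consecutive windows,
    each no longer than `max_len_s`.

    Equal-length windows avoid a tiny leftover clip at the end.
    Returns a list of (start, end) times in seconds.
    """
    if duration_s <= 0:
        raise ValueError("duration must be positive")
    n = math.ceil(duration_s / max_len_s)
    step = duration_s / n
    return [(round(i * step, 3), round((i + 1) * step, 3)) for i in range(n)]

# A 75-second layout becomes three 25-second windows.
windows = clip_windows(75)
```

For any video of 30 seconds or more, each window comes out between 15 and 30 seconds long, which stays inside the 10-30 second range noted above.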
Built with ❤️ by ATXM.