diff --git a/benchmarks/multimodal_bytedance/media/image1.jpg b/benchmarks/multimodal_bytedance/media/image1.jpg new file mode 100644 index 00000000..8cc06738 Binary files /dev/null and b/benchmarks/multimodal_bytedance/media/image1.jpg differ diff --git a/benchmarks/multimodal_bytedance/media/image10.jpeg b/benchmarks/multimodal_bytedance/media/image10.jpeg new file mode 100644 index 00000000..6c0dbf29 Binary files /dev/null and b/benchmarks/multimodal_bytedance/media/image10.jpeg differ diff --git a/benchmarks/multimodal_bytedance/media/image11.jpg b/benchmarks/multimodal_bytedance/media/image11.jpg new file mode 100644 index 00000000..f60deba7 Binary files /dev/null and b/benchmarks/multimodal_bytedance/media/image11.jpg differ diff --git a/benchmarks/multimodal_bytedance/media/image12.png b/benchmarks/multimodal_bytedance/media/image12.png new file mode 100644 index 00000000..e2726c98 Binary files /dev/null and b/benchmarks/multimodal_bytedance/media/image12.png differ diff --git a/benchmarks/multimodal_bytedance/media/image13.jpg b/benchmarks/multimodal_bytedance/media/image13.jpg new file mode 100644 index 00000000..8c1ec85d Binary files /dev/null and b/benchmarks/multimodal_bytedance/media/image13.jpg differ diff --git a/benchmarks/multimodal_bytedance/media/image14.jpg b/benchmarks/multimodal_bytedance/media/image14.jpg new file mode 100644 index 00000000..379af0c6 Binary files /dev/null and b/benchmarks/multimodal_bytedance/media/image14.jpg differ diff --git a/benchmarks/multimodal_bytedance/media/image15.png b/benchmarks/multimodal_bytedance/media/image15.png new file mode 100644 index 00000000..51fa082a Binary files /dev/null and b/benchmarks/multimodal_bytedance/media/image15.png differ diff --git a/benchmarks/multimodal_bytedance/media/image16.png b/benchmarks/multimodal_bytedance/media/image16.png new file mode 100644 index 00000000..ea2b2160 Binary files /dev/null and b/benchmarks/multimodal_bytedance/media/image16.png differ diff --git a/benchmarks/multimodal_bytedance/media/image17.jpg b/benchmarks/multimodal_bytedance/media/image17.jpg new file mode 100644 index 00000000..45132eb7 Binary files /dev/null and b/benchmarks/multimodal_bytedance/media/image17.jpg differ diff --git a/benchmarks/multimodal_bytedance/media/image18.png b/benchmarks/multimodal_bytedance/media/image18.png new file mode 100644 index 00000000..9bb28c26 Binary files /dev/null and b/benchmarks/multimodal_bytedance/media/image18.png differ diff --git a/benchmarks/multimodal_bytedance/media/image19.jpeg b/benchmarks/multimodal_bytedance/media/image19.jpeg new file mode 100644 index 00000000..daa47dc0 Binary files /dev/null and b/benchmarks/multimodal_bytedance/media/image19.jpeg differ diff --git a/benchmarks/multimodal_bytedance/media/image2.jpeg b/benchmarks/multimodal_bytedance/media/image2.jpeg new file mode 100644 index 00000000..6ddd7fed Binary files /dev/null and b/benchmarks/multimodal_bytedance/media/image2.jpeg differ diff --git a/benchmarks/multimodal_bytedance/media/image20.jpg b/benchmarks/multimodal_bytedance/media/image20.jpg new file mode 100644 index 00000000..254bc20c Binary files /dev/null and b/benchmarks/multimodal_bytedance/media/image20.jpg differ diff --git a/benchmarks/multimodal_bytedance/media/image21.jpg b/benchmarks/multimodal_bytedance/media/image21.jpg new file mode 100644 index 00000000..7757b55b Binary files /dev/null and b/benchmarks/multimodal_bytedance/media/image21.jpg differ diff --git a/benchmarks/multimodal_bytedance/media/image22.jpg b/benchmarks/multimodal_bytedance/media/image22.jpg new file mode 100644 index 00000000..4fa51277 Binary files /dev/null and b/benchmarks/multimodal_bytedance/media/image22.jpg differ diff --git a/benchmarks/multimodal_bytedance/media/image23.jpg b/benchmarks/multimodal_bytedance/media/image23.jpg new file mode 100644 index 00000000..0ef27b56 Binary files /dev/null and b/benchmarks/multimodal_bytedance/media/image23.jpg differ diff --git a/benchmarks/multimodal_bytedance/media/image3.jpeg b/benchmarks/multimodal_bytedance/media/image3.jpeg new file mode 100644 index 00000000..e8e54d9b Binary files /dev/null and b/benchmarks/multimodal_bytedance/media/image3.jpeg differ diff --git a/benchmarks/multimodal_bytedance/media/image4.jpg b/benchmarks/multimodal_bytedance/media/image4.jpg new file mode 100644 index 00000000..8aae1604 Binary files /dev/null and b/benchmarks/multimodal_bytedance/media/image4.jpg differ diff --git a/benchmarks/multimodal_bytedance/media/image5.png b/benchmarks/multimodal_bytedance/media/image5.png new file mode 100644 index 00000000..03e30b5f Binary files /dev/null and b/benchmarks/multimodal_bytedance/media/image5.png differ diff --git a/benchmarks/multimodal_bytedance/media/image6.jpeg b/benchmarks/multimodal_bytedance/media/image6.jpeg new file mode 100644 index 00000000..06438073 Binary files /dev/null and b/benchmarks/multimodal_bytedance/media/image6.jpeg differ diff --git a/benchmarks/multimodal_bytedance/media/image7.png b/benchmarks/multimodal_bytedance/media/image7.png new file mode 100644 index 00000000..c352dca6 Binary files /dev/null and b/benchmarks/multimodal_bytedance/media/image7.png differ diff --git a/benchmarks/multimodal_bytedance/media/image8.jpeg b/benchmarks/multimodal_bytedance/media/image8.jpeg new file mode 100644 index 00000000..d10d7e15 Binary files /dev/null and b/benchmarks/multimodal_bytedance/media/image8.jpeg differ diff --git a/benchmarks/multimodal_bytedance/media/image9.jpg b/benchmarks/multimodal_bytedance/media/image9.jpg new file mode 100644 index 00000000..367076ef Binary files /dev/null and b/benchmarks/multimodal_bytedance/media/image9.jpg differ diff --git a/benchmarks/multimodal_bytedance/multimodal-comparison.md b/benchmarks/multimodal_bytedance/multimodal-comparison.md new file mode 100644 index 00000000..745baaba --- /dev/null +++ b/benchmarks/multimodal_bytedance/multimodal-comparison.md @@ -0,0 +1,103 @@ +# Multimodal AI Models Comparison: ByteDance vs Imagen vs Kling + +## Introduction + +This repository presents a qualitative evaluation of leading multimodal AI models across text-to-image, text-to-video, and image-to-video generation tasks. After gaining early access to ByteDance's Seedream models, we conducted a comprehensive comparison against Google's Imagen/Veo systems and Kling AI to assess their relative capabilities in visual content generation. + +## Models Evaluated + +### ByteDance +- **Text-to-Image**: Seedream 3.0 (ep-20250819034957-6n8nr) - [Learn more](https://seed.bytedance.com/en/tech/seedream3_0) +- **Text-to-Video**: Seedance 1.0 Lite (ep-20250814014514-7bx9s) - [Learn more](https://seed.bytedance.com/en/seedance) +- **Image-to-Video**: Seedance 1.0 Lite i2v (ep-20250811221031-d6tz7) - [Learn more](https://seed.bytedance.com/en/seedance) + +### Google Imagen/Veo +- **Text-to-Image**: Imagen 4 (imagen-4.0-generate-001) - [Learn more](https://deepmind.google/models/imagen/) +- **Text-to-Video**: Veo 3 (veo-3.0-generate-001) - [Learn more](https://deepmind.google/models/veo/) +- **Image-to-Video**: Veo 3 Fast (veo-3.0-fast-generate-preview) on Google Vertex AI - [Learn more](https://deepmind.google/models/veo/) + +### Kling AI +- **Text-to-Image**: Kling AI Image 2.1 - [Learn more](https://app.klingai.com/global/) + +--- + +## Text-to-Image Generation + +### Prompt 1: Gothic Cathedral at Twilight +**Prompt**: *A dramatic wide-angle view of a gothic cathedral at twilight, its towering spires silhouetted against a deep violet sky with streaks of pink and gold from the setting sun. Stained-glass windows glow from interior candlelight, while intricate stone gargoyles cast long, ornate shadows across cobblestone piazzas. Moody ambiance, photorealistic, 8K detail, cinematic lighting.* + +| Kling | Imagen | ByteDance | +|-------|--------|-----------| +| ![Kling](media/image16.png) | ![Imagen](media/image2.jpeg) | ![ByteDance](media/image13.jpg) | + +### Prompt 2: Cyberpunk Rooftop Garden +**Prompt**: *A neon-soaked rooftop garden in a cyberpunk skyline: bioluminescent plants glowing azure, silver robotic bees hovering, holographic planters floating above sleek concrete planters. Background of towering skyscrapers with vibrant animated advertisements. Moody blue-purple palette, ultra-detailed, cinematic depth of field, photorealism.* + +| Kling | Imagen | ByteDance | +|-------|--------|-----------| +| ![Kling](media/image12.png) | ![Imagen](media/image3.jpeg) | ![ByteDance](media/image21.jpg) | + +### Prompt 3: Southeast Asian Night Market +**Prompt**: *Bustling Southeast Asian night market street food stall in rain: vendor preparing glowing skewers under tuk-tuk lamps, steam rising around neon signs, puddles reflecting vibrant lanterns, rain droplets frozen mid-air. Rich textures, cinematic framing, photorealistic, moody atmosphere.* + +| Kling | Imagen | ByteDance | +|-------|--------|-----------| +| ![Kling](media/image15.png) | ![Imagen](media/image8.jpeg) | ![ByteDance](media/image11.jpg) | + +### Prompt 4: Steampunk Airship Cabin +**Prompt**: *Lavish steampunk airship cabin: polished brass gauges, swirling gears, portholes showing drifting clouds, plush leather armchairs, warm amber lamplight, steam gently rising. Rugged yet elegant, hyper-real detail, cinematic lighting, ultra-high resolution.* + +| Kling | Imagen | ByteDance | +|-------|--------|-----------| +| ![Kling](media/image18.png) | ![Imagen](media/image6.jpeg) | ![ByteDance](media/image4.jpg) | + +### Prompt 5: Underwater Palace +**Prompt**: *Grand underwater palace hall in a lost city: towering coral-hued pillars encrusted with pearlescent barnacles, glowing bioluminescent whales swimming past stained-glass mosaics depicting ancient ocean myths, shafts of light filtering from the surface. Ethereal, dreamlike, photorealistic fantasy style.* + +| Kling | Imagen | ByteDance | +|-------|--------|-----------| +| ![Kling](media/image5.png) | ![Imagen](media/image19.jpeg) | ![ByteDance](media/image1.jpg) | + +### Prompt 6: Autumn Cobblestone Lane +**Prompt**: *Quiet cobblestone lane lined with maple trees in full autumn blaze: crimson, amber, and gold leaves creating a canopy. A vintage bicycle with wicker basket parked against an old stone wall, soft late-afternoon sunlight casting long golden shadows, ultra-detailed, tranquil mood.* + +| Kling | Imagen | ByteDance | +|-------|--------|-----------| +| ![Kling](media/image7.png) | ![Imagen](media/image10.jpeg) | ![ByteDance](media/image9.jpg) | + +--- + +## Text-to-Video Generation + +### Test Scenarios + +| Theme | Veo 3 | ByteDance Seedance | +|-------|-------|-------------------| +| **Desert Nomad at Sunrise**
*A desert nomad emerges from a dune as dawn breaks. Camera starts with a wide establishing shot of golden dunes, then slowly dollies forward to follow the lone figure in flowing robes.* | [View Video](https://drive.google.com/file/d/1ZEIcFwYUgiPn83VKD76x1zaGc3U7rAxk/view?usp=drive_link) | [View Video](https://drive.google.com/file/d/1arirdgiK-iqTRrXXDuhFU4qL9vTVDw3w/view?usp=drive_link) | +| **Rainy Cybercity Chase**
*High-speed hovercar chase down a rain-slick neon-lit cybercity street. Camera angles shift rapidly: low side-tracking, overhead drone shot, then panning as car weaves through traffic.* | [View Video](https://drive.google.com/file/d/1KG_QRyNPppVt2pXUhvBUzndvBqtywTA8/view?usp=drive_link) | [View Video](https://drive.google.com/file/d/16b446Rbq0J3GxOsm8unyC-hUCvsxh1KF/view?usp=drive_link) | +| **Forest Stream Awakening**
*A forest stream comes to life at dawn. Close-up of dew-laden moss and ferns as the camera slowly pans upward to reveal a gentle stream flowing past ancient trees.* | [View Video](https://drive.google.com/file/d/12Qr6uUzraUlHawJ7vBWsAUJTQ_NkygPC/view?usp=drive_link) | [View Video](https://drive.google.com/file/d/1hQISUfl3KRhaX-sFewcPNzGnXrcc9KJ0/view?usp=drive_link) | +| **Coastal Storm Seaspray**
*A stormy coastline with waves crashing on jagged rocks. Start with wide shot of brooding grey sky and turbulent sea, then slow zoom to focus on droplets frozen mid-air in spray.* | [View Video](https://drive.google.com/file/d/17lN4pt8Y5uK5lvP36zcswQfRJP9qkCij/view?usp=drive_link) | [View Video](https://drive.google.com/file/d/1QuoptE7fCmBz3kuU6YaMDM6C4Ib5Qg7Z/view?usp=drive_link) | +| **Festival Lantern Release**
*Nighttime river festival where hundreds of glowing paper lanterns float upward into the starry sky. Camera pans up and sweeps across lanterns rising in unison.* | [View Video](https://drive.google.com/file/d/1UuCmtg7ti56V1cOAgnTM4PCKAP7FIxwv/view?usp=drive_link) | [View Video](https://drive.google.com/file/d/1SpFFy45D6dHjQUHYZlvVERZY95zk1249/view?usp=drive_link) | +| **Mechanical Clockwork Assembly**
*Inside a massive clockwork factory: camera tracks through rows of brass gears and pistons moving rhythmically. Steam hisses and small sparks fly as robotic arms assemble intricate mechanical components.* | - | [View Video](https://drive.google.com/file/d/1RDU1Yn8BgbM_nE7ObncG9cpARTUSBonW/view?usp=drive_link) | +| **City Ballet at Twilight**
*A lone ballerina dances on an empty city rooftop at twilight. Camera starts wide to show skyline, then dollies forward as she fluidly spins and leaps.* | - | [View Video](https://drive.google.com/file/d/1gnIuy2tVfNc5pUQiUBui9Uzo5p0z9hkC/view?usp=drive_link) | +| **Mountain Waterfall Panorama**
*Panorama of a majestic waterfall cascading down rocky cliffs into a mist-filled gorge. Camera arcs from high above then swoops down closer to water.* | - | [View Video](https://drive.google.com/file/d/14QApR_p9rAXkkhYbe5l0h4pBfcEbgmOn/view?usp=drive_link) | +| **Alien Biome Exploration**
*An explorer steps onto an alien forest planet. Camera first shows wide exotic canopy, then dolly in on explorer touching glowing foliage.* | - | [View Video](https://drive.google.com/file/d/1G5mm9XfZXZ8c2oWd0fCcexmvIRsWlDHO/view?usp=drive_link) | +| **Historical Market Morning**
*A bustling medieval market at dawn. Camera meanders through stalls with merchants arranging wares, townsfolk greeting each other.* | - | [View Video](https://drive.google.com/file/d/1L2-NEt5RVVwC2Hdys8bagXNPpeys91rX/view?usp=drive_link) | + +--- + +## Image-to-Video Generation + +| Input Image | Veo 3 Output | ByteDance Seedance Output | +|-------------|--------------|---------------------------| +| ![Highway](media/image23.jpg) | [View Video](https://drive.google.com/file/d/1G-TnTdxwzJiTn3_56QGnzrDN0d_rfEf6/view?usp=drive_link) | [View Video](https://drive.google.com/file/d/1CsEL6huAhRhRgfII_hU9rrxiWzFOZVvy/view?usp=drive_link) | +| ![Waterfall](media/image22.jpg) | [View Video](https://drive.google.com/file/d/1da9cRYDvMfBlW0EabUI2xgFYoouphb77/view?usp=drive_link) | [View Video](https://drive.google.com/file/d/1SsrSA-Q0575xnrGVCZVdFi36j1e4n0qC/view?usp=drive_link) | +| ![Horse](media/image14.jpg) | [View Video](https://drive.google.com/file/d/1o9U0BizcD3FmuaIeUnOy8mJSeYj2b5l1/view?usp=drive_link) | [View Video](https://drive.google.com/file/d/1U037F1mQKPoTy3GmYPdvQV8kx_kMeayp/view?usp=drive_link) | +| ![Dancer](media/image17.jpg) | [View Video](https://drive.google.com/file/d/1yzjtbiMnT-XRKHD0E13nBJnAe4syXE8h/view?usp=drive_link) | [View Video](https://drive.google.com/file/d/1VLAM-DEV7K6zWHdF1EtJ4LNlNzCgnoFN/view?usp=drive_link) | +| ![Train](media/image20.jpg) | [View Video](https://drive.google.com/file/d/1akkCLqYz8MluI5Q4rK1RzH9f1SiSY-K-/view?usp=drive_link) | [View Video](https://drive.google.com/file/d/1pJgo5o254S32wnwYAX0WwVPTgVV3iHFU/view?usp=drive_link) | + +--- + +## License + +This comparison is provided for educational and research purposes. All model outputs are subject to their respective providers' terms of service and usage policies. \ No newline at end of file