Skip to content
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Binary file added benchmarks/multimodal_bytedance/media/image1.jpg
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Binary file added benchmarks/multimodal_bytedance/media/image12.png
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Binary file added benchmarks/multimodal_bytedance/media/image15.png
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Binary file added benchmarks/multimodal_bytedance/media/image16.png
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Binary file added benchmarks/multimodal_bytedance/media/image17.jpg
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Binary file added benchmarks/multimodal_bytedance/media/image18.png
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Binary file added benchmarks/multimodal_bytedance/media/image23.jpg
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Binary file added benchmarks/multimodal_bytedance/media/image4.jpg
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Binary file added benchmarks/multimodal_bytedance/media/image5.png
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Binary file added benchmarks/multimodal_bytedance/media/image7.png
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Binary file added benchmarks/multimodal_bytedance/media/image9.jpg
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
103 changes: 103 additions & 0 deletions benchmarks/multimodal_bytedance/multimodal-comparison.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,103 @@
# Multimodal AI Models Comparison: ByteDance vs Imagen vs Kling

## Introduction

This repository presents a qualitative evaluation of leading multimodal AI models across text-to-image, text-to-video, and image-to-video generation tasks. After gaining early access to ByteDance's Seedream models, we conducted a comprehensive comparison against Google's Imagen/Veo systems and Kling AI to assess their relative capabilities in visual content generation.

## Models Evaluated

### ByteDance
- **Text-to-Image**: Seedream 3.0 (ep-20250819034957-6n8nr) - [Learn more](https://seed.bytedance.com/en/tech/seedream3_0)
- **Text-to-Video**: Seedance 1.0 Lite (ep-20250814014514-7bx9s) - [Learn more](https://seed.bytedance.com/en/seedance)
- **Image-to-Video**: Seedance 1.0 Lite i2v (ep-20250811221031-d6tz7) - [Learn more](https://seed.bytedance.com/en/seedance)

### Google Imagen/Veo
- **Text-to-Image**: Imagen 4 (imagen-4.0-generate-001) - [Learn more](https://deepmind.google/models/imagen/)
- **Text-to-Video**: Veo 3 (veo-3.0-generate-001) - [Learn more](https://deepmind.google/models/veo/)
- **Image-to-Video**: Veo 3 Fast (veo-3.0-fast-generate-preview) on Google Vertex AI - [Learn more](https://deepmind.google/models/veo/)

### Kling AI
- **Text-to-Image**: Kling AI Image 2.1 - [Learn more](https://app.klingai.com/global/)

---

## Text-to-Image Generation

### Prompt 1: Gothic Cathedral at Twilight
**Prompt**: *A dramatic wide-angle view of a gothic cathedral at twilight, its towering spires silhouetted against a deep violet sky with streaks of pink and gold from the setting sun. Stained-glass windows glow from interior candlelight, while intricate stone gargoyles cast long, ornate shadows across cobblestone piazzas. Moody ambiance, photorealistic, 8K detail, cinematic lighting.*

| Kling | Imagen | ByteDance |
|-------|--------|-----------|
| ![Kling](media/image16.png) | ![Imagen](media/image2.jpeg) | ![ByteDance](media/image13.jpg) |

### Prompt 2: Cyberpunk Rooftop Garden
**Prompt**: *A neon-soaked rooftop garden in a cyberpunk skyline: bioluminescent plants glowing azure, silver robotic bees hovering, holographic planters floating above sleek concrete planters. Background of towering skyscrapers with vibrant animated advertisements. Moody blue-purple palette, ultra-detailed, cinematic depth of field, photorealism.*

| Kling | Imagen | ByteDance |
|-------|--------|-----------|
| ![Kling](media/image12.png) | ![Imagen](media/image3.jpeg) | ![ByteDance](media/image21.jpg) |

### Prompt 3: Southeast Asian Night Market
**Prompt**: *Bustling Southeast Asian night market street food stall in rain: vendor preparing glowing skewers under tuk-tuk lamps, steam rising around neon signs, puddles reflecting vibrant lanterns, rain droplets frozen mid-air. Rich textures, cinematic framing, photorealistic, moody atmosphere.*

| Kling | Imagen | ByteDance |
|-------|--------|-----------|
| ![Kling](media/image15.png) | ![Imagen](media/image8.jpeg) | ![ByteDance](media/image11.jpg) |

### Prompt 4: Steampunk Airship Cabin
**Prompt**: *Lavish steampunk airship cabin: polished brass gauges, swirling gears, portholes showing drifting clouds, plush leather armchairs, warm amber lamplight, steam gently rising. Rugged yet elegant, hyper-real detail, cinematic lighting, ultra-high resolution.*

| Kling | Imagen | ByteDance |
|-------|--------|-----------|
| ![Kling](media/image18.png) | ![Imagen](media/image6.jpeg) | ![ByteDance](media/image4.jpg) |

### Prompt 5: Underwater Palace
**Prompt**: *Grand underwater palace hall in a lost city: towering coral-hued pillars encrusted with pearlescent barnacles, glowing bioluminescent whales swimming past stained-glass mosaics depicting ancient ocean myths, shafts of light filtering from the surface. Ethereal, dreamlike, photorealistic fantasy style.*

| Kling | Imagen | ByteDance |
|-------|--------|-----------|
| ![Kling](media/image5.png) | ![Imagen](media/image19.jpeg) | ![ByteDance](media/image1.jpg) |

### Prompt 6: Autumn Cobblestone Lane
**Prompt**: *Quiet cobblestone lane lined with maple trees in full autumn blaze: crimson, amber, and gold leaves creating a canopy. A vintage bicycle with wicker basket parked against an old stone wall, soft late-afternoon sunlight casting long golden shadows, ultra-detailed, tranquil mood.*

| Kling | Imagen | ByteDance |
|-------|--------|-----------|
| ![Kling](media/image7.png) | ![Imagen](media/image10.jpeg) | ![ByteDance](media/image9.jpg) |

---

## Text-to-Video Generation

### Test Scenarios

| Theme | Veo 3 | ByteDance Seedance |
|-------|-------|-------------------|
| **Desert Nomad at Sunrise**<br>*A desert nomad emerges from a dune as dawn breaks. Camera starts with a wide establishing shot of golden dunes, then slowly dollies forward to follow the lone figure in flowing robes.* | [View Video](https://drive.google.com/file/d/1ZEIcFwYUgiPn83VKD76x1zaGc3U7rAxk/view?usp=drive_link) | [View Video](https://drive.google.com/file/d/1arirdgiK-iqTRrXXDuhFU4qL9vTVDw3w/view?usp=drive_link) |
| **Rainy Cybercity Chase**<br>*High-speed hovercar chase down a rain-slick neon-lit cybercity street. Camera angles shift rapidly: low side-tracking, overhead drone shot, then panning as car weaves through traffic.* | [View Video](https://drive.google.com/file/d/1KG_QRyNPppVt2pXUhvBUzndvBqtywTA8/view?usp=drive_link) | [View Video](https://drive.google.com/file/d/16b446Rbq0J3GxOsm8unyC-hUCvsxh1KF/view?usp=drive_link) |
| **Forest Stream Awakening**<br>*A forest stream comes to life at dawn. Close-up of dew-laden moss and ferns as the camera slowly pans upward to reveal a gentle stream flowing past ancient trees.* | [View Video](https://drive.google.com/file/d/12Qr6uUzraUlHawJ7vBWsAUJTQ_NkygPC/view?usp=drive_link) | [View Video](https://drive.google.com/file/d/1hQISUfl3KRhaX-sFewcPNzGnXrcc9KJ0/view?usp=drive_link) |
| **Coastal Storm Seaspray**<br>*A stormy coastline with waves crashing on jagged rocks. Start with wide shot of brooding grey sky and turbulent sea, then slow zoom to focus on droplets frozen mid-air in spray.* | [View Video](https://drive.google.com/file/d/17lN4pt8Y5uK5lvP36zcswQfRJP9qkCij/view?usp=drive_link) | [View Video](https://drive.google.com/file/d/1QuoptE7fCmBz3kuU6YaMDM6C4Ib5Qg7Z/view?usp=drive_link) |
| **Festival Lantern Release**<br>*Nighttime river festival where hundreds of glowing paper lanterns float upward into the starry sky. Camera pans up and sweeps across lanterns rising in unison.* | [View Video](https://drive.google.com/file/d/1UuCmtg7ti56V1cOAgnTM4PCKAP7FIxwv/view?usp=drive_link) | [View Video](https://drive.google.com/file/d/1SpFFy45D6dHjQUHYZlvVERZY95zk1249/view?usp=drive_link) |
| **Mechanical Clockwork Assembly**<br>*Inside a massive clockwork factory: camera tracks through rows of brass gears and pistons moving rhythmically. Steam hisses and small sparks fly as robotic arms assemble intricate mechanical components.* | - | [View Video](https://drive.google.com/file/d/1RDU1Yn8BgbM_nE7ObncG9cpARTUSBonW/view?usp=drive_link) |
| **City Ballet at Twilight**<br>*A lone ballerina dances on an empty city rooftop at twilight. Camera starts wide to show skyline, then dollies forward as she fluidly spins and leaps.* | - | [View Video](https://drive.google.com/file/d/1gnIuy2tVfNc5pUQiUBui9Uzo5p0z9hkC/view?usp=drive_link) |
| **Mountain Waterfall Panorama**<br>*Panorama of a majestic waterfall cascading down rocky cliffs into a mist-filled gorge. Camera arcs from high above then swoops down closer to water.* | - | [View Video](https://drive.google.com/file/d/14QApR_p9rAXkkhYbe5l0h4pBfcEbgmOn/view?usp=drive_link) |
| **Alien Biome Exploration**<br>*An explorer steps onto an alien forest planet. Camera first shows wide exotic canopy, then dolly in on explorer touching glowing foliage.* | - | [View Video](https://drive.google.com/file/d/1G5mm9XfZXZ8c2oWd0fCcexmvIRsWlDHO/view?usp=drive_link) |
| **Historical Market Morning**<br>*A bustling medieval market at dawn. Camera meanders through stalls with merchants arranging wares, townsfolk greeting each other.* | - | [View Video](https://drive.google.com/file/d/1L2-NEt5RVVwC2Hdys8bagXNPpeys91rX/view?usp=drive_link) |

---

## Image-to-Video Generation

| Input Image | Veo 3 Output | ByteDance Seedance Output |
|-------------|--------------|---------------------------|
| ![Highway](media/image23.jpg) | [View Video](https://drive.google.com/file/d/1G-TnTdxwzJiTn3_56QGnzrDN0d_rfEf6/view?usp=drive_link) | [View Video](https://drive.google.com/file/d/1CsEL6huAhRhRgfII_hU9rrxiWzFOZVvy/view?usp=drive_link) |
| ![Waterfall](media/image22.jpg) | [View Video](https://drive.google.com/file/d/1da9cRYDvMfBlW0EabUI2xgFYoouphb77/view?usp=drive_link) | [View Video](https://drive.google.com/file/d/1SsrSA-Q0575xnrGVCZVdFi36j1e4n0qC/view?usp=drive_link) |
| ![Horse](media/image14.jpg) | [View Video](https://drive.google.com/file/d/1o9U0BizcD3FmuaIeUnOy8mJSeYj2b5l1/view?usp=drive_link) | [View Video](https://drive.google.com/file/d/1U037F1mQKPoTy3GmYPdvQV8kx_kMeayp/view?usp=drive_link) |
| ![Dancer](media/image17.jpg) | [View Video](https://drive.google.com/file/d/1yzjtbiMnT-XRKHD0E13nBJnAe4syXE8h/view?usp=drive_link) | [View Video](https://drive.google.com/file/d/1VLAM-DEV7K6zWHdF1EtJ4LNlNzCgnoFN/view?usp=drive_link) |
| ![Train](media/image20.jpg) | [View Video](https://drive.google.com/file/d/1akkCLqYz8MluI5Q4rK1RzH9f1SiSY-K-/view?usp=drive_link) | [View Video](https://drive.google.com/file/d/1pJgo5o254S32wnwYAX0WwVPTgVV3iHFU/view?usp=drive_link) |

---

## License

This comparison is provided for educational and research purposes. All model outputs are subject to their respective providers' terms of service and usage policies.