Image Loop Generator

A CLI tool that creates animations by iteratively transforming an image with AI models via OpenRouter. Give it an image and a "mode" (a preset prompt), and it runs multiple passes of image generation, feeding each output back in as the next input. The result is a sequence of progressively transformed frames, compiled into a video or GIF.

Usage:

python src/image_loop.py --image photo.jpg --mode evolve --frames 10 --model nano-banana

I developed this as a way to research and compare different image models and to identify their biases and limitations. As images are progressively transformed, you can observe preferences in race, culture, and politics, discover guardrails, and watch artifacts and distortions emerge.

The project was inspired by nano-banana-loop, but uses OpenRouter (instead of fal.ai) to access a variety of image generation models, and adds a gallery, new modes, and other features.

Flow

Standard mode:

  1. Input: provide an image and a preset mode (or custom prompt)
  2. Iteration: the image is passed to the model with the transformation prompt
  3. Feedback Loop: Each generated frame becomes the input for the next frame
  4. Output: The sequence of frames is compiled into a video or GIF
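
In code, the feedback loop boils down to a few lines. Here is a minimal sketch; the generate callable stands in for the model call, and the names are placeholders rather than the actual imageloop API:

from pathlib import Path
from typing import Callable

def run_loop(
    input_image: Path,
    prompt: str,
    frames: int,
    out_dir: Path,
    generate: Callable[[Path, str], bytes],  # (image, prompt) -> new image bytes; placeholder
) -> list[Path]:
    """Each generated frame becomes the input for the next one."""
    out_dir.mkdir(parents=True, exist_ok=True)
    current = input_image
    outputs = [current]
    for i in range(1, frames + 1):
        image_bytes = generate(current, prompt)      # transform the previous frame
        frame_path = out_dir / f"frame_{i:03d}.png"
        frame_path.write_bytes(image_bytes)
        outputs.append(frame_path)
        current = frame_path                         # feed the output back in
    return outputs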

Prompt loop mode (alternative):

  1. Input: provide an image
  2. Describe: a vision model describes the image as text
  3. Render: the text description is rendered as a new image (no image reference)
  4. Feedback Loop: The rendered image is described and re-rendered
  5. Output: Frames and descriptions compiled into video/GIF

Each run creates a timestamped directory with all frames, metadata, and generated animations. The gallery displays all runs in a web interface.

Gallery

This project includes a simple web gallery to view all generated runs:

# Start the gallery server (default port 8080)
python src/gallery.py

# Custom port and output directory
python src/gallery.py --port 3000 --output-dir /path/to/output

Note: With uv, use uv run src/gallery.py instead of python src/gallery.py.

The gallery displays:

  • Run cards with thumbnails of first and last frames
  • Filtering by model and mode
  • Modal viewer with frame-by-frame navigation
  • Statistics including cost, time, and frame details
  • Playback controls for animating through frames

Open http://localhost:8080 in your browser to view the gallery.

Quick Start

Requirements

  • Python 3.11+
  • uv (recommended) or pip with virtual environment
  • ffmpeg (for video generation)
  • OpenRouter API key

Setup

  1. Clone this repository
  2. Install dependencies (choose one method below)
  3. Add your OpenRouter API key to secrets.yaml:
    openrouter_api_key: sk-or-v1-your-key-here
    Or set the OPENROUTER_API_KEY environment variable.
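
If you are curious how the key is picked up, here is a rough sketch of the lookup, assuming the environment variable is checked before secrets.yaml (see imageloop/storage.py, which handles API key management, for the actual behavior):

import os
from pathlib import Path

import yaml  # PyYAML

def load_api_key(secrets_path: str = "secrets.yaml") -> str:
    # Assumed precedence: environment variable first, then secrets.yaml.
    key = os.environ.get("OPENROUTER_API_KEY")
    if not key and Path(secrets_path).exists():
        key = yaml.safe_load(Path(secrets_path).read_text()).get("openrouter_api_key")
    if not key:
        raise RuntimeError("No OpenRouter API key found")
    return key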

Installation Methods

Option 1: Using uv (Recommended)

uv automatically manages dependencies and Python versions:

# Install uv (if not already installed)
curl -LsSf https://astral.sh/uv/install.sh | sh

# No additional setup needed - dependencies are managed automatically
# Use "uv run" instead of "python" for all commands (e.g., "uv run src/image_loop.py")

Option 2: Using pip with virtual environment

# Create a virtual environment
python3 -m venv venv

# Activate it
# On macOS/Linux:
source venv/bin/activate
# On Windows:
# venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt

Note: The gallery (src/gallery.py) can run standalone but still requires dependencies for parsing run metadata. All scripts share the same dependency set.

Basic Usage

# Transform an image with a preset mode
python src/image_loop.py --image photo.jpg --mode evolve --frames 10

# Use a custom prompt
python src/image_loop.py --image photo.jpg --mode custom --prompt "Age this person by 5 years"

# Specify model and output size
python src/image_loop.py --image photo.jpg --mode album-cover --model flux-pro --size square

# Continue an existing run with more frames
python src/image_loop.py --continue output/run_flux-pro_evolve_1218_1234_abcd --frames 10

# List available options
python src/image_loop.py --list-modes
python src/image_loop.py --list-models

Note: Make sure your virtual environment is activated when using standard Python. With uv, use uv run instead of python (e.g., uv run src/image_loop.py).

Examples

Here are a few samples. See the examples directory for more.

East Village → Bizarre

Transform a street scene by progressively adding unexpected elements.

python src/image_loop.py --image east-village.jpg --mode bizarre --model flux-pro --frames 15

https://github.com/somebox/ai-feedback-loops/raw/refs/heads/main/examples/east-village-bizarre.mp4


Cats → Political Right

Push an image toward a political aesthetic (using the Riverflow model).

python src/image_loop.py --image cats.png --mode politic-right --model riverflow --frames 10 --size square

https://github.com/somebox/ai-feedback-loops/raw/refs/heads/main/examples/rightwing-cats.mp4


Prompt Loop Mode

Prompt loop is an alternative generation mode that creates a "telephone game" effect with images. Instead of passing the image directly to the model with a transformation prompt, it:

  1. Describes the current image using a vision model (image → text)
  2. Renders a new image from that description alone (text → image)
  3. Repeats with the rendered image as the new input

This creates drift and transformation through the lossy process of describing and re-rendering, rather than through explicit prompts.
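
As a sketch, the describe/render cycle looks like the following; describe and render stand in for the vision and image-generation calls and are hypothetical names, not the project's API:

from pathlib import Path
from typing import Callable

def prompt_loop(
    input_image: Path,
    frames: int,
    out_dir: Path,
    describe: Callable[[Path], str],   # image -> text (vision model); placeholder
    render: Callable[[str], bytes],    # text -> image, no image reference; placeholder
) -> None:
    (out_dir / "images").mkdir(parents=True, exist_ok=True)
    (out_dir / "descriptions").mkdir(parents=True, exist_ok=True)
    current = input_image
    for i in range(1, frames + 1):
        text = describe(current)                                     # describe the current frame
        (out_dir / "descriptions" / f"frame_{i:03d}.txt").write_text(text)
        frame_path = out_dir / "images" / f"frame_{i:03d}.png"
        frame_path.write_bytes(render(text))                         # render from the text alone
        current = frame_path                                         # feed the rendered image back in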

# Basic prompt loop
python src/image_loop.py --image photo.jpg --prompt-loop --frames 10

# With a specific describe style
python src/image_loop.py --image photo.jpg --prompt-loop --describe-mode artistic --frames 10

# Custom description prompt
python src/image_loop.py --image photo.jpg --prompt-loop --describe-mode custom \
  --describe-prompt "Describe only the colors and shapes" --frames 5

Prompt Loop Options

  • --prompt-loop: Enable prompt loop mode
  • --describe-mode: How to describe images: detailed (default), artistic, simple, technical, narrative, emotional, or custom
  • --describe-prompt: Custom describe prompt (required when describe-mode is custom)

Describe Modes

  • detailed: Full description including subject, composition, colors, lighting, mood
  • artistic: Art director perspective focusing on style and technique
  • simple: Brief, minimal description
  • technical: Technical analysis of visual elements
  • narrative: Story-telling perspective
  • emotional: Focus on mood and atmosphere

Output Structure

Prompt loop runs include a descriptions/ folder with the text generated for each frame:

output/run_nano-banana_promptloop_0124_1530_abc1/
├── images/
│   ├── frame_000.png  (input)
│   ├── frame_001.png  (rendered from description)
│   └── ...
├── descriptions/
│   ├── frame_001.txt  (description of frame_000)
│   ├── frame_002.txt  (description of frame_001)
│   └── ...
├── run.json
└── animation.mp4

Note: Prompt loop requires a multimodal model that supports both vision (image→text) and image generation (text→image). The default is nano-banana (Gemini). Models like flux-pro only support image generation and won't work with prompt loop.


Command-Line Options

Core Options

  • --image, -i: Input image path (required for new runs)
  • --mode, -m: Transformation mode (see Available Modes) or custom
  • --prompt, -p: Custom prompt (required when mode is custom)
  • --prompt-loop: Enable prompt loop mode (see Prompt Loop Mode)
  • --frames, -n: Number of frames to generate (default: 10)
  • --model: Model to use (default: flux-pro, see Available Models)
  • --size, -s: Output size: auto, preserve, custom, or preset (default: auto, see Output Sizes)
  • --continue, -c: Continue from an existing run directory
  • --output, -o: Output directory (default: output)

Advanced Options

  • --temperature, -t: Generation temperature 0.0-2.0 (default: 0.7, Gemini models only)
  • --top-p: Top-p sampling 0.0-1.0 (default: 0.9, Gemini models only)
  • --seed: Random seed for reproducibility (Flux and Gemini models)
  • --fps: Video/GIF frame rate (default: 1)
  • --format, -f: Output format: mp4, gif, or both (default: mp4)
  • --verbose, -v: Show detailed API responses
  • --list-modes: List all available transformation modes
  • --list-models: List available image generation models from OpenRouter
  • --describe-mode: Prompt loop: how to describe images (default: detailed)
  • --describe-prompt: Prompt loop: custom describe prompt

Note: Parameter support varies by model. Use --list-models to see which parameters each model supports.

Available Modes

Camera movements: up, down, left, right, rotate-left, rotate-right, zoom-in, zoom-out

Time: future, past, next

Style: dramatic, peaceful, powerful, vintage, futuristic, minimalist, wes-anderson, album-cover

Modifications: funny, bizarre, highlight, corrections, realistic, graffiti, improve

Scene: nature, urban, crowded, empty

Other: evolve, cooler, sexy, makeup, politic-left, politic-right, opposite

Use --mode custom --prompt "your prompt" for custom transformations.

Available Models

Run --list-models to fetch available image generation models from OpenRouter with current pricing:

python src/image_loop.py --list-models

This command shows:

  • All available image models from OpenRouter with pricing
  • Verification status (✓/✗) for configured shortcuts
  • Whether each shortcut points to a valid model

Configured shortcuts:

  • flux-pro: black-forest-labs/flux.2-pro
  • seedream: bytedance-seed/seedream-4.5
  • nano-banana: google/gemini-2.5-flash-image
  • nano-banana-pro: google/gemini-3-pro-image-preview
  • riverflow: sourceful/riverflow-v2-standard-preview

You can use any full OpenRouter model ID directly with --model.

Note: If a model disappears or changes, you'll see an error like ❌ API Error (400): Model not found during generation. Update settings.yaml with a valid model ID, or use --list-models to find available alternatives.

Output Sizes

  • auto (default): varies; picks the closest preset to the input aspect ratio
  • preserve: varies; scales to fit max 1280px, preserves exact aspect ratio
  • custom: explicit dimensions via --width and --height (requires both flags)
  • landscape: 1024×768 (4:3 aspect ratio)
  • square: 1024×1024 (1:1 aspect ratio)
  • portrait: 768×1024 (3:4 aspect ratio)
  • wide: 1280×720 (16:9 aspect ratio)
  • tall: 720×1280 (9:16 aspect ratio)

The tool warns when significant cropping will occur due to aspect ratio mismatch.
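
One plausible way to implement the auto preset selection (a sketch; the real logic lives in imageloop/sizing.py and may differ) is to compare aspect ratios:

# Preset dimensions copied from the table above; the matching metric is an assumption.
PRESETS = {
    "landscape": (1024, 768),
    "square": (1024, 1024),
    "portrait": (768, 1024),
    "wide": (1280, 720),
    "tall": (720, 1280),
}

def closest_preset(input_width: int, input_height: int) -> str:
    target = input_width / input_height
    return min(PRESETS, key=lambda name: abs(PRESETS[name][0] / PRESETS[name][1] - target))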

Output Structure

Each run creates a timestamped directory:

output/run_flux-pro_evolve_1218_1234_abcd/
├── images/
│   ├── frame_000.png  (initial image)
│   ├── frame_001.png
│   ├── frame_002.png
│   └── ...
├── animation.mp4      (when --format mp4 or both)
├── animation.gif      (when --format gif or both)
└── run.json

The run.json file contains comprehensive logging:

  • Summary: Quick overview with status, total cost, and time
  • Config: All generation parameters (model, prompt, size, etc.)
  • Stats: Cumulative statistics across all sessions
  • Sessions: History of generation runs including continuations
  • Frames: Per-frame details with timing, file sizes, token usage, and API responses

Example run.json structure:

{
  "summary": {
    "created": "2026-01-15T09:12:14",
    "model": "google/gemini-2.5-flash-image",
    "mode": "future",
    "total_frames": 10,
    "total_cost": "$0.39",
    "total_time": "147.3s",
    "status": "completed"
  },
  "config": { ... },
  "stats": { ... },
  "sessions": [ ... ],
  "frames": [ ... ]
}
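
Because run.json is plain JSON with the structure above, runs are easy to inspect programmatically. For example, to print a one-line summary of every run:

import json
from pathlib import Path

# Walk every run directory under output/ and report its summary block.
for run_json in sorted(Path("output").glob("run_*/run.json")):
    summary = json.loads(run_json.read_text())["summary"]
    print(f"{run_json.parent.name}: {summary['model']} / {summary['mode']}, "
          f"{summary['total_frames']} frames, {summary['total_cost']}, {summary['status']}")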

Additional Tools

Text-to-Image Generation

Generate a single image from text (without the loop):

python src/generate_from_text.py "A futuristic cityscape at sunset" --model flux-pro --output city.png

Outputs from text-to-image generation will also be displayed in the gallery.

Collage Generator

Generate a grid collage from a completed run:

# 3x3 collage (default medium size: 1600x1200)
python src/collage.py output/run_flux-pro_evolve_1218_1234_abcd --grid 3x3

# 4x4 large collage
python src/collage.py output/run_flux-pro_evolve_1218_1234_abcd --grid 4x4 --size large

# Custom output path
python src/collage.py output/run_flux-pro_evolve_1218_1234_abcd --grid 3x3 -o my_collage.png

  • --grid, -g: Grid size (e.g., 3x3, 4x4, 5x3)
  • --size, -s: Output size: small (800x600), medium (1600x1200), large (2400x1800)
  • --output, -o: Output file path (default: collage_NxM.png in run folder)

The collage evenly distributes frames across the grid, always including the first and last frame.
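
A simple way to pick which frames go into the grid (a sketch; src/collage.py may select differently) is to space the indices evenly between the first and last frame:

def pick_frame_indices(total_frames: int, cells: int) -> list[int]:
    """Evenly spaced frame indices, always including the first and last frame."""
    if cells >= total_frames:
        return list(range(total_frames))
    step = (total_frames - 1) / (cells - 1)
    return [round(i * step) for i in range(cells)]

# e.g. 16 frames into a 3x3 grid -> [0, 2, 4, 6, 8, 9, 11, 13, 15]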

Configuration

Settings are managed in settings.yaml:

  • Models: Model shortcuts and full IDs
  • Prompts: Transformation mode prompts
  • Defaults: Default model, frame count, etc.
  • Sizes: Size preset definitions
  • API: Timeout and other API settings

You can modify settings.yaml to add new modes, change defaults, or configure additional models.

Development

Project Structure

src/
├── image_loop.py          # Main CLI entry point
├── gallery.py             # Web gallery server
├── generate_from_text.py  # Text-to-image tool
├── collage.py             # Collage generator
└── imageloop/             # Core package
    ├── api.py             # OpenRouter API client
    ├── cli.py             # CLI argument parsing and commands
    ├── job.py             # Frame management and output generation
    ├── runlog.py          # Run logging and persistence
    ├── settings.py        # Settings loading and resolution
    ├── sizing.py          # Image sizing and aspect ratio handling
    └── storage.py         # Image I/O and API key management
tests/                      # Pytest test suite
settings.yaml              # Configuration file
secrets.yaml               # API keys (git-ignored)
output/                    # Generated runs (git-ignored)

Running Tests

# Run all tests
pytest

# Run with verbose output
pytest -v

# Run only tests that don't require API calls
pytest -m "not live_api"
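
Tests that make real OpenRouter calls carry the live_api marker, which is why the filter above excludes them. A new test that needs a live key would be marked roughly like this (hypothetical test name and body):

import pytest

@pytest.mark.live_api
def test_generates_a_frame(tmp_path):
    # ...call the API and assert that a frame file was written (placeholder)...
    ...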

Code Organization

The codebase is organized into focused modules:

  • imageloop.api: Handles all OpenRouter API interactions
  • imageloop.cli: Command-line interface and orchestration
  • imageloop.job: Frame finding and video/GIF generation
  • imageloop.runlog: Run state persistence and reporting
  • imageloop.settings: Configuration loading and resolution
  • imageloop.sizing: Image dimension calculations and resizing
  • imageloop.storage: Image file I/O and data URI conversion

Contributing

PRs are welcome. The codebase uses inline script dependencies (PEP 723), which work with uv, but a requirements.txt is also provided for standard pip workflows.
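
In practice that means each entry-point script carries a PEP 723 metadata block near the top, which uv reads when you run uv run src/image_loop.py. The dependency names below are illustrative, not the project's exact set:

# /// script
# requires-python = ">=3.11"
# dependencies = ["requests", "pyyaml", "pillow"]
# ///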

When contributing:

  • Follow the existing modular structure
  • Add tests for new functionality
  • Update settings.yaml if adding new modes or models
  • Keep the README up to date with any new features
