@nmdimas commented Sep 22, 2025

Add support for Gemini 2.5 Flash Image generation via LiteLLM Proxy

🚀 Description

This PR adds support for Gemini 2.5 Flash Image (Nano Banana) image generation through a LiteLLM Proxy. Images can be generated as part of a regular chat completion, so a single response message can now contain both text and images.

🎯 Motivation

  • Keep PHP ecosystem competitive: Python libraries already support this functionality, and PHP shouldn't lag behind
  • Minimal changes, maximum impact: This implementation requires minimal code changes while unlocking powerful new capabilities
  • Future-ready: Gemini 2.5 Flash Image's native image generation represents the next evolution in multimodal AI interactions
  • Developer demand: Growing need for seamless image generation within chat workflows

📋 Changes Made

✅ Core Features

  • Added CreateResponseChoiceImage typed class following project patterns
  • Extended ChatCompletionResponseMessage with images property
  • Implemented proper type safety with scalar typing
  • Added comprehensive PHPStan type annotations
  • Maintained backward compatibility with existing chat functionality

🔧 Technical Implementation

  • New CreateResponseChoiceImage class following FunctionCall/ChoiceAudio pattern
  • Type-safe image handling with proper scalar typing enforcement
  • ArrayAccessible trait for backward compatibility
  • Fakeable trait for comprehensive testing support
  • PHPStan level 9 compliant type definitions

📚 Documentation

  • Added code examples for typed image generation usage
  • Updated README with a Gemini 2.5 Flash Image integration guide
  • Added comprehensive inline documentation and PHPStan types

🎨 Usage Example

use OpenAI;

// Point the client at the LiteLLM Proxy. The base URI is set on the client
// factory; it is not a chat()->create() request parameter.
$client = OpenAI::factory()
    ->withApiKey($apiKey) // LiteLLM Proxy API key
    ->withBaseUri('http://your-litellm-proxy.com/v1')
    ->make();

// Generate images with text in a single request
$response = $client->chat()->create([
    // Must match a model name configured in your LiteLLM Proxy
    'model' => 'gemini-2.5-flash-image-preview',
    'messages' => [
        [
            'role' => 'user',
            'content' => 'Generate a beautiful sunset over mountains and describe it',
        ],
    ],
]);

// Access both text and generated images (typed objects)
$text = $response->choices[0]->message->content;
$images = $response->choices[0]->message->images ?? [];

// Process generated images
$savedImages = [];
foreach ($images as $image) {
    // $image is a CreateResponseChoiceImage
    $imageUrl = $image->imageUrl['url'];
    $imageDetail = $image->imageUrl['detail']; // Detail level
    $imageIndex = $image->index;               // Image index in response
    $imageType = $image->type;                 // Image type identifier

    // saveBase64Image() and downloadAndSaveImage() are application helpers,
    // not part of the library
    if (str_starts_with($imageUrl, 'data:image/')) {
        $savedImages[] = saveBase64Image($imageUrl, $imageIndex);      // base64 data URL
    } else {
        $savedImages[] = downloadAndSaveImage($imageUrl, $imageIndex); // remote URL
    }
}

echo "Generated text: " . $text . "\n";
echo "Generated " . count($savedImages) . " images\n";

// Typed access to image properties
foreach ($images as $image) {
    echo "Image {$image->index}: {$image->imageUrl['url']} (detail: {$image->imageUrl['detail']})\n";
}
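The usage example calls `saveBase64Image()` without defining it. A minimal sketch of such a helper, assuming images are written to a local directory (defaulting to the system temp dir) and that only base64 data URLs are accepted; the function, its file-naming scheme, and the MIME-to-extension mapping are hypothetical, not part of openai-php/client:

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper (not part of openai-php/client): decodes a
 * "data:image/...;base64,..." URL and writes it to $directory.
 * Returns the path of the written file.
 */
function saveBase64Image(string $dataUrl, int $index, ?string $directory = null): string
{
    $directory ??= sys_get_temp_dir();

    $commaPos = strpos($dataUrl, ',');
    if (!str_starts_with($dataUrl, 'data:image/') || $commaPos === false) {
        throw new InvalidArgumentException('Not an image data URL');
    }

    // Header between "data:" and the comma, e.g. "image/png;base64"
    $header = substr($dataUrl, 5, $commaPos - 5);
    if (!str_ends_with($header, ';base64')) {
        throw new InvalidArgumentException('Only base64-encoded data URLs are supported');
    }

    $mimeType = substr($header, 0, -strlen(';base64')); // e.g. "image/png"
    $extension = match ($mimeType) {
        'image/png' => 'png',
        'image/jpeg' => 'jpg',
        'image/webp' => 'webp',
        'image/gif' => 'gif',
        default => 'bin',
    };

    // Strict mode makes base64_decode() return false on invalid characters
    $bytes = base64_decode(substr($dataUrl, $commaPos + 1), true);
    if ($bytes === false) {
        throw new InvalidArgumentException('Invalid base64 payload');
    }

    $path = rtrim($directory, DIRECTORY_SEPARATOR) . DIRECTORY_SEPARATOR . "image-{$index}.{$extension}";
    if (file_put_contents($path, $bytes) === false) {
        throw new RuntimeException("Could not write {$path}");
    }

    return $path;
}
```

Parsing the header with `strpos()`/`substr()` rather than a regex keeps the helper cheap on multi-megabyte payloads, since the base64 body is never scanned by a pattern.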

🏗️ Class Structure

New CreateResponseChoiceImage Class

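use OpenAI\Contracts\ResponseContract;
use OpenAI\Responses\Concerns\ArrayAccessible;
use OpenAI\Testing\Responses\Concerns\Fakeable;

final class CreateResponseChoiceImage implements ResponseContract
{
    use ArrayAccessible;
    use Fakeable;
    public function __construct(
        public readonly array $imageUrl,    // ['url' => string, 'detail' => string]
        public readonly int $index,         // Image position in response
        public readonly string $type,       // Image type identifier
    ) {}
    public static function from(array $attributes): self
    {
        return new self($attributes['image_url'], $attributes['index'], $attributes['type']);
    }
    public function toArray(): array
    {
        return ['image_url' => $this->imageUrl, 'index' => $this->index, 'type' => $this->type];
    }
}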

Updated ChatCompletionResponseMessage

class ChatCompletionResponseMessage
{
    // ... existing properties
    
    /**
     * Generated images in the response
     * 
     * @var array<int, CreateResponseChoiceImage>|null
     */
    public readonly ?array $images;
}
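The PR does not show how `ChatCompletionResponseMessage` populates `images`. A minimal sketch, assuming the message factory maps an optional `images` key and leaves the property `null` when the key is absent, which is what keeps ordinary OpenAI responses unchanged. `ChoiceImageSketch` and `mapImages()` are hypothetical stand-ins so the sketch runs on its own; they are not library code:

```php
<?php

declare(strict_types=1);

// Minimal stand-in for CreateResponseChoiceImage (the real class also
// implements ResponseContract and uses the library's traits)
final class ChoiceImageSketch
{
    public function __construct(
        public readonly array $imageUrl,
        public readonly int $index,
        public readonly string $type,
    ) {}

    public static function from(array $attributes): self
    {
        return new self($attributes['image_url'], $attributes['index'], $attributes['type']);
    }
}

/**
 * Hypothetical mapping step for a message's "images" key: null when the key
 * is absent or null, otherwise a list of typed objects.
 *
 * @return array<int, ChoiceImageSketch>|null
 */
function mapImages(array $messageAttributes): ?array
{
    if (!isset($messageAttributes['images'])) {
        return null;
    }

    return array_map(
        static fn (array $image): ChoiceImageSketch => ChoiceImageSketch::from($image),
        $messageAttributes['images'],
    );
}
```

Returning `null` for a missing key (rather than `[]`) lets callers distinguish "the provider never sends images" from "this response produced none".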

🔗 Related Documentation

🧪 Testing

Manual Testing

  • Tested with LiteLLM Proxy setup
  • Verified typed object creation and access
  • Verified base64 image handling with proper indexing
  • Verified URL image handling with detail levels
  • Confirmed backward compatibility with existing chat functionality
  • Tested error handling for malformed responses
  • Validated PHPStan type checking

Unit Tests

  • Added tests for CreateResponseChoiceImage class creation
  • Added tests for typed property access
  • Added tests for from() factory method
  • Added tests for toArray() method
  • Added tests for mixed content responses
  • Added tests for backward compatibility
  • Added edge-case tests (empty images array, missing properties)

# Run tests
./vendor/bin/pest
# All tests passing ✅

# Run static analysis
./vendor/bin/phpstan analyse
# Level 9 compliance ✅

🔄 Response Structure

LiteLLM Proxy returns image data in the following structure, which the client now maps to typed objects:

{
    "choices": [
        {
            "message": {
                "content": "Here's a beautiful sunset over mountains...",
                "images": [
                    {
                        "image_url": {
                            "url": "data:image/png;base64,iVBORw0KGgoAAAANS...",
                            "detail": "high"
                        },
                        "index": 0,
                        "type": "image"
                    },
                    {
                        "image_url": {
                            "url": "https://example.com/generated-image.jpg",
                            "detail": "low"
                        },
                        "index": 1,
                        "type": "image"
                    }
                ]
            }
        }
    ]
}
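Because the proxy returns plain JSON, the structure above can also be inspected without the client, for example when debugging the proxy setup. A sketch with a hypothetical `describeImages()` helper that decodes the payload with `json_decode()` and classifies each entry as an inline data URL or a remote URL; treating `detail` as optional is an assumption made for robustness, not something the format above states:

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper: decodes a raw chat completion payload shaped like the
 * LiteLLM response above and summarizes each image entry of the first choice.
 *
 * @return list<array{index: int, source: string, detail: string|null}>
 */
function describeImages(string $json): array
{
    $payload = json_decode($json, true, 512, JSON_THROW_ON_ERROR);
    $images = $payload['choices'][0]['message']['images'] ?? [];

    $summary = [];
    foreach ($images as $image) {
        $url = $image['image_url']['url'];
        $summary[] = [
            'index' => $image['index'],
            // Inline images arrive as base64 data URLs; anything else is a link
            'source' => str_starts_with($url, 'data:') ? 'inline' : 'remote',
            'detail' => $image['image_url']['detail'] ?? null,
        ];
    }

    return $summary;
}
```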

🎯 Type Safety Benefits

Before (Arrays - Error Prone)

// No IDE support, runtime errors possible
$imageUrl = $response->choices[0]->message->images[0]['image_url']['url'];
$detail = $response->choices[0]->message->images[0]['image_url']['detail']; // Could fail

After (Typed Objects - Safe & Predictable)

// IDE autocompletion plus PHPStan static analysis
$image = $response->choices[0]->message->images[0]; // CreateResponseChoiceImage
$imageUrl = $image->imageUrl['url'];                // string (per PHPStan array shape)
$detail = $image->imageUrl['detail'];              // string (per PHPStan array shape)
$index = $image->index;                            // int (enforced by native type)

🌟 Benefits

  1. 🎨 Multimodal Capabilities: Generate images directly within chat conversations
  2. ⚡ Performance: Single API call for both text and image generation
  3. 🔧 Type Safety: IDE autocompletion and PHPStan static analysis
  4. 🚀 Innovation: Leverages Gemini 2.5 Flash Image's native image generation
  5. 🔗 Integration: Seamless LiteLLM Proxy compatibility
  6. 📊 Structured Data: Access to image metadata (index, detail, type)

🏗️ Implementation Details

File Changes

src/Responses/Chat/CreateResponseChoiceImage.php      # New typed class
src/Responses/Chat/ChatCompletionResponseMessage.php # Added images property
tests/Unit/Chat/CreateResponseChoiceImageTest.php    # Comprehensive tests
README.md                                             # Updated documentation

Type Annotations

/**
 * @phpstan-type CreateResponseChoiceImageType array{
 *     image_url: array{url: string, detail: string}, 
 *     index: int, 
 *     type: string
 * }
 */
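The `@phpstan-type` alias is checked only during static analysis; nothing enforces it at runtime if a proxy sends a malformed entry. A minimal sketch of a runtime guard mirroring that shape, assuming malformed entries should be rejected before they reach the typed constructor (`assertImageShape()` is a hypothetical name, not part of the PR):

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical runtime guard mirroring CreateResponseChoiceImageType:
 * array{image_url: array{url: string, detail: string}, index: int, type: string}
 */
function assertImageShape(mixed $attributes): void
{
    if (!is_array($attributes)) {
        throw new UnexpectedValueException('Image entry must be an array');
    }

    $imageUrl = $attributes['image_url'] ?? null;
    // Short-circuit keeps us from indexing a non-array
    if (!is_array($imageUrl) || !is_string($imageUrl['url'] ?? null)) {
        throw new UnexpectedValueException('image_url.url must be a string');
    }

    if (!is_string($imageUrl['detail'] ?? null)) {
        throw new UnexpectedValueException('image_url.detail must be a string');
    }

    if (!is_int($attributes['index'] ?? null) || !is_string($attributes['type'] ?? null)) {
        throw new UnexpectedValueException('index must be an int and type a string');
    }
}
```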

🔍 Code Quality

  • Follows existing code style and conventions
  • Proper scalar typing as requested in review feedback
  • Typed classes following FunctionCall/ChoiceAudio pattern
  • PHPStan level 9 compliant with comprehensive type annotations
  • PSR-12 coding standard compliance
  • ArrayAccessible and Fakeable traits for consistency
  • Comprehensive error handling and edge case coverage

🚦 Architecture Compliance

This implementation strictly follows the project's established patterns:

  • ResponseContract implementation like other response classes
  • ArrayAccessible trait for backward compatibility
  • Fakeable trait for comprehensive testing
  • Static from() factory method for object creation
  • toArray() method for serialization
  • Readonly properties for immutability
  • Proper type hints and PHPStan annotations
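The backward-compatibility claim rests on array-style access still working on the new objects. The sketch below is a hypothetical re-implementation of that idea, not the library's `ArrayAccessible` trait itself: reads are served from `toArray()` and writes throw, so the object stays immutable. `ArrayAccessibleSketch` and `ImageSketch` are illustrative names:

```php
<?php

declare(strict_types=1);

// Hypothetical trait: array-style reads delegate to toArray(), writes are rejected
trait ArrayAccessibleSketch
{
    public function offsetExists(mixed $offset): bool
    {
        return array_key_exists($offset, $this->toArray());
    }

    public function offsetGet(mixed $offset): mixed
    {
        return $this->toArray()[$offset];
    }

    public function offsetSet(mixed $offset, mixed $value): void
    {
        throw new BadMethodCallException('Response objects are immutable.');
    }

    public function offsetUnset(mixed $offset): void
    {
        throw new BadMethodCallException('Response objects are immutable.');
    }
}

final class ImageSketch implements ArrayAccess
{
    use ArrayAccessibleSketch;

    public function __construct(
        public readonly string $url,
        public readonly int $index,
    ) {}

    public function toArray(): array
    {
        return ['image_url' => ['url' => $this->url], 'index' => $this->index];
    }
}
```

Existing code written as `$image['image_url']['url']` keeps working, while new code can use `$image->url`.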

🔄 Updates Based on Review

v2.0 - Typed Classes Implementation

  • ✅ Addressed feedback: "We prefer typed classes to enforce scalar typing"
  • ✅ Followed patterns: Used same structure as FunctionCall/ChoiceAudio
  • ✅ Enhanced type safety: Replaced arrays with CreateResponseChoiceImage class
  • ✅ Improved IDE support: Full autocompletion and error checking

🤝 Community Impact

This feature brings PHP developers the same cutting-edge capabilities available in Python libraries, ensuring the PHP ecosystem remains competitive in the rapidly evolving AI landscape. The implementation maintains the library's high standards for type safety and architectural consistency.


Ready for review! 🎉

This implementation demonstrates adherence to project standards while opening up exciting new possibilities for PHP developers working with multimodal AI.

@iBotPeaches (Collaborator) left a comment

Looks like you have some CI issues. By the same token, I don't think a generic array fits the bill for this project.

If you are trying to extend, it would be best to have a CreateResponseImage class to represent the image data. At present you are typing a custom array which we try to avoid for typed class properties.

@iBotPeaches (Collaborator) left a comment

Build is passing now, but remember we don't really like passing arrays around. We prefer typed classes to enforce scalar typing.

See the pattern we use with FunctionCall or ChoiceAudio right above your changes? That's what we need to continue.

@nmdimas (Author) commented Sep 25, 2025

@iBotPeaches Thank you for the feedback! You're absolutely right about preferring typed classes over arrays.

I've updated the implementation to follow the same pattern as FunctionCall and ChoiceAudio classes. The changes include:

  • Created a new CreateResponseChoiceImage class for type safety
  • Updated ChatCompletionResponseMessage to use CreateResponseChoiceImage[] instead of raw arrays
  • Added proper type hints and documentation

The updated code now follows the project's established patterns. Ready for another review! 🚀

@nmdimas requested a review from @iBotPeaches on September 29, 2025 13:19

@iBotPeaches (Collaborator) commented

Okay cool - everything passes. I'll take this for a run tonight to confirm functionality.

@iBotPeaches (Collaborator) commented

Sorry for the delay. Still trying to set up LiteLLM with Gemini; I've never done this before.
