imagen comes from 'image' generator or imagine. Uses Gemini AI to play with images

thanosa75/imagen

Gemini Image Processing API

A production-ready REST API for processing images using Google's Gemini AI model. This service provides asynchronous job processing with Redis-backed queuing, supporting various image analysis tasks like OCR, object detection, scene understanding, and more.

🌟 Features

  • Asynchronous Processing: Submit jobs and retrieve results when ready
  • Multiple Analysis Types: Pre-configured prompts for different image analysis and generation tasks
  • Image Generation: Generate images from text descriptions using state-of-the-art models
  • Flexible Templating: Customizable prompts with variable substitution
  • Redis Queue: Reliable job queuing and status tracking
  • Docker Support: Easy deployment with Docker Compose
  • Production Ready: Comprehensive error handling, logging, and graceful shutdown

🔧 Prerequisites

  • Docker and Docker Compose (recommended)
  • Node.js 20+ (for local development)
  • Redis 7+ (included in Docker setup)
  • Google Gemini API Key (Get one here)

📦 Installation

Using Docker (Recommended)

  1. Clone the repository:

    git clone <repository-url>
    cd imagen
  2. Run the setup script:

    chmod +x setup.sh
    ./setup.sh
  3. Configure your API key: Edit the .env file and add your Gemini API key:

    GEMINI_API_KEY=your_actual_api_key_here
  4. Start the services:

    docker-compose up --build

The API will be available at http://localhost:3000

Local Development

  1. Install dependencies:

    npm install
  2. Set up environment:

    cp .env.example .env
    # Edit .env and add your GEMINI_API_KEY
  3. Start Redis (if not using Docker):

    redis-server
  4. Start the API server:

    npm start
  5. Start the worker (in a separate terminal):

    node src/workers/jobWorker.js

⚙️ Configuration

Environment Variables

Configure the application by editing the .env file:

# Server Configuration
PORT=3000
NODE_ENV=development
LOG_LEVEL=info

# Gemini API Configuration
GEMINI_API_KEY=your_gemini_api_key_here
GEMINI_MODEL=gemini-1.5-pro
GEMINI_IMAGE_MODEL=gemini-2.5-flash-image

# Redis Configuration
REDIS_URL=redis://localhost:6379
REDIS_PASSWORD=
REDIS_DB=0

# Job Processing Configuration
MAX_CONCURRENT_JOBS=5
JOB_TIMEOUT_MS=300000
JOB_TTL=86400

# File Upload Configuration
MAX_FILE_SIZE_MB=10
ALLOWED_FILE_TYPES=image/jpeg,image/png,image/webp,image/gif

# Rate Limiting
RATE_LIMIT_WINDOW_MS=900000
RATE_LIMIT_MAX_REQUESTS=100
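
All of these values arrive as strings, so the application has to convert them before use. The following is a minimal sketch of how that conversion might look; `loadConfig` and the shape of the object it returns are hypothetical and are not taken from the project's source. It also assumes JOB_TTL is expressed in seconds (86400 would then be one day), which this README does not state. With the defaults above, JOB_TIMEOUT_MS is 5 minutes and the rate limit window is 15 minutes.

```javascript
// Sketch only: convert string settings from .env into typed values.
// loadConfig and its return shape are hypothetical, not the project's code.
function loadConfig(env = process.env) {
  // Parse a positive integer setting, falling back to the documented default.
  const int = (name, fallback) => {
    const raw = env[name];
    if (raw === undefined || raw === '') return fallback;
    const value = Number.parseInt(raw, 10);
    if (!Number.isInteger(value) || value <= 0) {
      throw new Error(`${name} must be a positive integer, got "${raw}"`);
    }
    return value;
  };

  const defaultTypes = 'image/jpeg,image/png,image/webp,image/gif';

  return {
    port: int('PORT', 3000),
    maxConcurrentJobs: int('MAX_CONCURRENT_JOBS', 5),
    jobTimeoutMs: int('JOB_TIMEOUT_MS', 300000),
    jobTtlSeconds: int('JOB_TTL', 86400), // assumption: the unit is seconds
    maxFileSizeBytes: int('MAX_FILE_SIZE_MB', 10) * 1024 * 1024,
    allowedFileTypes: (env.ALLOWED_FILE_TYPES || defaultTypes)
      .split(',')
      .map((type) => type.trim())
      .filter(Boolean),
    rateLimit: {
      windowMs: int('RATE_LIMIT_WINDOW_MS', 900000),
      maxRequests: int('RATE_LIMIT_MAX_REQUESTS', 100),
    },
  };
}
```

Failing fast on a malformed number at startup is a design choice in this sketch; the real service may instead fall back to defaults.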

🚀 Usage

Interactive Documentation

The API documentation is available in interactive Swagger UI format at:

http://localhost:3000/doc

This interface allows you to explore endpoints, view schemas, and test API calls directly from your browser.

API Endpoints

1. List Available Prompts

GET /prompts/show

Retrieve a list of all available prompt templates.

Response (200 OK):

{
  "prompts": [
    {
      "id": "describe_image",
      "name": "Image Description",
      "description": "Provides a detailed description of an image.",
      "requiredVariables": [],
      "supportedOutcomes": ["text"]
    },
    ...
  ]
}
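
A client can use the `requiredVariables` and `supportedOutcomes` fields to check a request before submitting it. The sketch below shows one way to do that; `validateJobRequest` is a hypothetical helper, and the server may apply different or additional rules.

```javascript
// Sketch only: check a planned job against one entry from GET /prompts/show.
// validateJobRequest is a hypothetical client-side helper.
function validateJobRequest(prompt, expectedOutcome, variables = {}) {
  const errors = [];

  // The requested outcome must be one the prompt advertises.
  if (!prompt.supportedOutcomes.includes(expectedOutcome)) {
    errors.push(`prompt "${prompt.id}" does not support outcome "${expectedOutcome}"`);
  }

  // Every required variable must be supplied by the caller.
  for (const name of prompt.requiredVariables) {
    if (!Object.hasOwn(variables, name)) {
      errors.push(`missing required variable "${name}"`);
    }
  }

  return errors; // an empty array means the request looks valid
}
```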

2. Submit a Job

POST /jobs

Submit an image for processing with a specific prompt.

Request:

  • Content-Type: multipart/form-data
  • Body:
    • image (file, required for image analysis prompts; optional for text-to-image generation): Image file (JPEG, PNG, WEBP, or GIF)
    • promptId (string, required): ID of the prompt to use
    • expectedOutcome (string, required): Either "text" or "image"
    • variables (JSON-encoded object, optional): Variables for the prompt template

Response (201 Created):

{
  "jobId": "550e8400-e29b-41d4-a716-446655440000",
  "status": "pending",
  "createdAt": "2026-01-17T12:00:00.000Z"
}
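
For Node.js clients, the same multipart request can be built with the `FormData`, `Blob`, and `fetch` globals available in Node.js 20. This is a minimal sketch: `buildJobForm` and `submitJob` are hypothetical helper names, the base URL assumes the default local setup, and the MIME type default is an assumption the caller should override for non-JPEG files.

```javascript
const fs = require('node:fs/promises');
const path = require('node:path');

const API_BASE = 'http://localhost:3000'; // assumption: default local setup

// Build the multipart body. imageBytes may be omitted for prompts that
// do not need an input image.
function buildJobForm({ promptId, expectedOutcome, variables, imageBytes, fileName, mimeType }) {
  const form = new FormData();
  form.append('promptId', promptId);
  form.append('expectedOutcome', expectedOutcome);
  if (variables) {
    // Variables travel as a JSON string, like the curl -F 'variables={...}' form.
    form.append('variables', JSON.stringify(variables));
  }
  if (imageBytes) {
    form.append('image', new Blob([imageBytes], { type: mimeType }), fileName);
  }
  return form;
}

// Read a file from disk and submit it as a text-outcome job.
async function submitJob(imagePath, promptId, variables, mimeType = 'image/jpeg') {
  const imageBytes = await fs.readFile(imagePath);
  const form = buildJobForm({
    promptId,
    expectedOutcome: 'text',
    variables,
    imageBytes,
    fileName: path.basename(imagePath),
    mimeType,
  });
  // No Content-Type header is set here: fetch adds the multipart boundary itself.
  const res = await fetch(`${API_BASE}/jobs`, { method: 'POST', body: form });
  if (res.status !== 201) {
    throw new Error(`Job submission failed: HTTP ${res.status}`);
  }
  return res.json(); // { jobId, status, createdAt }
}
```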

3. Get Job Status

GET /jobs/:id

Retrieve the status and results of a job.

Response (200 OK):

Pending/Processing:

{
  "jobId": "550e8400-e29b-41d4-a716-446655440000",
  "status": "processing",
  "promptId": "describe_image",
  "expectedOutcome": "text",
  "createdAt": "2026-01-17T12:00:00.000Z",
  "updatedAt": "2026-01-17T12:00:05.000Z",
  "completedAt": null
}

Completed:

{
  "jobId": "550e8400-e29b-41d4-a716-446655440000",
  "status": "completed",
  "promptId": "describe_image",
  "expectedOutcome": "text",
  "createdAt": "2026-01-17T12:00:00.000Z",
  "updatedAt": "2026-01-17T12:00:15.000Z",
  "completedAt": "2026-01-17T12:00:15.000Z",
  "result": {
    "text": "A detailed description of the image..."
  },
  "metadata": {
    "processingTime": 8500,
    "promptUsed": "Analyze this image and provide a detailed description...",
    "modelVersion": "gemini-1.5-pro"
  }
}

Failed:

{
  "jobId": "550e8400-e29b-41d4-a716-446655440000",
  "status": "failed",
  "promptId": "describe_image",
  "expectedOutcome": "text",
  "createdAt": "2026-01-17T12:00:00.000Z",
  "updatedAt": "2026-01-17T12:00:10.000Z",
  "completedAt": "2026-01-17T12:00:10.000Z",
  "error": {
    "message": "API rate limit exceeded",
    "code": "RATE_LIMIT_ERROR"
  }
}

4. Get Image Result

GET /jobs/:id/image

Download the processed image (for jobs with expectedOutcome: "image").

Response (200 OK):

  • Content-Type: image/jpeg (or appropriate image MIME type)
  • Body: Binary image data

Note: Gemini 1.5 Pro, the default GEMINI_MODEL, returns text rather than generated images. Image results depend on the model set in GEMINI_IMAGE_MODEL (default: gemini-2.5-flash-image), which the text-to-image generation example uses.
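
A client that saves the result to disk can pick the file extension from the response's Content-Type header. The sketch below assumes the default local base URL; `extensionFor` and `downloadJobImage` are hypothetical helper names, and the `.bin` fallback is an arbitrary choice for unrecognized types.

```javascript
const fs = require('node:fs/promises');

// MIME types taken from ALLOWED_FILE_TYPES in the configuration section.
const EXTENSIONS = {
  'image/jpeg': '.jpg',
  'image/png': '.png',
  'image/webp': '.webp',
  'image/gif': '.gif',
};

// Map a Content-Type header value to a file extension.
function extensionFor(contentType) {
  // Drop any parameters after ";" before the lookup.
  const mime = (contentType || '').split(';')[0].trim().toLowerCase();
  return EXTENSIONS[mime] || '.bin';
}

// Download the image for a completed job and write it next to the script.
async function downloadJobImage(jobId, baseUrl = 'http://localhost:3000') {
  const res = await fetch(`${baseUrl}/jobs/${encodeURIComponent(jobId)}/image`);
  if (!res.ok) {
    throw new Error(`Image download failed: HTTP ${res.status}`);
  }
  const target = `${jobId}${extensionFor(res.headers.get('content-type'))}`;
  await fs.writeFile(target, Buffer.from(await res.arrayBuffer()));
  return target;
}
```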

Available Prompts

The 8 pre-configured prompts below are defined in data/prompts.json. The generate_image prompt used in the text-to-image example is not described in this list.

1. describe_image

Provides a detailed description of an image.

Variables:

  • detail_level (optional, default: "detailed"): Level of detail
  • focus_area (optional, default: "all visual elements, composition, colors, and subjects"): What to focus on

Example:

{
  "variables": {
    "detail_level": "brief",
    "focus_area": "the main subject only"
  }
}

2. ocr

Extracts all visible text from an image.

Variables:

  • format (optional, default: "plain text"): Output format
  • layout (optional, default: "original layout and structure"): How to preserve layout

Example:

{
  "variables": {
    "format": "JSON with text and coordinates",
    "layout": "line-by-line structure"
  }
}

3. extract_colors

Identifies and extracts dominant colors from an image.

Variables:

  • count (optional, default: "5"): Number of colors to extract
  • format (optional, default: "hex color codes with color names"): Color format
  • additional_info (optional, default: "the approximate percentage of each color in the image"): Extra information

Example:

{
  "variables": {
    "count": "3",
    "format": "RGB values",
    "additional_info": "color mood and palette description"
  }
}

4. object_detection

Detects and identifies objects in an image.

Variables:

  • object_type (optional, default: "objects and items"): Type of objects to detect
  • details (optional, default: "the name, approximate location, and size"): Details to include
  • confidence_instruction (optional, default: "Include your confidence level for each detection."): Confidence handling

Example:

{
  "variables": {
    "object_type": "people and faces",
    "details": "count, positions, and any visible attributes",
    "confidence_instruction": "Only include detections with high confidence."
  }
}

5. image_classification

Classifies an image into categories.

Variables:

  • aspects (optional, default: "subject matter, style, mood, and context"): Classification aspects
  • output_format (optional, default: "a list of categories with confidence scores"): Output format

Example:

{
  "variables": {
    "aspects": "artistic style and genre only",
    "output_format": "the top 3 most relevant categories"
  }
}

6. scene_understanding

Provides comprehensive understanding of a scene.

Variables:

  • elements (optional, default: "the setting, objects, people, activities, and atmosphere"): Scene elements
  • context (optional, default: "the likely context, time of day, location type, and any notable details"): Context information

Example:

{
  "variables": {
    "elements": "the environment and weather conditions",
    "context": "outdoor/indoor setting and time of day"
  }
}

7. compare_images

Compares and analyzes similarities and differences between images.

Variables:

  • comparison_aspects (optional, default: "visual similarities, differences, composition, and style"): What to compare
  • output_focus (optional, default: "key differences and notable similarities"): Output focus

Example:

{
  "variables": {
    "comparison_aspects": "color palette and lighting only",
    "output_focus": "technical differences in photography"
  }
}

8. accessibility_description

Creates detailed alt-text descriptions for accessibility.

Variables:

  • elements (optional, default: "all important visual information, text, and context"): Elements to include
  • style (optional, default: "clear, concise, and descriptive"): Description style
  • max_length (optional, default: "250 words"): Maximum length

Example:

{
  "variables": {
    "elements": "essential information only",
    "style": "brief and factual",
    "max_length": "100 words"
  }
}

📝 Examples

Example 1: List Available Prompts

curl http://localhost:3000/prompts/show

Example 2: Basic Image Description

curl -X POST http://localhost:3000/jobs \
  -F "image=@/path/to/photo.jpg" \
  -F "promptId=describe_image" \
  -F "expectedOutcome=text"

Response:

{
  "jobId": "abc123-def456-ghi789",
  "status": "pending",
  "createdAt": "2026-01-17T12:00:00.000Z"
}

Example 3: OCR with Custom Variables

curl -X POST http://localhost:3000/jobs \
  -F "image=@/path/to/document.png" \
  -F "promptId=ocr" \
  -F "expectedOutcome=text" \
  -F 'variables={"format":"JSON with text and coordinates","layout":"line-by-line structure"}'

Example 4: Check Job Status

curl http://localhost:3000/jobs/abc123-def456-ghi789

Response:

{
  "jobId": "abc123-def456-ghi789",
  "status": "completed",
  "promptId": "describe_image",
  "expectedOutcome": "text",
  "createdAt": "2026-01-17T12:00:00.000Z",
  "updatedAt": "2026-01-17T12:00:15.000Z",
  "completedAt": "2026-01-17T12:00:15.000Z",
  "result": {
    "text": "The image shows a sunset over a calm ocean with vibrant orange and pink hues in the sky..."
  },
  "metadata": {
    "processingTime": 8500,
    "promptUsed": "Analyze this image and provide a detailed description. Focus on all visual elements, composition, colors, and subjects.",
    "modelVersion": "gemini-1.5-pro"
  }
}

Example 5: Color Extraction

curl -X POST http://localhost:3000/jobs \
  -F "image=@/path/to/artwork.jpg" \
  -F "promptId=extract_colors" \
  -F "expectedOutcome=text" \
  -F 'variables={"count":"3","format":"hex color codes with color names"}'

Example 6: Object Detection

curl -X POST http://localhost:3000/jobs \
  -F "image=@/path/to/street_scene.jpg" \
  -F "promptId=object_detection" \
  -F "expectedOutcome=text" \
  -F 'variables={"object_type":"vehicles and pedestrians","details":"count and approximate positions"}'

Example 7: Accessibility Description

curl -X POST http://localhost:3000/jobs \
  -F "image=@/path/to/infographic.png" \
  -F "promptId=accessibility_description" \
  -F "expectedOutcome=text" \
  -F 'variables={"max_length":"150 words","style":"clear and concise"}'

Example 8: Text-to-Image Generation

Generate an image from a text description. Note that the image file input is optional for this mode.

curl -X POST http://localhost:3000/jobs \
  -F "promptId=generate_image" \
  -F "expectedOutcome=image" \
  -F 'variables={"prompt":"generate a cyberpunk city"}'

Note: This feature uses the configured image generation model (default: gemini-2.5-flash-image).

πŸ—οΈ Architecture

System Components

┌─────────────┐
│   Client    │
└──────┬──────┘
       │ HTTP
       ▼
┌─────────────────────────────────────┐
│         Express API Server          │
│  ┌──────────────────────────────┐   │
│  │   Controllers & Routes       │   │
│  │   - Job Submission           │   │
│  │   - Status Retrieval         │   │
│  └──────────────────────────────┘   │
└──────────┬──────────────────────────┘
           │
           ▼
    ┌──────────────┐
    │    Redis     │
    │  - Job Queue │
    │  - Job Data  │
    └──────┬───────┘
           │
           ▼
┌──────────────────────────────────────┐
│         Job Worker Process           │
│  ┌──────────────────────────────┐    │
│  │  1. Dequeue Job              │    │
│  │  2. Load Prompt Template     │    │
│  │  3. Process with Gemini API  │    │
│  │  4. Store Results            │    │
│  └──────────────────────────────┘    │
└──────────────────────────────────────┘

Directory Structure

imagen/
├── data/
│   ├── prompts.json          # Prompt templates
│   ├── uploads/              # Uploaded images
│   └── results/              # Generated results
├── src/
│   ├── config/
│   │   ├── redis.js          # Redis connection
│   │   └── logger.js         # Logging configuration
│   ├── controllers/
│   │   └── jobController.js  # Request handlers
│   ├── middleware/
│   │   ├── errorHandler.js   # Error handling
│   │   └── validator.js      # Input validation
│   ├── repositories/
│   │   └── jobRepository.js  # Job data access
│   ├── routes/
│   │   └── jobRoutes.js      # API routes
│   ├── services/
│   │   ├── geminiService.js  # Gemini API integration
│   │   ├── promptService.js  # Prompt management
│   │   └── queueService.js   # Queue operations
│   ├── utils/
│   │   └── logger.js         # Logging utilities
│   ├── workers/
│   │   └── jobWorker.js      # Background job processor
│   ├── app.js                # Express app setup
│   └── server.js             # Server entry point
├── docker-compose.yml        # Docker services
├── Dockerfile                # Container definition
├── package.json              # Dependencies
├── setup.sh                  # Setup script
└── .env.example              # Environment template

Job Lifecycle

  1. Submission: Client uploads image via POST /jobs
  2. Validation: Request validated, image stored
  3. Queuing: Job added to Redis queue
  4. Processing: Worker picks up job, calls Gemini API
  5. Completion: Results stored in Redis
  6. Retrieval: Client polls GET /jobs/:id for results
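
The retrieval step can be sketched as a polling loop that stops once the job reaches one of the two final statuses shown in the API responses ("completed" or "failed"). `waitForJob` and `isTerminal` are hypothetical names, the 2-second interval is an arbitrary choice, and the default timeout simply mirrors JOB_TIMEOUT_MS.

```javascript
// Statuses after which a job no longer changes, per the documented responses.
const TERMINAL_STATUSES = new Set(['completed', 'failed']);

function isTerminal(status) {
  return TERMINAL_STATUSES.has(status);
}

// Poll GET /jobs/:id until the job finishes or the timeout elapses.
// fetchImpl is injectable so the loop can be exercised without a server.
async function waitForJob(jobId, options = {}) {
  const {
    baseUrl = 'http://localhost:3000', // assumption: default local setup
    intervalMs = 2000,
    timeoutMs = 300000,
    fetchImpl = fetch,
  } = options;

  const deadline = Date.now() + timeoutMs;
  while (Date.now() < deadline) {
    const res = await fetchImpl(`${baseUrl}/jobs/${encodeURIComponent(jobId)}`);
    if (!res.ok) {
      throw new Error(`Status check failed: HTTP ${res.status}`);
    }
    const job = await res.json();
    if (isTerminal(job.status)) {
      return job; // the caller inspects job.result or job.error
    }
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
  throw new Error(`Job ${jobId} did not finish within ${timeoutMs} ms`);
}
```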

🔍 Troubleshooting

Common Issues

1. "GEMINI_API_KEY is not set"

Problem: The Gemini API key is missing or invalid.

Solution:

# Edit .env file
nano .env

# Add your API key
GEMINI_API_KEY=your_actual_api_key_here

# Restart services
docker-compose restart

2. "Connection refused" to Redis

Problem: Redis is not running or not accessible.

Solution:

# Check if Redis container is running
docker-compose ps

# Restart Redis
docker-compose restart redis

# Check Redis logs
docker-compose logs redis

3. Jobs stuck in "pending" status

Problem: Worker is not running or crashed.

Solution:

# Check worker logs
docker-compose logs worker

# Restart worker
docker-compose restart worker

# Check for errors in logs
docker-compose logs -f worker

4. "File too large" error

Problem: Image exceeds maximum file size.

Solution:

# Edit .env to increase limit
MAX_FILE_SIZE_MB=20

# Restart API
docker-compose restart api

5. "Invalid file type" error

Problem: Unsupported image format.

Solution: Ensure your image is in one of these formats:

  • JPEG (.jpg, .jpeg)
  • PNG (.png)
  • WEBP (.webp)
  • GIF (.gif)
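
A file extension does not guarantee the actual format. One way to check a file before uploading is to inspect its leading bytes, as in this sketch. `sniffImageType` is a hypothetical client-side helper; how the server itself validates uploads is not described here.

```javascript
// Sketch only: identify an image format from its leading "magic" bytes.
// Returns a MIME type from the supported list, or null if none matches.
function sniffImageType(bytes) {
  // True when `signature` appears in `bytes` starting at `offset`.
  const startsWith = (signature, offset = 0) =>
    signature.every((byte, i) => bytes[offset + i] === byte);
  const ascii = (text) => Array.from(text, (ch) => ch.charCodeAt(0));

  if (startsWith([0xff, 0xd8, 0xff])) return 'image/jpeg';
  if (startsWith([0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a])) return 'image/png';
  if (startsWith(ascii('GIF87a')) || startsWith(ascii('GIF89a'))) return 'image/gif';
  // WebP is a RIFF container: "RIFF", a 4-byte size, then "WEBP".
  if (startsWith(ascii('RIFF')) && startsWith(ascii('WEBP'), 8)) return 'image/webp';
  return null;
}
```

For example, passing the result of `fs.readFileSync(path)` to `sniffImageType` and comparing the answer against ALLOWED_FILE_TYPES catches a renamed file before it is sent.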

6. Gemini API rate limit errors

Problem: Too many requests to Gemini API.

Solution:

  • Wait a few minutes before retrying
  • Reduce MAX_CONCURRENT_JOBS in .env
  • Check your Gemini API quota
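
Waiting before a retry can be automated on the client side. The sketch below resubmits a job with exponential backoff when it fails with the RATE_LIMIT_ERROR code shown in the failed-job response. All function names are hypothetical, `submitAndWait` stands for whatever the caller uses to submit a job and wait for its final state, and the delay values are arbitrary starting points.

```javascript
// True when a finished job failed because of the Gemini rate limit.
function isRateLimited(job) {
  return job.status === 'failed' && job.error?.code === 'RATE_LIMIT_ERROR';
}

// Exponential backoff: baseMs, 2x, 4x, ... capped at capMs.
function backoffDelayMs(attempt, baseMs = 1000, capMs = 60000) {
  return Math.min(capMs, baseMs * 2 ** attempt);
}

// Run submitAndWait until it yields a job that is not rate limited.
// sleep is injectable so the loop can be exercised without real delays.
async function runWithRetry(
  submitAndWait,
  maxAttempts = 5,
  sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms)),
) {
  for (let attempt = 0; attempt < maxAttempts; attempt += 1) {
    const job = await submitAndWait();
    if (!isRateLimited(job)) {
      return job; // completed, or failed for another reason
    }
    if (attempt < maxAttempts - 1) {
      await sleep(backoffDelayMs(attempt));
    }
  }
  throw new Error(`Still rate limited after ${maxAttempts} attempts`);
}
```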

Debugging

Enable Debug Logging

# Edit .env
LOG_LEVEL=debug

# Restart services
docker-compose restart

View Logs

# All services
docker-compose logs -f

# Specific service
docker-compose logs -f api
docker-compose logs -f worker
docker-compose logs -f redis

Check Redis Data

# Connect to Redis CLI
docker-compose exec redis redis-cli

# List all jobs
KEYS job:*

# Get job details
HGETALL job:abc123-def456-ghi789

# Check queue length
LLEN queue:jobs

Health Check

# Check API health
curl http://localhost:3000/health

# Expected response
{"status":"ok","timestamp":"2026-01-17T12:00:00.000Z"}

🛠️ Development

Running Tests

# Run all tests
npm test

# Run tests in watch mode
npm run test:watch

# Run tests with coverage
npm run test:coverage

Code Style

The project uses ESLint for code quality. Run linting with:

npm run lint

Adding New Prompts

  1. Edit data/prompts.json
  2. Add your prompt following this structure:
{
  "id": "your_prompt_id",
  "name": "Your Prompt Name",
  "description": "Description of what this prompt does",
  "template": "Your prompt template with {{variables}}",
  "supportedOutcomes": ["text"],
  "requiredVariables": [],
  "defaultVariables": {
    "variable_name": "default_value"
  },
  "examples": [
    {
      "variables": {
        "variable_name": "example_value"
      },
      "description": "Example description"
    }
  ]
}
  3. Restart the services to load the new prompt
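
To illustrate how `template`, `defaultVariables`, and request variables might fit together, here is a minimal substitution sketch. `renderTemplate` is a hypothetical function: the `{{name}}` placeholder syntax comes from the structure above, but the rule that request variables override defaults, and the error on an unknown placeholder, are assumptions about the service's behavior.

```javascript
// Sketch only: fill {{name}} placeholders in a prompt template.
// Request variables take precedence over the prompt's defaults (assumed).
function renderTemplate(template, defaults = {}, variables = {}) {
  const values = { ...defaults, ...variables };
  return template.replace(/\{\{\s*(\w+)\s*\}\}/g, (placeholder, name) => {
    if (!Object.hasOwn(values, name)) {
      throw new Error(`No value for template variable "${name}"`);
    }
    return String(values[name]);
  });
}

// Example with made-up template text:
const rendered = renderTemplate(
  'Extract {{count}} colors as {{format}}.',
  { count: '5', format: 'hex color codes' },
  { count: '3' },
);
console.log(rendered); // Extract 3 colors as hex color codes.
```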

Environment-Specific Configuration

Development:

NODE_ENV=development
LOG_LEVEL=debug

Production:

NODE_ENV=production
LOG_LEVEL=info

📄 License

ISC

🤝 Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

📞 Support

For issues and questions:


Built with ❤️ using Google Gemini AI
