Gemini Image Processing API

A production-ready REST API for processing images using Google's Gemini AI model. This service provides asynchronous job processing with Redis-backed queuing, supporting various image analysis tasks like OCR, object detection, scene understanding, and more.

🌟 Features

- Asynchronous Processing: Submit jobs and retrieve results when ready
- Multiple Analysis Types: Pre-configured prompts for different image analysis and generation tasks
- Image Generation: Generate images from text descriptions using state-of-the-art models
- Flexible Templating: Customizable prompts with variable substitution
- Redis Queue: Reliable job queuing and status tracking
- Docker Support: Easy deployment with Docker Compose
- Production Ready: Comprehensive error handling, logging, and graceful shutdown

🔧 Prerequisites

- Docker and Docker Compose (recommended)
- Node.js 20+ (for local development)
- Redis 7+ (included in Docker setup)
- Google Gemini API key

📦 Installation

Using Docker (Recommended)

Clone the repository:
```
git clone <repository-url>
cd imagen
```
Run the setup script:
```
chmod +x setup.sh
./setup.sh
```
Configure your API key: Edit the .env file and add your Gemini API key:
```
GEMINI_API_KEY=your_actual_api_key_here
```
Start the services:
```
docker-compose up --build
```

The API will be available at http://localhost:3000

Local Development

Install dependencies:
```
npm install
```

Set up environment:
```
cp .env.example .env
# Edit .env and add your GEMINI_API_KEY
```

Start Redis (if not using Docker):
```
redis-server
```
Start the API server:
```
npm start
```
Start the worker (in a separate terminal):
```
node src/workers/jobWorker.js
```

⚙️ Configuration

Environment Variables

Configure the application by editing the .env file:

```
# Server Configuration
PORT=3000
NODE_ENV=development
LOG_LEVEL=info

# Gemini API Configuration
GEMINI_API_KEY=your_gemini_api_key_here
GEMINI_MODEL=gemini-1.5-pro
GEMINI_IMAGE_MODEL=gemini-2.5-flash-image

# Redis Configuration
REDIS_URL=redis://localhost:6379
REDIS_PASSWORD=
REDIS_DB=0

# Job Processing Configuration
MAX_CONCURRENT_JOBS=5
JOB_TIMEOUT_MS=300000
JOB_TTL=86400

# File Upload Configuration
MAX_FILE_SIZE_MB=10
ALLOWED_FILE_TYPES=image/jpeg,image/png,image/webp,image/gif

# Rate Limiting
RATE_LIMIT_WINDOW_MS=900000
RATE_LIMIT_MAX_REQUESTS=100
```
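
As a rough illustration of how these settings might be consumed, the sketch below reads them from an environment-like object and falls back to the defaults listed above. The function name `loadConfig` and the returned field names are assumptions made for this example, not the project's actual code. For reference, `RATE_LIMIT_WINDOW_MS=900000` is 15 minutes and `JOB_TIMEOUT_MS=300000` is 5 minutes.

```javascript
// Hypothetical config loader: names and shape are illustrative only.
function toInt(value, fallback) {
  const parsed = Number.parseInt(value, 10);
  return Number.isNaN(parsed) ? fallback : parsed;
}

function loadConfig(env = process.env) {
  return {
    port: toInt(env.PORT, 3000),
    geminiModel: env.GEMINI_MODEL || "gemini-1.5-pro",
    geminiImageModel: env.GEMINI_IMAGE_MODEL || "gemini-2.5-flash-image",
    redisUrl: env.REDIS_URL || "redis://localhost:6379",
    maxConcurrentJobs: toInt(env.MAX_CONCURRENT_JOBS, 5),
    jobTimeoutMs: toInt(env.JOB_TIMEOUT_MS, 300000),
    // Convert megabytes to bytes once, so later checks compare like with like.
    maxFileSizeBytes: toInt(env.MAX_FILE_SIZE_MB, 10) * 1024 * 1024,
    allowedFileTypes: (env.ALLOWED_FILE_TYPES ||
      "image/jpeg,image/png,image/webp,image/gif")
      .split(",")
      .map((type) => type.trim())
      .filter(Boolean),
    rateLimitWindowMs: toInt(env.RATE_LIMIT_WINDOW_MS, 900000),
    rateLimitMaxRequests: toInt(env.RATE_LIMIT_MAX_REQUESTS, 100),
  };
}

const config = loadConfig({ MAX_FILE_SIZE_MB: "20" });
console.log(config.maxFileSizeBytes); // 20971520
```

Parsing numeric values up front means a malformed entry such as `PORT=abc` falls back to the default instead of propagating `NaN` into the server.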

🚀 Usage

Interactive Documentation

The API documentation is available in interactive Swagger UI format at:

http://localhost:3000/doc

This interface allows you to explore endpoints, view schemas, and test API calls directly from your browser.

API Endpoints

1. List Available Prompts

GET /prompts/show

Retrieve a list of all available prompt templates.

Response (200 OK):

{
  "prompts": [
    {
      "id": "describe_image",
      "name": "Image Description",
      "description": "Provides a detailed description of an image.",
      "requiredVariables": [],
      "supportedOutcomes": ["text"]
    },
    ...
  ]
}

2. Submit a Job

POST /jobs

Submit an image for processing with a specific prompt.

Request:

Content-Type: multipart/form-data
Body:
- image (file, required for image analysis prompts): Image file (JPEG, PNG, WEBP, or GIF); may be omitted for text-to-image generation
- promptId (string, required): ID of the prompt to use
- expectedOutcome (string, required): Either "text" or "image"
- variables (JSON object, optional): Variables for prompt template

Response (201 Created):

{
  "jobId": "550e8400-e29b-41d4-a716-446655440000",
  "status": "pending",
  "createdAt": "2026-01-17T12:00:00.000Z"
}
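
The same request can be sent from Node.js 20+, which ships `fetch`, `FormData`, and `Blob` as globals. The sketch below is one possible client; `buildJobForm` and `submitJob` are hypothetical helper names, and only the form field names are taken from the request description above.

```javascript
// Hypothetical client helpers for POST /jobs (not part of the project).
function buildJobForm({ promptId, expectedOutcome, variables, image }) {
  const form = new FormData();
  form.append("promptId", promptId);
  form.append("expectedOutcome", expectedOutcome);
  if (variables) {
    // The variables object is serialized to a JSON string form field.
    form.append("variables", JSON.stringify(variables));
  }
  if (image) {
    // image is assumed to look like:
    // { data: Buffer, type: "image/png", name: "document.png" }
    form.append("image", new Blob([image.data], { type: image.type }), image.name);
  }
  return form;
}

async function submitJob(baseUrl, job) {
  const response = await fetch(`${baseUrl}/jobs`, {
    method: "POST",
    body: buildJobForm(job), // fetch sets the multipart boundary header itself
  });
  if (!response.ok) {
    throw new Error(`Job submission failed with HTTP ${response.status}`);
  }
  return response.json(); // expected shape: { jobId, status, createdAt }
}
```

A call might look like `submitJob("http://localhost:3000", { promptId: "ocr", expectedOutcome: "text", image })`, with the returned `jobId` kept for later status checks.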

3. Get Job Status

GET /jobs/:id

Retrieve the status and results of a job.

Response (200 OK):

Pending/Processing:

{
  "jobId": "550e8400-e29b-41d4-a716-446655440000",
  "status": "processing",
  "promptId": "describe_image",
  "expectedOutcome": "text",
  "createdAt": "2026-01-17T12:00:00.000Z",
  "updatedAt": "2026-01-17T12:00:05.000Z",
  "completedAt": null
}

Completed:

{
  "jobId": "550e8400-e29b-41d4-a716-446655440000",
  "status": "completed",
  "promptId": "describe_image",
  "expectedOutcome": "text",
  "createdAt": "2026-01-17T12:00:00.000Z",
  "updatedAt": "2026-01-17T12:00:15.000Z",
  "completedAt": "2026-01-17T12:00:15.000Z",
  "result": {
    "text": "A detailed description of the image..."
  },
  "metadata": {
    "processingTime": 8500,
    "promptUsed": "Analyze this image and provide a detailed description...",
    "modelVersion": "gemini-1.5-pro"
  }
}

Failed:

{
  "jobId": "550e8400-e29b-41d4-a716-446655440000",
  "status": "failed",
  "promptId": "describe_image",
  "expectedOutcome": "text",
  "createdAt": "2026-01-17T12:00:00.000Z",
  "updatedAt": "2026-01-17T12:00:10.000Z",
  "completedAt": "2026-01-17T12:00:10.000Z",
  "error": {
    "message": "API rate limit exceeded",
    "code": "RATE_LIMIT_ERROR"
  }
}
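
Because results arrive asynchronously, a client usually polls this endpoint until the job reaches a final state. The following is a minimal sketch under that assumption; `waitForJob`, its options, and the 2-second interval are illustrative choices, while the status values are the ones shown in the responses above.

```javascript
// Hypothetical polling helper for GET /jobs/:id.
const TERMINAL_STATUSES = new Set(["completed", "failed"]);

function isTerminal(job) {
  return TERMINAL_STATUSES.has(job.status);
}

async function waitForJob(baseUrl, jobId, { intervalMs = 2000, timeoutMs = 300000 } = {}) {
  const deadline = Date.now() + timeoutMs;
  while (Date.now() < deadline) {
    const response = await fetch(`${baseUrl}/jobs/${jobId}`);
    if (!response.ok) {
      throw new Error(`Status check failed with HTTP ${response.status}`);
    }
    const job = await response.json();
    if (isTerminal(job)) {
      return job; // caller inspects job.result or job.error
    }
    // Still pending or processing: wait before asking again.
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
  throw new Error(`Job ${jobId} did not finish within ${timeoutMs} ms`);
}
```

The client-side timeout keeps a stuck job from blocking the caller forever; it does not cancel the job on the server.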

4. Get Image Result

GET /jobs/:id/image

Download the processed image (for jobs with expectedOutcome: "image").

Response (200 OK):

Content-Type: image/jpeg (or appropriate image MIME type)
Body: Binary image data

Note: The default text model (gemini-1.5-pro) returns text descriptions rather than generated images. Image results are available only for jobs processed with the configured image generation model (GEMINI_IMAGE_MODEL, default: gemini-2.5-flash-image).
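
To save the returned binary data from Node.js, something like the sketch below could be used. The helper names and the MIME-to-extension table are assumptions for illustration; the endpoint path and the idea that the Content-Type header identifies the image format come from the description above.

```javascript
const { writeFile } = require("node:fs/promises");

// Hypothetical lookup table, limited to the image types this README mentions.
const EXTENSIONS = {
  "image/jpeg": "jpg",
  "image/png": "png",
  "image/webp": "webp",
  "image/gif": "gif",
};

function extensionFor(contentType) {
  // Drop any parameters after ";" and normalize case before the lookup.
  const mime = (contentType || "").split(";")[0].trim().toLowerCase();
  return EXTENSIONS[mime] || "bin";
}

async function downloadJobImage(baseUrl, jobId, outputBase) {
  const response = await fetch(`${baseUrl}/jobs/${jobId}/image`);
  if (!response.ok) {
    throw new Error(`Image download failed with HTTP ${response.status}`);
  }
  const extension = extensionFor(response.headers.get("content-type"));
  const path = `${outputBase}.${extension}`;
  await writeFile(path, Buffer.from(await response.arrayBuffer()));
  return path;
}
```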

Available Prompts

The API includes 8 pre-configured prompts in data/prompts.json:

1. describe_image

Provides a detailed description of an image.

Variables:

detail_level (optional, default: "detailed"): Level of detail
focus_area (optional, default: "all visual elements, composition, colors, and subjects"): What to focus on

Example:

{
  "variables": {
    "detail_level": "brief",
    "focus_area": "the main subject only"
  }
}

2. ocr

Extracts all visible text from an image.

Variables:

format (optional, default: "plain text"): Output format
layout (optional, default: "original layout and structure"): How to preserve layout

Example:

{
  "variables": {
    "format": "JSON with text and coordinates",
    "layout": "line-by-line structure"
  }
}

3. extract_colors

Identifies and extracts dominant colors from an image.

Variables:

count (optional, default: "5"): Number of colors to extract
format (optional, default: "hex color codes with color names"): Color format
additional_info (optional, default: "the approximate percentage of each color in the image"): Extra information

Example:

{
  "variables": {
    "count": "3",
    "format": "RGB values",
    "additional_info": "color mood and palette description"
  }
}

4. object_detection

Detects and identifies objects in an image.

Variables:

object_type (optional, default: "objects and items"): Type of objects to detect
details (optional, default: "the name, approximate location, and size"): Details to include
confidence_instruction (optional, default: "Include your confidence level for each detection."): Confidence handling

Example:

{
  "variables": {
    "object_type": "people and faces",
    "details": "count, positions, and any visible attributes",
    "confidence_instruction": "Only include detections with high confidence."
  }
}

5. image_classification

Classifies an image into categories.

Variables:

aspects (optional, default: "subject matter, style, mood, and context"): Classification aspects
output_format (optional, default: "a list of categories with confidence scores"): Output format

Example:

{
  "variables": {
    "aspects": "artistic style and genre only",
    "output_format": "the top 3 most relevant categories"
  }
}

6. scene_understanding

Provides comprehensive understanding of a scene.

Variables:

elements (optional, default: "the setting, objects, people, activities, and atmosphere"): Scene elements
context (optional, default: "the likely context, time of day, location type, and any notable details"): Context information

Example:

{
  "variables": {
    "elements": "the environment and weather conditions",
    "context": "outdoor/indoor setting and time of day"
  }
}

7. compare_images

Compares and analyzes similarities and differences between images.

Variables:

comparison_aspects (optional, default: "visual similarities, differences, composition, and style"): What to compare
output_focus (optional, default: "key differences and notable similarities"): Output focus

Example:

{
  "variables": {
    "comparison_aspects": "color palette and lighting only",
    "output_focus": "technical differences in photography"
  }
}

8. accessibility_description

Creates detailed alt-text descriptions for accessibility.

Variables:

elements (optional, default: "all important visual information, text, and context"): Elements to include
style (optional, default: "clear, concise, and descriptive"): Description style
max_length (optional, default: "250 words"): Maximum length

Example:

{
  "variables": {
    "elements": "essential information only",
    "style": "brief and factual",
    "max_length": "100 words"
  }
}

📝 Examples

Example 1: List Available Prompts

curl http://localhost:3000/prompts/show

Example 2: Basic Image Description

curl -X POST http://localhost:3000/jobs \
  -F "image=@/path/to/photo.jpg" \
  -F "promptId=describe_image" \
  -F "expectedOutcome=text"

Response:

{
  "jobId": "abc123-def456-ghi789",
  "status": "pending",
  "createdAt": "2026-01-17T12:00:00.000Z"
}

Example 3: OCR with Custom Variables

curl -X POST http://localhost:3000/jobs \
  -F "image=@/path/to/document.png" \
  -F "promptId=ocr" \
  -F "expectedOutcome=text" \
  -F 'variables={"format":"JSON with text and coordinates","layout":"line-by-line structure"}'

Example 4: Check Job Status

curl http://localhost:3000/jobs/abc123-def456-ghi789

Response:

{
  "jobId": "abc123-def456-ghi789",
  "status": "completed",
  "promptId": "describe_image",
  "expectedOutcome": "text",
  "createdAt": "2026-01-17T12:00:00.000Z",
  "updatedAt": "2026-01-17T12:00:15.000Z",
  "completedAt": "2026-01-17T12:00:15.000Z",
  "result": {
    "text": "The image shows a sunset over a calm ocean with vibrant orange and pink hues in the sky..."
  },
  "metadata": {
    "processingTime": 8500,
    "promptUsed": "Analyze this image and provide a detailed description. Focus on all visual elements, composition, colors, and subjects.",
    "modelVersion": "gemini-1.5-pro"
  }
}

Example 5: Color Extraction

curl -X POST http://localhost:3000/jobs \
  -F "image=@/path/to/artwork.jpg" \
  -F "promptId=extract_colors" \
  -F "expectedOutcome=text" \
  -F 'variables={"count":"3","format":"hex color codes with color names"}'

Example 6: Object Detection

curl -X POST http://localhost:3000/jobs \
  -F "image=@/path/to/street_scene.jpg" \
  -F "promptId=object_detection" \
  -F "expectedOutcome=text" \
  -F 'variables={"object_type":"vehicles and pedestrians","details":"count and approximate positions"}'

Example 7: Accessibility Description

curl -X POST http://localhost:3000/jobs \
  -F "image=@/path/to/infographic.png" \
  -F "promptId=accessibility_description" \
  -F "expectedOutcome=text" \
  -F 'variables={"max_length":"150 words","style":"clear and concise"}'

Example 8: Text-to-Image Generation

Generate an image from a text description. Note that the image file input is optional for this mode.

curl -X POST http://localhost:3000/jobs \
  -F "promptId=generate_image" \
  -F "expectedOutcome=image" \
  -F 'variables={"prompt":"generate a cyberpunk city"}'

Note: This feature uses the configured image generation model (default: gemini-2.5-flash-image).

🏗️ Architecture

System Components

┌─────────────┐
│   Client    │
└──────┬──────┘
       │ HTTP
       ▼
┌─────────────────────────────────────┐
│         Express API Server          │
│  ┌──────────────────────────────┐  │
│  │   Controllers & Routes       │  │
│  │   - Job Submission           │  │
│  │   - Status Retrieval         │  │
│  └──────────────────────────────┘  │
└──────────┬──────────────────────────┘
           │
           ▼
    ┌──────────────┐
    │    Redis     │
    │  - Job Queue │
    │  - Job Data  │
    └──────┬───────┘
           │
           ▼
┌──────────────────────────────────────┐
│         Job Worker Process           │
│  ┌──────────────────────────────┐   │
│  │  1. Dequeue Job              │   │
│  │  2. Load Prompt Template     │   │
│  │  3. Process with Gemini API  │   │
│  │  4. Store Results            │   │
│  └──────────────────────────────┘   │
└──────────────────────────────────────┘

Directory Structure

imagen/
├── data/
│   ├── prompts.json          # Prompt templates
│   ├── uploads/              # Uploaded images
│   └── results/              # Generated results
├── src/
│   ├── config/
│   │   ├── redis.js          # Redis connection
│   │   └── logger.js         # Logging configuration
│   ├── controllers/
│   │   └── jobController.js  # Request handlers
│   ├── middleware/
│   │   ├── errorHandler.js   # Error handling
│   │   └── validator.js      # Input validation
│   ├── repositories/
│   │   └── jobRepository.js  # Job data access
│   ├── routes/
│   │   └── jobRoutes.js      # API routes
│   ├── services/
│   │   ├── geminiService.js  # Gemini API integration
│   │   ├── promptService.js  # Prompt management
│   │   └── queueService.js   # Queue operations
│   ├── utils/
│   │   └── logger.js         # Logging utilities
│   ├── workers/
│   │   └── jobWorker.js      # Background job processor
│   ├── app.js                # Express app setup
│   └── server.js             # Server entry point
├── docker-compose.yml        # Docker services
├── Dockerfile                # Container definition
├── package.json              # Dependencies
├── setup.sh                  # Setup script
└── .env.example              # Environment template

Job Lifecycle

1. Submission: Client uploads image via POST /jobs
2. Validation: Request validated, image stored
3. Queuing: Job added to Redis queue
4. Processing: Worker picks up job, calls Gemini API
5. Completion: Results stored in Redis
6. Retrieval: Client polls GET /jobs/:id for results
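
The lifecycle above can be sketched with in-memory stand-ins: a `Map` plays the role of the stored job records and an array plays the role of the Redis queue. Everything in this block is hypothetical and only mirrors the status transitions described in this README; it is not the project's worker code.

```javascript
const { randomUUID } = require("node:crypto");

// Hypothetical in-memory model of the job lifecycle (no Redis involved).
function createJobStore() {
  const jobs = new Map(); // stands in for stored job data
  const queue = [];       // stands in for the job queue

  return {
    // Submission + queuing: record the job and enqueue its ID.
    submit(promptId, expectedOutcome) {
      const now = new Date().toISOString();
      const job = {
        jobId: randomUUID(),
        status: "pending",
        promptId,
        expectedOutcome,
        createdAt: now,
        updatedAt: now,
        completedAt: null,
      };
      jobs.set(job.jobId, job);
      queue.push(job.jobId);
      return job;
    },

    // Processing + completion: take the oldest job and run the processor.
    async processNext(processor) {
      const jobId = queue.shift();
      if (!jobId) return null;
      const job = jobs.get(jobId);
      job.status = "processing";
      job.updatedAt = new Date().toISOString();
      try {
        job.result = await processor(job);
        job.status = "completed";
      } catch (error) {
        job.error = { message: error.message };
        job.status = "failed";
      }
      job.updatedAt = job.completedAt = new Date().toISOString();
      return job;
    },

    // Retrieval: look the job up by ID.
    get: (jobId) => jobs.get(jobId),
  };
}
```

Here `processor` is a placeholder for the call to the Gemini API; a failed call moves the job to `failed` rather than leaving it in `processing`.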

🔍 Troubleshooting

Common Issues

1. "GEMINI_API_KEY is not set"

Problem: The Gemini API key is missing or invalid.

Solution:

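```
# Edit .env file
nano .env

# Add your API key
GEMINI_API_KEY=your_actual_api_key_here

# Recreate the containers so they pick up the new value
# (a plain restart does not reload environment variables)
docker-compose up -d --force-recreate
```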

2. "Connection refused" to Redis

Problem: Redis is not running or not accessible.

Solution:

```
# Check if Redis container is running
docker-compose ps

# Restart Redis
docker-compose restart redis

# Check Redis logs
docker-compose logs redis
```

3. Jobs stuck in "pending" status

Problem: Worker is not running or crashed.

Solution:

```
# Check worker logs
docker-compose logs worker

# Restart worker
docker-compose restart worker

# Check for errors in logs
docker-compose logs -f worker
```

4. "File too large" error

Problem: Image exceeds maximum file size.

Solution:

```
# Edit .env to increase limit
MAX_FILE_SIZE_MB=20

# Recreate the API container so it picks up the new value
docker-compose up -d --force-recreate api
```

5. "Invalid file type" error

Problem: Unsupported image format.

Solution: Ensure your image is in one of these formats:

- JPEG (.jpg, .jpeg)
- PNG (.png)
- WEBP (.webp)
- GIF (.gif)
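
The two upload errors above correspond to a size check and a MIME type check. A minimal sketch of such checks follows; `validateUpload` and the `{ mimetype, size }` file shape are assumptions for illustration, with defaults mirroring `MAX_FILE_SIZE_MB` and `ALLOWED_FILE_TYPES`.

```javascript
// Hypothetical upload validation; defaults mirror the .env values in this README.
const DEFAULT_ALLOWED_TYPES = ["image/jpeg", "image/png", "image/webp", "image/gif"];

function validateUpload(file, { maxFileSizeMb = 10, allowedTypes = DEFAULT_ALLOWED_TYPES } = {}) {
  if (!allowedTypes.includes(file.mimetype)) {
    return { ok: false, reason: "Invalid file type" };
  }
  if (file.size > maxFileSizeMb * 1024 * 1024) {
    return { ok: false, reason: "File too large" };
  }
  return { ok: true };
}

console.log(validateUpload({ mimetype: "image/bmp", size: 1024 }));
// { ok: false, reason: 'Invalid file type' }
```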

6. Gemini API rate limit errors

Problem: Too many requests to Gemini API.

Solution:

- Wait a few minutes before retrying
- Reduce MAX_CONCURRENT_JOBS in .env
- Check your Gemini API quota
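
Waiting before a retry can also be automated on the client side with exponential backoff. This is a generic sketch, not something the service is known to implement; `backoffDelay`, `withRetry`, and the chosen delays are assumptions.

```javascript
// Hypothetical retry helper using exponential backoff with a cap.
function backoffDelay(attempt, baseMs = 1000, maxMs = 60000) {
  // attempt 0 -> 1 s, 1 -> 2 s, 2 -> 4 s, ... capped at maxMs
  return Math.min(maxMs, baseMs * 2 ** attempt);
}

async function withRetry(operation, { retries = 5, isRetryable = () => true } = {}) {
  for (let attempt = 0; ; attempt += 1) {
    try {
      return await operation();
    } catch (error) {
      // Give up once the retry budget is spent or the error is not transient.
      if (attempt >= retries || !isRetryable(error)) throw error;
      await new Promise((resolve) => setTimeout(resolve, backoffDelay(attempt)));
    }
  }
}

console.log(backoffDelay(3)); // 8000
```

Passing an `isRetryable` predicate lets the caller retry only rate limit failures and surface every other error immediately.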

Debugging

Enable Debug Logging

```
# Edit .env
LOG_LEVEL=debug

# Recreate the containers so they pick up the new value
docker-compose up -d --force-recreate
```

View Logs

```
# All services
docker-compose logs -f

# Specific service
docker-compose logs -f api
docker-compose logs -f worker
docker-compose logs -f redis
```

Check Redis Data

```
# Connect to Redis CLI
docker-compose exec redis redis-cli

# List all jobs
KEYS job:*

# Get job details
HGETALL job:abc123-def456-ghi789

# Check queue length
LLEN queue:jobs
```

Health Check

```
# Check API health
curl http://localhost:3000/health

# Expected response
{"status":"ok","timestamp":"2026-01-17T12:00:00.000Z"}
```

🛠️ Development

Running Tests

```
# Run all tests
npm test

# Run tests in watch mode
npm run test:watch

# Run tests with coverage
npm run test:coverage
```

Code Style

The project uses ESLint for code quality. Run linting with:

npm run lint

Adding New Prompts

Edit data/prompts.json
Add your prompt following this structure:

{
  "id": "your_prompt_id",
  "name": "Your Prompt Name",
  "description": "Description of what this prompt does",
  "template": "Your prompt template with {{variables}}",
  "supportedOutcomes": ["text"],
  "requiredVariables": [],
  "defaultVariables": {
    "variable_name": "default_value"
  },
  "examples": [
    {
      "variables": {
        "variable_name": "example_value"
      },
      "description": "Example description"
    }
  ]
}

Restart the services to load the new prompt
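
To make the `{{variables}}` placeholder syntax concrete, here is one way the substitution could work. This is a sketch under assumptions: `renderTemplate` is a hypothetical name, the template string in the usage example is invented, and treating `requiredVariables` as values the caller must supply is a guess about the intended semantics.

```javascript
// Hypothetical template renderer for prompt definitions shaped like the one above.
function renderTemplate(prompt, variables = {}) {
  // Assumption: required variables must come from the caller, not from defaults.
  const missing = (prompt.requiredVariables || []).filter((name) => !(name in variables));
  if (missing.length > 0) {
    throw new Error(`Missing required variables: ${missing.join(", ")}`);
  }
  // Caller-supplied values override the prompt's defaults.
  const values = { ...prompt.defaultVariables, ...variables };
  // Replace each {{name}} placeholder; unknown names are left untouched.
  return prompt.template.replace(/\{\{\s*(\w+)\s*\}\}/g, (match, name) =>
    name in values ? String(values[name]) : match
  );
}

const examplePrompt = {
  template: "Give a {{detail_level}} description of {{focus_area}}.", // invented template
  requiredVariables: [],
  defaultVariables: { detail_level: "detailed", focus_area: "all visual elements" },
};

console.log(renderTemplate(examplePrompt, { detail_level: "brief" }));
// Give a brief description of all visual elements.
```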

Environment-Specific Configuration

Development:

NODE_ENV=development
LOG_LEVEL=debug

Production:

NODE_ENV=production
LOG_LEVEL=info

📄 License

ISC

🤝 Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

📞 Support

For issues and questions:

Check the Troubleshooting section
Review Gemini API documentation
Open an issue in the repository

Built with ❤️ using Google Gemini AI

thanosa75/imagen
