A production-ready REST API for processing images using Google's Gemini AI model. This service provides asynchronous job processing with Redis-backed queuing, supporting various image analysis tasks like OCR, object detection, scene understanding, and more.
- Asynchronous Processing: Submit jobs and retrieve results when ready
- Multiple Analysis Types: Pre-configured prompts for different image analysis and generation tasks
- Image Generation: Generate images from text descriptions using state-of-the-art models
- Flexible Templating: Customizable prompts with variable substitution
- Redis Queue: Reliable job queuing and status tracking
- Docker Support: Easy deployment with Docker Compose
- Production Ready: Comprehensive error handling, logging, and graceful shutdown
- Docker and Docker Compose (recommended)
- Node.js 20+ (for local development)
- Redis 7+ (included in Docker setup)
- Google Gemini API key
- Clone the repository:
  git clone <repository-url>
  cd imagen
- Run the setup script:
  chmod +x setup.sh
  ./setup.sh
- Configure your API key: edit the .env file and add your Gemini API key:
  GEMINI_API_KEY=your_actual_api_key_here
- Start the services:
  docker-compose up --build
The API will be available at http://localhost:3000
- Install dependencies:
  npm install
- Set up environment:
  cp .env.example .env
  # Edit .env and add your GEMINI_API_KEY
- Start Redis (if not using Docker):
  redis-server
- Start the API server:
  npm start
- Start the worker (in a separate terminal):
  node src/workers/jobWorker.js
Configure the application by editing the .env file:
# Server Configuration
PORT=3000
NODE_ENV=development
LOG_LEVEL=info
# Gemini API Configuration
GEMINI_API_KEY=your_gemini_api_key_here
GEMINI_MODEL=gemini-1.5-pro
GEMINI_IMAGE_MODEL=gemini-2.5-flash-image
# Redis Configuration
REDIS_URL=redis://localhost:6379
REDIS_PASSWORD=
REDIS_DB=0
# Job Processing Configuration
MAX_CONCURRENT_JOBS=5
JOB_TIMEOUT_MS=300000
JOB_TTL=86400
# File Upload Configuration
MAX_FILE_SIZE_MB=10
ALLOWED_FILE_TYPES=image/jpeg,image/png,image/webp,image/gif
# Rate Limiting
RATE_LIMIT_WINDOW_MS=900000
RATE_LIMIT_MAX_REQUESTS=100
The API documentation is available in interactive Swagger UI format at:
http://localhost:3000/doc
This interface allows you to explore endpoints, view schemas, and test API calls directly from your browser.
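For illustration, the settings listed in the .env example earlier could be read into a typed object roughly as follows. This is a minimal sketch, not the project's actual configuration code: `loadConfig` and `intFromEnv` are hypothetical names, the fallbacks simply mirror the sample values, and treating JOB_TTL as seconds is an assumption.

```javascript
// Hypothetical helper: parse an integer setting, falling back when it is unset.
function intFromEnv(env, name, fallback) {
  const raw = env[name];
  if (raw === undefined || raw === "") return fallback;
  const value = Number.parseInt(raw, 10);
  if (Number.isNaN(value)) {
    throw new Error(`${name} must be an integer, got "${raw}"`);
  }
  return value;
}

// Hypothetical loader: fallbacks mirror the sample .env values in this README.
function loadConfig(env = process.env) {
  const maxFileSizeMb = intFromEnv(env, "MAX_FILE_SIZE_MB", 10);
  const allowed =
    env.ALLOWED_FILE_TYPES || "image/jpeg,image/png,image/webp,image/gif";
  return {
    port: intFromEnv(env, "PORT", 3000),
    redisUrl: env.REDIS_URL || "redis://localhost:6379",
    maxConcurrentJobs: intFromEnv(env, "MAX_CONCURRENT_JOBS", 5),
    jobTimeoutMs: intFromEnv(env, "JOB_TIMEOUT_MS", 300000), // 5 minutes
    jobTtlSeconds: intFromEnv(env, "JOB_TTL", 86400), // 24 hours if the unit is seconds (assumption)
    maxFileSizeBytes: maxFileSizeMb * 1024 * 1024,
    allowedFileTypes: allowed.split(",").map((t) => t.trim()).filter(Boolean),
    rateLimitWindowMs: intFromEnv(env, "RATE_LIMIT_WINDOW_MS", 900000), // 15 minutes
    rateLimitMaxRequests: intFromEnv(env, "RATE_LIMIT_MAX_REQUESTS", 100),
  };
}

// Usage: const config = loadConfig();
```

Failing fast on a non-numeric value tends to surface a mistyped .env entry at startup rather than as a confusing runtime error later.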
GET /prompts/show
Retrieve a list of all available prompt templates.
Response (200 OK):
{
"prompts": [
{
"id": "describe_image",
"name": "Image Description",
"description": "Provides a detailed description of an image.",
"requiredVariables": [],
"supportedOutcomes": ["text"]
},
...
]
}
POST /jobs
Submit an image for processing with a specific prompt.
Request:
- Content-Type: multipart/form-data
- Body:
  - image (file, required except for image generation): Image file (JPEG, PNG, WEBP, GIF)
  - promptId (string, required): ID of the prompt to use
  - expectedOutcome (string, required): Either "text" or "image"
  - variables (JSON object, optional): Variables for prompt template
Response (201 Created):
{
"jobId": "550e8400-e29b-41d4-a716-446655440000",
"status": "pending",
"createdAt": "2026-01-17T12:00:00.000Z"
}
GET /jobs/:id
Retrieve the status and results of a job.
Response (200 OK):
Pending/Processing:
{
"jobId": "550e8400-e29b-41d4-a716-446655440000",
"status": "processing",
"promptId": "describe_image",
"expectedOutcome": "text",
"createdAt": "2026-01-17T12:00:00.000Z",
"updatedAt": "2026-01-17T12:00:05.000Z",
"completedAt": null
}
Completed:
{
"jobId": "550e8400-e29b-41d4-a716-446655440000",
"status": "completed",
"promptId": "describe_image",
"expectedOutcome": "text",
"createdAt": "2026-01-17T12:00:00.000Z",
"updatedAt": "2026-01-17T12:00:15.000Z",
"completedAt": "2026-01-17T12:00:15.000Z",
"result": {
"text": "A detailed description of the image..."
},
"metadata": {
"processingTime": 8500,
"promptUsed": "Analyze this image and provide a detailed description...",
"modelVersion": "gemini-1.5-pro"
}
}
Failed:
{
"jobId": "550e8400-e29b-41d4-a716-446655440000",
"status": "failed",
"promptId": "describe_image",
"expectedOutcome": "text",
"createdAt": "2026-01-17T12:00:00.000Z",
"updatedAt": "2026-01-17T12:00:10.000Z",
"completedAt": "2026-01-17T12:00:10.000Z",
"error": {
"message": "API rate limit exceeded",
"code": "RATE_LIMIT_ERROR"
}
}
GET /jobs/:id/image
Download the processed image (for jobs with expectedOutcome: "image").
Response (200 OK):
- Content-Type: image/jpeg (or appropriate image MIME type)
- Body: Binary image data
Note: The text model (gemini-1.5-pro) returns text descriptions rather than generated images. This endpoint applies to jobs that use the configured image generation model (GEMINI_IMAGE_MODEL).
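As a rough illustration of the POST /jobs request described above, a Node.js 20+ client could build the multipart body with the built-in `FormData`, `Blob`, and `fetch`. This is a sketch under assumptions: `buildJobForm` and `submitJob` are hypothetical helper names, and the MIME type is guessed from the file extension because the server is assumed to check it against ALLOWED_FILE_TYPES.

```javascript
const { readFile } = require("node:fs/promises");
const path = require("node:path");

// Extension-to-MIME map for the formats this README lists as supported.
const MIME_BY_EXT = {
  ".jpg": "image/jpeg",
  ".jpeg": "image/jpeg",
  ".png": "image/png",
  ".webp": "image/webp",
  ".gif": "image/gif",
};

// Build the multipart body. `variables` is sent as a JSON string,
// matching the curl examples in this README. `image` may be null.
function buildJobForm({ promptId, expectedOutcome, variables }, image) {
  const form = new FormData();
  form.append("promptId", promptId);
  form.append("expectedOutcome", expectedOutcome);
  if (variables) form.append("variables", JSON.stringify(variables));
  if (image) form.append("image", image.blob, image.filename);
  return form;
}

// Submit an image job and return the parsed response body
// (jobId, status, createdAt according to the documented 201 response).
async function submitJob(baseUrl, imagePath, fields) {
  const bytes = await readFile(imagePath);
  const type = MIME_BY_EXT[path.extname(imagePath).toLowerCase()];
  if (!type) throw new Error(`Unsupported file extension: ${imagePath}`);
  const image = {
    blob: new Blob([bytes], { type }),
    filename: path.basename(imagePath),
  };
  const res = await fetch(`${baseUrl}/jobs`, {
    method: "POST",
    body: buildJobForm(fields, image),
  });
  if (!res.ok) throw new Error(`Job submission failed with HTTP ${res.status}`);
  return res.json();
}

// Usage:
// submitJob("http://localhost:3000", "/path/to/photo.jpg", {
//   promptId: "describe_image",
//   expectedOutcome: "text",
// }).then((job) => console.log(job.jobId));
```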
The API includes 8 pre-configured prompts in data/prompts.json:
Provides a detailed description of an image.
Variables:
- detail_level (optional, default: "detailed"): Level of detail
- focus_area (optional, default: "all visual elements, composition, colors, and subjects"): What to focus on
Example:
{
"variables": {
"detail_level": "brief",
"focus_area": "the main subject only"
}
}
Extracts all visible text from an image.
Variables:
- format (optional, default: "plain text"): Output format
- layout (optional, default: "original layout and structure"): How to preserve layout
Example:
{
"variables": {
"format": "JSON with text and coordinates",
"layout": "line-by-line structure"
}
}
Identifies and extracts dominant colors from an image.
Variables:
- count (optional, default: "5"): Number of colors to extract
- format (optional, default: "hex color codes with color names"): Color format
- additional_info (optional, default: "the approximate percentage of each color in the image"): Extra information
Example:
{
"variables": {
"count": "3",
"format": "RGB values",
"additional_info": "color mood and palette description"
}
}
Detects and identifies objects in an image.
Variables:
- object_type (optional, default: "objects and items"): Type of objects to detect
- details (optional, default: "the name, approximate location, and size"): Details to include
- confidence_instruction (optional, default: "Include your confidence level for each detection."): Confidence handling
Example:
{
"variables": {
"object_type": "people and faces",
"details": "count, positions, and any visible attributes",
"confidence_instruction": "Only include detections with high confidence."
}
}
Classifies an image into categories.
Variables:
- aspects (optional, default: "subject matter, style, mood, and context"): Classification aspects
- output_format (optional, default: "a list of categories with confidence scores"): Output format
Example:
{
"variables": {
"aspects": "artistic style and genre only",
"output_format": "the top 3 most relevant categories"
}
}
Provides comprehensive understanding of a scene.
Variables:
- elements (optional, default: "the setting, objects, people, activities, and atmosphere"): Scene elements
- context (optional, default: "the likely context, time of day, location type, and any notable details"): Context information
Example:
{
"variables": {
"elements": "the environment and weather conditions",
"context": "outdoor/indoor setting and time of day"
}
}
Compares and analyzes similarities and differences between images.
Variables:
- comparison_aspects (optional, default: "visual similarities, differences, composition, and style"): What to compare
- output_focus (optional, default: "key differences and notable similarities"): Output focus
Example:
{
"variables": {
"comparison_aspects": "color palette and lighting only",
"output_focus": "technical differences in photography"
}
}
Creates detailed alt-text descriptions for accessibility.
Variables:
- elements (optional, default: "all important visual information, text, and context"): Elements to include
- style (optional, default: "clear, concise, and descriptive"): Description style
- max_length (optional, default: "250 words"): Maximum length
Example:
{
"variables": {
"elements": "essential information only",
"style": "brief and factual",
"max_length": "100 words"
}
}
curl http://localhost:3000/prompts/show
curl -X POST http://localhost:3000/jobs \
-F "image=@/path/to/photo.jpg" \
-F "promptId=describe_image" \
-F "expectedOutcome=text"
Response:
{
"jobId": "abc123-def456-ghi789",
"status": "pending",
"createdAt": "2026-01-17T12:00:00.000Z"
}
curl -X POST http://localhost:3000/jobs \
-F "image=@/path/to/document.png" \
-F "promptId=ocr" \
-F "expectedOutcome=text" \
-F 'variables={"format":"JSON with text and coordinates","layout":"line-by-line structure"}'
curl http://localhost:3000/jobs/abc123-def456-ghi789
Response:
{
"jobId": "abc123-def456-ghi789",
"status": "completed",
"promptId": "describe_image",
"expectedOutcome": "text",
"createdAt": "2026-01-17T12:00:00.000Z",
"updatedAt": "2026-01-17T12:00:15.000Z",
"completedAt": "2026-01-17T12:00:15.000Z",
"result": {
"text": "The image shows a sunset over a calm ocean with vibrant orange and pink hues in the sky..."
},
"metadata": {
"processingTime": 8500,
"promptUsed": "Analyze this image and provide a detailed description. Focus on all visual elements, composition, colors, and subjects.",
"modelVersion": "gemini-1.5-pro"
}
}
curl -X POST http://localhost:3000/jobs \
-F "image=@/path/to/artwork.jpg" \
-F "promptId=extract_colors" \
-F "expectedOutcome=text" \
-F 'variables={"count":"3","format":"hex color codes with color names"}'
curl -X POST http://localhost:3000/jobs \
-F "image=@/path/to/street_scene.jpg" \
-F "promptId=object_detection" \
-F "expectedOutcome=text" \
-F 'variables={"object_type":"vehicles and pedestrians","details":"count and approximate positions"}'
curl -X POST http://localhost:3000/jobs \
-F "image=@/path/to/infographic.png" \
-F "promptId=accessibility_description" \
-F "expectedOutcome=text" \
-F 'variables={"max_length":"150 words","style":"clear and concise"}'
Generate an image from a text description. Note that the image file input is optional for this mode.
curl -X POST http://localhost:3000/jobs \
-F "promptId=generate_image" \
-F "expectedOutcome=image" \
-F 'variables={"prompt":"generate a cyberpunk city"}'
Note: This feature uses the configured image generation model (default: gemini-2.5-flash-image).
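Once an image job has completed, its output can be fetched from GET /jobs/:id/image. The sketch below saves that response to disk from Node.js 20+. `downloadJobImage` and `extensionFor` are hypothetical helper names, and choosing the file extension from the Content-Type header is an assumption about how the server labels its output.

```javascript
const { writeFile } = require("node:fs/promises");

// Map common image MIME types to file extensions; anything else becomes ".bin".
const EXTENSIONS = {
  "image/jpeg": "jpg",
  "image/png": "png",
  "image/webp": "webp",
  "image/gif": "gif",
};

function extensionFor(contentType) {
  // Drop any parameters such as "; charset=..." before the lookup.
  const mime = (contentType || "").split(";")[0].trim().toLowerCase();
  return EXTENSIONS[mime] || "bin";
}

// Download the image for a completed job and write it next to `outputBasename`.
// Returns the path of the file that was written.
async function downloadJobImage(baseUrl, jobId, outputBasename) {
  const res = await fetch(`${baseUrl}/jobs/${encodeURIComponent(jobId)}/image`);
  if (!res.ok) throw new Error(`Image download failed with HTTP ${res.status}`);
  const file = `${outputBasename}.${extensionFor(res.headers.get("content-type"))}`;
  await writeFile(file, Buffer.from(await res.arrayBuffer()));
  return file;
}

// Usage:
// downloadJobImage("http://localhost:3000", "abc123-def456-ghi789", "cyberpunk-city")
//   .then((file) => console.log(`Saved ${file}`));
```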
┌─────────────┐
│   Client    │
└──────┬──────┘
       │ HTTP
       ▼
┌────────────────────────────────────┐
│         Express API Server         │
│  ┌──────────────────────────────┐  │
│  │ Controllers & Routes         │  │
│  │ - Job Submission             │  │
│  │ - Status Retrieval           │  │
│  └──────────────────────────────┘  │
└─────────────────┬──────────────────┘
                  │
                  ▼
          ┌───────────────┐
          │     Redis     │
          │ - Job Queue   │
          │ - Job Data    │
          └───────┬───────┘
                  │
                  ▼
┌────────────────────────────────────┐
│         Job Worker Process         │
│  ┌──────────────────────────────┐  │
│  │ 1. Dequeue Job               │  │
│  │ 2. Load Prompt Template      │  │
│  │ 3. Process with Gemini API   │  │
│  │ 4. Store Results             │  │
│  └──────────────────────────────┘  │
└────────────────────────────────────┘
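The four worker steps in the diagram can be sketched with in-memory stand-ins. This is only an illustration of the control flow, not the code in src/workers/jobWorker.js: the array and Map below take the place of Redis, `callModel` is a stub for the Gemini call, and the names `enqueue` and `processNext` are hypothetical.

```javascript
// In-memory stand-ins (assumptions for illustration only).
const queue = []; // plays the role of the Redis job queue
const jobs = new Map(); // plays the role of the Redis job data

function enqueue(job) {
  jobs.set(job.jobId, { ...job, status: "pending" });
  queue.push(job.jobId);
}

// Process one job, or return null when the queue is empty.
// `prompts` maps prompt IDs to templates; `callModel` stands in for the Gemini API.
async function processNext(prompts, callModel) {
  const jobId = queue.shift(); // 1. Dequeue job
  if (jobId === undefined) return null;

  const job = jobs.get(jobId);
  job.status = "processing";
  try {
    const prompt = prompts[job.promptId]; // 2. Load prompt template
    if (!prompt) throw new Error(`Unknown prompt: ${job.promptId}`);

    const text = await callModel(prompt.template, job); // 3. Process (stubbed)

    job.status = "completed"; // 4. Store results
    job.result = { text };
  } catch (err) {
    // A failure is recorded on the job instead of crashing the worker.
    job.status = "failed";
    job.error = { message: err.message };
  }
  job.completedAt = new Date().toISOString();
  return job;
}
```

Recording failures on the job record, rather than letting the exception escape, is what would let a client see a "failed" status when polling.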
imagen/
├── data/
│   ├── prompts.json          # Prompt templates
│   ├── uploads/              # Uploaded images
│   └── results/              # Generated results
├── src/
│   ├── config/
│   │   ├── redis.js          # Redis connection
│   │   └── logger.js         # Logging configuration
│   ├── controllers/
│   │   └── jobController.js  # Request handlers
│   ├── middleware/
│   │   ├── errorHandler.js   # Error handling
│   │   └── validator.js      # Input validation
│   ├── repositories/
│   │   └── jobRepository.js  # Job data access
│   ├── routes/
│   │   └── jobRoutes.js      # API routes
│   ├── services/
│   │   ├── geminiService.js  # Gemini API integration
│   │   ├── promptService.js  # Prompt management
│   │   └── queueService.js   # Queue operations
│   ├── utils/
│   │   └── logger.js         # Logging utilities
│   ├── workers/
│   │   └── jobWorker.js      # Background job processor
│   ├── app.js                # Express app setup
│   └── server.js             # Server entry point
├── docker-compose.yml        # Docker services
├── Dockerfile                # Container definition
├── package.json              # Dependencies
├── setup.sh                  # Setup script
└── .env.example              # Environment template
- Submission: Client uploads image via POST /jobs
- Validation: Request validated, image stored
- Queuing: Job added to Redis queue
- Processing: Worker picks up job, calls Gemini API
- Completion: Results stored in Redis
- Retrieval: Client polls GET /jobs/:id for results
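The retrieval step could be implemented on the client side as a simple polling loop. The sketch below is one possible approach for Node.js 20+: `waitForJob` is a hypothetical helper, the 2-second interval and 5-minute timeout are arbitrary defaults, and `fetchImpl` exists only so that the function can be exercised without a running server.

```javascript
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Poll GET /jobs/:id until the job reaches "completed" or "failed".
// Resolves with the job body on completion; rejects on failure or timeout.
async function waitForJob(
  baseUrl,
  jobId,
  { intervalMs = 2000, timeoutMs = 300000, fetchImpl = fetch } = {}
) {
  const deadline = Date.now() + timeoutMs;
  while (true) {
    const res = await fetchImpl(`${baseUrl}/jobs/${encodeURIComponent(jobId)}`);
    if (!res.ok) throw new Error(`Status check failed with HTTP ${res.status}`);

    const job = await res.json();
    if (job.status === "completed") return job;
    if (job.status === "failed") {
      throw new Error(job.error?.message ?? "Job failed");
    }

    // Still "pending" or "processing": give up if the next poll would be too late.
    if (Date.now() + intervalMs > deadline) {
      throw new Error(`Timed out waiting for job ${jobId}`);
    }
    await sleep(intervalMs);
  }
}

// Usage:
// waitForJob("http://localhost:3000", "abc123-def456-ghi789")
//   .then((job) => console.log(job.result.text))
//   .catch((err) => console.error(err.message));
```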
Problem: The Gemini API key is missing or invalid.
Solution:
# Edit .env file
nano .env
# Add your API key
GEMINI_API_KEY=your_actual_api_key_here
# Restart services
docker-compose restart
Problem: Redis is not running or not accessible.
Solution:
# Check if Redis container is running
docker-compose ps
# Restart Redis
docker-compose restart redis
# Check Redis logs
docker-compose logs redis
Problem: Worker is not running or crashed.
Solution:
# Check worker logs
docker-compose logs worker
# Restart worker
docker-compose restart worker
# Check for errors in logs
docker-compose logs -f worker
Problem: Image exceeds maximum file size.
Solution:
# Edit .env to increase limit
MAX_FILE_SIZE_MB=20
# Restart API
docker-compose restart api
Problem: Unsupported image format.
Solution: Ensure your image is in one of these formats:
- JPEG (.jpg, .jpeg)
- PNG (.png)
- WEBP (.webp)
- GIF (.gif)
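A client can avoid both the file-size and file-format errors by checking a file before uploading it. The following is a minimal sketch; `validateUpload` is a hypothetical helper, and its defaults copy the sample MAX_FILE_SIZE_MB and ALLOWED_FILE_TYPES values rather than reading the server's real limits.

```javascript
// Defaults mirror the sample .env values in this README (assumption).
const DEFAULT_ALLOWED_TYPES = ["image/jpeg", "image/png", "image/webp", "image/gif"];

// Returns null when the file looks acceptable, or a human-readable reason otherwise.
// `file` is expected to have `mimetype` (string) and `size` (bytes).
function validateUpload(
  file,
  { allowedTypes = DEFAULT_ALLOWED_TYPES, maxFileSizeMb = 10 } = {}
) {
  if (!allowedTypes.includes(file.mimetype)) {
    return `Unsupported file type: ${file.mimetype}`;
  }
  if (file.size > maxFileSizeMb * 1024 * 1024) {
    return `File exceeds the ${maxFileSizeMb} MB limit`;
  }
  return null;
}

// Usage:
// const problem = validateUpload({ mimetype: "image/png", size: 2_500_000 });
// if (problem) console.error(problem);
```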
Problem: Too many requests to Gemini API.
Solution:
- Wait a few minutes before retrying
- Reduce MAX_CONCURRENT_JOBS in .env
- Check your Gemini API quota
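"Wait before retrying" is commonly automated with exponential backoff. The sketch below shows the general technique rather than anything this project is known to implement: `backoffDelayMs` and `withRetry` are hypothetical helpers, and the 1-second base, 60-second cap, and four retries are arbitrary choices.

```javascript
// Delay doubles with each attempt (1s, 2s, 4s, ...) up to a fixed cap.
function backoffDelayMs(attempt, { baseMs = 1000, maxMs = 60000 } = {}) {
  return Math.min(maxMs, baseMs * 2 ** attempt);
}

// Run `task`, retrying with exponential backoff while `isRetryable(err)` is true.
// Rethrows the last error once `retries` additional attempts are used up.
async function withRetry(
  task,
  { retries = 4, isRetryable = () => true, baseMs, maxMs } = {}
) {
  for (let attempt = 0; ; attempt++) {
    try {
      return await task();
    } catch (err) {
      if (attempt >= retries || !isRetryable(err)) throw err;
      const delay = backoffDelayMs(attempt, { baseMs, maxMs });
      await new Promise((resolve) => setTimeout(resolve, delay));
    }
  }
}

// Usage: retry only errors tagged with the RATE_LIMIT_ERROR code shown in
// the failed-job example (how that code reaches the caller is an assumption).
// withRetry(() => callSomething(), {
//   isRetryable: (err) => err.code === "RATE_LIMIT_ERROR",
// });
```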
# Edit .env
LOG_LEVEL=debug
# Restart services
docker-compose restart
# All services
docker-compose logs -f
# Specific service
docker-compose logs -f api
docker-compose logs -f worker
docker-compose logs -f redis
# Connect to Redis CLI
docker-compose exec redis redis-cli
# List all jobs
KEYS job:*
# Get job details
HGETALL job:abc123-def456-ghi789
# Check queue length
LLEN queue:jobs
# Check API health
curl http://localhost:3000/health
# Expected response
{"status":"ok","timestamp":"2026-01-17T12:00:00.000Z"}
# Run all tests
npm test
# Run tests in watch mode
npm run test:watch
# Run tests with coverage
npm run test:coverage
The project uses ESLint for code quality. Run linting with:
npm run lint
- Edit data/prompts.json
- Add your prompt following this structure:
{
"id": "your_prompt_id",
"name": "Your Prompt Name",
"description": "Description of what this prompt does",
"template": "Your prompt template with {{variables}}",
"supportedOutcomes": ["text"],
"requiredVariables": [],
"defaultVariables": {
"variable_name": "default_value"
},
"examples": [
{
"variables": {
"variable_name": "example_value"
},
"description": "Example description"
}
]
}
- Restart the services to load the new prompt
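To make the roles of `template`, `defaultVariables`, and `requiredVariables` concrete, here is a minimal sketch of how such a prompt might be rendered. It assumes `{{name}}` placeholders as suggested by the structure above; `renderPrompt` is a hypothetical function, and leaving unknown placeholders untouched is an illustrative choice, not necessarily what src/services/promptService.js does.

```javascript
// Render a prompt template by merging caller variables over the defaults.
// Throws if any name listed in `requiredVariables` is not supplied.
function renderPrompt(prompt, variables = {}) {
  const missing = (prompt.requiredVariables || []).filter(
    (name) => !Object.hasOwn(variables, name)
  );
  if (missing.length > 0) {
    throw new Error(`Missing required variables: ${missing.join(", ")}`);
  }

  // Caller-supplied values override the prompt's defaults.
  const values = { ...prompt.defaultVariables, ...variables };

  // Replace each {{name}}; placeholders without a value are left as-is.
  return prompt.template.replace(/\{\{\s*(\w+)\s*\}\}/g, (match, name) =>
    Object.hasOwn(values, name) ? String(values[name]) : match
  );
}

// Usage with an illustrative prompt (not taken from data/prompts.json):
// renderPrompt(
//   {
//     template: "Give a {{detail_level}} description focusing on {{focus_area}}.",
//     defaultVariables: { detail_level: "detailed", focus_area: "all subjects" },
//     requiredVariables: [],
//   },
//   { detail_level: "brief" }
// );
```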
Development:
NODE_ENV=development
LOG_LEVEL=debug
Production:
NODE_ENV=production
LOG_LEVEL=info
License: ISC
Contributions are welcome! Please feel free to submit a Pull Request.
For issues and questions:
- Check the Troubleshooting section
- Review Gemini API documentation
- Open an issue in the repository
Built with ❤️ using Google Gemini AI