AI image generation for coding agents. Craft optimal prompts, generate images via Google's Nano Banana (Gemini) models, self-review output, and post-process — all from your AI coding assistant.
Works with Claude Code, Codex CLI, Gemini CLI, GitHub Copilot, Cursor, OpenCode, and any tool supporting the SKILL.md open standard.
Pick your platform:
Claude Code (recommended):
# 1. Add the marketplace
claude plugin marketplace add harshkedia177/image-gen-plugin
# 2. Install the plugin
claude plugin install image-genCross-platform (Codex, Gemini CLI, Copilot, Cursor, and more):
npx skills add harshkedia177/image-gen-pluginClaude Code (local dev / one-off):
git clone https://github.com/harshkedia177/image-gen-plugin.git
claude --plugin-dir ./image-gen-pluginManual install (any agent):
# Copy the cross-platform skills into your project
git clone https://github.com/harshkedia177/image-gen-plugin.git
cp -r image-gen-plugin/.agents/skills/* .agents/skills/
rm -rf image-gen-pluginGet a free API key from Google AI Studio and set it:
export GEMINI_API_KEY="your-api-key-here"Add it to your shell profile (~/.zshrc, ~/.bashrc) to persist across sessions.
Just ask your coding agent:
Generate an image of a golden retriever puppy sitting in a sunlit meadow
Or use the slash command in Claude Code:
/image-gen:generate a cyberpunk city at night with neon reflections
That's it. The agent handles prompt crafting, model selection, API calls, and saving the output.
When you ask for an image, the skill:
- Gathers requirements — asks 2-3 smart clarifying questions (style, mood, composition) or skips straight to generation if your description is detailed enough
- Selects the prompting strategy — natural language for simple requests, JSON structured prompts for complex multi-element scenes
- Picks the right model — Nano Banana 2 for speed, Pro for maximum quality
- Crafts an optimized prompt — applies Gemini-specific best practices (priority ordering, constraint handling, text rendering tricks)
- Generates the image — calls the Gemini API with proper
imageConfig(resolution, aspect ratio, thinking mode) - Self-reviews — evaluates the output for prompt adherence, technical quality, and composition; regenerates with targeted fixes if needed
- Post-processes — removes backgrounds, converts formats if requested
- Saves with metadata — image file + markdown sidecar with the exact prompt, settings, and review notes
| Feature | Description |
|---|---|
| Smart prompt crafting | Natural language or JSON structured prompts based on complexity |
| Two models | Nano Banana 2 (fast, ~3s) and Pro (max quality, ~10-20s) |
| Thinking mode | Configurable reasoning for complex scenes (NB2 only, 3 levels) |
| Search grounding | Google Search for real-world accuracy — landmarks, logos, products (NB2 only) |
| Self-review loop | Automatically evaluates and improves output based on complexity |
| Conversational editing | Refine images with follow-ups instead of regenerating from scratch |
| Batch generation | Multiple variations from one concept — vary style, lighting, aspect ratio |
| Text rendering | Text-first workflow for accurate in-image text (signs, posters, logos) |
| Character consistency | 360-degree sheets, identity locking for multi-image character work |
| Background removal | Transparent PNGs via rembg — great for logos and stickers |
| Format conversion | Export as PNG, JPEG, or WebP |
| 14 aspect ratios | From 1:1 to 21:9 ultrawide, 9:16 stories, and more |
| 5 resolutions | 512px preview to 4K print-ready |
| Markdown sidecars | Every image saved with its prompt, model, settings, and review notes |
| Model | API ID | Speed | Cost | Best For |
|---|---|---|---|---|
| Nano Banana 2 | gemini-3.1-flash-image-preview |
~3s at 1K | ~$0.067/2K | Fast iteration, batch, web-grounded generation |
| Nano Banana Pro | gemini-3-pro-image-preview |
~10-20s at 1K | ~$0.134/2K | Maximum quality, complex scenes, professional output |
Fallback: gemini-2.5-flash-image if primary models return 404. Auto-discovers latest models as a last resort.
Simple generation:
Generate a watercolor painting of a lighthouse at sunset
With specific requirements:
Create a 16:9 product photo of a matte black coffee mug on a marble countertop,
soft morning light from the left, shallow depth of field, no text
Batch generation:
/image-gen:batch Generate 4 variations of a minimalist logo for "Brew Lab" coffee shop
— vary the style: line art, geometric, hand-drawn, modern sans-serif
With post-processing:
Generate a cartoon mascot character on a white background, then remove the background
to make it a transparent PNG
Conversational editing:
> [generates initial image]
Make the lighting warmer and add a slight lens flare from the top right
Prompt optimization (fix a bad prompt):
Optimize this image prompt: "cat, cute, fluffy, orange, garden, sunny, sitting, flowers"
Convert from another model:
Optimize this Midjourney prompt for Nano Banana: "cyberpunk city --ar 16:9 --v 6 --style raw"
GEMINI_API_KEY— required. Get one free at Google AI Studio- Python 3.9+ with
rembg— only needed for background removal:pip install rembg
Create .claude/image-gen.local.md in your project to customize defaults:
---
output_dir: ./generated-images
default_model: nano-banana-2
default_resolution: 1K
default_aspect_ratio: 1:1
review_rounds: 0
save_prompt: true
thinking_level: auto
search_grounding: false
---| Setting | Options | Default | Description |
|---|---|---|---|
output_dir |
Any path | ./generated-images |
Where images are saved |
default_model |
nano-banana-2, nano-banana-pro |
nano-banana-2 |
Generation model |
default_resolution |
512px, 0.5K, 1K, 2K, 4K |
1K |
Output resolution |
default_aspect_ratio |
14 ratios (see below) | 1:1 |
Output aspect ratio |
review_rounds |
0-3 |
0 (auto) |
Self-review iterations. 0 = auto based on complexity |
save_prompt |
true, false |
true |
Save markdown sidecar alongside image |
thinking_level |
auto, off, moderate, advanced |
auto |
NB2 thinking mode. Advanced costs 20-40% more |
search_grounding |
true, false |
false |
Enable Google Search for real-world subjects (NB2 only) |
Aspect ratios: 1:1 2:3 3:2 3:4 4:3 4:5 5:4 9:16 16:9 21:9 1:4 4:1 1:8 8:1
| Command | Description |
|---|---|
/image-gen:generate [description] |
Generate a single image. Interactive if no description provided. |
/image-gen:batch [concept] |
Generate multiple variations from one concept. |
| Skill | What It Provides |
|---|---|
| image-generation | Complete end-to-end generation workflow (universal, works on all platforms) |
| prompt-optimizer | Takes rough/bad prompts and rewrites them — diagnoses issues, applies Gemini best practices, recommends settings |
| nano-banana-prompting | Prompt engineering expertise — NL vs JSON strategy, best practices, examples |
| image-post-processing | Background removal and format conversion |
Each skill includes detailed reference files for advanced techniques, character consistency, JSON schema, prompt examples, and full API documentation.
image-gen-plugin/
├── .agents/skills/ # Cross-platform skills (Codex, Gemini CLI, Copilot)
│ ├── image-generation/ # Universal end-to-end workflow
│ ├── prompt-optimizer/ # Prompt diagnosis and rewriting
│ ├── nano-banana-prompting/# Prompt crafting knowledge + references
│ └── image-post-processing/# Background removal & format conversion
├── .claude-plugin/ # Claude Code plugin manifest
├── commands/ # Claude Code slash commands
│ ├── generate.md # /image-gen:generate
│ └── batch.md # /image-gen:batch
├── agents/ # Claude Code subagents
│ ├── prompt-crafter.md # Prompt engineering specialist
│ └── image-reviewer.md # Quality review specialist
├── skills/ # Claude Code plugin skills
│ ├── prompt-optimizer/
│ ├── nano-banana-prompting/
│ └── image-post-processing/
└── scripts/
└── remove-bg.py # Background removal script (rembg)
The plugin calls Google's Gemini API via curl with structured imageConfig parameters for reliable resolution and aspect ratio control. It uses a 3-step model resolution chain:
- Primary model (
gemini-3.1-flash-image-previeworgemini-3-pro-image-preview) - Fallback (
gemini-2.5-flash-image) if primary returns 404 - Auto-discover latest available image models as last resort
Prompts are crafted following Gemini-specific best practices:
- Priority-ordered elements (subject first, constraints last)
- JSON structured prompts for complex scenes to prevent concept bleeding
- Text wrapped in quotes for in-image rendering
imageConfigfor resolution/aspect ratio (not prompt text)- Thinking mode for spatial reasoning in complex compositions
- Search grounding for real-world visual accuracy
MIT