Image Generator

from shinpr

Image generation and editing with advanced features like multi-image blending and character consistency

MCP Image Generator 🍌

AI image generation and editing MCP server for Cursor, Claude Code, Codex, and any MCP-compatible tool, powered by Nano Banana 2 and Nano Banana Pro (Google Gemini), with optional OpenAI GPT Image support.

An MCP server that turns simple text prompts into high-quality images. Unlike a simple API wrapper, this server automatically enhances your prompt and configures sensible defaults for generation, so you don't need to learn prompt engineering or tune settings. Just describe what you want.

How It Works

You: "cat on a roof"
 ↓
 Your AI assistant infers context
 (purpose, style, mood, resolution...)
 ↓
 MCP optimizes your prompt
 (adds lighting, composition, atmosphere, artistic details)
 ↓
 Image generation with smart defaults
 (grounding, consistency, resolution: all configured automatically)
 ↓
 High-quality image, zero effort

Your AI assistant interprets your intent: the style, purpose, and context behind your request. The MCP focuses on output quality by refining the prompt to meet a structured visual clarity standard and selecting appropriate generation settings. You just describe what you want.

The prompt optimizer uses a Subject–Context–Style framework (powered by Gemini 2.5 Flash by default, or OpenAI Responses when IMAGE_PROVIDER=openai) to fill in missing visual details (subject characteristics, environment, lighting, camera work) while preserving your original intent. It doesn't blindly add details: prompts that already meet the quality standard are left largely intact.

Example of what the optimizer does to a short prompt:

Input: "cat on a roof"

After optimization: "A sleek, midnight black cat, perched with poised elegance on the apex of a weathered, terracotta tile roof. Its emerald eyes, narrowed slightly, reflect the warm glow of a setting sun. Each individual tile is distinct, showing subtle variations in color and texture, with patches of moss clinging to the crevices. The cat's fur is sharply defined, catching the golden hour light, highlighting its sleek contours. In the background, the silhouettes of distant, old-world city buildings with ornate spires are softly blurred, bathed in a gradient of fiery orange, soft pink, and deep violet twilight. A gentle, ethereal mist begins to rise from the alleyways below, adding a touch of mystery. The composition is a medium shot, taken from a slightly low angle, emphasizing the cat's commanding presence against the vast sky. Photorealistic style, captured with a prime lens, wide aperture to create a beautiful bokeh, enhancing the depth of field."

Features

  • Built-in Prompt Optimization: Your simple prompt is automatically enriched with photographic and artistic details (lighting, composition, atmosphere) using Gemini 2.5 Flash by default, or OpenAI Responses when IMAGE_PROVIDER=openai. No prompt engineering skills required.

  • Optional OpenAI Provider: Set IMAGE_PROVIDER=openai to generate and edit images with OpenAI GPT Image models such as gpt-image-2.

  • Three Quality Tiers: Choose between fast iteration, balanced quality, or maximum fidelity with Nano Banana 2 (Gemini 3.1 Flash Image) and Nano Banana Pro (Gemini 3 Pro Image). See Quality Presets.

  • Image Editing: Transform existing images with natural language instructions (image-to-image) while preserving original style and visual consistency.

  • High-Resolution Output: Up to 4K image generation for professional-grade output with superior text rendering and fine details.

  • Flexible Aspect Ratios: From square (1:1) to ultra-wide (21:9) and ultra-tall (1:8) formats.

  • Character Consistency: Maintain consistent character appearance across multiple generations. Ideal for storyboards, product shots, and visual series.

  • Advanced Capabilities:

      • Google Search grounding for real-time factual accuracy

      • World knowledge for photorealistic depictions of historical figures, landmarks, and factual scenarios

      • Multi-image blending for composite scenes

      • Purpose-aware generation (e.g., "cookbook cover" produces different results than "social media post")

  • Multiple Output Formats: PNG, JPEG, WebP support.

Agent Skill: Image Generation Prompt Guide

This project also provides a standalone Agent Skill (SKILL.md) that teaches AI assistants to write better image generation prompts. No MCP server or API key is required.

Note: This skill does not generate images itself. It teaches your AI assistant to write better prompts for tools that already have built-in image generation (e.g., Cursor's native image generation).

The skill is based on the Subject–Context–Style framework and covers prompt structure, visual details (lighting, textures, camera angles), advanced techniques (character consistency, composition), and image editing. It works with any image model (Gemini, GPT Image, Flux, Stable Diffusion, Midjourney, etc.).

Install

npx mcp-image skills install --path <path>

The skill will be placed at <path>/image-generation/SKILL.md. Specify the skills directory for your AI tool:

# Cursor
npx mcp-image skills install --path ~/.cursor/skills

# Codex
npx mcp-image skills install --path ~/.codex/skills

# Claude Code
npx mcp-image skills install --path ~/.claude/skills

When to Use the Skill vs the MCP Server

| | MCP Server | Agent Skill |
|---|---|---|
| Use when | Your AI tool does not have built-in image generation | Your AI tool already generates images natively |
| Requires | Gemini API key | Nothing |
| What it does | Generates images via Gemini API with automatic prompt optimization | Teaches the AI to write better prompts |
| Works with | MCP-compatible tools (Cursor, Claude Code, Codex, etc.) | Any tool supporting the Agent Skills open standard |

Quality Presets

Choose the right balance of speed, quality, and cost:

| Preset | Model | Best for | Speed |
|---|---|---|---|
| fast (default) | Nano Banana 2 (Gemini 3.1 Flash Image) | Quick iterations, drafts, high-volume generation | ~30–40s |
| balanced | Nano Banana 2 + Thinking | Production images, good quality with reasonable speed | Medium |
| quality | Nano Banana Pro (Gemini 3 Pro Image) | Final deliverables, maximum fidelity, critical visuals | Slow |

Set the default via the IMAGE_QUALITY environment variable:

IMAGE_QUALITY=fast # (default) Fastest generation
IMAGE_QUALITY=balanced # Enhanced thinking for better quality
IMAGE_QUALITY=quality # Maximum quality output

To override per-request, just tell your AI assistant (e.g., "generate in high quality" or "use balanced quality"). The assistant will pass the appropriate quality parameter automatically.
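The precedence described above (per-request value first, then IMAGE_QUALITY, then the `fast` default) can be sketched in a few lines. This is an illustrative model only, not the server's code; the function name `resolve_quality` is hypothetical.

```python
import os

# Preset names taken from the Quality Presets section.
VALID_PRESETS = ("fast", "balanced", "quality")


def resolve_quality(request_quality=None, env=None):
    """Pick the preset for one request (hypothetical helper).

    Order: explicit per-request value, then IMAGE_QUALITY, then "fast".
    """
    env = os.environ if env is None else env
    chosen = request_quality or env.get("IMAGE_QUALITY") or "fast"
    if chosen not in VALID_PRESETS:
        raise ValueError(f"unknown quality preset: {chosen!r}")
    return chosen


# A per-request value wins over the environment default.
print(resolve_quality("quality", env={"IMAGE_QUALITY": "balanced"}))  # quality
print(resolve_quality(env={"IMAGE_QUALITY": "balanced"}))  # balanced
print(resolve_quality(env={}))  # fast
```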

Codex:

[mcp_servers.mcp-image.env]
GEMINI_API_KEY = "your_gemini_api_key_here"
IMAGE_QUALITY = "balanced"

Cursor: Add "IMAGE_QUALITY": "balanced" to the env section in your config.
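As a rough illustration of where that entry goes, the sketch below adds IMAGE_QUALITY to the env section of an in-memory config. It assumes the config keeps servers under a top-level `mcpServers` object and launches the server with `npx -y mcp-image`; check your own Cursor config for the actual layout. The helper `with_image_quality` is hypothetical.

```python
import copy
import json


def with_image_quality(config, quality, server="mcp-image"):
    """Return a copy of the config with IMAGE_QUALITY set in the server's env."""
    updated = copy.deepcopy(config)
    # Assumption: servers live under a top-level "mcpServers" object.
    entry = updated["mcpServers"][server]
    entry.setdefault("env", {})["IMAGE_QUALITY"] = quality
    return updated


# Example starting config (placeholder API key, assumed layout).
example = {
    "mcpServers": {
        "mcp-image": {
            "command": "npx",
            "args": ["-y", "mcp-image"],
            "env": {"GEMINI_API_KEY": "your_gemini_api_key_here"},
        }
    }
}

print(json.dumps(with_image_quality(example, "balanced"), indent=2))
```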

Claude Code:

claude mcp add mcp-image --env GEMINI_API_KEY=your-api-key --env IMAGE_QUALITY=balanced --env IMAGE_OUTPUT_DIR=/absolute/path/to/images -- npx -y mcp-image

Skip Prompt Enhancement

Set SKIP_PROMPT_ENHANCEMENT=true to disable automatic prompt optimization and send your prompts directly to the image generator. Useful when you need full control over the exact prompt wording.

Provider Configuration

| Variable | Default | Description |
|---|---|---|
| IMAGE_PROVIDER | gemini | gemini or openai |
| GEMINI_API_KEY | - | Required when IMAGE_PROVIDER=gemini |
| OPENAI_API_KEY | - | Required when IMAGE_PROVIDER=openai |
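The rules in the table above can be checked before launching the server. The following is a minimal sketch of such a pre-flight check, not the server's own validation; `check_provider_env` is a hypothetical name.

```python
# Which API key each provider needs, per the table above.
REQUIRED_KEY = {"gemini": "GEMINI_API_KEY", "openai": "OPENAI_API_KEY"}


def check_provider_env(env):
    """Return the active provider, or raise if its API key is missing."""
    provider = env.get("IMAGE_PROVIDER", "gemini")  # gemini is the default
    if provider not in REQUIRED_KEY:
        raise ValueError(f"IMAGE_PROVIDER must be gemini or openai, got {provider!r}")
    key_name = REQUIRED_KEY[provider]
    if not env.get(key_name):
        raise ValueError(f"{key_name} is required when IMAGE_PROVIDER={provider}")
    return provider


print(check_provider_env({"GEMINI_API_KEY": "your_gemini_api_key_here"}))  # gemini
```

Passing `os.environ` instead of a literal dict checks the current shell environment.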

Using the OpenAI provider

Set IMAGE_PROVIDER=openai to use OpenAI for both prompt enhancement and image generation. mcp-image currently uses gpt-4o-mini for prompt enhancement and gpt-image-2 for image generation. These model choices are fixed by the server and are not configurable through environment variables.

OpenAI may require organization verification before allowing access to gpt-image-2. If image generation fails with a 403 permission or verification error, check your organization settings: https://platform.openai.com/settings/organization/general

OpenAI provider behavior:

  • Supports text-to-image and image-to-image generation.

  • Supports aspectRatio, mapped to the closest supported OpenAI image size.

  • Supports imageSize values 1K, 2K, and 4K.

  • Maps quality as fast -> low, balanced -> medium, and quality -> high.

  • Does not support useGoogleSearch; that option is only available with the Gemini provider.
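The quality mapping and the useGoogleSearch restriction listed above can be modeled as follows. This is a sketch under assumptions: the function name is hypothetical, and raising an error for useGoogleSearch is only one possible behavior, since how the server itself reacts is not stated here.

```python
# Preset-to-OpenAI quality mapping from the list above.
OPENAI_QUALITY = {"fast": "low", "balanced": "medium", "quality": "high"}


def to_openai_options(params):
    """Translate generate_image-style params into OpenAI-side options (illustrative)."""
    if params.get("useGoogleSearch"):
        # Assumption: reject the option; the real server may handle it differently.
        raise ValueError("useGoogleSearch is only available with the Gemini provider")
    preset = params.get("quality", "fast")
    return {"quality": OPENAI_QUALITY[preset]}


print(to_openai_options({"quality": "balanced"}))  # {'quality': 'medium'}
```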

Prompt enhancement uses a separate OpenAI Responses API call. Set SKIP_PROMPT_ENHANCEMENT=true to send prompts directly to the image model.

API Reference

generate_image Tool

The server uses a two-stage process with separate models for each stage:

  • Prompt Optimization (Gemini 2.5 Flash by default, or gpt-4o-mini via OpenAI Responses in OpenAI mode): Refines your prompt using the Subject–Context–Style framework. Skippable via SKIP_PROMPT_ENHANCEMENT.

  • Image Generation (Nano Banana 2/Pro by default, or gpt-image-2 in OpenAI mode): Creates the final image. In Gemini mode the model varies by quality preset; in OpenAI mode the model is pinned and quality maps to OpenAI's low/medium/high.
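The two stages above compose as a simple pipeline: optimize the prompt unless enhancement is skipped, then hand the result to the image model. The sketch below uses stand-in callables for both stages; `run_pipeline`, `enhance`, and `render` are hypothetical names and no real model is called.

```python
def run_pipeline(prompt, enhance, render, skip_enhancement=False):
    """Stage 1: optionally optimize the prompt. Stage 2: generate the image."""
    final_prompt = prompt if skip_enhancement else enhance(prompt)
    return render(final_prompt)


# Stand-ins for the two model calls, for illustration only.
def enhance(prompt):
    return f"{prompt}, golden hour light, medium shot"


def render(prompt):
    return f"<image for: {prompt}>"


print(run_pipeline("cat on a roof", enhance, render))
# With skip_enhancement=True the prompt reaches the image stage unchanged,
# which mirrors SKIP_PROMPT_ENHANCEMENT=true.
print(run_pipeline("cat on a roof", enhance, render, skip_enhancement=True))
```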

Parameters

| Parameter | Type | Required | Description |
|---|---|---|---|
| prompt | string | ✅ | Text description or editing instruction |
| quality | string | - | Quality preset: fast (default), balanced, quality. Overrides IMAGE_QUALITY env var for this request |
| inputImagePath | string | - | Absolute path to input image for image-to-image editing |
| fileName | string | - | Custom filename for output (auto-generated if not specified) |
| aspectRatio | string | - | 1:1 (default), 2:3, 3:2, 3:4, 4:3, 4:5, 5:4, 9:16, 16:9, 21:9, 1:4, 1:8, 4:1, 8:1 |
| imageSize | string | - | 1K, 2K, 4K. Leave unspecified for standard quality |
| blendImages | boolean | - | Enable multi-image blending for combining multiple visual elements naturally |
| maintainCharacterConsistency | boolean | - | Maintain character appearance consistency across different poses and scenes |
| useWorldKnowledge | boolean | - | Use real-world knowledge for accurate context (historical figures, landmarks, factual scenarios) |
| useGoogleSearch | boolean | - | Enable Google Search grounding for real-time factual accuracy |
| purpose | string | - | Intended use (e.g., "cookbook cover", "social media post"). Helps tailor visual style and details |
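To make the parameter constraints concrete, here is a small sketch that assembles a generate_image argument object and checks the enumerated values listed above. The helper `build_arguments` is hypothetical; in normal use your AI assistant fills these fields in for you.

```python
# Allowed values copied from the parameter list above.
ASPECT_RATIOS = {
    "1:1", "2:3", "3:2", "3:4", "4:3", "4:5", "5:4",
    "9:16", "16:9", "21:9", "1:4", "1:8", "4:1", "8:1",
}
IMAGE_SIZES = {"1K", "2K", "4K"}
QUALITY_PRESETS = {"fast", "balanced", "quality"}
ENUM_CHECKS = {
    "aspectRatio": ASPECT_RATIOS,
    "imageSize": IMAGE_SIZES,
    "quality": QUALITY_PRESETS,
}


def build_arguments(prompt, **options):
    """Assemble a generate_image argument dict, rejecting unknown enum values."""
    if not prompt:
        raise ValueError("prompt is required")
    args = {"prompt": prompt}
    for name, value in options.items():
        allowed = ENUM_CHECKS.get(name)
        if allowed is not None and value not in allowed:
            raise ValueError(f"{name}={value!r} is not one of {sorted(allowed)}")
        args[name] = value
    return args


print(build_arguments("cat on a roof", aspectRatio="16:9", imageSize="2K",
                      purpose="social media post"))
```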

Response

{
  "type": "resource",
  "resource": {
    "uri": "file:///path/to/generated/image.png",
    "name": "image-filename.png",
    "mimeType": "image/png"
  },
  "metadata": {
    "model": "gemini-3.1-flash-image-preview",
    "provider": "gemini",
    "processingTime": 5000,
    "timestamp": "2026-01-01T12:00:00.000Z"
  }
}
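A client that receives this response will usually want the local file path and a few metadata fields. The sketch below parses the sample above with the standard library only; it assumes a POSIX-style `file://` URI, and `summarize_response` is a hypothetical helper name.

```python
import json
from urllib.parse import unquote, urlparse

# The sample response from above, as a JSON string.
SAMPLE = """
{
  "type": "resource",
  "resource": {
    "uri": "file:///path/to/generated/image.png",
    "name": "image-filename.png",
    "mimeType": "image/png"
  },
  "metadata": {
    "model": "gemini-3.1-flash-image-preview",
    "provider": "gemini",
    "processingTime": 5000,
    "timestamp": "2026-01-01T12:00:00.000Z"
  }
}
"""


def summarize_response(raw):
    """Extract the file path, MIME type, and model from a response string."""
    data = json.loads(raw)
    resource = data["resource"]
    # Assumption: POSIX-style file URI; Windows paths need extra handling.
    path = unquote(urlparse(resource["uri"]).path)
    metadata = data.get("metadata", {})
    return {
        "path": path,
        "mimeType": resource["mimeType"],
        "model": metadata.get("model"),
    }


print(summarize_response(SAMPLE)["path"])  # /path/to/generated/image.png
```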

License

MIT License - see LICENSE for details.

Need help? Open an issue or check the troubleshooting section above.