AI Vision MCP Server
A powerful Model Context Protocol (MCP) server that provides AI-powered image and video analysis using Google Gemini and Vertex AI models.
Features
- Dual Provider Support: Choose between Google Gemini API and Vertex AI
- Multimodal Analysis: Support for both image and video content analysis
- Flexible File Handling: Upload via multiple methods (URLs, local files, base64)
- Storage Integration: Built-in Google Cloud Storage support
- Comprehensive Validation: Zod-based data validation throughout
- Error Handling: Robust error handling with retry logic and circuit breakers
- TypeScript: Full TypeScript support with strict type checking
Quick Start
Prerequisites
You can use either the `google` provider (Google AI Studio) or the `vertex_ai` provider (Vertex AI). For simplicity, the `google` provider is recommended.
Set the environment variables below for the provider you choose. (Note: it's recommended to set your MCP client's timeout to more than 5 minutes.)
(i) Using Google AI Studio Provider
```bash
export IMAGE_PROVIDER="google" # or vertex_ai
export VIDEO_PROVIDER="google" # or vertex_ai
export GEMINI_API_KEY="your-gemini-api-key"
```
Get your API key from Google AI Studio.
(ii) Using Vertex AI Provider
```bash
export IMAGE_PROVIDER="vertex_ai"
export VIDEO_PROVIDER="vertex_ai"
export VERTEX_CLIENT_EMAIL="your-service-account@project.iam.gserviceaccount.com"
export VERTEX_PRIVATE_KEY="-----BEGIN PRIVATE KEY-----\n...\n-----END PRIVATE KEY-----\n"
export VERTEX_PROJECT_ID="your-gcp-project-id"
export GCS_BUCKET_NAME="your-gcs-bucket"
```
Refer to the Vertex AI setup guideline for how to set this up.
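Because image and video analysis can use different providers, the set of required variables depends on both `IMAGE_PROVIDER` and `VIDEO_PROVIDER`. The sketch below shows one way a startup check could enforce the lists above. `loadProviderConfig` and `ProviderConfig` are hypothetical names for illustration, not this package's API (the Features list says the server itself uses Zod for validation).

```typescript
// Hypothetical startup check for the provider environment variables.
// ProviderConfig and loadProviderConfig are illustrative names, not the package API.
type Provider = "google" | "vertex_ai";
type Env = Record<string, string | undefined>;

interface ProviderConfig {
  imageProvider: Provider;
  videoProvider: Provider;
}

// Variables each provider needs, taken from the lists above.
const REQUIRED_VARS: Record<Provider, string[]> = {
  google: ["GEMINI_API_KEY"],
  vertex_ai: ["VERTEX_CLIENT_EMAIL", "VERTEX_PRIVATE_KEY", "VERTEX_PROJECT_ID", "GCS_BUCKET_NAME"],
};

function parseProvider(name: string, value: string | undefined): Provider {
  if (value === "google" || value === "vertex_ai") return value;
  throw new Error(`${name} must be "google" or "vertex_ai", got ${JSON.stringify(value)}`);
}

function loadProviderConfig(env: Env): ProviderConfig {
  const imageProvider = parseProvider("IMAGE_PROVIDER", env["IMAGE_PROVIDER"]);
  const videoProvider = parseProvider("VIDEO_PROVIDER", env["VIDEO_PROVIDER"]);
  // Image and video may use different providers, so require the union of both lists.
  const needed = REQUIRED_VARS[imageProvider].concat(REQUIRED_VARS[videoProvider]);
  // Keep each variable once (first occurrence) and only if it is unset or empty.
  const missing = needed.filter((key, i) => needed.indexOf(key) === i && !env[key]);
  if (missing.length > 0) {
    throw new Error(`Missing required environment variables: ${missing.join(", ")}`);
  }
  return { imageProvider, videoProvider };
}
```

In Node you would call `loadProviderConfig(process.env)` once at startup so that a misconfigured provider fails fast instead of on the first tool call.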
Installation
Below are installation guides for this MCP server on different MCP clients: Claude Desktop, Claude Code, Cursor, Cline, and other MCP-compatible clients.
<details> <summary>Claude Desktop</summary>

Add the following to your Claude Desktop configuration:
(i) Using Google AI Studio Provider
```json
{
  "mcpServers": {
    "ai-vision-mcp": {
      "command": "npx",
      "args": ["ai-vision-mcp"],
      "env": {
        "IMAGE_PROVIDER": "google",
        "VIDEO_PROVIDER": "google",
        "GEMINI_API_KEY": "your-gemini-api-key"
      }
    }
  }
}
```
(ii) Using Vertex AI Provider
```json
{
  "mcpServers": {
    "ai-vision-mcp": {
      "command": "npx",
      "args": ["ai-vision-mcp"],
      "env": {
        "IMAGE_PROVIDER": "vertex_ai",
        "VIDEO_PROVIDER": "vertex_ai",
        "VERTEX_CLIENT_EMAIL": "your-service-account@project.iam.gserviceaccount.com",
        "VERTEX_PRIVATE_KEY": "-----BEGIN PRIVATE KEY-----\n...\n-----END PRIVATE KEY-----\n",
        "VERTEX_PROJECT_ID": "your-gcp-project-id",
        "GCS_BUCKET_NAME": "ai-vision-mcp-{VERTEX_PROJECT_ID}"
      }
    }
  }
}
```
</details>

Claude Code
(i) Using Google AI Studio Provider
```bash
claude mcp add ai-vision-mcp \
  -e IMAGE_PROVIDER=google \
  -e VIDEO_PROVIDER=google \
  -e GEMINI_API_KEY=your-gemini-api-key \
  -- npx ai-vision-mcp
```
(ii) Using Vertex AI Provider
```bash
claude mcp add ai-vision-mcp \
  -e IMAGE_PROVIDER=vertex_ai \
  -e VIDEO_PROVIDER=vertex_ai \
  -e VERTEX_CLIENT_EMAIL=your-service-account@project.iam.gserviceaccount.com \
  -e VERTEX_PRIVATE_KEY="-----BEGIN PRIVATE KEY-----\n...\n-----END PRIVATE KEY-----\n" \
  -e VERTEX_PROJECT_ID=your-gcp-project-id \
  -e GCS_BUCKET_NAME=ai-vision-mcp-{VERTEX_PROJECT_ID} \
  -- npx ai-vision-mcp
```
Note: Increase the MCP startup timeout to 1 minute and the MCP tool execution timeout to about 5 minutes by updating `~/.claude/settings.json` as follows:
```json
{
  "env": {
    "MCP_TIMEOUT": "60000",
    "MCP_TOOL_TIMEOUT": "300000"
  }
}
```
Cursor
Go to: Settings -> Cursor Settings -> MCP -> Add new global MCP server
The recommended approach is to paste the following configuration into your Cursor `~/.cursor/mcp.json` file. You can also install it for a specific project by creating `.cursor/mcp.json` in your project folder. See the Cursor MCP docs for more info.
(i) Using Google AI Studio Provider
```json
{
  "mcpServers": {
    "ai-vision-mcp": {
      "command": "npx",
      "args": ["ai-vision-mcp"],
      "env": {
        "IMAGE_PROVIDER": "google",
        "VIDEO_PROVIDER": "google",
        "GEMINI_API_KEY": "your-gemini-api-key"
      }
    }
  }
}
```
(ii) Using Vertex AI Provider
```json
{
  "mcpServers": {
    "ai-vision-mcp": {
      "command": "npx",
      "args": ["ai-vision-mcp"],
      "env": {
        "IMAGE_PROVIDER": "vertex_ai",
        "VIDEO_PROVIDER": "vertex_ai",
        "VERTEX_CLIENT_EMAIL": "your-service-account@project.iam.gserviceaccount.com",
        "VERTEX_PRIVATE_KEY": "-----BEGIN PRIVATE KEY-----\n...\n-----END PRIVATE KEY-----\n",
        "VERTEX_PROJECT_ID": "your-gcp-project-id",
        "GCS_BUCKET_NAME": "ai-vision-mcp-{VERTEX_PROJECT_ID}"
      }
    }
  }
}
```
Cline
Cline uses a JSON configuration file to manage MCP servers. To integrate the provided MCP server configuration:
- Open Cline and click on the MCP Servers icon in the top navigation bar.
- Select the Installed tab, then click Advanced MCP Settings.
- In the cline_mcp_settings.json file, add the following configuration:
(i) Using Google AI Studio Provider
```json
{
  "mcpServers": {
    "ai-vision-mcp": {
      "timeout": 300,
      "type": "stdio",
      "command": "npx",
      "args": ["ai-vision-mcp"],
      "env": {
        "IMAGE_PROVIDER": "google",
        "VIDEO_PROVIDER": "google",
        "GEMINI_API_KEY": "your-gemini-api-key"
      }
    }
  }
}
```
(ii) Using Vertex AI Provider
```json
{
  "mcpServers": {
    "ai-vision-mcp": {
      "timeout": 300,
      "type": "stdio",
      "command": "npx",
      "args": ["ai-vision-mcp"],
      "env": {
        "IMAGE_PROVIDER": "vertex_ai",
        "VIDEO_PROVIDER": "vertex_ai",
        "VERTEX_CLIENT_EMAIL": "your-service-account@project.iam.gserviceaccount.com",
        "VERTEX_PRIVATE_KEY": "-----BEGIN PRIVATE KEY-----\n...\n-----END PRIVATE KEY-----\n",
        "VERTEX_PROJECT_ID": "your-gcp-project-id",
        "GCS_BUCKET_NAME": "ai-vision-mcp-{VERTEX_PROJECT_ID}"
      }
    }
  }
}
```
Other MCP clients
The server uses the stdio transport and follows the standard MCP protocol. It can be integrated with any MCP-compatible client by running:
```bash
npx ai-vision-mcp
```
MCP Tools
The server provides four main MCP tools:
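Each tool is invoked through an MCP `tools/call` request, which is a JSON-RPC 2.0 message; over the stdio transport, every message is written to the server's stdin as a single line of JSON. The sketch below only builds such a message to show its shape. `buildToolCall` is a hypothetical helper, and a real client must first complete the MCP `initialize` handshake (client SDKs normally handle this for you).

```typescript
// Hypothetical helper: build a newline-delimited JSON-RPC 2.0 "tools/call"
// message, the shape an MCP client sends to this server over stdio.
interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: Record<string, unknown> };
}

function buildToolCall(id: number, name: string, args: Record<string, unknown>): string {
  const request: ToolCallRequest = {
    jsonrpc: "2.0",
    id,
    method: "tools/call",
    params: { name, arguments: args },
  };
  // stdio messages are newline-delimited and must not contain embedded newlines;
  // JSON.stringify without an indent argument emits a single line.
  return JSON.stringify(request) + "\n";
}

// Example: the request a client would send to run analyze_image.
const line = buildToolCall(1, "analyze_image", {
  imageSource: "https://example.com/image.jpg",
  prompt: "Describe this image",
});
```

The `arguments` object is exactly the JSON shown in each tool's examples below.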
1) analyze_image
Analyzes an image using AI and returns a detailed description.
Parameters:
- `imageSource` (string): URL, base64 data, or file path to the image
- `prompt` (string): Question or instruction for the AI
- `mode` (string, optional): Analysis mode, one of:
  - `general` (default): General image analysis
  - `palette`: Extract design tokens (colors, spacing, typography)
  - `hierarchy`: Analyze visual hierarchy and eye flow
  - `components`: Catalog UI components and design system maturity
- `options` (object, optional): Analysis options, including temperature and max tokens
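Since `imageSource` accepts three different forms, the server has to decide which one it received before loading the image. A rough sketch of one way to do that with a simple prefix heuristic; `classifyImageSource` is hypothetical, and the real server may distinguish sources differently (for example, it might also accept raw base64 without a `data:` prefix).

```typescript
// Hypothetical heuristic for telling the three imageSource forms apart.
type ImageSourceKind = "url" | "base64" | "file";

function classifyImageSource(source: string): ImageSourceKind {
  const s = source.trim();
  // Remote images are fetched over HTTP(S).
  if (/^https?:\/\//i.test(s)) return "url";
  // Inline images arrive as data URIs, e.g. "data:image/jpeg;base64,...".
  if (/^data:image\/[a-z0-9.+-]+;base64,/i.test(s)) return "base64";
  // Anything else is treated as a local path (POSIX or Windows).
  return "file";
}
```

Checking URLs and data URIs first keeps the fallback simple: every remaining string is handed to the file loader, which then reports a clear "file not found" error if the guess was wrong.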
Examples:
- General image analysis:
```json
{
  "imageSource": "https://plus.unsplash.com/premium_photo-1710965560034-778eedc929ff",
  "prompt": "What is this image about? Describe what you see in detail."
}
```
- Extract design tokens:
```json
{
  "imageSource": "https://example.com/design.png",
  "prompt": "Extract all design tokens from this screenshot",
  "mode": "palette"
}
```
- Analyze visual hierarchy:
```json
{
  "imageSource": "C:\\Users\\username\\Downloads\\ui_mockup.png",
  "prompt": "Analyze the visual hierarchy and eye flow",
  "mode": "hierarchy"
}
```
- Component inventory:
```json
{
  "imageSource": "https://example.com/design-system.png",
  "prompt": "List all UI components and evaluate design system maturity",
  "mode": "components"
}
```
2) compare_images
Compares multiple images using AI and returns a detailed comparison analysis.
Parameters:
- `imageSources` (array): Array of image sources (URLs, base64 data, or file paths); minimum 2, maximum 4 images
- `prompt` (string): Question or instruction for comparing the images
- `options` (object, optional): Analysis options, including temperature and max tokens
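The 2 to 4 image limit is cheap to enforce before any model call is made. A minimal sketch of such a pre-flight check; `validateImageSources` is a hypothetical name, and the server itself reportedly validates inputs with Zod rather than hand-written checks like these.

```typescript
// Hypothetical pre-flight check mirroring the documented limits for compare_images.
const MIN_IMAGES = 2;
const MAX_IMAGES = 4;

function validateImageSources(imageSources: unknown): string[] {
  if (!Array.isArray(imageSources)) {
    throw new TypeError("imageSources must be an array");
  }
  if (imageSources.length < MIN_IMAGES || imageSources.length > MAX_IMAGES) {
    throw new RangeError(
      `imageSources must contain between ${MIN_IMAGES} and ${MAX_IMAGES} images, got ${imageSources.length}`
    );
  }
  // Each entry must be a usable string (URL, data URI, or file path).
  imageSources.forEach((src, i) => {
    if (typeof src !== "string" || src.trim() === "") {
      throw new TypeError(`imageSources[${i}] must be a non-empty string`);
    }
  });
  return imageSources as string[];
}
```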
Examples:
- Compare images from URLs:
```json
{
  "imageSources": [
    "https://example.com/image1.jpg",
    "https://example.com/image2.jpg"
  ],
  "prompt": "Compare these two images and tell me the differences"
}
```
- Compare mixed sources:
```json
{
  "imageSources": [
    "https://example.com/image1.jpg",
    "C:\\Users\\username\\Downloads\\image2.jpg",
    "data:image/jpeg;base64,/9j/4AAQSkZJRgAB..."
  ],
  "prompt": "Which image has the best lighting quality?"
}
```
3) detect_objects_in_image
Detects objects in an image using AI vision models and generates an annotated image with bounding boxes. Returns the detected objects with their coordinates and saves the annotated image either to a specified file or to a temporary directory.
Parameters:
- `imageSource` (string): URL, base64 data, or file path to the image
- `prompt` (string): Custom detection prompt describing what to detect or recognize in the image
- `outputFilePath` (string, optional): Explicit output path for the annotated image
Configuration:
This tool uses optimized default parameters for object detection and does not accept a runtime `options` parameter. To customize the AI parameters (temperature, topP, topK, maxTokens), set the following environment variables:
```bash
# Recommended settings for object detection (these values are also the defaults)
TEMPERATURE_FOR_DETECT_OBJECTS_IN_IMAGE=0.0 # Deterministic responses
TOP_P_FOR_DETECT_OBJECTS_IN_IMAGE=0.95 # Nucleus sampling
TOP_K_FOR_DETECT_OBJECTS_IN_IMAGE=30 # Vocabulary selection
MAX_TOKENS_FOR_DETECT_OBJECTS_IN_IMAGE=8192 # High token limit for JSON output
```
File Handling Logic:
- Explicit `outputFilePath` provided → saves to the exact path specified
- No `outputFilePath` provided → automatically saves to a temporary directory
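The two branches above can be sketched as a small path-resolution function. This is a hypothetical illustration: the temporary directory is passed in (in Node you might use `os.tmpdir()`), and the `annotated-<timestamp>.png` file name pattern is an assumption, not the tool's actual naming scheme.

```typescript
// Hypothetical sketch of the output-path decision for detect_objects_in_image.
interface ResolvedOutput {
  kind: "file" | "tempFile"; // mirrors the file/tempFile response types
  path: string;
}

function resolveOutputPath(
  outputFilePath: string | undefined,
  tempDir: string,
  now: Date = new Date()
): ResolvedOutput {
  if (outputFilePath && outputFilePath.trim() !== "") {
    // Explicit path: save exactly where the caller asked.
    return { kind: "file", path: outputFilePath };
  }
  // No explicit path: auto-generate a file name inside the temp directory.
  const name = `annotated-${now.getTime()}.png`;
  // Reuse the directory's own separator so Windows and POSIX paths both work.
  const sep = tempDir.indexOf("\\") >= 0 ? "\\" : "/";
  const dir = tempDir.charAt(tempDir.length - 1) === sep ? tempDir : tempDir + sep;
  return { kind: "tempFile", path: dir + name };
}
```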
Response Types:
- Returns a `file` object when an explicit `outputFilePath` is provided
- Returns a `tempFile` object when no explicit `outputFilePath` is provided, since the annotated image is then auto-saved to a temporary folder
- Always includes a `detections` array with the detected objects and their coordinates
- Includes a `summary` with percentage-based coordinates for browser automation
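Percentage-based coordinates are useful for browser automation because a percentage maps onto a page or viewport of any size, whereas pixel coordinates only match the original screenshot. A minimal sketch of that conversion, assuming a pixel bounding box and the image dimensions as input; the `BoundingBox` shape and the `toPercentCenter` and `toViewportPixels` helpers are hypothetical and may not match the actual fields in `detections` or `summary`.

```typescript
// Hypothetical shapes; the real detections/summary fields may differ.
interface BoundingBox {
  x: number; // left edge, pixels
  y: number; // top edge, pixels
  width: number;
  height: number;
}

interface PercentPoint {
  xPercent: number; // 0-100, measured from the left
  yPercent: number; // 0-100, measured from the top
}

// Convert a pixel box into the percentage position of its center point.
function toPercentCenter(box: BoundingBox, imageWidth: number, imageHeight: number): PercentPoint {
  if (imageWidth <= 0 || imageHeight <= 0) {
    throw new RangeError("image dimensions must be positive");
  }
  const round2 = (v: number) => Math.round(v * 100) / 100; // two decimals
  return {
    xPercent: round2(((box.x + box.width / 2) / imageWidth) * 100),
    yPercent: round2(((box.y + box.height / 2) / imageHeight) * 100),
  };
}

// Map a percentage point onto a viewport of a different size, e.g. for a click.
function toViewportPixels(p: PercentPoint, viewportWidth: number, viewportHeight: number) {
  return {
    x: Math.round((p.xPercent / 100) * viewportWidth),
    y: Math.round((p.yPercent / 100) * viewportHeight),
  };
}
```

For example, a 200×100 box at (100, 50) in a 1000×500 screenshot has its center at 20% / 20%, which lands at (256, 144) in a 1280×720 viewport.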
Examples:
- Basic object detection:
```json
{
  "imageSource": "https://example.com/image.jpg",
  "prompt": "Detect all objects in this image"
}
```
- Save annotated image to specific path:
```json
{
  "imageSource": "C:\\Users\\username\\Downloads\\image.jpg",
  "prompt": "Detect all objects in this image",
  "outputFilePath": "C:\\Users\\username\\Documents\\annotated_image.png"
}
```
- Custom detection prompt:
```json
{
  "imageSource": "data:image/jpeg;base64,/9j/4AAQSkZJRgAB...",
  "prompt": "Detect and label all electronic devices in this image"
}
```
4) audit_design
Audits UI/UX design compliance with pixel-level analysis and AI critique.
This tool provides automated design compliance auditing using pure TypeScript/JavaScript pixel analysis combined with Gemini Vision API critique. It extracts dominant colors, detects visual complexity, validates WCAG contrast ratios, and generates actionable design recommendations.
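The WCAG contrast check is defined by a published formula, so it can be illustrated precisely. The sketch below implements the WCAG 2.x relative-luminance and contrast-ratio definitions; the function names are our own and the code is not taken from this server's source. WCAG level AA (success criterion 1.4.3) requires at least 4.5:1 for normal text and 3:1 for large text.

```typescript
// WCAG 2.x contrast ratio between two sRGB colors given as "#rrggbb" or "#rgb".
function hexToRgb(hex: string): [number, number, number] {
  let h = hex.replace(/^#/, "");
  if (h.length === 3) h = h.split("").map((c) => c + c).join(""); // expand shorthand
  if (!/^[0-9a-f]{6}$/i.test(h)) throw new Error(`invalid color: ${hex}`);
  return [0, 2, 4].map((i) => parseInt(h.slice(i, i + 2), 16)) as [number, number, number];
}

// Relative luminance per WCAG: linearize each sRGB channel, then weight it.
function relativeLuminance(hex: string): number {
  const [r, g, b] = hexToRgb(hex).map((c) => {
    const s = c / 255;
    return s <= 0.03928 ? s / 12.92 : Math.pow((s + 0.055) / 1.055, 2.4);
  });
  return 0.2126 * r + 0.7152 * g + 0.0722 * b;
}

// Ratio ranges from 1 (identical colors) to 21 (black on white).
function contrastRatio(a: string, b: string): number {
  const la = relativeLuminance(a);
  const lb = relativeLuminance(b);
  const [lighter, darker] = la >= lb ? [la, lb] : [lb, la];
  return (lighter + 0.05) / (darker + 0.05);
}

// AA threshold: 4.5:1 for normal text, 3:1 for large text.
function passesAA(fg: string, bg: string, largeText = false): boolean {
  return contrastRatio(fg, bg) >= (largeText ? 3 : 4.5);
}
```

Mid-gray `#777777` on white comes out at roughly 4.48:1, just below the AA threshold for normal text, which is the kind of near-miss an automated audit is good at catching.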
Inspired by: Automating UX/UI Design Analysis with Python, Machine Learning, and LLMs by Jade Graham
Parameters:
…