media-gen-mcp
<p align="center"> <a href="https://www.npmjs.com/package/media-gen-mcp"><img src="https://img.shields.io/npm/v/media-gen-mcp?label=media-gen-mcp&color=brightgreen" alt="media-gen-mcp"></a> <a href="https://www.npmjs.com/package/@modelcontextprotocol/sdk"><img src="https://img.shields.io/npm/v/@modelcontextprotocol/sdk?label=MCP%20SDK&color=blue" alt="MCP SDK"></a> <a href="https://www.npmjs.com/package/openai"><img src="https://img.shields.io/npm/v/openai?label=OpenAI%20SDK&color=blueviolet" alt="OpenAI SDK"></a> <a href="https://github.com/punkpeye/mcp-proxy"><img src="https://img.shields.io/github/stars/punkpeye/mcp-proxy?label=mcp-proxy&style=social" alt="mcp-proxy"></a> <a href="https://github.com/yjacquin/fast-mcp"><img src="https://img.shields.io/github/stars/yjacquin/fast-mcp?label=fast-mcp&style=social" alt="fast-mcp"></a> <a href="https://github.com/strato-space/media-gen-mcp/blob/main/LICENSE"><img src="https://img.shields.io/github/license/strato-space/media-gen-mcp?color=brightgreen" alt="License"></a> <a href="https://github.com/strato-space/media-gen-mcp/stargazers"><img src="https://img.shields.io/github/stars/strato-space/media-gen-mcp?style=social" alt="GitHub stars"></a> <a href="https://github.com/strato-space/media-gen-mcp/actions"><img src="https://img.shields.io/github/actions/workflow/status/strato-space/media-gen-mcp/main.yml?label=build&logo=github" alt="Build Status"></a> </p>

Media Gen MCP is a strict TypeScript Model Context Protocol (MCP) server for OpenAI Images (`gpt-image-1.5`, `gpt-image-1`), OpenAI Videos (Sora), and Google GenAI Videos (Veo): generate/edit images, create/remix video jobs, and fetch media from URLs or disk with smart `resource_link` vs inline image outputs and optional `sharp` processing. Production-focused (full strict typecheck, ESLint + Vitest CI). Works with fast-agent, Claude Desktop, ChatGPT, Cursor, VS Code, Windsurf, and any MCP-compatible client.
Design principle: spec-first, type-safe image tooling – strict OpenAI Images API + MCP compliance with fully static TypeScript types and flexible result placements/response formats for different clients.
- Generate images from text prompts using OpenAI's `gpt-image-1.5` model (with `gpt-image-1` compatibility and DALL·E support planned in future versions).
- Edit images (inpainting, outpainting, compositing) from 1 up to 16 images at once, with advanced prompt control.
- Generate videos via OpenAI Videos (`sora-2`, `sora-2-pro`) with job create/remix/list/retrieve/delete and asset downloads.
- Generate videos via Google GenAI (Veo) with operation polling and file-first downloads.
- Fetch & compress images from HTTP(S) URLs or local file paths with smart size/quality optimization.
- Fetch documents from HTTP(S) URLs or local file paths and return `resource_link`/`resource` outputs.
- Debug MCP output shapes with a `test-images` tool that mirrors production result placement (`content`, `structuredContent`, `toplevel`).
- Integrates with: fast-agent, Windsurf, Claude Desktop, Cursor, VS Code, and any MCP-compatible client.
✨ Features
- Strict MCP spec support
  Tool outputs are first-class `CallToolResult` objects from the latest MCP schema, including: `content` items (`text`, `image`, `resource_link`, `resource`), optional `structuredContent`, optional top-level `files`, and the `isError` flag for failures.
- Full gpt-image-1.5 and sora-2/sora-2-pro parameter coverage (generate & edit)
  - `openai-images-generate` mirrors the OpenAI Images `create` API for `gpt-image-1.5` (and `gpt-image-1`) (background, moderation, size, quality, output_format, output_compression, `n`, `user`, etc.).
  - `openai-images-edit` mirrors the OpenAI Images `createEdit` API for `gpt-image-1.5` (and `gpt-image-1`) (image, mask, `n`, quality, size, `user`).
- OpenAI Videos (Sora) job tooling (create / remix / list / retrieve / delete / content)
  - `openai-videos-create` mirrors `videos/create` and can optionally wait for completion.
  - `openai-videos-remix` mirrors `videos/remix`.
  - `openai-videos-list` mirrors `videos/list`.
  - `openai-videos-retrieve` mirrors `videos/retrieve`.
  - `openai-videos-delete` mirrors `videos/delete`.
  - `openai-videos-retrieve-content` mirrors `videos/content` and downloads `video` / `thumbnail` / `spritesheet` assets to disk, returning MCP `resource_link` (default) or embedded `resource` blocks (via `tool_result`).
- Google GenAI (Veo) operations + downloads (generate / retrieve operation / retrieve content)
  - `google-videos-generate` starts a long-running operation (`ai.models.generateVideos`) and can optionally wait for completion and download `.mp4` outputs. See the Veo model reference.
  - `google-videos-retrieve-operation` polls an existing operation.
  - `google-videos-retrieve-content` downloads an `.mp4` from a completed operation, returning MCP `resource_link` (default) or embedded `resource` blocks (via `tool_result`).
- Fetch and process images from URLs or files
  `fetch-images` tool loads images from HTTP(S) URLs or local file paths with optional, user-controlled compression (disabled by default). Supports parallel processing of up to 20 images.
- Fetch videos from URLs or files
  `fetch-videos` tool lists local videos or downloads remote video URLs to disk and returns MCP `resource_link` (default) or embedded `resource` blocks (via `tool_result`).
- Fetch documents from URLs or files
  `fetch-document` tool downloads remote files or reuses local paths and returns MCP `resource_link` (default) or embedded `resource` blocks (via `tool_result`).
- Mix and edit up to 16 images
  `openai-images-edit` accepts `image` as a single string or an array of 1–16 file paths/base64 strings, matching the OpenAI spec for GPT Image models (`gpt-image-1.5`, `gpt-image-1`) image edits.
- Smart image compression
  Built-in compression using `sharp`: iteratively reduces quality and dimensions to fit MCP payload limits while maintaining visual quality.
- Resource-aware file output with `resource_link`
  - Automatic switch from inline base64 to `file` when the total response size exceeds a safe threshold.
  - Outputs are written to disk using `output_<time_t>_media-gen__<tool>_<id>.<ext>` filenames (images/documents use a generated UUID; videos use the OpenAI `video_id`) and exposed to MCP clients via `content[]` depending on `tool_result` (`resource_link`/`image` for images, `resource_link`/`resource` for video/document downloads).
- Built-in test-images tool for MCP client debugging
  `test-images` reads sample images from a configured directory and returns them using the same result-building logic as production tools. Use `tool_result` and `response_format` parameters to test how different MCP clients handle `content[]` and `structuredContent`.
- Structured MCP error handling
  All tool errors (validation, OpenAI API failures, I/O) are returned as MCP errors with `isError: true` and `content: [{ type: "text", text: <error message> }]`, making failures easy to parse and surface in MCP clients.
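The "Smart image compression" feature above describes an iterative loop. The following is a minimal sketch of that idea, not the project's implementation: the `Encoder` type, the `compressToFit` name, the starting quality, the step sizes, and the floors are all assumptions chosen for illustration. The encoder is injected so the loop can run without `sharp`.

```typescript
// Hypothetical encoder: returns encoded bytes for a quality (1-100) and a
// scale factor (0-1]. In the real server this role is played by sharp.
type Encoder = (quality: number, scale: number) => Uint8Array;

interface FitResult {
  bytes: Uint8Array;
  quality: number;
  scale: number;
  fits: boolean;
}

// All numeric defaults here are illustrative assumptions.
function compressToFit(
  encode: Encoder,
  maxBytes: number,
  minQuality = 40,
  minScale = 0.25,
): FitResult {
  let quality = 90;
  let scale = 1;
  let bytes = encode(quality, scale);

  while (bytes.length > maxBytes) {
    if (quality - 10 >= minQuality) {
      quality -= 10; // first trade quality for size
    } else if (scale * 0.75 >= minScale) {
      scale *= 0.75; // then shrink the dimensions
    } else {
      break; // both floors reached; the caller decides what to do next
    }
    bytes = encode(quality, scale);
  }
  return { bytes, quality, scale, fits: bytes.length <= maxBytes };
}

// Fake encoder for demonstration: size grows with quality and pixel area.
const fakeEncode: Encoder = (q, s) =>
  new Uint8Array(Math.round(10000 * (q / 100) * s * s));

const fitted = compressToFit(fakeEncode, 3000);
console.log(fitted.quality, fitted.scale, fitted.bytes.length, fitted.fits);
```

Lowering quality before dimensions is one reasonable ordering; a real encoder may warrant a different schedule.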
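The "Resource-aware file output" feature above can be sketched as two small helpers. This is an illustration under stated assumptions: `MAX_INLINE_BYTES` is a made-up value (the README only says "a safe threshold"), `<time_t>` is assumed to mean Unix seconds, and the function names are hypothetical.

```typescript
type Placement = "inline" | "file";

// Hypothetical threshold; the actual value used by the server is not stated.
const MAX_INLINE_BYTES = 1000000;

// Base64 encodes every 3 input bytes as 4 output characters (with padding).
function base64Length(byteLength: number): number {
  return Math.ceil(byteLength / 3) * 4;
}

// Switch to file output when the combined base64 payload exceeds the limit.
function choosePlacement(
  fileSizes: number[],
  maxInlineBytes: number = MAX_INLINE_BYTES,
): Placement {
  const total = fileSizes.reduce((sum, n) => sum + base64Length(n), 0);
  return total > maxInlineBytes ? "file" : "inline";
}

// Builds output_<time_t>_media-gen__<tool>_<id>.<ext>.
// The id is a UUID for images/documents or the video_id for videos.
function buildOutputFilename(
  tool: string,
  id: string,
  ext: string,
  now: Date = new Date(),
): string {
  const timeT = Math.floor(now.getTime() / 1000); // assumed: Unix seconds
  return `output_${timeT}_media-gen__${tool}_${id}.${ext}`;
}

console.log(choosePlacement([200000, 900000]));
console.log(buildOutputFilename("openai-images-generate", "example-id", "png"));
```

Measuring the base64 length rather than the raw byte length matters because inline images travel as base64 text, which is roughly a third larger than the file on disk.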
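The error shape described under "Structured MCP error handling" can be written down as a small type plus a converter. This is a sketch, not the server's code: `ToolErrorResult`, `toErrorResult`, and `runTool` are hypothetical names, and only the `isError` and `content` fields come from the description above.

```typescript
interface TextContent {
  type: "text";
  text: string;
}

// Only the fields named in the feature description are modelled here.
interface ToolErrorResult {
  isError: true;
  content: TextContent[];
}

// Converts any thrown value into the documented error shape.
function toErrorResult(err: unknown): ToolErrorResult {
  const message = err instanceof Error ? err.message : String(err);
  return { isError: true, content: [{ type: "text", text: message }] };
}

// Hypothetical wrapper: a handler either resolves with its result or the
// failure is returned as a structured error instead of being thrown.
async function runTool<T>(
  handler: () => Promise<T>,
): Promise<T | ToolErrorResult> {
  try {
    return await handler();
  } catch (err) {
    return toErrorResult(err);
  }
}

console.log(JSON.stringify(toErrorResult(new Error("image not found"))));
```

Returning the failure as a value keeps validation, API, and I/O errors on the same path, so a client only needs to check `isError`.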
🚀 Installation
git clone https://github.com/strato-space/media-gen-mcp.git
cd media-gen-mcp
npm install
npm run build
Build modes:
- `npm run build` – strict TypeScript build with all strict flags enabled, including `skipLibCheck: false`. Incremental builds via `.tsbuildinfo` (~2-3s on warm cache).
- `npm run esbuild` – fast bundling via esbuild (no type checking, useful for rapid iteration).
Development mode (no build required)
For development or when TypeScript compilation fails due to memory constraints:
npm run dev # Uses tsx to run TypeScript directly
Quality checks
npm run lint # ESLint with typescript-eslint
npm run typecheck # Strict tsc --noEmit
npm run test # Unit tests (vitest)
npm run test:watch # Watch mode for TDD
npm run ci # lint + typecheck + test
Unit tests
The project uses vitest for unit testing. Tests are located in test/.
Covered modules:
| Module | Tests | Description |
|---|---|---|
| `compression` | 12 | Image format detection, buffer processing, file I/O |
| `helpers` | 31 | URL/path validation, output resolution, result placement, resource links |
| `env` | 19 | Configuration parsing, env validation, defaults |
| `logger` | 10 | Structured logging + truncation safety |
| `pricing` | 5 | Sora pricing estimate helpers |
| `schemas` | 69 | Zod schema validation for all tools, type inference |
| `fetch-images` (integration) | 3 | End-to-end MCP tool call behavior |
| `fetch-videos` (integration) | 3 | End-to-end MCP tool call behavior |
Test categories:
- compression: `isCompressionAvailable`, `detectImageFormat`, `processBufferWithCompression`, `readAndProcessImage`
- helpers: `isHttpUrl`, `isAbsolutePath`, `isBase64Image`, `ensureDirectoryWritable`, `resolveOutputPath`, `getResultPlacement`, `buildResourceLinks`
- env: config loading and validation for `MEDIA_GEN_*` / `MEDIA_GEN_MCP_*` settings
- logger: truncation and error formatting behavior
- schemas: validation for `openai-images-*`, `openai-videos-*`, `fetch-images`, `fetch-videos`, `test-images` inputs, boundary testing (prompt length, image count limits, path validation)
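To illustrate what boundary testing of image count limits looks like, here is a dependency-free sketch. The project's own tests use Zod schemas and vitest; this example swaps in a hand-written check and plain comparisons so it runs on its own. `validateEditImages` is a hypothetical name, and only the 1–16 limit for `openai-images-edit` comes from this README.

```typescript
const MAX_EDIT_IMAGES = 16; // upper bound documented for openai-images-edit

// Hand-written stand-in for a schema check.
// Returns an error message, or null when the input is acceptable.
function validateEditImages(image: string | string[]): string | null {
  const items = typeof image === "string" ? [image] : image;
  if (items.length < 1) return "at least one image is required";
  if (items.length > MAX_EDIT_IMAGES) {
    return `at most ${MAX_EDIT_IMAGES} images are allowed`;
  }
  if (items.some((item) => item.trim() === "")) {
    return "image entries must be non-empty";
  }
  return null;
}

// Boundary cases: just inside and just outside each limit.
const sixteen = Array.from({ length: 16 }, (_, i) => `/tmp/in_${i}.png`);
const boundaryCases: Array<[string, string | string[], boolean]> = [
  ["single string", "/tmp/in.png", true],
  ["empty array", [], false],
  ["exactly 16", sixteen, true],
  ["17 images", [...sixteen, "/tmp/extra.png"], false],
];

for (const [label, input, shouldPass] of boundaryCases) {
  const ok = validateEditImages(input) === null;
  console.log(`${label}: ${ok === shouldPass ? "as expected" : "UNEXPECTED"}`);
}
```

Testing the values on both sides of each limit (0 and 1, 16 and 17) is what catches off-by-one mistakes in a validator.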
npm run test
# ✓ test/compression.test.ts (12 tests)
# ✓ test/helpers.test.ts (31 tests)
# ✓ test/env.test.ts (19 tests)
# ✓ test/logger.test.ts (10 tests)
# ✓ test/pricing.test.ts (5 tests)
# ✓ test/schemas.test.ts (69 tests)
# ✓ test/fetch-images.integration.test.ts (3 tests)
# ✓ test/fetch-videos.integration.test.ts (3 tests)
# Tests: 152 passed
Run directly via npx (no local clone)
You can also run the server straight from a remote repo using npx:
npx -y github:strato-space/media-gen-mcp --env-file /path/to/media-gen.env
The `--env-file` argument tells the server which env file to load (e.g. when you keep secrets outside the cloned directory). The file should contain `OPENAI_API_KEY`, optional Azure variables, and any `MEDIA_GEN_MCP_*` settings.
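How the server parses the env file is not shown here, so the following is only an illustrative sketch of checking such a file before launch. It assumes a simple `KEY=VALUE` format with `#` comments; `parseEnvFile` and `missingKeys` are hypothetical helpers, and `OPENAI_API_KEY` is the only key name taken from this README.

```typescript
// Parses KEY=VALUE lines; blank lines and lines starting with # are skipped.
function parseEnvFile(text: string): Record<string, string> {
  const env: Record<string, string> = {};
  for (const rawLine of text.split(/\r?\n/)) {
    const line = rawLine.trim();
    if (line === "" || line.startsWith("#")) continue;
    const eq = line.indexOf("=");
    if (eq <= 0) continue; // not a KEY=VALUE line
    const key = line.slice(0, eq).trim();
    // Strip one pair of matching surrounding quotes, if present.
    const value = line.slice(eq + 1).trim().replace(/^(["'])(.*)\1$/, "$2");
    env[key] = value;
  }
  return env;
}

// Returns the required keys that are absent or empty.
function missingKeys(env: Record<string, string>, required: string[]): string[] {
  return required.filter((key) => !env[key]);
}

const sampleEnv = [
  "# media-gen.env (placeholder values)",
  'OPENAI_API_KEY="sk-placeholder"',
  "",
].join("\n");

const parsedEnv = parseEnvFile(sampleEnv);
console.log(missingKeys(parsedEnv, ["OPENAI_API_KEY"]));
```

A check like this can fail fast with a clear message when the key is missing, rather than surfacing the problem on the first API call.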
secrets.yaml (optional)
You can keep API keys (and optional Google Vertex AI settings) in a secrets.yaml file (compatible with the fast-agent secrets template):
openai:
  api_key: <your-api-key-here>
anthropic:
  api_key: <your-api-key-here>
google:
  api_key: <your-api-key-here>
  vertex_ai:
    enabled: true
    project_id: your-gcp-project-id
    location: europe-west4
media-gen-mcp loads `secr
…