Name: Media Gen
Author: strato-space

media-gen-mcp

Media Gen MCP is a strictly typed TypeScript Model Context Protocol (MCP) server for OpenAI Images (gpt-image-1.5, gpt-image-1), OpenAI Videos (Sora), and Google GenAI Videos (Veo). It generates and edits images, creates and remixes video jobs, and fetches media from URLs or disk, choosing between resource_link and inline image outputs and optionally processing images with sharp. It is production-focused (full strict typecheck, ESLint + Vitest CI) and works with fast-agent, Claude Desktop, ChatGPT, Cursor, VS Code, Windsurf, and any MCP-compatible client.

Design principle: spec-first, type-safe image tooling – strict OpenAI Images API + MCP compliance with fully static TypeScript types and flexible result placements/response formats for different clients.

Generate images from text prompts using OpenAI's gpt-image-1.5 model (gpt-image-1 is also supported; DALL·E support is planned for future versions).
Edit images (inpainting, outpainting, compositing) using 1 to 16 input images at once, with advanced prompt control.
Generate videos via OpenAI Videos (sora-2, sora-2-pro) with job create/remix/list/retrieve/delete and asset downloads.
Generate videos via Google GenAI (Veo) with operation polling and file-first downloads.
Fetch & compress images from HTTP(S) URLs or local file paths with smart size/quality optimization.
Fetch documents from HTTP(S) URLs or local file paths and return resource_link/resource outputs.
Debug MCP output shapes with a test-images tool that mirrors production result placement (content, structuredContent, toplevel).
Integrates with: fast-agent, Windsurf, Claude Desktop, Cursor, VS Code, and any MCP-compatible client.

✨ Features

Strict MCP spec support
Tool outputs are first-class CallToolResult objects from the latest MCP schema, including: content items (text, image, resource_link, resource), optional structuredContent, optional top-level files, and the isError flag for failures.
Full gpt-image-1.5 and sora-2/sora-2-pro parameter coverage (generate & edit)
- openai-images-generate mirrors the OpenAI Images create API for gpt-image-1.5 (and gpt-image-1) (background, moderation, size, quality, output_format, output_compression, n, user, etc.).
- openai-images-edit mirrors the OpenAI Images createEdit API for gpt-image-1.5 (and gpt-image-1) (image, mask, n, quality, size, user).
OpenAI Videos (Sora) job tooling (create / remix / list / retrieve / delete / content)
- openai-videos-create mirrors videos/create and can optionally wait for completion.
- openai-videos-remix mirrors videos/remix.
- openai-videos-list mirrors videos/list.
- openai-videos-retrieve mirrors videos/retrieve.
- openai-videos-delete mirrors videos/delete.
- openai-videos-retrieve-content mirrors videos/content and downloads video / thumbnail / spritesheet assets to disk, returning MCP resource_link (default) or embedded resource blocks (via tool_result).
Google GenAI (Veo) operations + downloads (generate / retrieve operation / retrieve content)
- google-videos-generate starts a long-running operation (ai.models.generateVideos) and can optionally wait for completion and download .mp4 outputs (see the Veo model reference).
- google-videos-retrieve-operation polls an existing operation.
- google-videos-retrieve-content downloads an .mp4 from a completed operation, returning MCP resource_link (default) or embedded resource blocks (via tool_result).
Fetch and process images from URLs or files
fetch-images tool loads images from HTTP(S) URLs or local file paths with optional, user-controlled compression (disabled by default). Supports parallel processing of up to 20 images.
Fetch videos from URLs or files
fetch-videos tool lists local videos or downloads remote video URLs to disk and returns MCP resource_link (default) or embedded resource blocks (via tool_result).
Fetch documents from URLs or files
fetch-document tool downloads remote files or reuses local paths and returns MCP resource_link (default) or embedded resource blocks (via tool_result).
Mix and edit up to 16 images
openai-images-edit accepts image as a single string or an array of 1–16 file paths/base64 strings, matching the OpenAI spec for GPT Image models (gpt-image-1.5, gpt-image-1) image edits.
Smart image compression
Built-in compression using sharp — iteratively reduces quality and dimensions to fit MCP payload limits while maintaining visual quality.
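The iterative reduction described above can be sketched as a small loop. This is a minimal illustration under stated assumptions, not the project's actual algorithm: `fitToBudget`, `EncodeFn`, and every step size and floor are hypothetical, and the encoder is injected as a callback so the sketch does not depend on sharp. A real sharp pipeline is asynchronous, so in practice the loop would `await` each encode.

```typescript
// Hypothetical sketch: names, step sizes, and floors are assumptions,
// not values taken from media-gen-mcp.
// `encode` stands in for a real encoder (for example a sharp pipeline).
type EncodeFn = (quality: number, scale: number) => Uint8Array;

interface FitOptions {
  startQuality: number; // first quality to try
  minQuality: number;   // never go below this quality
  qualityStep: number;  // must be > 0
  minScale: number;     // never shrink below this fraction of the original size
  scaleFactor: number;  // must be < 1; applied once quality is exhausted
}

const DEFAULT_FIT_OPTIONS: FitOptions = {
  startQuality: 90,
  minQuality: 40,
  qualityStep: 10,
  minScale: 0.25,
  scaleFactor: 0.75,
};

interface FitResult {
  data: Uint8Array;
  quality: number;
  scale: number;
  fits: boolean; // false when even the smallest attempt exceeds the budget
}

function fitToBudget(
  encode: EncodeFn,
  maxBytes: number,
  opts: FitOptions = DEFAULT_FIT_OPTIONS,
): FitResult {
  let quality = opts.startQuality;
  let scale = 1;
  for (;;) {
    const data = encode(quality, scale);
    if (data.byteLength <= maxBytes) {
      return { data, quality, scale, fits: true };
    }
    if (quality - opts.qualityStep >= opts.minQuality) {
      quality -= opts.qualityStep; // first trade away quality
    } else if (scale * opts.scaleFactor >= opts.minScale) {
      scale *= opts.scaleFactor;   // then shrink dimensions
    } else {
      return { data, quality, scale, fits: false }; // give up with the last attempt
    }
  }
}
```

Lowering quality before dimensions is one reasonable ordering, because it keeps the pixel size the client asked for as long as possible; other orderings are equally valid.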
Resource-aware file output with resource_link
- Automatic switch from inline base64 to file when the total response size exceeds a safe threshold.
- Outputs are written to disk using output_<time_t>_media-gen__<tool>_<id>.<ext> filenames (images/documents use a generated UUID; videos use the OpenAI video_id) and exposed to MCP clients via content[] depending on tool_result (resource_link/image for images, resource_link/resource for video/document downloads).
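As a rough illustration of the two points above, the following sketch shows one way to decide between inline base64 and a file on disk, and how a filename matching the documented pattern could be assembled. The threshold value and the function names are assumptions; the README does not state the real limit, and `<time_t>` is read here as Unix time in seconds.

```typescript
// Hypothetical sketch: the threshold and all function names are assumptions.
const ASSUMED_MAX_INLINE_BYTES = 1_000_000; // not the project's documented value

// Base64 encodes every 3 input bytes as 4 output characters (with padding).
function base64Length(byteLength: number): number {
  return Math.ceil(byteLength / 3) * 4;
}

// Decide how to return a set of images based on their total encoded size.
function chooseDelivery(
  imageByteLengths: number[],
  maxInlineBytes: number = ASSUMED_MAX_INLINE_BYTES,
): "inline" | "resource_link" {
  const total = imageByteLengths.reduce((sum, n) => sum + base64Length(n), 0);
  return total > maxInlineBytes ? "resource_link" : "inline";
}

// Build a name of the form output_<time_t>_media-gen__<tool>_<id>.<ext>.
function outputFileName(
  tool: string,
  id: string,
  ext: string,
  now: Date = new Date(),
): string {
  const timeT = Math.floor(now.getTime() / 1000); // assumed: seconds since the epoch
  return `output_${timeT}_media-gen__${tool}_${id}.${ext}`;
}
```

Measuring the base64 length rather than the raw byte length matters because inline payloads grow by roughly a third once encoded.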
Built-in test-images tool for MCP client debugging
test-images reads sample images from a configured directory and returns them using the same result-building logic as production tools. Use tool_result and response_format parameters to test how different MCP clients handle content[] and structuredContent.
Structured MCP error handling
All tool errors (validation, OpenAI API failures, I/O) are returned as MCP errors with isError: true and content: [{ type: "text", text: <error message> }], making failures easy to parse and surface in MCP clients.
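The error shape described above is small enough to sketch directly. The types and the `toErrorResult` and `isToolError` helpers below are hypothetical names used for illustration; only the `isError: true` flag and the single text content item come from the description.

```typescript
// Hypothetical sketch of the documented error shape; helper names are assumptions.
interface TextContent {
  type: "text";
  text: string;
}

interface ToolErrorResult {
  isError: true;
  content: TextContent[];
}

// Convert any thrown value into an MCP-style error result.
function toErrorResult(err: unknown): ToolErrorResult {
  const text = err instanceof Error ? err.message : String(err);
  return { isError: true, content: [{ type: "text", text }] };
}

// Client-side check: did a tool call fail?
function isToolError(result: { isError?: boolean }): boolean {
  return result.isError === true;
}
```

A client can then branch on `isToolError(result)` and show `result.content[0].text` to the user instead of parsing free-form failures.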

🚀 Installation

git clone https://github.com/strato-space/media-gen-mcp.git
cd media-gen-mcp

npm install
npm run build

Build modes:

npm run build – strict TypeScript build with all strict flags enabled, including skipLibCheck: false. Incremental builds via .tsbuildinfo (~2-3s on warm cache).
npm run esbuild – fast bundling via esbuild (no type checking, useful for rapid iteration).

Development mode (no build required)

For development or when TypeScript compilation fails due to memory constraints:

npm run dev  # Uses tsx to run TypeScript directly

Quality checks

npm run lint        # ESLint with typescript-eslint
npm run typecheck   # Strict tsc --noEmit
npm run test        # Unit tests (vitest)
npm run test:watch  # Watch mode for TDD
npm run ci          # lint + typecheck + test

Unit tests

The project uses vitest for unit testing. Tests are located in test/.

Covered modules:

| Module | Tests | Description |
|---|---|---|
| `compression` | 12 | Image format detection, buffer processing, file I/O |
| `helpers` | 31 | URL/path validation, output resolution, result placement, resource links |
| `env` | 19 | Configuration parsing, env validation, defaults |
| `logger` | 10 | Structured logging + truncation safety |
| `pricing` | 5 | Sora pricing estimate helpers |
| `schemas` | 69 | Zod schema validation for all tools, type inference |
| `fetch-images` (integration) | 3 | End-to-end MCP tool call behavior |
| `fetch-videos` (integration) | 3 | End-to-end MCP tool call behavior |

Test categories:

compression — isCompressionAvailable, detectImageFormat, processBufferWithCompression, readAndProcessImage
helpers — isHttpUrl, isAbsolutePath, isBase64Image, ensureDirectoryWritable, resolveOutputPath, getResultPlacement, buildResourceLinks
env — config loading and validation for MEDIA_GEN_* / MEDIA_GEN_MCP_* settings
logger — truncation and error formatting behavior
schemas — validation for openai-images-*, openai-videos-*, fetch-images, fetch-videos, test-images inputs, boundary testing (prompt length, image count limits, path validation)
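To give a feel for what these tests exercise, here is a simplified sketch of two of the helpers named above plus the 1 to 16 image-count boundary. The function bodies are assumptions written for illustration, not the project's implementations, and `normalizeImageInput` is a hypothetical name (the project validates inputs with Zod schemas).

```typescript
// Hypothetical re-implementations for illustration only.

// True for http:// and https:// URLs without embedded whitespace.
function isHttpUrl(value: string): boolean {
  return /^https?:\/\/\S+$/i.test(value);
}

// True for POSIX absolute paths and Windows drive-letter paths.
function isAbsolutePath(value: string): boolean {
  return value.startsWith("/") || /^[A-Za-z]:[\\/]/.test(value);
}

const MAX_EDIT_IMAGES = 16; // the README states 1 to 16 images for edits

// Accept a single string or an array, and enforce the documented count limits.
function normalizeImageInput(image: string | string[]): string[] {
  const list = typeof image === "string" ? [image] : image;
  if (list.length < 1 || list.length > MAX_EDIT_IMAGES) {
    throw new RangeError(`expected 1 to ${MAX_EDIT_IMAGES} images, got ${list.length}`);
  }
  return list;
}
```

Boundary tests for such helpers typically cover the values on each side of a limit: 0, 1, 16, and 17 images.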

npm run test
# ✓ test/compression.test.ts (12 tests)
# ✓ test/helpers.test.ts (31 tests)
# ✓ test/env.test.ts (19 tests)
# ✓ test/logger.test.ts (10 tests)
# ✓ test/pricing.test.ts (5 tests)
# ✓ test/schemas.test.ts (69 tests)
# ✓ test/fetch-images.integration.test.ts (3 tests)
# ✓ test/fetch-videos.integration.test.ts (3 tests)
# Tests: 152 passed

Run directly via npx (no local clone)

You can also run the server straight from a remote repo using npx:

npx -y github:strato-space/media-gen-mcp --env-file /path/to/media-gen.env

The --env-file argument tells the server which env file to load (e.g. when you keep secrets outside the cloned directory). The file should contain OPENAI_API_KEY, optional Azure variables, and any MEDIA_GEN_MCP_* settings.
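For readers unfamiliar with env files, the sketch below shows how KEY=VALUE lines are commonly parsed and how a missing key can be detected before startup. It is an assumption-laden illustration: `parseEnvFile` and `missingKeys` are hypothetical names, the server's real loader may behave differently, and treating OPENAI_API_KEY as required simply follows the sentence above.

```typescript
// Hypothetical sketch of env-file parsing; not the server's actual loader.
function parseEnvFile(text: string): Record<string, string> {
  const out: Record<string, string> = {};
  for (const rawLine of text.split(/\r?\n/)) {
    const line = rawLine.trim();
    if (line === "" || line.startsWith("#")) continue; // skip blanks and comments
    const eq = line.indexOf("=");
    if (eq <= 0) continue; // skip lines without a key
    const key = line.slice(0, eq).trim();
    let value = line.slice(eq + 1).trim();
    // Strip one pair of matching surrounding quotes, if present.
    const quoted =
      value.length >= 2 &&
      ((value.startsWith('"') && value.endsWith('"')) ||
        (value.startsWith("'") && value.endsWith("'")));
    if (quoted) value = value.slice(1, -1);
    out[key] = value;
  }
  return out;
}

// Report which required keys are absent or empty.
function missingKeys(
  env: Record<string, string>,
  required: string[] = ["OPENAI_API_KEY"],
): string[] {
  return required.filter((key) => !env[key]);
}
```

Checking `missingKeys(parseEnvFile(contents))` before launching a client configuration gives an early, readable failure instead of an API error on the first tool call.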

`secrets.yaml` (optional)

You can keep API keys (and optional Google Vertex AI settings) in a secrets.yaml file (compatible with the fast-agent secrets template):

openai:
  api_key: <your-api-key-here>
anthropic:
  api_key: <your-api-key-here>
google:
  api_key: <your-api-key-here>
  vertex_ai:
    enabled: true
    project_id: your-gcp-project-id
    location: europe-west4

media-gen-mcp loads `secr

…
