Speed Run

Token-efficient code generation pipeline: parallel implementation with a hosted LLM (Cerebras) for ~60% token savings. Includes an MCP server.

llm
By 2389-research
1 · Updated 2 months ago · JavaScript · MIT

Installation

/plugin marketplace add 2389-research/claude-plugins && /plugin install speed-run@2389-research

How to install

  1. Open Claude Code in your terminal
  2. Run the installation command above
  3. The plugin will be enabled automatically
  4. Use the plugin's features in your Claude Code sessions

Speed-Run

Token-efficient code generation pipeline. Uses a hosted LLM (Cerebras) for fast, cheap first-pass generation, with Claude handling architecture and surgical fixes.

Skill                 | Description                               | Best For
speed-run:turbo       | Direct hosted codegen                     | Single task, algorithmic code, boilerplate
speed-run:showdown    | Same design, parallel runners compete     | Medium-high complexity, want best implementation
speed-run:any-percent | Different approaches explored in parallel | Unsure of architecture, want to compare designs

Prerequisites

Speed-run requires a Cerebras API key for hosted code generation. The free tier includes ~1M tokens/day.

  1. Get a key at cloud.cerebras.ai
  2. Add to ~/.claude/settings.json:
{
  "env": {
    "CEREBRAS_API_KEY": "your-key-here"
  }
}
  3. Restart Claude Code
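
The Flow section lists a key check as its first step. A minimal sketch of such a check follows, assuming the `env` entry in settings.json reaches the session as an ordinary environment variable. The function name `checkCerebrasKey` and its return shape are hypothetical, not part of the plugin.

```javascript
// Hypothetical helper: illustrates the "Check: Cerebras API key" step.
// The name and return shape are assumptions, not the plugin's API.
function checkCerebrasKey(env = process.env) {
  const key = env.CEREBRAS_API_KEY;
  // Treat a missing, blank, or placeholder value as "not configured".
  if (typeof key !== "string" || key.trim() === "" || key === "your-key-here") {
    return {
      ok: false,
      reason: "CEREBRAS_API_KEY is missing or still set to the placeholder",
    };
  }
  return { ok: true };
}

console.log(checkCerebrasKey());
```

Running this in a terminal started after the restart is a quick way to confirm the variable is visible before trying a skill.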

Flow

User: "speed-run" / "turbo build" / "fast build"
    ↓
Check: Cerebras API key
    ↓
┌─────────────────────────────────────────┐
│  Route Selection                        │
│                                         │
│  1. Turbo     - Direct codegen          │
│  2. Showdown  - Parallel competition    │
│  3. Any%      - Parallel exploration    │
└─────────────────────────────────────────┘

Quick Examples

Turbo (Direct Code Generation)

User: "Use speed-run to build a rate limiter"

Claude writes a contract prompt:
  - DATA CONTRACT (exact models, types)
  - API CONTRACT (exact routes, responses)
  - ALGORITHM (step-by-step logic)
  - RULES (framework, storage, error handling)

Cerebras generates code → written to disk (~0.5s)
Claude runs tests → surgical fixes if needed (1-4 lines)

The contract prompt pattern is like speccing a ticket for a junior dev: explicit inputs, outputs, types, and behavior. That specificity is what lets hosted LLMs reach 80-95% first-pass accuracy.
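
To make the four-part contract concrete, here is a rough sketch of how such a prompt could be assembled. Only the section names (DATA CONTRACT, API CONTRACT, ALGORITHM, RULES) come from this README. The function `buildContractPrompt`, its field names, and the rate-limiter details are illustrative assumptions, not the plugin's actual prompt format.

```javascript
// Hypothetical sketch: section titles are from the README; everything else
// (function name, field names, example contents) is an assumption.
function buildContractPrompt({ dataContract, apiContract, algorithm, rules }) {
  const sections = [
    ["DATA CONTRACT", dataContract],
    ["API CONTRACT", apiContract],
    ["ALGORITHM", algorithm],
    ["RULES", rules],
  ];
  // Render each section as a title followed by one bullet per line item.
  return sections
    .map(([title, items]) => `${title}\n${items.map((item) => `- ${item}`).join("\n")}`)
    .join("\n\n");
}

const prompt = buildContractPrompt({
  dataContract: ["RateLimitState { key: string, tokens: number, updatedAt: number }"],
  apiContract: ["allow(key: string): boolean"],
  algorithm: [
    "Refill tokens based on time elapsed since updatedAt",
    "If tokens >= 1, decrement and return true; otherwise return false",
  ],
  rules: [
    "Plain JavaScript, no dependencies",
    "In-memory Map storage",
    "Throw TypeError on non-string keys",
  ],
});

console.log(prompt);
```

The point of the structure is that every section names exact types and behavior, leaving the hosted model little room to guess.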

Showdown (Parallel Competition)

User: "Use showdown for the auth system"

Claude assesses complexity → spawns 3 runners
Each runner:
  1. Reads the shared design doc
  2. Creates their OWN implementation plan
  3. Generates code via Cerebras
  4. Runs tests, fixes failures

All runners dispatched in parallel.
Fresh-eyes review → judge scores all → winner selected.

Key insight: each runner creates their own plan from the design doc. No shared implementation plan means genuine variation emerges naturally.
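
The dispatch-then-judge shape can be sketched as follows. This is a simplified illustration, not the plugin's implementation: the runner functions, the `judge` callback, and scoring by test pass ratio are all assumptions standing in for the real agents and the fresh-eyes review.

```javascript
// Hypothetical sketch of the showdown shape; runners and judge are stand-ins.
function pickWinner(results) {
  // Highest score wins; on a tie the earlier runner is kept.
  return results.reduce((best, r) => (r.score > best.score ? r : best));
}

async function showdown(designDoc, runners, judge) {
  // Dispatch every runner at once. Each one plans independently from the
  // same design doc, so no implementation plan is shared between them.
  const implementations = await Promise.all(
    runners.map(async (run, i) => ({ runner: i + 1, impl: await run(designDoc) }))
  );
  const scored = implementations.map((entry) => ({ ...entry, score: judge(entry.impl) }));
  return pickWinner(scored);
}

// Stand-in runners: each returns a fake result instead of generating code.
const runners = [
  async () => ({ plan: "session tokens", testsPassed: 8, testsTotal: 10 }),
  async () => ({ plan: "JWT", testsPassed: 10, testsTotal: 10 }),
  async () => ({ plan: "OAuth proxy", testsPassed: 9, testsTotal: 10 }),
];
const judge = (impl) => impl.testsPassed / impl.testsTotal;

showdown("auth system design doc", runners, judge).then((winner) => {
  console.log(`winner: runner ${winner.runner} (${winner.impl.plan})`);
});
```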

Any% (Parallel Exploration)

User: "Not sure whether to use SQLite or Postgres, try both"

Claude generates 2-3 architectural approaches
Each variant:
  1. Gets its own worktree and branch
  2. Creates implementation plan for its approach
  3. Generates code via Cerebras
  4. Runs tests

Same scenario tests run against all variants.
Fresh-eyes review → judge scores all → winner selected.
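
As an illustration of "its own worktree and branch", the sketch below builds one `git worktree add` command per approach. The branch prefix `any-percent/` and the directory `../speed-run-worktrees` are assumed names chosen for the example. The plugin's actual naming is not documented here. The code only prints the commands and does not run git.

```javascript
// Hypothetical sketch: branch prefix and directory layout are assumptions.
function worktreeCommands(approaches, baseDir = "../speed-run-worktrees") {
  return approaches.map((name) => {
    // Turn the approach name into a safe branch/directory slug.
    const slug = name
      .toLowerCase()
      .replace(/[^a-z0-9]+/g, "-")
      .replace(/^-+|-+$/g, "");
    const branch = `any-percent/${slug}`;
    return {
      approach: name,
      branch,
      // `git worktree add -b <branch> <path>` creates the branch and checks
      // it out in a separate working directory.
      command: `git worktree add -b ${branch} ${baseDir}/${slug}`,
    };
  });
}

for (const { command } of worktreeCommands(["SQLite", "Postgres"])) {
  console.log(command);
}
```

Separate worktrees let each variant build and test in isolation while sharing one repository history.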

When to use it

Scenario                                    | Speed-run?
Algorithmic code, data transforms           | Yes, turbo
Boilerplate, scaffolding                    | Yes, turbo
Comparing multiple implementations          | Yes, showdown
Exploring different architectures           | Yes, any-percent
Complex business logic that needs reasoning | No, use Claude directly
One-liner fixes                             | No, overkill
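
The table can be read as a small decision function. The sketch below is one possible encoding. The task flags (`oneLiner`, `needsDeepReasoning`, and so on) are invented for illustration, and falling back to turbo for everything else is an assumption rather than documented behavior.

```javascript
// Hypothetical sketch: the task flags are assumptions made for this example.
function chooseRoute(task) {
  if (task.oneLiner) return { use: false, reason: "overkill for one-liner fixes" };
  if (task.needsDeepReasoning) return { use: false, reason: "use Claude directly" };
  if (task.unsureOfArchitecture) return { use: true, skill: "speed-run:any-percent" };
  if (task.compareImplementations) return { use: true, skill: "speed-run:showdown" };
  // Assumed default: algorithmic code, data transforms, boilerplate, scaffolding.
  return { use: true, skill: "speed-run:turbo" };
}

console.log(chooseRoute({ unsureOfArchitecture: true }));
console.log(chooseRoute({ oneLiner: true }));
```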

How It Compares to Test Kitchen

Speed-run mirrors test-kitchen's parallel patterns but shifts code generation to a hosted LLM:

                    | Test Kitchen            | Speed-Run
Code generation     | Claude writes everything | Cerebras generates, Claude fixes
Token cost          | Standard                | ~60-70% savings
Generation speed    | ~10s per file           | ~0.5s per file
First-pass quality  | ~100%                   | 80-95%
External dependency | None                    | Cerebras API key

The most direct comparison is test-kitchen's cookoff versus speed-run's showdown: the same concept (multiple agents implement the same design) with a different execution strategy.
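
The generation-speed figures can be sanity-checked with simple arithmetic. The sketch below assumes a file of 1,500 output tokens, an illustrative size that does not come from this README, and uses the ~3000 t/s throughput listed for the default model under Available Models. It ignores network latency and prompt processing time.

```javascript
// Rough arithmetic only; the 1,500-token file size is an assumed example value.
function estimateSeconds(outputTokens, tokensPerSecond) {
  return outputTokens / tokensPerSecond;
}

const fileTokens = 1500; // assumption: a mid-sized source file
const cerebrasSeconds = estimateSeconds(fileTokens, 3000);

// Throughput implied by "~10s per file" for the same assumed file size.
const impliedTestKitchenRate = fileTokens / 10;

console.log(cerebrasSeconds);        // 0.5
console.log(impliedTestKitchenRate); // 150
```

Under that assumption, the ~0.5s per file figure is consistent with the listed throughput.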

Available Models

Model         | Speed     | Notes
gpt-oss-120b  | ~3000 t/s | Default: best value, clean output
llama-3.3-70b | ~2100 t/s | Reliable fallback
qwen-3-32b    | ~2600 t/s | Has verbose <think> tags
llama3.1-8b   | ~2200 t/s | Cheapest, may need more fixes
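
If a model's output includes <think> blocks, they need to be removed before the code is written to disk. A minimal sketch follows, assuming the reasoning is wrapped in a matching `<think>...</think>` pair. Whether the plugin already does this internally is not stated here, and `stripThinkTags` is a hypothetical helper.

```javascript
// Hypothetical helper: assumes reasoning is wrapped in <think>...</think>.
function stripThinkTags(output) {
  // Non-greedy match so several separate <think> blocks are each removed.
  return output.replace(/<think>[\s\S]*?<\/think>\s*/g, "").trim();
}

const raw = "<think>\nPlan: use a token bucket.\n</think>\nfunction allow() { return true; }";
console.log(stripThinkTags(raw));
```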

Dependencies

Speed-run orchestrates these skills (uses fallbacks if not installed):

  • superpowers:dispatching-parallel-agents
  • superpowers:using-git-worktrees
  • superpowers:writing-plans
  • superpowers:executing-plans
  • superpowers:test-driven-development
  • superpowers:verification-before-completion
  • fresh-eyes-review:skills
  • scenario-testing:skills
  • superpowers:finishing-a-development-branch

Origin

Speed-run was born from test-kitchen's token cost problem. Running 3-5 parallel Claude agents generates a lot of expensive output tokens. By shifting first-pass code generation to Cerebras (~3000 tokens/second), we keep the same parallel exploration patterns at a fraction of the cost — Claude focuses on what it's best at: architecture, orchestration, and surgical fixes.


If Speed Run saved you tokens and time, a ⭐ helps us know it's landing.

Built by 2389 · Part of the Claude Code plugin marketplace
