speed-run

v1.0.0

Token-efficient code generation pipeline: parallel implementation with a hosted LLM (Cerebras) for ~60% token savings. Includes an MCP server.

Tags: codegen, speed-run, parallel, hosted-llm, cerebras, token-efficient, turbo, race, mcp
Install command: /plugin install speed-run@2389-research

Documentation

Full plugin documentation and usage guide

Speed-Run

Token-efficient code generation pipeline. A hosted LLM (Cerebras) handles fast, cheap first-pass generation, while Claude handles architecture and surgical fixes.

| Skill | Description | Best For |
| --- | --- | --- |
| speed-run:turbo | Direct hosted codegen | Single task, algorithmic code, boilerplate |
| speed-run:showdown | Same design, parallel runners compete | Medium-to-high complexity, want the best implementation |
| speed-run:any-percent | Different approaches explored in parallel | Unsure of architecture, want to compare designs |

Installation

/plugin install speed-run@2389-research

Prerequisites

Speed-run requires a Cerebras API key for hosted code generation. Free tier includes ~1M tokens/day.

  1. Get a key at cloud.cerebras.ai
  2. Add to ~/.claude/settings.json:

```json
{
  "env": {
    "CEREBRAS_API_KEY": "your-key-here"
  }
}
```

  3. Restart Claude Code
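As a quick sanity check, a short script can confirm the key is present in your settings file. This is an illustrative helper, not part of the plugin:

```python
import json
from pathlib import Path

def cerebras_key_configured(settings_path: Path) -> bool:
    """Return True if CEREBRAS_API_KEY is set under "env" in a Claude settings file."""
    try:
        settings = json.loads(settings_path.read_text())
    except (FileNotFoundError, json.JSONDecodeError):
        return False
    return bool(settings.get("env", {}).get("CEREBRAS_API_KEY"))
```

For example, `cerebras_key_configured(Path.home() / ".claude" / "settings.json")` returns False if the file is missing, malformed, or lacks the key.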

Flow

User: "speed-run" / "turbo build" / "fast build"
    ↓
Check: Cerebras API key
    ↓
┌─────────────────────────────────────────┐
│  Route Selection                        │
│                                         │
│  1. Turbo     - Direct codegen          │
│  2. Showdown  - Parallel competition    │
│  3. Any%      - Parallel exploration    │
└─────────────────────────────────────────┘
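The route selection above can be sketched as a simple heuristic. This is a hypothetical illustration of the decision logic, not the plugin's actual implementation, and the field names are invented:

```python
from dataclasses import dataclass

@dataclass
class Task:
    complexity: str           # "low", "medium", or "high"
    architecture_known: bool  # is the design already settled?

def select_route(task: Task) -> str:
    """Pick a speed-run mode, mirroring the routing described above."""
    if not task.architecture_known:
        return "any-percent"  # explore different designs in parallel
    if task.complexity in ("medium", "high"):
        return "showdown"     # same design, parallel runners compete
    return "turbo"            # direct single-pass codegen
```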

Quick Examples

Turbo (Direct Code Generation)

User: "Use speed-run to build a rate limiter"

Claude writes a contract prompt:
  - DATA CONTRACT (exact models, types)
  - API CONTRACT (exact routes, responses)
  - ALGORITHM (step-by-step logic)
  - RULES (framework, storage, error handling)

Cerebras generates code → written to disk (~0.5s)
Claude runs tests → surgical fixes if needed (1-4 lines)

The contract prompt pattern is like speccing a ticket for a junior dev — explicit inputs, outputs, types, and behavior. That specificity is what makes hosted LLMs reliable at 80-95% first-pass accuracy.
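To make that concrete, here is the kind of small, fully specified module a contract prompt for "a rate limiter" might pin down: exact class name, constructor parameters, and method behavior. This token-bucket sketch is written by hand for illustration, not actual speed-run output:

```python
import time

class TokenBucket:
    """Token-bucket rate limiter: holds up to `capacity` tokens, refilled at `rate` per second."""

    def __init__(self, capacity: int, rate: float, clock=time.monotonic):
        self.capacity = capacity
        self.rate = rate
        self.tokens = float(capacity)
        self.clock = clock          # injectable clock makes the class testable
        self.last = clock()

    def allow(self) -> bool:
        """Consume one token if available; return whether the request is allowed."""
        now = self.clock()
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

A contract this tight (names, types, refill math, return values) leaves the generator very little room to improvise, which is exactly the point.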

Showdown (Parallel Competition)

User: "Use showdown for the auth system"

Claude assesses complexity → spawns 3 runners
Each runner:
  1. Reads the shared design doc
  2. Creates its OWN implementation plan
  3. Generates code via Cerebras
  4. Runs tests, fixes failures

All runners dispatched in parallel.
Fresh-eyes review → judge scores all → winner selected.

Key insight: each runner creates its own plan from the design doc. Because there is no shared implementation plan, genuine variation emerges naturally.
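A judge pass like the one above can be sketched as a scoring function. The criteria and weights here are invented for illustration; the plugin's real judging may weigh things differently:

```python
from dataclasses import dataclass

@dataclass
class RunnerResult:
    name: str
    tests_passed: int
    tests_total: int
    review_score: float  # 0-10, e.g. from the fresh-eyes review

def pick_winner(results: list[RunnerResult]) -> RunnerResult:
    """Score each runner: test pass rate dominates, review score breaks near-ties."""
    def score(r: RunnerResult) -> float:
        pass_rate = r.tests_passed / r.tests_total if r.tests_total else 0.0
        return pass_rate * 10 + r.review_score  # pass rate weighted as heavily as a perfect review
    return max(results, key=score)
```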

Any% (Parallel Exploration)

User: "Not sure whether to use SQLite or Postgres, try both"

Claude generates 2-3 architectural approaches
Each variant:
  1. Gets its own worktree and branch
  2. Creates implementation plan for its approach
  3. Generates code via Cerebras
  4. Runs tests

Same scenario tests run against all variants.
Fresh-eyes review → judge scores all → winner selected.

When to use it

| Scenario | Speed-run? |
| --- | --- |
| Algorithmic code, data transforms | Yes, turbo |
| Boilerplate, scaffolding | Yes, turbo |
| Comparing multiple implementations | Yes, showdown |
| Exploring different architectures | Yes, any-percent |
| Complex business logic that needs reasoning | No, use Claude directly |
| One-liner fixes | No, overkill |

How It Compares to Test Kitchen

Speed-run mirrors test-kitchen's parallel patterns but shifts code generation to a hosted LLM:

| | Test Kitchen | Speed-Run |
| --- | --- | --- |
| Code generation | Claude writes everything | Cerebras generates, Claude fixes |
| Token cost | Standard | ~60-70% savings |
| Generation speed | ~10s per file | ~0.5s per file |
| First-pass quality | ~100% | 80-95% |
| External dependency | None | Cerebras API key |

The most direct comparison: test-kitchen's cookoff vs speed-run's showdown. Same concept (multiple agents implement the same design), different execution strategy.

Available Models

| Model | Speed | Notes |
| --- | --- | --- |
| gpt-oss-120b | ~3000 t/s | Default: best value, clean output |
| llama-3.3-70b | ~2100 t/s | Reliable fallback |
| qwen-3-32b | ~2600 t/s | Has verbose <think> tags |
| llama3.1-8b | ~2200 t/s | Cheapest, may need more fixes |
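Model choice can be expressed as a preference list with fallback. The ordering below follows the table above; the function itself is an illustrative sketch, not the plugin's code:

```python
# Preference order from the table above: default first, then fallbacks.
MODEL_PREFERENCE = [
    "gpt-oss-120b",   # default: best value, clean output
    "llama-3.3-70b",  # reliable fallback
    "qwen-3-32b",     # works, but emits verbose <think> tags
    "llama3.1-8b",    # cheapest, may need more surgical fixes
]

def choose_model(available: set[str]) -> str:
    """Return the most preferred model the API currently offers."""
    for model in MODEL_PREFERENCE:
        if model in available:
            return model
    raise RuntimeError("no supported Cerebras model available")
```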

Dependencies

Speed-run orchestrates these skills (uses fallbacks if not installed):

  • superpowers:dispatching-parallel-agents
  • superpowers:using-git-worktrees
  • superpowers:writing-plans
  • superpowers:executing-plans
  • superpowers:test-driven-development
  • superpowers:verification-before-completion
  • fresh-eyes-review:skills
  • scenario-testing:skills
  • superpowers:finishing-a-development-branch


Origin

Speed-run was born from test-kitchen's token cost problem. Running 3-5 parallel Claude agents generates a lot of expensive output tokens. By shifting first-pass code generation to Cerebras (~3000 tokens/second), we keep the same parallel exploration patterns at a fraction of the cost — Claude focuses on what it's best at: architecture, orchestration, and surgical fixes.


Quick Install

Get started in seconds

1. Add the marketplace (if not already added): /plugin marketplace add 2389-research/claude-plugins
2. Install this plugin: /plugin install speed-run
3. You're good to go: skills auto-trigger when relevant