G0d2i11a/demosmith-mcp

demosmith-mcp

An MCP (Model Context Protocol) server for automated demo recording with video, documentation, and screenshot generation. Perfect for creating product demos, tutorials, and documentation with AI agents.

Demo

Demo: GitHub login flow with animated cursor, click effects, and auto-generated documentation

Features

  • Video Recording - Automatic screen recording of browser sessions
  • Screenshot Capture - Automatic screenshots at each step
  • Animated Cursor - Smooth cursor animations with click effects and sounds
  • TTS Narration - AI-powered voiceover with multiple providers (OpenAI, ElevenLabs, Azure, Edge)
  • Multiple Output Formats:
    • Video (WebM)
    • Video with Audio (MP4)
    • Playwright Trace (interactive replay)
    • Markdown Guide
    • JSON Steps
    • Narration Script + JSON (with timestamps)
    • Subtitles (SRT/VTT)
    • Interactive HTML Tutorial
    • GIF Preview
  • Multi-language Support - English and Chinese
  • Multi-tab Support - Work with multiple browser tabs
  • Flexible Element Selection - By ref, text, label, placeholder, role, test ID, CSS, XPath, alt text, or title

Installation

npm install demosmith-mcp
npx playwright install chromium

Usage

As MCP Server

Add to your Claude Code MCP configuration (~/.claude/mcp.json):

{
  "mcpServers": {
    "demosmith": {
      "command": "npx",
      "args": ["demosmith-mcp"]
    }
  }
}
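
If the configuration file already lists other servers, the demosmith entry has to be merged in rather than written over them. A minimal TypeScript sketch, assuming the file follows the `mcpServers` shape shown above; `McpConfig` and `addDemosmith` are hypothetical names used for illustration and are not part of demosmith:

```typescript
// Shape of the config file shown above; unrelated fields are left untouched.
interface McpServerEntry {
  command: string;
  args: string[];
}

interface McpConfig {
  mcpServers?: Record<string, McpServerEntry>;
  [key: string]: unknown;
}

// Hypothetical helper: returns a copy of the config with the demosmith entry added.
function addDemosmith(config: McpConfig): McpConfig {
  return {
    ...config,
    mcpServers: {
      ...(config.mcpServers ?? {}),
      demosmith: { command: "npx", args: ["demosmith-mcp"] },
    },
  };
}

// Example: a config that already contains another (made-up) server.
const existing: McpConfig = {
  mcpServers: { other: { command: "node", args: ["other-server.js"] } },
};
console.log(JSON.stringify(addDemosmith(existing), null, 2));
```

The helper returns a new object instead of mutating its input, so the original file contents can still be compared or backed up before writing.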

CLI Mode

# Replay a recorded demo
demosmith replay ./steps.json -o ./output --video

# Generate documentation from steps
demosmith generate ./steps.json -l zh -o ./docs

# Serve generated files locally
demosmith serve ./output

MCP Tools

Session Management

Tool              Description
demosmith_start   Start a new demo recording session
demosmith_end     End session and generate all deliverables
demosmith_status  Get current session status

Navigation & Discovery

Tool                Description
demosmith_navigate  Navigate to a URL
demosmith_snapshot  Get accessibility tree snapshot for element refs

Core Actions

Tool                 Description
demosmith_click      Click an element (with animated cursor)
demosmith_fill       Fill a text input (with typing animation)
demosmith_select     Select from dropdown
demosmith_press_key  Press keyboard key or combination
demosmith_hover      Hover over element (for tooltips/menus)
demosmith_drag       Drag and drop
demosmith_upload     Upload file

Page Actions

Tool                  Description
demosmith_scroll      Scroll page or element
demosmith_wait        Wait for condition
demosmith_screenshot  Take manual screenshot

Verification

Tool              Description
demosmith_assert  Verify conditions (text, visibility, URL, etc.)

Tab Management

Tool                  Description
demosmith_new_tab     Open new browser tab
demosmith_switch_tab  Switch to different tab
demosmith_close_tab   Close a tab
demosmith_list_tabs   List all open tabs

Element Selectors

demosmith supports multiple ways to locate elements:

# By ref (from snapshot)
"1", "2", "3"

# By visible text
"text:Submit"
"text:/Submit|Cancel/"  # regex

# By label
"label:Email"

# By placeholder
"placeholder:Enter your name"

# By role and name
"role:button:Submit"
"role:textbox"

# By test ID
"testid:submit-btn"

# By CSS selector
"css:.btn-primary"

# By XPath
"xpath://button[@type='submit']"

# By alt text
"alt:Logo"

# By title
"title:Close"

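The selector strings above share a `prefix:value` shape, with a bare value treated as a snapshot ref. The following TypeScript sketch shows one way such strings could be split apart. It is an illustration under that assumption, not demosmith's actual parser; `parseSelector` and `ParsedSelector` are hypothetical names.

```typescript
// Prefixes listed in the section above.
const PREFIXES = [
  "text", "label", "placeholder", "role", "testid", "css", "xpath", "alt", "title",
] as const;

type Prefix = (typeof PREFIXES)[number];

interface ParsedSelector {
  kind: Prefix | "ref";
  value: string;     // text after the prefix, or the bare ref
  name?: string;     // accessible name, only for "role:<role>:<name>"
  isRegex?: boolean; // true for "text:/.../"
}

function isPrefix(s: string): s is Prefix {
  return (PREFIXES as readonly string[]).indexOf(s) !== -1;
}

// Splits on the first colon only, so values such as XPath expressions
// may themselves contain colons.
function parseSelector(input: string): ParsedSelector {
  const colon = input.indexOf(":");
  const prefix = colon === -1 ? "" : input.slice(0, colon);
  if (!isPrefix(prefix)) {
    return { kind: "ref", value: input }; // no known prefix: treat as a snapshot ref
  }
  const rest = input.slice(colon + 1);
  if (prefix === "role") {
    const sep = rest.indexOf(":");
    return sep === -1
      ? { kind: prefix, value: rest }
      : { kind: prefix, value: rest.slice(0, sep), name: rest.slice(sep + 1) };
  }
  const looksLikeRegex =
    rest.length > 2 && rest.charAt(0) === "/" && rest.charAt(rest.length - 1) === "/";
  if (prefix === "text" && looksLikeRegex) {
    return { kind: prefix, value: rest.slice(1, -1), isRegex: true };
  }
  return { kind: prefix, value: rest };
}

console.log(parseSelector("role:button:Submit"));
console.log(parseSelector("xpath://button[@type='submit']"));
```
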
Example Workflow

1. demosmith_start(url="https://example.com/login", title="Login Demo")
2. demosmith_snapshot()  → Get element refs
3. demosmith_fill(ref="label:Email", value="user@example.com", description="Enter email")
4. demosmith_fill(ref="label:Password", value="password123", description="Enter password")
5. demosmith_click(ref="text:Sign In", description="Click sign in button")
6. demosmith_assert(type="url", expected="/dashboard", description="Verify redirect")
7. demosmith_end()  → Returns all deliverables
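
On the wire, each step of this workflow is an MCP `tools/call` request. An MCP client such as Claude Code normally builds these itself; the TypeScript sketch below only makes the message shape concrete. Tool and argument names are copied from the example above, while `ToolCall` and `toRequest` are hypothetical helper names.

```typescript
// One entry per step of the workflow above.
interface ToolCall {
  name: string;
  arguments: Record<string, unknown>;
}

const workflow: ToolCall[] = [
  { name: "demosmith_start", arguments: { url: "https://example.com/login", title: "Login Demo" } },
  { name: "demosmith_snapshot", arguments: {} },
  { name: "demosmith_fill", arguments: { ref: "label:Email", value: "user@example.com", description: "Enter email" } },
  { name: "demosmith_fill", arguments: { ref: "label:Password", value: "password123", description: "Enter password" } },
  { name: "demosmith_click", arguments: { ref: "text:Sign In", description: "Click sign in button" } },
  { name: "demosmith_assert", arguments: { type: "url", expected: "/dashboard", description: "Verify redirect" } },
  { name: "demosmith_end", arguments: {} },
];

// Wraps a tool call in the JSON-RPC 2.0 envelope that MCP uses for "tools/call".
function toRequest(call: ToolCall, id: number) {
  return { jsonrpc: "2.0" as const, id, method: "tools/call" as const, params: call };
}

const requests = workflow.map((call, i) => toRequest(call, i + 1));
console.log(JSON.stringify(requests[0], null, 2));
```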

Output Files

After ending a session, the following files are generated:

output/
├── demo.webm              # Screen recording video
├── demo-with-audio.mp4    # Video with TTS narration (if TTS enabled)
├── demo.gif               # Animated GIF preview
├── trace.zip              # Playwright trace (interactive replay)
├── guide.md               # Markdown documentation
├── steps.json             # Structured step data
├── narration.txt          # Voiceover script
├── narration.json         # Timed narration for TTS APIs
├── narration.mp3          # Generated audio (if TTS enabled)
├── subtitles.srt          # SRT subtitles
├── subtitles.vtt          # VTT subtitles
├── tutorial.html          # Interactive HTML tutorial
├── animated-preview.html  # HTML preview (fallback)
└── assets/
    ├── step-001.png
    ├── step-002.png
    └── ...

See examples/github-login-demo/ for a complete example output.

Configuration Options

Start Session Options

Option            Type     Default   Description
title             string   required  Demo title
startUrl          string   required  Starting URL
outputDir         string   temp dir  Output directory
video             boolean  true      Record video
trace             boolean  true      Record Playwright trace
screenshotOnStep  boolean  true      Auto-screenshot each step
headless          boolean  false     Run browser headless
viewport          object   1280x720  Browser viewport size
storageState      string   -         Path to login state file
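
The option table can be read as a typed object with defaults. A minimal TypeScript sketch, assuming the viewport object uses Playwright-style `width`/`height` fields (the table only gives `1280x720`); `StartOptions` and `withStartDefaults` are hypothetical names, not exports of demosmith:

```typescript
interface Viewport {
  width: number;  // field names are an assumption
  height: number;
}

interface StartOptions {
  title: string;
  startUrl: string;
  outputDir?: string;
  video?: boolean;
  trace?: boolean;
  screenshotOnStep?: boolean;
  headless?: boolean;
  viewport?: Viewport;
  storageState?: string;
}

// Defaults copied from the table. outputDir is left out because its default
// is a temp directory chosen at runtime.
const START_DEFAULTS = {
  video: true,
  trace: true,
  screenshotOnStep: true,
  headless: false,
  viewport: { width: 1280, height: 720 },
};

// Caller-supplied values win over the defaults.
function withStartDefaults(options: StartOptions) {
  return { ...START_DEFAULTS, ...options };
}

console.log(
  withStartDefaults({ title: "Login Demo", startUrl: "https://example.com/login", headless: true })
);
```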

Animation Options

Click and fill actions support animation options:

Option        Type     Default  Description
animated      boolean  true     Enable cursor animation
moveDuration  number   500      Cursor movement duration (ms)
typeDelay     number   50       Delay between keystrokes (ms)
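
These options determine how much time an animated step adds to the recording. The TypeScript sketch below gives a rough estimate for a fill action. It assumes one cursor move followed by one `typeDelay` per typed character, and that disabling animation adds no delay; neither assumption is confirmed by demosmith. `estimateFillMs` is a hypothetical helper.

```typescript
interface AnimationOptions {
  animated?: boolean;
  moveDuration?: number;
  typeDelay?: number;
}

// Rough estimate only: cursor move time plus one keystroke delay per character.
function estimateFillMs(value: string, options: AnimationOptions = {}): number {
  const animated = options.animated ?? true;        // default from the table
  const moveDuration = options.moveDuration ?? 500; // default from the table
  const typeDelay = options.typeDelay ?? 50;        // default from the table
  if (!animated) {
    return 0; // assumption: no added delay when animation is off
  }
  return moveDuration + value.length * typeDelay;
}

// 16 characters at 50 ms each, plus a 500 ms cursor move.
console.log(estimateFillMs("user@example.com")); // 1300
```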

Assert Types

The demosmith_assert tool supports these verification types:

Type      Description
text      Check element text content
visible   Check element is visible
hidden    Check element is hidden
url       Check current URL
title     Check page title
value     Check input value
checked   Check checkbox is checked
enabled   Check element is enabled
disabled  Check element is disabled
count     Check number of matching elements
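
When building assert calls programmatically, it can help to reject an unknown type before it reaches the server. A small TypeScript sketch using the type names from the table; `AssertType` and `isAssertType` are hypothetical names for illustration:

```typescript
// Verification types listed in the table above.
const ASSERT_TYPES = [
  "text", "visible", "hidden", "url", "title",
  "value", "checked", "enabled", "disabled", "count",
] as const;

type AssertType = (typeof ASSERT_TYPES)[number];

// Type guard: narrows an arbitrary string to one of the listed assert types.
function isAssertType(value: string): value is AssertType {
  return (ASSERT_TYPES as readonly string[]).indexOf(value) !== -1;
}

console.log(isAssertType("url"));      // true
console.log(isAssertType("contains")); // false
```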

Multi-language Support

Generated content is available in English or Chinese. Select the language with the CLI's -l flag:

demosmith generate ./steps.json -l zh  # Chinese
demosmith generate ./steps.json -l en  # English (default)

Custom Templates

You can provide custom templates for output generation using a Mustache-like syntax with Handlebars-style {{#each}} and {{#if}} blocks:

# {{session.title}}

{{#each steps}}
## Step {{this.id}}: {{this.description}}

{{#if this.screenshotRelative}}
![Screenshot]({{this.screenshotRelative}})
{{/if}}
{{/each}}

Login Session Support

Save a login session with Playwright:

await context.storageState({ path: 'auth.json' });

Use in demo:

demosmith_start(url="...", title="...", storageState="auth.json")

TTS Narration

Generate AI voiceover for your demos by passing TTS options to demosmith_end:

demosmith_end(tts={
  provider: "openai",
  apiKey: "sk-...",
  voice: "alloy"
})

Supported TTS Providers

Provider    API Key Required  Voices                                   Notes
openai      Yes               alloy, echo, fable, onyx, nova, shimmer  Best quality
elevenlabs  Yes               Various voice IDs                        Most natural
azure       Yes               en-US-JennyNeural, etc.                  SSML support
edge        No                en-US-AriaNeural, etc.                   Free, requires edge-tts CLI

TTS Options

Option    Type    Description
provider  string  TTS provider (openai, elevenlabs, azure, edge)
apiKey    string  API key (not needed for edge)
voice     string  Voice ID or name
language  string  Language code (e.g., en-US, zh-CN)
speed     number  Speech speed multiplier
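
The tables above imply one rule worth checking before a session ends: every provider except edge needs an API key. A minimal TypeScript sketch of such a pre-flight check; `TtsOptions` and `checkTtsOptions` are hypothetical names, and the requirement that `speed` be positive is an assumption rather than documented behavior:

```typescript
type TtsProvider = "openai" | "elevenlabs" | "azure" | "edge";

interface TtsOptions {
  provider: TtsProvider;
  apiKey?: string;
  voice?: string;
  language?: string;
  speed?: number;
}

// Returns a list of problems; an empty list means the options look usable.
function checkTtsOptions(options: TtsOptions): string[] {
  const problems: string[] = [];
  if (options.provider !== "edge" && !options.apiKey) {
    problems.push(`${options.provider} requires apiKey`);
  }
  // Assumption: a speed multiplier only makes sense when it is positive.
  if (options.speed !== undefined && !(options.speed > 0)) {
    problems.push("speed must be a positive multiplier");
  }
  return problems;
}

console.log(checkTtsOptions({ provider: "edge", voice: "en-US-AriaNeural" })); // []
console.log(checkTtsOptions({ provider: "openai", voice: "alloy" }));
```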

Environment Variables

For Azure TTS, set the region:

export AZURE_SPEECH_REGION=eastus

Narration JSON Format

The generated narration.json contains timed segments for custom TTS integration:

{
  "title": "Login Demo",
  "totalDurationMs": 15000,
  "segments": [
    {
      "stepId": 1,
      "startMs": 2000,
      "endMs": 4500,
      "durationMs": 2500,
      "text": "Click the login button"
    }
  ]
}
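
As an example of consuming this format, the TypeScript sketch below turns the timed segments into SRT cues. demosmith already writes subtitles.srt itself, so this is only an illustration of how the fields fit together, not the project's generator; `toSrtTime` and `toSrt` are hypothetical names.

```typescript
interface NarrationSegment {
  stepId: number;
  startMs: number;
  endMs: number;
  durationMs: number;
  text: string;
}

interface Narration {
  title: string;
  totalDurationMs: number;
  segments: NarrationSegment[];
}

// Left-pads a number with zeros to the given width.
function pad(n: number, width: number): string {
  let s = String(n);
  while (s.length < width) {
    s = "0" + s;
  }
  return s;
}

// Formats milliseconds as an SRT timestamp: HH:MM:SS,mmm
function toSrtTime(ms: number): string {
  const h = Math.floor(ms / 3600000);
  const m = Math.floor((ms % 3600000) / 60000);
  const s = Math.floor((ms % 60000) / 1000);
  return `${pad(h, 2)}:${pad(m, 2)}:${pad(s, 2)},${pad(ms % 1000, 3)}`;
}

// One numbered cue per segment, separated by blank lines.
function toSrt(narration: Narration): string {
  return narration.segments
    .map((seg, i) => `${i + 1}\n${toSrtTime(seg.startMs)} --> ${toSrtTime(seg.endMs)}\n${seg.text}\n`)
    .join("\n");
}

const example: Narration = {
  title: "Login Demo",
  totalDurationMs: 15000,
  segments: [
    { stepId: 1, startMs: 2000, endMs: 4500, durationMs: 2500, text: "Click the login button" },
  ],
};
console.log(toSrt(example));
```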

Development

# Install dependencies
pnpm install

# Build
pnpm build

# Run MCP server
pnpm start

# Run CLI
pnpm cli help

License

MIT
