Performance Testing Philosophy

This document describes the methodology and philosophy behind performance testing across all packages in the dx-toolkit monorepo.

Overview

Processing lag tests measure the overhead introduced by our abstraction layers compared to raw You.com API calls. The goal is to quantify how much processing lag our code adds when it wraps those APIs.

Important: We cannot improve the You.com API performance itself. These tests measure our code's overhead, not the underlying API speed.

Core Metrics

Processing Lag (Absolute Time)

Definition: The absolute time difference between raw API calls and our abstraction layer.

Measured in: milliseconds (ms)
Formula: Abstraction time - Raw API time
Example: If raw API takes 500ms and our wrapper takes 535ms, processing lag is 35ms
Interpretation:
- Negative values mean our code is faster (caching, optimizations)
- Positive values mean our code adds overhead (validation, transformation)

Overhead Percentage (Relative Time)

Definition: The relative overhead as a percentage of raw API time.

Measured in: percentage (%)
Formula: (Processing lag / Raw API time) × 100
Example: 35ms lag on 500ms raw API = 7% overhead
Interpretation:
- Fast APIs (50-100ms) will show high % overhead due to fixed costs
- Slow APIs (500ms+) will show low % overhead
- Absolute lag (ms) is often more meaningful than relative %
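
The two formulas above can be sketched in TypeScript as follows. This is a minimal illustration: the function names are assumptions and are not taken from the toolkit, and the inputs are the example figures used in this section.

```typescript
// Illustrative helpers; the names are assumptions, not part of dx-toolkit.

/** Processing lag in ms: abstraction time minus raw API time. */
function processingLagMs(rawMs: number, wrapperMs: number): number {
  return wrapperMs - rawMs;
}

/** Overhead as a percentage of raw API time. */
function overheadPercent(rawMs: number, wrapperMs: number): number {
  if (rawMs <= 0) {
    throw new RangeError("rawMs must be positive");
  }
  return (processingLagMs(rawMs, wrapperMs) / rawMs) * 100;
}

// Example from this section: 500ms raw, 535ms wrapped.
const lag = processingLagMs(500, 535); // 35
const pct = overheadPercent(500, 535); // about 7, subject to floating-point rounding
console.log(`${lag}ms lag, ${pct.toFixed(2)}% overhead`);
```

A negative return value from `processingLagMs` corresponds to the "our code is faster" case described above.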

Memory Overhead (Heap Growth)

Definition: The heap memory growth from our abstraction layer.

Measured in: kilobytes (KB)
Method: Compare heap size before and after operations with forced GC
Includes: Memory for data transformation, validation, schemas, buffers
Interpretation: Sustained memory usage, not temporary allocations

Methodology

1. Warmup Phase

Purpose: Eliminate cold start effects before measurements.

Actions:

Run each operation once before measuring
Triggers initial JIT compilation of the hot paths
Loads all modules into memory
Initializes connection pools

Why: First-run operations can be 2-10x slower due to:

JIT compilation
Module loading
DNS resolution
TLS handshake establishment
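
A warmup step along these lines could look like the sketch below. `warmUp` is a hypothetical helper, not an existing toolkit function, and a single run per operation is simply the count stated above.

```typescript
// Hypothetical helper; `warmUp` is not an existing dx-toolkit function.
type Operation = () => Promise<unknown>;

async function warmUp(operations: Operation[], runs = 1): Promise<void> {
  for (const operation of operations) {
    for (let i = 0; i < runs; i++) {
      // Results are discarded; the call exists only to trigger module loading,
      // JIT compilation, DNS resolution, and connection setup.
      await operation();
    }
  }
}
```

In a test this would be called once before any timing starts, for example `await warmUp([rawApiCall, wrapperCall])`.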

2. Measurement Phase

Purpose: Collect statistically reliable timing data.

Process:

const rawTimes: number[] = [];
const wrapperTimes: number[] = [];

for (let i = 0; i < iterations; i++) {
  // Raw API call (baseline)
  const rawStart = performance.now();
  await rawApiCall();
  rawTimes.push(performance.now() - rawStart);

  // Abstraction layer call (with overhead)
  const wrapperStart = performance.now();
  await wrapperCall();
  wrapperTimes.push(performance.now() - wrapperStart);
}

// Calculate averages
const average = (times: number[]) =>
  times.reduce((sum, t) => sum + t, 0) / times.length;
const avgRaw = average(rawTimes);
const avgWrapper = average(wrapperTimes);
const processingLag = avgWrapper - avgRaw;

Iterations:

Fast operations (< 200ms): 10 iterations
Slow operations (> 500ms): 5 iterations
Reason: More iterations smooth out network variability

Timing Tool: performance.now() returns a monotonic, high-resolution timestamp in milliseconds with sub-millisecond resolution

3. Memory Measurement

Purpose: Track sustained heap growth from abstraction overhead.

Process:

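import { heapStats } from "bun:jsc";

// Force a synchronous GC and wait for stabilization
Bun.gc(true);
await new Promise(resolve => setTimeout(resolve, 100));

// heapSize is reported in bytes
const heapBefore = heapStats().heapSize;

// Run operations
for (let i = 0; i < iterations; i++) {
  await operation();
}

// Force GC again so temporary allocations are collected
Bun.gc(true);
await new Promise(resolve => setTimeout(resolve, 100));

const heapAfter = heapStats().heapSize;
const heapGrowth = heapAfter - heapBefore;
// Convert to KB to match the unit used for thresholds
const heapGrowthKB = heapGrowth / 1024;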

Why Force GC: Ensures we measure sustained memory, not temporary allocations.

Interpreting Results

Good Results

Processing lag: 15ms
Overhead: 3%
Memory: 50KB

✅ Minimal overhead, efficient implementation

Acceptable Results

Processing lag: 80ms
Overhead: 40%
Memory: 350KB

✅ Within thresholds, acceptable for complex abstractions

Warning Signs

Processing lag: 150ms
Overhead: 75%
Memory: 500KB

⚠️ Investigate potential optimizations

Critical Issues

Processing lag: 500ms
Overhead: 200%
Memory: 2MB

🚨 Significant overhead requiring immediate investigation

Threshold Setting Guidelines

Each package should set thresholds based on its architecture. Different package types have different overhead characteristics:

Threshold Guidelines by Package Type

Package Type	Lag Threshold	Overhead Threshold	Memory Threshold	Rationale
Thin library wrappers	< 50ms	< 10%	< 300KB	Minimal transformation, no transport overhead
SDK integrations	< 80ms	< 35%	< 350KB	Moderate data transformation and validation
MCP servers	< 100ms	< 50%	< 400KB	Includes stdio/JSON-RPC transport + protocol overhead
Complex frameworks	< 150ms	< 75%	< 500KB	Multiple abstraction layers, state management
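
One way to apply the table in a test is a small lookup plus a comparison, sketched below. The limits are copied from the table; the type names, keys, and `findViolations` are illustrative assumptions, not the toolkit's actual API.

```typescript
// Limits copied from the table above; all names here are illustrative.
type PackageType =
  | "thin-library-wrapper"
  | "sdk-integration"
  | "mcp-server"
  | "complex-framework";

interface Metrics {
  lagMs: number;
  overheadPercent: number;
  memoryKb: number;
}

const THRESHOLDS: Record<PackageType, Metrics> = {
  "thin-library-wrapper": { lagMs: 50, overheadPercent: 10, memoryKb: 300 },
  "sdk-integration": { lagMs: 80, overheadPercent: 35, memoryKb: 350 },
  "mcp-server": { lagMs: 100, overheadPercent: 50, memoryKb: 400 },
  "complex-framework": { lagMs: 150, overheadPercent: 75, memoryKb: 500 },
};

/** Returns the metrics that are not strictly below their threshold. */
function findViolations(type: PackageType, measured: Metrics): string[] {
  const limits = THRESHOLDS[type];
  const violations: string[] = [];
  if (measured.lagMs >= limits.lagMs) violations.push("lag");
  if (measured.overheadPercent >= limits.overheadPercent) violations.push("overhead");
  if (measured.memoryKb >= limits.memoryKb) violations.push("memory");
  return violations;
}

// The "Good Results" figures from earlier pass even the strictest row.
console.log(findViolations("thin-library-wrapper", { lagMs: 15, overheadPercent: 3, memoryKb: 50 }));
```

Returning the list of failing metrics, rather than a single boolean, keeps the failure message specific about which threshold was crossed.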

Why MCP Servers Have Higher Thresholds

MCP servers (like @youdotcom-oss/mcp) have higher thresholds (100ms/50%/400KB) compared to library packages (50ms/10%/300KB) because of architectural differences:

MCP Server Overhead Sources:

Stdio transport: Process IPC adds 20-40ms latency
JSON-RPC protocol: Serialization/deserialization adds 10-20ms
Client SDK: @modelcontextprotocol/sdk protocol overhead
Process spawning: Each test spawns MCP server as subprocess
State management: Client state, connection pools, schemas

Library Package Overhead Sources:

Data transformation: Converting between formats
Validation: Zod schema validation (5-15ms)
Error handling: Try/catch and error formatting

Example Comparison:

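// Library wrapper (50ms threshold)
// `schema`, `url`, and `transform` are illustrative placeholders;
// the GET-with-query-string request shape is an assumption.
export const callApi = async (params) => {
  // Just validation + fetch + transform
  const validated = schema.parse(params);  // ~5ms
  const query = new URLSearchParams(validated).toString();
  const response = await fetch(`${url}?${query}`);  // API time (not counted)
  return transform(await response.json());  // ~10ms
  // Total overhead: ~15ms
};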

// MCP server (100ms threshold)
const result = await client.callTool({ name: 'api', arguments: params });
// Overhead includes:
// - Stdio IPC: 30ms
// - JSON-RPC: 15ms
// - Validation: 5ms
// - Transform: 10ms
// - Client SDK: 20ms
// Total overhead: ~80ms

Fixed vs Proportional Overhead

Fixed overhead: Serialization, validation, setup (same regardless of data size)
Proportional overhead: Data transformation, parsing (grows with data size)
Note: Fixed overhead shows high % on fast operations
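
The distinction can be made concrete with a simple linear cost model. The sketch below is hypothetical: the model and every coefficient in it are made-up illustration values, not measurements from the toolkit.

```typescript
// Hypothetical cost model; the coefficients are illustration values only.

/** Overhead = fixed cost + per-KB cost that scales with payload size. */
function estimateOverheadMs(fixedMs: number, perKbMs: number, payloadKb: number): number {
  return fixedMs + perKbMs * payloadKb;
}

/** Expresses an overhead as a percentage of the raw API time. */
function asPercentOfRaw(overheadMs: number, rawMs: number): number {
  return (overheadMs * 100) / rawMs;
}

// The same 15ms of fixed overhead against a fast and a slow API.
const fixedOnly = estimateOverheadMs(15, 0, 0);
console.log(asPercentOfRaw(fixedOnly, 50));  // 30
console.log(asPercentOfRaw(fixedOnly, 500)); // 3
```

The absolute cost is identical in both cases, which is why the percentage alone can make a fast endpoint look worse than it is.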

Perception Thresholds

Imperceptible: < 50ms
Perceivable: 50-150ms
Noticeable: 150-300ms
Annoying: > 300ms

Aim for thresholds that keep total lag below perception thresholds.
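
As a rough sketch, the bands above can be expressed as a classifier. The listed ranges share the values 150ms and 300ms, so the boundary handling here (upper bounds treated as inclusive) is an assumption, as is the function name.

```typescript
type Perception = "imperceptible" | "perceivable" | "noticeable" | "annoying";

// Assumption: upper bounds are inclusive, so 150ms is "perceivable"
// and 300ms is "noticeable". The source ranges do not settle this.
function classifyLag(lagMs: number): Perception {
  if (lagMs < 50) return "imperceptible";
  if (lagMs <= 150) return "perceivable";
  if (lagMs <= 300) return "noticeable";
  return "annoying";
}

console.log(classifyLag(35));  // "imperceptible"
console.log(classifyLag(200)); // "noticeable"
```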

Latest Test Results

Last Updated: 2026-05-04T13:32:09.948Z
Workflow Run: View Results

Package	Processing Lag	Overhead %	Memory	Status
@youdotcom-oss/mcp	✅ 18.00ms (< 100.00ms)	✅ 4.48% (< 50.00%)	✅ 105.24KB (< 400.00KB)	✅ Pass
@youdotcom-oss/ai-sdk-plugin	✅ 38.31ms (< 80.00ms)	✅ 10.25% (< 35.00%)	✅ 15.62KB (< 350.00KB)	✅ Pass

View detailed metrics

@youdotcom-oss/mcp

Timestamp: 2026-05-04T13:31:46.880Z

Processing Lag:

Raw API avg: 401.59ms
Wrapper avg: 419.59ms
Processing lag: 18.00ms
Threshold: < 100.00ms
Status: ✅ Pass

Overhead:

Percentage: 4.48%
Threshold: < 50.00%
Status: ✅ Pass

Memory:

Heap before: 11804.10KB
Heap after: 11909.34KB
Growth: 105.24KB
Threshold: < 400.00KB
Status: ✅ Pass

Test Configuration:

Iterations: 20

@youdotcom-oss/ai-sdk-plugin

Timestamp: 2026-05-04T13:32:09.205Z

Processing Lag:

Raw API avg: 373.85ms
Wrapper avg: 412.16ms
Processing lag: 38.31ms
Threshold: < 80.00ms
Status: ✅ Pass

Overhead:

Percentage: 10.25%
Threshold: < 35.00%
Status: ✅ Pass

Memory:

Heap before: 11376.56KB
Heap after: 11392.18KB
Growth: 15.62KB
Threshold: < 350.00KB
Status: ✅ Pass

Test Configuration:

Iterations: 20

Running Performance Measurements

To measure performance for all packages manually:

# From repository root
bun scripts/performance/measure.ts > results.json

This measures all packages in a single run and outputs results as JSON. The weekly workflow uses this same script.

Package Performance Thresholds

@youdotcom-oss/mcp

Processing lag: < 100ms
Overhead percentage: < 50%
Memory overhead: < 400KB

@youdotcom-oss/ai-sdk-plugin

Processing lag: < 80ms
Overhead percentage: < 35%
Memory overhead: < 350KB

Common Troubleshooting

High Processing Lag

Symptoms: Consistently exceeds threshold by 2x or more

Common Causes:

Unnecessary data transformations
Redundant validation passes
Inefficient serialization
Synchronous blocking operations
N+1 query patterns

Debug Approach:

# Profile with CPU profiler
bun --cpu-prof test src/tests/processing-lag.spec.ts

# Identify hotspots in generated .cpuprofile file

High Memory Overhead

Symptoms: Heap growth exceeds threshold

Common Causes:

Large schema objects not being reused
Response data not being garbage collected
Caching without bounds
Circular references preventing GC
Memory leaks in event listeners

Debug Approach:

# Profile with heap profiler
bun --heap-prof test src/tests/processing-lag.spec.ts

# Analyze .heapprofile for large objects

Inconsistent Results

Symptoms: Large variance between test runs (±50% or more)

Common Causes:

Network variability (use stable connection)
System load (close other applications)
Insufficient iterations (increase iteration count)
Skipped warmup phase
Background processes interfering

Solutions:

Run tests on stable network without VPN
Close resource-intensive applications
Increase iterations (10 → 20 or 5 → 10)
Verify warmup phase executes
Run multiple times and compare results
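
When comparing results across runs, a quick spread check can flag the ±50% symptom described above. This is a sketch under stated assumptions: the helper names are hypothetical, and measuring spread as the largest deviation from the mean is one reasonable reading of "variance between test runs", not the toolkit's definition.

```typescript
// Illustrative helpers for spotting noisy runs; not part of the toolkit.
function mean(values: number[]): number {
  if (values.length === 0) {
    throw new RangeError("values must not be empty");
  }
  return values.reduce((sum, v) => sum + v, 0) / values.length;
}

/** Largest deviation from the mean, as a percentage of the mean. */
function maxDeviationPercent(values: number[]): number {
  const avg = mean(values);
  const worst = Math.max(...values.map((v) => Math.abs(v - avg)));
  return (worst / avg) * 100;
}

// Hypothetical processing-lag figures (ms) from three separate runs.
const runs = [10, 20, 60];
if (maxDeviationPercent(runs) >= 50) {
  console.warn("Runs differ by 50% or more from their mean; results are unreliable");
}
```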

Test Timeouts

Symptoms: Tests fail with timeout errors

Common Causes:

AI processing APIs (Express agent)
Rate limiting from too many requests
Network latency spikes
Insufficient timeout configuration

Solutions:

test.serial('Slow operation', async () => {
  // test body
}, { timeout: 60000 }); // 60s timeout

When to Add Performance Tests

Add processing lag tests to packages that:

✅ Wrap You.com APIs - Quantify abstraction overhead
✅ Provide SDK interfaces - Track integration costs
✅ Transform data - Measure transformation overhead
✅ Add middleware layers - Quantify middleware costs

Skip processing lag tests for packages that:

❌ Pure utilities - No API wrapping
❌ CLI tools - User interaction dominates timing
❌ Documentation - No runtime code
❌ Configuration - Static data only

Best Practices

Test Design

Always include warmup phase
Use serial execution for consistency (test.serial())
Run multiple iterations to smooth variance
Measure both time and memory
Add delays between iterations for rate limiting
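
The last point, delays between iterations, is sketched below. `timeWithDelay` is a hypothetical helper and the default delay is an arbitrary illustration value; the right pause depends on the rate limits of the API under test.

```typescript
// Hypothetical helper; the default delay is an arbitrary illustration value.
const sleep = (ms: number): Promise<void> =>
  new Promise((resolve) => setTimeout(resolve, ms));

async function timeWithDelay(
  operation: () => Promise<unknown>,
  iterations: number,
  delayMs = 10,
): Promise<number[]> {
  const timings: number[] = [];
  for (let i = 0; i < iterations; i++) {
    const start = performance.now();
    await operation();
    timings.push(performance.now() - start);
    // Pause outside the timed region so the delay never inflates a measurement.
    if (i < iterations - 1) {
      await sleep(delayMs);
    }
  }
  return timings;
}
```

Keeping the pause outside the `performance.now()` window matters: a delay placed inside it would be counted as processing lag.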

Threshold Setting

Start conservative, adjust based on empirical data
Document rationale for each threshold
Review thresholds quarterly
Update when architecture changes significantly

Maintenance

Run tests in CI on every PR
Alert on threshold violations
Review failures as optimization opportunities
Keep tests updated with API changes
