A comprehensive benchmarking framework to evaluate the efficiency of Rust and TinyGo when compiled to WebAssembly across various computational workloads.
This project provides a rigorous performance comparison between:
| Language | Target | Runtime | Optimization |
|---|---|---|---|
| Rust 1.90 | wasm32-unknown-unknown | Zero-cost abstractions, no GC | -O3, fat LTO |
| TinyGo 0.39 | wasm | Garbage collected runtime | -opt=2, no scheduler |
Test Environment:
- Hardware: AWS EC2 c7g.2xlarge (4 CPU, 16GB RAM)
- OS: Ubuntu 22.04 (Linux/x86_64)
- Runtime: Headless Chromium v140+ (Puppeteer)
- Toolchain: Rust 1.90, Go 1.25.1, TinyGo 0.39, Node.js 24.7, Python 3.11+
# 1. Install dependencies
make check deps # Verify required tools
make init # Install project dependencies
# 2. Build WebAssembly modules
make build # Build all WASM modules for both languages
# 3. Run benchmarks
make run quick # Fast development test (~5-10s)
make run # Full benchmark suite (~30+ minutes)
# 4. Analyze results
make qc # Quality control validation
make analyze # Statistical analysis and visualization
# All-in-one execution
make all quick # Complete pipeline with quick settings
make all # Complete research-grade pipeline

- Linux Ubuntu 22.04+ (AWS EC2 c7g instances recommended)
- Hardware: 4 CPU cores, 16GB RAM minimum for benchmark execution
| Tool | Version | Purpose |
|---|---|---|
| Rust | 1.90+ | WASM compilation with wasm32-unknown-unknown target |
| TinyGo | 0.39+ | Go-to-WASM compilation |
| Node.js | 24+ | Test harness and automation |
| Python | 3.11+ | Statistical analysis with uv |
| Binaryen | Latest | wasm-opt optimization |
| WABT | Latest | wasm-strip binary processing |
# Quick validation (~5-10s)
make run quick
# With visible browser (debugging)
make run quick headed
# Full benchmark suite (~30+ minutes)
make run
# Complete experimental pipeline
make all # build → run → qc → analyze → plots

| Command | Purpose | Duration | Use Case |
|---|---|---|---|
| make run quick | Fast development test | 5-10s | Development validation |
| make run | Full benchmark suite | 30+ min | Research analysis |
| make run quick headed | Debug with visible browser | 5-10s | Troubleshooting |
| make all quick | Complete pipeline (quick) | 5-8 min | CI/CD validation |
| make all | Complete research pipeline | 45-60 min | Publication data |
# Custom benchmark execution
node scripts/run_bench.js --parallel --max-concurrent=4
node scripts/run_bench.js --verbose
node scripts/run_bench.js --timeout=120000
# Configuration editing
# Edit configs/bench.yaml or configs/bench-quick.yaml

For the easiest setup experience, use the provided Docker container, which gives a fully isolated, pre-configured development and benchmarking environment.
- Docker Desktop installed and running (or Docker Engine on Linux)
- Minimum Resources: 4GB RAM allocated to Docker, 10GB free disk space
- Recommended: 8GB+ RAM for optimal benchmark performance
The Docker container includes:
- Pre-installed toolchains: Rust 1.90, TinyGo 0.39, Go 1.25.1, Node.js 24.7, Python 3.11+
- All dependencies: uv, pnpm packages, system libraries
- Isolated workspace: Persistent data volumes for results and builds
- Development tools: Full development environment with debugging capabilities
# step-by-step approach for development:
make docker start # Build and start container with health checks
make docker init # Initialize development environment
make docker build # Build WebAssembly modules
make docker run # Execute benchmarks
make docker analyze # Run statistical analysis and generate reports
# One-command complete pipeline
make docker full

# Enter container for interactive development
make docker shell
# Build with specific options
make docker build rust # Build only Rust modules
make docker build tinygo # Build only TinyGo modules
make docker build parallel # Parallel build for faster compilation
# Run benchmarks with different configurations
make docker run quick # Fast development testing (~5-10s)
make docker run # Full benchmark suite (~30+ minutes)
# Quality control and analysis
make docker qc # Quality control validation
make docker stats # Statistical analysis
make docker plots # Generate visualization charts
# Testing and validation
make docker test # Run test suite
make docker validate # Cross-language validation

# Container lifecycle
make docker start # Start container with health verification
make docker stop # Gracefully stop container
make docker restart # Restart container
make docker status # Show container status and resource usage
make docker logs # Display recent container logs
# Information and cleanup
make docker info # Show system information from container
make docker clean # Clean containers and images
make docker help # Display help information

- Results: Benchmark data persists in the `results/` directory
- Builds: Compiled WASM modules saved in the `builds/` directory
- Reports: Analysis outputs available in the `reports/` directory
- Environment: Toolchain versions locked in the container
- Zero configuration: All dependencies pre-installed and configured
- Consistent environment: Same setup across different host systems
- Isolated development: No conflicts with host system packages
- Performance optimization: Containerized environment tuned for benchmarks
- Data persistence: Results and builds survive container restarts
- Easy cleanup: Complete environment isolation with simple cleanup
# Check container status
make docker status
# View container logs for debugging
make docker logs
# Restart if issues occur
make docker restart
# Clean and rebuild if needed
make docker clean
make docker start

See docs/docker-setup.md for detailed Docker documentation and advanced configuration options.
Three representative WebAssembly workloads designed to compare language performance:
- Type: CPU-intensive floating-point operations
- Purpose: Test scalar math performance and loop optimization
- Scales:
- micro: 64×64, 100 iterations
- small: 256×256, 500 iterations
- medium: 512×512, 1000 iterations
- large: 1024×1024, 2000 iterations
- Verification: FNV-1a hash on iteration counts
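For illustration, here is a minimal Rust sketch of how an escape-time kernel can fold each pixel's iteration count into an FNV-1a verification hash. The function names and the mapped region of the complex plane are assumptions for this example, not the exact code under tasks/mandelbrot/rust/src/.

```rust
/// One FNV-1a step over a 32-bit value (the project keeps its hashing in
/// tasks/*/rust/src/hash.rs; this is a standalone sketch).
fn fnv1a_update(mut hash: u32, value: u32) -> u32 {
    for byte in value.to_le_bytes() {
        hash ^= byte as u32;
        hash = hash.wrapping_mul(0x0100_0193); // FNV prime
    }
    hash
}

/// Escape-time iteration count for one sample point c = (cx, cy).
fn escape_count(cx: f64, cy: f64, max_iter: u32) -> u32 {
    let (mut x, mut y) = (0.0_f64, 0.0_f64);
    let mut i = 0;
    while i < max_iter && x * x + y * y <= 4.0 {
        let xt = x * x - y * y + cx;
        y = 2.0 * x * y + cy;
        x = xt;
        i += 1;
    }
    i
}

/// Fold every pixel's iteration count into one verification hash.
fn mandelbrot_hash(width: u32, height: u32, max_iter: u32) -> u32 {
    let mut hash = 0x811c_9dc5_u32; // FNV-1a offset basis
    for py in 0..height {
        for px in 0..width {
            // Map the pixel onto an (assumed) fixed window of the complex plane.
            let cx = -2.0 + 3.0 * px as f64 / width as f64;
            let cy = -1.5 + 3.0 * py as f64 / height as f64;
            hash = fnv1a_update(hash, escape_count(cx, cy, max_iter));
        }
    }
    hash
}
```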
- Type: String processing and memory allocation
- Purpose: Test parsing performance and garbage collection impact
- Scales:
- micro: 500 records
- small: 5,000 records
- medium: 15,000 records
- large: 30,000 records
- Schema: id, value, flag, name fields with sequential/random/derived generation patterns
- Verification: Hash of aggregated field values
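As a sketch of what such a record and its generation patterns can look like, consider the Rust snippet below; the PRNG and derivation rules are assumptions for illustration, not the project's actual generator.

```rust
/// Illustrative record matching the id/value/flag/name schema.
struct Record {
    id: u32,
    value: f64,
    flag: bool,
    name: String,
}

/// xorshift32-style PRNG so both languages can generate identical inputs
/// from the same seed.
fn next(state: &mut u32) -> u32 {
    *state ^= *state << 13;
    *state ^= *state >> 17;
    *state ^= *state << 5;
    *state
}

fn generate(count: u32, seed: u32) -> Vec<Record> {
    let mut state = seed.max(1);
    (0..count)
        .map(|i| Record {
            id: i,                                             // sequential
            value: next(&mut state) as f64 / u32::MAX as f64,  // pseudo-random in [0, 1]
            flag: i % 2 == 0,                                  // derived from id
            name: format!("record_{i}"),                       // derived from id
        })
        .collect()
}
// After a serialize/parse round-trip, each field is folded into the same
// FNV-1a verification hash used by the other tasks.
```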
- Type: Dense numerical computation with memory access patterns
- Purpose: Test compute-intensive algorithms and cache utilization
- Scales:
- micro: 64×64 matrices
- small: 256×256 matrices
- medium: 384×384 matrices
- large: 576×576 matrices
- Verification: Hash of result matrix elements
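A minimal Rust sketch of the kernel being measured; the project's implementation may use a different loop order or blocking strategy, and verification reuses the same FNV-1a fold shown for Mandelbrot.

```rust
/// Naive dense multiply C = A × B for n×n row-major matrices; the result
/// elements are then folded through the FNV-1a verification hash.
fn matmul(a: &[f64], b: &[f64], c: &mut [f64], n: usize) {
    for i in 0..n {
        for j in 0..n {
            let mut acc = 0.0;
            for k in 0..n {
                acc += a[i * n + k] * b[k * n + j];
            }
            c[i * n + j] = acc;
        }
    }
}
```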
🎯 Design Principle: Identical algorithms and verification across languages ensure fair comparison.
wasm-benchmark/
├── 📦 tasks/ # Benchmark implementations
│ ├── mandelbrot/
│ │ ├── rust/ # Rust WASM implementation
│ │ │ ├── src/lib.rs # Main benchmark entry point
│ │ │ ├── src/mandelbrot.rs # Core algorithm
│ │ │ ├── src/hash.rs # FNV-1a hashing
│ │ │ └── Cargo.toml # Rust configuration
│ │ └── tinygo/ # TinyGo WASM implementation
│ │ ├── main.go # Main benchmark entry point
│ │ ├── main_test.go # Unit tests
│ │ └── go.mod # Go module configuration
│ ├── json_parse/ # JSON parsing benchmark
│ │ ├── rust/src/ # Rust parser, generator, types
│ │ └── tinygo/ # TinyGo implementation
│ └── matrix_mul/ # Matrix multiplication
│ ├── rust/src/ # Rust matrix operations
│ └── tinygo/ # TinyGo implementation
├── 🔧 scripts/ # Build and automation
│ ├── build_all.sh # Complete build pipeline
│ ├── build_rust.sh # Rust-specific builds
│ ├── build_tinygo.sh # TinyGo-specific builds
│ ├── build_config.js # Configuration file generation
│ ├── run_bench.js # Main benchmark orchestrator
│ ├── fingerprint.sh # Environment fingerprinting
│ ├── validate_*.sh # Task validation scripts
│ ├── services/ # Service architecture (DI pattern)
│ │ ├── BrowserService.js # Browser automation
│ │ ├── ResultsService.js # Result collection
│ │ └── LoggingService.js # Structured logging
│ └── interfaces/ # Service interfaces
├── 🧪 harness/web/ # Browser test environment
│ ├── bench.html # Test execution page
│ ├── bench.js # Browser-side test logic
│ ├── config_loader.js # Configuration management
│ └── wasm_loader.js # WebAssembly module loading
├── 📊 analysis/ # Statistical analysis pipeline
│ ├── qc.py # Quality control & outlier detection
│ ├── statistics.py # Welch's t-test, Cohen's d analysis
│ ├── plots.py # Matplotlib visualization generation
│ ├── decision.py # Decision support analysis
│ ├── data_models.py # Data structure definitions
│ ├── validation.py # Cross-language validation
│ └── common.py # Shared utilities
├── 🧪 tests/ # Comprehensive test suite
│ ├── unit/ # Unit tests (Vitest)
│ │ ├── config-parser.test.js
│ │ └── statistics.test.js
│ ├── integration/ # Cross-language validation
│ │ └── cross-language.test.js
│ ├── utils/ # Test utilities
│ └── setup.js # Test configuration
├── ⚙️ configs/ # Configuration management
│ ├── bench.yaml # Production benchmark config
│ ├── bench.json # Compiled JSON configuration
│ ├── bench-quick.yaml # Development/CI config
│ └── bench-quick.json # Quick test configuration
├── 📈 results/ # Benchmark output data
├── 📋 reports/ # Generated reports and analysis
│ ├── plots/ # Visualization outputs
│ └── statistics/ # Statistical analysis reports
├── 🗂️ data/ # Reference data and validation
│ └── reference_hashes/ # Cross-language validation hashes
├── 🏗️ builds/ # Compiled WebAssembly modules
│ ├── rust/ # Optimized Rust WASM files
│ └── tinygo/ # Optimized TinyGo WASM files
├── meta.json # Experiment metadata
├── versions.lock # Toolchain version lock
├── pyproject.toml # Python dependencies (uv)
├── package.json # Node.js dependencies
├── vitest.config.js # Test framework configuration
├── eslint.config.js # Code quality configuration
└── Makefile # Automated build and workflow
This project includes comprehensive documentation in both English and Chinese to support different developer audiences:
| Document | Language | Description | Link |
|---|---|---|---|
| Command Reference | English | Complete guide to all available commands, workflows, and troubleshooting | command-reference_en.md |
| Statistical Terminology | English | Comprehensive statistical concepts and methods used in the project | statistical-terminology_en.md |
| Statistical Design Implementation | English | Detailed architecture and implementation of statistical analysis system | statistical-design-impl_en.md |
| Document | Language | Description | Link |
|---|---|---|---|
| Development TODO | English | Project development roadmap and implementation status | development-todo_en.md |
| Experiment Plan | English | Research methodology and experimental design | experiment-plan_en.md |
| Quick Flow Guide | English | Fast development and testing workflow | run-quick-flow_en.md |
| Document | Language | Description | Link |
|---|---|---|---|
| Timeout Configuration | English | Timeout settings and configuration guide | timeout-configuration_en.md |
💡 Tip: Start with the Command Reference for practical usage, then explore Statistical Terminology for methodology understanding.
Both languages export a unified C-style interface for fair comparison:
void init(uint32_t seed); // Initialize PRNG
uint32_t alloc(uint32_t n_bytes); // Allocate memory
uint32_t run_task(uint32_t params_ptr); // Execute & return result hash

| Language | Target | Flags | Post-processing |
|---|---|---|---|
| Rust | wasm32-unknown-unknown | -O3, fat LTO, 1 codegen unit | wasm-strip, wasm-opt -O3 |
| TinyGo | wasm | -opt=2, panic trap, no debug | wasm-strip, wasm-opt -Oz |
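To make the shared interface concrete, below is a minimal Rust-side sketch of how such exports can be declared. The function bodies, allocation strategy, and seed storage are assumptions for illustration, not the project's actual src/lib.rs.

```rust
use std::alloc::{alloc as raw_alloc, Layout};
use std::sync::atomic::{AtomicU32, Ordering};

// Seed storage for the deterministic PRNG shared by all tasks (illustrative).
static SEED: AtomicU32 = AtomicU32::new(0);

#[no_mangle]
pub extern "C" fn init(seed: u32) {
    SEED.store(seed, Ordering::Relaxed);
}

#[no_mangle]
pub extern "C" fn alloc(n_bytes: u32) -> u32 {
    // Hand the host a linear-memory offset it can write task parameters into.
    let layout = Layout::from_size_align(n_bytes.max(1) as usize, 8).unwrap();
    unsafe { raw_alloc(layout) as usize as u32 }
}

#[no_mangle]
pub extern "C" fn run_task(params_ptr: u32) -> u32 {
    // Decode the parameters at params_ptr, run the selected workload, and
    // return its FNV-1a verification hash (elided in this sketch).
    let _ = params_ptr;
    0
}
```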
To ensure implementation correctness and cross-language consistency in this performance comparison, the project employs a comprehensive validation framework. Since compiler optimizations and language-specific runtime behaviors can potentially affect computational results, establishing algorithmic equivalence is critical before comparing performance metrics.
Cross-Language Validation Mechanism:
The framework uses the FNV-1a hash algorithm to verify that the Rust and TinyGo implementations produce identical computational results across all benchmark tasks. This validation strategy includes 449 reference test vectors systematically generated to cover diverse computational scenarios:
- 320 Mandelbrot vectors: Covering various grid sizes (2×2 to 256×256), iteration counts (10-2000), complex plane regions, and zoom scales to validate floating-point computation consistency
- 112 JSON Parse vectors: Testing different record counts (0-65,535), seed variations, and edge cases to ensure parsing logic and data structure handling equivalence
- 17 Matrix Mul vectors: Spanning matrix dimensions (1×1 to 128×128) with varied seeds to validate numerical computation and memory access patterns
Purpose: This comprehensive validation ensures that any observed performance differences stem purely from language/compiler efficiency rather than algorithmic discrepancies, providing a fair and scientifically rigorous foundation for the benchmark comparison.
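Conceptually, each reference vector pairs a set of task parameters with the hash both implementations must reproduce. A hypothetical Rust sketch of that check (the field names and callback shape are illustrative, not the project's data format):

```rust
/// Hypothetical shape of one reference vector: task parameters plus the
/// FNV-1a hash both implementations must reproduce.
struct ReferenceVector {
    width: u32,
    height: u32,
    max_iter: u32,
    expected_hash: u32,
}

/// A language implementation passes only if every vector's hash matches.
fn all_vectors_pass(
    vectors: &[ReferenceVector],
    compute: impl Fn(u32, u32, u32) -> u32, // e.g. the Mandelbrot kernel sketched above
) -> bool {
    vectors
        .iter()
        .all(|v| compute(v.width, v.height, v.max_iter) == v.expected_hash)
}
```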
- Outlier Detection: IQR-based filtering (Q1-1.5×IQR, Q3+1.5×IQR)
- Stability Validation: Coefficient of Variation < 15% threshold
- Sample Size: Minimum 30 valid measurements per condition
- Cross-Language Verification: Hash-based result consistency
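A compact Rust sketch of the first two quality-control rules, for readers who want the arithmetic spelled out; the authoritative implementation lives in analysis/qc.py.

```rust
/// Linear-interpolation quartiles of an already-sorted sample.
fn quartiles(sorted: &[f64]) -> (f64, f64) {
    let q = |p: f64| {
        let idx = p * (sorted.len() - 1) as f64;
        let (lo, hi) = (idx.floor() as usize, idx.ceil() as usize);
        sorted[lo] + (sorted[hi] - sorted[lo]) * (idx - idx.floor())
    };
    (q(0.25), q(0.75))
}

/// IQR rule: keep only samples inside [Q1 - 1.5*IQR, Q3 + 1.5*IQR].
fn filter_outliers(samples: &[f64]) -> Vec<f64> {
    let mut sorted = samples.to_vec();
    sorted.sort_by(|a, b| a.partial_cmp(b).unwrap());
    let (q1, q3) = quartiles(&sorted);
    let iqr = q3 - q1;
    let (lo, hi) = (q1 - 1.5 * iqr, q3 + 1.5 * iqr);
    samples.iter().copied().filter(|&x| x >= lo && x <= hi).collect()
}

/// Coefficient of variation; a condition passes the stability gate when
/// this is below 0.15 (15%).
fn coefficient_of_variation(samples: &[f64]) -> f64 {
    let n = samples.len() as f64;
    let mean = samples.iter().sum::<f64>() / n;
    let var = samples.iter().map(|x| (x - mean).powi(2)).sum::<f64>() / (n - 1.0);
    var.sqrt() / mean
}
```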
Significance Testing:
- Welch's t-test for unequal variances (more robust than Student's t-test)
- p-value interpretation: p < 0.05 for statistical significance
- 95% Confidence Intervals for mean difference estimation
Effect Size Analysis:
- Cohen's d for practical significance assessment
- Thresholds: |d| < 0.2 (negligible), 0.2-0.5 (small), 0.5-0.8 (medium), ≥0.8 (large)
- Decision Framework: Statistical significance + effect size → language recommendation
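The sketch below shows the arithmetic behind these decisions: Welch's t statistic with the Welch–Satterthwaite degrees of freedom, plus Cohen's d with a pooled standard deviation. It omits the p-value lookup; the full test, including p-values and confidence intervals, is computed by analysis/statistics.py.

```rust
/// Sample mean and unbiased variance.
fn mean_var(xs: &[f64]) -> (f64, f64) {
    let n = xs.len() as f64;
    let mean = xs.iter().sum::<f64>() / n;
    let var = xs.iter().map(|x| (x - mean).powi(2)).sum::<f64>() / (n - 1.0);
    (mean, var)
}

/// Welch's t statistic and Welch–Satterthwaite degrees of freedom for two
/// independent samples with (possibly) unequal variances.
fn welch_t(a: &[f64], b: &[f64]) -> (f64, f64) {
    let ((ma, va), (mb, vb)) = (mean_var(a), mean_var(b));
    let (na, nb) = (a.len() as f64, b.len() as f64);
    let se2 = va / na + vb / nb;
    let t = (ma - mb) / se2.sqrt();
    let df = se2.powi(2)
        / ((va / na).powi(2) / (na - 1.0) + (vb / nb).powi(2) / (nb - 1.0));
    (t, df)
}

/// Cohen's d with a pooled standard deviation; |d| >= 0.8 counts as "large".
fn cohens_d(a: &[f64], b: &[f64]) -> f64 {
    let ((ma, va), (mb, vb)) = (mean_var(a), mean_var(b));
    let (na, nb) = (a.len() as f64, b.len() as f64);
    let pooled = (((na - 1.0) * va + (nb - 1.0) * vb) / (na + nb - 2.0)).sqrt();
    (ma - mb) / pooled
}
```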
| Metric | Threshold | Purpose |
|---|---|---|
| Coefficient of Variation | ≤ 15% | Measurement stability |
| Minimum Sample Size | ≥ 30 | Statistical power |
| Success Rate | ≥ 80% | Data reliability |
| Timeout Rate | ≤ 10% | System stability |
# Generate reproducible environment snapshot
./scripts/fingerprint.sh
# Outputs: meta.json, versions.lock

Locked Versions:
- All toolchain versions recorded (`versions.lock`)
- Dependency versions pinned (`pnpm-lock.yaml`, `uv.lock`)
- System information captured (`meta.json`)
- Random seeds fixed for deterministic data generation
- ✅ Hash Verification: FNV-1a algorithm ensures implementation correctness
- ✅ Cross-Language Consistency: 449 reference test vectors
- ✅ Statistical Validation: Automated quality control checks
- ✅ Audit Trail: Complete logging of benchmark execution
Two-Layer Validation Strategy:
- Build Validation (`make test validate` / `./scripts/validate-tasks.sh`)
  - Validates WASM build artifacts and reference hashes exist
  - Uses the TinyGo compiler to verify algorithm consistency
  - Comprehensive test vectors (449 vectors across 3 tasks)
  - Note: `matrix_mul` shows partial compatibility (6/17 vectors pass)
    - Small matrices (≤4×4): Full consistency ✅
    - Larger matrices: Floating-point precision differences due to compiler optimization variations
    - This is a known limitation and does not affect benchmark validity
- Runtime Validation (`make test` / browser integration tests)
  - Validates actual WASM execution in the browser environment
  - Tests with benchmark-specific parameters (seed=12345)
  - All tasks, including `matrix_mul` (256×256, 384×384, 576×576), achieve 100% hash consistency ✅
  - Validates memory management, performance characteristics, and error handling
Why Both Are Necessary:
- Build validation ensures implementation correctness across diverse inputs
- Runtime validation confirms production environment behavior
- Together they provide comprehensive quality assurance
| Component | Status | Implementation |
|---|---|---|
| Statistical Analysis | ✅ Complete | Welch's t-test, Cohen's d, confidence intervals |
| Quality Control | ✅ Complete | IQR outlier detection, CV validation |
| Cross-Language Validation | ✅ Complete | 449 reference test vectors |
| Visualization System | ✅ Complete | Bar charts, box plots, statistical tables |
| Test Suite | ✅ Complete | Unit, integration |
| Build System | ✅ Complete | Rust/TinyGo optimized builds |
| Documentation | ✅ Complete | Comprehensive guides and references |
- Command Reference: `make help`
- System Check: `make status` for environment validation
- Dependency Verification: `make check deps` for toolchain validation
- Issue Reporting: Include output from `make info` and `make status`
The benchmark generates comprehensive statistical visualizations to support engineering decision-making. All charts are automatically generated in reports/plots/:
Execution Time Comparison: Bar charts with means, medians, error bars, and statistical significance markers (p < 0.05) comparing Rust vs TinyGo performance.
Memory Usage Comparison: Memory consumption patterns highlighting GC impact analysis and memory management efficiency differences between languages.
Distribution & Variance Analysis: Box plots showing performance consistency, variance patterns, and outlier detection across all benchmark tasks and scales.
Effect Size Heatmap: Cohen's d visualization with color-coded significance levels showing practical significance of performance differences.
View Interactive Analysis Dashboard →
The decision_summary.html provides an interactive engineering decision support system with:
- Executive performance summary with stability scores
- Statistical significance analysis (Welch's t-test, Cohen's d)
- Interactive chart tabs for all visualizations
- Detailed comparison tables with language recommendations
- Methodology documentation and quality standards
The analysis system provides comprehensive statistical evaluation:
# Step-by-step analysis
make qc # Quality control: IQR outlier detection, CV validation
make stats # Statistical analysis: Welch's t-tests, Cohen's d effect sizes
make plots # Visualization generation: 4 chart types + HTML dashboard
make analyze # Complete pipeline: qc → stats → plots → decision support
# Direct analysis execution
python analysis/qc.py # Outlier detection and quality validation
python analysis/statistics.py # Statistical significance testing
python analysis/plots.py # Matplotlib chart generation

✨ Analysis Features:
- Quality Control: Coefficient of variation analysis, outlier detection, success rate validation
- Statistical Testing: Welch's t-tests for unequal variances, 95% confidence intervals
- Effect Size Analysis: Cohen's d calculation with practical significance thresholds
- Visualization Suite: 4 chart types supporting engineering decision-making
- Decision Support: Automated language recommendation based on statistical evidence
- Single-platform Testing: Results from AWS EC2 Linux/Chromium environment
- System Interference: Background processes may affect timing precision
- Browser Variations: Results specific to Chromium V8 WebAssembly implementation
- Limited Task Coverage: Three computational patterns (not comprehensive)
- No I/O Testing: Focus on CPU/memory intensive workloads only
- Single-threaded: No multi-threading evaluation
- Memory Model: JavaScript-WASM boundary overhead not isolated
- Variance Heterogeneity: Welch's t-test accounts for unequal variances between languages
- GC Impact: TinyGo's garbage collector introduces measurement variability (higher CV thresholds)
- Matrix Multiplication (`matrix_mul`):
  - Rust and TinyGo implementations show compiler-specific floating-point behavior
  - Small matrices (≤4×4): Perfect consistency across all test vectors
  - Medium-to-large matrices: Precision differences emerge with certain random seeds
  - Benchmark validation (256×256+ with seed=12345): 100% consistent ✅
  - Comprehensive test vectors (17 cases, varied seeds/sizes): 35% match rate
  - This reflects real-world compiler optimization differences, not implementation errors
  - Benchmark results remain valid for production use cases with fixed parameters
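The root cause is that floating-point addition is not associative, so two compilers that reorder a reduction differently can produce bit-different (yet equally valid) results, and therefore different verification hashes. A tiny, self-contained Rust example of the effect:

```rust
/// Demonstrates that (a + b) + c and a + (b + c) can differ for f64,
/// which is why reordered reductions hash differently.
fn main() {
    let a = 1e16_f64;
    let b = -1e16_f64;
    let c = 1.0_f64;
    let left = (a + b) + c;  // 1.0
    let right = a + (b + c); // 0.0 — the 1.0 is absorbed by rounding
    assert_ne!(left.to_bits(), right.to_bits());
    println!("(a+b)+c = {left}, a+(b+c) = {right}");
}
```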
This project is licensed under the MIT License - see the LICENSE file for details.
Keywords: WebAssembly, WASM, Rust, Go, TinyGo, Performance, Benchmark, Comparison, Statistical Analysis



