Skip to content

Add superoperators and interpreter performance optimizations#305

Merged
fglock merged 9 commits intomasterfrom
feature/superoperators
Mar 12, 2026
Merged

Add superoperators and interpreter performance optimizations#305
fglock merged 9 commits intomasterfrom
feature/superoperators

Conversation

@fglock
Copy link
Owner

@fglock fglock commented Mar 12, 2026

Summary

This PR adds superoperators to the bytecode interpreter and implements several performance optimizations.

Superoperators (Phase 3)

  • HASH_DEREF_FETCH (opcode 381): Combines DEREF_HASH + LOAD_STRING + HASH_GET for $hashref->{key}
  • ARRAY_DEREF_FETCH (opcode 382): Combines DEREF_ARRAY + LOAD_INT + ARRAY_GET for $arrayref->[n]
  • NONSTRICT variants (opcodes 383, 384): Handle symbolic references when no strict 'refs'
  • Bytecode reduction: $v[1]{a}{b}{c}->[2] reduced from 50 shorts to 32 shorts (36% reduction)

Interpreter Performance Optimizations (Phase 4)

  • Stack to ArrayDeque: Removed synchronization overhead in DynamicVariableManager and InterpreterState
  • usesLocalization flag: Skip DynamicVariableManager calls for code that does not use local
  • Cached InterpreterFrame: Pre-create frames to avoid allocation per call
  • Bug fix: withCapturedVars() now preserves optimization flags for closures

Benchmark Results

Simple closure benchmark (without Benchmark.pm overhead):

  • Original: ~274/s
  • After optimizations: ~430/s (+57% improvement)
  • JVM backend: ~1380/s (3.2x faster than interpreter)

Files Changed

  • Opcodes.java: Added opcodes 381-384
  • BytecodeCompiler.java: Pattern detection and usesLocalization tracking
  • BytecodeInterpreter.java: Superoperator handlers and optimization checks
  • InterpretedCode.java: usesLocalization flag, cachedFrame, getOrCreateFrame()
  • InterpreterState.java: Stack to ArrayDeque, frame caching
  • DynamicVariableManager.java: Stack to ArrayDeque
  • Disassemble.java: Disassembly support for new opcodes

Test plan

  • All existing tests pass (./gradlew test)
  • Benchmark shows 57% improvement
  • JFR profiling confirms DynamicVariableManager overhead eliminated

Generated with Devin

@fglock fglock force-pushed the feature/superoperators branch 2 times, most recently from 98303fd to 1627223 Compare March 12, 2026 19:43
fglock and others added 7 commits March 12, 2026 20:43
Performance optimizations for the bytecode interpreter:

1. DynamicVariableManager: Change Stack to ArrayDeque (no synchronization)

2. Add usesLocalization flag to InterpretedCode
   - BytecodeCompiler tracks when LOCAL_* or PUSH_LOCAL_VARIABLE opcodes are emitted
   - BytecodeInterpreter skips getLocalLevel/popToLocalLevel/RegexState.save
     when the code doesn't use localization

This reduces overhead for subroutines that don't use `local` variables.

Benchmark (without Benchmark.pm overhead): 357/s interpreter vs 1250/s JVM

Generated with [Devin](https://cli.devin.ai/docs)

Co-Authored-By: Devin <noreply@cognition.ai>
- Stack → ArrayDeque changes in DynamicVariableManager and InterpreterState
- usesLocalization flag to skip unnecessary DVM calls
- Benchmark results: 357/s interpreter vs 1250/s JVM

Generated with [Devin](https://cli.devin.ai/docs)

Co-Authored-By: Devin <noreply@cognition.ai>
- Add cachedFrame field to InterpretedCode
- Add getOrCreateFrame() method that returns cached frame when names match
- Modify InterpreterState.push() to use pre-created frame
- Add pushFrame() method for direct frame reuse

This avoids allocating a new InterpreterFrame record on every subroutine call.
The frame is cached on first use and reused for subsequent calls with the
same package/subroutine names.

Generated with [Devin](https://cli.devin.ai/docs)

Co-Authored-By: Devin <noreply@cognition.ai>
Bug fix: When creating closures via withCapturedVars(), the usesLocalization
flag and cachedFrame were not copied to the new InterpretedCode instance.
This caused the DynamicVariableManager optimization to not work for closures.

Performance improvement: 330/s → 384/s (16% faster) on closure benchmark

Profiler now shows:
- No more DynamicVariableManager.popToLocalLevel calls
- No more RegexState.save calls
- No more DynamicVariableManager.getLocalLevel calls

Generated with [Devin](https://cli.devin.ai/docs)

Co-Authored-By: Devin <noreply@cognition.ai>
- Interpreter now 57% faster (274/s → 430/s)
- JVM backend is 3.2x faster than interpreter
- Document remaining hotspots from JFR profiling

Generated with [Devin](https://cli.devin.ai/docs)

Co-Authored-By: Devin <noreply@cognition.ai>
- JFR profiling workflow with jperl and jfr command analysis
- Common interpreter overhead sources (synchronized collections,
  DynamicVariableManager, object allocations, ThreadLocal)
- Optimization flag pattern with important note about preserving
  flags when copying objects
- Pre-allocation pattern for reusable objects
- Profiler-guided optimization workflow
- Example results from Phase 4 work (+57% improvement)

Generated with [Devin](https://cli.devin.ai/docs)

Co-Authored-By: Devin <noreply@cognition.ai>
…necks

Changed "Common Interpreter Overhead Sources" to "What to Look For in
Profiler Output" - focuses on patterns to identify rather than telling
users what the bottlenecks are. Bottlenecks should be discovered through
profiling, not assumed.

Generated with [Devin](https://cli.devin.ai/docs)

Co-Authored-By: Devin <noreply@cognition.ai>
@fglock fglock force-pushed the feature/superoperators branch from 1627223 to c7f19ba Compare March 12, 2026 19:43
fglock and others added 2 commits March 12, 2026 21:09
The usesLocalization optimization was skipping DynamicVariableManager
calls for code without local variables, but this broke defer blocks
and regex state save/restore which also use DVM.

Added PUSH_DEFER and SAVE_REGEX_STATE to the list of opcodes that
set usesLocalization = true.

Generated with [Devin](https://cli.devin.ai/docs)

Co-Authored-By: Devin <noreply@cognition.ai>
This reverts commit 73955d9.

The commit caused a regression in re/pat.t (1064 vs 1065). Copying
usesLocalization from parent to closure incorrectly skipped DVM
cleanup for closures that use regex state via (?{...}) code blocks.

The safe default (usesLocalization = true) should be used for closures
until a proper fix that accounts for regex state in closures.

Generated with [Devin](https://cli.devin.ai/docs)

Co-Authored-By: Devin <noreply@cognition.ai>
@fglock fglock merged commit 4b0f138 into master Mar 12, 2026
2 checks passed
@fglock fglock deleted the feature/superoperators branch March 12, 2026 21:44
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant