Add superoperators and interpreter performance optimizations by fglock · Pull Request #305 · fglock/PerlOnJava

fglock · 2026-03-12T15:16:04Z

Summary

This PR adds superoperators to the bytecode interpreter and implements several performance optimizations.

Superoperators (Phase 3)

HASH_DEREF_FETCH (opcode 381): Combines DEREF_HASH + LOAD_STRING + HASH_GET for $hashref->{key}
ARRAY_DEREF_FETCH (opcode 382): Combines DEREF_ARRAY + LOAD_INT + ARRAY_GET for $arrayref->[n]
NONSTRICT variants (opcodes 383, 384): Handle symbolic references when no strict 'refs'
Bytecode reduction: $v[1]{a}{b}{c}->[2] reduced from 50 shorts to 32 shorts (36% reduction)
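
To make the fusion idea concrete, here is a minimal sketch of a register-style interpreter in which one fused opcode replaces a three-opcode sequence. Everything in it is an assumption for illustration: the opcode numbers, the operand layout, the register model and the class name `SuperopSketch` are hypothetical and do not reflect PerlOnJava's real encoding. Only the opcode names echo the list above.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical mini-interpreter, not PerlOnJava code.
final class SuperopSketch {
    static final short HALT = 0;             // HALT rResult
    static final short DEREF_HASH = 1;       // DEREF_HASH rd, rRef
    static final short LOAD_STRING = 2;      // LOAD_STRING rd, stringIndex
    static final short HASH_GET = 3;         // HASH_GET rd, rHash, rKey
    static final short HASH_DEREF_FETCH = 4; // HASH_DEREF_FETCH rd, rRef, stringIndex

    // Constant pool shared by both programs (assumed layout).
    static final String[] STRINGS = {"key"};

    // $hashref->{key} as three separate opcodes.
    static final short[] UNFUSED = {
        DEREF_HASH, 1, 0,
        LOAD_STRING, 2, 0,
        HASH_GET, 3, 1, 2,
        HALT, 3
    };

    // The same access as one superoperator.
    static final short[] FUSED = {
        HASH_DEREF_FETCH, 3, 0, 0,
        HALT, 3
    };

    // Register 0 holds the hash reference on entry.
    static Object run(short[] code, Object hashRef) {
        Object[] regs = new Object[4];
        regs[0] = hashRef;
        int pc = 0;
        while (true) {
            short op = code[pc++];
            switch (op) {
                case DEREF_HASH: {
                    int rd = code[pc++];
                    regs[rd] = (Map<?, ?>) regs[code[pc++]];
                    break;
                }
                case LOAD_STRING: {
                    int rd = code[pc++];
                    regs[rd] = STRINGS[code[pc++]];
                    break;
                }
                case HASH_GET: {
                    int rd = code[pc++];
                    Map<?, ?> hash = (Map<?, ?>) regs[code[pc++]];
                    Object key = regs[code[pc++]];
                    regs[rd] = hash.get(key);
                    break;
                }
                case HASH_DEREF_FETCH: {
                    // One dispatch, no intermediate registers written.
                    int rd = code[pc++];
                    Map<?, ?> hash = (Map<?, ?>) regs[code[pc++]];
                    regs[rd] = hash.get(STRINGS[code[pc++]]);
                    break;
                }
                case HALT:
                    return regs[code[pc]];
                default:
                    throw new IllegalStateException("unknown opcode " + op);
            }
        }
    }

    public static void main(String[] args) {
        Map<String, Object> h = new HashMap<>();
        h.put("key", 42);
        System.out.println(run(UNFUSED, h) + " using " + UNFUSED.length + " shorts");
        System.out.println(run(FUSED, h) + " using " + FUSED.length + " shorts");
    }
}
```

Both programs return the same value. The fused form is shorter and goes through the dispatch loop once instead of three times, which is the kind of saving the bytecode-reduction figure above measures.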

Interpreter Performance Optimizations (Phase 4)

Stack to ArrayDeque: Replaced the synchronized Stack with ArrayDeque in DynamicVariableManager and InterpreterState, removing synchronization overhead
usesLocalization flag: Skip DynamicVariableManager calls for code that does not use local
Cached InterpreterFrame: Pre-create frames to avoid allocation per call
Bug fix: withCapturedVars() now preserves optimization flags for closures
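
As an illustration of the Stack to ArrayDeque change, the sketch below shows a level-based save/restore stack built on `ArrayDeque`. The class `LocalStackSketch` and its `pushLocal` method are hypothetical; only the method names `getLocalLevel` and `popToLocalLevel` are taken from this PR, and their real signatures may differ.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical stand-in for a dynamic-variable stack, not PerlOnJava code.
final class LocalStackSketch {
    // java.util.Stack extends Vector, so push/pop are synchronized.
    // ArrayDeque is unsynchronized, which suits single-threaded use.
    private final Deque<Runnable> restoreActions = new ArrayDeque<>();

    // Remember how deep the stack is on scope entry.
    int getLocalLevel() {
        return restoreActions.size();
    }

    // Register an action that undoes one localization.
    void pushLocal(Runnable restore) {
        restoreActions.push(restore);
    }

    // On scope exit, undo everything pushed since the saved level,
    // most recent first.
    void popToLocalLevel(int level) {
        while (restoreActions.size() > level) {
            restoreActions.pop().run();
        }
    }
}
```

Two behavioural differences are worth checking in a migration like this: `ArrayDeque` rejects `null` elements, and it iterates from the most recently pushed element while `Stack` iterates from the oldest.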

Benchmark Results

Simple closure benchmark (without Benchmark.pm overhead):

Original: ~274/s
After optimizations: ~430/s (+57% improvement)
JVM backend: ~1380/s (3.2x faster than interpreter)
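
The percentages quoted in this description can be re-derived from the raw figures. The helper below is only an arithmetic check; the class name `BenchmarkMathSketch` is hypothetical.

```java
// Arithmetic check for the figures quoted in this PR description.
final class BenchmarkMathSketch {
    // Relative change from 'before' to 'after', in percent.
    static double percentChange(double before, double after) {
        return 100.0 * (after - before) / before;
    }

    // How many times larger 'a' is than 'b'.
    static double ratio(double a, double b) {
        return a / b;
    }

    public static void main(String[] args) {
        System.out.printf("interpreter speedup: %.1f%%%n", percentChange(274, 430));
        System.out.printf("JVM vs interpreter: %.1fx%n", ratio(1380, 430));
        System.out.printf("bytecode size change: %.1f%%%n", percentChange(50, 32));
    }
}
```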

Files Changed

Opcodes.java: Added opcodes 381-384
BytecodeCompiler.java: Pattern detection and usesLocalization tracking
BytecodeInterpreter.java: Superoperator handlers and optimization checks
InterpretedCode.java: usesLocalization flag, cachedFrame, getOrCreateFrame()
InterpreterState.java: Stack to ArrayDeque, frame caching
DynamicVariableManager.java: Stack to ArrayDeque
Disassemble.java: Disassembly support for new opcodes

Test plan

All existing tests pass (./gradlew test)
Benchmark shows 57% improvement
JFR profiling confirms DynamicVariableManager overhead eliminated

Generated with Devin

Performance optimizations for the bytecode interpreter:

1. DynamicVariableManager: Change Stack to ArrayDeque (no synchronization)
2. Add usesLocalization flag to InterpretedCode
   - BytecodeCompiler tracks when LOCAL_* or PUSH_LOCAL_VARIABLE opcodes are emitted
   - BytecodeInterpreter skips getLocalLevel/popToLocalLevel/RegexState.save when the code doesn't use localization

This reduces overhead for subroutines that don't use `local` variables.

Benchmark (without Benchmark.pm overhead): 357/s interpreter vs 1250/s JVM

Generated with [Devin](https://cli.devin.ai/docs)
Co-Authored-By: Devin <noreply@cognition.ai>
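
The interpreter-side half of this commit can be pictured as a guard around the save/restore bookkeeping. The sketch below is an assumption-laden illustration: `LocalizationSkipSketch`, its nested `Code` class, `dynamicStack` and `bookkeepingCalls` are all hypothetical names, and the real BytecodeInterpreter is structured differently.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical call path, not PerlOnJava code.
final class LocalizationSkipSketch {
    // Compiled unit carrying the flag computed at compile time.
    static final class Code {
        final boolean usesLocalization;
        final Runnable body;

        Code(boolean usesLocalization, Runnable body) {
            this.usesLocalization = usesLocalization;
            this.body = body;
        }
    }

    // Restore actions registered by `local`-style operations.
    static final Deque<Runnable> dynamicStack = new ArrayDeque<>();

    // Counts how often the bookkeeping path is taken (for demonstration).
    static int bookkeepingCalls = 0;

    static void execute(Code code) {
        if (!code.usesLocalization) {
            // Fast path: nothing can have been pushed, so skip save/restore.
            code.body.run();
            return;
        }
        bookkeepingCalls++;
        int level = dynamicStack.size();
        try {
            code.body.run();
        } finally {
            // Undo every localization made by this call.
            while (dynamicStack.size() > level) {
                dynamicStack.pop().run();
            }
        }
    }
}
```

The fast path is only safe if the flag is true for every body that can push onto the stack, which is why the flag has to be computed conservatively.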

- Stack → ArrayDeque changes in DynamicVariableManager and InterpreterState
- usesLocalization flag to skip unnecessary DVM calls
- Benchmark results: 357/s interpreter vs 1250/s JVM

Generated with [Devin](https://cli.devin.ai/docs)
Co-Authored-By: Devin <noreply@cognition.ai>

- Add cachedFrame field to InterpretedCode
- Add getOrCreateFrame() method that returns cached frame when names match
- Modify InterpreterState.push() to use pre-created frame
- Add pushFrame() method for direct frame reuse

This avoids allocating a new InterpreterFrame record on every subroutine call. The frame is cached on first use and reused for subsequent calls with the same package/subroutine names.

Generated with [Devin](https://cli.devin.ai/docs)
Co-Authored-By: Devin <noreply@cognition.ai>
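
A minimal sketch of the caching described in this commit, assuming the frame is an immutable record keyed by package and subroutine name. `FrameSketch` and `FrameCacheSketch` are hypothetical stand-ins; the real `InterpreterFrame` and `getOrCreateFrame()` may carry more fields and a different signature.

```java
import java.util.Objects;

// Hypothetical stand-in for the frame record (requires Java 16+).
record FrameSketch(String packageName, String subroutineName) {}

// Hypothetical stand-in for the code object that owns the cache.
final class FrameCacheSketch {
    // Single cached frame; reuse is safe because the record is immutable.
    private FrameSketch cachedFrame;

    FrameSketch getOrCreateFrame(String packageName, String subroutineName) {
        FrameSketch frame = cachedFrame;
        if (frame != null
                && Objects.equals(frame.packageName(), packageName)
                && Objects.equals(frame.subroutineName(), subroutineName)) {
            // Names match: no allocation on this call.
            return frame;
        }
        // First call, or the names changed: allocate and remember.
        frame = new FrameSketch(packageName, subroutineName);
        cachedFrame = frame;
        return frame;
    }
}
```

Repeated calls with the same names return the same instance, so the per-call allocation disappears on the common path.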

Bug fix: When creating closures via withCapturedVars(), the usesLocalization flag and cachedFrame were not copied to the new InterpretedCode instance. This caused the DynamicVariableManager optimization to not work for closures.

Performance improvement: 330/s → 384/s (16% faster) on closure benchmark

Profiler now shows:
- No more DynamicVariableManager.popToLocalLevel calls
- No more RegexState.save calls
- No more DynamicVariableManager.getLocalLevel calls

Generated with [Devin](https://cli.devin.ai/docs)
Co-Authored-By: Devin <noreply@cognition.ai>

- Interpreter now 57% faster (274/s → 430/s)
- JVM backend is 3.2x faster than interpreter
- Document remaining hotspots from JFR profiling

Generated with [Devin](https://cli.devin.ai/docs)
Co-Authored-By: Devin <noreply@cognition.ai>

- JFR profiling workflow with jperl and jfr command analysis
- Common interpreter overhead sources (synchronized collections, DynamicVariableManager, object allocations, ThreadLocal)
- Optimization flag pattern with important note about preserving flags when copying objects
- Pre-allocation pattern for reusable objects
- Profiler-guided optimization workflow
- Example results from Phase 4 work (+57% improvement)

Generated with [Devin](https://cli.devin.ai/docs)
Co-Authored-By: Devin <noreply@cognition.ai>

…necks

Changed "Common Interpreter Overhead Sources" to "What to Look For in Profiler Output" - focuses on patterns to identify rather than telling users what the bottlenecks are. Bottlenecks should be discovered through profiling, not assumed.

Generated with [Devin](https://cli.devin.ai/docs)
Co-Authored-By: Devin <noreply@cognition.ai>

The usesLocalization optimization was skipping DynamicVariableManager calls for code without local variables, but this broke defer blocks and regex state save/restore which also use DVM.

Added PUSH_DEFER and SAVE_REGEX_STATE to the list of opcodes that set usesLocalization = true.

Generated with [Devin](https://cli.devin.ai/docs)
Co-Authored-By: Devin <noreply@cognition.ai>
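
The compiler-side tracking can be sketched as a set of opcodes that flip the flag when emitted. The enum `Op`, the class `FlagTrackingSketch` and the member `LOCAL_SCALAR` (standing in for the `LOCAL_*` family) are hypothetical; `PUSH_LOCAL_VARIABLE`, `PUSH_DEFER` and `SAVE_REGEX_STATE` are the names mentioned in the commits, but how BytecodeCompiler actually records them is an assumption.

```java
import java.util.ArrayList;
import java.util.EnumSet;
import java.util.List;
import java.util.Set;

// Hypothetical emitter, not PerlOnJava code.
final class FlagTrackingSketch {
    enum Op { LOAD_INT, ADD, LOCAL_SCALAR, PUSH_LOCAL_VARIABLE, PUSH_DEFER, SAVE_REGEX_STATE }

    // Every opcode whose runtime effect goes through the dynamic-variable
    // stack must be listed here, not only the `local` ones.
    static final Set<Op> NEEDS_DVM = EnumSet.of(
            Op.LOCAL_SCALAR, Op.PUSH_LOCAL_VARIABLE, Op.PUSH_DEFER, Op.SAVE_REGEX_STATE);

    private final List<Op> emitted = new ArrayList<>();
    private boolean usesLocalization = false;

    void emit(Op op) {
        emitted.add(op);
        if (NEEDS_DVM.contains(op)) {
            // Once set, the flag stays set for the whole compiled unit.
            usesLocalization = true;
        }
    }

    boolean usesLocalization() {
        return usesLocalization;
    }
}
```

Keeping the list in one place makes the failure mode of this commit easier to avoid: an opcode missing from the set silently disables cleanup for code that needs it.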

This reverts commit 73955d9.

The commit caused a regression in re/pat.t (1064 vs 1065). Copying usesLocalization from parent to closure incorrectly skipped DVM cleanup for closures that use regex state via (?{...}) code blocks.

The safe default (usesLocalization = true) should be used for closures until a proper fix that accounts for regex state in closures.

Generated with [Devin](https://cli.devin.ai/docs)
Co-Authored-By: Devin <noreply@cognition.ai>
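
The "safe default" described in this revert can be sketched as a copy method that deliberately does not inherit the parent's flag. `ClosureCopySketch` is a hypothetical stand-in for InterpretedCode, and the real `withCapturedVars()` takes different arguments.

```java
// Hypothetical stand-in for a compiled code object, not PerlOnJava code.
final class ClosureCopySketch {
    final boolean usesLocalization;
    final Object[] capturedVars;

    ClosureCopySketch(boolean usesLocalization, Object[] capturedVars) {
        this.usesLocalization = usesLocalization;
        this.capturedVars = capturedVars;
    }

    // Conservative copy: the closure always takes the slow path with full
    // cleanup, even if the parent was compiled with the flag off.
    ClosureCopySketch withCapturedVars(Object[] vars) {
        return new ClosureCopySketch(true, vars);
    }
}
```

This trades the closure speedup for correctness: cleanup always runs, at the cost of the bookkeeping calls the optimization was meant to remove.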

fglock force-pushed the feature/superoperators branch 2 times, most recently from 98303fd to 1627223 on March 12, 2026 19:43

fglock and others added 7 commits March 12, 2026 20:43

docs: Update Phase 4 with final benchmark results

7581a85

fglock force-pushed the feature/superoperators branch from 1627223 to c7f19ba on March 12, 2026 19:43

fglock and others added 2 commits March 12, 2026 21:09

fglock merged commit 4b0f138 into master Mar 12, 2026
2 checks passed

fglock deleted the feature/superoperators branch March 12, 2026 21:44

fglock merged 9 commits into master from feature/superoperators