Eliminate redundant associative container lookups across the compiler#16606
k06a wants to merge 2 commits into argotorg:develop
Conversation
…pass access (+added code style rule)
Thank you for your contribution to the Solidity compiler! A team member will follow up shortly. If you haven't read our contributing guidelines and our review checklist yet, please do so now; it makes reviewing and accepting your contribution smoother. If you have any questions or need our help, feel free to post them in the PR or talk to us directly on the #solidity-dev channel on Matrix.
…ser pipeline from std::map/set to std::unordered_map/set
clonker left a comment:
Thanks! Very cool, this indeed brings some nice performance gains:
| Benchmark | Pipeline | Metric | Base | Target | Delta |
| --- | --- | --- | --- | --- | --- |
| openzeppelin-5.6.1 | evmasm | bytecode_size | 746,436 | 746,682 | +0.03% |
| | | cpu_time | 9.41s | 8.89s | -5.6% |
| | | cycles | 51,343,509,758 | 48,467,228,284 | -5.6% |
| | | instructions | 139,638,007,045 | 136,854,559,330 | -1.99% |
| | | peak_rss | 1027 MiB | 1028 MiB | +0.09% |
| | | wall_time | 9.46s | 8.93s | -5.61% |
| openzeppelin-5.6.1 | ir | bytecode_size | 696,529 | 696,775 | +0.04% |
| | | cpu_time | 28.20s | 24.38s | -13.54% |
| | | cycles | 155,040,244,671 | 134,366,041,115 | -13.33% |
| | | instructions | 319,299,139,415 | 301,667,743,203 | -5.52% |
| | | peak_rss | 1530 MiB | 1531 MiB | +0.1% |
| | | wall_time | 28.36s | 24.50s | -13.61% |
Note that the bytecode size increase is just metadata. Beyond that, the results are bytecode-identical.
Beyond that, please go through the changes and check that:

- includes are fine (by changing the containers, some `set` and `map` includes might have gone stale; I think I found some of them, but maybe not all)
- you have applied `const` where possible. Many of the iterators (or the tuple of iterator and bool indicating whether something was emplaced) can be `const`ed. `clang-tidy` can help you here.
- if-with-initializer is formatted differently. It does introduce a bit of churn, but it increases readability and makes the set of changes more uniform with the rest of the codebase.
- please restore the rationale comments that the refactors removed (e.g. on the storeInStorage/storeInMemory short-circuits)
- you can probably get away with a few of these not changing their container type. Especially the ones that have a low number of elements and/or are in cold code paths are perhaps better off left as `map`/`set`. The biggest wins are presumably from optimizer-internal containers that are iterated over and/or looked up per (subset of) AST node. That should shrink the diff significantly while maintaining the reported performance figures.
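As a minimal sketch of the requested style (hypothetical container and function names, not actual compiler code): a single `find` with a `const` iterator, the multi-line if-with-initializer layout, and `insert().second` replacing a `count`-then-`insert` pair:

```cpp
#include <string>
#include <unordered_map>
#include <unordered_set>

// Hypothetical lookup: one find() instead of count() followed by at(),
// with a const iterator and the multi-line if-with-initializer layout.
inline int positionOf(std::unordered_map<std::string, int> const& _positions, std::string const& _name)
{
	if (
		auto const it = _positions.find(_name);
		it != _positions.end()
	)
		return it->second;
	return -1;
}

// Hypothetical first-use tracker: insert() already reports whether the
// element was new, so no separate count() check is needed.
inline bool recordFirstUse(std::unordered_set<std::string>& _seen, std::string const& _name)
{
	return _seen.insert(_name).second;
}
```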
```cpp
int CSECodeGenerator::classElementPosition(Id _id) const
{
	auto it = m_classPositions.find(_id);
```

Suggested change:

```diff
-auto it = m_classPositions.find(_id);
+auto const it = m_classPositions.find(_id);
```
```cpp
case BasicBlock::EndType::JUMPI:
case BasicBlock::EndType::HANDOVER:
{
	auto it = blockByBeginPos.find(block.end);
```

Suggested change:

```diff
-auto it = blockByBeginPos.find(block.end);
+auto const it = blockByBeginPos.find(block.end);
```
```cpp
	continue;
BlockId nextId(push.data());
if (m_blocks.count(nextId) && m_blocks.at(nextId).prev)
auto nextIt = m_blocks.find(nextId);
```

Suggested change:

```diff
-auto nextIt = m_blocks.find(nextId);
+auto const nextIt = m_blocks.find(nextId);
```
```cpp
//@todo we might have to do something like incrementing the sequence number for each JUMPDEST
assertThrow(!!item.blockId, OptimizerException, "");
if (!m_blocks.count(item.blockId))
auto blockIt = m_blocks.find(item.blockId);
```

Suggested change:

```diff
-auto blockIt = m_blocks.find(item.blockId);
+auto const blockIt = m_blocks.find(item.blockId);
```
```cpp
int stackDiff = m_stackHeight - _other.m_stackHeight;
for (auto it = m_stackElements.begin(); it != m_stackElements.end();)
if (_other.m_stackElements.count(it->first - stackDiff))
if (auto otherIt = _other.m_stackElements.find(it->first - stackDiff); otherIt != _other.m_stackElements.end())
```

Suggested change:

```diff
-if (auto otherIt = _other.m_stackElements.find(it->first - stackDiff); otherIt != _other.m_stackElements.end())
+if (
+	auto otherIt = _other.m_stackElements.find(it->first - stackDiff);
+	otherIt != _other.m_stackElements.end()
+)
```
```cpp
#pragma once

#include <libyul/Builtins.h>
```
I don't love this: it pulls in a bunch of stuff which, imo, has no place in a forwarding header (especially YulName.h). From my point of view it would be better to move FunctionHandle (and its hash specialization) out of this header into its own thing that is only included where needed. The AST itself doesn't require it. That leaves the BuiltinHandle POD in here, and you don't have to drag the whole YulName machinery into ASTForward.
```cpp
}
};

template<> struct hash<std::variant<solidity::yul::YulName, solidity::yul::BuiltinHandle>>
```
Ideally, this would be `hash<FunctionHandle>`, together with the typedef; see https://github.com/argotorg/solidity/pull/16606/changes#r3195152704
```diff
 std::vector<std::shared_ptr<ObjectNode>> subObjects;
-std::map<std::string, size_t, std::less<>> subIndexByName;
+std::unordered_map<std::string, size_t> subIndexByName;
```
This loses the transparent comparator; perhaps you can do something here akin to the change in EVMDialect. Anyway, these members should be quite cold, so I don't expect any performance win from even touching them.
```cpp
if (auto it = m_map.find(key); it != m_map.end())
	use(it->second);
```

Suggested change:

```diff
-if (auto it = m_map.find(key); it != m_map.end())
-	use(it->second);
+if (
+	auto const it = m_map.find(key);
+	it != m_map.end()
+)
+	use(it->second);
```
```cpp
if (auto [it, inserted] = m_set.insert(value); inserted)
	onNewElement();
```

Suggested change:

```diff
-if (auto [it, inserted] = m_set.insert(value); inserted)
-	onNewElement();
+if (
+	auto const [it, inserted] = m_set.insert(value);
+	inserted
+)
+	onNewElement();
```
Summary
Bottom-up optimization of compiler internals. No compilation logic, code generation, or optimizer behavior changes. Bytecode is bit-for-bit identical to the baseline (verified with `--metadata-hash none`).

**Commit 1 — Eliminate redundant associative container lookups**

- Replaced double lookups (`count`/`find` + `at`/`operator[]`) with a single `find` plus iterator reuse, `try_emplace`, or `insert().second` across 42 files in `libevmasm`, `libyul`, `libsolidity`, and `libsolutil`.
- Converted `std::set`/`std::map` to `std::unordered_set`/`std::unordered_map` in 8 files where iteration order is never used (`MultiUseYulFunctionCollector`, `FullInliner`, `InlinableExpressionFunctionFinder`, `EVMCodeTransform`, `AsmAnalysis`, `PathGasMeter`, `AST.cpp`, `TypeSystem.h`).
- Used `std::erase_if` (C++20) in `KnownState.cpp`.
- Added a `CODING_STYLE.md` rule codifying these best practices.

**Commit 2 — Convert hot YulName/FunctionHandle-keyed containers in the Yul optimiser pipeline**
Extends the same refactor into the Yul optimiser's hot path, specifically the data-flow / semantics / control-flow analysis layers:
- Added `std::hash<BuiltinHandle>` (in `Builtins.h`) and `std::hash<FunctionHandle>` (`std::variant<YulName, BuiltinHandle>`). Routed `Builtins.h` through `ASTForward.h` so the specialisation is visible wherever `FunctionHandle` is used.
- `State::value`, `Environment::keccak` (with a custom `YulNamePairHash` using `boost::hash_combine`), `Scope::variables`, `m_functionSideEffects` → unordered.
- `m_offsets`, `m_lastKnownValue` → unordered.
- `m_groupMembers` left as `std::map<YulName, std::set<YulName>>` on purpose — `*group->begin()` picks the minimum representative, which must stay deterministic.
- `std::map<FunctionHandle, SideEffects>` → `std::unordered_map` end-to-end (`SideEffectsPropagator`, `SideEffectsCollector`, `MovableChecker`, `DataFlowAnalyzer`, `CommonSubexpressionEliminator`, `EqualStoreEliminator`, `LoadResolver`, `UnusedStoreEliminator`, `UnusedPruner`, `LoopInvariantCodeMotion`, plus the `FunctionSideEffects` test). Same for `std::map<YulName, ControlFlowSideEffects>` in `ControlFlowSideEffectsCollector::functionSideEffectsNamed()` and all its consumers.
- `ReferencesCounter::countReferences()` and `VariableReferencesCounter::countReferences()` now return `std::unordered_map`; all call sites updated.
- `Substitution`, `FunctionCopier::m_translations`, `NameSimplifier::m_translations`, `VarNameCleaner` (`m_namesToKeep`, `m_usedNames`, `m_translatedNames`), `SyntacticalEquality::m_identifiers{LHS,RHS}`, `ExpressionJoiner::m_references`, `EquivalentFunction{Combiner,Detector}::m_duplicates`, `Disambiguator::m_translations`, `AssignmentCounter::m_assignmentCounters`, `Rematerialiser` (`m_referenceCounts`, `m_varsToAlwaysRematerialize`), `SSAValueTracker::ssaVariables()` → unordered.
- `EVMDialect::m_reserved` → `std::unordered_set` with a transparent hasher (heterogeneous `string_view` lookup preserved), `Object::{objectPaths, dataPaths, subIndexByName}`, `CompilerContext::m_externallyUsedYulFunctions`, `Assembly::m_namedTags` → unordered.
- Local `std::set<YulName>` in `CodeTransform::assignedFunctionNames` and `LoopInvariantCodeMotion::{ssaVars, varsDefinedInScope}` → unordered.
- Eliminated `count()` + `at()` patterns in `ControlFlowSideEffectsCollector`, `ObjectOptimizer`, `Semantics::containsNonContinuingFunctionCall`, `Assembly::namedTag`, and `NameSimplifier::findSimplification`.

Containers that require ordered iteration (`NameCollector::m_names`, `AssignmentsSinceContinue::m_names`, `assignedVariableNames()`, the inner set of `KnowledgeBase::m_groupMembers`, `IRGenerationContext::m_usedSourceNames`, the public `NameDispenser::usedNames()` API, `OptimiserStepContext::reservedIdentifiers`, and `OptimiserSuite::run(..., _externallyUsedIdentifiers)`) are deliberately left untouched.

Motivation
Profiling via-IR compilation of OpenZeppelin 5.0.2 shows `YulString::operator<` at ~2.6% of samples and `std::__tree` operations on `YulName`/`FunctionHandle` at ~4.4% combined — pure overhead from using ordered containers for keys whose iteration order never mattered. Each lookup is O(log n) with O(key length) comparisons; the unordered variants make it amortized O(1) with a single hash of the 64-bit handle.

These are mechanical, behavior-preserving changes — the compiler works exactly the same way; it just uses its data structures more efficiently.
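As an illustration of the pair-keyed hashing mentioned above for `Environment::keccak`, here is a sketch of a `YulNamePairHash`-style functor. It hand-rolls the combine step in the style of `boost::hash_combine` (the PR itself uses Boost), with a plain `uint64_t` standing in for `YulName`; the real key and value types are assumptions for illustration:

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>
#include <unordered_map>
#include <utility>

// Simplified stand-in: in the compiler, YulName wraps a small handle.
using NameId = uint64_t;

// Hash for a (name, name) pair key, combining the two element hashes in
// the style of boost::hash_combine.
struct NamePairHash
{
	std::size_t operator()(std::pair<NameId, NameId> const& _p) const
	{
		std::size_t seed = std::hash<NameId>{}(_p.first);
		seed ^= std::hash<NameId>{}(_p.second) + 0x9e3779b97f4a7c15ULL + (seed << 6) + (seed >> 2);
		return seed;
	}
};

// Hypothetical cache keyed by a pair of name handles.
using KeccakCache = std::unordered_map<std::pair<NameId, NameId>, NameId, NamePairHash>;
```

Note that `std::unordered_map` has no default hash for `std::pair`, which is why a functor like this (or Boost's) is needed at all.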
Benchmark (via-IR, `--optimize`, 10 runs each, first run discarded as cold, median of 9)

(Absolute times are lower than in the previous revision because this run was done on a cooler machine — the relative deltas are what matter. Compared to v1 of this PR, the second commit adds roughly -1.7 p.p. on OZ 5.0.2 and -7 p.p. on Uniswap V4 / OZ 4.9.0 — the new optimizations target DataFlowAnalyzer / KnowledgeBase / Semantics, which are on the hot path of every via-IR compilation, rather than project-specific hot paths.)
Run-to-run spread is also tight (≤ 0.5 s on all three projects), consistent with eliminating cache-miss-heavy tree traversals.
Test plan
- Bytecode verified bit-for-bit identical to baseline (`--metadata-hash none` on several OpenZeppelin contracts)
- Builds cleanly (`-Werror` enabled)