fix: compiler outputs might exceed the max buffer size #6411
+432
−84
Resolves: #6336
Resolves: #6328
In this PR, I start using streams when handling the solc compiler outputs to support the compilation of very large codebases where the compilation outputs might exceed the maximum buffer size/string length.
Changes to the build system and cache
I added two new file system access functions:
- readJsonFileAsStream: opens the file as a stream, passes it to the streaming JSON parser, and picks the full parsed object from the JSON values stream
- writeJsonFileAsStream: parses the JSON object as a stream of tokens and writes it to the file
These functions allow us to avoid storing the full compiler output as a string, which can exceed the maximum string length at times.
To implement these functions, I added two new dependencies:
These functions are used when interacting with compiler output objects, in the cache and when emitting the build info output.
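To illustrate the idea behind the write side, here is a minimal sketch using only Node.js built-ins. It is not the code from this PR (which relies on the added dependencies); `jsonTokens` and `writeJsonAsStreamSketch` are hypothetical names chosen for the example. It shows how an object can be serialized piece by piece so that the full JSON text never exists as a single string.

```typescript
import { createWriteStream } from "node:fs";
import { Readable } from "node:stream";
import { pipeline } from "node:stream/promises";

type Json = null | boolean | number | string | Json[] | { [key: string]: Json };

// Hypothetical helper: yields the JSON text of `value` in small pieces.
// Only primitives and object keys are stringified, one at a time.
function* jsonTokens(value: Json): Generator<string> {
  if (Array.isArray(value)) {
    yield "[";
    for (let i = 0; i < value.length; i++) {
      if (i > 0) yield ",";
      yield* jsonTokens(value[i]);
    }
    yield "]";
  } else if (value !== null && typeof value === "object") {
    yield "{";
    let first = true;
    for (const [key, child] of Object.entries(value)) {
      if (!first) yield ",";
      first = false;
      yield JSON.stringify(key) + ":";
      yield* jsonTokens(child);
    }
    yield "}";
  } else {
    yield JSON.stringify(value);
  }
}

// Hypothetical stand-in for a streaming JSON writer.
// pipeline() handles backpressure and closes the file if an error occurs.
export async function writeJsonAsStreamSketch(
  path: string,
  value: Json,
): Promise<void> {
  await pipeline(Readable.from(jsonTokens(value)), createWriteStream(path));
}
```

Note that a single very large string value inside the object is still emitted as one piece in this sketch; the benefit comes from never concatenating the whole document.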
Changes to the compiler
Previously, the compiler would execute the underlying solc compiler using the execFile function. With that function, we would end up waiting for the entire output to be returned to us as a string. Unfortunately, for very large compiler outputs, the resulting output would exceed the max buffer size, and if we increased the buffer size further, we would eventually hit the maximum string length limit.
I removed the usage of execFile from the compiler and replaced it with spawn, where we pipe the output to a temporary file which we later read using the newly added readJsonFileAsStream function. This allows us to avoid creating a string of the compiler output.
Follow-ups
Deduplicate compiler output file creation
At the moment, we create a file with the compiler output three times: in the Compiler, when saving the output to the cache, and when emitting the build info output. We should consider creating it only once and then copying/moving it when needed. This will require changing the build info output format, which currently contains some extra information apart from the compiler output.
Testing
I reran the solidity-testing-testing suite using this branch, which cleared all the 143s and the max buffer size exceeded errors. My hypothesis on why the 143s were cleared as well is that in these cases we weren't exceeding the max buffer size, but we were running very near it, which caused memory pressure.
https://github.com/galargh/solidity-testing-testing/actions/runs/13542801435