[FLINK-39754][core] Fix int overflow in DataOutputSerializer.resize#28252
[FLINK-39754][core] Fix int overflow in DataOutputSerializer.resize#28252sauliusvl wants to merge 2 commits into
Conversation
buffer.length * 2 uses int arithmetic and overflows to a negative value once the buffer crosses Integer.MAX_VALUE / 2 (~1.07 GB). Math.max then picks buffer.length + minCapacityAdd, so every subsequent resize grows the buffer by a handful of bytes instead of doubling, doing a full System.arraycopy of the ~1+ GB buffer each call. On large heaps this manifests as a silent O(n^2) hang until buffer.length + minCapacityAdd itself overflows and the existing NegativeArraySizeException catch translates it to an IOException. Extract the size computation into a package-private static helper that uses long arithmetic, caps at Integer.MAX_VALUE - 8 (matching java.util.ArrayList), and jumps to the cap once doubling would overflow so serializations that just barely fit under 2 GB still complete.
| public class DataOutputSerializer implements DataOutputView, MemorySegmentWritable { | ||
|
|
||
| /** | ||
| * Maximum array length the JVM can allocate. Some VMs reserve a few header bytes, so we cap |
There was a problem hiding this comment.
Do you have a reference to "Some VMs reserve a few header bytes". I would have thought Integer.MAX_VALUE should be the value - I have not heard of needing to only use slightly less.
There was a problem hiding this comment.
I think the LLM learned it from the openjdk source code, this part specifically
| } catch (OutOfMemoryError ee) { | ||
| // still not possible. give an informative exception message that reports the | ||
| // size | ||
| throw new IOException( |
There was a problem hiding this comment.
should we pre allocate this Exception in case cannot create a new IOException as we have no memory? Or just leave out of memory to percolate up
There was a problem hiding this comment.
it's my understanding that we'd end up here if a potentially multi-gigabyte array allocation failed, not because we've accumulated tons of small objects and exhausted the heap, so actually there should be enough memory to create a new exception object here. Pre-allocating would also drop the stack trace and the message
| */ | ||
| @VisibleForTesting | ||
| static int computeNewBufferLength(int currentLength, int minCapacityAdd) throws IOException { | ||
| long requiredLen = (long) currentLength + minCapacityAdd; |
There was a problem hiding this comment.
I think we should check that these values are positive. I am not sure if 0 for either value would not cause issues.
There was a problem hiding this comment.
added precondition checks for good measure, no code path calls these with zeroes or negative numbers currently
Add Preconditions checks that currentLength is non-negative and minCapacityAdd is positive, matching the contract of java.util.ArraysSupport.newLength. Addresses review feedback. Generated-by: Claude (Anthropic, Opus 4.8) via Zed editor
Jackeyzhe
left a comment
There was a problem hiding this comment.
Thanks for the PR, LGTM
What is the purpose of the change
Fixes FLINK-39754.
DataOutputSerializer.resize()usesintarithmetic forbuffer.length * 2. Oncebuffer.lengthcrossesInteger.MAX_VALUE / 2(~1.07 GB), doubling overflows to a negativeint,Math.maxthen picksbuffer.length + minCapacityAdd, and every subsequent resize grows the buffer by a handful of bytes instead of doubling — doing a fullSystem.arraycopyof the ~1+ GB buffer each call. On large heaps this manifests as a silent O(n²) hang untilbuffer.length + minCapacityAdditself overflows and the existingcatch (NegativeArraySizeException)translates it to anIOException.Brief change log
resize(int)into a@VisibleForTestingpackage-private static helpercomputeNewBufferLength(int, int).longarithmetic, validates against a newMAX_ARRAY_SIZE = Integer.MAX_VALUE - 8cap (matchingjava.util.ArrayList), and jumps to the cap when doubling would overflow — so serializations that just barely fit under 2 GB still complete instead of grinding through a linear-step resize loop.catch (NegativeArraySizeException)block fromresize. The existingOutOfMemoryErrorretry path is preserved (it addresses an independent concern — doubled size exceeding available heap).Verifying this change
This change added tests and can be verified as follows:
computeNewBufferLengthinDataInputOutputSerializerTestcovering: normal doubling,minCapacityAdd-dominated growth, jump-to-cap whencurrentLength * 2would overflow, exact-cap boundary, andIOExceptionwhen the required size exceeds the cap. No multi-GB allocations required.DataInputOutputSerializerTesttests continue to pass, confirming the normal write/read paths throughresize()are unchanged.Does this pull request potentially affect one of the following parts:
@Public(Evolving): no (DataOutputSerializeris unannotated / internal)IOExceptioninstead of an opaqueNegativeArraySizeException-derived message.Documentation
Was generative AI tooling used to co-author this PR?
Generated-by: Claude (Anthropic, Opus 4.7) via Zed editor