[#2060]fix(server): Fix memory leak when reach memory limit #2058
base: master
Conversation
The error stack:
I think we first need to prevent OOM from happening. An OOM is caused either by an unreasonable memory configuration or by a bug in the code. How the code behaves after an OOM is not very important, because the server has already malfunctioned by then. Even if memory leaks at that point, it no longer matters.
So can this PR be accepted? Or should I just close it? @rickyma
I'm OK with this PR, but it is not very meaningful. When an OOM error occurs, this PR will not help much.
This PR is not meant to prevent the OOM exception, but to ensure that the pre-allocated ByteBuf can be released normally.
You shouldn't catch the OOM exception. Once an OOM is thrown, more errors may follow, and you can't recover just by catching it.
I don't understand what you mean.
If an OOM occurs, the Java process should exit.
I need to clarify that this is not an OOM but an OutOfDirectMemoryError. From the stack trace, we can see that the server did not exit.
Yeah, it is Netty's internal OOM error. It's meaningless to catch this exception. On the other hand, this PR is harmless, so I choose to remain neutral.
What changes were proposed in this pull request?
Fix a shuffle server memory leak that occurs when the memory limit is reached.
Why are the changes needed?
With Netty enabled, the shuffle server allocates memory for each SEND_SHUFFLE_DATA_REQUEST request. When the memory limit is reached, an OutOfDirectMemoryError is thrown and decoding of the message fails. The ByteBufs that were already allocated successfully for the earlier batches in the same message are then never released, resulting in a memory leak.
Fix: #2060
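
The failure mode described above can be illustrated with a minimal, self-contained sketch. This is not the code from this PR: `Buf`, `CappedAllocator`, and `decodeBlocks` are hypothetical stand-ins (for a reference-counted buffer such as Netty's `ByteBuf`, a direct-memory allocator with a hard limit, and the message decoder), and the limit failure is modeled with a plain `OutOfMemoryError`. The idea is only that buffers allocated before the failing allocation are released before the error propagates.

```java
import java.util.ArrayList;
import java.util.List;

public class BatchDecodeSketch {

  /** Hypothetical stand-in for a reference-counted buffer (e.g. Netty's ByteBuf). */
  static final class Buf {
    final int size;
    private int refCnt = 1;

    Buf(int size) {
      this.size = size;
    }

    void release() {
      if (refCnt <= 0) {
        throw new IllegalStateException("buffer already released");
      }
      refCnt--;
    }

    int refCnt() {
      return refCnt;
    }
  }

  /** Hypothetical allocator with a hard cap; hitting the cap is modeled as OutOfMemoryError. */
  static final class CappedAllocator {
    private final long limit;
    private long used;

    CappedAllocator(long limit) {
      this.limit = limit;
    }

    Buf allocate(int size) {
      if (used + size > limit) {
        throw new OutOfMemoryError("memory limit reached: used=" + used + ", requested=" + size);
      }
      used += size;
      return new Buf(size);
    }

    void free(Buf buf) {
      buf.release();
      used -= buf.size;
    }

    long used() {
      return used;
    }
  }

  /**
   * Allocates one buffer per block of a message. If any allocation fails,
   * the buffers already allocated for this message are released before
   * the error is rethrown, so a failed decode leaves nothing behind.
   */
  static List<Buf> decodeBlocks(CappedAllocator alloc, int[] blockSizes) {
    List<Buf> decoded = new ArrayList<>();
    try {
      for (int size : blockSizes) {
        decoded.add(alloc.allocate(size));
      }
      return decoded;
    } catch (RuntimeException | Error e) {
      for (Buf buf : decoded) {
        alloc.free(buf);
      }
      throw e;
    }
  }

  public static void main(String[] args) {
    CappedAllocator alloc = new CappedAllocator(100);
    try {
      // The third block does not fit under the limit of 100.
      decodeBlocks(alloc, new int[] {40, 40, 40});
    } catch (OutOfMemoryError e) {
      System.out.println("decode failed: " + e.getMessage());
    }
    System.out.println("bytes still held: " + alloc.used()); // prints "bytes still held: 0"
  }
}
```

Without the `catch` block, the two 40-byte buffers from the failed message would stay counted against the limit indefinitely, which is the leak this PR describes. The error is rethrown rather than swallowed, so the caller still sees the failure.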
Does this PR introduce any user-facing change?
No.
How was this patch tested?
(Please test your changes, and provide instructions on how to test it:
If you add a feature or fix a bug, add a test to cover your changes.
If you fix a flaky test, repeat it for many times to prove it works.)