HBASE-28803 HBase Master stuck due to improper handling of WALSyncTimeoutException within UncheckedIOException #6254
I don't know if this goes against any rule about when the Master should abort, but I was surprised that the Master noticed the WAL sync problem and still did not abort, even though the procedure store was stuck.
🎊 +1 overall
This message was automatically generated.
@petersomogyi do you have a way to reproduce the issue? Can you try running with this patch and see if it helps?
Unfortunately, I'm not able to reproduce this issue. It happened once during an upgrade.
+1, this seems reasonable to me
…eoutException within UncheckedIOException (apache#6254) Signed-off-by: Peter Somogyi <[email protected]> Signed-off-by: Ray Mattingly <[email protected]>
This change enhances the ProcedureExecutor to honor the new WALSyncTimeoutIOException introduced in HBASE-27230 / #4641. The new behavior is to abort the HMaster. This appears to be the first time we're taking action based on exceptions thrown during procedure execution. I'm not sure whether this is the right thing to do, or whether we should act similarly for other exceptions.
@petersomogyi @Apache9 @comnetwork please take a look.
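For readers following along, the idea described above (spot a WAL sync timeout even when it arrives wrapped in an UncheckedIOException, then abort) could be sketched roughly as follows. This is not the code from this patch: StandInWalSyncTimeoutIOException, Abortable, isCausedBySyncTimeout and runStep are hypothetical names made up for illustration, and the real ProcedureExecutor may structure the check differently.

```java
import java.io.IOException;
import java.io.UncheckedIOException;

// Illustrative sketch only; every name below is hypothetical, not an HBase class.
class SyncTimeoutSketch {

  /** Stand-in for the real WAL sync timeout exception type. */
  static class StandInWalSyncTimeoutIOException extends IOException {
    StandInWalSyncTimeoutIOException(String message) {
      super(message);
    }
  }

  /** Hypothetical hook representing "abort the hosting process". */
  interface Abortable {
    void abort(String reason, Throwable cause);
  }

  /** Walks the cause chain and reports whether a sync timeout is buried in it. */
  static boolean isCausedBySyncTimeout(Throwable t) {
    int depth = 0;
    // Bound the walk so a cyclic cause chain cannot loop forever.
    while (t != null && depth++ < 32) {
      if (t instanceof StandInWalSyncTimeoutIOException) {
        return true;
      }
      t = t.getCause();
    }
    return false;
  }

  /** Runs one step; aborts on a (possibly wrapped) sync timeout, rethrows anything else. */
  static void runStep(Runnable step, Abortable master) {
    try {
      step.run();
    } catch (RuntimeException e) {
      if (isCausedBySyncTimeout(e)) {
        master.abort("WAL sync timed out during procedure execution", e);
        return;
      }
      throw e;
    }
  }

  public static void main(String[] args) {
    Abortable master = (reason, cause) -> System.out.println("ABORT: " + reason);
    // The timeout is hidden inside an UncheckedIOException, as in the PR title.
    runStep(() -> {
      throw new UncheckedIOException(new StandInWalSyncTimeoutIOException("sync timed out"));
    }, master);
  }
}
```

Walking the whole cause chain, rather than testing only the top-level exception type, is what lets the check survive the UncheckedIOException wrapping. Whether other exception types deserve the same treatment is the open question raised in the description.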