HBASE-28803 HBase Master stuck due to improper handling of WALSyncTimeoutException within UncheckedIOException #6254

Merged: 1 commit, Sep 30, 2024

ndimiduk
Member

This change enhances the ProcedureExecutor to honor the new WALSyncTimeoutIOException introduced in HBASE-27230 / #4641. The new behavior is to abort the HMaster when this exception is encountered.

This appears to be the first time we're taking action based on exceptions thrown during procedure execution. I'm not sure if this is the right thing to do or we should be acting similarly for other exceptions.
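
To make the intended behavior concrete, here is a minimal, self-contained sketch of the idea described above. It is not the code from this patch. `WalSyncTimeoutStandIn`, `AbortHook`, `isWalSyncTimeout`, and `runStep` are hypothetical names used only for illustration; the real exception is HBase's WALSyncTimeoutIOException, and the real abort path goes through the HMaster. The sketch assumes the timeout arrives wrapped in a `java.io.UncheckedIOException`, as the PR title suggests.

```java
import java.io.IOException;
import java.io.UncheckedIOException;

final class AbortOnWalSyncTimeoutSketch {

  /** Hypothetical stand-in for HBase's WALSyncTimeoutIOException (not the real class). */
  static class WalSyncTimeoutStandIn extends IOException {
    WalSyncTimeoutStandIn(String message) {
      super(message);
    }
  }

  /** Hypothetical abort callback; in HBase the server itself would be asked to abort. */
  interface AbortHook {
    void abort(String why, Throwable cause);
  }

  /** Walks the cause chain (bounded, to guard against cycles) looking for the timeout. */
  static boolean isWalSyncTimeout(Throwable t) {
    int depth = 0;
    for (Throwable cur = t; cur != null && depth < 32; cur = cur.getCause(), depth++) {
      if (cur instanceof WalSyncTimeoutStandIn) {
        return true;
      }
    }
    return false;
  }

  /**
   * Runs one unit of work. If it fails with an UncheckedIOException that wraps the
   * WAL sync timeout, requests an abort and returns false. Any other
   * UncheckedIOException is rethrown unchanged.
   */
  static boolean runStep(Runnable step, AbortHook server) {
    try {
      step.run();
      return true;
    } catch (UncheckedIOException e) {
      if (isWalSyncTimeout(e)) {
        server.abort("WAL sync timed out while persisting procedure state", e);
        return false;
      }
      throw e;
    }
  }

  public static void main(String[] args) {
    AbortHook hook = (why, cause) -> System.out.println("abort requested: " + why);
    boolean completed = runStep(() -> {
      throw new UncheckedIOException(new WalSyncTimeoutStandIn("sync timed out"));
    }, hook);
    System.out.println("step completed: " + completed);
  }
}
```

The sketch reacts only to the one exception type and rethrows everything else, which mirrors the open question above about whether other exceptions deserve the same treatment.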

@petersomogyi @Apache9 @comnetwork please take a look.

@petersomogyi
Contributor

Quoting @ndimiduk: "I'm not sure if this is the right thing to do or we should be acting similarly for other exceptions."

I don't know whether this goes against any rule about when the Master may abort, but I was surprised that the Master noticed the WAL sync problem and did not abort, even though the procedure store was stuck.

@Apache-HBase

🎊 +1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 0m 27s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 0s No case conflicting files found.
+0 🆗 codespell 0m 0s codespell was not available.
+0 🆗 detsecrets 0m 0s detect-secrets was not available.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
+1 💚 hbaseanti 0m 0s Patch does not have any anti-patterns.
_ master Compile Tests _
+1 💚 mvninstall 3m 5s master passed
+1 💚 compile 3m 3s master passed
+1 💚 checkstyle 0m 37s master passed
+1 💚 spotbugs 1m 30s master passed
+1 💚 spotless 0m 43s branch has no errors when running spotless:check.
_ Patch Compile Tests _
+1 💚 mvninstall 2m 50s the patch passed
+1 💚 compile 2m 59s the patch passed
+1 💚 javac 2m 59s the patch passed
+1 💚 blanks 0m 0s The patch has no blanks issues.
+1 💚 checkstyle 0m 35s the patch passed
+1 💚 spotbugs 1m 37s the patch passed
+1 💚 hadoopcheck 10m 46s Patch does not cause any errors with Hadoop 3.3.6 3.4.0.
+1 💚 spotless 0m 41s patch has no errors when running spotless:check.
_ Other Tests _
+1 💚 asflicense 0m 10s The patch does not generate ASF License warnings.
35m 37s
Subsystem Report/Notes
Docker ClientAPI=1.43 ServerAPI=1.43 base: https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-6254/2/artifact/yetus-general-check/output/Dockerfile
GITHUB PR #6254
Optional Tests dupname asflicense javac spotbugs checkstyle codespell detsecrets compile hadoopcheck hbaseanti spotless
uname Linux 79d504591184 5.4.0-1103-aws #111~18.04.1-Ubuntu SMP Tue May 23 20:04:10 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/hbase-personality.sh
git revision master / 69d8c03
Default Java Eclipse Adoptium-17.0.11+9
Max. process+thread count 86 (vs. ulimit of 30000)
modules C: hbase-server U: hbase-server
Console output https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-6254/2/console
versions git=2.34.1 maven=3.9.8 spotbugs=4.7.3
Powered by Apache Yetus 0.15.0 https://yetus.apache.org

This message was automatically generated.

@Apache-HBase

🎊 +1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 0m 39s Docker mode activated.
-0 ⚠️ yetus 0m 3s Unprocessed flag(s): --brief-report-file --spotbugs-strict-precheck --author-ignore-list --blanks-eol-ignore-file --blanks-tabs-ignore-file --quick-hadoopcheck
_ Prechecks _
_ master Compile Tests _
+1 💚 mvninstall 3m 3s master passed
+1 💚 compile 0m 59s master passed
+1 💚 javadoc 0m 30s master passed
+1 💚 shadedjars 5m 23s branch has no errors when building our shaded downstream artifacts.
_ Patch Compile Tests _
+1 💚 mvninstall 2m 58s the patch passed
+1 💚 compile 0m 59s the patch passed
+1 💚 javac 0m 59s the patch passed
+1 💚 javadoc 0m 29s the patch passed
+1 💚 shadedjars 5m 17s patch has no errors when building our shaded downstream artifacts.
_ Other Tests _
+1 💚 unit 222m 53s hbase-server in the patch passed.
247m 56s
Subsystem Report/Notes
Docker ClientAPI=1.47 ServerAPI=1.47 base: https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-6254/2/artifact/yetus-jdk17-hadoop3-check/output/Dockerfile
GITHUB PR #6254
Optional Tests javac javadoc unit compile shadedjars
uname Linux 1f0dc8e2cf44 5.4.0-195-generic #215-Ubuntu SMP Fri Aug 2 18:28:05 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/hbase-personality.sh
git revision master / 69d8c03
Default Java Eclipse Adoptium-17.0.11+9
Test Results https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-6254/2/testReport/
Max. process+thread count 4960 (vs. ulimit of 30000)
modules C: hbase-server U: hbase-server
Console output https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-6254/2/console
versions git=2.34.1 maven=3.9.8
Powered by Apache Yetus 0.15.0 https://yetus.apache.org

This message was automatically generated.

@ndimiduk
Member Author

@petersomogyi do you have a way to reproduce the issue? Could you try running with this patch and see if it helps?

@petersomogyi
Contributor

Unfortunately, I'm not able to reproduce this issue. It happened once during an upgrade.

@rmdmattingly
Contributor

+1, this seems reasonable to me

@ndimiduk merged commit aa41b3e into apache:master on Sep 30, 2024
1 check passed
@ndimiduk deleted the 28803-master branch on September 30, 2024 10:50
ndimiduk added a commit to ndimiduk/hbase that referenced this pull request Sep 30, 2024
…eoutException within UncheckedIOException (apache#6254)

Signed-off-by: Peter Somogyi <[email protected]>
Signed-off-by: Ray Mattingly <[email protected]>
ndimiduk added a commit that referenced this pull request Sep 30, 2024
…eoutException within UncheckedIOException (#6254)

Signed-off-by: Peter Somogyi <[email protected]>
Signed-off-by: Ray Mattingly <[email protected]>
ndimiduk added a commit that referenced this pull request Oct 1, 2024
…eoutException within UncheckedIOException (#6254)

Signed-off-by: Peter Somogyi <[email protected]>
Signed-off-by: Ray Mattingly <[email protected]>