[SPARK-46575][SQL][HIVE] Make HiveThriftServer2.startWithContext DevelopApi retriable and fix flakiness of ThriftServerWithSparkContextInHttpSuite

### What changes were proposed in this pull request?

This PR adds a new parameter to `HiveThriftServer2.startWithContext` that tells the `ThriftCLIService`s whether to call `System.exit` when they encounter errors.

Currently, when a developer calls `HiveThriftServer2.startWithContext` and an error occurs, `System.exit` is invoked, which stops the existing `SQLContext`/`SparkContext` and crashes the user application.

Our tests hit the same problem. They are meant to try starting a Thrift server up to three times in total, but a failed attempt can stop the underlying `SparkContext` early, which makes the remaining attempts fail. For example, see https://github.com/apache/spark/actions/runs/7271496487/job/19812142981:

```java
06:21:12.854 ERROR org.apache.spark.sql.hive.thriftserver.ThriftServerWithSparkContextInHttpSuite: Error start hive server with Context
org.scalatest.exceptions.TestFailedException: SharedThriftServer.this.tempScratchDir.exists() was true
    at org.scalatest.Assertions.newAssertionFailedException(Assertions.scala:472)
    at org.scalatest.Assertions.newAssertionFailedException$(Assertions.scala:471)
    at org.scalatest.Assertions$.newAssertionFailedException(Assertions.scala:1231)
    at org.scalatest.Assertions$AssertionsHelper.macroAssert(Assertions.scala:1295)
    at org.apache.spark.sql.hive.thriftserver.SharedThriftServer.startThriftServer(SharedThriftServer.scala:151)
    at org.apache.spark.sql.hive.thriftserver.SharedThriftServer.$anonfun$beforeAll$1(SharedThriftServer.scala:59)
    at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.scala:18)
06:21:12.854 ERROR org.apache.hive.service.cli.thrift.ThriftCLIService: Error starting HiveServer2: could not start ThriftBinaryCLIService
java.lang.NullPointerException: Cannot invoke "org.apache.thrift.server.TServer.serve()" because "this.server" is null
    at org.apache.hive.service.cli.thrift.ThriftBinaryCLIService.run(ThriftBinaryCLIService.java:135)
    at java.base/java.lang.Thread.run(Thread.java:840)
06:21:12.941 ERROR org.apache.spark.sql.hive.thriftserver.ThriftServerWithSparkContextInHttpSuite: Error start hive server with Context
java.lang.IllegalStateException: LiveListenerBus is stopped.
    at org.apache.spark.scheduler.LiveListenerBus.addToQueue(LiveListenerBus.scala:92)
    at org.apache.spark.scheduler.LiveListenerBus.addToStatusQueue(LiveListenerBus.scala:75)
    at org.apache.spark.sql.hive.thriftserver.HiveThriftServer2$.createListenerAndUI(HiveThriftServer2.scala:74)
    at org.apache.spark.sql.hive.thriftserver.HiveThriftServer2$.startWithContext(HiveThriftServer2.scala:66)
    at org.apache.spark.sql.hive.thriftserver.SharedThriftServer.startThriftServer(SharedThriftServer.scala:141)
    at org.apache.spark.sql.hive.thriftserver.SharedThriftServer.$anonfun$beforeAll$4(SharedThriftServer.scala:60)
    at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.scala:18)
06:21:12.958 WARN org.apache.spark.sql.hive.thriftserver.ThriftServerWithSparkContextInHttpSuite:
[info] org.apache.spark.sql.hive.thriftserver.ThriftServerWithSparkContextInHttpSuite *** ABORTED *** (151 milliseconds)
[info] java.lang.IllegalStateException: Cannot call methods on a stopped SparkContext.
[info] This stopped SparkContext was created at:
[info]
[info] org.apache.spark.sql.hive.thriftserver.ThriftServerWithSparkContextInHttpSuite.beforeAll(ThriftServerWithSparkContextSuite.scala:279)
[info] org.scalatest.BeforeAndAfterAll.liftedTree1$1(BeforeAndAfterAll.scala:212)
[info] org.scalatest.BeforeAndAfterAll.run(BeforeAndAfterAll.scala:210)
[info] org.scalatest.BeforeAndAfterAll.run$(BeforeAndAfterAll.scala:208)
[info] org.apache.spark.SparkFunSuite.run(SparkFunSuite.scala:69)
[info] org.scalatest.tools.Framework.org$scalatest$tools$Framework$$runSuite(Framework.scala:321)
[info] org.scalatest.tools.Framework$ScalaTestTask.execute(Framework.scala:517)
[info] sbt.ForkMain$Run.lambda$runTest$1(ForkMain.java:414)
[info] java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
[info] java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
[info] java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
[info] java.base/java.lang.Thread.run(Thread.java:840)
[info]
[info] The currently active SparkContext was created at:
[info]
[info] (No active SparkContext.)
[info] at org.apache.spark.SparkContext.assertNotStopped(SparkContext.scala:122)
[info] at org.apache.spark.sql.SparkSession.<init>(SparkSession.scala:115)
[info] at org.apache.spark.sql.SparkSession.newSession(SparkSession.scala:274)
[info] at org.apache.spark.sql.hive.thriftserver.SharedThriftServer.startThriftServer(SharedThriftServer.scala:130)
```

### Why are the changes needed?

- Improve the programmability of `HiveThriftServer2.startWithContext`
- Fix flakiness in tests

### Does this PR introduce _any_ user-facing change?

No. This is a developer API change, and the default behavior is unchanged.

### How was this patch tested?

Verified `ThriftServerWithSparkContextInHttpSuite` locally:

```
18:20:02.840 ERROR org.apache.spark.sql.hive.thriftserver.ThriftServerWithSparkContextInHttpSuite: A previous Hive's SessionState is leaked, aborting this retry
18:20:02.840 ERROR org.apache.spark.sql.hive.thriftserver.ThriftServerWithSparkContextInHttpSuite: Error start hive server with Context
java.lang.IllegalStateException: HiveThriftServer2 started in binary mode while the test case is expecting HTTP mode
    at org.apache.spark.sql.hive.thriftserver.SharedThriftServer.$anonfun$startThriftServer$2(SharedThriftServer.scala:149)
    at org.apache.spark.sql.hive.thriftserver.SharedThriftServer.$anonfun$startThriftServer$2$adapted(SharedThriftServer.scala:144)
    at scala.collection.IterableOnceOps.foreach(IterableOnce.scala:576)
    at scala.collection.IterableOnceOps.foreach$(IterableOnce.scala:574)
    at scala.collection.AbstractIterable.foreach(Iterable.scala:933)
    at org.apache.spark.sql.hive.thriftserver.SharedThriftServer.startThriftServer(SharedThriftServer.scala:144)
    at org.apache.spark.sql.hive.thriftserver.SharedThriftServer.$anonfun$beforeAll$1(SharedThriftServer.scala:60)
18:20:04.114 WARN org.apache.hadoop.hive.metastore.ObjectStore: Version information not found in metastore. hive.metastore.schema.verification is not enabled so recording the schema version 2.3.0
18:20:04.114 WARN org.apache.hadoop.hive.metastore.ObjectStore: setMetaStoreSchemaVersion called but recording version is disabled: version = 2.3.0, comment = Set by MetaStore hzyaoqin127.0.0.1
18:20:04.119 WARN org.apache.hadoop.hive.metastore.ObjectStore: Failed to get database default, returning NoSuchObjectException
[info] - the scratch dir will not be exist (1 millisecond)
[info] - SPARK-29911: Uncache cached tables when session closed (376 milliseconds)
```

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #44575 from yaooqinn/SPARK-46575.

Authored-by: Kent Yao <[email protected]>
Signed-off-by: Kent Yao <[email protected]>
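
The retry problem described in this commit message can be illustrated without Spark. The following is a minimal, self-contained Java sketch and not the Spark implementation: `Service`, `startOnce`, `startWithRetries`, and the flag name `exitOnError` are hypothetical stand-ins chosen for illustration. It shows why a start routine that exits the JVM on failure cannot be retried, while one that reports the failure to its caller can.

```java
/** Illustrative sketch only; every name in this class is hypothetical. */
final class RetriableStartSketch {

    /** Hypothetical stand-in for a service whose startup may fail. */
    interface Service {
        void start() throws Exception;
    }

    /**
     * Tries to start the service once.
     * With exitOnError = true, a failure terminates the JVM, so the caller
     * never gets a chance to retry. With exitOnError = false, the failure
     * is reported as a return value instead.
     */
    static boolean startOnce(Service service, boolean exitOnError) {
        try {
            service.start();
            return true;
        } catch (Exception e) {
            if (exitOnError) {
                System.exit(-1); // takes the whole process, and the caller, down
            }
            return false;
        }
    }

    /**
     * Retries startup up to maxAttempts times without exiting on error.
     * Returns the 1-based attempt that succeeded, or -1 if all attempts failed.
     */
    static int startWithRetries(Service service, int maxAttempts) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            if (startOnce(service, false)) {
                return attempt;
            }
        }
        return -1;
    }
}
```

In this sketch the retry loop only works because `startOnce` is called with the flag set to `false`; passing `true` would end the process on the first failed attempt, which mirrors the early shutdown seen in the failing test run.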