
Commit c4b954c

chrisvittal authored and kush-chandra committed
[query] Daemonize Py4JQueryDriver http server (hail-is#15050)
PR hail-is#14698 refactored the startup of the py4j http server so that calling `pyHttpServer` created and started the server. In the process, it deleted the daemon thread that the server was originally started from. This caused the `httpServer.start` method to spawn a non-daemon thread, as it was being called from the main thread py4j was using to do its work. Simply calling `setExecutor(null)` is not sufficient for this purpose. Manually stopping the server, such as with `hl.stop()`, would solve the issue, since the JVM would exit and Spark would note that the job was complete after the Python process exited as well. We once again create a thread that itself starts the http server, and then start that thread in `pyHttpServer`. This should solve the hanging in dataproc that bisection showed was introduced in hail-is#14698.

## Security Assessment

- This change cannot impact the Hail Batch instance as deployed by Broad Institute in GCP
1 parent 4f1065a commit c4b954c
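The fix relies on a JVM rule the commit message leans on implicitly: a new `Thread` inherits the daemon status of the thread that creates it (per the `java.lang.Thread` class documentation). The sketch below is illustrative only; `DaemonInheritanceDemo` and its variable names are assumptions, not code from this commit.

```scala
// Minimal sketch (not Hail code): a new Thread inherits the daemon status
// of its creating thread. This is why starting the HTTP server from a
// daemon thread keeps the server's dispatcher thread from pinning the JVM.
object DaemonInheritanceDemo {
  def main(args: Array[String]): Unit = {
    val parent = new Thread(() => {
      // This child never calls setDaemon, yet it is a daemon because its
      // creator (`parent`) is one.
      val child = new Thread(() => Thread.sleep(60000L))
      child.start()
      println(s"child.isDaemon = ${child.isDaemon}") // prints: true
    })
    parent.setDaemon(true)
    parent.start()
    parent.join()
    // main exits here; the sleeping child is a daemon, so the JVM exits too.
  }
}
```

Remove `parent.setDaemon(true)` and the child becomes a non-daemon thread, so the JVM lingers until the sleep finishes. That is the same mechanism that kept the Spark driver JVM alive after `pyHttpServer` started the server from py4j's main (non-daemon) thread.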

File tree: 1 file changed (+21 −4 lines)


hail/hail/src/is/hail/backend/driver/Py4JQueryDriver.scala

Lines changed: 21 additions & 4 deletions
```diff
@@ -403,23 +403,40 @@ final class Py4JQueryDriver(backend: Backend) extends Closeable {
 
     // 0 => let the OS pick an available port
     private[this] val httpServer = HttpServer.create(new InetSocketAddress(0), 10)
+    httpServer.createContext("/", runRpc(_: HttpExchange))
 
     @nowarn def port: Int = httpServer.getAddress.getPort
     override def close(): Unit = httpServer.stop(10)
 
     // This HTTP server *must not* start non-daemon threads because such threads keep the JVM
     // alive. A living JVM indicates to Spark that the job is incomplete. This does not manifest
-    // when you run jobs in a local pyspark (because you'll Ctrl-C out of Python regardless of
-    // the JVM's state) nor does it manifest in a Notebook (again, you'll kill the Notebook kernel
+    // when you run jobs in a local pyspark (because you'll Ctrl-C out of Python regardless of the
+    // JVM's state) nor does it manifest in a Notebook (again, you'll kill the Notebook kernel
     // explicitly regardless of the JVM). It *does* manifest when submitting jobs with
     //
     //     gcloud dataproc submit ...
     //
     // or
     //
     //     spark-submit
-    httpServer.createContext("/", runRpc(_: HttpExchange))
+    //
+    // setExecutor(null) ensures the server creates no new threads:
+    //
+    // > If this method is not called (before start()) or if it is called with a null Executor,
+    // > then a default implementation is used, which uses the thread which was created by
+    // > the start() method.
+    //
+    /* Source:
+     * https://docs.oracle.com/en/java/javase/11/docs/api/jdk.httpserver/com/sun/net/httpserver/HttpServer.html#setExecutor(java.util.concurrent.Executor) */
     httpServer.setExecutor(null) // ensures the server creates no new threads
-    httpServer.start()
+
+    // Note that simply calling httpServer.start() from a non-daemon thread will spawn a
+    // non-daemon thread itself.
+    private[this] val thread = new Thread(new Runnable() {
+      def run(): Unit = httpServer.start()
+    })
+
+    thread.setDaemon(true)
+    thread.start()
   }
 }
```
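For completeness, here is a self-contained sketch of the restored pattern using only the JDK's built-in `com.sun.net.httpserver` API. The object name, handler body, and port printing are illustrative assumptions rather than Hail code; only the `setExecutor(null)` plus daemon-starter-thread structure mirrors the commit.

```scala
import java.net.InetSocketAddress
import com.sun.net.httpserver.{HttpExchange, HttpServer}

// Sketch of the pattern the commit restores (names are illustrative, not
// from Hail): start the server from a daemon thread so the dispatcher
// thread it spawns is also a daemon and cannot keep the JVM alive.
object DaemonHttpServerSketch {
  def main(args: Array[String]): Unit = {
    val server = HttpServer.create(new InetSocketAddress(0), 10)
    server.createContext(
      "/",
      (exchange: HttpExchange) => {
        val body = "ok".getBytes("UTF-8")
        exchange.sendResponseHeaders(200, body.length.toLong)
        exchange.getResponseBody.write(body)
        exchange.close()
      }
    )
    server.setExecutor(null) // requests run on the server's own dispatcher thread

    // start() spawns the dispatcher from the calling thread, so call it
    // from a daemon thread to make the dispatcher a daemon as well.
    val starter = new Thread(() => server.start())
    starter.setDaemon(true)
    starter.start()

    println(s"listening on port ${server.getAddress.getPort}")
    // main returns; only daemon threads remain, so the JVM can exit.
  }
}
```

Because `setExecutor(null)` makes the server handle requests on the dispatcher thread that `start()` creates, daemonizing the thread that calls `start()` is all that is needed to keep the whole server off the JVM's non-daemon thread count.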
