Hi,
I have found a situation in which zmq-async deadlocks. The following code reproduces it:
```clojure
(ns debug.async.core
  (:import [org.zeromq ZMQ$Socket])
  (:require [com.keminglabs.zmq-async.core :refer [register-socket!]]
            [clojure.core.async :refer [chan close! go >! <!]]))

(defn deadlock!
  []
  (let [write-in (chan)
        read-in  (chan)
        read-out (chan)]
    (register-socket! {:in write-in :socket-type :push
                       :configurator (fn [^ZMQ$Socket s]
                                       (.bind s "tcp://127.0.0.1:19999"))})
    (register-socket! {:in read-in :out read-out :socket-type :pull
                       :configurator (fn [^ZMQ$Socket s]
                                       (.connect s "tcp://127.0.0.1:19999"))})
    ;; This and the next go block just send messages from one socket to the other, repeatedly
    (go
      (loop [c 0]
        (>! write-in (str "send " c))
        (recur (inc c))))
    (go
      (loop []
        (println (String. (<! read-out)))
        (recur)))
    ;; This loop opens and closes sockets repeatedly
    (loop [c 0]
      (println "open-close " c)
      (let [in (chan)]
        (register-socket! {:in in :socket-type :pub
                           :configurator (fn [^ZMQ$Socket s])})
        (close! in)
        (recur (inc c))))))
```
I also have a rough idea of why the deadlock happens. In the code above, the ZMQ sockets receive a steady stream of incoming messages, so the zmq-loop is expected to spend much of its time on this line:

```clojure
(>!! async-control-chan [incoming-sock-id msg])
```

At the same time, there is a steady stream of requests to open and close sockets, so the async-loop spends much of its time putting commands for the ZMQ sockets onto the queue with `(.put queue msg)`.

Once the queue is full, that put blocks. If the zmq-loop is at that moment trying to put into `async-control-chan` as above, its put blocks too, because the async-loop is stuck on the queue and is not taking from `async-control-chan`. The zmq-loop, in turn, is no longer draining the queue, so neither side can make progress. We have a deadlock.
I have also found that none of the following helps: increasing the queue size, giving `async-control-chan` a buffer (as long as it is not a buffer that drops messages), or doing both. Doing both only delays the deadlock slightly.
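To make the cyclic wait concrete, here is a minimal, self-contained model that uses only `java.util.concurrent` from Clojure. It is not zmq-async's code: `busy-loop` and `simulate` are hypothetical names, two `ArrayBlockingQueue`s of the same capacity stand in for `async-control-chan` and the command queue, and each loop is simplified to put a whole burst of messages before it reads any of its own input (roughly what happens when both loops are flooded, as in the repro). A put that waits longer than a timeout is reported as `:stuck` instead of hanging forever.

```clojure
(ns deadlock-sketch
  (:import [java.util.concurrent ArrayBlockingQueue TimeUnit]))

;; Hypothetical model, NOT zmq-async's implementation.
(defn busy-loop
  "Puts `burst` messages into `outbox` before reading anything from `inbox`,
  mimicking a loop that is too busy sending to service its own input.
  Returns :stuck if a single put waits longer than `timeout-ms`."
  [^ArrayBlockingQueue outbox ^ArrayBlockingQueue inbox burst timeout-ms]
  (loop [i 0]
    (cond
      ;; burst done: only now do we read (drain) our own input
      (= i burst) (do (.clear inbox) :ok)
      ;; blocking put with a timeout so the demo terminates
      (.offer outbox i (long timeout-ms) TimeUnit/MILLISECONDS) (recur (inc i))
      ;; the other loop never made room in our outbox
      :else :stuck)))

(defn simulate
  "Runs two busy loops against each other through bounded queues of size
  `capacity`. Returns [zmq-loop-result async-loop-result]."
  [capacity burst]
  (let [control-q  (ArrayBlockingQueue. (int capacity)) ; stands in for async-control-chan
        command-q  (ArrayBlockingQueue. (int capacity)) ; stands in for the command queue
        zmq-loop   (future (busy-loop control-q command-q burst 200))
        async-loop (future (busy-loop command-q control-q burst 200))]
    [@zmq-loop @async-loop]))

(simulate 16 10)     ;; => [:ok :ok]
(simulate 16 100)    ;; => [:stuck :stuck]
(simulate 1024 100)  ;; => [:ok :ok]
(simulate 1024 2000) ;; => [:stuck :stuck]
```

In this model, raising the capacity only moves the threshold: as long as a burst can outgrow the buffer, both loops end up blocked on each other, which is consistent with bigger buffers merely delaying the deadlock.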
Changing the poll and `alts!!` to be deterministic, in the sense that they always take from the control channel/socket first when it is ready, together with a sufficient buffer, might avoid this deadlock. I will investigate further.
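On the core.async side, `alts!!` already supports this kind of ordering through its `:priority true` option, which tries the ports in the order given instead of choosing randomly among the ready ones. The sketch below shows only that selection rule; `next-event`, `control` and `data` are hypothetical names, the ZMQ poll would need an equivalent ordering that is not shown here, and whether this plus a buffer actually removes the deadlock is still the open question above.

```clojure
(ns priority-sketch
  (:require [clojure.core.async :refer [chan >!! alts!!]]))

(defn next-event
  "Takes from `control-ch` whenever it has a value ready and falls back to
  `data-ch` only when the control channel is empty. Returns [source value]."
  [control-ch data-ch]
  ;; :priority true makes alts!! try the ports in vector order
  (let [[v port] (alts!! [control-ch data-ch] :priority true)]
    [(if (identical? port control-ch) :control :data) v]))

;; Hypothetical demo: a data message is queued before the control message,
;; but the control message is still taken first.
(def control (chan 10))
(def data (chan 10))
(>!! data "msg-1")
(>!! control [:close 3])
(>!! data "msg-2")

(next-event control data) ;; => [:control [:close 3]]
(next-event control data) ;; => [:data "msg-1"]
(next-event control data) ;; => [:data "msg-2"]
```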
I guess the reason nobody has run into this deadlock yet is that it rarely happens in practice, since we don't usually create and close sockets at this rate.