Deadlock!

Hi,

I have found a situation in which zmq-async deadlocks. The following code reproduces it:

``` clj
(ns debug.async.core
  (:import [org.zeromq ZMQ$Socket])
  (:require [com.keminglabs.zmq-async.core :refer [register-socket!]]
            [clojure.core.async :refer [chan close! go >! <!]]))

(defn deadlock!
  []
  (let [write-in (chan)
        read-in (chan)
        read-out (chan)]
    (register-socket! {:in write-in :socket-type :push
                       :configurator (fn [^ZMQ$Socket s]
                                       (.bind s "tcp://127.0.0.1:19999"))})
    (register-socket! {:in read-in :out read-out :socket-type :pull
                       :configurator (fn [^ZMQ$Socket s]
                                       (.connect s "tcp://127.0.0.1:19999"))})

    ;; This go block and the next one repeatedly send messages from one socket to the other
    (go
      (loop [c 0]
        (>! write-in (str "send " c))
        (recur (inc c))))
    (go
      (loop []
        (println (String. (<! read-out)))
        (recur)))

    ;; This loop opens and closes sockets repeatedly
    (loop [c 0]
      (println "open-close " c)
      (let [in (chan)]
        (register-socket! {:in in :socket-type :pub
                           ;; no-op configurator: the socket is never bound or connected
                           :configurator (fn [^ZMQ$Socket s])})
        (close! in)
        (recur (inc c))))))
```

I also have a rough idea of why the deadlock happens. First, the code above produces a lot of incoming messages from the ZMQ sockets, so we can expect the zmq-loop to spend much of its time on [this line](https://github.com/lynaghk/zmq-async/blob/69ff7682d38ed964d84a7471a3b51e5492067bc3/src/com/keminglabs/zmq_async/core.clj#L110):

``` clj
(>!! async-control-chan [incoming-sock-id msg])
```

At the same time, there are many requests to open and close sockets, so the async-loop spends much of its time putting commands for the ZMQ sockets onto the [queue](https://github.com/lynaghk/zmq-async/blob/69ff7682d38ed964d84a7471a3b51e5492067bc3/src/com/keminglabs/zmq_async/core.clj#L124):

``` clj
  (.put queue msg)
```

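Once the queue is full, the put onto the queue blocks. If, at the same moment, the zmq-loop wants to put onto async-control-chan as above, that put blocks too: the async-loop, which is what takes from async-control-chan, is itself stuck putting onto the queue, and the queue can only be drained by the zmq-loop. Each loop waits for the other, so we have a deadlock.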
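The circular wait can be reproduced without ZMQ at all. The sketch below is a simplified model, not zmq-async's code: two JDK `ArrayBlockingQueue`s stand in for async-control-chan and the zmq-loop's queue, and each loop must finish its put into the other's queue before it drains one item from its own. The function name `run-two-loops` and the message keywords are hypothetical, and `.offer` with a timeout replaces the blocking put so that a stuck put is reported instead of hanging forever.

```clojure
(import '(java.util.concurrent ArrayBlockingQueue TimeUnit))

(defn run-two-loops
  "Hypothetical model of zmq-loop and async-loop as two threads. Each thread
  must complete a put into the other's bounded queue before draining one item
  from its own. fill-a / fill-b (each <= capacity) are how many items the
  queues already hold. Returns :ok or :stuck (put timed out) for each loop."
  [capacity fill-a fill-b timeout-ms]
  (let [to-async (ArrayBlockingQueue. capacity)  ; stands in for async-control-chan
        to-zmq   (ArrayBlockingQueue. capacity)  ; stands in for the zmq-loop's queue
        _        (dotimes [i fill-a] (.put to-async i))
        _        (dotimes [i fill-b] (.put to-zmq i))
        step     (fn [^ArrayBlockingQueue outbox ^ArrayBlockingQueue inbox msg]
                   (future
                     ;; .offer with a timeout instead of .put, so a blocked
                     ;; put is reported rather than hanging the demo.
                     (if (.offer outbox msg timeout-ms TimeUnit/MILLISECONDS)
                       (do (.poll inbox) :ok)  ; put done, now drain own queue
                       :stuck)))
        zmq      (step to-async to-zmq :incoming-msg)
        async    (step to-zmq to-async :socket-command)]
    {:zmq-loop @zmq :async-loop @async}))
```

With both queues full, `(run-two-loops 4 4 4 500)` reports both loops as `:stuck`. With a single free slot, as in `(run-two-loops 4 4 3 500)`, the async-side put succeeds, its drain frees a slot in the other queue, and both loops report `:ok`. Under sustained load both queues eventually fill up regardless of capacity, which would explain why larger buffers only postpone the deadlock.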

I have also found that none of the following helps: [increasing the queue size](https://github.com/lynaghk/zmq-async/blob/69ff7682d38ed964d84a7471a3b51e5492067bc3/src/com/keminglabs/zmq_async/core.clj#L218), [giving async-control-chan a buffer](https://github.com/lynaghk/zmq-async/blob/69ff7682d38ed964d84a7471a3b51e5492067bc3/src/com/keminglabs/zmq_async/core.clj#L220) (any buffer that does not drop messages), or doing both. Doing both only delays the deadlock slightly.

Making [pool](https://github.com/lynaghk/zmq-async/blob/69ff7682d38ed964d84a7471a3b51e5492067bc3/src/com/keminglabs/zmq_async/core.clj#L54) and [alts!!](https://github.com/lynaghk/zmq-async/blob/69ff7682d38ed964d84a7471a3b51e5492067bc3/src/com/keminglabs/zmq_async/core.clj#L144) deterministic, in the sense that they always take the control channel/socket first when it is ready, combined with sufficient buffering, might avoid this deadlock. I will investigate further.
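The selection rule proposed here, "always serve the control source first when it is ready", can be sketched with plain JDK queues. This is a hypothetical helper, not zmq-async's `pool` or its `alts!!` call; in core.async itself, `alts!!` accepts `:priority true`, which tries the operations in the order given instead of picking randomly among ready ones. The sketch uses an explicit loop rather than a lazy sequence, because chunked lazy sequences could poll (and lose) messages from lower-priority queues.

```clojure
(import '(java.util.concurrent LinkedBlockingQueue))

(defn poll-in-order
  "Hypothetical helper: checks queues strictly in the given order and returns
  [index msg] for the first one holding a message, or nil if all are empty.
  A ready queue earlier in the vector always wins over later ones."
  [queues]
  (loop [i 0]
    (when (< i (count queues))
      ;; .poll is non-blocking and returns nil when the queue is empty
      (if-some [msg (.poll ^LinkedBlockingQueue (nth queues i))]
        [i msg]
        (recur (inc i))))))
```

Passing `[control data]` means a pending control message is always handled before any data message. This only captures the selection rule; whether it, together with enough buffering, actually removes the deadlock is what remains to be investigated.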

I guess nobody has run into this deadlock before because it rarely happens in practice: we don't usually create and close sockets at this rate.


Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Deadlock! #3

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Deadlock! #3

Description

Metadata

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Issue actions