ProcessPool join hangs for 60 seconds due to intermittent deadlock #147
Comments
If I understand correctly, the occasional 60-second delay observed in `ProcessPool.join()` is caused by an intermittent deadlock. The root cause seems to be:
The deadlock is recovered due to the …
As for the ways to fix it, I'm thinking of maybe letting the …
Hello, this is a known limitation and it's covered within these test cases. As the documentation states, Python multiprocessing guidelines fully apply to Pebble as well. In particular, I am addressing the following section:
This is an anti-pattern for a few reasons.
The recommended way to handle these situations is to use files rather than sending large amounts of data back and forth through the pipe. What is observed most of the time is a significant increase in performance within the application itself (some references: 1, 2). Your main loop and your workers would write the data into dedicated files and only share the file paths with each other. This allows an optimal usage of the pipe (an empty pipe is a happy pipe) and leads to a few benefits:
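As a rough illustration of this file-based approach (a minimal sketch, not taken from the Pebble docs; `produce_large_output` is a hypothetical worker function):

```python
# Workers write their output to a temporary file and return only the path,
# so the result pipe carries a short string instead of the payload itself.
import os
import tempfile

from pebble import ProcessPool


def produce_large_output(chunk_id):
    payload = b"x" * (2 * 1024 * 1024)  # stand-in for a large result
    fd, path = tempfile.mkstemp(prefix="chunk-%d-" % chunk_id)
    with os.fdopen(fd, "wb") as handle:
        handle.write(payload)
    return path  # only the path travels through the pipe


if __name__ == "__main__":
    with ProcessPool(max_workers=4) as pool:
        futures = [pool.schedule(produce_large_output, args=[i]) for i in range(8)]
        for future in futures:
            path = future.result()
            with open(path, "rb") as handle:
                data = handle.read()  # the main loop reads the payload from disk
            os.remove(path)
```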
That said, I do agree that the current implementation is sub-optimal, as it is designed around the recommended usage. Yet the solution you propose cannot be accepted due to security reasons. The expectation over the … Waiting for the pipe to be drained does not provide the same guarantees: a malicious worker process could keep slowly feeding the pipe, literally locking the main loop indefinitely. Hence, a better solution should be devised. Indeed, when we ask the pool to stop (compared to the …) … I will come up with a proper implementation in the upcoming days. In the meantime I have 2 recommendations for you:
Thank you for the detailed comment. I'll look into the workarounds you mentioned. However, I have a question about the concern you expressed regarding the PR:
My understanding is that the PR doesn't introduce waiting for the pipe to be drained. The pipe draining is still executed on the background thread, and that thread's lifetime is only prolonged to (roughly) until … I've also updated the PR to add a unit test for the situation you described (a spammy worker), if I understood it correctly.
Regarding your other suggestion about large amounts of data: I've double-checked the application where the problem was originally observed (marxin/cvise#41), and the sizes of the messages are actually moderate: up to 2 KB. Still, the problem is observed with standard parallelization settings (e.g., 64 workers on my machine). Probably the described deadlock scenario is bounded by 64 KiB divided by the number of workers (https://github.com/python/cpython/blob/a8dc6d6d44a141a8f839deb248a02148dcfb509e/Lib/multiprocessing/connection.py#L43)? Asking the library consumer to use file-based I/O even for a kilobyte of data seems like overkill.
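For a rough sense of scale (assuming the ~64 KiB pipe capacity mentioned above is indeed the limiting factor): 64 KiB / 64 workers = 1 KiB per worker, so a single 2 KB result can already exceed one worker's share of the buffer.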
2 KB is definitely too small to be the issue there, but it seems the issue in the linked ticket is not due to that. The user had a dual-socket workstation, so this might be a different matter. The side effect with the stuffed pipe is known, but I've never stumbled across issues with termination of the processes with small files.
In order to stop cancelled or timing-out tasks, the Pool Manager loop needs to acquire the channel lock to prevent any worker from accessing either the lock or the pipe during its termination. The previous logic would try to acquire the lock indefinitely. This led to possible deadlocks in case the Pool Manager was trying to stop cancelled tasks with the results channel filled up while the pool was terminated. This was due to the workers holding the channel lock while being unable to push the data, as the Message Manager loop was not reading it anymore.

The new Pool Manager loop does not try to acquire the lock indefinitely but rather polls it for availability. This allows the loop to continue assessing its state and correctly terminate upon request. The drawback of this implementation is that timing-out or cancelled tasks might be stopped a little later in case the pipe is busy transferring lots of data.

The new logic is not affected by those deadlocks anymore.
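A minimal sketch of the polling strategy described in the commit message above (illustrative only, not Pebble's actual implementation; `channel_lock`, `pool_stopping` and `stop_task` are hypothetical names):

```python
import threading

channel_lock = threading.Lock()    # stands in for the results-channel lock
pool_stopping = threading.Event()  # set when the pool is asked to terminate


def stop_task(task):
    """Hypothetical helper terminating a cancelled or timed-out task."""
    task.cancel()


def try_stop_task(task, poll_interval=0.1):
    # Poll the lock with a timeout instead of blocking on it forever,
    # so the manager loop can keep checking whether the pool is stopping.
    while not pool_stopping.is_set():
        if channel_lock.acquire(timeout=poll_interval):
            try:
                stop_task(task)
            finally:
                channel_lock.release()
            return True
    # The pool is shutting down: give up instead of deadlocking.
    return False
```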
Thanks for reporting this issue with proper means to reproduce it! I opted for an alternative solution and here is why:
The solution I opted for is not to attempt to acquire the channel locks indefinitely but rather to test their availability. This allows the Pool Manager loop to continue assessing the Pool's state and exit when needed. Since this change is in the critical path and Pebble has somewhat considerable adoption, I will need to run thorough load tests before releasing this fix. Hopefully it will be done by mid-March.
I was able to reduce it down to the following example:
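(Not the original reproducer, which is not preserved here, but a rough sketch of the scenario discussed in this thread: many tasks producing ~2 KB results, the pool stopped without consuming them, then joined.)

```python
import time

from pebble import ProcessPool


def worker(i):
    time.sleep(0.01)
    return b"x" * 2048  # roughly the message size mentioned above


if __name__ == "__main__":
    pool = ProcessPool(max_workers=64)
    futures = [pool.schedule(worker, args=[i]) for i in range(10000)]

    pool.stop()  # stop without collecting the results

    start = time.monotonic()
    pool.join()
    print("join() took %.1fs" % (time.monotonic() - start))
```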