multiprocessing sometimes deadlocks on join() when queue is used (but emptied) #91776

Open

mutax opened this issue Apr 21, 2022 · 1 comment
Labels: type-bug

mutax commented Apr 21, 2022

I ran into deadlocks when using the logging example for multiprocessing, in which a QueueHandler is used.

I found that a worker process sometimes does not terminate after it has put an item on a Queue, even though the parent process runs a thread that empties the queue and has successfully retrieved that item.
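
For context, the QueueHandler pattern I am referring to looks roughly like this (a simplified sketch of the cookbook-style setup, not the exact code I ran):

import logging
import logging.handlers
from multiprocessing import Queue, current_process

def logging_worker(q: Queue):
    # All records from this process go through the queue to the parent,
    # which runs a listener thread that calls q.get() and handles them.
    root = logging.getLogger()
    root.addHandler(logging.handlers.QueueHandler(q))
    root.setLevel(logging.INFO)
    logging.info("hello from %s", current_process().name)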

In my reduced reproducer, the worker looks like this:

def worker(q):
    q.put(current_process().name)
    return

and while this works:

workers = []
for i in range(15):
    p = Process(target=worker, args=(logq,), name=f"Worker {i+1}")
    workers.append(p)

for p in workers:
    p.start()
    time.sleep(.1)

Removing the sleep leads to a very high probability of a deadlock when I then try to join the processes, e.g.:

for w in workers:
    print(f'trying to join on {w.name}, alive={w.is_alive()}, exitcode={w.exitcode}', w.name, w.is_alive(), w.exitcode)
    w.join()

The process is still marked as alive; hitting Ctrl+C gives this traceback:

trying to join on Worker 14, alive=True, exitcode=None Worker 14 True None

^CTraceback (most recent call last):
  File "/home/fls/pybug/deadlock.py", line 43, in <module>
    w.join()
  File "/usr/lib/python3.10/multiprocessing/process.py", line 149, in join
    res = self._popen.wait(timeout)
  File "/usr/lib/python3.10/multiprocessing/popen_fork.py", line 43, in wait
    return self.poll(os.WNOHANG if timeout == 0.0 else 0)
  File "/usr/lib/python3.10/multiprocessing/popen_fork.py", line 27, in poll
    pid, sts = os.waitpid(self.pid, flag)
KeyboardInterrupt
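
For debugging, a variant of the join loop above (my sketch, not part of the original run) uses a timeout and reports which workers are stuck instead of blocking forever:

# Join with a timeout and report any worker that did not exit,
# rather than hanging indefinitely on the stuck one.
for w in workers:
    w.join(timeout=5)
    if w.is_alive():
        print(f"{w.name} is still alive after 5s, exitcode={w.exitcode}")
    else:
        print(f"{w.name} exited, exitcode={w.exitcode}")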

I can reproduce this with Python 3.9.10 and 3.10.4 on Linux using the following script:

import time
import threading
from multiprocessing import Process, Queue, current_process

def logger_thread(q: Queue):
    while True:
        record = q.get()
        if record is None:
            break
        print("logrecord: ", record)

def worker(q):
    q.put(current_process().name)
    return

logq = Queue()

lp = threading.Thread(target=logger_thread, args=(logq,), daemon=True)
lp.start()

workers = []
for i in range(15):
    p = Process(target=worker, args=(logq,), name=f"Worker {i+1}")
    workers.append(p)

print("starting workers")

for p in workers:
    p.start()
    # no deadlock when added:
    # time.sleep(.1)

print("waiting a bit")
time.sleep(1)
print("trying to join workers")

for w in workers:
    print(f'trying to join on {w.name}, alive={w.is_alive()}, exitcode={w.exitcode}', w.name, w.is_alive(), w.exitcode)
    w.join()
    print(f'joined on {w.name}', w.name)

logq.put(None)
lp.join()
mutax added the type-bug label on Apr 21, 2022

mutax (Author) commented Apr 21, 2022

When I enable debug output on the multiprocessing logger, I get

2022-04-21 05:22:04,373 [    INFO] [Worker 6   ]:189454 util.py:54 process exiting with exitcode 0

but the PID is still running, so join() hangs.
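
For reference, one way to turn on that internal multiprocessing debug logging (roughly what I did; my exact handler and format differed slightly):

import logging
import multiprocessing

# Route multiprocessing's internal logger to stderr; the
# "process exiting with exitcode 0" message above is emitted from
# multiprocessing/util.py through this logger.
logger = multiprocessing.log_to_stderr()
logger.setLevel(logging.DEBUG)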
