
multiprocessing with maxtasksperchild can hang if unpickling causes import #93580

Closed as not planned
@bmerry

Description


Bug report

This seems like another specific instance of the general issue identified in #50970.

If multiprocessing.Pool.map_async is used with maxtasksperchild and a task returns an instance of a class whose module has not yet been imported by the calling (parent) process, the pool can hang. Here is an example that reliably hangs for me, but exits cleanly if ElementTree is imported at the top level.

#!/usr/bin/env python

import os
import multiprocessing

def worker(num: int):
    from xml.etree.ElementTree import ElementTree
    print(f"Worker {num} with pid {os.getpid()}")
    return ElementTree()

def main(cores: int = 4, num: int = 6):
    pool = multiprocessing.Pool(processes=cores, maxtasksperchild=1)
    barList = list(pool.map_async(worker, list(range(num))).get())
    print(barList)

if __name__ == "__main__":
    main()
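For comparison, here is a sketch of the variant described above that exits cleanly. The module defining the returned class is imported at the top level, so the parent already has it in `sys.modules` and unpickling results never has to take an import lock. The explicit `multiprocessing.get_context("fork")` (to pin the start method used in the report) and the `with` block (to clean up the pool) are additions for this sketch, not part of the original reproducer. This only sidesteps this particular race; it does not make forking from a multi-threaded process safe in general.

```python
import multiprocessing
import os

# Workaround sketch: import the module that defines the returned class in the
# parent *before* the pool starts, so unpickling results never triggers an import.
from xml.etree.ElementTree import ElementTree


def worker(num: int) -> ElementTree:
    print(f"Worker {num} with pid {os.getpid()}")
    return ElementTree()


def main(cores: int = 4, num: int = 6) -> list:
    # Explicitly use the "fork" start method, as in the original report.
    ctx = multiprocessing.get_context("fork")
    with ctx.Pool(processes=cores, maxtasksperchild=1) as pool:
        # AsyncResult.get() already returns a list.
        return pool.map_async(worker, range(num)).get()


if __name__ == "__main__":
    print(main())
```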

Running py-spy dump on one of the workers shows this backtrace:

Process 47102: python ./demo_core.py
Python v3.10.4 (/usr/bin/python3.10)

Thread 47102 (idle): "Thread-1 (_handle_workers)"
    acquire (<frozen importlib._bootstrap>:120)
    __enter__ (<frozen importlib._bootstrap>:171)
    _find_and_load (<frozen importlib._bootstrap>:1024)
    worker (demo_core.py:8)
    mapstar (multiprocessing/pool.py:48)
    worker (multiprocessing/pool.py:125)
    run (multiprocessing/process.py:108)
    _bootstrap (multiprocessing/process.py:315)
    _launch (multiprocessing/popen_fork.py:71)
    __init__ (multiprocessing/popen_fork.py:19)
    _Popen (multiprocessing/context.py:277)
    start (multiprocessing/process.py:121)
    _repopulate_pool_static (multiprocessing/pool.py:326)
    _maintain_pool (multiprocessing/pool.py:337)
    _handle_workers (multiprocessing/pool.py:513)
    run (threading.py:946)
    _bootstrap_inner (threading.py:1009)
    _bootstrap (threading.py:966)

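My guess (without any further proof) is that the main process receives a pickled ElementTree result and, while unpickling it in the pool's result-handler thread, starts importing xml.etree.ElementTree, holding that module's import lock until the import finishes. Concurrently, the worker-handler thread (_handle_workers in the backtrace above) notices that a worker has exited because of maxtasksperchild=1 and calls fork() to start a replacement. The child process inherits the half-imported module with its lock still held, but not the thread holding it, so when the new worker tries to import the module again it blocks forever, leading to a deadlock.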
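The suspected mechanism can be reproduced in miniature without multiprocessing or the import system. In this sketch a plain `threading.Lock` stands in for the per-module import lock (an illustrative assumption; the real import machinery is not involved), and the function name `child_can_acquire_after_fork` is hypothetical. A background thread holds the lock while the main thread calls `os.fork()`; the child inherits the lock in its held state but not the thread that would release it. A timed `acquire()` replaces the indefinite wait so the demo reports the deadlock instead of hanging. POSIX only.

```python
import os
import threading


def child_can_acquire_after_fork(timeout: float = 0.5) -> bool:
    """Fork while another thread holds a lock; report whether the child can take it."""
    lock = threading.Lock()  # stand-in for a per-module import lock (illustrative)
    held = threading.Event()
    done = threading.Event()

    def importer() -> None:
        # Plays the thread that is part-way through an import when fork() happens.
        with lock:
            held.set()
            done.wait()

    t = threading.Thread(target=importer)
    t.start()
    held.wait()  # make sure the lock is held before forking

    pid = os.fork()
    if pid == 0:
        # Child: the lock was copied in its locked state, but the importer
        # thread was not copied, so nothing will ever release it.
        ok = lock.acquire(timeout=timeout)
        os._exit(0 if ok else 1)

    # Parent: let the importer finish, then read the child's exit status.
    done.set()
    t.join()
    _, status = os.waitpid(pid, 0)
    return os.WIFEXITED(status) and os.WEXITSTATUS(status) == 0


if __name__ == "__main__":
    print("child acquired lock:", child_can_acquire_after_fork())  # False
```

In the parent the same lock is released normally once `done` is set; only the forked copy is stuck, which matches the py-spy backtrace showing the new worker waiting in `acquire` inside `_find_and_load`.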

Note that this has nothing to do with ElementTree specifically; I get the same behaviour with numpy. I chose ElementTree as a reasonably complex module (to maximise the window for the race condition) with a picklable class.
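The reason a returned object can trigger an import at all is that pickle stores a class by module and name, not the class itself, so `pickle.loads` must import the module if it is not already loaded. This sketch checks that in a fresh interpreter (started with `-I` to keep site customisations out); the helper name `unpickling_imports_module` is hypothetical.

```python
import pickle
import subprocess
import sys
from xml.etree.ElementTree import ElementTree

MODULE = "xml.etree.ElementTree"


def unpickling_imports_module() -> tuple[bool, bool]:
    """Unpickle an ElementTree in a fresh interpreter; return (loaded before, loaded after)."""
    # The payload references the class as "xml.etree.ElementTree" + "ElementTree".
    payload = pickle.dumps(ElementTree())
    child = (
        "import pickle, sys\n"
        f"before = {MODULE!r} in sys.modules\n"
        "pickle.loads(sys.stdin.buffer.read())\n"
        f"print(before, {MODULE!r} in sys.modules)\n"
    )
    result = subprocess.run(
        [sys.executable, "-I", "-c", child],
        input=payload,
        capture_output=True,
        check=True,
    )
    before, after = result.stdout.decode().split()
    return before == "True", after == "True"


if __name__ == "__main__":
    print(unpickling_imports_module())  # (False, True)
```

In the pool, that import happens in the parent's result-handler thread, which is what opens the window for a concurrent fork().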

Personally, I consider the fork start method of multiprocessing dangerous: it requires care to ensure that all worker processes are created before doing anything that could conceivably create threads, and it is definitely a bad combination with maxtasksperchild. So I won't shed any tears if the resolution is "won't fix, don't do that". But #50970 (comment) seems to suggest that @vstinner has some appetite for addressing such issues, hence this report.

Your environment

  • CPython versions tested on: 3.8.10, 3.10.4
  • Operating system and architecture: Ubuntu 20.04, x86_64

Metadata

  • Assignees: No one assigned
  • Projects: Status Done
  • Milestone: No milestone
  • Relationships: None yet
  • Development: No branches or pull requests