
multiprocessing's default posix start method of 'fork' is broken: change to 'spawn' #84559

Open
itamarst mannequin opened this issue Apr 24, 2020 · 31 comments
Assignees
Labels
3.12 bugs and security fixes topic-multiprocessing type-feature A feature request or enhancement

Comments

itamarst mannequin commented Apr 24, 2020

BPO 40379
Nosy @pitrou, @mgorny, @Julian, @wimglenn, @applio, @itamarst

Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


GitHub fields:

assignee = None
closed_at = None
created_at = <Date 2020-04-24.18:22:23.389>
labels = ['3.8', 'type-bug', '3.7', '3.9']
title = "multiprocessing's default start method of fork()-without-exec() is broken"
updated_at = <Date 2022-02-11.16:13:53.872>
user = 'https://bugs.python.org/itamarst'

bugs.python.org fields:

activity = <Date 2022-02-11.16:13:53.872>
actor = 'mgorny'
assignee = 'none'
closed = False
closed_date = None
closer = None
components = []
creation = <Date 2020-04-24.18:22:23.389>
creator = 'itamarst'
dependencies = []
files = []
hgrepos = []
issue_num = 40379
keywords = []
message_count = 11.0
messages = ['367210', '367211', '368173', '380478', '392358', '392501', '392503', '392506', '392507', '392508', '413081']
nosy_count = 8.0
nosy_names = ['pitrou', 'mgorny', 'Julian', 'wim.glenn', 'itamarst', 'davin', 'itamarst2', 'aduncan']
pr_nums = []
priority = 'normal'
resolution = None
stage = None
status = 'open'
superseder = None
type = 'behavior'
url = 'https://bugs.python.org/issue40379'
versions = ['Python 3.5', 'Python 3.6', 'Python 3.7', 'Python 3.8', 'Python 3.9']

Linked PRs


itamarst mannequin commented Apr 24, 2020

By default, multiprocessing uses fork() without exec() on POSIX. For a variety of reasons this can lead to inconsistent state in subprocesses: module-level globals are copied, which can mess up logging; threads don't survive fork(); and so on.

The end results vary, but quite often are silent lockups.

In real world usage, this results in users getting mysterious hangs they do not have the knowledge to debug.

The fix for these people is to use "spawn" by default, which is the default on Windows.

Just a small sample:

  1. Today I talked to a scientist who spent two weeks stuck until she found my article on the subject (https://codewithoutrules.com/2018/09/04/python-multiprocessing/). Basically multiprocessing locked up, doing nothing forever. Switching to "spawn" fixed it.
  2. dask/dask#3759 (comment), "Default multiprocessing context is broken and should never be used", is from someone whose issues were fixed by "spawn".
  3. numpy/numpy#15973, "matmul operator @ can freeze / hang when used with default python multiprocessing using fork context instead of spawn", is a NumPy issue which apparently impacted scikit-learn.

I suggest changing the default on POSIX to match Windows.
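For anyone wanting the proposed behavior today, a minimal sketch of opting in to "spawn" explicitly looks like this (worker functions must be defined at module top level so the spawned child can import and unpickle them):

```python
# A sketch of choosing "spawn" explicitly via a context object, so the
# choice is local to this code rather than a process-wide default.
import multiprocessing

def square(x):
    return x * x

if __name__ == "__main__":
    ctx = multiprocessing.get_context("spawn")
    with ctx.Pool(processes=2) as pool:
        print(pool.map(square, [1, 2, 3]))  # [1, 4, 9]
```

Using `get_context()` instead of `set_start_method()` also avoids conflicts with libraries that set the global start method themselves.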

@itamarst itamarst mannequin added 3.7 (EOL) end of life 3.8 only security fixes 3.9 only security fixes type-bug An unexpected behavior, bug, or error labels Apr 24, 2020

itamarst mannequin commented Apr 24, 2020

Looks like as of 3.8 this only impacts Linux/non-macOS-POSIX, so I'll amend the above to say this will also make it consistent with macOS.


itamarst mannequin commented May 5, 2020

Just got an email from someone for whom switching to "spawn" fixed a problem. Earlier this week someone tweeted about this fixing things. This keeps hitting people in the real world.


itamarst mannequin commented Nov 6, 2020

Another person with the same issue: https://twitter.com/volcan01010/status/1324764531139248128


aduncan mannequin commented Apr 29, 2021

I just ran into and fixed (thanks to itamarst's blog post) a problem likely related to this.

Multiprocessing workers were performing work and sending a logging message back with success/fail info. I had a few intermittent deadlocks that became a recurring problem when I sped up the step that skipped previously completed tasks (I think this shortened the time between forking and attempting to send messages, causing the third process to deadlock). After that change it deadlocked *every time*.

Switching to "spawn" at the top of the main function has fixed it.
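A sketch of that fix, assuming a standard entry-point layout: one `set_start_method("spawn")` call before any pools or processes are created (note it may be called at most once per process):

```python
# Sketch of forcing "spawn" once, at the top of the program's entry point,
# before any Pool or Process objects exist.
import multiprocessing

def main():
    # create Pools / Processes here; all of them will use "spawn"
    pass

if __name__ == "__main__":
    multiprocessing.set_start_method("spawn")
    main()
```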


pitrou commented Apr 30, 2021

The problem with changing the default is that this will break any application that depends on passing non-picklable data to the child process (in addition to the potentially unexpected performance impact).

The docs already contain a significant elaboration on the matter, but feel free to submit a PR that would make the various caveats more explicit:
https://docs.python.org/3/library/multiprocessing.html#contexts-and-start-methods
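A minimal illustration of that caveat: "spawn" (and "forkserver") must pickle the target and its arguments before starting the child, so objects a forked child would simply inherit, a lambda for instance, stop working:

```python
# Under "fork", children inherit any object from the parent; under "spawn"
# the target and its arguments must survive pickling. A lambda is a
# classic casualty.
import pickle

callback = lambda x: x + 1  # a forked child would inherit this fine

try:
    pickle.dumps(callback)  # roughly what "spawn" does before starting the child
except Exception as exc:
    print("not picklable:", type(exc).__name__)
```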


itamarst mannequin commented Apr 30, 2021

This change was made on macOS at some point, so why not Linux? "spawn" is already the default on macOS and Windows.


pitrou commented Apr 30, 2021

The macOS change was required because "fork" simply ceased to work.
Windows has always used "spawn", because no other method can be implemented on Windows.


itamarst mannequin commented Apr 30, 2021

Given people's general experience, I would not say that "fork" works on Linux either. More like "99% of the time it works, 1% of the time it randomly breaks in mysterious ways".


pitrou commented Apr 30, 2021

Agreed, but again, changing will break some applications.

We could switch to forkserver, but we should have a transition period where a FutureWarning will be displayed if people didn't explicitly set a start method.


mgorny mannequin commented Feb 11, 2022

After updating PyPy3 to use Python 3.9's stdlib, we hit very bad hangs because of this — literally compiling a single file with "parallel" compileall could hang. In the end, we had to revert the change in how Python 3.9 starts workers because otherwise multiprocessing would be impossible to use:

https://foss.heptapod.net/pypy/pypy/-/commit/c594b6c48a48386e8ac1f3f52d4b82f9c3e34784

This is a very bad default and what's even worse is that it often causes deadlocks that are hard to reproduce or debug. Furthermore, since "fork" is the default, people are unintentionally relying on its support for passing non-pickleable objects and are creating non-portable code. The code often becomes complex and hard to change before they discover the problem.

Before we managed to figure out how to work around the deadlocks in PyPy3, we were experimenting with switching the default to "spawn". Unfortunately, we hit multiple projects that didn't work with this method, precisely because of pickling problems. Furthermore, they were surprised to learn that their code wouldn't work on macOS (in the end, many people perceive Python as a language for writing portable software).

Finally, back in 2018 I made one of my projects do parallel work using multiprocessing. It gave its users a great speedup, but for some it caused deadlocks that I couldn't reproduce or debug. In the end, I had to revert it. Now that I've learned about this problem, I'm wondering if that wasn't precisely because of the "fork" method.

@ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
davidselassie added a commit to bytewax/bytewax that referenced this issue Apr 18, 2022
Provide a way for the calling code to specify which "multiprocessing
context" to use to spawn subprocesses. See
https://docs.python.org/3/library/multiprocessing.html#contexts-and-start-methods

I'm using this to allow us to mock out multiprocessing with
multithreading in doctests. This will also let you more easily test
differences between "spawn" and "fork" modes.

I'm defaulting to using "spawn" because I think "fork" mode was the
cause of some mysterious hanging in tests. General consensus seems to
be "spawn" is less buggy:
python/cpython#84559 I've felt like tests
are consistently faster with it.

Also uses the `multiprocessing.Manager` as a context manager so it
gets cleaned up correctly. This might have been the cause of other
hanging in local cluster execution.
davidselassie added a commit to bytewax/bytewax that referenced this issue Apr 18, 2022
@itamarst

Another example: Nelson Elhage reports that "as of recently(?) pytorch silently deadlocks (even without GPUs involved at all) using method=fork so that's been fun to debug".

Examples he provided:

@ravwojdyla

After updating a couple of libraries in a project we are working on, the code would hang without much explanation. After much debugging, I think one of the reasons for our issues is the forking default (this issue). Our business logic does not use multiprocessing, but the underlying execution engine does (in our case Luigi). It turns out that the gRPC client (which was buried deep in one of our dependencies) can hang in some cases when forked (grpc/grpc#18075). This was the case for us, and it was very tricky to debug.

@gpshead gpshead changed the title multiprocessing's default start method of fork()-without-exec() is broken multiprocessing's default posix start method of fork()-without-exec() is broken: change the default to spawn Dec 13, 2022
@gpshead gpshead added type-feature A feature request or enhancement and removed 3.9 only security fixes 3.8 only security fixes 3.7 (EOL) end of life type-bug An unexpected behavior, bug, or error labels Dec 13, 2022

gpshead commented Dec 13, 2022

general plan:

  • A DeprecationWarning in 3.12 and 3.13 when the default not-explicitly-specified start method of fork is used on platforms where that is the default.
  • 3.14: flip the default for all platforms to spawn.

per https://discuss.python.org/t/switching-default-multiprocessing-context-to-spawn-on-posix-as-well/21868

@gpshead gpshead self-assigned this Dec 13, 2022

gpshead commented Feb 3, 2023

Early feedback is exactly what we want, thanks! 😃 ❤️

Are the warnings being attributed to code you do not control? How are they un-actionable? Not wanting to take action is not the same as un-actionable.

If the warnings appear attributed to code you do not control and have no feedback channel into (bugs, PRs, etc.), that could count. But these are DeprecationWarnings, which are filtered by default. You should only be seeing them from unittests or __main__ code, as those are contexts that imply "developer or code owner" where appropriate actions can be taken.

If you add an explicit multiprocessing.set_start_method call to your __main__ program the warning will go away:

import multiprocessing
multiprocessing.set_start_method("spawn")

If you want to declare "forkserver" or "fork" for better performance when possible while keeping your code cross-platform safe, the logic probably looks like:

import multiprocessing, sys
multiprocessing.set_start_method("forkserver" if sys.platform not in ("darwin", "win32") else "spawn")

I don't like raw sys.platform values needing to exist in people's code. No one should. But we have no way of explicitly saying "faster than spawn if possible please". If we added an ability to do that (not a bad idea), it would still be ugly for people supporting multiple Python versions:

import multiprocessing
try:
    multiprocessing.set_start_method("faster-than-spawn-if-safe")  # (A possible 3.14 default)
except ValueError:  # Python versions before 3.12
    multiprocessing.set_start_method("spawn")  # always safe

Caveat: I assume adding any of these would get tediously annoying to do within every single *_test.py file in an mp-heavy project. Which is why it's preferable to do it at the presumed-fewer multiprocessing/concurrent.futures call sites themselves.

Put on the hat of someone whose code is not going to work after the start method changes. Instead of getting a warning forcing them to acknowledge it and make their code's intent explicit, they'll suddenly be broken in a future release such as 3.14. We don't like treating our users that way when it can be avoided.

We effectively did that to people suddenly in 3.8 on macOS with its change to 'spawn'. The platform broke, so we had no choice in how to deal with the emergency there (see the long #77906 thread).

Some people do write code that depends on 'fork' sharing semantics on Linux+BSD, thus the deprecation period with a warning we're attempting to implement in this PR. We want people who depend on it to declare their dependency with an explicit "fork" specification. It'd be ideal to only warn "when needed", but there is no practical way to detect whether somebody's code is relying on fork-specific semantics.

The quiet alternative is to disable this new warning and have it be a documentation-notice-only deprecation. That gives us the ability to smugly say "we told you so", but it doesn't leave anyone whose code gets broken happy.

Which is less disruptive on the whole (not just to you)?

  • A: Guaranteeing that some people requiring "fork" will have an unhappy surprise to debug in 3.14?
  • B: Making more people annoyed in 3.12 that they're required to make their code's intent explicit?

I do realize that a consequence of this warning is that we're trying to force people into explicit is better than implicit use of the API during the transition period. It is hard to see being explicit about intent as a problem though.

@gpshead gpshead reopened this Feb 3, 2023

asottile commented Feb 3, 2023

Are the warnings being attributed to code you do not control? How are they un-actionable? Not wanting to take action is not the same as un-actionable.

it's both code I control and code I don't control. some of it is not wanting to take action, some of it I cannot control.

the root of it for me is, I have already done my due diligence to create cross-platform software that works correctly given either default (mostly by nature of targeting windows (and now macos)) -- I should not be punished by a DeprecationWarning for doing so.

since I work on popular software, OS packagers with -Werror are going to be knocking on my door telling me my software is broken when in reality I've already done the hard work to ensure it is correct. I'll get drive-by (WRONG) PRs "following directions" from the warnings and forcing fork or spawn. this wastes my already limited time for open source on noise when I could be building cool new useful things

if the code today is correct and the code after 3.14 is correct, I shouldn't be getting a warning telling me to change it for 3.12 and 3.13

if I have to write a bunch of ugly code to re-introduce the default just so I don't get a bunch of annoying noisy issues / PRs for a DeprecationWarning that imo is wrong and that I don't care about I'm not happy

additionally all (except the ones specifically about a context type) of the documentation examples for multiprocessing will now fail with -Werror -- the documented way to do things should not be deprecated


gpshead commented Feb 3, 2023

I understand your frustration.

"If ... the code after 3.14 is correct" is impossible to detect and issue a warning about. That is what we want maintainers of code to manually verify and explicitly declare in their code.

How do you propose to get people to do this without a warning?

The root of the problem is that multiprocessing ever had a default in the first place - or at least that it wasn't the guaranteed safest method. (This mistake was made when pulling the original third party library in to become multiprocessing in ~2.6)


asottile commented Feb 3, 2023

my assumption without data is that the vast majority of people have already made the necessary changes after 3.8 and that the warning is unnecessary


gpshead commented Feb 3, 2023

I suspect that's the best we can do. Apple's popularity means the majority of widely used things already dealt with start method compatibility? I'll remove the new warning.


pitrou commented Feb 3, 2023

my assumption without data is that the vast majority of people have already made the necessary changes after 3.8 and that the warning is unnecessary

I'm extremely skeptical about that. There's a lot of software that doesn't care about running on macOS or Windows.


asottile commented Feb 3, 2023

I doubt it's more than a fraction of a percent of python users or pypi packages -- anything that people actually use has already been updated or is abandoned

@python python deleted a comment from asottile Feb 3, 2023
@python python deleted a comment from pitrou Feb 3, 2023
gpshead added a commit to gpshead/cpython that referenced this issue Feb 3, 2023
This reverts the core of python#100618 while leaving relevant documentation
improvements and minor refactorings in place.

ppwwyyxx commented Feb 3, 2023

If we remove the warning it follows that "some people requiring "fork" will have an unhappy surprise to debug in 3.14":

Here is one suggestion that might make their debugging experience slightly better: detect any pickling exception in ForkingPickler and print a helpful message that says "if your code used to work, then the change of default start method is likely the reason; here is how to fix it".

In my experience pickling has been the biggest source of incompatibilities between fork and spawn/forkserver.

gpshead added a commit that referenced this issue Feb 3, 2023
…01551)

This reverts the core of #100618 while leaving relevant documentation
improvements and minor refactorings in place.
gpshead added a commit to gpshead/cpython that referenced this issue Feb 4, 2023
We drop 'fork' in favor of 'forkserver' or 'spawn'. See the issue for details.
@hafidhrendyanto

Had this exact issue. I wrote a custom script to train a reinforcement learning model using TensorFlow 2 on multiple processes. The code works well on macOS but unexpectedly hangs when I upload it to my Linux server. Wasted multiple hours just debugging this. I'm happy the community is working to fix this issue!

@guillaumematheron

This issue can cause duplicate uuids to be generated when using uuid1.
It looks like the first call to uuid1 sets up global state, which is then copied into every worker when a multiprocessing Pool is created with the "fork" method.
Then all processes generate the same sequence of uuids.

See ClickHouse/clickhouse-connect#194 for more context on the issue and an example.


pitrou commented Jun 5, 2023

@guillaumematheron Which issue? The fact that fork is being used?

Regardless, uuid is an inefficient way to generate random ids.

@guillaumematheron

Yes, as outlined in this comment, switching to spawn instead of fork prevents duplicate uuids from being generated.

Of course uuid1 is not a good way to generate random ids, since it's not random at all if a network address and counter are available.

But it should be safe to assume that it is unique, especially if it explicitly returns a flag saying that the value is "generated by the platform in a multiprocessing-safe way". Maybe it's worth adding a caveat to the uuid1 documentation?


pitrou commented Jun 5, 2023

Ping @warsaw on the UUID multiprocessing-safety issue.


gpshead commented Jun 5, 2023

Please file a separate issue for the uuid module. It could be a reasonable decision to refresh global state like that upon fork, but that isn't going to happen buried in this issue. You could probably do it yourself today via os.register_at_fork().
