-
-
Notifications
You must be signed in to change notification settings - Fork 32.1k
bpo-35823: subprocess: Use vfork() instead of fork() on Linux when safe #11671
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
overall I think this PR is looking good.
When used to run a new executable image, fork() is not a good choice for process creation, especially if the parent has a large working set: fork() needs to copy page tables, which is slow, and may fail on systems where overcommit is disabled, despite that the child is not going to touch most of its address space. Currently, subprocess is capable of using posix_spawn() instead, which normally provides much better performance. However, posix_spawn() does not support many of child setup operations exposed by subprocess.Popen(). Most notably, it's not possible to express `close_fds=True`, which happens to be the default, via posix_spawn(). As a result, most users can't benefit from faster process creation, at least not without changing their code. However, Linux provides vfork() system call, which creates a new process without copying the address space of the parent, and which is actually used by C libraries to efficiently implement posix_spawn(). Due to sharing of the address space and even the stack with the parent, extreme care is required to use vfork(). At least the following restrictions must hold: * No signal handlers must execute in the child process. Otherwise, they might clobber memory shared with the parent, potentially confusing it. * Any library function called after vfork() in the child must be async-signal-safe (as for fork()), but it must also not interact with any library state in a way that might break due to address space sharing and/or lack of any preparations performed by libraries on normal fork(). POSIX.1 permits to call only execve() and _exit(), and later revisions remove vfork() specification entirely. In practice, however, almost all operations needed by subprocess.Popen() can be safely implemented on Linux. * Due to sharing of the stack with the parent, the child must be careful not to clobber local variables that are alive across vfork() call. Compilers are normally aware of this and take extra care with vfork() (and setjmp(), which has a similar problem). * In case the parent is privileged, special attention must be paid to vfork() use, because sharing an address space across different privilege domains is insecure[1]. This patch adds support for using vfork() instead of fork() on Linux when it's possible to do safely given the above. In particular: * vfork() is not used if credential switch is requested. The reverse case (simple subprocess.Popen() but another application thread switches credentials concurrently) is not possible for pure-Python apps because subprocess.Popen() and functions like os.setuid() are mutually excluded via GIL. We might also consider to add a way to opt-out of vfork() (and posix_spawn() on platforms where it might be implemented via vfork()) in a future PR. * vfork() is not used if `preexec_fn != None`. With this change, subprocess will still use posix_spawn() if possible, but will fallback to vfork() on Linux in most cases, and, failing that, to fork(). [1] https://ewontfix.com/7
@gpshead Would you mind to take a look? |
It isn't clear if setsid() is safe after vfork(); better safe than sorry. Also adds a couple comments and some debug build assertions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for this work! Waiting on CI and doing some test runs myself. I expect to merge it shortly if that goes well.
|
|
…fe (pythonGH-11671) * bpo-35823: subprocess: Use vfork() instead of fork() on Linux when safe When used to run a new executable image, fork() is not a good choice for process creation, especially if the parent has a large working set: fork() needs to copy page tables, which is slow, and may fail on systems where overcommit is disabled, despite that the child is not going to touch most of its address space. Currently, subprocess is capable of using posix_spawn() instead, which normally provides much better performance. However, posix_spawn() does not support many of child setup operations exposed by subprocess.Popen(). Most notably, it's not possible to express `close_fds=True`, which happens to be the default, via posix_spawn(). As a result, most users can't benefit from faster process creation, at least not without changing their code. However, Linux provides vfork() system call, which creates a new process without copying the address space of the parent, and which is actually used by C libraries to efficiently implement posix_spawn(). Due to sharing of the address space and even the stack with the parent, extreme care is required to use vfork(). At least the following restrictions must hold: * No signal handlers must execute in the child process. Otherwise, they might clobber memory shared with the parent, potentially confusing it. * Any library function called after vfork() in the child must be async-signal-safe (as for fork()), but it must also not interact with any library state in a way that might break due to address space sharing and/or lack of any preparations performed by libraries on normal fork(). POSIX.1 permits to call only execve() and _exit(), and later revisions remove vfork() specification entirely. In practice, however, almost all operations needed by subprocess.Popen() can be safely implemented on Linux. * Due to sharing of the stack with the parent, the child must be careful not to clobber local variables that are alive across vfork() call. Compilers are normally aware of this and take extra care with vfork() (and setjmp(), which has a similar problem). * In case the parent is privileged, special attention must be paid to vfork() use, because sharing an address space across different privilege domains is insecure[1]. This patch adds support for using vfork() instead of fork() on Linux when it's possible to do safely given the above. In particular: * vfork() is not used if credential switch is requested. The reverse case (simple subprocess.Popen() but another application thread switches credentials concurrently) is not possible for pure-Python apps because subprocess.Popen() and functions like os.setuid() are mutually excluded via GIL. We might also consider to add a way to opt-out of vfork() (and posix_spawn() on platforms where it might be implemented via vfork()) in a future PR. * vfork() is not used if `preexec_fn != None`. With this change, subprocess will still use posix_spawn() if possible, but will fallback to vfork() on Linux in most cases, and, failing that, to fork(). [1] https://ewontfix.com/7 Co-authored-by: Gregory P. Smith [Google LLC] <gps@google.com>
By default python 3.10 will use the fork() which has to copy all the memory of the parent process (in our case this can be huge since Home Assistant core can use hundreds of megabytes of RAM). By using posix_spawn this is avoided. In python 3.11 vfork will also be available python/cpython#80004 (comment) python/cpython#11671 but we won't always be able to use it and posix_spawn is considered safer https://bugzilla.kernel.org/show_bug.cgi?id=215813#c14 The subprocess library doesn't know about musl though even though it supports posix_spawn https://git.musl-libc.org/cgit/musl/log/src/process/posix_spawn.c so we have to teach it since it only has checks for glibc https://github.com/python/cpython/blob/1b736838e6ae1b4ef42cdd27c2708face908f92c/Lib/subprocess.py#L745 The constant is documented as being able to be flipped here: https://docs.python.org/3/library/subprocess.html#disabling-use-of-vfork-or-posix-spawn
By default python 3.10 will use the fork() which has to copy memory of the parent process (in our case this can be huge since Home Assistant core can use hundreds of megabytes of RAM). By using posix_spawn this is avoided and subprocess creation does not get discernibly slow the larger the Home Assistant python process grows. In python 3.11 vfork will also be available python/cpython#80004 (comment) python/cpython#11671 but we won't always be able to use it and posix_spawn is considered safer https://bugzilla.kernel.org/show_bug.cgi?id=215813#c14 The subprocess library doesn't know about musl though even though it supports posix_spawn https://git.musl-libc.org/cgit/musl/log/src/process/posix_spawn.c so we have to teach it since it only has checks for glibc https://github.com/python/cpython/blob/1b736838e6ae1b4ef42cdd27c2708face908f92c/Lib/subprocess.py#L745 The constant is documented as being able to be flipped here: https://docs.python.org/3/library/subprocess.html#disabling-use-of-vfork-or-posix-spawn
) * use posix spawn on alpine * Avoid subprocess memory copy when c library supports posix_spawn By default python 3.10 will use the fork() which has to copy all the memory of the parent process (in our case this can be huge since Home Assistant core can use hundreds of megabytes of RAM). By using posix_spawn this is avoided. In python 3.11 vfork will also be available python/cpython#80004 (comment) python/cpython#11671 but we won't always be able to use it and posix_spawn is considered safer https://bugzilla.kernel.org/show_bug.cgi?id=215813#c14 The subprocess library doesn't know about musl though even though it supports posix_spawn https://git.musl-libc.org/cgit/musl/log/src/process/posix_spawn.c so we have to teach it since it only has checks for glibc https://github.com/python/cpython/blob/1b736838e6ae1b4ef42cdd27c2708face908f92c/Lib/subprocess.py#L745 The constant is documented as being able to be flipped here: https://docs.python.org/3/library/subprocess.html#disabling-use-of-vfork-or-posix-spawn * Avoid subprocess memory copy when c library supports posix_spawn By default python 3.10 will use the fork() which has to copy memory of the parent process (in our case this can be huge since Home Assistant core can use hundreds of megabytes of RAM). By using posix_spawn this is avoided and subprocess creation does not get discernibly slow the larger the Home Assistant python process grows. In python 3.11 vfork will also be available python/cpython#80004 (comment) python/cpython#11671 but we won't always be able to use it and posix_spawn is considered safer https://bugzilla.kernel.org/show_bug.cgi?id=215813#c14 The subprocess library doesn't know about musl though even though it supports posix_spawn https://git.musl-libc.org/cgit/musl/log/src/process/posix_spawn.c so we have to teach it since it only has checks for glibc https://github.com/python/cpython/blob/1b736838e6ae1b4ef42cdd27c2708face908f92c/Lib/subprocess.py#L745 The constant is documented as being able to be flipped here: https://docs.python.org/3/library/subprocess.html#disabling-use-of-vfork-or-posix-spawn * missed some * adjust more tests * coverage
…e-assistant#87958) * use posix spawn on alpine * Avoid subprocess memory copy when c library supports posix_spawn By default python 3.10 will use the fork() which has to copy all the memory of the parent process (in our case this can be huge since Home Assistant core can use hundreds of megabytes of RAM). By using posix_spawn this is avoided. In python 3.11 vfork will also be available python/cpython#80004 (comment) python/cpython#11671 but we won't always be able to use it and posix_spawn is considered safer https://bugzilla.kernel.org/show_bug.cgi?id=215813#c14 The subprocess library doesn't know about musl though even though it supports posix_spawn https://git.musl-libc.org/cgit/musl/log/src/process/posix_spawn.c so we have to teach it since it only has checks for glibc https://github.com/python/cpython/blob/1b736838e6ae1b4ef42cdd27c2708face908f92c/Lib/subprocess.py#L745 The constant is documented as being able to be flipped here: https://docs.python.org/3/library/subprocess.html#disabling-use-of-vfork-or-posix-spawn * Avoid subprocess memory copy when c library supports posix_spawn By default python 3.10 will use the fork() which has to copy memory of the parent process (in our case this can be huge since Home Assistant core can use hundreds of megabytes of RAM). By using posix_spawn this is avoided and subprocess creation does not get discernibly slow the larger the Home Assistant python process grows. In python 3.11 vfork will also be available python/cpython#80004 (comment) python/cpython#11671 but we won't always be able to use it and posix_spawn is considered safer https://bugzilla.kernel.org/show_bug.cgi?id=215813#c14 The subprocess library doesn't know about musl though even though it supports posix_spawn https://git.musl-libc.org/cgit/musl/log/src/process/posix_spawn.c so we have to teach it since it only has checks for glibc https://github.com/python/cpython/blob/1b736838e6ae1b4ef42cdd27c2708face908f92c/Lib/subprocess.py#L745 The constant is documented as being able to be flipped here: https://docs.python.org/3/library/subprocess.html#disabling-use-of-vfork-or-posix-spawn * missed some * adjust more tests * coverage
When used to run a new executable image, fork() is not a good choice
for process creation, especially if the parent has a large working set:
fork() needs to copy page tables, which is slow, and may fail on systems
where overcommit is disabled, despite that the child is not going to
touch most of its address space.
Currently, subprocess is capable of using posix_spawn() instead, which
normally provides much better performance. However, posix_spawn() does not
support many of child setup operations exposed by subprocess.Popen().
Most notably, it's not possible to express
close_fds=True
, whichhappens to be the default, via posix_spawn(). As a result, most users
can't benefit from faster process creation, at least not without
changing their code.
However, Linux provides vfork() system call, which creates a new process
without copying the address space of the parent, and which is actually
used by C libraries to efficiently implement posix_spawn(). Due to sharing
of the address space and even the stack with the parent, extreme care
is required to use vfork(). At least the following restrictions must hold:
No signal handlers must execute in the child process. Otherwise, they
might clobber memory shared with the parent, potentially confusing it.
Any library function called after vfork() in the child must be
async-signal-safe (as for fork()), but it must also not interact with any
library state in a way that might break due to address space sharing
and/or lack of any preparations performed by libraries on normal fork().
POSIX.1 permits to call only execve() and _exit(), and later revisions
remove vfork() specification entirely. In practice, however, almost all
operations needed by subprocess.Popen() can be safely implemented on
Linux.
Due to sharing of the stack with the parent, the child must be careful
not to clobber local variables that are alive across vfork() call.
Compilers are normally aware of this and take extra care with vfork()
(and setjmp(), which has a similar problem).
In case the parent is privileged, special attention must be paid to vfork()
use, because sharing an address space across different privilege domains
is insecure[1].
This patch adds support for using vfork() instead of fork() on Linux
when it's possible to do safely given the above. In particular:
vfork() is not used if credential switch is requested. The reverse case
(simple subprocess.Popen() but another application thread switches
credentials concurrently) is not possible for pure-Python apps because
subprocess.Popen() and functions like os.setuid() are mutually excluded
via GIL. We might also consider to add a way to opt-out of vfork() (and
posix_spawn() on platforms where it might be implemented via vfork()) in
a future PR.
vfork() is not used if
preexec_fn != None
.With this change, subprocess will still use posix_spawn() if possible, but
will fallback to vfork() on Linux in most cases, and, failing that,
to fork().
[1] https://ewontfix.com/7
https://bugs.python.org/issue35823