[2.7] bpo-38106: Fix race in PyThread_release_lock that can lead to memory corruption and deadlock #16047
Conversation
the-knights-who-say-ni commented Sep 12, 2019
Hello, and thanks for your contribution! I'm a bot set up to make sure that the project can legally accept this contribution by verifying everyone involved has signed the PSF contributor agreement (CLA).

CLA Missing

Our records indicate the following people have not signed the CLA: For legal reasons we need all the people listed to sign the CLA before we can look at your contribution. Please follow the steps outlined in the CPython devguide to rectify this issue. If you have recently signed the CLA, please wait at least one business day. You can check yourself to see if the CLA has been received. Thanks again for the contribution, we look forward to reviewing it!
I have signed the PSF CLA, but only 30 minutes ago.
I've amended the patch to make bedevere/news happy with the following:

```diff
--- /dev/null
+++ b/Misc/NEWS.d/next/Core and Builtins/2019-09-12-16-36-16.bpo-38106.4pApn7.rst
@@ -0,0 +1,2 @@
+Fix race in PyThread_release_lock that was leading to memory corruption and
+deadlocks.
```
I reported the bug (see 69db91b "libgolang: Add internal semaphores") to the CPython and PyPy upstreams. PyPy people have already fixed it and the fix, if I understood correctly, should be available as part of PyPy 7.2. Let's hope that CPython 2.7 will also be fixed.

- https://bugs.python.org/issue38106 -> python/cpython#16047
- https://bitbucket.org/pypy/pypy/issues/3072

The added Python-level test triggers the bug with high probability. A dedicated C-level Semaphore-only test is pending.
```diff
@@ -0,0 +1,2 @@
+Fix race in PyThread_release_lock that was leading to memory corruption and
+deadlocks.
```
vstinner (Member) commented Oct 2, 2019
Oh! I just created a very similar local change to prepare a pull request... and then I saw that you created a PR! Oops.
I wrote a different description for the change:
    bpo-38106: Fix a race condition in pthread locks

    Fix a race condition in locks of the pthread implementation using
    mutex with condition variable. Signal condition variables with the
    mutex held. Destroy condition variables before their mutexes.

Please mention which exact implementation is changed. We do have 2 implementations for pthread :-)
@vstinner thanks for the feedback and for noticing my PR. I agree it makes sense to clarify in NEWS which systems are affected. I've amended the patch with the following interdiff:

```diff
--- a/Misc/NEWS.d/next/Core and Builtins/2019-09-12-16-36-16.bpo-38106.4pApn7.rst
+++ b/Misc/NEWS.d/next/Core and Builtins/2019-09-12-16-36-16.bpo-38106.4pApn7.rst
@@ -1,2 +1,6 @@
 Fix race in PyThread_release_lock that was leading to memory corruption and
 deadlocks.
+
+The fix applies to POSIX systems where Python locks are implemented with mutex
+and condition variable because POSIX semaphores are either not provided, or are
+known to be broken. One particular example of such system is macOS.
```

Is it ok?
LGTM. I would prefer to have a review from a second core dev. @pablogsal, @serhiy-storchaka, @methane, @benjaminp: would you mind having a look at this fix for a lock race condition?
```diff
@@ -0,0 +1,6 @@
+Fix race in PyThread_release_lock that was leading to memory corruption and
+deadlocks.
```
vstinner (Member) commented Oct 2, 2019
I'm not sure how it is rendered. I suggest merging the sentences into a single paragraph.
navytux (Author) commented Oct 2, 2019
Ok, I've amended the patch with the following interdiff:

```diff
--- a/Misc/NEWS.d/next/Core and Builtins/2019-09-12-16-36-16.bpo-38106.4pApn7.rst
+++ b/Misc/NEWS.d/next/Core and Builtins/2019-09-12-16-36-16.bpo-38106.4pApn7.rst
@@ -1,6 +1,5 @@
 Fix race in PyThread_release_lock that was leading to memory corruption and
-deadlocks.
-
-The fix applies to POSIX systems where Python locks are implemented with mutex
-and condition variable because POSIX semaphores are either not provided, or are
-known to be broken. One particular example of such system is macOS.
+deadlocks. The fix applies to POSIX systems where Python locks are implemented
+with mutex and condition variable because POSIX semaphores are either not
+provided, or are known to be broken. One particular example of such system is
+macOS.
```
For reference, changelog entries are sometimes added as several paragraphs, and that renders e.g. like this (search for "bpo-33185: Fixed regression when running pydoc with the -m switch" there).
bpo-38106: Fix race in PyThread_release_lock that can lead to memory corruption and deadlock

On Darwin, even though this is considered as POSIX, Python uses
mutex+condition variable to implement its lock, and, as of 20190828, the Py2.7
implementation, even though a similar issue was fixed for Py3 in 2012, contains
a synchronization bug: the condition is signalled after mutex unlock while the
correct protocol is to signal the condition from under the mutex:

https://github.com/python/cpython/blob/v2.7.16-127-g0229b56d8c0/Python/thread_pthread.h#L486-L506
187aa54 (py3 fix)

PyPy has the same bug for both pypy2 and pypy3:

https://bitbucket.org/pypy/pypy/src/578667b3fef9/rpython/translator/c/src/thread_pthread.c#lines-443:465
https://bitbucket.org/pypy/pypy/src/5b42890d48c3/rpython/translator/c/src/thread_pthread.c#lines-443:465

Signalling a condition outside of the corresponding mutex is considered OK by
POSIX, but in the Python context it can lead to at least memory corruption if we
consider the whole lifetime of a python-level lock. For example the following
logical scenario:

    T1                                          T2

    sema = Lock()
    sema.acquire()

                                                sema.release()

    sema.acquire()
    free(sema)

can translate to the next C-level calls:

    T1                                          T2

    # sema = Lock()
    sema = malloc(...)
    sema.locked = 0
    pthread_mutex_init(&sema.mut)
    pthread_cond_init (&sema.lock_released)

    # sema.acquire()
    pthread_mutex_lock(&sema.mut)
    # sees sema.locked == 0
    sema.locked = 1
    pthread_mutex_unlock(&sema.mut)

                                                # sema.release()
                                                pthread_mutex_lock(&sema.mut)
                                                sema.locked = 0
                                                pthread_mutex_unlock(&sema.mut)

                                                # OS scheduler gets in and relinquishes control
                                                # from T2 to another process
                                                ...

    # second sema.acquire()
    pthread_mutex_lock(&sema.mut)
    # sees sema.locked == 0
    sema.locked = 1
    pthread_mutex_unlock(&sema.mut)

    # free(sema)
    pthread_mutex_destroy(&sema.mut)
    pthread_cond_destroy (&sema.lock_released)
    free(sema)

    # ... e.g. malloc() which returns memory where sema was
    ...
                                                # OS scheduler returns control to T2
                                                # sema.release() continues
                                                #
                                                # BUT sema was already freed and writing to anywhere
                                                # inside sema block CORRUPTS MEMORY. In particular if
                                                # _another_ python-level lock was allocated where sema
                                                # block was, writing into the memory can have effect on
                                                # further synchronization correctness and in particular
                                                # lead to deadlock on lock that was next allocated.
                                                pthread_cond_signal(&sema.lock_released)

Note that T2.pthread_cond_signal(&sema.lock_released) CORRUPTS MEMORY as it
is called when sema memory was already freed and is potentially
reallocated for another object.

The fix is to move pthread_cond_signal to be done under the corresponding mutex:

    # sema.release()
    pthread_mutex_lock(&sema.mut)
    sema.locked = 0
    pthread_cond_signal(&sema.lock_released)
    pthread_mutex_unlock(&sema.mut)

To do so this patch cherry-picks the thread_pthread.h part of the following 3.2 commit:

    commit 187aa54
    Author: Kristján Valur Jónsson <kristjan@ccpgames.com>
    Date:   Tue Jun 5 22:17:42 2012 +0000

        Signal condition variables with the mutex held. Destroy condition variables
        before their mutexes.

     Python/ceval_gil.h      |  9 +++++----
     Python/thread_pthread.h | 15 +++++++++------
     2 files changed, 14 insertions(+), 10 deletions(-)

(ceval_gil.h is Python3-specific and does not apply to Python2.7)

Bug history

The bug was there since 1994 - since at least [1]. It was discussed in 2001
with the original code author[2], but the code was still considered to be
race-free. In 2010 the place where pthread_cond_signal should be - before or
after pthread_mutex_unlock - was discussed with the rationale to avoid
threads bouncing[3,4,5], and in 2012 pthread_cond_signal was moved to be
called from under the mutex, but only for CPython3[6,7].

In 2019 the bug was (re-)discovered while testing Pygolang[8] on macOS with
CPython2 and PyPy2 and PyPy3.

[1] 2c8cb9f
[2] https://bugs.python.org/issue433625
[3] https://bugs.python.org/issue8299#msg103224
[4] https://bugs.python.org/issue8410#msg103313
[5] https://bugs.python.org/issue8411#msg113301
[6] https://bugs.python.org/issue15038#msg163187
[7] 187aa54
[8] https://pypi.org/project/pygolang
Thanks for the review.
Until now libgolang had semaphore and mutex functionality, but implemented only internally and not exposed to users. Since for its pinner thread wendelin.core v2 needs not only nogil channels, but also nogil mutexes, timers and so on, it is now time to start providing that. We start with mutexes:

- Expose Mutex from the insides of libgolang.cpp to the public API in libgolang.h. We actually expose both Mutex and Sema because semaphores are sometimes also useful for low-level synchronization, and because it is easier to export Mutex and Sema both at the same time.
- Provide pyx/nogil API to those as sync.Mutex and sync.Sema.
- Provide corresponding python wrappers.
- Add a Pyx-level test for semaphore wakeup when the wakeup is done not by the same thread which did the original acquire. This is the test that was promised in 5142460 (libgolang/thread: Add links to upstream PyThread_release_lock bug), and it used to corrupt memory and deadlock on macOS due to CPython & PyPy runtime bugs:
  https://bugs.python.org/issue38106 -> python/cpython#16047
  https://bitbucket.org/pypy/pypy/issues/3072

Note about python wrappers: at present, due to cython/cython#3165, a C-level panic is not properly translated into a Py-level exception in (Py)Sema/Mutex constructors. This should not pose a real problem in practice though.
For reference: a test for this issue is included in pygolang: https://lab.nexedi.com/kirr/pygolang/commit/34b7a1f4.
navytux commented Sep 12, 2019 (edited by bedevere-bot)
On Darwin, even though this is considered as POSIX, Python uses
mutex+condition variable to implement its lock, and, as of 20190828, the Py2.7
implementation, even though a similar issue was fixed for Py3 in 2012, contains
a synchronization bug: the condition is signalled after mutex unlock while the
correct protocol is to signal the condition from under the mutex:
https://github.com/python/cpython/blob/v2.7.16-127-g0229b56d8c0/Python/thread_pthread.h#L486-L506
187aa54 (py3 fix)
PyPy has the same bug for both pypy2 and pypy3:
https://bitbucket.org/pypy/pypy/src/578667b3fef9/rpython/translator/c/src/thread_pthread.c#lines-443:465
https://bitbucket.org/pypy/pypy/src/5b42890d48c3/rpython/translator/c/src/thread_pthread.c#lines-443:465
Signalling a condition outside of the corresponding mutex is considered OK by
POSIX, but in the Python context it can lead to at least memory corruption if we
consider the whole lifetime of a python-level lock. For example the following
logical scenario:

    T1                                          T2

    sema = Lock()
    sema.acquire()

                                                sema.release()

    sema.acquire()
    free(sema)

can translate to the next C-level calls:

    T1                                          T2

    # sema = Lock()
    sema = malloc(...)
    sema.locked = 0
    pthread_mutex_init(&sema.mut)
    pthread_cond_init (&sema.lock_released)

    # sema.acquire()
    pthread_mutex_lock(&sema.mut)
    # sees sema.locked == 0
    sema.locked = 1
    pthread_mutex_unlock(&sema.mut)

                                                # sema.release()
                                                pthread_mutex_lock(&sema.mut)
                                                sema.locked = 0
                                                pthread_mutex_unlock(&sema.mut)

                                                # OS scheduler gets in and relinquishes control
                                                # from T2 to another process
                                                ...

    # second sema.acquire()
    pthread_mutex_lock(&sema.mut)
    # sees sema.locked == 0
    sema.locked = 1
    pthread_mutex_unlock(&sema.mut)

    # free(sema)
    pthread_mutex_destroy(&sema.mut)
    pthread_cond_destroy (&sema.lock_released)
    free(sema)

    # ... e.g. malloc() which returns memory where sema was
    ...
                                                # OS scheduler returns control to T2
                                                # sema.release() continues
                                                #
                                                # BUT sema was already freed and writing to anywhere
                                                # inside sema block CORRUPTS MEMORY. In particular if
                                                # _another_ python-level lock was allocated where sema
                                                # block was, writing into the memory can have effect on
                                                # further synchronization correctness and in particular
                                                # lead to deadlock on lock that was next allocated.
                                                pthread_cond_signal(&sema.lock_released)
Note that T2.pthread_cond_signal(&sema.lock_released) CORRUPTS MEMORY as it
is called when sema memory was already freed and is potentially
reallocated for another object.
The fix is to move pthread_cond_signal to be done under the corresponding mutex:

    # sema.release()
    pthread_mutex_lock(&sema.mut)
    sema.locked = 0
    pthread_cond_signal(&sema.lock_released)
    pthread_mutex_unlock(&sema.mut)
To do so this patch cherry-picks the thread_pthread.h part of the following 3.2 commit:

    commit 187aa54
    Author: Kristján Valur Jónsson <kristjan@ccpgames.com>
    Date:   Tue Jun 5 22:17:42 2012 +0000

        Signal condition variables with the mutex held. Destroy condition variables
        before their mutexes.

     Python/ceval_gil.h      |  9 +++++----
     Python/thread_pthread.h | 15 +++++++++------
     2 files changed, 14 insertions(+), 10 deletions(-)

(ceval_gil.h is Python3-specific and does not apply to Python2.7)
Bug history
The bug was there since 1994 - since at least [1]. It was discussed in 2001
with the original code author[2], but the code was still considered to be
race-free. In 2010 the place where pthread_cond_signal should be - before or
after pthread_mutex_unlock - was discussed with the rationale to avoid
threads bouncing[3,4,5], and in 2012 pthread_cond_signal was moved to be
called from under the mutex, but only for CPython3[6,7].
In 2019 the bug was (re-)discovered while testing Pygolang[8] on macOS with
CPython2 and PyPy2 and PyPy3.
[1] 2c8cb9f
[2] https://bugs.python.org/issue433625
[3] https://bugs.python.org/issue8299#msg103224
[4] https://bugs.python.org/issue8410#msg103313
[5] https://bugs.python.org/issue8411#msg113301
[6] https://bugs.python.org/issue15038#msg163187
[7] 187aa54
[8] https://pypi.org/project/pygolang
https://bugs.python.org/issue38106