New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
bpo-42969: Hang non-main threads that attempt to acquire the GIL during finalization #28525
base: main
Are you sure you want to change the base?
Conversation
Hello, and thanks for your contribution! I'm a bot set up to make sure that the project can legally accept this contribution by verifying everyone involved has signed the PSF contributor agreement (CLA). CLA MissingOur records indicate the following people have not signed the CLA: For legal reasons we need all the people listed to sign the CLA before we can look at your contribution. Please follow the steps outlined in the CPython devguide to rectify this issue. If you have recently signed the CLA, please wait at least one business day You can check yourself to see if the CLA has been received. Thanks again for the contribution, we look forward to reviewing it! |
Can this possibly go into a point release? I think it would be unfortunate for this bug to remain until 3.11, even though presumably it has been around since forever. |
The API deprecation must say 3.11. but the bugfix itself could be backported. We should let this one bake on the main branch for a bit first. |
PS please do the CLA signing dance (see go/patching#cla internally, it's approved). |
I already signed the CLA earlier today, may take until tomorrow to register. |
I'm working on adding a mechanism that extension code can use to safely avoid threads hanging, and updating the documentation. I'll update this PR shortly once that is done. |
There are legit use cases for pthread_exit(). If PyThread_exit_thread() is deprecated, does it mean that developers using it should call directly |
I dislike this PR. When Python is embedded in an application, it means that Python leaks hang threads in some cases, whereas currently these threads exit once they attempt to acquire the GIL.
I get that there are some use cases where calling pthread_exit() is bad, but other cases, calling pthread_exit() sounds less bad than hang these threads.
Would it be possible to make it possible to choose the behavior, and keep the current behavior by default?
{ | ||
dprintf(("PyThread_pause_thread called\n")); | ||
while (1) { | ||
pause(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think that pause() is the best idea. Using signals with threads is fragile on Unix. Calling pause() in a thread can change the behavior of other threads (waiting for signals). Maybe running a very long sleep is a loop is better. Like select() with a sleep of 1 day, run in a loop (select can fail with EINTR if it's interrupted by a signal).
To not be annoyed by signals, you can use:
#if defined(HAVE_PTHREAD_SIGMASK) && !defined(HAVE_BROKEN_PTHREAD_SIGMASK)
sigset_t set;
/* we don't want to receive any signal */
sigfillset(&set);
pthread_sigmask(SIG_SETMASK, &set, NULL);
#endif
Code copied from faulthandler.c.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm assuming pause
is subject to the signal mask as well, so I don't think it would change the behavior of other threads --- signals not blocked by the existing signal mask can already be delivered even without calling pause
. It seems like continuing to run signal handlers would be harmless. What would the signal handler do that would be affected by Python finalizing? It already cannot acquire the GIL or other locks. I think blocking them would also probably be harmless, but that seems like potentially more interference with a thread that may not have been created by Python.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What would the signal handler do that would be affected by Python finalizing?
If an application embeds Python, it can continue to run and expect signals to continue working, after Python is finalized.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps just use:
sigset_t set; /* we don't want to receive any signal */
sigfillset(&set);
sigsuspend(&set);
Instead of pause() or select() or poll() based sleeps. It effectively combines the above pthread_sigmask with pause in one API call.
Python/thread_nt.h
Outdated
while (1) { | ||
GetMessage(&msg, NULL, 0, 0); | ||
TranslateMessage(&msg); | ||
DispatchMessage(&msg); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Won't this be a problem if a window procedure is implemented as a Python function (e.g. via PyWin32 or ctypes)?
Also, there's nothing to be done if it's not a GUI process/thread, i.e. when the thread doesn't even have a message queue. In this case it can just call Sleep(INFINITE)
.
To avoid loading "user32", which converts to a GUI process (something I'd like to avoid/delay in the future, but it's not currently the case), I suggest dynamically checking for a GUI thread as follows:
static BOOL
is_gui_thread()
{
HMODULE user32 = GetModuleHandleW(L"user32");
if (user32 != NULL) {
FARPROC pf = GetProcAddress(user32, "IsGUIThread");
if (pf != NULL) {
typedef BOOL (WINAPI *pfIsGUIThread)(BOOL);
return ((pfIsGUIThread)pf)(FALSE);
}
}
return FALSE;
}
For now, Python implicitly links with "user32", so the dynamic use of GetModuleHandleW()
and GetProcAddress()
doesn't matter. However, the IsGUIThread(FALSE)
call does matter. A thread doesn't get converted to a GUI thread until it uses any GUI related functions, such as creating a window. Currently the main thread is converted at startup during the static import of "user32". But initially new threads are not GUI threads.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I made the change as you suggest.
You have a good point regarding a window procedure implemented in Python --- I guess it would attempt to re-acquire the GIL, and we would have unbounded recursion back into the hang function, until we overflow the stack.
Ideally PyWin32 would be changed to use PyThread_TryAcquireFinalizeBlock
or PyGILState_TryAcquireFinalizeBlockAndGil
rather than PyGILState_Ensure
. But still we don't want to cause problems for existing code. I suppose one option could be to detect re-entering of _PyThread_hang_thread
and just Sleep in that case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I updated it to guard against re-entrant calls.
099dd01
to
8aab6c0
Compare
This PR becomes more and more problems. I'm not sure if it is will fix more problems than introducing more problems.
I don't understand why Python should now start handling Windows GUI messages. Why only Windows GUI messages? Do you expect that other operating systems will also need special code in "hang threads"?
@@ -306,4 +306,44 @@ PyCOND_BROADCAST(PyCOND_T *cv) | |||
|
|||
#endif /* _POSIX_THREADS, NT_THREADS */ | |||
|
|||
/* Convenience interfaces that terminate the program in case of an error. */ | |||
#define MUTEX_INIT(mut) \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This change is wrong. All names in the C API must be prefixed by "Py". Also, I would prefer to not add this API to the limited C API. At least, move it to the cpython/ directory to exclude it from the limited C API. Read Include/README.rst.
I would strongly recommend to move it to the internal C API: Include/internal/. There is already a file for that: Include/internal/pycore_ceval.h
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe this is not part of the public API at all, because it is in the Python/
directory not the Include/
directory. I just moved these macros from ceval_gil.h
to condvar.h
, because I needed them outside of ceval_gil.h
, and then added COND_BROADCAST
in the same style as the others.
@@ -1615,6 +1616,47 @@ PyGILState_Release(PyGILState_STATE oldstate) | |||
PyEval_SaveThread(); | |||
} | |||
|
|||
int | |||
PyThread_TryAcquireFinalizeBlock() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please keep this API internal.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am specifically intending to add this as a public API that may be used to safely attempt to acquire the GIL without the risk of hanging the thread (or if we continued to call pthread_exit
, crashing the program).
The intended use case is as follows:
Previously, you might code like the following:
// some non-Python created thread that wants to send Python an async notification
PyGILState_STATE state = PyGILState_Ensure(); // may kill thread with pthread_exit
// call `call_soon_threadsafe` on some event loop object
PyGILState_Release(state);
With this new API:
// some non-Python created thread that wants to send Python an async notification
int acquired = PyThread_TryAcquireFinalizeBlock();
if (!acquired) {
// skip sending notification since python is exiting
return;
}
PyGILState_STATE state = PyGILState_Ensure(); // safe now
// call `call_soon_threadsafe` on some event loop object
PyGILState_Release(state);
PyThread_ReleaseFinalizeBlock();
Or with the convenience interface:
// some non-Python created thread that wants to send Python an async notification
PyGILState_TRY_STATE state = PyGILState_TryAcquireFinalizeBlockAndGil();
if (!state) {
// skip sending notification since python is exiting
return;
}
// call `call_soon_threadsafe` on some event loop object
PyGILState_ReleaseGilAndFinalizeBlock(state);
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This write-up with examples actually makes some sense to be included in your "Cautions regarding runtime finalization" documentation section introducing the new APIs.
Why do we need both APIs. Why not only offer the "convenience" API as a public API?
Also realize that extension authors and code generators will likely need conditionally compiled code based on the Python version to juse the their existing PyGILState_Ensure and Release calls on <= 3.10 while calling the new convenience API on 3.11+. When introducing something new, including that in the example should help (though maybe that specific transitional example belongs in the What's New doc for 3.11)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In some sense PyThread_TryAcquireFinalizeBlock is logically separate from acquiring the GIL. Yes, it is hard to imagine why you would want to acquire a finalize block if you don't also want to acquire the GIL, but you may well release the GIL while holding the finalize block (and in fact if the GIL were guaranteed not to be released, then a separate finalize block wouldn't be needed at all). However, that is a minor point.
Another issue is that the documentation for PyGILState_Ensure
indicates it should not be used when using sub-interpreters. I don't really know much about how sub-interpreters work, but as this convenience API is based on PyGILState_Ensure
, it would have the same limitation.
There is the additional issue of what to do when called from the finalization thread. Currently these APIs will fail, since they don't make an exception for the finalization thread. But since you can safely acquire the GIL on the finalization thread, in some cases you might want to skip acquiring the finaliation block, but proceed with acquiring the GIL, if running on the finalization thread.
{ | ||
dprintf(("PyThread_pause_thread called\n")); | ||
while (1) { | ||
pause(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What would the signal handler do that would be affected by Python finalizing?
If an application embeds Python, it can continue to run and expect signals to continue working, after Python is finalized.
Python/thread_nt.h
Outdated
while (1) { | ||
if (GetMessage(&msg, NULL, 0, 0) == -1) return; | ||
TranslateMessage(&msg); | ||
DispatchMessage(&msg); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand why a thread must handle Windows messages. Currently, DispatchMessage() is not called anywhere in Python. I don't see why it should start calling it. What happens if you just run the Sleep(INFINITE) loop?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Regarding signals: suppose the application receives a non-blocked signal on a hung thread. I believe what will happen as I have this pull request currently, is that the signal handler is run as normal. Is that a problem?
I'm not an expert on Windows, but my first thought was the same as yours, to just call Sleep(INFINITE);
. But then I looked at the MSDN documentation for Sleep
, and it says that calling Sleep can cause deadlock if the thread receives messages. For that reason I added the message handling.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, I think Windows is somewhat special compared to other operating systems, in that it has "GUI"-specific things like message handling integrated deeply in the operating system.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If a thread owns windows, it needs to execute a message loop to get messages from its message queue and handle/dispatch them. Otherwise a call will hang that sends a message to a window or broadcasts a message to top-level windows, such as SendMessageW(HWND_BROADCAST, WM_, ...)
. Using SendMessageTimeoutW()
avoids hanging due to unresponsive threads.
When a thread exits via PyThread_exit_thread()
, any windows it owns are destroyed. That won't be the case if the code is changed to hang a thread that tries to acquire the GIL at exit. However, I don't know whether it's a problem that's worth addressing. If the process is exiting, the thread will terminate soon. We're talking about hanging a thread for a short time, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed, this is getting overly complex for a Windows situation we don't even have evidence of being a problem for this particular case.
These code paths are hanging a thread because the main thread has indicated the python interpreter is exiting. Ordinarily that suggests the process is exiting and everything goes away quickly. But in an embedded Python interpreter case it is technically possible for the application to just finalize the interpreter but not exit the process. I don't know how common that is. I expect not common at all. I would not worry about it. At least not for any first iteration on this. The norm for things embedding CPython is to only initialize one and only finalize when the process is finishing. Given that our interpreters are still not friendly to multiple initialize+finalize cycles (we've been improving that a lot, but are still not there yet, so I don't expect any Windows GUI application to depend on doing that).
I think a Sleep(INFINITE);
is simpler and more appropriate for now. With a comment suggesting we may need to do Windows even handling stuff if this ever presents itself as a problem for anyone.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have updated the PR to remove the windows message handling.
`PyGILState_ReleaseGilAndFinalizeBlock` with the same value returned by this | ||
function. Calling `PyGILState_ReleaseGilAndFinalizeBlock` with the error | ||
value `PyGILState_TRY_LOCK_FAILED` is safe and does nothing. */ | ||
PyAPI_FUNC(PyGILState_TRY_STATE) PyGILState_TryAcquireFinalizeBlockAndGil(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't see the purpose of this new API. It's not used in your PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See comment above. This provides a race-free way to avoid the thread hanging/exiting.
leaving a few more review comments here. i'll followup on the bug with larger picture PR-existential strategy comments.
{ | ||
dprintf(("PyThread_pause_thread called\n")); | ||
while (1) { | ||
pause(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps just use:
sigset_t set; /* we don't want to receive any signal */
sigfillset(&set);
sigsuspend(&set);
Instead of pause() or select() or poll() based sleeps. It effectively combines the above pthread_sigmask with pause in one API call.
@@ -1615,6 +1616,47 @@ PyGILState_Release(PyGILState_STATE oldstate) | |||
PyEval_SaveThread(); | |||
} | |||
|
|||
int | |||
PyThread_TryAcquireFinalizeBlock() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This write-up with examples actually makes some sense to be included in your "Cautions regarding runtime finalization" documentation section introducing the new APIs.
Why do we need both APIs. Why not only offer the "convenience" API as a public API?
Also realize that extension authors and code generators will likely need conditionally compiled code based on the Python version to juse the their existing PyGILState_Ensure and Release calls on <= 3.10 while calling the new convenience API on 3.11+. When introducing something new, including that in the example should help (though maybe that specific transitional example belongs in the What's New doc for 3.11)
Python/thread_nt.h
Outdated
while (1) { | ||
Sleep(INFINITE); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nothing can interrupt Sleep()
, so the loop isn't needed.
Alternatively, suspend the thread via SuspendThread(GetCurrentThread())
. The latter increments the thread suspend count by one, and ResumeThread(hThread)
decrements it by one. The thread doesn't execute again until the suspend count is decremented to 0.
Or call SleepEx(INFINITE, TRUE)
to wait in an alertable state. In this state, the thread will wake up to execute a queued asynchronous procedure call (APC) from QueueUserAPC()
, or from I/O completion due to a previous ReadFileEx()
or WriteFileEx()
call. A loop is necessary in this case because SleepEx()
returns with STATUS_USER_APC
(i.e. WAIT_IO_COMPLETION
) when an APC is executed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
I have updated the documentation to document the new functions, and changed to use I have not yet made the change suggested by @gpshead to block signals in the pthread version --- I can do that if necessary, but I am not sure what the benefit is. Signal handlers already can't safely attempt to acquire the GIL or use any Python APIs that require the GIL, and therefore I would assume there is no harm in letting them run. Other than that, what are the next steps for this pull request? |
This PR is stale because it has been open for 30 days with no activity. |
https://bugs.python.org/issue42969