-
-
Notifications
You must be signed in to change notification settings - Fork 32.2k
bpo-46649: Propagate Python thread name to PyThreadState #31142
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
@@ -1551,6 +1551,40 @@ PyDoc_STRVAR(excepthook_doc, | |||
\n\ | |||
Handle uncaught Thread.run() exception."); | |||
|
|||
|
|||
static PyObject* | |||
thread_set_name(PyObject *self, PyObject *args) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
% ./python.exe
Python 3.11.0a5+ (heads/bpo-412569/propagate-thread-name:9f9acf49be, Feb 5 2022, 17:15:00) [Clang 13.0.0 (clang-1300.0.29.3)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import _thread
>>> _thread.set_name(123434, "x")
zsh: segmentation fault ./python.exe
This function basically just writes to arbitrary memory, which is a security risk. I think the right fix would be to pass the thread state around as some sort of opaque object instead of a raw int.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point. set_name
is not meant to be called by anything else other than Thread
, but it must be made safe nonetheless! Is there an established pattern for this sort of issue? I'm thinking of making an object, like _PyThreadState_Reference
to pass from C to Python.
9f9acf4
to
0d3212f
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What happens if you hold on to a PyTStateRef
object after the corresponding thread has ended? I wasn't able to get a segfault this way, but it looks like I'd be writing into freed memory.
/* struct _is is defined in internal/pycore_interp.h */ | ||
typedef struct _is PyInterpreterState; | ||
|
||
PyAPI_DATA(PyTypeObject) PyTStateRef_Type; | ||
|
||
#define PyTStateRef_Check(op) Py_IS_TYPE(op, &PyTStateRef_Type) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think any of this needs to go in a public header file.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've put this here so that PyTStateRef_Check(op)
could be used in _testcapi
. But given how simple the definition is I could get rid of it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@JelleZijlstra I can't get this to work if I remove the PyAPI_DATA
export. Any idea where I could put that to avoid affecting the stable API? I think one thing to consider, though, is that the new PyTStateRef
objects are surfaced to Python.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not too experienced with the C API, but I am pretty confident we don't want to expose this class in the C API without a strong reason. Generally we're trying to restrict the C API so that it is easier to evolve CPython. For example, changes to the PyThreadState have caused trouble several times. https://devguide.python.org/c-api/ says internal symbols should ideally be in Include/internal.
https://docs.python.org/3/c-api/init.html#c.PyThreadState also says that only the "interp" member of this struct is public, so putting it here isn't enough to allow well-behaved external users of the API to access the thread name. I think you should prefix the new struct members with underscores and add a new, documented and public function PyThreadState_GetName()
for extensions to access the thread name.
Hmm yes I think it's still possible to get a segfault this way |
0d3212f
to
96bb7f6
Compare
This change allows Python thread objects to propagate their name to the underlying PyThreadState structure to make it easier for tools that rely on the ABI, like sampling profilers, to retrieve this information and provide reacher profiling data.
96bb7f6
to
3c6192d
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! Here's a few more comments.
/* struct _is is defined in internal/pycore_interp.h */ | ||
typedef struct _is PyInterpreterState; | ||
|
||
PyAPI_DATA(PyTypeObject) PyTStateRef_Type; | ||
|
||
#define PyTStateRef_Check(op) Py_IS_TYPE(op, &PyTStateRef_Type) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not too experienced with the C API, but I am pretty confident we don't want to expose this class in the C API without a strong reason. Generally we're trying to restrict the C API so that it is easier to evolve CPython. For example, changes to the PyThreadState have caused trouble several times. https://devguide.python.org/c-api/ says internal symbols should ideally be in Include/internal.
https://docs.python.org/3/c-api/init.html#c.PyThreadState also says that only the "interp" member of this struct is public, so putting it here isn't enough to allow well-behaved external users of the API to access the thread name. I think you should prefix the new struct members with underscores and add a new, documented and public function PyThreadState_GetName()
for extensions to access the thread name.
@@ -783,7 +783,7 @@ init_threadstate(PyThreadState *tstate, | |||
tstate->datastack_chunk = NULL; | |||
tstate->datastack_top = NULL; | |||
tstate->datastack_limit = NULL; | |||
|
|||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
unnecessary change
@@ -1034,6 +1034,11 @@ PyThreadState_Clear(PyThreadState *tstate) | |||
Py_CLEAR(tstate->async_gen_finalizer); | |||
|
|||
Py_CLEAR(tstate->context); | |||
|
|||
if (tstate->ref != NULL) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure how PyThreadState interacts with the garbage collector. How will the GC know that the thread state ref object is alive?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The ref object has a life on its own. The PyThreadState creates the ref object. Once the thread state is deallocated, it sets the actual ref inside the ref object to NULL and decrements its refcount. The ref object then lives on until there are references to it. Otherwise, it is GC'd at the next collection cycle.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unfortunately, I am not convinced this is a good idea. We are moving more and more to a private thread state, and we want to be able to modify its ABI, fields and ordering even between patch releases so I am not comfortable assuring that even more fields can be retrieved via a constant offset from the thread state. Also, adding Python objects to the thread state is possible, but makes destruction more tricky (for example, destroying Python objects must be done with the GIL held, but when the thread state is being teared down from an extension module the GIL may be unlocked).
Additionally, I think this PR is adding quite a lot of extra complexity for very restricted gains.
I am not strongly against this, but unless there is strong consensus that this is clearly a win, I am not confident with this.
A Python core developer has requested some changes be made to your pull request before we can consider merging it. If you could please address their requests along with any other requests in other reviews from core developers that would be appreciated. Once you have made the requested changes, please leave a comment on this pull request containing the phrase |
Closing since there were objections to the change and this has been idle for over a year since then. |
This change allows Python thread objects to propagate their name to the underlying PyThreadState structure to make it easier for tools that rely on the ABI, like sampling profilers, to retrieve this information and provide richer profiling data.
https://bugs.python.org/issue46649