Skip to content

bpo-46649: Propagate Python thread name to PyThreadState #31142

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed

Conversation

P403n1x87
Copy link
Contributor

@P403n1x87 P403n1x87 commented Feb 5, 2022

This change allows Python thread objects to propagate their name to the underlying PyThreadState structure to make it easier for tools that rely on the ABI, like sampling profilers, to retrieve this information and provide richer profiling data.

https://bugs.python.org/issue46649

@P403n1x87 P403n1x87 changed the title bpo-412569: Propagate Python thread name to PyThreadState bpo-46649: Propagate Python thread name to PyThreadState Feb 5, 2022
@@ -1551,6 +1551,40 @@ PyDoc_STRVAR(excepthook_doc,
\n\
Handle uncaught Thread.run() exception.");


static PyObject*
thread_set_name(PyObject *self, PyObject *args) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

 % ./python.exe
Python 3.11.0a5+ (heads/bpo-412569/propagate-thread-name:9f9acf49be, Feb  5 2022, 17:15:00) [Clang 13.0.0 (clang-1300.0.29.3)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import _thread
>>> _thread.set_name(123434, "x")
zsh: segmentation fault  ./python.exe

This function basically just writes to arbitrary memory, which is a security risk. I think the right fix would be to pass the thread state around as some sort of opaque object instead of a raw int.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. set_name is not meant to be called by anything else other than Thread, but it must be made safe nonetheless! Is there an established pattern for this sort of issue? I'm thinking of making an object, like _PyThreadState_Reference to pass from C to Python.

@P403n1x87 P403n1x87 force-pushed the bpo-412569/propagate-thread-name branch from 9f9acf4 to 0d3212f Compare February 7, 2022 13:02
Copy link
Member

@JelleZijlstra JelleZijlstra left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What happens if you hold on to a PyTStateRef object after the corresponding thread has ended? I wasn't able to get a segfault this way, but it looks like I'd be writing into freed memory.

/* struct _is is defined in internal/pycore_interp.h */
typedef struct _is PyInterpreterState;

PyAPI_DATA(PyTypeObject) PyTStateRef_Type;

#define PyTStateRef_Check(op) Py_IS_TYPE(op, &PyTStateRef_Type)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think any of this needs to go in a public header file.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've put this here so that PyTStateRef_Check(op) could be used in _testcapi. But given how simple the definition is I could get rid of it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@JelleZijlstra I can't get this to work if I remove the PyAPI_DATA export. Any idea where I could put that to avoid affecting the stable API? I think one thing to consider, though, is that the new PyTStateRef objects are surfaced to Python.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not too experienced with the C API, but I am pretty confident we don't want to expose this class in the C API without a strong reason. Generally we're trying to restrict the C API so that it is easier to evolve CPython. For example, changes to the PyThreadState have caused trouble several times. https://devguide.python.org/c-api/ says internal symbols should ideally be in Include/internal.

https://docs.python.org/3/c-api/init.html#c.PyThreadState also says that only the "interp" member of this struct is public, so putting it here isn't enough to allow well-behaved external users of the API to access the thread name. I think you should prefix the new struct members with underscores and add a new, documented and public function PyThreadState_GetName() for extensions to access the thread name.

@P403n1x87
Copy link
Contributor Author

What happens if you hold on to a PyTStateRef object after the corresponding thread has ended? I wasn't able to get a segfault this way, but it looks like I'd be writing into freed memory.

Hmm yes I think it's still possible to get a segfault this way ☹️. One solution that I can think of is to cache the object on the PyThreadState instance, but it doesn't feel like a very clean approach to me 🤔

@P403n1x87 P403n1x87 force-pushed the bpo-412569/propagate-thread-name branch from 0d3212f to 96bb7f6 Compare February 12, 2022 12:06
This change allows Python thread objects to propagate their name
to the underlying PyThreadState structure to make it easier for
tools that rely on the ABI, like sampling profilers, to retrieve
this information and provide reacher profiling data.
@P403n1x87 P403n1x87 force-pushed the bpo-412569/propagate-thread-name branch from 96bb7f6 to 3c6192d Compare February 12, 2022 12:10
Copy link
Member

@JelleZijlstra JelleZijlstra left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! Here's a few more comments.

/* struct _is is defined in internal/pycore_interp.h */
typedef struct _is PyInterpreterState;

PyAPI_DATA(PyTypeObject) PyTStateRef_Type;

#define PyTStateRef_Check(op) Py_IS_TYPE(op, &PyTStateRef_Type)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not too experienced with the C API, but I am pretty confident we don't want to expose this class in the C API without a strong reason. Generally we're trying to restrict the C API so that it is easier to evolve CPython. For example, changes to the PyThreadState have caused trouble several times. https://devguide.python.org/c-api/ says internal symbols should ideally be in Include/internal.

https://docs.python.org/3/c-api/init.html#c.PyThreadState also says that only the "interp" member of this struct is public, so putting it here isn't enough to allow well-behaved external users of the API to access the thread name. I think you should prefix the new struct members with underscores and add a new, documented and public function PyThreadState_GetName() for extensions to access the thread name.

@@ -783,7 +783,7 @@ init_threadstate(PyThreadState *tstate,
tstate->datastack_chunk = NULL;
tstate->datastack_top = NULL;
tstate->datastack_limit = NULL;

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

unnecessary change

@@ -1034,6 +1034,11 @@ PyThreadState_Clear(PyThreadState *tstate)
Py_CLEAR(tstate->async_gen_finalizer);

Py_CLEAR(tstate->context);

if (tstate->ref != NULL)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure how PyThreadState interacts with the garbage collector. How will the GC know that the thread state ref object is alive?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The ref object has a life on its own. The PyThreadState creates the ref object. Once the thread state is deallocated, it sets the actual ref inside the ref object to NULL and decrements its refcount. The ref object then lives on until there are references to it. Otherwise, it is GC'd at the next collection cycle.

Copy link
Member

@pablogsal pablogsal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately, I am not convinced this is a good idea. We are moving more and more to a private thread state, and we want to be able to modify its ABI, fields and ordering even between patch releases so I am not comfortable assuring that even more fields can be retrieved via a constant offset from the thread state. Also, adding Python objects to the thread state is possible, but makes destruction more tricky (for example, destroying Python objects must be done with the GIL held, but when the thread state is being teared down from an extension module the GIL may be unlocked).

Additionally, I think this PR is adding quite a lot of extra complexity for very restricted gains.

I am not strongly against this, but unless there is strong consensus that this is clearly a win, I am not confident with this.

@bedevere-bot
Copy link

A Python core developer has requested some changes be made to your pull request before we can consider merging it. If you could please address their requests along with any other requests in other reviews from core developers that would be appreciated.

Once you have made the requested changes, please leave a comment on this pull request containing the phrase I have made the requested changes; please review again. I will then notify any core developers who have left a review that you're ready for them to take another look at this pull request.

@iritkatriel
Copy link
Member

Closing since there were objections to the change and this has been idle for over a year since then.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants