bpo-40522: Store tstate in a Thread Local Storage #23976

Open: wants to merge 1 commit into base: master

@vstinner vstinner commented Dec 28, 2020

If Python is built with GCC or clang, the current interpreter and the
current Python thread state are now stored in thread-local storage (TLS).

Changes:

  • GCC and clang use the __thread keyword to declare the TLS
    variables.
  • Add set_current_tstate() sub-function which sets these two new TLS
    variables (if available).
  • _PyThreadState_Swap() and _PyThreadState_DeleteCurrent() now call
    set_current_tstate().
  • _PyThreadState_GET() and _PyInterpreterState_GET() now use the TLS
    variable if available.
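
The changes listed above can be sketched roughly as follows. This is a minimal illustration, not the code from the PR: the types `interp_state` and `thread_state`, the variables `current_tstate` and `current_interp`, and the helpers `swap_tstate()` and `demo_tls()` are hypothetical stand-ins, and only the name `set_current_tstate()` comes from the description. It assumes a compiler that accepts the `__thread` extension (GCC or clang).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for PyInterpreterState and PyThreadState. */
typedef struct { int id; } interp_state;
typedef struct { interp_state *interp; } thread_state;

/* GCC/clang extension: every thread gets its own copy of these variables. */
static __thread thread_state *current_tstate = NULL;
static __thread interp_state *current_interp = NULL;

/* Update both TLS variables together so they never disagree. */
static void set_current_tstate(thread_state *tstate)
{
    current_tstate = tstate;
    current_interp = (tstate != NULL) ? tstate->interp : NULL;
}

/* Simplified swap: install a new thread state and return the previous one. */
static thread_state *swap_tstate(thread_state *new_tstate)
{
    thread_state *old = current_tstate;
    set_current_tstate(new_tstate);
    return old;
}

/* Single-threaded walk-through; returns 0 when every step behaves as expected. */
static int demo_tls(void)
{
    interp_state interp = {1};
    thread_state a = {&interp}, b = {&interp};

    thread_state *old = swap_tstate(&a);   /* nothing was set before */
    assert(old == NULL);
    assert(current_tstate == &a && current_interp == &interp);

    old = swap_tstate(&b);                 /* previous state is handed back */
    assert(old == &a);

    old = swap_tstate(NULL);               /* clearing resets both variables */
    assert(old == &b);
    assert(current_tstate == NULL && current_interp == NULL);
    return 0;
}
```

Reading the current state is then a plain load of a thread-local variable, which is the point of the change. The walk-through runs in one thread only; in a multithreaded program each thread would see its own independent pair of variables.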

https://bugs.python.org/issue40522

@vstinner vstinner commented Dec 28, 2020

Build fails on macOS with Clang 12.0.0 (clang-1200.0.32.27): "illegal thread local variable reference"

building 'math' extension
gcc -Wno-unused-result -Wsign-compare -g -O0 -Wall -std=c99 -Wextra -Wno-unused-result -Wno-unused-parameter -Wno-missing-field-initializers -Wstrict-prototypes -Werror=implicit-function-declaration -fvisibility=hidden -I./Include/internal -I./Include -I. -I/usr/local/include -I/Users/runner/work/cpython/cpython/Include -I/Users/runner/work/cpython/cpython -c /Users/runner/work/cpython/cpython/Modules/mathmodule.c -o build/temp.macosx-10.15-x86_64-3.10-pydebug/Users/runner/work/cpython/cpython/Modules/mathmodule.o -DPy_BUILD_CORE_MODULE
gcc -bundle -undefined dynamic_lookup build/temp.macosx-10.15-x86_64-3.10-pydebug/Users/runner/work/cpython/cpython/Modules/mathmodule.o Modules/_math.o -L/usr/local/lib -lm -o build/lib.macosx-10.15-x86_64-3.10-pydebug/math.cpython-310d-darwin.so

ld: illegal thread local variable reference to regular symbol __Py_current_tstate for architecture x86_64
clang: error: linker command failed with exit code 1 (use -v to see invocation)

@vstinner vstinner commented Dec 28, 2020

On MSC, we can try to use __declspec(thread). I tried it, but I got an error about DLL export. Maybe the variables should not be exported, in which case they should not be used when building an extension module (that is, a module not built as a builtin module).

@vstinner vstinner commented Dec 28, 2020

cc @markshannon who loves TLS :-)

@vstinner vstinner commented Dec 28, 2020

See https://bugs.python.org/issue40522#msg383899 for the emitted assembly code.

@vstinner vstinner commented Dec 28, 2020

"In C11, the keyword _Thread_local is used to define thread-local variables. The header <threads.h>, if supported, defines thread_local as a synonym for that keyword."
https://en.wikipedia.org/wiki/Thread-local_storage#C.2B.2B

If Python is built with GCC or clang, the current interpreter and the
current Python thread state are now stored in thread-local storage (TLS).

Changes:

* configure checks for C11 _Thread_local keyword.
* Use the _Thread_local keyword or the GCC and clang __thread extension.
* Add set_current_tstate() sub-function which sets these two new TLS
  variables (if available).
* _PyThreadState_Swap() and _PyThreadState_DeleteCurrent() now call
  set_current_tstate().
* _PyThreadState_GET() and _PyInterpreterState_GET() now use the TLS
  variable if available.
@vstinner vstinner force-pushed the vstinner:thread_tstate branch from 10252c2 to fd097fa Dec 28, 2020

@markshannon markshannon commented Jan 4, 2021

Would it be possible to keep all the portability macros in one place by putting something like #define Py_ThreadLocal(type, varname) ... in pyport.h?
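
One possible shape for such a macro is sketched below. Only the name `Py_ThreadLocal(type, varname)` comes from the comment above. The preprocessor conditions, the `HAVE_PY_THREAD_LOCAL` flag, and the `call_count` and `bump_call_count()` names are assumptions made for illustration. The PR itself detects `_Thread_local` through configure rather than through `__STDC_VERSION__`.

```c
#include <assert.h>

/* Hypothetical sketch: pick whichever thread-local spelling the compiler offers. */
#if defined(__STDC_VERSION__) && __STDC_VERSION__ >= 201112L
   /* C11 keyword */
#  define Py_ThreadLocal(type, varname) _Thread_local type varname
#  define HAVE_PY_THREAD_LOCAL 1
#elif defined(__GNUC__) || defined(__clang__)
   /* GCC and clang extension */
#  define Py_ThreadLocal(type, varname) __thread type varname
#  define HAVE_PY_THREAD_LOCAL 1
#elif defined(_MSC_VER)
   /* Microsoft C compiler */
#  define Py_ThreadLocal(type, varname) __declspec(thread) type varname
#  define HAVE_PY_THREAD_LOCAL 1
#endif
/* With no match, the macro stays undefined and callers keep the non-TLS path. */

#ifdef HAVE_PY_THREAD_LOCAL
/* Example declaration: one counter per thread, zero-initialized. */
static Py_ThreadLocal(int, call_count);

/* Increment and return the calling thread's counter. */
static int bump_call_count(void)
{
    call_count += 1;
    return call_count;
}
#endif
```

Leaving the macro undefined when no spelling matches keeps the "if available" behaviour from the PR description: code that reads the variables can test the flag and fall back to the existing lookup.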

@markshannon markshannon commented Jan 4, 2021

One other remark (not for this PR, but for future work):
The HotSpot JVM uses aligned stacks to be able to access its thread-local information as fast as possible by zeroing the low bits of the machine stack pointer.
It is possible to get very close to this in (mostly) portable C: https://godbolt.org/z/eG64zz
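
The masking idea can be illustrated with a small sketch. This is not the code behind the godbolt link. It is a simplified illustration in which a heap block aligned to its own size stands in for a thread stack, and `REGION_SIZE`, `thread_info`, `info_from_address()` and `demo_aligned_lookup()` are hypothetical names. It assumes C11 `aligned_alloc` and a flat address space where pointer-to-integer conversion preserves the low bits, which is why the technique is only mostly portable.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical region size: a power of two, used as both size and alignment. */
#define REGION_SIZE ((uintptr_t)1 << 16)

/* Hypothetical per-thread data stored at the base of the aligned region. */
typedef struct { int thread_id; } thread_info;

/* Recover the region base from any address inside it by clearing the low bits. */
static thread_info *info_from_address(const void *addr)
{
    return (thread_info *)((uintptr_t)addr & ~(REGION_SIZE - 1));
}

/* Returns 0 on success, -1 if the allocation fails, 1 if the lookup is wrong. */
static int demo_aligned_lookup(void)
{
    /* The mask only works when the size is a power of two. */
    assert((REGION_SIZE & (REGION_SIZE - 1)) == 0);

    /* aligned_alloc requires the size to be a multiple of the alignment. */
    unsigned char *region = aligned_alloc(REGION_SIZE, REGION_SIZE);
    if (region == NULL) {
        return -1;
    }

    thread_info *info = (thread_info *)region;
    info->thread_id = 42;

    /* Any address within the region maps back to the same thread_info. */
    const void *somewhere = region + 12345;
    thread_info *found = info_from_address(somewhere);
    int ok = (found == info) && (found->thread_id == 42);

    free(region);
    return ok ? 0 : 1;
}
```

With a real thread, the address passed in would be that of a local variable on a stack allocated with the same alignment, so the lookup costs one mask operation and no TLS access.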

@vstinner vstinner commented Jan 4, 2021

I need to test the clang -femulated-tls flag. See also: http://llvm.org/docs/LangRef.html#thread-local-storage-models

@github-actions github-actions bot commented Feb 13, 2021

This PR is stale because it has been open for 30 days with no activity.

@github-actions github-actions bot added the stale label Feb 13, 2021