libpython3.so doesn't contain any symbols (stable ABI), cannot link to it. #104612

Open
beeender opened this issue May 18, 2023 · 14 comments
Labels
build The build process and cross-build topic-C-API type-bug An unexpected behavior, bug, or error

Comments

@beeender

beeender commented May 18, 2023

Bug report

When experimenting with the Python 3 stable ABI using the following code:

#define Py_LIMITED_API
#include <Python.h>
#include <stdio.h>

int main(int argc, char** argv) {
    char* ver = Py_GetVersion();
    printf("Py version: %s", ver);
    return 0;
}

Linking with libpython3.11.so works:

~/tmp/py_stable_abi via C v13.1.1-gcc
❯ gcc main.c -I/usr/include/python3.11 -lpython3.11  -L/usr/lib
main.c: In function ‘main’:
main.c:6:17: warning: initialization discards ‘const’ qualifier from pointer target type [-Wdiscarded-qualifiers]
    6 |     char* ver = Py_GetVersion();
      |                 ^~~~~~~~~~~~~

~/tmp/py_stable_abi via C v13.1.1-gcc
❯ ./a.out
Py version: 3.11.3 (main, Apr  5 2023, 15:52:25) [GCC 12.2.1 20230201]%

Linking with libpython3.so doesn't work:

~/tmp/py_stable_abi via C v13.1.1-gcc
❮ gcc main.c -I/usr/include/python3.11 -lpython3  -L/usr/lib
main.c: In function ‘main’:
main.c:6:17: warning: initialization discards ‘const’ qualifier from pointer target type [-Wdiscarded-qualifiers]
    6 |     char* ver = Py_GetVersion();
      |                 ^~~~~~~~~~~~~
/usr/bin/ld: /tmp/ccfEEFx2.o: in function `main':
main.c:(.text+0x10): undefined reference to `Py_GetVersion'
collect2: error: ld returned 1 exit status

In fact, libpython3.so doesn't contain any meaningful symbols, and its size is suspiciously small:

/lib🔒
❯ ls libpython* -l
lrwxrwxrwx 1 root root      19 Oct 11  2021 libpython2.7.so -> libpython2.7.so.1.0
-r-xr-xr-x 1 root root 6935488 Oct 11  2021 libpython2.7.so.1.0
lrwxrwxrwx 1 root root      20 Apr  5 23:52 libpython3.11.so -> libpython3.11.so.1.0
-rwxr-xr-x 1 root root 5866336 Apr  5 23:52 libpython3.11.so.1.0
lrwxrwxrwx 1 root root      20 Nov  5  2021 libpython3.7m.so -> libpython3.7m.so.1.0
-rwxr-xr-x 1 root root 3088608 Nov  5  2021 libpython3.7m.so.1.0
lrwxrwxrwx 1 root root      19 May 25  2022 libpython3.9.so -> libpython3.9.so.1.0
-rwxr-xr-x 1 root root 3765312 May 25  2022 libpython3.9.so.1.0
-rwxr-xr-x 1 root root   13816 Apr  5 23:52 libpython3.so
❯ nm -gD libpython3.so
                 w __cxa_finalize
                 w __gmon_start__
                 w _ITM_deregisterTMCloneTable
                 w _ITM_registerTMCloneTable

The command that builds libpython3.so looks strange. I am not sure what it is trying to do:

cpython on  main [?] via C v13.1.1-gcc via 🐍 v3.11.3
❯ ./configure --enable-shared
gcc -shared     -Wl,--no-as-needed -o libpython3.so -Wl,-hlibpython3.so libpython3.12.so

Does the above really do anything?

From https://peps.python.org/pep-0384/

On Unix systems, the ABI is typically provided by the python executable itself. PyModule_Create is changed to pass 3 as the API version if the extension module was compiled with Py_LIMITED_API; the version check for the API version will accept either 3 or the current PYTHON_API_VERSION as conforming. If Python is compiled as a shared library, it is installed as both libpython3.so, and libpython3.y.so; applications conforming to this PEP should then link to the former (extension modules can continue to link with no libpython shared object, but rather rely on runtime linking). The ABI version is symbolically available as PYTHON_ABI_VERSION.

The current libpython3.so is clearly not doing the right thing. Applications CANNOT link to it.

Your environment

 OS: Arch Linux
 Kernel: x86_64 Linux 6.3.1-zen2-1-zen

I checked Python 3.9/3.10/3.11 from Arch's official repo; all of them have this problem.
I also tried building CPython from source, with the same result.

@beeender beeender added the type-bug An unexpected behavior, bug, or error label May 18, 2023
@sunmy2019
Member

sunmy2019 commented May 19, 2023

This is indeed a bug now.

gcc -shared     -Wl,--no-as-needed -o libpython3.so -Wl,-hlibpython3.so libpython3.12.so

This was introduced 12 years ago. My guess is that the toolchain has evolved since then: linkers nowadays do not resolve symbols from the dependencies of a shared library.

I need to consult some linker experts.

Until we figure out a solution, link to libpython3.12.so instead.


Updated:
We need a shared library with the same symbols as the libpython3.12.so, but with a different soname libpython3.so.

@sunmy2019 sunmy2019 added the build The build process and cross-build label May 19, 2023
@beeender
Author

Can't we just copy libpython3.12.so to libpython3.so? Unless there are some differences between them by design.

@sunmy2019
Member

> Unless there are some differences

The soname is different.
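
To see that difference on a given system, one can read the DT_SONAME entry of each file (the value `readelf -d` reports as SONAME). The following is only a minimal sketch: it assumes a 64-bit ELF file in host byte order whose section headers are present, and `read_at` and `read_soname` are hypothetical helper names, not part of any library.

```c
#include <elf.h>
#include <stdio.h>
#include <string.h>

/* Read `size` bytes at absolute file offset `offset`; returns 1 on success. */
static int read_at(FILE *f, unsigned long long offset, void *buf, size_t size)
{
    return fseek(f, (long)offset, SEEK_SET) == 0 &&
           fread(buf, 1, size, f) == size;
}

/* Copy the DT_SONAME string of the ELF file at `path` into `out`.
 * Returns 1 if a soname was found, 0 if the file has none, -1 on error.
 * Assumption: 64-bit ELF, host byte order, section headers not stripped. */
static int read_soname(const char *path, char *out, size_t outlen)
{
    Elf64_Ehdr eh;
    int result = -1;
    FILE *f = fopen(path, "rb");

    if (f == NULL) {
        return -1;
    }
    if (outlen == 0 || !read_at(f, 0, &eh, sizeof eh) ||
        memcmp(eh.e_ident, ELFMAG, SELFMAG) != 0 ||
        eh.e_ident[EI_CLASS] != ELFCLASS64) {
        goto done;
    }
    for (unsigned i = 0; i < eh.e_shnum; i++) {
        Elf64_Shdr dyn, str;
        if (!read_at(f, eh.e_shoff + (unsigned long long)i * eh.e_shentsize,
                     &dyn, sizeof dyn)) {
            goto done;
        }
        if (dyn.sh_type != SHT_DYNAMIC) {
            continue;
        }
        /* sh_link of the dynamic section is the index of its string table. */
        if (!read_at(f, eh.e_shoff +
                            (unsigned long long)dyn.sh_link * eh.e_shentsize,
                     &str, sizeof str)) {
            goto done;
        }
        for (Elf64_Xword off = 0; off + sizeof(Elf64_Dyn) <= dyn.sh_size;
             off += sizeof(Elf64_Dyn)) {
            Elf64_Dyn entry;
            if (!read_at(f, dyn.sh_offset + off, &entry, sizeof entry)) {
                goto done;
            }
            if (entry.d_tag == DT_NULL) {
                break;
            }
            if (entry.d_tag == DT_SONAME) {
                /* fgets may read past the terminating NUL, but `out` is
                 * still a C string that ends at that NUL. */
                if (fseek(f, (long)(str.sh_offset + entry.d_un.d_val),
                          SEEK_SET) == 0 &&
                    fgets(out, (int)outlen, f) != NULL) {
                    result = 1;
                }
                goto done;
            }
        }
        result = 0;   /* dynamic section present, but no DT_SONAME */
        goto done;
    }
    result = 0;       /* no dynamic section at all */
done:
    fclose(f);
    return result;
}
```

Calling `read_soname` on libpython3.so and on libpython3.12.so and comparing the two strings shows which name the linker would record in a binary's DT_NEEDED entries when linking against each file.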

@itamaro
Contributor

itamaro commented May 21, 2023

Did this ever work?

I tried playing with this a bit, and can confirm the report is reproducible (with a fresh build from main branch, as well as with a copy of python 3.6 I happened to have around), using gcc 8.5 on Red Hat Linux.

With a certain combination of flags and setting LD_LIBRARY_PATH for the gcc invocation itself, it looked like the linker was at least aware of the existence of the linked libpython, but it refused to add symbols from it if it's not explicitly included in the command line:

/usr/bin/ld: /tmp/cc9r3PUU.o: undefined reference to symbol 'Py_GetVersion'
./libpython3.12.so.1.0: error adding symbols: DSO missing from command line
collect2: error: ld returned 1 exit status

Some things that "worked":

Include both python3 and python3.12: gcc -I. -L. main.c -lpython3 -lpython3.12

This results in both .so files being listed in the DT_NEEDED entries of the binary:

readelf -d a.out | grep NEEDED
 0x0000000000000001 (NEEDED)             Shared library: [libpython3.so]
 0x0000000000000001 (NEEDED)             Shared library: [libpython3.12.so.1.0]
 0x0000000000000001 (NEEDED)             Shared library: [libc.so.6]

and if I patchelf the binary to drop libpython3.12.so, it still works at runtime because of the transitive dependency

patchelf --remove-needed libpython3.12.so.1.0 a.out
LD_LIBRARY_PATH=. ./a.out
Py version: 3.12.0a7+ ...

but if we're going through that trouble, then this would be equivalent and arguably cleaner:

# link only against the "real" DSO
gcc -I. -L. main.c -lpython3.12
# replace the DT NEEDED entry to use the ABI DSO
patchelf --replace-needed libpython3.12.so.1.0 libpython3.so a.out
LD_LIBRARY_PATH=. ./a.out
Py version: 3.12.0a7+ ...

Potentially the simplest workaround is to delete the "useless" libpython3.so and replace it with a symlink:

rm libpython3.so
ln -s libpython3.12.so libpython3.so
gcc -I. -L. main.c -lpython3

This still leaves us with a binary that specifies the 3.12 DSO, because that's the soname we link against

readelf -d a.out | grep NEEDED
 0x0000000000000001 (NEEDED)             Shared library: [libpython3.12.so.1.0]
 0x0000000000000001 (NEEDED)             Shared library: [libc.so.6]

which brings us back to patchelf'ing it to use the ABI DSO

patchelf --replace-needed libpython3.12.so.1.0 libpython3.so a.out
LD_LIBRARY_PATH=. ./a.out
Py version: 3.12.0a7+ ...

I think to actually fix it, we need libpython3.so to be a copy of libpython3.12.so with the correct soname, e.g.:

rm libpython3.so
cp libpython3.12.so libpython3.so
patchelf --set-soname libpython3.so libpython3.so
gcc -I. -L. main.c -lpython3
readelf -d a.out | grep NEEDED
 0x0000000000000001 (NEEDED)             Shared library: [libpython3.so]
 0x0000000000000001 (NEEDED)             Shared library: [libc.so.6]
LD_LIBRARY_PATH=. ./a.out
Py version: 3.12.0a7+ ...

@encukou
Member

encukou commented May 22, 2023

The .so is useless, IMO we should delete it without replacement.

It's not needed for extensions (importable modules), since those normally shouldn't link to libpython -- they're loaded by Python, which already has the correct libpython loaded. (If you link to libpython yourself, you'll e.g. get the non-debug version for debug builds, and you definitely don't want that.)

For embedding, you can build & distribute Python with the embedding application (and then you don't need stable ABI), or try finding a “system libpython” to link to (which is tricky, but mostly possible -- but you'll end up with the versioned .so anyway).

We can definitely improve the situation for both cases, but I don't think copying or symlinking the current .so is worth it. (It should have been done -- and tested -- 14 years ago with PEP-384, but I see no point doing it now just to comply with that PEP.)

IMO, a good step would be to actually limit the set of symbols libpython3.so exports, like what's done on Windows. That would help people from accidentally using functions outside the stable ABI. But I don't know if it's actually possible with .so.

@beeender
Author

beeender commented May 22, 2023

> The .so is useless, IMO we should delete it without replacement.

The stable ABI is still useful IMO. Take PostgreSQL's plpython for example: the extension itself doesn't need a newer version of Python at compile time. But on RHEL 8/Rocky 8, the plpython package is stuck with the system's Python 3.6. To use a newer version of Python, the extension has to be recompiled. That would not be necessary if libpython3.so and the stable ABI did what they are supposed to do: the extension could be built against an older version of Python and still use a newer Python, with its modern libraries, at runtime.

@Yhg1s
Member

Yhg1s commented May 22, 2023

> IMO, a good step would be to actually limit the set of symbols libpython3.so exports, like what's done on Windows. That would help people from accidentally using functions outside the stable ABI. But I don't know if it's actually possible with .so.

It is possible, but I don't think it would be useful. If the python binary is built with --enable-shared, it can't be linked to libpython3.so and has to use libpython3.XX.so. Any extension module loaded by the python binary would have access to symbols loaded from the regular libpython3.XX.so even if it's linked to libpython3.so. If the python binary is statically linked with libpython (which e.g. the debian python build does even while supplying libpython.so), the symbols come from the binary itself. We could hide those symbols but only if all extension modules use the Stable ABI.

A possible alternative would be to use link-map lists (a GNU extension; see dlmopen()) to load each extension module in a separate namespace. However, populating the new namespace with the right symbols would be a bit fragile (it's easy to make a mistake and end up using initialised state from the wrong .so), and I expect it would break the expectations of a number of extension modules. I think it would be a lot easier to verify Stable ABI compliance by inspecting the .so files and the symbols it would load from libpython.

(This story is a little different on macOS, btw, which is kinda halfway between Windows and ELF in how symbols are namespaced by default. IIUC our macOS build explicitly links into a flat namespace to make it work like on ELF, but by default macOS would like to know which shared library dependency symbols should come from.)

@beeender
Author

beeender commented Jul 14, 2023

I am trying to create a patch for this. Interestingly, I found a script named stable_abi.py, which was written to test the stable ABI, but nothing seems to call it.

Where should this test be executed if I want to add it as a part of the CI process?

@encukou
Member

encukou commented Aug 3, 2023

> I am trying to create a patch for this. Interestingly, I found there is a script named stable_abi.py, which was invented to do the testing about the stable ABI. But it is not called by anyone.

It's already part of a Make rule that's used in CI. Nothing to do on that front! :)

@vstinner
Member

I don't understand well what you are trying to do. You are trying to build a program and link it to Python 3.11. That's not the stable ABI. That's just a program linked to libpython.

The common usage of the stable ABI is to build a C extension and then load it in Python. Please elaborate what you are trying to do.

@sunmy2019
Member

sunmy2019 commented Sep 30, 2023

> I don't understand well what you are trying to do. You are trying to build a program and link it to Python 3.11. That's not the stable ABI. That's just a program linked to libpython.
>
> The common usage of the stable ABI is to build a C extension and then load it in Python. Please elaborate what you are trying to do.

I think they are trying to do things like:

  • Build a program and link it against libpython3.so from Python 3.11.
  • Run it with libpython3.so from Python 3.12, without libpython3.11.so present.

which is not currently achievable. This makes libpython3.so alone useless in most scenarios.

@vstinner
Member

I understand that the use case is to build a program and expect that it will run with any Python version available on the system. The vim text editor managed to implement this use case by targeting the stable ABI, but loading a versioned libpython.

Maybe we should provide a recipe for implementing such a use case.

In general, you should:

  • Restrict your C code to the limited C API: #define Py_LIMITED_API <version>.
  • Load libpython at runtime: it's tricky to do it in a portable way :-(
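
As a rough illustration of the second point, here is a minimal sketch of loading libpython at runtime with dlopen() and resolving a stable-ABI function by name, rather than linking against any libpython at build time. The helper name `open_first_libpython` and the type name `py_get_version_fn` are hypothetical; only dlopen(), dlsym(), dlclose() and Py_GetVersion() are real APIs. The sketch is POSIX-only and does not address the portability problems mentioned above.

```c
#include <dlfcn.h>
#include <stddef.h>

/* Signature of Py_GetVersion(), which is part of the stable ABI. */
typedef const char *(*py_get_version_fn)(void);

/* Try each candidate library name in order. Returns the first handle that
 * opens and exports Py_GetVersion, storing the function in *get_version.
 * Returns NULL (and sets *get_version to NULL) if no candidate works. */
static void *open_first_libpython(const char *const *candidates, size_t count,
                                  py_get_version_fn *get_version)
{
    for (size_t i = 0; i < count; i++) {
        void *handle = dlopen(candidates[i], RTLD_NOW | RTLD_GLOBAL);
        if (handle == NULL) {
            continue;   /* not installed under this name */
        }
        /* POSIX permits converting the void * from dlsym to a function pointer. */
        *get_version = (py_get_version_fn)dlsym(handle, "Py_GetVersion");
        if (*get_version != NULL) {
            return handle;
        }
        dlclose(handle);   /* opened, but does not provide the symbol */
    }
    *get_version = NULL;
    return NULL;
}
```

A caller might pass candidates such as "libpython3.so" or "libpython3.12.so.1.0" (illustrative names; which ones exist depends on the system) and then call the returned function pointer. Because dlsym() also searches the dependencies that dlopen() loaded along with the object, the lookup should succeed even through the symbol-less libpython3.so, provided its DT_NEEDED entry points at a real libpython. Older glibc versions need `-ldl` at link time.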

@sunmy2019 is correct: libpython3.so is empty, so linking a program to libpython3.so doesn't give you "any Python available on the system". The library itself loads, but you end up with many missing symbols.

If someone wants to support such a use case ("load any Python available on the system using libpython3.so"), the constraints should be spelled out. Which Python versions would you accept? Is Python 2.7 ok? Is an alpha version of Python 3.13 ok? Is a debug build ok? Should "Python" be loaded from anywhere on the system?
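
One way to make the version part of those constraints concrete is to check the string returned by Py_GetVersion() after loading. The sketch below is only an example policy, not a recommendation: the function names `parse_python_version` and `python_version_acceptable` and the "Python 3 with a minimum minor version" rule are assumptions made for illustration. It looks only at the leading "major.minor", so it cannot tell an alpha release or a debug build from a regular one.

```c
#include <stdio.h>

/* Parse the leading "major.minor" from a Py_GetVersion()-style string such as
 * "3.11.3 (main, Apr  5 2023, 15:52:25) [GCC 12.2.1 20230201]".
 * Returns 1 on success, 0 if the string does not start with two integers. */
static int parse_python_version(const char *text, int *major, int *minor)
{
    return sscanf(text, "%d.%d", major, minor) == 2;
}

/* Hypothetical policy: accept only Python 3 at or above `min_minor`. */
static int python_version_acceptable(const char *text, int min_minor)
{
    int major = 0, minor = 0;

    if (!parse_python_version(text, &major, &minor)) {
        return 0;
    }
    return major == 3 && minor >= min_minor;
}
```

With `min_minor` set to 6, this accepts the "3.11.3 ..." string from the original report and rejects a "2.7..." string.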

IMO this kind of problem should be solved outside Python itself, with a third-party project that would load the "appropriate" Python version: load a Python program, load libpython, or whatever else brings "Python" into the current process.

I saw some funky projects which would run a separate Python process and communicate with this process with IPC! Would it be an acceptable implementation? :-)

@aallrd

aallrd commented Oct 10, 2023

We see value in libpython3.so for our use case on Red Hat Enterprise Linux (RHEL).
Our application is built on RHEL 8 and needs to run on both RHEL 8 and RHEL 9.
We assume that the C API is stable as documented, as long as we restrict ourselves to the Py_LIMITED_API APIs.

On RHEL, the system is shipped with a Python distribution version "frozen" for the whole lifecycle of the product:

  • RHEL 8: Python 3.6.8, libpython3.so -> libpython3.6m.so
  • RHEL 9: Python 3.9.16, libpython3.so -> libpython3.9.so.1.0

The common component here is libpython3.so (SONAME), so it would be great if we could directly link our application to it.
However, since it does not define any symbols and only has a DT_NEEDED entry, we need some "hacks" to make it work.
My expectation would be that the libpython3.so library exports all the Py_LIMITED_API 3 symbols instead of none.

Details:

# No symbols are defined in libpython3.so (RHEL 8)
$ nm -gD --defined-only /usr/lib64/libpython3.so
0000000000201028 B __bss_start
0000000000201028 D _edata
0000000000201030 B _end
0000000000000638 T _fini
00000000000004f8 T _init

# The list of DT_NEEDED entries in libpython3.so (RHEL 8)
$ patchelf --print-needed /usr/lib64/libpython3.so
libpython3.6m.so.1.0
libpthread.so.0
libc.so.6

# We cannot link directly with libpython3.so (RHEL 8)
$ echo -e "#include <Python.h> \n void main() { Py_Initialize(); }" | gcc -x c -I/usr/include/python3.6m -lpython3 -o hello -
/opt/rh/devtoolset-11/root/usr/libexec/gcc/x86_64-redhat-linux/11/ld: /tmp/cc2sNDQi.o: undefined reference to symbol 'Py_Initialize'
/opt/rh/devtoolset-11/root/usr/libexec/gcc/x86_64-redhat-linux/11/ld: /lib64/libpython3.6m.so.1.0: error adding symbols: DSO missing from command line

# We have to link with libpython3.6m.so to link successfully (RHEL 8)
$ echo -e "#include <Python.h> \n void main() { Py_Initialize(); }" | gcc -x c -I/usr/include/python3.6m -lpython3.6m -o hello -
$ file hello
hello: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), dynamically linked (uses shared libs), BuildID[sha1]=71ae38d71cacb2ed5f037dd8c4f6e787f04c1651, for GNU/Linux 2.6.32, not stripped

# Our produced binary has a dependency on libpython3.6m.so.1.0 and not libpython3.so (RHEL 8)
$ patchelf --print-needed hello
libpython3.6m.so.1.0
libc.so.6

# We need to patch it post build (RHEL 8)
$ patchelf --replace-needed libpython3.6m.so.1.0 libpython3.so hello
$ patchelf --print-needed hello
libpython3.so
libc.so.6

## The resulting binary is successfully loaded at runtime on both RHEL 8 and RHEL 9

# RHEL 8 (runtime)
$ ldd hello
        linux-vdso.so.1 =>  (0x00007ffef22b6000)
        libpython3.so => /lib64/libpython3.so (0x00007f5abbe7e000)
        libc.so.6 => /lib64/libc.so.6 (0x00007f5abbab0000)
        libpython3.6m.so.1.0 => /lib64/libpython3.6m.so.1.0 (0x00007f5abb589000)
        libpthread.so.0 => /lib64/libpthread.so.0 (0x00007f5abb36d000)
        /lib64/ld-linux-x86-64.so.2 (0x00007f5abc080000)
        libdl.so.2 => /lib64/libdl.so.2 (0x00007f5abb169000)
        libutil.so.1 => /lib64/libutil.so.1 (0x00007f5abaf66000)
        libm.so.6 => /lib64/libm.so.6 (0x00007f5abac64000)

# RHEL 9 (runtime)
$ ldd hello
        libpython3.so => /lib64/libpython3.so (0x000000400283e000)
        libc.so.6 => /lib64/libc.so.6 (0x0000004002845000)
        libpython3.9.so.1.0 => /lib64/libpython3.9.so.1.0 (0x0000004002a4e000)
        /lib64/ld-linux-x86-64.so.2 (0x0000004000000000)
        libm.so.6 => /lib64/libm.so.6 (0x0000004002db1000)

@Toolybird

Here is a related downstream Arch bug report. Not sure if it's all the same issue, but there was definitely a bug in the Autofoo which is now fixed in 3.12.x and above. (Arch is still on 3.11.x).

9 participants