bpo-42955: Add sys.modules_names #24238

Merged
merged 2 commits into from Jan 25, 2021

@vstinner vstinner commented Jan 18, 2021

Add sys.module_names attribute: the list of the standard library
module names.

https://bugs.python.org/issue42955

@vstinner vstinner requested a review from python/windows-team as a code owner Jan 18, 2021

@vstinner vstinner commented Jan 18, 2021

The most important part of the PR is the Tools/scripts/generate_module_names.py script which generates the list.

It ignores the following modules. I'm not sure if we should ignore them or not.

IGNORE = {
    '__pycache__',
    'site-packages',

    # Helper modules for public module
    # (ex: _osx_support is used by sysconfig)
    '_aix_support',
    '_collections_abc',
    '_compat_pickle',
    '_compression',
    '_markupbase',
    '_osx_support',
    '_sitebuiltins',
    '_strptime',
    '_threading_local',
    '_weakrefset',

    # Used to bootstrap setup.py
    '_bootsubprocess',

    # test modules
    'test',
    '__phello__.foo',

    # pure Python implementation
    '_py_abc',
    '_pydecimal',
    '_pyio',
}

I chose to ignore these modules to make the list look nicer, but that makes the list "not correct".

For Windows, I'm lazy and hardcoded the list since it's short and is not updated often:

WINDOWS_MODULES = (
    "_msi",
    "_testconsole",
    "msvcrt",
    "winreg",
    "winsound"
)
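
The approach described above (scan the library directory, skip an ignore set, then append a hardcoded Windows list) can be sketched roughly as follows. This is a minimal illustration and not the actual Tools/scripts/generate_module_names.py: the function name `list_top_level_modules` and the shortened `IGNORE` set are assumptions made for the example.

```python
from pathlib import Path

# Hypothetical, shortened version of the ignore set quoted above.
IGNORE = {"__pycache__", "site-packages", "test", "_pyio"}

# Hardcoded Windows-only names, copied from the comment above.
WINDOWS_MODULES = ("_msi", "_testconsole", "msvcrt", "winreg", "winsound")


def list_top_level_modules(lib_dir, ignore=IGNORE, extra=WINDOWS_MODULES):
    """Return the sorted top-level names found in lib_dir, plus the extras."""
    names = set(extra)
    for entry in Path(lib_dir).iterdir():
        if entry.is_dir():
            # Every directory is a candidate, which is why entries such as
            # __pycache__ and site-packages have to be in the ignore set.
            name = entry.name
        elif entry.suffix == ".py":
            name = entry.stem
        else:
            continue  # not a Python source file or package directory
        if name not in ignore:
            names.add(name)
    return sorted(names)
```

Calling `list_top_level_modules("Lib")` from a CPython checkout would give a first approximation of the pure Python part of the list; extension and built-in modules would still have to come from elsewhere.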

@vstinner vstinner commented Jan 18, 2021

With this PR, sys.module_names contains 295 names. It should contain the exact same number on any platform. A module is listed even if it's disabled explicitly at build time.

@vstinner vstinner commented Jan 18, 2021

I updated the documentation:

Some special stdlib modules are excluded from this list, like "test" and private helper modules of public modules.

I also fixed the code to list sub-packages: the new module count is now 313.

Should we also list package sub-modules like asyncio.base_events?

@vstinner vstinner commented Jan 18, 2021

ensurepip._bundled is not included since it doesn't contain any .py file, only .whl files.

@vstinner vstinner commented Jan 18, 2021

> ensurepip._bundled is not included since it doesn't contain any .py file, only .whl files.

My bad, it's listed: there is Lib/ensurepip/_bundled/__init__.py.

@vstinner vstinner commented Jan 18, 2021

@pablogsal @serhiy-storchaka: Ok, this PR is now ready for your review :-) I updated the documentation to make explicit which modules are included and which are excluded:

A tuple of strings giving the names of standard library modules.

All module kinds are listed: pure Python, built-in, frozen and extension
modules. Modules which are not available on some platforms and modules
disabled at Python build are also listed.

For packages, only sub-packages are listed, not sub-modules. For example,
``concurrent.futures`` is listed, but not ``concurrent.futures.base``.

Some special stdlib modules are excluded, like test and private modules.

It is a superset of the :attr:`sys.builtin_module_names` list.
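
The rule "only sub-packages are listed, not sub-modules" could be implemented along these lines. This is a sketch under the assumption that a sub-package is a directory containing an `__init__.py` file; the helper name `list_sub_packages` is hypothetical and not taken from the PR.

```python
from pathlib import Path


def list_sub_packages(package_dir, prefix):
    """Yield dotted names of nested packages under package_dir."""
    for entry in sorted(Path(package_dir).iterdir()):
        # A sub-package is a directory with an __init__.py file. Plain .py
        # files (sub-modules such as base.py) are deliberately skipped.
        if entry.is_dir() and (entry / "__init__.py").is_file():
            name = f"{prefix}.{entry.name}"
            yield name
            yield from list_sub_packages(entry, name)
```

With a layout like `concurrent/futures/base.py`, the generator yields `concurrent.futures` but never `concurrent.futures.base`, matching the example in the documentation text.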

@vstinner vstinner commented Jan 18, 2021

Here is the dump of sys.module_names, 296 modules: http://paste.alacon.org/47013

@vstinner vstinner commented Jan 18, 2021

I modified the script to not ignore any module: with this hack, sys.module_names contains 1869 names. List of the 1575 ignored modules: http://paste.alacon.org/47014

I don't think that we should include these test modules and sub-modules. IMO only listing parent packages is enough. It's easy to detect that "asyncio" is a stdlib module from the "asyncio.base_events" name.
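
Detecting a stdlib module from a dotted sub-module name only needs the top-level component. A minimal sketch, where `is_stdlib_name` is a hypothetical helper and the `names` set is a small stand-in for sys.module_names:

```python
def is_stdlib_name(qualified_name, stdlib_names):
    """Return True if the top-level package of qualified_name is listed."""
    top_level = qualified_name.partition(".")[0]
    return top_level in stdlib_names


# Hypothetical stand-in for sys.module_names.
names = frozenset({"asyncio", "concurrent", "concurrent.futures", "os"})
print(is_stdlib_name("asyncio.base_events", names))  # True
print(is_stdlib_name("numpy.linalg", names))         # False
```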

@vstinner vstinner force-pushed the vstinner:module_names branch from 1d5b2fe to bdcb735 Jan 18, 2021

@vstinner vstinner commented Jan 18, 2021

I rebased my PR, squashed commits, and fixed a few comments in Tools/scripts/generate_module_names.py.

@vstinner vstinner commented Jan 18, 2021

"make regen-module-names" should be tested on macOS and FreeBSD; I'm not sure that setup.py properly reports missing modules in all cases.

@vstinner vstinner commented Jan 18, 2021

On FreeBSD, make regen-module-names does not change Python/module_names.h (sys.module_names also contains 296 modules).

I wrote a short script to check which modules can be imported or not. Only 9 modules cannot be imported on my FreeBSD VM:

_msi: False
_tkinter: False
msilib: False
msvcrt: False
spwd: False
tkinter: False
turtle: False
winreg: False
winsound: False

There are 5 modules specific to Windows (_msi, msilib, msvcrt, winreg, winsound); spwd doesn't exist on FreeBSD; _tkinter probably needs a missing build dependency (and tkinter and turtle need it).

import_all.py script:

import sys
import io

def import_ok(name):
    try:
        __import__(name)
    except ImportError:
        return False
    else:
        return True

stdout = sys.stdout
stderr = sys.stderr

for name in sys.module_names:
    sys.stdout = io.StringIO()
    sys.stderr = io.StringIO()
    ok = import_ok(name)
    sys.stdout = stdout
    sys.stderr = stderr
    print(f"{name}: {ok}")
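
The same check can be written without swapping sys.stdout and sys.stderr by hand. The variant below is a sketch, not part of the PR: it relies on contextlib.redirect_stdout and contextlib.redirect_stderr so the streams are restored even if an import raises something other than ImportError, and the `report` helper is a hypothetical addition.

```python
import contextlib
import importlib
import io


def import_ok(name):
    """Try to import name while discarding anything it prints."""
    sink = io.StringIO()
    with contextlib.redirect_stdout(sink), contextlib.redirect_stderr(sink):
        try:
            importlib.import_module(name)
        except ImportError:
            return False
    return True


def report(names):
    """Map each module name to whether it could be imported."""
    return {name: import_ok(name) for name in sorted(names)}
```

Passing sys.module_names to `report` gives the same name-to-boolean mapping that the script above prints line by line.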

@vstinner vstinner commented Jan 18, 2021

On Linux (on my Fedora 33 laptop), only 5 modules of sys.module_names cannot be imported: the 5 Windows-specific modules.

_msi: False
msilib: False
msvcrt: False
winreg: False
winsound: False

@vstinner vstinner commented Jan 18, 2021

On Windows, 22 modules cannot be imported:

_crypt: False
_curses: False
_curses_panel: False
_dbm: False
_gdbm: False
_posixshmem: False
_posixsubprocess: False
crypt: False
curses: False
fcntl: False
grp: False
nis: False
ossaudiodev: False
posix: False
pty: False
pwd: False
readline: False
resource: False
spwd: False
syslog: False
termios: False
tty: False

Oh, there are 3 built-in modules on Windows which are not listed on Linux:

_winapi
_xxsubinterpreters
nt

I chose to exclude _xxsubinterpreters in Tools/scripts/generate_module_names.py:

    # Experimental module
    '_xxsubinterpreters',

I would prefer to have the same list on Linux and Windows.

@vstinner vstinner commented Jan 18, 2021

I updated the PR to add 3 modules (_winapi, _xxsubinterpreters, nt). sys.module_names now contains 299 modules on all platforms.

Note: I checked that all built-in modules listed in Modules/config.c on Linux and PC/config.c on Windows are listed in Python/module_names.h.
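
A check of this kind can also be expressed at the Python level as a set difference against sys.builtin_module_names. The helper `missing_builtins` is hypothetical, and the sample data only mirrors the situation described earlier in the thread, where nt is built in on Windows but was absent from a list generated on Linux.

```python
import sys


def missing_builtins(stdlib_names, builtin_names=sys.builtin_module_names):
    """Return built-in module names that are absent from stdlib_names."""
    return sorted(set(builtin_names) - set(stdlib_names))


# Illustrative data only: "nt" is reported as missing from the candidate list.
print(missing_builtins({"sys", "posix"}, ("sys", "nt", "posix")))  # ['nt']
```

An empty result means the candidate list is a superset of the built-in names of the running interpreter.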

@vstinner vstinner commented Jan 18, 2021

I created PR #24254 for bpo-42923 to only dump third party extensions on a Python fatal error (Py_FatalError(), faulthandler fatal signal). The PR is based on this PR.
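
The filtering idea behind that follow-up can be illustrated in Python. The real change lives in C, in the fatal error path, so this is only a sketch of the logic; `third_party_modules` and the sample data are assumptions made for the example.

```python
def third_party_modules(loaded, stdlib_names):
    """Return loaded module names whose top-level package is not stdlib."""
    return sorted(
        name for name in loaded
        if name.partition(".")[0] not in stdlib_names
    )


# Hypothetical stand-ins for sys.modules keys and sys.module_names.
loaded = ["os", "asyncio.base_events", "numpy", "numpy.linalg"]
stdlib = frozenset({"os", "asyncio"})
print(third_party_modules(loaded, stdlib))  # ['numpy', 'numpy.linalg']
```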

@vstinner vstinner commented Jan 19, 2021

@ronaldoussoren: The important part of this PR is the documentation. Do you think that it clearly describes what can be found in the list and its limitations? Or do you disagree with including modules which are not available?

@vstinner vstinner force-pushed the vstinner:module_names branch from e01b4cb to b331747 Jan 19, 2021

@vstinner vstinner commented Jan 19, 2021

I rebased this PR, which made it much simpler: it now only focuses on adding the sys.module_names list.

I already merged the uncontroversial part as a private API: I started with a private list, _Py_module_names, in a new Python/module_names.h file (commit cad8020). It unblocked bpo-42923 (dump third party extension modules on a fatal error).

@vstinner vstinner force-pushed the vstinner:module_names branch 2 times, most recently from 340b544 to bcf6e5b Jan 19, 2021

@vstinner vstinner commented Jan 19, 2021

I enhanced Tools/scripts/generate_module_names.py to reorder the list and to avoid duplicates. sysmodule.c no longer has to remove duplicates at runtime.

@vstinner vstinner force-pushed the vstinner:module_names branch from bcf6e5b to 8bcbd4c Jan 19, 2021

@vstinner vstinner commented Jan 19, 2021

> I enhanced Tools/scripts/generate_module_names.py to reorder the list and to avoid duplicates. sysmodule.c no longer has to remove duplicates at runtime.

Hum, the generated list can be sorted as well. I simplified the runtime construction of sys.module_names even more.
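
Deduplicating and sorting at generation time is what lets the runtime code stay simple. A rough sketch of such a generation step follows; the array name _Py_module_names comes from the discussion above, while the function name and the exact header layout are assumptions and not the real contents of Python/module_names.h.

```python
def render_module_names_header(names):
    """Render a C array of unique, sorted module names (illustrative layout)."""
    unique_sorted = sorted(set(names))  # drop duplicates, then sort
    lines = ["static const char* _Py_module_names[] = {"]
    lines.extend(f'"{name}",' for name in unique_sorted)
    lines.append("};")
    return "\n".join(lines)


print(render_module_names_header(["os", "_abc", "os"]))
```

Because the emitted array is already unique and ordered, the C side can build sys.module_names from it directly.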

Add sys.module_names, containing the list of the standard library
module names.
@vstinner vstinner force-pushed the vstinner:module_names branch from 8bcbd4c to 013effb Jan 20, 2021

@vstinner vstinner commented Jan 20, 2021

I updated the PR to take @pablogsal's review into account:

  • Change sys.module_names type to frozenset.
  • No longer ignore private modules (only ignore test modules).
  • Avoid unsafe PySequence_Contains().
  • Rephrase the documentation.
  • Remove the confusing note in the doc about sys.path and import.

Sorry, I amended my commit to be able to modify the commit message.

@vstinner vstinner commented Jan 21, 2021

@ronaldoussoren @serhiy-storchaka @pablogsal: I plan to merge this PR next Monday. Please tell me if you want to review it before that.

When I created https://bugs.python.org/issue42955 I wasn't sure if sys.module_names would be useful, but then I found tons of use cases, and several people told me that they need it for their projects (see the issue, I listed all of them).

Maybe we could also add a way to get the paths of the stdlib, but I suggest doing that separately. Several use cases cannot import modules but still need to check the module name.

@vstinner vstinner merged commit db584bd into python:master Jan 25, 2021
@vstinner vstinner deleted the vstinner:module_names branch Jan 25, 2021