bpo-31582: Created a new section describing sys.path initialization #31082
Moved common details from the Windows-specific finding modules section to a new platform-neutral section. Linked to the new section from the relevant module documentation.
Suggested edit from @eryksun Co-authored-by: Eryk Sun <eryksun@gmail.com>
@zooba good to merge now?
@zooba are we good to merge? Please let me know what concerns you may have.
A Python core developer has requested some changes be made to your pull request before we can consider merging it. If you could please address their requests along with any other requests in other reviews from core developers that would be appreciated. Once you have made the requested changes, please leave a comment on this pull request containing the phrase |
Doc/using/cmdline.rst
Outdated
module search path even if the archive does not exist. If no archive was found,
Python on Windows will continue the search for ``prefix`` by looking for :file:`Lib\\os.py`
or :file:`Lib\\os.pyc`. Python on Unix will look for :file:`lib/python{majorversion}.{minorversion}/os.py`
(``lib/python3.11/os.py``) or :file:`lib/python{majorversion}.{minorversion}/os.pyc` (``lib/python3.11/os.pyc``).
Is that true? On Unix systems these days, the .pyc files are supposed to be in lib/python3.n/__pycache__/ and have a version-specific and opt-level-specific suffix. For example, on a current macOS framework install:
$ cd /Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10
$ ls *.pyc
ls: *.pyc: No such file or directory
$ ls __pycache__/
__future__.cpython-310.opt-1.pyc nntplib.cpython-310.opt-1.pyc
__future__.cpython-310.pyc nntplib.cpython-310.pyc
__phello__.foo.cpython-310.opt-1.pyc ntpath.cpython-310.opt-1.pyc
[...]
The old unqualified .pyc form of the file name in the lib directory may still be supported as a legacy form, but that should be the exception these days.
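The layout described above can be checked from Python itself. The following is a minimal sketch using `importlib.util.cache_from_source`, which maps a source path to the cached bytecode path the import system would use. The source path `lib/python3.10/os.py` is only an illustrative input and does not need to exist.

```python
import importlib.util
import sys

# Illustrative source path; cache_from_source() only manipulates the string,
# so the file does not need to exist.
source = "lib/python3.10/os.py"

# With no arguments, the optimization tag follows the running interpreter (-O / -OO).
default_pyc = importlib.util.cache_from_source(source)

# Explicit optimization levels: "" means no opt tag, 1 gives an ".opt-1" tag.
plain_pyc = importlib.util.cache_from_source(source, optimization="")
opt1_pyc = importlib.util.cache_from_source(source, optimization=1)

print(default_pyc)
print(plain_pyc)
print(opt1_pyc)
print("cache tag:", sys.implementation.cache_tag)
```

The printed names carry the interpreter's cache tag and, where requested, the optimization level, which is the suffix scheme visible in the `__pycache__` listing above.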
Well spotted! I think you are correct; I believe this should be updated in getpath.py. @zooba, what do you think?
At the moment line 170 in getpath.py has:
if os_name == 'posix' or os_name == 'darwin':
    BUILDDIR_TXT = 'pybuilddir.txt'
    BUILD_LANDMARK = 'Modules/Setup.local'
    DEFAULT_PROGRAM_NAME = f'python{VERSION_MAJOR}'
    STDLIB_SUBDIR = f'{platlibdir}/python{VERSION_MAJOR}.{VERSION_MINOR}'
    STDLIB_LANDMARKS = [f'{STDLIB_SUBDIR}/os.py', f'{STDLIB_SUBDIR}/os.pyc']
    PLATSTDLIB_LANDMARK = f'{platlibdir}/python{VERSION_MAJOR}.{VERSION_MINOR}/lib-dynload'
    BUILDSTDLIB_LANDMARKS = ['Lib/os.py']
    VENV_LANDMARK = 'pyvenv.cfg'
    ZIP_LANDMARK = f'{platlibdir}/python{VERSION_MAJOR}{VERSION_MINOR}.zip'
    DELIM = ':'
    SEP = '/'
elif os_name == 'nt':
    BUILDDIR_TXT = 'pybuilddir.txt'
    BUILD_LANDMARK = f'{VPATH}\\Modules\\Setup.local'
    DEFAULT_PROGRAM_NAME = f'python'
    STDLIB_SUBDIR = 'Lib'
    STDLIB_LANDMARKS = [f'{STDLIB_SUBDIR}\\os.py', f'{STDLIB_SUBDIR}\\os.pyc']
    PLATSTDLIB_LANDMARK = f'{platlibdir}'
    BUILDSTDLIB_LANDMARKS = ['Lib\\os.py']
    VENV_LANDMARK = 'pyvenv.cfg'
    ZIP_LANDMARK = f'python{VERSION_MAJOR}{VERSION_MINOR}{PYDEBUGEXT or ""}.zip'
    WINREG_KEY = f'SOFTWARE\\Python\\PythonCore\\{PYWINVER}\\PythonPath'
    DELIM = ';'
    SEP = '\\'
Both the Windows and Unix os.pyc STDLIB_LANDMARKS should be updated/removed.
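To make the role of these landmarks concrete, here is a simplified sketch of a landmark search. It is not the algorithm in getpath.py: the helper `find_prefix` is hypothetical and only assumes that a prefix is found by walking upward from a starting directory until one of the landmark files exists.

```python
import os
import tempfile


def find_prefix(start_dir, landmarks):
    """Walk upward from start_dir and return the first directory containing
    any of the landmark files, or None if the filesystem root is reached."""
    current = os.path.abspath(start_dir)
    while True:
        if any(os.path.isfile(os.path.join(current, lm)) for lm in landmarks):
            return current
        parent = os.path.dirname(current)
        if parent == current:  # reached the filesystem root
            return None
        current = parent


# Build a throwaway tree that mimics <prefix>/lib/python3.11/os.py and <prefix>/bin.
with tempfile.TemporaryDirectory() as tmp:
    root = os.path.realpath(tmp)
    stdlib = os.path.join(root, "lib", "python3.11")
    bindir = os.path.join(root, "bin")
    os.makedirs(stdlib)
    os.makedirs(bindir)
    open(os.path.join(stdlib, "os.py"), "w").close()

    landmarks = ["lib/python3.11/os.py", "lib/python3.11/os.pyc"]
    found = find_prefix(bindir, landmarks)
    print(found == root)
```

In this sketch either landmark is enough to anchor the prefix, which is why dropping `os.pyc` from the list would only matter for layouts that ship the precompiled file without the source.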
Created https://bugs.python.org/issue46909 for this
Looking for f'{STDLIB_SUBDIR}\\os.pyc' is even more historical now. Unless the standard library is manually precompiled, which is optional when installing, Python 3.11 won't have an "os[.*].pyc" file in any directory. The os module is frozen in 3.11, so there's no reason to distribute or cache it as a PYC file.
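Whether a given module is frozen in the running interpreter can be checked directly. This is a small sketch using `importlib.machinery.FrozenImporter.find_spec`; the helper name `is_frozen` is made up for illustration, and the result for `os` and `site` depends on the Python version and build.

```python
import importlib.machinery
import sys


def is_frozen(name):
    """Return True if the running interpreter has a frozen copy of `name`."""
    return importlib.machinery.FrozenImporter.find_spec(name) is not None


print(sys.version_info[:2], "os frozen:", is_frozen("os"))
print("site frozen:", is_frozen("site"))
```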
@zooba closed https://bugs.python.org/issue46909 because the search for .pyc is still used. It is probably best to leave it out of the documentation though.
The os module is frozen in 3.11, so there's no reason to distribute or cache it as a PYC file.
This is a good point, we probably need to pick a different landmark (perhaps site.py?).
I do think detecting the precompiled PYC is correct here. Precompiling on install generates the __pycache__ directories, but precompiling manually for an embedded runtime is totally legitimate (though I believe it is more performant if you also zip up the .pycs and put the ZIP file somewhere discoverable, which is why the ZIP detection overrides these landmarks).
This is a good point, we probably need to pick a different landmark (perhaps site.py?).
site.py is also frozen. See Tools/scripts/freeze_modules.py. Maybe add an empty landmark file that includes the Python version in its name, e.g. __py_stdlib_cpython_3_11.
Steve, to be clear, do you think the docs should mention searching for os.pyc or not?
Thanks for doing this, BTW. I think it will be a nice improvement to the docs.
I have made the requested changes; please review again.
Doc/using/cmdline.rst
Outdated
The next items added are the directories containing standard Python modules as
well as any :term:`extension module`\s that these modules depend on. Extension
modules are .dll files on Windows, .dylib files on macOS and .so files on Linux.
Sorry, I may have misled you here. Extension modules have a .so extension on macOS as well as on Linux and on other Unix-y systems - all other ones AFAIK (like the BSDs). What is true is that shared libraries on macOS typically have a .dylib extension, but Python extension modules on macOS aren't shared libraries. (A package installed (via pip, say) that includes an extension module might also install a shared library, but I think that level of detail isn't needed here.) So perhaps this should be reworded to be ".dll files on Windows, .so files on other platforms"?
Extension modules are ".pyd" files in Windows, which in CPython are DLL files (IMAGE_FILE_DLL to be specific in terms of the PE IMAGE_FILE_HEADER.Characteristics). The ".pyd" file extension is used in Windows in order to associate a Python icon with the file, as defined by the "Python.Extension" progid.
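The suffixes that the running interpreter accepts for extension modules can be listed with `importlib.machinery.EXTENSION_SUFFIXES`. This is a short sketch; the `expected` value encodes only the platform split discussed in this thread (".pyd" on Windows, ".so" elsewhere) and is an assumption for other platforms.

```python
import importlib.machinery
import sys

suffixes = importlib.machinery.EXTENSION_SUFFIXES
print(sys.platform, suffixes)

# Assumption from the discussion: ".pyd" variants on Windows, ".so" variants elsewhere.
expected = ".pyd" if sys.platform == "win32" else ".so"
print(any(s.endswith(expected) for s in suffixes))
```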
No problem @ned-deily. Thanks @eryksun, I never knew why Windows used .pyd; it always seemed curious to me that Python had these renamed .dll files!
Doc/using/cmdline.rst
Outdated
~~~~~~~~~~

To completely override :data:`sys.path`, create a ``._pth`` file with the same
name as the extension module (``python311._pth``) or the executable (``python._pth``)
I think there is some confusion here: I believe the python311 here is referring to the Python interpreter shared library (a DLL), not an extension module (confusingly also a DLL). @zooba, can you confirm and perhaps suggest a more correct wording?
Also, is this a Windows-only feature? If not, the wording needs to take into account the fact that on Unix-y systems the python executable might have a shared library component.
It should be the same name as and located beside the Python executable or shared library (not "extension module") -- e.g. "python.exe" -> "python._pth" or "python311.dll" -> "python311._pth".
Support for "._pth" files was extended to all platforms. I've tested this in Linux for a "._pth" file that's based on the real executable name, as well as a symlink to the executable, as well as a symlink to the "._pth" file. All work, but note that relative paths are resolved against the opened location of the "._pth" file. The real path of the "._pth" file is not used. I haven't checked how a "._pth" file works when named for a shared library, if the interpreter is built as such. Apparently this configuration isn't tested in POSIX.
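The naming rule from the examples above ("python.exe" -> "python._pth", "python311.dll" -> "python311._pth") can be sketched as a tiny helper. The function `pth_name` is hypothetical and is not taken from getpath.py; its handling of names without any extension is an assumption, since that case is not covered in this thread.

```python
def pth_name(filename):
    """Replace the final extension of `filename` with "._pth"."""
    stem, dot, _ext = filename.rpartition(".")
    # Assumption: if there is no "." at all, append "._pth" to the whole name.
    return (stem if dot else filename) + "._pth"


print(pth_name("python.exe"))
print(pth_name("python311.dll"))
```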
Thank you, I have been mixing my shared library and extension module terms far too freely. Will review.
A Python core developer has requested some changes be made to your pull request before we can consider merging it. If you could please address their requests along with any other requests in other reviews from core developers that would be appreciated. Once you have made the requested changes, please leave a comment on this pull request containing the phrase |
I have made the requested changes; please review again.
Thanks for making the requested changes! @ned-deily: please review the changes made to this pull request.
Doc/library/sys.rst
Outdated
* :ref:`using-on-finding-modules` for further details about the
  initialization of :data:`sys.path`.
I suggest moving it outside the seealso section; it's directly related to sys.path.
Agreed, done.
Doc/using/cmdline.rst
Outdated
@@ -59,6 +59,7 @@ all consecutive arguments will end up in :data:`sys.argv` -- note that the first
element, subscript zero (``sys.argv[0]``), is a string reflecting the program's
source.

.. _using-on-interface-option-c:
That's not needed, use:
:option:`-c <command> <-c>`
(I'm not sure if it works for -c because of its <command> argument, I never tried.)
Thanks, I will try that.
Works nicely, thank you for that!
Doc/using/cmdline.rst
Outdated
@@ -252,6 +254,7 @@ Miscellaneous options
options). See also :envvar:`PYTHONDEBUG`.


.. _using-on-misc-option-uppercase-e:
Just use:
:option:`-E`
Doc/using/cmdline.rst
Outdated
.. _using-on-finding-modules:

Finding modules
---------------
What do you think of adding a new file to the doc? This section doesn't directly belong to the "Command line and environment" page. I suggest adding the new file to the Doc/library/ directory.
Example: Doc/library/sys_path_init.rst.
IMO https://docs.python.org/dev/library/modules.html is a good place to list this file.
You are right, this section doesn't fit neatly into the "Command line and environment" page. I'm open to the idea of a new page. I'd be interested to know what @zooba and @ned-deily think.
I have moved the section into a new doc. Let's see what people think of this approach.
Doc/using/cmdline.rst
Outdated
---------------

A module search path is initialized when Python starts. This module search path
may be accessed at :data:`sys.path`.
may be accessed at :data:`sys.path`.
is accessed at :data:`sys.path`.
"may be accessed" is correct IMO. The data is available at sys.path, but you don't have to go and view it; that is what I was getting at with "may be accessed". The other correct alternative is "is available at sys.path".
Doc/using/cmdline.rst
Outdated
(``lib/python3.11/os.py``). On Windows ``prefix`` and ``exec_prefix`` are the same,
however on other platforms :file:`lib/python{majorversion}.{minorversion}/lib-dynload`
(``lib/python3.11/lib-dynload``) is searched for and used as an anchor for
``exec_prefix``.
You should document sys.platlibdir and PYTHONPLATLIBDIR env var somewhere.
See: https://docs.python.org/dev/library/sys.html#sys.platlibdir
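For reference, the values involved can be inspected at runtime. This is a minimal sketch that assumes the POSIX layout quoted from getpath.py earlier in this thread (`{platlibdir}/python{major}.{minor}` and its `lib-dynload` subdirectory); the directories it prints may not exist in a virtual environment or a non-standard build.

```python
import os
import sys

version_dir = f"python{sys.version_info.major}.{sys.version_info.minor}"

print("prefix:     ", sys.prefix)
print("exec_prefix:", sys.exec_prefix)
print("platlibdir: ", sys.platlibdir)

if os.name == "posix":
    # Directories in which the os.py and lib-dynload landmarks would be expected.
    stdlib_dir = os.path.join(sys.prefix, sys.platlibdir, version_dir)
    dynload_dir = os.path.join(sys.exec_prefix, sys.platlibdir, version_dir, "lib-dynload")
    print(stdlib_dir, os.path.isdir(stdlib_dir))
    print(dynload_dir, os.path.isdir(dynload_dir))
```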
Thanks, I have now mentioned them.
Doc/using/cmdline.rst
Outdated
.. note::

   Certain command line options may further affect path calculations.
   See :ref:`-E <using-on-misc-option-uppercase-e>`, :ref:`-I <using-on-misc-option-uppercase-i>`, :ref:`-s <using-on-misc-option-s>` and :ref:`-S <using-on-misc-option-uppercase-s>` for further details.
In terms of formatting, I suggest a list instead:
.. note::
   Certain command line options may further affect path calculations.
   See :ref:`-E <using-on-misc-option-uppercase-e>`, :ref:`-I <using-on-misc-option-uppercase-i>`, :ref:`-s <using-on-misc-option-s>` and :ref:`-S <using-on-misc-option-uppercase-s>` for further details.
   Certain command line options may further affect path calculations:
   * :ref:`-E <using-on-misc-option-uppercase-e>`: ignore environment variables
   * :ref:`-I <using-on-misc-option-uppercase-i>`: isolated mode
   * :ref:`-s <using-on-misc-option-s>`: don't add user directory
   * :ref:`-S <using-on-misc-option-uppercase-s>`: don't import the :mod:`site` module
It would be nice to have, in one place, an exhaustive list of the things that affect sys.path:
- current working directory
- path of the Python executable program
- list of cmdline options
- list of env vars
- list of config files: see my list at https://docs.python.org/dev/c-api/init_config.html#python-path-configuration
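The effect of the command line options can be observed by starting a child interpreter and reading back `sys.flags` and `sys.path`. This is a minimal sketch; the helper `child_state` is hypothetical, and the exact contents of `sys.path` will differ between installations.

```python
import json
import subprocess
import sys


def child_state(*options):
    """Run a child interpreter with the given options and report its flags and sys.path."""
    code = (
        "import json, sys; "
        "print(json.dumps({"
        "'isolated': sys.flags.isolated, "
        "'ignore_environment': sys.flags.ignore_environment, "
        "'no_user_site': sys.flags.no_user_site, "
        "'no_site': sys.flags.no_site, "
        "'path': sys.path}))"
    )
    result = subprocess.run(
        [sys.executable, *options, "-c", code],
        capture_output=True, text=True, check=True,
    )
    return json.loads(result.stdout)


isolated = child_state("-I")   # isolated mode implies -E and -s
no_site = child_state("-S")    # skips importing the site module

print(isolated["path"])
print(no_site["path"])
```

Comparing the two printed lists with the parent's own `sys.path` shows which entries come from environment variables, the user site directory and the `site` module.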
I had a more verbose description (see 59cbc07#r822648316); however, @zooba preferred the more succinct approach.
Doc/using/cmdline.rst
Outdated
Embedded Python
~~~~~~~~~~~~~~~

If Python is embedded within another application :c:func:`Py_SetPath` can be used to
Py_SetPath is deprecated. I suggest documenting the PyConfig API first.
There is also Py_SetPathEx.
Ok, will do
Doc/using/cmdline.rst
Outdated
@@ -994,3 +1000,110 @@ Debug-mode variables
Need Python configured with the :option:`--with-trace-refs` build option.

.. versionadded:: 3.11

.. _using-on-finding-modules:
importlib finds modules. This section is only about sys.path initialization: the path can be modified afterwards by directly modifying the sys.path list. I suggest renaming the link to "sys-path-init".
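The distinction between initialization and later modification can be shown in a few lines. This is a minimal sketch; the directory `/tmp/example_modules` is a made-up path used only for illustration.

```python
import sys

extra_dir = "/tmp/example_modules"  # hypothetical directory, for illustration only

original = list(sys.path)        # snapshot of the initialized search path
sys.path.insert(0, extra_dir)    # affects imports performed after this point
print(sys.path[0])

sys.path[:] = original           # restore the initialized value in place
```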
Fair comment, I have amended.
Co-authored-by: Victor Stinner <vstinner@python.org>
Doc/using/cmdline.rst
Outdated
To completely override :data:`sys.path`, create a ``._pth`` file with the same
name as the shared library (``python311._pth``) or the executable (``python._pth``)
I checked the shared-library case in Linux. It appears to not be supported at all. This may be the case on all POSIX platforms, including macOS. This description should emphasize that the shared library is used only if its path is known. It's always known in Windows.
ISTM that the path of the shared library in POSIX could be known in many if not most cases, since most Unix platforms support the non-standard dladdr() function. For example, call dladdr(PyObject_Call, &info), which returns the object path in info.dli_fname. If the interpreter is known to have been built as a shared library, then dli_fname is either the absolute path of the library or relative to the current working directory at startup.
I wouldn't trust the result if Python is linked statically in the application, because the path comes from the command line (at least in Linux), which isn't necessarily related to the file that was actually executed. Then again, the code that sets sys.executable doesn't seem to care about this in POSIX. For example, run Python with the command line "./spam", where "./spam" does not exist, but actually execute the real Python binary. Then sys.executable will be the non-existent path "./spam", and "./spam._pth" will be used if it exists.
A bit of the latter has leaked into Windows Python in 3.11. An existing file or directory from the command line will be used as sys.executable, but only if the path from the command line uses backslashes instead of forward slashes.
For example, spawn Python by calling CreateProcessW() with lpCommandLine set to "C:\\Windows" and lpApplicationName set to the real executable path. Then sys.executable will be "C:\\Windows" (a directory), and "C:\\Windows._pth" will be used if it exists. But change the command line to "C:/Windows", and sys.executable will be the path of the executed file.
From behavior in prior versions, I expect sys.executable in Windows (or sys._base_executable in the venv case) to always be the path of the process image, i.e. lpApplicationName from the CreateProcessW() call, never the argv[0] file/directory from the command line.
I checked the shared-library case in Linux. It appears to not be supported at all. This may be the case on all POSIX platforms, including macOS. This description should emphasize that the shared library is used only if its path is known. It's always known in Windows.
Thanks for checking this.
@eryksun That last change you mention is deliberate, as it's actually essential to fix some symlink handling. It does open up some interesting exploitability; however, using the ._pth file next to the library is always recommended if you're worried about someone hijacking it.
@zooba, if it's intentional, maybe it should support forward slash for the leaf component, e.g. r"C:/Windows\py.exe" works, but passing r"C:/Windows/py.exe" as argv[0] falls back on using the real process image path for sys.executable. Though I don't follow the necessity due to symlinks. The Windows loader never resolves symlinks for GetModuleFileNameW().
Probably right about the forward slash, I haven't looked back at why it's acting like that.
Possibly the symlink issue I'm thinking of is Explorer? I dug into it a while back, but I think ShellExecute resolves the symlink before launching, but passes the original path through argv?
Possibly the symlink issue I'm thinking of is Explorer?
Ah, the shell API. It's always the shell API...
I'm happy with this I think. @vstinner had the last thorough review, so if he's happy with the changes then let's put this in the docs! (Never too late to make more updates as necessary)
https://bugs.python.org/issue31582