Join GitHub today
GitHub is home to over 40 million developers working together to host and review code, manage projects, and build software together.
Sign upbpo-38671: Make sure to return an absolute path in pathlib._WindowsFlavour.resolve() #17716
Conversation
87e0e37
to
78953ba
This guarentees we are returning an absolute path when the input `path` is relative. Nothing would change if `path` is already absolute.
78953ba
to
34c68f7
if previous_s == s: | ||
return path | ||
return os.path.join(s or os.getcwd(), *reversed(tail_parts)) |
This comment has been minimized.
This comment has been minimized.
eryksun
Dec 28, 2019
Contributor
At the beginning, I'd rather set s = path = os.path.abspath(path)
. Note that _getfinalpathname
calls CreateFileW
, which internally calls GetFullPathNameW
on all paths except for ones that begin with exactly "\\?\" (backslash only). This means that ".." components in normalized paths are resolved naively, so it's fine to do the same via abspath
. Calling abspath
at the start also lets us resolve special DOS-device paths such as "C:\Temp\nul.txt" -> "\\.\nul".
This comment has been minimized.
This comment has been minimized.
uranusjr
Dec 28, 2019
Author
Contributor
By “at the start” do you mean line 205? One advantage of _getfinalpathname
over abspath
is that it canonicalize the case (e.g. python.EXE
-> python.exe
), which abspath
can’t do. I think we need to keep that.
This comment has been minimized.
This comment has been minimized.
eryksun
Dec 28, 2019
Contributor
I meant line 194, i.e. replace s = str(path)
with s = path = os.path.abspath(path)
. I wasn't proposing to replace _getfinalpathname
. It is irreplaceable in this function.
This comment has been minimized.
This comment has been minimized.
uranusjr
Dec 28, 2019
•
Author
Contributor
I see. It might lead to undesired consequences though. abspath
normalizes path without following symlinks, so consider this example (z
is a symlink):
/path/to/x/
`-- y/
`-- z/ -> /path/to/x/
Now for cwd x
and non-existent f
, Path("y/z/../f").resolve()
would return /path/to/f
in the current (POSIX) implementation, because y/z
is resolved first into /path/to/x
(following symlink), and ../f
appended.
But if we call abspath
at line 194, the result would instead become /path/to/y/f
, because abspath
normalizes the z/..
part first incorrectly without realizing it actually involves a symlink.
This comment has been minimized.
This comment has been minimized.
eryksun
Dec 28, 2019
•
Contributor
As I mentioned in the first message, that's not a concern. When we call _getfinalpathname
, it first creates a kernel file object via WINAPI CreateFileW
. The implementation of the latter WINAPI function first has to resolve and translate the path to a native NT absolute object path prior to making the NTAPI NtCreateFile
system call. DOS-to-NT path translation in user mode naively resolves all "." and ".." components, since they generally have no special meaning in NT paths. (The "." and ".." entries are faked by NTFS when listing a directory, but it does at least reserve these names. On the other hand, FAT32 and exFAT even allow naming files with these names if we use a "\\?\" non-normalized path.)
Based on your example, if the working directory is "C:\Temp", the native path that _getfinalpathname("y/z/../f")
opens will be "\??\C:\Temp\y\f" (well, effectively; but actually in this case it will pass a handle for "C:\Temp" as the RootDirectory
field, plus the relative path "y\f" as the ObjectName
). If this open fails with an error that leads to raising FileNotFoundError
, we'll back off and try _getfinalpathname("y/z/..")
, which will try to open "\??\C:\Temp\y". In neither case is "z" actually checked, so it doesn't matter whether or not it's a symbolic link.
There is one case where the executive does assign a special meaning to ".." components -- but it's not relevant here because it's limited to reparsing a symlink in the kernel.
When reparsing relative symlinks (i.e. IO_REPARSE_TAG_SYMLINK
reparse points that target a path that's relative to the location of the reparse point), the I/O manager carefully resolves ".." components. In so doing, it also ensures that mount points (i.e. IO_REPARSE_TAG_MOUNT_POINT
, aka "junctions") are handled as regular directories, just like Unix mount points, while symlinks resolve to their target, just like Unix symlinks.
For example, when resolving a relative symlink "C:\Temp\relative_symlink" that targets "foo\absolute_symlink\..\eggs", it follows the symlink named "absolute_symlink" to its target before evaluating "..". If "absolute_symlink" targets "X:\spam\bar", then the final path will be "X:\spam\eggs". In contrast, trying to resolve this manually the POSIX way is dysfunctional in Windows. The joined path from join(dirname(linkpath), readlink(linkpath))
would be "C:\Temp\foo\absolute_symlink\..\eggs". If you pass this to CreateFileW
, user-mode path normalization will naively resolve the ".." and end up trying to open "C:\Temp\foo\eggs", a path that has nothing to do with the correct final path, "X:\spam\eggs".
This comment has been minimized.
This comment has been minimized.
uranusjr
Dec 28, 2019
•
Author
Contributor
Thanks for the very detailed explanation.
Based on your example, if the working directory is "C:\Temp", the native path that _getfinalpathname("y/z/../f") opens will be "??\C:\Temp\y\f"
That’s not the case though. As I mentioned above, abspath
eagerly resolves .
and ..
. So if you call it on line 194, _getfinalpathname
(first called on line 200) would never get a chance to see y/z/../f
, only y/f
. Your strategy needs path
to preserve those dot components, and to do that you must not call path = abspath(path)
before feeding it into _getfinalpathname
.
With that said, we can easily avoid the eager resolution problem. From the docs:
os.path.abspath(path)
Return a normalized absolutized version of the pathname path. On most platforms, this is equivalent to calling the functionnormpath()
as follows:normpath(join(os.getcwd(), path))
.
So maybe s = path = os.path.join(os.getcwd(), path)
would do what you have in mind?
This comment has been minimized.
This comment has been minimized.
eryksun
Dec 28, 2019
Contributor
That’s not the case though. As I mentioned above,
abspath
eagerly resolves.
and..
. So if you call it on line 194,_getfinalpathname
(first called on line 200) would never get a chance to seey/z/../f
, onlyy/f
.
My point was that there is no difference between calling _getfinalpathname("y/z/../f")
and _getfinalpathname("y/f")
. WINAPI CreateFileW
resolves the ".." with the same user-mode runtime library code (in ntdll.dll) that gets called by WINAPI GetFullPathNameW
, the call we use in abspath
via _getfullpathname
. The I/O manager and filesystem in the kernel never sees this ".." component.
This comment has been minimized.
This comment has been minimized.
eryksun
Dec 28, 2019
Contributor
The only way to pass a "." or ".." component name to the kernel in an opened path is in a "\\?\" non-normalized path. If you try this with CreateFileW
, you'll see that NTFS rejects this component name as invalid; it's a reserved name, but otherwise has no semantic value. In contrast, FAT32 and exFAT handle ".." as a regular filename. For example, if "E:\Temp" is a FAT32 directory, we can create a data file with the path "\\?\E:\Temp\..". This is certainly not what anyone reasonably expects or wants. I don't know why they haven't reserved the name like NTFS does. I guess they just never considered the possibility.
uranusjr commentedDec 27, 2019
•
edited by bedevere-bot
https://bugs.python.org/issue38671