Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bpo-38671: Make sure to return an absolute path in pathlib._WindowsFlavour.resolve() #17716

Open
wants to merge 1 commit into
base: master
from

Conversation

@uranusjr
Copy link
Contributor

uranusjr commented Dec 27, 2019

@uranusjr uranusjr force-pushed the uranusjr:path-resolve-nonstrict-win branch 2 times, most recently from 87e0e37 to 78953ba Dec 27, 2019
This guarentees we are returning an absolute path when the input `path`
is relative. Nothing would change if `path` is already absolute.
@uranusjr uranusjr force-pushed the uranusjr:path-resolve-nonstrict-win branch from 78953ba to 34c68f7 Dec 28, 2019
if previous_s == s:
return path
return os.path.join(s or os.getcwd(), *reversed(tail_parts))

This comment has been minimized.

Copy link
@eryksun

eryksun Dec 28, 2019

Contributor

At the beginning, I'd rather set s = path = os.path.abspath(path). Note that _getfinalpathname calls CreateFileW, which internally calls GetFullPathNameW on all paths except for ones that begin with exactly "\\?\" (backslash only). This means that ".." components in normalized paths are resolved naively, so it's fine to do the same via abspath. Calling abspath at the start also lets us resolve special DOS-device paths such as "C:\Temp\nul.txt" -> "\\.\nul".

This comment has been minimized.

Copy link
@uranusjr

uranusjr Dec 28, 2019

Author Contributor

By “at the start” do you mean line 205? One advantage of _getfinalpathname over abspath is that it canonicalize the case (e.g. python.EXE -> python.exe), which abspath can’t do. I think we need to keep that.

This comment has been minimized.

Copy link
@eryksun

eryksun Dec 28, 2019

Contributor

I meant line 194, i.e. replace s = str(path) with s = path = os.path.abspath(path). I wasn't proposing to replace _getfinalpathname. It is irreplaceable in this function.

This comment has been minimized.

Copy link
@uranusjr

uranusjr Dec 28, 2019

Author Contributor

I see. It might lead to undesired consequences though. abspath normalizes path without following symlinks, so consider this example (z is a symlink):

/path/to/x/
`-- y/
    `-- z/ -> /path/to/x/

Now for cwd x and non-existent f, Path("y/z/../f").resolve() would return /path/to/f in the current (POSIX) implementation, because y/z is resolved first into /path/to/x (following symlink), and ../f appended.

But if we call abspath at line 194, the result would instead become /path/to/y/f, because abspath normalizes the z/.. part first incorrectly without realizing it actually involves a symlink.

This comment has been minimized.

Copy link
@eryksun

eryksun Dec 28, 2019

Contributor

As I mentioned in the first message, that's not a concern. When we call _getfinalpathname, it first creates a kernel file object via WINAPI CreateFileW. The implementation of the latter WINAPI function first has to resolve and translate the path to a native NT absolute object path prior to making the NTAPI NtCreateFile system call. DOS-to-NT path translation in user mode naively resolves all "." and ".." components, since they generally have no special meaning in NT paths. (The "." and ".." entries are faked by NTFS when listing a directory, but it does at least reserve these names. On the other hand, FAT32 and exFAT even allow naming files with these names if we use a "\\?\" non-normalized path.)

Based on your example, if the working directory is "C:\Temp", the native path that _getfinalpathname("y/z/../f") opens will be "\??\C:\Temp\y\f" (well, effectively; but actually in this case it will pass a handle for "C:\Temp" as the RootDirectory field, plus the relative path "y\f" as the ObjectName). If this open fails with an error that leads to raising FileNotFoundError, we'll back off and try _getfinalpathname("y/z/.."), which will try to open "\??\C:\Temp\y". In neither case is "z" actually checked, so it doesn't matter whether or not it's a symbolic link.


There is one case where the executive does assign a special meaning to ".." components -- but it's not relevant here because it's limited to reparsing a symlink in the kernel.

When reparsing relative symlinks (i.e. IO_REPARSE_TAG_SYMLINK reparse points that target a path that's relative to the location of the reparse point), the I/O manager carefully resolves ".." components. In so doing, it also ensures that mount points (i.e. IO_REPARSE_TAG_MOUNT_POINT, aka "junctions") are handled as regular directories, just like Unix mount points, while symlinks resolve to their target, just like Unix symlinks.

For example, when resolving a relative symlink "C:\Temp\relative_symlink" that targets "foo\absolute_symlink\..\eggs", it follows the symlink named "absolute_symlink" to its target before evaluating "..". If "absolute_symlink" targets "X:\spam\bar", then the final path will be "X:\spam\eggs". In contrast, trying to resolve this manually the POSIX way is dysfunctional in Windows. The joined path from join(dirname(linkpath), readlink(linkpath)) would be "C:\Temp\foo\absolute_symlink\..\eggs". If you pass this to CreateFileW, user-mode path normalization will naively resolve the ".." and end up trying to open "C:\Temp\foo\eggs", a path that has nothing to do with the correct final path, "X:\spam\eggs".

This comment has been minimized.

Copy link
@uranusjr

uranusjr Dec 28, 2019

Author Contributor

Thanks for the very detailed explanation.

Based on your example, if the working directory is "C:\Temp", the native path that _getfinalpathname("y/z/../f") opens will be "??\C:\Temp\y\f"

That’s not the case though. As I mentioned above, abspath eagerly resolves . and ... So if you call it on line 194, _getfinalpathname (first called on line 200) would never get a chance to see y/z/../f, only y/f. Your strategy needs path to preserve those dot components, and to do that you must not call path = abspath(path) before feeding it into _getfinalpathname.

With that said, we can easily avoid the eager resolution problem. From the docs:

os.path.abspath(path)
Return a normalized absolutized version of the pathname path. On most platforms, this is equivalent to calling the function normpath() as follows: normpath(join(os.getcwd(), path)).

So maybe s = path = os.path.join(os.getcwd(), path) would do what you have in mind?

This comment has been minimized.

Copy link
@eryksun

eryksun Dec 28, 2019

Contributor

That’s not the case though. As I mentioned above, abspath eagerly resolves . and ... So if you call it on line 194, _getfinalpathname (first called on line 200) would never get a chance to see y/z/../f, only y/f.

My point was that there is no difference between calling _getfinalpathname("y/z/../f") and _getfinalpathname("y/f"). WINAPI CreateFileW resolves the ".." with the same user-mode runtime library code (in ntdll.dll) that gets called by WINAPI GetFullPathNameW, the call we use in abspath via _getfullpathname. The I/O manager and filesystem in the kernel never sees this ".." component.

This comment has been minimized.

Copy link
@eryksun

eryksun Dec 28, 2019

Contributor

The only way to pass a "." or ".." component name to the kernel in an opened path is in a "\\?\" non-normalized path. If you try this with CreateFileW, you'll see that NTFS rejects this component name as invalid; it's a reserved name, but otherwise has no semantic value. In contrast, FAT32 and exFAT handle ".." as a regular filename. For example, if "E:\Temp" is a FAT32 directory, we can create a data file with the path "\\?\E:\Temp\..". This is certainly not what anyone reasonably expects or wants. I don't know why they haven't reserved the name like NTFS does. I guess they just never considered the possibility.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
4 participants
You can’t perform that action at this time.