Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

gh-88569: add os.path.isreserved() #95486

Open
wants to merge 4 commits into
base: main
Choose a base branch
from

Conversation

barneygale
Copy link
Contributor

@barneygale barneygale commented Jul 31, 2022

@eryksun
Copy link
Contributor

@eryksun eryksun commented Jul 31, 2022

In ntpath.isreserved(), let's remove the exception for UNC paths. GetFullPathNameW() doesn't reserve DOS device names in UNC paths, but the SMB provider disallows creating files with DOS device names. It's the most common and most important UNC provider. For example, on the "//localhost/C$" SMB share:

>>> for fn in ('nul', 'con', 'aux', 'prn', 'com1', 'lpt1'):
...     try:
...         open(f'//localhost/C$/Temp/{fn}', 'w')
...     except PermissionError:
...         print(f'{fn} disallowed')
...
nul disallowed
con disallowed
aux disallowed
prn disallowed
com1 disallowed
lpt1 disallowed

Also, a name that ends with a space or dot should be reserved. The system's path normalization strips trailing spaces and dots from the final path component (e.g. "spam. . ." -> "spam"). Path normalization applies to all cases except opening "\\?\" extended paths.

Also, the characters :<>"?*| are reserved in file names, as are as well the ASCII control characters (ordinals 0-31). (The characters : and | aren't reserved by all file systems, but generally they are.)

In file systems that support file streams, : delimits the stream name and type (e.g. "filename:streamname:$DATA"). Stream names can include the characters <>"?*| as well as control characters other than null (ordinals 1-31). I think it's simplest to just reserve a name that includes :, so names that specify a file stream are marked as reserved for general use. Code that needs to use file streams should be aware of this and separately validate the file name and stream name.

@ChrisDenton
Copy link

@ChrisDenton ChrisDenton commented Jul 31, 2022

Not sure if it's relevant to this function but Windows 11 has greatly simplified how reserved names work. Now C:\path\COM1 is not reserved and neither is aux.c. But aux on its own still is (and aux.. .. because trailing spaces and dots are always stripped).

@eryksun
Copy link
Contributor

@eryksun eryksun commented Jul 31, 2022

Not sure if it's relevant to this function but Windows 11 has greatly simplified how reserved names work. Now C:\path\COM1 is not reserved and neither is aux.c. But aux on its own still is (and aux.. .. because trailing spaces and dots are always stripped).

In Windows 11, path normalization no longer special cases a DOS device name if it has an extension (e.g. "con.txt"). Also a DOS device name isn't special cased if it's the leaf component of a path -- except for the "NUL" device. For example:

>>> nt._getfullpathname('nul.txt')
'C:\\Temp\\nul.txt'
>>> nt._getfullpathname('c:/temp/nul') # NUL is still reserved
'\\\\.\\nul'

However, since DOS device names are still special cased as unqualified names and still reserved by the SMB server, they should still be avoided as file names. For example, creating a file named "con" in the current working directory would have to use the path "./con".

@barneygale
Copy link
Contributor Author

@barneygale barneygale commented Jul 31, 2022

@eryksun what should ntpath.isreserved('.') return? Thanks

@eryksun
Copy link
Contributor

@eryksun eryksun commented Jul 31, 2022

@eryksun what should ntpath.isreserved('.') return? Thanks

Isn't that a cross-platform question? The names "." and ".." are reserved in POSIX and Windows. For ".." it always has to be exact. Otherwise trailing dots and spaces are stripped in Windows. For example:

>>> nt._getfullpathname('z:/spam/..')
'z:\\'
>>> nt._getfullpathname('z:/spam/...')
'z:\\spam\\'
>>> nt._getfullpathname('z:/spam/.. .')
'z:\\spam\\'

@barneygale
Copy link
Contributor Author

@barneygale barneygale commented Jul 31, 2022

That would make pathlib.Path().is_reserved() always true. I'm not convinced tbh. I can't see any existing CPython bug or feature request that covers the behaviour change you're requesting.

@eryksun
Copy link
Contributor

@eryksun eryksun commented Jul 31, 2022

That would make pathlib.Path().is_reserved() always true. I'm not convinced tbh. I can't see any existing CPython bug or feature request that covers the behaviour change you're requesting.

I'm referring to the base name, i.e. os.fsdecode(basename(path)), in terms of answering the question, can one create a file or directory with this exact name? One cannot create/overwrite a file named "." or "..". The names are reserved as implicit hard links or aliases to the current directory and parent directory. It's no different from something like "nul" in Windows. For example, open('c:/Temp/nul', 'w') succeeds because it is a valid path. But the name "nul" implicitly resolves to "\\.\nul" (even in Windows 11).

@brettcannon brettcannon requested a review from eryksun Aug 5, 2022
Copy link
Member

@brettcannon brettcannon left a comment

I like the idea, but I think there's an extraneous function call to take out. I would also like @eryksun to make sure we are copying over the right implementation details, or make any tweaks as necessary now if possible.

@@ -216,6 +216,12 @@ def ismount(path):
return False


def isreserved(path):
"""Return true if the pathname is reserved by the system."""
os.fspath(path)
Copy link
Member

@brettcannon brettcannon Aug 5, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why this call if you're not going to use the result?

Suggested change
os.fspath(path)

Copy link
Contributor Author

@barneygale barneygale Aug 5, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To ensure an exception is raised if the given argument isn't PathLike. If that sounds like a good idea I can add a comment explaining, otherwise I'll accept your suggestion!

@bedevere-bot
Copy link

@bedevere-bot bedevere-bot commented Aug 5, 2022

A Python core developer has requested some changes be made to your pull request before we can consider merging it. If you could please address their requests along with any other requests in other reviews from core developers that would be appreciated.

Once you have made the requested changes, please leave a comment on this pull request containing the phrase I have made the requested changes; please review again. I will then notify any core developers who have left a review that you're ready for them to take another look at this pull request.

@barneygale
Copy link
Contributor Author

@barneygale barneygale commented Aug 5, 2022

Thinking out loud:

There's probably a distinction between reserved names and invalid names, analogous to the distinction between reserved Python keywords (for, break, etc) and invalid Python identifiers (my-var, 3dpoints)

pathlib.PurePath.is_reserved() consciously concerns itself only with the former case (at least for the moment!)

@eryksun
Copy link
Contributor

@eryksun eryksun commented Aug 6, 2022

There's probably a distinction between reserved names and invalid names,

Does this mean that you don't want to reserve names that contain the wildcard characters *, ?, " (i.e. DOS_DOT), < (i.e. DOS_STAR), and > (i.e. DOS_QM)? Wildcard characters are illegal in the name of a file or directory because they're used in system calls such as NtQueryDirectoryFile() (its FileName parameter can contain wildcards to filter a directory listing) and ultimately by filesystem device drivers via runtime library functions such as FsRtlIsNameInExpression().

What about names that are illegal to create because they're reserved in other contexts? That's the case when creating a file with a DOS device name on an SMB share. SMB disallows creating a file named "nul", for example, because it causes problems with accessing the file directly on the server (e.g. as "C:\share\nul").

What about base names that contain colons? For example, in an NTFS filesystem, "spam:00" creates a file named "spam" that contains a data stream named "00". That can surprise even experienced developers, particularly if they come from a POSIX background. In a FAT filesystem, "spam:00" is an invalid name. In a VboxSharedFolderFS (VirtualBox) filesystem with a Linux host, "spam:00" is allowed as the literal name.

What about a name that changes when accessed because trailing dots and spaces are stripped? For example, "spam ." -> "spam".

What about "." and ".." components in "\\?\" extended paths? In Windows, "." and ".." components are handled in the user-mode API. However, normalization is skipped for a "\\?\" extended path, so the filesystem is passed a path that contains "." and ".." components. The results can be dysfunctional and surprising. For example, FAT filesystems (volume "E:" in this example) allow creating regular files named "." and "..":

>>> open(r'\\?\E:\Temp\.', 'w').close()
>>> open(r'\\?\E:\Temp\..', 'w').close()
>>> print(*[(e[-2], e[-1], e[0]) for e in win32file.FindFilesW('*')], sep='\n')
('.', '', 16)
('..', '', 16)
('.', 'F93F~1', 32)
('..', 'C59D~1', 32)  

The first two are the required "." and ".." entries, which are directories (16) and have no short name. The last two are regular files with the archive attribute set (32) and legacy short names.

Otherwise, FAT filesystems don't support opening paths with literal "." and ".." entries. The open fails with ERROR_PATH_NOT_FOUND (i.e. C ENOENT). For example:

>>> open(r'\\?\E:\Temp\..\spam.txt', 'w').close()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
FileNotFoundError: [Errno 2] No such file or directory: '\\\\?\\E:\\Temp\\..\\spam.txt' 

NTFS fails all three of the above cases with ERROR_INVALID_NAME (i.e. C EINVAL).

@barneygale
Copy link
Contributor Author

@barneygale barneygale commented Aug 6, 2022

@eryksun whats your view on the existing pathlib.PurePath.is_reserved() method? It doesn't perform any of the validity checks you mention. Is it broken without them? If so, perhaps we should consider deprecating the method, rather than expanding it and chasing a long tail of pathname validity bugs?

@eryksun
Copy link
Contributor

@eryksun eryksun commented Aug 7, 2022

whats your view on the existing pathlib.PurePath.is_reserved() method? It doesn't perform any of the validity checks you mention. Is it broken without them? If so, perhaps we should consider deprecating the method, rather than expanding it and chasing a long tail of pathname validity bugs?

The is_reserved() method is useful in theory, but it isn't used much. A sanitizepath() function would be more useful. Regarding the design, it was a mistake to exclude UNC paths. Just because one can create "//?/C:/Temp/nul", that doesn't mean it's not a troublesome file name that should never be used. Also, file names that end with trailing spaces and dots have to be reserved.

As to reserved characters and "." and ".." components in extended paths, I'm just putting it out there for discussion. I'm less concerned about the five wildcard characters. They're always disallowed in base file names (but not stream names). A filesystem that didn't exclude them would be broken. Creating a file named "spam?.txt" isn't going to magically succeed in a surprising way. But a filesystem might allow :, |, and ASCII control characters in base file names, even though they should be illegal. VboxSharedFolderFS does. If a name like that doesn't get sanitized, it can cause problems later on, e.g. when trying to copy the file to another filesystem. The potentially surprising behavior with names that specify a data stream is the worst offender (e.g. "spam:.txt" has a stream named ".txt"). If I could only choose one character to flag as reserved in base file names, it would be colon.

Lib/ntpath.py Outdated Show resolved Hide resolved
Lib/test/test_ntpath.py Outdated Show resolved Hide resolved
Lib/test/test_ntpath.py Outdated Show resolved Hide resolved
Co-authored-by: Eryk Sun <eryksun@gmail.com>
# Trailing spaces and dots are reserved.
# File streams are reserved (e.g. "filename:stream[:type]").
if name.rstrip('. ') != name or ':' in name:
return True
Copy link
Contributor

@eryksun eryksun Aug 9, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
# Trailing spaces and dots are reserved.
# File streams are reserved (e.g. "filename:stream[:type]").
if name.rstrip('. ') != name or ':' in name:
return True
# Trailing spaces and dots are reserved.
if name not in ('.', '..') and name.rstrip('. ') != name:
return True
# File streams are reserved (e.g. "filename:stream[:type]").
if ':' in name:
return True

@@ -822,6 +822,49 @@ def test_ismount(self):
self.assertTrue(ntpath.ismount(b"\\\\localhost\\c$"))
self.assertTrue(ntpath.ismount(b"\\\\localhost\\c$\\"))

def test_isreserved(self):
self.assertFalse(ntpath.isreserved(''))
Copy link
Contributor

@eryksun eryksun Aug 9, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
self.assertFalse(ntpath.isreserved(''))
self.assertFalse(ntpath.isreserved(''))
self.assertFalse(ntpath.isreserved('.'))
self.assertFalse(ntpath.isreserved('..'))

self.assertTrue(ntpath.isreserved('c:/baz/con/NUL'))
self.assertFalse(ntpath.isreserved('c:/NUL/con/baz'))
# Bytes are supported.
self.assertFalse(ntpath.isreserved(b''))
Copy link
Contributor

@eryksun eryksun Aug 9, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
self.assertFalse(ntpath.isreserved(b''))
self.assertFalse(ntpath.isreserved(b''))
self.assertFalse(ntpath.isreserved(b'.'))
self.assertFalse(ntpath.isreserved(b'..'))

@eryksun
Copy link
Contributor

@eryksun eryksun commented Aug 9, 2022

The following check would need to be removed from PureWindowsPathTest.test_is_reserved() in "Lib/test/test_pathlib.py":

        # UNC paths are never reserved.
        self.assertIs(False, P('//my/share/nul/con/aux').is_reserved()) 

Though this entire pathlib test is redundant now. As you consolidate the implementations, you might want to remove redundant tests, and only keep tests that are specific to how pathlib uses the os.path flavour interfaces.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants