New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
gh-88569: add os.path.isreserved()
#95486
base: main
Are you sure you want to change the base?
Conversation
barneygale
commented
Jul 31, 2022
•
edited by bedevere-bot
edited by bedevere-bot
- Issue: gh-88569
In
Also, a name that ends with a space or dot should be reserved. The system's path normalization strips trailing spaces and dots from the final path component (e.g. "spam. . ." -> "spam"). Path normalization applies to all cases except opening "\\?\" extended paths. Also, the characters In file systems that support file streams, |
Not sure if it's relevant to this function but Windows 11 has greatly simplified how reserved names work. Now |
In Windows 11, path normalization no longer special cases a DOS device name if it has an extension (e.g. "con.txt"). Also a DOS device name isn't special cased if it's the leaf component of a path -- except for the "NUL" device. For example:
However, since DOS device names are still special cased as unqualified names and still reserved by the SMB server, they should still be avoided as file names. For example, creating a file named "con" in the current working directory would have to use the path "./con". |
@eryksun what should |
Isn't that a cross-platform question? The names "." and ".." are reserved in POSIX and Windows. For ".." it always has to be exact. Otherwise trailing dots and spaces are stripped in Windows. For example:
|
That would make |
I'm referring to the base name, i.e. |
I like the idea, but I think there's an extraneous function call to take out. I would also like @eryksun to make sure we are copying over the right implementation details, or make any tweaks as necessary now if possible.
@@ -216,6 +216,12 @@ def ismount(path): | |||
return False | |||
|
|||
|
|||
def isreserved(path): | |||
"""Return true if the pathname is reserved by the system.""" | |||
os.fspath(path) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why this call if you're not going to use the result?
os.fspath(path) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To ensure an exception is raised if the given argument isn't PathLike
. If that sounds like a good idea I can add a comment explaining, otherwise I'll accept your suggestion!
A Python core developer has requested some changes be made to your pull request before we can consider merging it. If you could please address their requests along with any other requests in other reviews from core developers that would be appreciated. Once you have made the requested changes, please leave a comment on this pull request containing the phrase |
Thinking out loud: There's probably a distinction between reserved names and invalid names, analogous to the distinction between reserved Python keywords (
|
Does this mean that you don't want to reserve names that contain the wildcard characters What about names that are illegal to create because they're reserved in other contexts? That's the case when creating a file with a DOS device name on an SMB share. SMB disallows creating a file named "nul", for example, because it causes problems with accessing the file directly on the server (e.g. as "C:\share\nul"). What about base names that contain colons? For example, in an NTFS filesystem, "spam:00" creates a file named "spam" that contains a data stream named "00". That can surprise even experienced developers, particularly if they come from a POSIX background. In a FAT filesystem, "spam:00" is an invalid name. In a VboxSharedFolderFS (VirtualBox) filesystem with a Linux host, "spam:00" is allowed as the literal name. What about a name that changes when accessed because trailing dots and spaces are stripped? For example, "spam ." -> "spam". What about "." and ".." components in "\\?\" extended paths? In Windows, "." and ".." components are handled in the user-mode API. However, normalization is skipped for a "\\?\" extended path, so the filesystem is passed a path that contains "." and ".." components. The results can be dysfunctional and surprising. For example, FAT filesystems (volume "E:" in this example) allow creating regular files named "." and "..":
The first two are the required "." and ".." entries, which are directories (16) and have no short name. The last two are regular files with the archive attribute set (32) and legacy short names. Otherwise, FAT filesystems don't support opening paths with literal "." and ".." entries. The open fails with
NTFS fails all three of the above cases with |
@eryksun whats your view on the existing |
The As to reserved characters and "." and ".." components in extended paths, I'm just putting it out there for discussion. I'm less concerned about the five wildcard characters. They're always disallowed in base file names (but not stream names). A filesystem that didn't exclude them would be broken. Creating a file named "spam?.txt" isn't going to magically succeed in a surprising way. But a filesystem might allow |
Co-authored-by: Eryk Sun <eryksun@gmail.com>
# Trailing spaces and dots are reserved. | ||
# File streams are reserved (e.g. "filename:stream[:type]"). | ||
if name.rstrip('. ') != name or ':' in name: | ||
return True |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
# Trailing spaces and dots are reserved. | |
# File streams are reserved (e.g. "filename:stream[:type]"). | |
if name.rstrip('. ') != name or ':' in name: | |
return True | |
# Trailing spaces and dots are reserved. | |
if name not in ('.', '..') and name.rstrip('. ') != name: | |
return True | |
# File streams are reserved (e.g. "filename:stream[:type]"). | |
if ':' in name: | |
return True |
@@ -822,6 +822,49 @@ def test_ismount(self): | |||
self.assertTrue(ntpath.ismount(b"\\\\localhost\\c$")) | |||
self.assertTrue(ntpath.ismount(b"\\\\localhost\\c$\\")) | |||
|
|||
def test_isreserved(self): | |||
self.assertFalse(ntpath.isreserved('')) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
self.assertFalse(ntpath.isreserved('')) | |
self.assertFalse(ntpath.isreserved('')) | |
self.assertFalse(ntpath.isreserved('.')) | |
self.assertFalse(ntpath.isreserved('..')) |
self.assertTrue(ntpath.isreserved('c:/baz/con/NUL')) | ||
self.assertFalse(ntpath.isreserved('c:/NUL/con/baz')) | ||
# Bytes are supported. | ||
self.assertFalse(ntpath.isreserved(b'')) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
self.assertFalse(ntpath.isreserved(b'')) | |
self.assertFalse(ntpath.isreserved(b'')) | |
self.assertFalse(ntpath.isreserved(b'.')) | |
self.assertFalse(ntpath.isreserved(b'..')) |
The following check would need to be removed from # UNC paths are never reserved.
self.assertIs(False, P('//my/share/nul/con/aux').is_reserved()) Though this entire pathlib test is redundant now. As you consolidate the implementations, you might want to remove redundant tests, and only keep tests that are specific to how |