Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

support "UNC" device paths in ntpath.splitdrive #81790

Closed
eryksun opened this issue Jul 17, 2019 · 10 comments · Fixed by #91882
Closed

support "UNC" device paths in ntpath.splitdrive #81790

eryksun opened this issue Jul 17, 2019 · 10 comments · Fixed by #91882
Labels
3.10 OS-windows stdlib type-bug

Comments

@eryksun
Copy link
Contributor

@eryksun eryksun commented Jul 17, 2019

BPO 37609
Nosy @pfmoore, @tjguk, @zware, @serhiy-storchaka, @eryksun, @zooba, @nsiregar, @barneygale
PRs
  • #14841
  • #25261
  • #31702
  • Files
  • splitdrive.py
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = None
    closed_at = None
    created_at = <Date 2019-07-17.11:26:34.050>
    labels = ['type-bug', 'library', '3.10', 'OS-windows']
    title = 'support "UNC" device paths in ntpath.splitdrive'
    updated_at = <Date 2022-03-06.18:47:25.283>
    user = 'https://github.com/eryksun'

    bugs.python.org fields:

    activity = <Date 2022-03-06.18:47:25.283>
    actor = 'steve.dower'
    assignee = 'none'
    closed = False
    closed_date = None
    closer = None
    components = ['Library (Lib)', 'Windows']
    creation = <Date 2019-07-17.11:26:34.050>
    creator = 'eryksun'
    dependencies = []
    files = ['49541']
    hgrepos = []
    issue_num = 37609
    keywords = ['patch']
    message_count = 9.0
    messages = ['348055', '348099', '348206', '352110', '352355', '379780', '390375', '414604', '414621']
    nosy_count = 8.0
    nosy_names = ['paul.moore', 'tim.golden', 'zach.ware', 'serhiy.storchaka', 'eryksun', 'steve.dower', 'nsiregar', 'barneygale']
    pr_nums = ['14841', '25261', '31702']
    priority = 'normal'
    resolution = None
    stage = 'patch review'
    status = 'open'
    superseder = None
    type = 'behavior'
    url = 'https://bugs.python.org/issue37609'
    versions = ['Python 3.10']

    @eryksun
    Copy link
    Contributor Author

    @eryksun eryksun commented Jul 17, 2019

    Windows Python includes UNC shares such as "//server/spam" in its definition of a drive. This is natural because Windows supports setting a UNC path as the working directory and handles the share component as the working drive when resolving rooted paths such as "/eggs". For the sake of generality when working with \\?\ extended paths, Python should expand its definition of a UNC drive to include "UNC" device paths.

    A practical example is calling glob.glob with a "//?/UNC" device path.

    >>> import os, sys, glob
    >>> sys.addaudithook(lambda s,a: print('#', a[0]) if s == 'glob.glob' else None)
    

    regular UNC path:

        >>> glob.glob('//localhost/C$/Sys*')
        # //localhost/C$/Sys*
        ['//localhost/C$/System Volume Information']

    "UNC" device path:

        >>> glob.glob('//?/UNC/localhost/C$/Sys*')
        # //?/UNC/localhost/C$/Sys*
        # //?/UNC/localhost/C$
        # //?/UNC/localhost
        # //?/UNC/
        []

    Since the magic character "?" is in the path (IMO, the drive should be excluded from this check, but that's a separate issue), the internal function glob._iglob calls itself recursively until it reaches the base case of dirname == pathname, where dirname is from os.path.split(pathname). The problem here is that ntpath.split doesn't stop at the proper base case of "//?/UNC/localhost/C$". This is due to ntpath.splitdrive. For example:

        >>> os.path.splitdrive('//?/UNC/localhost/C$/Sys*')
        ('//?/UNC', '/localhost/C$/Sys*')
    
        >>> os.path.splitdrive('//./UNC/localhost/C$/Sys*')
        ('//./UNC', '/localhost/C$/Sys*')

    The results should be "//?/UNC/localhost/C$" and "//./UNC/localhost/C$".

    In other cases, returning a device as the drive is fine, if not exactly meaningful (e.g. "//./NUL"). I don't think this needs to change.

    @eryksun eryksun added 3.8 3.9 stdlib OS-windows type-bug labels Jul 17, 2019
    @serhiy-storchaka
    Copy link
    Member

    @serhiy-storchaka serhiy-storchaka commented Jul 18, 2019

    Do you want to create a PR Eryk?

    @nsiregar
    Copy link
    Mannequin

    @nsiregar nsiregar mannequin commented Jul 20, 2019

    I was unsure about implementation in the patch, do you have UNC format specification?

    @zooba
    Copy link
    Member

    @zooba zooba commented Sep 12, 2019

    For clarity, given Eryk's examples above, both "\\?\UNC\" and "//?/UNC/" are okay (as are any combination of forward and backslashes in the prefix, as normalization will be applied for any except the "\\?\" version). "UNC" is also case-insensitive.

    @eryksun
    Copy link
    Contributor Author

    @eryksun eryksun commented Sep 13, 2019

    Please consult the attached file "splitdrive.py". I redesigned splitdrive() to support "UNC" and "GLOBAL" junctions in device paths. I relaxed the design to allow repeated separators everywhere except for the UNC root. IIRC, Windows has supported this since XP. For example:

        >>> print(nt._getfullpathname('//server///share'))
        \\server\share
        >>> print(nt._getfullpathname(r'\\server\\\share'))
        \\server\share

    There are also a couple of minor behavior changes in the new implementation.

    The old implementation would split "//server/" as ('//server/', ''). Since there's no share, this should not count as a drive. The new implementation splits it as ('', '//server/'). Similarly it splits '//?/UNC/server/' as ('', '//?/UNC/server/').

    The old implementation also allowed any character as a drive 'letter'. For example, it would split '/:/spam' as ('/:', '/spam'). The new implementation ensures that the drive letter in a DOS drive is alphabetic.

    I also extended test_splitdrive to use a list of test cases in order to avoid having to define each case twice. It calls tester() a second time for each case, with slash and backslash swapped.

    @eryksun
    Copy link
    Contributor Author

    @eryksun eryksun commented Oct 27, 2020

    I'm attaching a rewrite of splitdrive() from msg352355. This version uses an internal _next() function to get the indices of the next path component, ignoring repeated separators. It also flattens the nested structure of the previous implementation by adding multiple return statements.

    @zooba
    Copy link
    Member

    @zooba zooba commented Apr 6, 2021

    Once bpo-43105 is merged, I've got a fairly simple implementation for this using the (new) nt._path_splitroot native method, as well as improved tests that cover both the native and emulated calculations.

    @zooba zooba added 3.10 and removed 3.8 3.9 labels Apr 6, 2021
    @zooba zooba assigned zooba and unassigned zooba Apr 6, 2021
    @barneygale
    Copy link
    Mannequin

    @barneygale barneygale mannequin commented Mar 6, 2022

    I'd like to pick this up, as it would allow us to remove a duplicate implementation in pathlib with its own shortcomings.

    If using native functionality if difficult to get right, could I put @eryksun's splitdrive.py implementation up for review?

    @zooba
    Copy link
    Member

    @zooba zooba commented Mar 6, 2022

    If you can build this on top of nt._path_splitroot then it could save a decent amount of work, though at the same time I think it's worthwhile having a pure Python implementation which is cross-platform.

    Haven't looked at the PR yet (or Eryk's implementation recently), but at the very least it would be nice to have tests that verify consistency with nt._path_splitroot. That way at least we'll discover if the native version changes.

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    barneygale pushed a commit to barneygale/cpython that referenced this issue Apr 24, 2022
    barneygale added a commit to barneygale/cpython that referenced this issue Apr 24, 2022
    barneygale added a commit to barneygale/cpython that referenced this issue Apr 24, 2022
    @barneygale
    Copy link
    Contributor

    @barneygale barneygale commented May 20, 2022

    I have a patch for this issue here: #91882

    Could a core dev please review? Thank you.

    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    3.10 OS-windows stdlib type-bug
    Projects
    None yet
    Development

    Successfully merging a pull request may close this issue.

    4 participants