Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

pathlib.Path.iterdir doesn't raise an exception until you start iterating #78722

Closed
PaulPinterits mannequin opened this issue Aug 29, 2018 · 11 comments
Closed

pathlib.Path.iterdir doesn't raise an exception until you start iterating #78722

PaulPinterits mannequin opened this issue Aug 29, 2018 · 11 comments
Labels
3.8 only security fixes stdlib Python modules in the Lib dir topic-pathlib type-feature A feature request or enhancement

Comments

@PaulPinterits
Copy link
Mannequin

PaulPinterits mannequin commented Aug 29, 2018

BPO 34541
Nosy @pitrou, @prudvinit
PRs
  • bpo-34541: pathlib.Path.iterdir throw an exception when path is not valid #8996
  • gh-78722: Fixed, pathlib.Path.iterdir now throws an exception when path is not valid #8999
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = None
    closed_at = None
    created_at = <Date 2018-08-29.08:20:21.120>
    labels = ['3.8', 'type-feature', 'library']
    title = "pathlib.Path.iterdir doesn't throw an exception until you start iterating"
    updated_at = <Date 2019-05-23.22:47:36.119>
    user = 'https://bugs.python.org/PaulPinterits'

    bugs.python.org fields:

    activity = <Date 2019-05-23.22:47:36.119>
    actor = 'cheryl.sabella'
    assignee = 'none'
    closed = False
    closed_date = None
    closer = None
    components = ['Library (Lib)']
    creation = <Date 2018-08-29.08:20:21.120>
    creator = 'Paul Pinterits'
    dependencies = []
    files = []
    hgrepos = []
    issue_num = 34541
    keywords = ['patch']
    message_count = 3.0
    messages = ['324307', '324332', '324334']
    nosy_count = 3.0
    nosy_names = ['pitrou', 'Paul Pinterits', 'prudvinit']
    pr_nums = ['8996', '8999']
    priority = 'normal'
    resolution = None
    stage = 'patch review'
    status = 'open'
    superseder = None
    type = 'enhancement'
    url = 'https://bugs.python.org/issue34541'
    versions = ['Python 3.8']

    Linked PRs

    @PaulPinterits
    Copy link
    Mannequin Author

    PaulPinterits mannequin commented Aug 29, 2018

    The fact that Path.iterdir() only throws exceptions once you start iterating over it makes it very difficult to write correct code.

    Let's look at an example: We'll iterate over all children of a directory and print their file size.

    If we try to do it like this, the try...except has no effect whatsoever:

    try:
        children = path_that_doesnt_exist.iterdir()
    except FileNotFoundError:
        print("directory doesn't exist")
    for path in children:
        print(path.stat().st_size)

    If we explicitly check whether the path exists (and is a directory), we end up with a race condition:

    if path_that_doesnt_exist.is_dir():
        print("directory doesn't exist")
    else:
        for path in children:
            print(path.stat().st_size)

    We can wrap the whole loop in a try...except, but then we might end up catching exceptions we didn't intend to catch. (For example, the exception that's thrown when we try to get the size of a broken symlink.)

    try:
        for path in path_that_doesnt_exist.iterdir():
            print(path.stat().st_size)  # this can also throw FileNotFoundError
    except FileNotFoundError:
        print("directory doesn't exist")

    We can manually call next on the iterator inside of a try..except block, but that's awfully verbose and requires knowledge about iterators and the next function:

    children = iter(path_that_doesnt_exist.iterdir())
    while True:
        try:
            path = next(children)
        except FileNotFoundError:
            print("directory doesn't exist")
            break
        print(path.stat().st_size)

    Or we can turn the iterator into a list inside of a try...except, which seems to be the best option, but completely defeats the point of having an iterator:

    try:
        children = list(path_that_doesnt_exist.iterdir())
    except FileNotFoundError:
        print("directory doesn't exist")
    else:
        for path in children:
            print(path.stat().st_size)

    As you can see, writing correct (and good) code with iterdir is more difficult than it has any right to be. Please change this behavior so that exceptions are thrown immediately when iterdir is called.

    @PaulPinterits PaulPinterits mannequin added type-bug An unexpected behavior, bug, or error 3.7 (EOL) end of life stdlib Python modules in the Lib dir labels Aug 29, 2018
    @PaulPinterits
    Copy link
    Mannequin Author

    PaulPinterits mannequin commented Aug 29, 2018

    As an afterthought, I'd like to suggest an alternative solution: If changing the iterdir behavior is not possible or not desirable for some reason, please add a Path.listdir method that returns a list instead of an iterator. Lazy file system operations can often introduce unnecessary race conditions in the code, so I believe it would be very useful if every lazy method had an eager equivalent.

    @prudvinit
    Copy link
    Mannequin

    prudvinit mannequin commented Aug 29, 2018

    Made changes, pathlib.Path('.').iterdir() now throws a FileNotFoundError if the path is not valid.

    @csabella csabella added 3.8 only security fixes type-feature A feature request or enhancement and removed 3.7 (EOL) end of life type-bug An unexpected behavior, bug, or error labels May 23, 2019
    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    @barneygale
    Copy link
    Contributor

    Work-around: call list(path.iterdir()):

    try:
        children = list(path_that_doesnt_exist.iterdir())
    except FileNotFoundError:
        print("directory doesn't exist")
    for path in children:
        print(path.stat().st_size)

    I don't think we can change the iterdir() behaviour at this stage, nor does it justify the introduction of a similar method like listdir() to my mind.

    @CAM-Gerlach
    Copy link
    Member

    CAM-Gerlach commented Jan 8, 2023

    Yeah, the fact that it returns a generator/lazy iterator seems pretty fundamental to iterdir's behavior, and matches the other current and past iter* functions/methods in Pythons, as well as pathlib.glob and similar methods (as well as builtins like range and others) which would make it suddenly behaving eagerly quite surprising to many/most users, as well as sacrificing performance/efficiency.

    list(<lazy iterator>) to make something iterable is a common idiom in Python to handle this situation (and others), and adding a separate method to do the same thing eagerly seems thoroughly redundant.

    @barneygale
    Copy link
    Contributor

    barneygale commented Apr 14, 2023

    I'm going to close this issue per CAM's rationale above. Use list(path.iterdir()) if you need to handle exceptions before you iterate. Thanks all.

    @barneygale barneygale closed this as not planned Won't fix, can't repro, duplicate, stale Apr 14, 2023
    @jaraco
    Copy link
    Member

    jaraco commented Jul 18, 2023

    I don't believe list(iterdir()) is a suitable workaround, as it requires the entire directory to be materialized in memory, increasing storage with little benefit. I'd argue that itertools (or similar) should provide a mechanism to filter or stop on exceptions, so consuming the iterable can continue to be lazy.

    more_itertools.iter_except almost handles this case, but (a) requires the callable to be passed and (b) can't handle StopIteration as a stop condition.

    @barneygale barneygale reopened this Jul 26, 2023
    barneygale added a commit to barneygale/cpython that referenced this issue Jul 26, 2023
    …t delay.
    
    `pathlib.Path.iterdir()` now immediately raises any `OSError`
    exception from `os.listdir()`, rather than waiting until its
    result is iterated over.
    @barneygale
    Copy link
    Contributor

    Upon reflection I agree. I've put a PR up. Do you have a preference on whether it should be backported?

    @merwok merwok changed the title pathlib.Path.iterdir doesn't throw an exception until you start iterating pathlib.Path.iterdir doesn't raise an exception until you start iterating Aug 28, 2023
    @barneygale
    Copy link
    Contributor

    A comment from @AA-Turner:

    I agree fixing iterdir at the source would be cleaner

    barneygale added a commit that referenced this issue Sep 2, 2023
    …y. (#107320)
    
    `pathlib.Path.iterdir()` now immediately raises any `OSError`
    exception from `os.listdir()`, rather than waiting until its
    result is iterated over.
    @barneygale
    Copy link
    Contributor

    Fixed in 3.13 / #107320 / bdc3c88

    @jaraco
    Copy link
    Member

    jaraco commented Sep 3, 2023

    Thanks for the fix!

    Upon reflection I agree. I've put a PR up. Do you have a preference on whether it should be backported?

    This feels like a change in behavior / improvement, so probably not suitable for a backport.

    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    3.8 only security fixes stdlib Python modules in the Lib dir topic-pathlib type-feature A feature request or enhancement
    Projects
    None yet
    Development

    No branches or pull requests

    5 participants