Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Change shutil.rmtree and os.walk to support very deep hierarchies #89727

Open
AlexanderPatrakov mannequin opened this issue Oct 21, 2021 · 10 comments
Open

Change shutil.rmtree and os.walk to support very deep hierarchies #89727

AlexanderPatrakov mannequin opened this issue Oct 21, 2021 · 10 comments
Labels
3.12 stdlib Python modules in the Lib dir type-feature A feature request or enhancement

Comments

@AlexanderPatrakov
Copy link
Mannequin

AlexanderPatrakov mannequin commented Oct 21, 2021

BPO 45564

Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

Show more details

GitHub fields:

assignee = None
closed_at = None
created_at = <Date 2021-10-21.22:21:02.146>
labels = ['library', '3.9', 'type-crash']
title = 'shutil.rmtree and os.walk are implemented using recursion, fail on deep hierarchies'
updated_at = <Date 2021-10-21.22:21:02.146>
user = 'https://bugs.python.org/AlexanderPatrakov'

bugs.python.org fields:

activity = <Date 2021-10-21.22:21:02.146>
actor = 'Alexander.Patrakov'
assignee = 'none'
closed = False
closed_date = None
closer = None
components = ['Library (Lib)']
creation = <Date 2021-10-21.22:21:02.146>
creator = 'Alexander.Patrakov'
dependencies = []
files = []
hgrepos = []
issue_num = 45564
keywords = []
message_count = 1.0
messages = ['404687']
nosy_count = 1.0
nosy_names = ['Alexander.Patrakov']
pr_nums = []
priority = 'normal'
resolution = None
stage = None
status = 'open'
superseder = None
type = 'crash'
url = 'https://bugs.python.org/issue45564'
versions = ['Python 3.9']

Linked PRs

@AlexanderPatrakov
Copy link
Mannequin Author

AlexanderPatrakov mannequin commented Oct 21, 2021

It is possible to create deep directory hierarchies that cannot be removed via shutil.rmtree or walked via os.walk, because these functions exceed the interpreter recursion limit. This may have security implications for web services (e.g. various webdisks) that have to clean up user-created mess or walk through it.

[aep@aep-haswell ~]$ mkdir /tmp/badstuff
[aep@aep-haswell ~]$ cd /tmp/badstuff
[aep@aep-haswell badstuff]$ for x in `seq 2048` ; do mkdir $x ; cd $x ; done
[aep@aep-haswell 103]$ cd
[aep@aep-haswell ~]$ python
Python 3.9.7 (default, Oct 10 2021, 15:13:22) 
[GCC 11.1.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import shutil
>>> shutil.rmtree('/tmp/badstuff')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python3.9/shutil.py", line 726, in rmtree
    _rmtree_safe_fd(fd, path, onerror)
  File "/usr/lib/python3.9/shutil.py", line 663, in _rmtree_safe_fd
    _rmtree_safe_fd(dirfd, fullname, onerror)
  File "/usr/lib/python3.9/shutil.py", line 663, in _rmtree_safe_fd
    _rmtree_safe_fd(dirfd, fullname, onerror)
  File "/usr/lib/python3.9/shutil.py", line 663, in _rmtree_safe_fd
    _rmtree_safe_fd(dirfd, fullname, onerror)
  [Previous line repeated 992 more times]
  File "/usr/lib/python3.9/shutil.py", line 642, in _rmtree_safe_fd
    fullname = os.path.join(path, entry.name)
  File "/usr/lib/python3.9/posixpath.py", line 77, in join
    sep = _get_sep(a)
  File "/usr/lib/python3.9/posixpath.py", line 42, in _get_sep
    if isinstance(path, bytes):
RecursionError: maximum recursion depth exceeded while calling a Python object
>>> import os
>>> list(os.walk('/tmp/badstuff'))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python3.9/os.py", line 418, in _walk
    yield from _walk(new_path, topdown, onerror, followlinks)
  File "/usr/lib/python3.9/os.py", line 418, in _walk
    yield from _walk(new_path, topdown, onerror, followlinks)
  File "/usr/lib/python3.9/os.py", line 418, in _walk
    yield from _walk(new_path, topdown, onerror, followlinks)
  [Previous line repeated 993 more times]
  File "/usr/lib/python3.9/os.py", line 412, in _walk
    new_path = join(top, dirname)
  File "/usr/lib/python3.9/posixpath.py", line 77, in join
    sep = _get_sep(a)
  File "/usr/lib/python3.9/posixpath.py", line 42, in _get_sep
    if isinstance(path, bytes):
RecursionError: maximum recursion depth exceeded while calling a Python object
>>>

@AlexanderPatrakov AlexanderPatrakov mannequin added 3.9 stdlib Python modules in the Lib dir type-crash A hard crash of the interpreter, possibly with a core dump labels Oct 21, 2021
@ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
@AlexWaygood AlexWaygood added type-bug An unexpected behavior, bug, or error and removed type-crash A hard crash of the interpreter, possibly with a core dump labels Jul 10, 2022
jonburdo added a commit to jonburdo/cpython that referenced this issue Nov 26, 2022
Use a stack to implement os.walk iteratively instead of recursively to
avoid hitting recursion limits on deeply nested trees.
jonburdo added a commit to jonburdo/cpython that referenced this issue Nov 27, 2022
Use a stack to implement os.walk iteratively instead of recursively to
avoid hitting recursion limits on deeply nested trees.
@barneygale
Copy link
Contributor

barneygale commented Nov 27, 2022

This also affects pathlib.Path.walk()

jonburdo added a commit to jonburdo/cpython that referenced this issue Nov 30, 2022
Use a stack to implement os.walk iteratively instead of recursively to
avoid hitting recursion limits on deeply nested trees.
jonburdo added a commit to jonburdo/cpython that referenced this issue Dec 15, 2022
Use a stack to implement os.walk iteratively instead of recursively to
avoid hitting recursion limits on deeply nested trees.
Ovsyanka83 added a commit to Ovsyanka83/cpython that referenced this issue Dec 16, 2022
Ovsyanka83 added a commit to Ovsyanka83/cpython that referenced this issue Dec 16, 2022
…f github.com:ovsyanka83/cpython into pythongh-89727/fix-pathlib.Path.walk-recursion-depth
@merwok merwok added 3.12 and removed 3.9 labels Dec 16, 2022
@merwok
Copy link
Member

merwok commented Dec 16, 2022

I am unsure on the bug vs feature classification here. On one side it seems like a clear problem and the fixes don’t change function signatures or the way they are used, but on the other side there is no guarantee that extremely deep hierachies are supported, and I wonder if there could be negative effects from the ocde changes (for example, can the stack list/deque grow extremely large during traversal?).

(Adding PR reviewers: @carljm @brettcannon @serhiy-storchaka)

@carljm
Copy link
Contributor

carljm commented Dec 16, 2022

I don't have strong feelings about calling it a bug vs a feature; I would tend to call it a bug because the function just fails in a subset of cases that aren't obviously outside its domain. I agree that it's a bug that we could just document as a known limitation, though I don't see any reason to do that when the fix is not that difficult. I guess the only real impact of how it's classified is whether the fix would be backported?

I wonder if there could be negative effects from the ocde changes (for example, can the stack list/deque grow extremely large during traversal?)

Sure, but in the current code the Python stack would instead grow very large in the same scenario. And the Python stack frames (plus everything they reference) are almost certainly larger than the stack elements tracked in the iterative version, so if anything I would expect the iterative version to also save memory in the case of a very deep traversal.

@merwok
Copy link
Member

merwok commented Dec 16, 2022

Yes exactly, the impact of my question is backporting or not.
Thanks for the reply about the memory concern!

I take it you feel ok about backporting the fix; let’s wait to see what a core dev thinks.

@brettcannon
Copy link
Member

brettcannon commented Dec 16, 2022

I would classify this as a feature request. There are plenty of things in CPython for which you can you run out of stack space for and it isn't considered a bug in those instances (and that's just part of general limitations that CPython has).

@merwok merwok added type-feature A feature request or enhancement and removed type-bug An unexpected behavior, bug, or error labels Dec 16, 2022
@merwok merwok changed the title shutil.rmtree and os.walk are implemented using recursion, fail on deep hierarchies Change shutil.rmtree and os.walk to support very deep hierarchies Dec 16, 2022
@carljm
Copy link
Contributor

carljm commented Dec 16, 2022

(To be clear, I don't think there's a strong case for backporting this, so reasoning backwards from the conclusion I'm quite happy to call it a feature :P )

@barneygale
Copy link
Contributor

barneygale commented Dec 18, 2022

Are glob.glob(recursive=True) and pathlib.Path.rglob() also affected?

@jonburdo
Copy link
Contributor

jonburdo commented Dec 19, 2022

Are glob.glob(recursive=True) and pathlib.Path.rglob() also affected?

Yes, it looks like it. _rlistdir in Lib/os.py and _RecursiveWildcardSelector in Lib/pathlib.py could be changed to fix it

JelleZijlstra pushed a commit that referenced this issue Dec 19, 2022
Use a stack to implement os.walk iteratively instead of recursively to
avoid hitting recursion limits on deeply nested trees.
carljm added a commit to carljm/cpython that referenced this issue Dec 19, 2022
* main:
  pythongh-89727: Fix os.walk RecursionError on deep trees (python#99803)
  Docs: Don't upload CI artifacts (python#100330)
  pythongh-94912: Added marker for non-standard coroutine function detection (python#99247)
  Correct CVE-2020-10735 documentation (python#100306)
  pythongh-100272: Fix JSON serialization of OrderedDict (pythonGH-100273)
  pythongh-93649: Split tracemalloc tests from _testcapimodule.c (python#99551)
  Docs: Use `PY_VERSION_HEX` for version comparison (python#100179)
  pythongh-97909: Fix markup for `PyMethodDef` members (python#100089)
  pythongh-99240: Reset pointer to NULL when the pointed memory is freed in argument parsing (python#99890)
  pythongh-99240: Reset pointer to NULL when the pointed memory is freed in argument parsing (python#99890)
  pythonGH-98831: Add DECREF_INPUTS(), expanding to DECREF() each stack input (python#100205)
  pythongh-78707: deprecate passing >1 argument to `PurePath.[is_]relative_to()` (pythonGH-94469)
jonburdo added a commit to jonburdo/cpython that referenced this issue Dec 20, 2022
Use a stack to implement os.walk iteratively instead of recursively to
avoid hitting recursion limits on deeply nested trees.
@barneygale
Copy link
Contributor

barneygale commented Dec 20, 2022

Some previous discussion in #70732.

Ovsyanka83 added a commit to Ovsyanka83/cpython that referenced this issue Dec 20, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
3.12 stdlib Python modules in the Lib dir type-feature A feature request or enhancement
Projects
None yet
Development

No branches or pull requests

6 participants