New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Change shutil.rmtree and os.walk to support very deep hierarchies #89727
Comments
It is possible to create deep directory hierarchies that cannot be removed via shutil.rmtree or walked via os.walk, because these functions exceed the interpreter recursion limit. This may have security implications for web services (e.g. various webdisks) that have to clean up user-created mess or walk through it. [aep@aep-haswell ~]$ mkdir /tmp/badstuff
[aep@aep-haswell ~]$ cd /tmp/badstuff
[aep@aep-haswell badstuff]$ for x in `seq 2048` ; do mkdir $x ; cd $x ; done
[aep@aep-haswell 103]$ cd
[aep@aep-haswell ~]$ python
Python 3.9.7 (default, Oct 10 2021, 15:13:22)
[GCC 11.1.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import shutil
>>> shutil.rmtree('/tmp/badstuff')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/lib/python3.9/shutil.py", line 726, in rmtree
_rmtree_safe_fd(fd, path, onerror)
File "/usr/lib/python3.9/shutil.py", line 663, in _rmtree_safe_fd
_rmtree_safe_fd(dirfd, fullname, onerror)
File "/usr/lib/python3.9/shutil.py", line 663, in _rmtree_safe_fd
_rmtree_safe_fd(dirfd, fullname, onerror)
File "/usr/lib/python3.9/shutil.py", line 663, in _rmtree_safe_fd
_rmtree_safe_fd(dirfd, fullname, onerror)
[Previous line repeated 992 more times]
File "/usr/lib/python3.9/shutil.py", line 642, in _rmtree_safe_fd
fullname = os.path.join(path, entry.name)
File "/usr/lib/python3.9/posixpath.py", line 77, in join
sep = _get_sep(a)
File "/usr/lib/python3.9/posixpath.py", line 42, in _get_sep
if isinstance(path, bytes):
RecursionError: maximum recursion depth exceeded while calling a Python object
>>> import os
>>> list(os.walk('/tmp/badstuff'))
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/lib/python3.9/os.py", line 418, in _walk
yield from _walk(new_path, topdown, onerror, followlinks)
File "/usr/lib/python3.9/os.py", line 418, in _walk
yield from _walk(new_path, topdown, onerror, followlinks)
File "/usr/lib/python3.9/os.py", line 418, in _walk
yield from _walk(new_path, topdown, onerror, followlinks)
[Previous line repeated 993 more times]
File "/usr/lib/python3.9/os.py", line 412, in _walk
new_path = join(top, dirname)
File "/usr/lib/python3.9/posixpath.py", line 77, in join
sep = _get_sep(a)
File "/usr/lib/python3.9/posixpath.py", line 42, in _get_sep
if isinstance(path, bytes):
RecursionError: maximum recursion depth exceeded while calling a Python object
>>> |
Use a stack to implement os.walk iteratively instead of recursively to avoid hitting recursion limits on deeply nested trees.
Use a stack to implement os.walk iteratively instead of recursively to avoid hitting recursion limits on deeply nested trees.
This also affects |
Use a stack to implement os.walk iteratively instead of recursively to avoid hitting recursion limits on deeply nested trees.
Use a stack to implement os.walk iteratively instead of recursively to avoid hitting recursion limits on deeply nested trees.
…f github.com:ovsyanka83/cpython into pythongh-89727/fix-pathlib.Path.walk-recursion-depth
I am unsure on the bug vs feature classification here. On one side it seems like a clear problem and the fixes don’t change function signatures or the way they are used, but on the other side there is no guarantee that extremely deep hierachies are supported, and I wonder if there could be negative effects from the ocde changes (for example, can the (Adding PR reviewers: @carljm @brettcannon @serhiy-storchaka) |
I don't have strong feelings about calling it a bug vs a feature; I would tend to call it a bug because the function just fails in a subset of cases that aren't obviously outside its domain. I agree that it's a bug that we could just document as a known limitation, though I don't see any reason to do that when the fix is not that difficult. I guess the only real impact of how it's classified is whether the fix would be backported?
Sure, but in the current code the Python stack would instead grow very large in the same scenario. And the Python stack frames (plus everything they reference) are almost certainly larger than the stack elements tracked in the iterative version, so if anything I would expect the iterative version to also save memory in the case of a very deep traversal. |
Yes exactly, the impact of my question is backporting or not. I take it you feel ok about backporting the fix; let’s wait to see what a core dev thinks. |
I would classify this as a feature request. There are plenty of things in CPython for which you can you run out of stack space for and it isn't considered a bug in those instances (and that's just part of general limitations that CPython has). |
(To be clear, I don't think there's a strong case for backporting this, so reasoning backwards from the conclusion I'm quite happy to call it a feature :P ) |
Are |
Yes, it looks like it. |
Use a stack to implement os.walk iteratively instead of recursively to avoid hitting recursion limits on deeply nested trees.
* main: pythongh-89727: Fix os.walk RecursionError on deep trees (python#99803) Docs: Don't upload CI artifacts (python#100330) pythongh-94912: Added marker for non-standard coroutine function detection (python#99247) Correct CVE-2020-10735 documentation (python#100306) pythongh-100272: Fix JSON serialization of OrderedDict (pythonGH-100273) pythongh-93649: Split tracemalloc tests from _testcapimodule.c (python#99551) Docs: Use `PY_VERSION_HEX` for version comparison (python#100179) pythongh-97909: Fix markup for `PyMethodDef` members (python#100089) pythongh-99240: Reset pointer to NULL when the pointed memory is freed in argument parsing (python#99890) pythongh-99240: Reset pointer to NULL when the pointed memory is freed in argument parsing (python#99890) pythonGH-98831: Add DECREF_INPUTS(), expanding to DECREF() each stack input (python#100205) pythongh-78707: deprecate passing >1 argument to `PurePath.[is_]relative_to()` (pythonGH-94469)
Use a stack to implement os.walk iteratively instead of recursively to avoid hitting recursion limits on deeply nested trees.
Some previous discussion in #70732. |
AlexanderPatrakov mannequin commentedOct 21, 2021
•
edited by bedevere-bot
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.
Show more details
GitHub fields:
bugs.python.org fields:
Linked PRs
The text was updated successfully, but these errors were encountered: