Add rmtree & copy method to pathlib #92771
Comments
rmtree seems entirely possible, considering we will have walk() soon. I'll have to benchmark to see if we can use my walk() implementation for this (as it can be a bit slow due to the use of stat()). Considering we have rename(), copy seems to be possible as well. We could also just wrap shutil's functions in pathlib but then Path._scandir will not be utilized; also shutil is not currently used in pathlib at all -- and we might have a good reason for it. I believe that @barneygale would also be interested in this conversation. |
I recall a discussion (although despite searching I can't find it) that pathlib shouldn't replicate (or wrap or otherwise provide) this sort of functionality, since the version in shutil is fine. The intention was solely to provide an alternative to os.path et al. I sort of agree, but I do think there is a compelling case for the two methods you're proposing in the context of Barney's abstract path concept. |
@domdfcoding I feel like pathlib can and should support most functionality of both os.path and shutil. It feels natural to use rmtree, rmdir, unlink, walk, etc on Path objects. |
To be clear, pathlib is not expected to fully supplant everything in |
And to add on to my last comment, each addition needs to somehow improve or add to the situation, else it's just wrapping a function call in a method, and that isn't helpful enough to justify supporting that API for decades. /cc @barneygale |
Wrapping a function call in a method is 50% of pathlib, and it made pathlib wildly popular even before it was added to the standard library. The helpfulness comes from the object-oriented API and the DSL-like goodness of constructing paths with I'd consider the following shutil functions candidates for wrapping in pathlib: @Conchylicultor do you think you could log a separate issue for |
@brettcannon could you explain what reason there is for pathlib to not support, for example, a majority of functionality of shutil? I've been wondering about that for a while now. |
In my recollection, pathlib’s goal is to provide a class abstraction to replace |
Good point. I still believe that reading functionality should be a part of pathlib (such as walk) but it makes sense that things like rmtree are not exactly necessary there. |
@barneygale You're right, making an object-oriented API that cleverly leveraged operator overloading is what has made pathlib successful. But there is a worry of scope creep and trying to keep the API small enough to fit in your head. For instance, In all of this, you have to also understand that pathlib quite possibly would not have a lot of what's in there today as it predates Now I fully admit this is all subjective, but this is what we have to think through with all of this. Remember, any API we add will need to be supported for literally decades and I have to assume I will be the one doing the support (I have personally been on Python's development team for 19 years and I still have code from my first commit being used in the stdlib). And since we are not deprecating shutil here as that's not really benefiting the preexisting users, you have to have a really good reason for increasing the amount of code you're asking me to support for a couple of decades (on top of everything else I already have to support and will need to support in the future). Now, I do understand that @barneygale has plans for an @Ovsyanka83 what @merwok said covers it nicely; common actions can/should be on pathlib, but things that are lower-level or more rare should lean on |
The fspath protocol is not a solution here, because it assumes a local filesystem. For example, all of the following will fail:

    shutil.rmtree(epath.Path('gs://bucket/a'))  # Fail
    shutil.rmtree(universal_pathlib.UPath('s3://bucket/a'))  # Fail
    shutil.copy(zipfile.Path('x.zip', 'file.txt'), 'dst/file.txt')  # Fail

Copying my original message:
So I can write If all program where using the pathlib-protocol in their program instead of |
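The fspath limitation mentioned above can be sketched as follows. This is only an illustration: `RemotePath` and `local_path_or_none` are hypothetical names, not part of any library, and the class merely stands in for a path object backed by remote storage. Because such an object has no meaningful local path, it cannot sensibly implement `__fspath__`, so `os.fspath` raises `TypeError` for it.

```python
import os


class RemotePath:
    """Hypothetical stand-in for a path object backed by remote storage."""

    def __init__(self, url: str) -> None:
        self.url = url

    # Deliberately no __fspath__: a gs:// or s3:// URL has no local file path.


def local_path_or_none(path):
    """Return the local filesystem path, or None if the object has none."""
    try:
        return os.fspath(path)
    except TypeError:
        # os.fspath rejects objects that are not str, bytes, or os.PathLike.
        return None


remote = RemotePath("gs://bucket/a")
print(local_path_or_none(remote))
print(local_path_or_none("dst/file.txt"))
```

Functions that need a real local path have nothing to work with in the first case, which is why a method defined on the path object itself is the only hook a remote backend can override.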
To be honest, the more I read this discussion, the more I side with Brett. Thinking realistically: imagine that google cloud library implements its own rmtree, copy, etc. It won't be very nice but it will give you roughly the same interface. Additionally, PathLab can also add support for such things. If it works and becomes the de-facto approach, then maybe some day we can merge a portion of its methods with pathlib. And before that, pathlab will only be one

The more I think about the ways we could reach this full vision with everyone following the pathlib interface, the more complex the vision gets. Let's figure out exactly why shutil and os are divided, then try to come up with a vision of a system-independent full (copy, move, rmtree, etc.) path interface (ideally where only low-level methods would need to be implemented), and then turn it into a full-blown PEP. That way we will either be disillusioned with the whole idea or we will have a solid motivation section.

Update: After some research, I consider the division between shutil and os slightly arbitrary. To put it simply: os tries to be a thin wrapper around the C stdlib and is intended to have a similar API to it. shutil, on the other hand, doesn't. If that's all there is, I feel like merging their functionality within pathlib isn't such a big issue. However, we will need to find (or become) a core developer who agrees to support it for decades :) |
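The idea of an interface "where only low-level methods would need to be implemented" could look roughly like this. It is a sketch under stated assumptions, not a proposal from the thread: `AbstractTreePath` and `MemoryPath` are hypothetical names, and the in-memory dict backend exists only to make the example runnable.

```python
from abc import ABC, abstractmethod


class AbstractTreePath(ABC):
    """Hypothetical interface: backends implement only the primitives."""

    @abstractmethod
    def is_dir(self): ...

    @abstractmethod
    def iterdir(self): ...

    @abstractmethod
    def unlink(self): ...

    @abstractmethod
    def rmdir(self): ...

    def rmtree(self):
        """Generic recursive delete built from the primitives above.

        A backend with a faster bulk operation could override this method.
        """
        if self.is_dir():
            # Materialize the children first so deletion does not
            # interfere with the iteration.
            for child in list(self.iterdir()):
                child.rmtree()
            self.rmdir()
        else:
            self.unlink()


class MemoryPath(AbstractTreePath):
    """Toy backend: a dict maps each path to None (directory) or bytes (file)."""

    def __init__(self, store, name):
        self.store = store
        self.name = name

    def is_dir(self):
        return self.store[self.name] is None

    def iterdir(self):
        prefix = self.name.rstrip("/") + "/"
        for key in self.store:
            # Direct children only: no further "/" after the prefix.
            if key.startswith(prefix) and "/" not in key[len(prefix):]:
                yield MemoryPath(self.store, key)

    def unlink(self):
        del self.store[self.name]

    def rmdir(self):
        del self.store[self.name]


store = {
    "root": None,
    "root/a": None,
    "root/a/f.txt": b"x",
    "root/g.txt": b"y",
    "other": None,
}
MemoryPath(store, "root").rmtree()
print(store)
```

The backend never defines `rmtree` itself; it inherits the generic version, while a cloud backend would be free to replace it with a single bulk-delete request.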
Ah! That is indeed a valuable use case. I can easily imagine some web dev script that removes temporary files, not caring if it’s a local directory for local dev or some cloud storage bucket on the server. |
I spotted this in PEP 428:
I've gone off the idea of wrapping the original functions or duplicating their implementations. I'm beginning to think we should actually move some functions from |
I agree with the idea of moving them into pathlib. Though I believe that os.path should be deprecated first. |
I don’t think there is a plan for deprecation. os.path is a foundation module that works with strings (similar to basic syscalls). We have different levels for doing things with files. It’s not duplication (same things in multiple places) but different interfaces. |
I don't agree with deprecating |
No problem with that :) Nevertheless, I'm really happy that we're actively working on improving pathlib and I'm so excited for Barney's plan for extensible |
I'll be more explicit and say there are no plans, nor will there ever be any plans, to deprecate |
( |
Okay-okay, I retract my comment :) |
FTR in my message I was also reacting to this;
|
After much thought I tend to agree with @brettcannon and @merwok. Once the dream of a universal API comes closer to fruition, implementing rmtree, copytree, etc. will make a lot more sense and will be a safe change. So I'm inclined to vote for closing this issue and my pull request connected to it. Are we in agreement here? If not, I can clean up my PR and await Brett's review on it. |
@Ovsyanka83 I appreciate the thought and work you have put into this! I think I'm personally convinced enough to close this at this point. Since both |
Feature or enhancement
Currently pathlib is missing some shutil features: rmtree or copy
Pitch
pathlib.Path defines an interface / API that many third-party modules implement:

Having a consistent API is great, as I can write code for pathlib and it is automatically compatible with all pathlib-like backends without any code changes (e.g. a user can replace pathlib.Path with epath.Path and the function magically works with cloud storage like gs://, s3://, ...).

However, due to the lack of .rmtree and .copy support, some operations are impossible. It means each pathlib-like backend that wants to support recursive directory deletion or copy has to come up with its own API, leading to inconsistency. (Technically it is possible to implement rmtree(path.Path()) using only pathlib functions, but this would lead to very inefficient performance on remote storage.)

Defining them directly in pathlib would make them part of the standard, with well-defined semantics. This would allow implementations to rely on them and to switch between pathlib-like objects.
This will be even more relevant when pathlib provides an explicit interface that users can inherit from: https://discuss.python.org/t/make-pathlib-extensible/3428/7
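For reference, the "pathlib-only" rmtree mentioned in the pitch might look like the minimal sketch below. The function is an illustration written for this issue, not an existing pathlib API; it relies only on `is_dir`, `is_symlink`, `iterdir`, `unlink`, and `rmdir`, and is demonstrated on a local temporary directory.

```python
import pathlib
import tempfile


def rmtree(path):
    """Recursively delete `path` using only pathlib-style methods."""
    # Symlinks are unlinked, never followed.
    if path.is_dir() and not path.is_symlink():
        # Materialize the listing before deleting anything inside it.
        for child in list(path.iterdir()):
            rmtree(child)
        path.rmdir()
    else:
        path.unlink()


root = pathlib.Path(tempfile.mkdtemp())
(root / "a" / "b").mkdir(parents=True)
(root / "a" / "b" / "file.txt").write_text("data")
rmtree(root)
print(root.exists())
```

Every file and directory costs several separate calls here, which is cheap locally but would translate into one or more network round trips per entry on a remote backend; that is the inefficiency the pitch refers to.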