Optimize pathlib path construction #101362
Comments
I'd like to land #101363 before I put the first PR up for this issue.
`PurePath` now normalises and splits paths only when necessary, e.g. when `.name` or `.parent` is accessed. The result is cached. This speeds up path object construction by around 4x. `PurePath.__fspath__()` now returns an unnormalised path, which should be transparent to filesystem APIs (else pathlib's normalisation is broken!). This extends the earlier performance improvement to most impure `Path` methods, and also speeds up pickling, `p.joinpath('bar')` and `p / 'bar'`. This also fixes pythonGH-76846 and pythonGH-85281 by unifying path constructors and adding an `__init__()` method.
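To illustrate the deferred-parsing idea in the commit message above, here is a minimal sketch. `LazyPath` and its attributes are hypothetical names made up for this example; it is not pathlib's actual implementation, it ignores drives and roots, and it only handles POSIX-style separators.

```python
import os
import posixpath


class LazyPath:
    """Hypothetical stand-in for PurePath; NOT pathlib's real code."""

    __slots__ = ("_raw", "_parts_cached")

    def __init__(self, *args):
        # Construction only joins the arguments into one string.
        # No normalisation or splitting happens here.
        self._raw = posixpath.join(*map(os.fspath, args)) if args else ""
        self._parts_cached = None

    def __fspath__(self):
        # Hand back the unnormalised string, as the commit message describes.
        return self._raw

    @property
    def _parts(self):
        # Split on first access, drop empty and "." segments, cache the result.
        if self._parts_cached is None:
            self._parts_cached = [
                part for part in self._raw.split("/") if part and part != "."
            ]
        return self._parts_cached

    @property
    def name(self):
        parts = self._parts
        return parts[-1] if parts else ""


p = LazyPath("foo//bar", "./baz.txt")
print(os.fspath(p))  # unnormalised string, no parsing has happened yet
print(p.name)        # triggers the split and caches it
```

In this sketch the cost of splitting is paid only by callers that ask for `.name`; code that merely passes the object to a filesystem API never pays it.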
This saves a comparison in `pathlib.Path.__new__()` and reduces the time taken to run `Path()` by ~5%
…b.PurePath This reduces the time taken to run `PurePath("foo")` by ~15%
The previous `_parse_args()` method pulled the `_parts` out of any supplied `PurePath` objects; these were subsequently joined in `_from_parts()` using `os.path.join()`. This is actually a slower form of joining than calling `fspath()` on the path object, because it doesn't take advantage of the fact that the contents of `_parts` is normalized! This reduces the time taken to run `PurePath("foo", "bar")` by ~20%, and the time taken to run `PurePath(p, "cheese")`, where `p = PurePath("/foo", "bar", "baz")`, by ~40%.
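For anyone wanting to check figures like these on their own interpreter, a rough `timeit` sketch follows. The `bench` helper, the iteration counts, and the choice of taking the best of five repeats are assumptions of this example rather than the methodology used in the PRs, and absolute numbers will vary by machine and Python version.

```python
import timeit

# Setup shared by every statement; `p` matches the example in the commit message.
SETUP = 'from pathlib import PurePath; p = PurePath("/foo", "bar", "baz")'


def bench(stmt, number=20_000, repeat=5):
    """Return the best observed time per call, in nanoseconds."""
    timings = timeit.repeat(stmt, setup=SETUP, number=number, repeat=repeat)
    return min(timings) / number * 1e9


for stmt in ('PurePath("foo")', 'PurePath("foo", "bar")', 'PurePath(p, "cheese")'):
    print(f"{stmt:26} {bench(stmt):10.0f} ns per call")
```

Running the same script before and after a change gives a relative comparison for the three constructor calls mentioned above.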
Does the PR cope with |
Could you clarify? The PRs maintain the behaviour that attempting to instantiate |
This behaviour should be preserved:
|
Right! That will be broken by #101667 as things stand:

>>> from os import fspath
>>> from pathlib import *
>>> p = PureWindowsPath("a/b/c")
>>> p
PureWindowsPath('a/b/c')
>>> fspath(p)
'a\\b\\c'
>>> PurePosixPath(fspath(p))
PurePosixPath('a\\b\\c')
>>> PurePosixPath(p)
PurePosixPath('a\\b\\c')

It doesn't appear to be documented or tested behaviour, and it feels odd to me that |
This feature doesn't really work with drives or roots:

>>> PurePosixPath(PureWindowsPath('//server/share/dir'))
PurePosixPath('\\\\server\\share\\/dir')
>>> PurePosixPath(PureWindowsPath('c:/dir'))
PurePosixPath('c:\\/dir')
>>> PurePosixPath(PureWindowsPath('/dir'))
PurePosixPath('\\/dir')

As far as I can tell, no one has ever logged a bug about it. However, using

>>> PurePosixPath(PureWindowsPath('//server/share/dir').as_posix())
PurePosixPath('//server/share/dir')
>>> PurePosixPath(PureWindowsPath('c:/dir').as_posix())
PurePosixPath('c:/dir')
>>> PurePosixPath(PureWindowsPath('/dir').as_posix())
PurePosixPath('/dir')

So I'm tempted to conclude that converting with |
I think the direct conversion should either be consistent with

Might be worth adding a short note to the docs warning that the constructor can't reliably convert from different

On another perf note, is it possible that parsing the path up front isn't necessary? Obviously it'll save the most time to keep a single string literal around and parse it later, but I don't personally have a good feel for whether that's common or not. (Obviously if it's available pre-parsed then keep it.)
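As a small runnable illustration of the conversion discussed in this thread, the sketch below goes through `as_posix()` instead of passing a `PureWindowsPath` object straight to the `PurePosixPath` constructor. The helper name `windows_to_posix` is hypothetical and exists only for this example.

```python
from pathlib import PurePosixPath, PureWindowsPath


def windows_to_posix(path):
    """Convert a PureWindowsPath to a PurePosixPath via its forward-slash form."""
    # as_posix() renders the path with "/" separators, so the POSIX parser
    # sees ordinary separators rather than literal backslashes.
    return PurePosixPath(path.as_posix())


for text in ("//server/share/dir", "c:/dir", "/dir"):
    print(repr(windows_to_posix(PureWindowsPath(text))))
```

This mirrors the `as_posix()` calls in the REPL session above; it does not change what the constructor itself does with a path object of a different flavour.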
That's my plan! We can return the unnormalized path from |
…H-101664) This saves a comparison in `pathlib.Path.__new__()` and reduces the time taken to run `Path()` by ~5%. Automerge-Triggered-By: GH:AlexWaygood
The previous `_parse_args()` method pulled the `_parts` out of any supplied `PurePath` objects; these were subsequently joined in `_from_parts()` using `os.path.join()`. This is actually a slower form of joining than calling `fspath()` on the path object, because it doesn't take advantage of the fact that the contents of `_parts` is normalized! This reduces the time taken to run `PurePath("foo", "bar")` by ~20%, and the time taken to run `PurePath(p, "cheese")`, where `p = PurePath("/foo", "bar", "baz")`, by ~40%. Automerge-Triggered-By: GH:AlexWaygood
Pathlib is slow. One of the most obvious symptoms is that `pathlib.PurePath` objects are slow to construct. We should be able to speed construction up without making other parts of pathlib slower.

Two possible approaches:
`__new__()`, `_from_parts()`, `_parse_parts()`, `_parse_args()`.

Linked PRs