Optimize pathlib path pickling #112855
barneygale added a commit to barneygale/cpython that referenced this issue on Dec 7, 2023
The second item in the tuple returned from `__reduce__()` is a tuple of arguments to supply to the path constructor. Previously we returned the `parts` tuple here, which entailed joining, parsing and normalising the path object, and produced a compact pickle representation. With this patch, we instead return a tuple of the paths that were originally given to the path constructor. This makes pickling much faster (at the expense of compactness). By also no longer calling `sys.intern()` on the path parts, we slightly speed up path parsing/normalisation more generally.
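The difference between the two `__reduce__()` strategies described above can be sketched with a toy class. This is a minimal illustration, not pathlib's actual implementation: `ToyPath`, `ToyPathCompact` and the simplified `parts` parsing are hypothetical names and logic invented for the example.

```python
class ToyPath:
    """Hypothetical stand-in for a path class; not pathlib's real code."""

    def __init__(self, *args):
        # Store the constructor arguments untouched; parse only on demand.
        self._raw_paths = [str(a) for a in args]

    @property
    def parts(self):
        # Join, then split: a simplified stand-in for parsing/normalising.
        joined = "/".join(self._raw_paths)
        return tuple(p for p in joined.split("/") if p and p != ".")

    def __eq__(self, other):
        return isinstance(other, ToyPath) and self.parts == other.parts

    def __hash__(self):
        return hash(self.parts)

    def __reduce__(self):
        # Newer approach: return the raw constructor arguments, no parsing.
        return (self.__class__, tuple(self._raw_paths))


class ToyPathCompact(ToyPath):
    def __reduce__(self):
        # Older approach: parse first, then return the normalised parts.
        return (self.__class__, self.parts)


fast = ToyPath("a/b", "./c")
compact = ToyPathCompact("a/b", "./c")

# pickle rebuilds an object by calling callable(*args) on the reduce value.
fast_cls, fast_args = fast.__reduce__()
compact_cls, compact_args = compact.__reduce__()
rebuilt_fast = fast_cls(*fast_args)
rebuilt_compact = compact_cls(*compact_args)
```

Both variants rebuild an equal object. The first skips the parsing work at pickling time, while the second stores the shorter, already-normalised form.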
barneygale added a commit to barneygale/cpython that referenced this issue on Dec 18, 2023
Add a few more simple test cases, like non-anchored paths. Remove misplaced and indirect test that pickling doesn't change the `stat()` value.
barneygale added a commit to barneygale/cpython that referenced this issue on Dec 22, 2023
barneygale added a commit that referenced this issue on Dec 22, 2023
…13243) Add a few more simple test cases, like non-anchored paths. Remove misplaced and indirect test that pickling doesn't change the `stat()` value.
ryan-duve pushed a commit to ryan-duve/cpython that referenced this issue on Dec 26, 2023
…ng (python#113243) Add a few more simple test cases, like non-anchored paths. Remove misplaced and indirect test that pickling doesn't change the `stat()` value.
kulikjak pushed a commit to kulikjak/cpython that referenced this issue on Jan 22, 2024
…ng (python#113243) Add a few more simple test cases, like non-anchored paths. Remove misplaced and indirect test that pickling doesn't change the `stat()` value.
aisk pushed a commit to aisk/cpython that referenced this issue on Feb 11, 2024
…ng (python#113243) Add a few more simple test cases, like non-anchored paths. Remove misplaced and indirect test that pickling doesn't change the `stat()` value.
barneygale added a commit that referenced this issue on Apr 20, 2024
The second item in the tuple returned from `__reduce__()` is a tuple of arguments to supply to the path constructor. Previously we returned the `parts` tuple here, which entailed joining, parsing and normalising the path object, and produced a compact pickle representation. With this patch, we instead return a tuple of the paths that were originally given to the path constructor. This makes pickling much faster (at the expense of compactness). It's worth noting that, in the olden times, pathlib performed this parsing/normalisation up-front in every case, and so using `parts` for pickling was almost free. Nowadays pathlib only parses/normalises paths when it's necessary or advantageous to do so (e.g. computing a path parent, or iterating over a directory, respectively).
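Whichever form the arguments take, the observable contract is that the unpickled path compares equal to the original. The following sketch checks that using only public API (`pickle` and `pathlib.PurePosixPath`). The example path is arbitrary, and the exact contents of the argument tuple depend on the Python version, so the code does not assume them.

```python
import pickle
from pathlib import PurePosixPath

original = PurePosixPath("/usr", "lib/python3", "site-packages")

# __reduce__() returns (callable, args); unpickling calls callable(*args).
constructor, args = original.__reduce__()
rebuilt = constructor(*args)

# A full round trip through pickle should give an equal path as well.
restored = pickle.loads(pickle.dumps(original))

# Parsing still happens, but only when something such as .parts needs it.
restored_parts = restored.parts
```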
`pathlib.PurePath.__reduce__()` currently accesses and returns the `parts` tuple. Pathlib ensures that the strings therein are interned.

There's a good reason to do this: it ensures that the pickled data is as small as possible, with maximum re-use of small string objects.

However, it comes with some disadvantages:

- Normalising a path calls `sys.intern(str(part))` on each part.
- Pickling a path requires joining, parsing and normalising it to build the `parts` tuple.

We could instead make `__reduce__()` return the raw paths fed to the constructor (the `_raw_paths` attribute). This would be faster but less space efficient. With the cost of storage and bandwidth falling at a faster rate than compute, I suspect this trade-off is worth making.

Linked PRs
- `pathlib.PurePath` pickling #112856
- `pathlib.PurePath` pickling #113243
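The space argument in the issue text above (interned parts allow re-use of small string objects in the pickle) can be illustrated without pathlib. This is a rough sketch relying on CPython behaviour: pickle memoises by object identity, so an equal string is written in full only once when both occurrences are the same object. The variable names are invented for the example.

```python
import pickle
import sys

# Build two equal strings at runtime so that they are distinct objects.
first = "".join(["site-", "packages"])
second = "".join(["site-", "packages"])

# Distinct objects: pickle writes the string contents twice.
distinct_bytes = pickle.dumps([first, second])

# Interned: both names refer to one object, so the second occurrence
# is written as a short back-reference to the first.
shared_bytes = pickle.dumps([sys.intern(first), sys.intern(second)])

saved = len(distinct_bytes) - len(shared_bytes)
```

The saving per repeated part is small, which is the trade-off the issue weighs against the cost of interning and normalising.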