Optimize pathlib path pickling #112855

Closed
barneygale opened this issue Dec 7, 2023 · 0 comments
Labels
performance Performance or resource usage topic-pathlib

barneygale commented Dec 7, 2023

`pathlib.PurePath.__reduce__()` currently accesses and returns the `parts` tuple. Pathlib ensures that the strings therein are interned.

There's a good reason to do this: it ensures that the pickled data is as small as possible, with maximum re-use of small string objects.
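The saving comes from how pickle treats repeated objects: its memo is keyed on object identity, so a string object that appears many times is written once and referenced afterwards. The sketch below illustrates that mechanism with plain tuples rather than pathlib itself. `build_rows` is a hypothetical helper, and the exact byte counts depend on the pickle protocol in use.

```python
import pickle
import sys


def build_rows(count, intern):
    """Build `count` tuples of path-like parts (hypothetical helper)."""
    rows = []
    for i in range(count):
        # str.join() creates a fresh string object on every call, so without
        # interning each row holds its own copies of "usr" and "lib".
        parts = ["".join(["us", "r"]), "".join(["li", "b"]), f"file{i}"]
        if intern:
            # sys.intern() maps equal strings onto one shared object.
            parts = [sys.intern(part) for part in parts]
        rows.append(tuple(parts))
    return rows


plain_size = len(pickle.dumps(build_rows(100, intern=False)))
interned_size = len(pickle.dumps(build_rows(100, intern=True)))
print(plain_size, interned_size)
```

The interned variant should come out smaller, because every repeat of `"usr"` and `"lib"` is stored as a short memo reference instead of a full string.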

However, it comes with some disadvantages:

  1. When normalising any path, we need to call `sys.intern(str(part))` on each part.
  2. When pickling a path, we must join, parse and normalise it, and then generate the `parts` tuple.

We could instead make `__reduce__()` return the raw paths fed to the constructor (the `_raw_paths` attribute). This would be faster but less space-efficient. With the cost of storage and bandwidth falling faster than the cost of compute, I suspect this trade-off is worth making.
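To make the two strategies concrete, here is a minimal sketch using a toy class, not pathlib's actual implementation. `ToyPath`, `PartsReducingToyPath` and `rebuild` are hypothetical names, the parsing is deliberately simplified (POSIX only, no special handling of a leading `//`), and `rebuild` just mimics what unpickling does with a `__reduce__()` result.

```python
import posixpath


class ToyPath:
    """Hypothetical stand-in for a pure POSIX path; not pathlib's code."""

    def __init__(self, *raw_paths):
        self._raw_paths = tuple(raw_paths)  # kept verbatim, nothing parsed yet
        self._parts = None                  # computed on first access

    @property
    def parts(self):
        if self._parts is None:
            # Join, parse and normalise: the work the proposal avoids doing
            # merely to pickle a path.
            joined = posixpath.join(*self._raw_paths) if self._raw_paths else ""
            names = [name for name in joined.split("/") if name not in ("", ".")]
            anchor = ["/"] if joined.startswith("/") else []
            self._parts = tuple(anchor + names)
        return self._parts

    def __eq__(self, other):
        return isinstance(other, ToyPath) and self.parts == other.parts

    def __hash__(self):
        return hash(self.parts)

    def __reduce__(self):
        # Proposed strategy: return the constructor inputs untouched.
        return (self.__class__, self._raw_paths)


class PartsReducingToyPath(ToyPath):
    def __reduce__(self):
        # Original strategy: return the normalised parts, forcing a parse.
        return (self.__class__, self.parts)


def rebuild(path):
    """Mimic unpickling of a __reduce__() result by calling cls(*args)."""
    cls, args = path.__reduce__()
    return cls(*args)


raw = ToyPath("/usr", "lib//python", "./site-packages")
raw_copy = rebuild(raw)

old = PartsReducingToyPath("/usr", "lib//python", "./site-packages")
old_copy = rebuild(old)
```

In this sketch, reducing `raw` never touches `parts`, so the path stays unparsed, whereas reducing `old` has to parse it first. Both copies compare equal to their originals once `parts` is eventually computed.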

@barneygale barneygale added performance Performance or resource usage topic-pathlib labels Dec 7, 2023
barneygale added a commit to barneygale/cpython that referenced this issue Dec 7, 2023
The second item in the tuple returned from `__reduce__()` is a tuple of
arguments to supply to the path constructor. Previously we returned the
`parts` tuple here, which entailed joining, parsing and normalising the path
object, and produced a compact pickle representation.

With this patch, we instead return a tuple of the paths that were originally
given to the path constructor. This makes pickling much faster (at the
expense of compactness). By also no longer calling `sys.intern()` on the path
parts, we slightly speed up path parsing/normalization more generally.
barneygale added a commit to barneygale/cpython that referenced this issue Dec 18, 2023
Add a few more simple test cases, like non-anchored paths. Remove misplaced
and indirect test that pickling doesn't change the `stat()` value.
barneygale added a commit to barneygale/cpython that referenced this issue Dec 22, 2023
barneygale added a commit that referenced this issue Dec 22, 2023
…13243)

Add a few more simple test cases, like non-anchored paths. Remove misplaced
and indirect test that pickling doesn't change the `stat()` value.
barneygale added a commit that referenced this issue Apr 20, 2024
The second item in the tuple returned from `__reduce__()` is a tuple of arguments to supply to the path constructor. Previously we returned the `parts` tuple here, which entailed joining, parsing and normalising the path object, and produced a compact pickle representation.

With this patch, we instead return a tuple of paths that were originally given to the path constructor. This makes pickling much faster (at the expense of compactness).

It's worth noting that, in the olden times, pathlib performed this parsing/normalization up-front in every case, and so using `parts` for pickling was almost free. Nowadays pathlib only parses/normalises paths when it's necessary or advantageous to do so (e.g. computing a path parent, or iterating over a directory, respectively).
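
The `(callable, args)` pair described above can be inspected directly on a real path. This is a small sketch rather than part of the patch. What `args` contains depends on the interpreter: builds without this change should show the normalised `parts`, while builds that include it should show the strings as originally passed in. Either way the round trip yields an equal path.

```python
import pickle
from pathlib import PurePosixPath

path = PurePosixPath("/usr", "lib//python", "./site-packages")

# __reduce__() returns (callable, args); pickle stores both and later calls
# callable(*args) to rebuild the object.
cls, args = path.__reduce__()
print(cls.__name__, args)

# A full pickle round trip goes through the same pair.
restored = pickle.loads(pickle.dumps(path))
print(restored == path)
```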