
gh-93714 Create fast version of match_keys for exact dict #93752


Open: wants to merge 2 commits into main

Conversation

da-woods (Contributor)

When the type is an exact dict, there's a real speed-up to be gained by skipping the call to "get" and using PyDict_GetItemWithError instead. The implementation is just a slightly modified copy of the existing match_keys function.
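To make the contrast concrete, here is a rough Python-level sketch of the two lookup strategies. This is illustrative only: the actual change is to the C match_keys helper in Python/ceval.c, and the function and sentinel names below are made up for the sketch.

```python
# Illustrative Python-level model of the C match_keys logic.
# _MISSING plays the role of the "dummy" sentinel object created by
# the real implementation.
_MISSING = object()

def match_keys_generic(subject, keys):
    """Generic mapping path: look up the subject's "get" method once,
    then probe every key through get() with a sentinel default."""
    get = subject.get
    values = []
    for key in keys:
        value = get(key, _MISSING)
        if value is _MISSING:
            return None  # key absent: the mapping pattern fails
        values.append(value)
    return tuple(values)

def match_keys_exact_dict(subject, keys):
    """Proposed fast path for exact dicts: index the dict directly
    (PyDict_GetItemWithError at the C level), skipping both the "get"
    lookup and the sentinel object."""
    values = []
    for key in keys:
        try:
            values.append(subject[key])
        except KeyError:
            return None  # key absent: the mapping pattern fails
    return tuple(values)
```

The fast path is only safe when type(subject) is exactly dict: a subclass may override get or __getitem__, so subclasses must keep taking the generic path.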

@AA-Turner added the performance (Performance or resource usage) and interpreter-core (Objects, Python, Grammar, and Parser dirs) labels on Jun 12, 2022
@sweeneyde (Member)

Do you have any benchmark results to prove whether this is worth it?

@da-woods (Contributor, Author) commented Jun 13, 2022

I'd made a set of microbenchmarks for pattern matching. This change obviously only affects the first test subject (the dict). I've also included the second subject (an inexact dict) below for comparison.

  • Without:
subject {'a': 'xxx', 1: 500} - time (μs): 1.29 1.28 1.28 1.29 1.28
subject D({'a': 'xxx', 1: 500}) - time (μs): 1.29 1.28 1.29 1.28 1.29
...
  • With:
subject {'a': 'xxx', 1: 500} - time (μs): 1.05 1.04 1.05 1.04 1.06
subject D({'a': 'xxx', 1: 500}) - time (μs): 1.33 1.27 1.27 1.27 1.27
...

It's about a 20% speed-up here. If I increase the number of keys tested, so that the subject is {'a': 'xxx', 1: 500, 'b': 1, 'c': 2, 'd': 3} and the pattern is {"a": "xxx", A.b: x, "b": _, "c": _, "d": _, **extra}, then the speed-up remains about 0.2 μs per call, suggesting that what's being eliminated is the constant per-call work of looking up subject.get and creating the dummy sentinel object.

So the benefit is probably "real, but not huge".

(I should add: I'm just compiling Python with make and no special C flags. The Python 3.10 version pre-installed on my system is notably quicker than both versions here, so I suspect I'm not getting the right compiler flags to really test it)
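For reference, here is a minimal sketch of the kind of microbenchmark described above. It is a hypothetical reconstruction, not the actual script: the class definitions, iteration counts, and timing loop are all assumptions. Requires Python 3.10+ for the match statement.

```python
import timeit

class D(dict):
    """Inexact dict subclass: it cannot take the exact-dict fast path."""
    def __repr__(self):
        return f"D({dict(self)!r})"

class A:
    b = 1  # value-pattern key, matching the literal key 1 in the subjects

def match(subject):
    match subject:
        case {"a": "xxx", A.b: x}:
            return x
        case _:
            return None

for subject in ({"a": "xxx", 1: 500}, D({"a": "xxx", 1: 500})):
    times = timeit.repeat(lambda: match(subject), number=100_000, repeat=5)
    # timeit reports seconds per 100,000 calls; x10 converts to μs/call.
    print(f"subject {subject!r} - time (μs):",
          " ".join(f"{t * 10:.2f}" for t in times))
```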

@da-woods (Contributor, Author)

With --enable-optimizations --with-lto I get:

With this change:

subject {'a': 'xxx', 1: 500} - time (μs): 0.506 0.505 0.506 0.504 0.506
subject D({'a': 'xxx', 1: 500}) - time (μs): 0.593 0.596 0.594 0.594 0.597

and without:

subject {'a': 'xxx', 1: 500} - time (μs): 0.569 0.574 0.574 0.577 0.578
subject D({'a': 'xxx', 1: 500}) - time (μs): 0.578 0.58 0.585 0.583 0.584

So a fairly similar level of speed-up.

@markshannon (Member)

I think this is heading in the wrong direction.
We should be improving the performance of pattern matching by producing better bytecode, not by making the interpreter more complex.

@da-woods (Contributor, Author)

> I think this is heading in the wrong direction. We should be improving the performance of pattern matching by producing better bytecode, not by making the interpreter more complex.

I agree at least in principle - I know @brandtbucher said he'd done some work on that. I was mainly proposing this PR as a fairly simple short-term improvement, and I'd ultimately expect it to be superseded by better bytecode.

Please do reject it if you don't think the extra code is worth the improvement, though.

@markshannon (Member)

I won't close this just yet.
We might choose to use it as a temporary measure, until the compiler generates better code for matching mappings.
@brandtbucher thoughts?

Labels: awaiting review, interpreter-core (Objects, Python, Grammar, and Parser dirs), performance (Performance or resource usage)