Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Specialized match_keys for exact dictionary type #93714

Open
da-woods opened this issue Jun 11, 2022 · 6 comments
Open

Specialized match_keys for exact dictionary type #93714

da-woods opened this issue Jun 11, 2022 · 6 comments
Labels
performance type-feature

Comments

@da-woods
Copy link
Contributor

@da-woods da-woods commented Jun 11, 2022

Feature or enhancement

I think it's probably worthwhile to write a specialized version of match_keys (for structural pattern matching of mappings) for exact dictionary types

Pitch

The most common mapping type is almost certainly the built-in dict. The match_keys function in ceval.c could be specialized to handle this type. Doing so would allow you to skip the dummy object, and replace the get call with PyDict_GetItemWithError. I don't think this should change the observable behaviour at all for exact dict

I'm in the process of reimplementing structural pattern matching on Cython, and this change gives ~2x optimization of some microbenchmarks.

If this change is deemed desirable I'm happy to implement it myself.

Previous discussion

Not aware of any

@da-woods da-woods added the type-feature label Jun 11, 2022
@Fidget-Spinner
Copy link
Member

@Fidget-Spinner Fidget-Spinner commented Jun 11, 2022

What you mean is something like

if(PyDIct_CheckExact(o)) {
    PyDIct_GetItemWIthError(o, key)
}
else {
// The current code goes here
}

right? (Whenever I see the word "specialize" now I immediately think of PEP 659, so I wanted to clear that up).

@da-woods
Copy link
Contributor Author

@da-woods da-woods commented Jun 11, 2022

Link to the current match_keys function just to make it easier to discuss:

match_keys(PyThreadState *tstate, PyObject *map, PyObject *keys)

I was thinking of having a second copy of the function that operates on dicts. So match_keys starts something like:

if (PyDict_CheckExact(map)) {
    return match_keys_dict(tstate, map, keys);
}

and match_keys_dict then uses PyDIct_GetItemWIthError(o, key) instead of calling mapping.get.

And no - I wasn't proposing anything like PEP 659 - just an if statement to pick a slightly faster implementation

@AlexWaygood AlexWaygood added the performance label Jun 11, 2022
@Fidget-Spinner
Copy link
Member

@Fidget-Spinner Fidget-Spinner commented Jun 11, 2022

In theory this seems sound to me and I'm +1. But I'll wait to hear what @brandtbucher has to say.

@brandtbucher
Copy link
Member

@brandtbucher brandtbucher commented Jun 12, 2022

I actually discovered during a PyCon all-nighter (thanks to some nerd-sniping by @tonybaloney) that we can get a huge speedup for mapping patterns by moving this logic into the bytecode and:

  • removing the length check
  • using CONTAINS_OP and BINARY_SUBSCR instead of two-argument get
  • not building a set to check for duplicate keys if they’re all known statically (common case)
  • some other tidying up

As a bonus, with this scheme we get dictionary specialization for free, since the logic is being moved “into Python code”.

My plan is to make this change in 3.12. I have a branch with the work, just haven’t gotten around to cleaning it up and opening a PR yet. 😬

I’d be okay merging a PR that does what you describe now since it’s always better to make things faster. But just know that there’s a pretty good chance it could get completely ripped out and replaced before 3.12 is released, haha.

@TeamSpen210
Copy link

@TeamSpen210 TeamSpen210 commented Jun 12, 2022

Unfortunately it’s documented that get has to be used, to prevent defaultdict’s behaviour from being triggered.

@brandtbucher
Copy link
Member

@brandtbucher brandtbucher commented Jun 12, 2022

Indeed, that’s the main sticking point. Although I'm pretty confident that it won't be much of an issue because:

  • We already require the Mapping interface, which includes containment and subscription
  • Checking for containment before subscription still avoids creating new keys in defaultdicts, which is the main reason why we went with 2-argument get in the first place

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
performance type-feature
Projects
None yet
Development

No branches or pull requests

5 participants