
bpo-39308: Make _tp_cache typed #17974

Open · wants to merge 2 commits into base: master
Conversation


kornicameister commented Jan 12, 2020


the-knights-who-say-ni commented Jan 12, 2020

Hello, and thanks for your contribution!

I'm a bot set up to make sure that the project can legally accept this contribution by verifying everyone involved has signed the PSF contributor agreement (CLA).

CLA Missing

Our records indicate the following people have not signed the CLA:

@kornicameister

For legal reasons we need all the people listed to sign the CLA before we can look at your contribution. Please follow the steps outlined in the CPython devguide to rectify this issue.

If you have recently signed the CLA, please wait at least one business day for our records to update.

You can check yourself to see if the CLA has been received.

Thanks again for the contribution, we look forward to reviewing it!

kornicameister (Author) commented Jan 12, 2020

OK, I hadn't read the entire contributing docs and missed the News entry; I will deal with that shortly.
I am more worried about the missing CLA, which I signed just yesterday :/

I just signed the CLA again and changed my bugs.python.org username to match my GitHub account (not even sure that was needed, but better safe than sorry). Not sure there's anything else I can do now; I thought I had completed the process correctly.

gvanrossum (Member) left a comment

Looks good. CLA usually takes a business day.

kornicameister (Author) commented Jan 13, 2020

@gvanrossum alright, let's wait for the CLA to be processed. I assume my job here is done and the bot will simply change the label once the CLA goes through.

cheers 👍

ilevkivskyi (Contributor) commented Jan 13, 2020

I am a bit worried about the performance impact. This is relevant only for `Literal[...]` but will affect all generics like `List[int]`. If the performance impact is small, then it is OK; otherwise I would use the typed cache version only for `Literal[...]`.
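For context, a minimal demonstration of why `Literal[...]` needs a typed cache in the first place (bpo-39308): `1 == True` and `hash(1) == hash(True)`, so an untyped `lru_cache` conflates them. The `getitem` function below is a hypothetical stand-in for the cached `__getitem__` in typing.py, not the real code:

```python
import functools

def getitem(self, param):
    # stand-in for the parametrization work typing's __getitem__ does
    return f"{self}[{param!r}]"

untyped = functools.lru_cache(typed=False)(getitem)
typed = functools.lru_cache(typed=True)(getitem)

print(untyped("Literal", 1))     # Literal[1]
print(untyped("Literal", True))  # Literal[1]  <- wrong: ("Literal", True) == ("Literal", 1), cache hit
print(typed("Literal", 1))       # Literal[1]
print(typed("Literal", True))    # Literal[True] <- argument types are part of the key, no collision
```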

ilevkivskyi (Contributor) commented Jan 13, 2020

(Removed the auto-merge label until the perf question is resolved.)

kornicameister (Author) commented Jan 13, 2020

@ilevkivskyi seems a bit contradictory ;) a performance issue caused by caching. But that's worth checking anyway; I will run such a comparison later on.

ilevkivskyi (Contributor) commented Jan 13, 2020

What is contradictory here? Typed cache lookup will be slower (because we need to compare the types); the question is how much slower.
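For a rough sense of that cost, the hit path alone can be micro-benchmarked with a plain `lru_cache`; this is an illustrative sketch (names and the workload are mine, not from the PR), and absolute numbers will vary by machine:

```python
import functools
import timeit

def f(self, param):
    return param

untyped = functools.lru_cache(typed=False)(f)
typed = functools.lru_cache(typed=True)(f)
untyped("L", 1)
typed("L", 1)  # populate both caches so only the hit path is timed

# each call times one million cache hits
print(timeit.timeit('untyped("L", 1)', globals=globals(), number=1_000_000))
print(timeit.timeit('typed("L", 1)', globals=globals(), number=1_000_000))
```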

gvanrossum (Member) commented Jan 13, 2020

Looking at the source, the difference in the cache is that with `typed=False`, the cache key used for `Literal[1]` will be the integer object `1`, while with `typed=True` it will construct a tuple `(1, int)`. Or, more likely, since `@_tp_cache` is applied to a method (`__getitem__`), the cache key without `typed` will be the tuple `(self, 1)` and the cache key with it will be `(self, 1, Literal, int)`. So cache key construction will be a bit slower, and cache lookup will be too. Using `timeit` it should be easy for the OP to check whether this is significant. (Note that the "real" code is written in C, and I didn't look at it, but IIRC it faithfully follows the Python version linked above.)
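The key shapes described above can be inspected directly with `functools._make_key`, the private helper behind the pure-Python `lru_cache` (private API, shown purely for illustration; the C implementation builds equivalent keys internally). `Form` is a hypothetical stand-in for the special form passed as `self`:

```python
from functools import _make_key  # private helper; do not rely on it in real code

class Form:
    pass

form = Form()

print(_make_key((form, 1), {}, typed=False))  # [<Form ...>, 1]
print(_make_key((form, 1), {}, typed=True))   # [<Form ...>, 1, <class 'Form'>, <class 'int'>]

# untyped keys for 1 and True compare equal; typed keys do not
print(_make_key((form, 1), {}, typed=False) == _make_key((form, True), {}, typed=False))  # True
print(_make_key((form, 1), {}, typed=True) == _make_key((form, True), {}, typed=True))    # False
```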

kornicameister (Author) commented Jan 13, 2020

Results:

| stmt | typed=False | typed=True | 2 caches |
| --- | --- | --- | --- |
| t.Literal[1] | 0.46885ms | 0.53354ms | 0.52857ms |
| t.Literal[1] | 0.43094ms | 0.49489ms | 0.50706ms |
| (t.Literal[1], t.Literal[1], t.Literal[0], t.Literal[0]) | 1.71090ms | 1.99839ms | 2.08259ms |
| t.Literal[True] | 0.42652ms | 0.46590ms | 0.51885ms |
| t.Literal[True, False] | 0.42159ms | 0.46590ms | 0.53356ms |
| t.Literal[0] | 0.41263ms | 0.46680ms | 0.51893ms |
| t.Literal[0, 1] | 0.42797ms | 0.47858ms | 0.54927ms |
| t.List[t.Literal[1,2,3]] | 1.68909ms | 1.85799ms | 1.88905ms |
| t.Union[float, complex] | 0.57196ms | 0.55289ms | 0.58154ms |
| t.Union[str, bytes, float, complex, t.Literal[1]] | 1.57654ms | 1.68061ms | 1.74077ms |
| t.NewType("NT", int) | 0.31505ms | 0.30028ms | 0.29731ms |
| t.Literal[1,2,3,4,5,6,7,8] | 0.44533ms | 0.49266ms | 0.54586ms |
| t.Dict[t.AnyStr, complex] | 0.71057ms | 0.80595ms | 0.78011ms |
| t.Set[t.AnyStr] | 0.67253ms | 0.72656ms | 0.68497ms |
| t.Set[t.Union[float, int]] | 1.82547ms | 1.99398ms | 2.18703ms |
| t.AbstractSet[t.Union[float, int]] | 1.84454ms | 1.96812ms | 2.12587ms |
| t.Sequence[t.Union[float, int]] | 1.88198ms | 1.98809ms | 2.01617ms |
| t.NamedTuple("A", x=int) | 56.12496ms | 55.05577ms | 56.93304ms |
| t.NamedTuple("B", x=int,y=float) | 60.79059ms | 62.23118ms | 63.67331ms |
| t.NamedTuple("C", x=int,y=float,z=complex) | 65.46186ms | 63.83327ms | 66.34171ms |

Script (run once against each typing.py variant; the three columns above correspond to the three runs):

```python
import time
import timeit as tt
import typing as t

stmts = (
    't.Literal[1]',
    't.Literal[1]',
    '(t.Literal[1], t.Literal[1], t.Literal[0], t.Literal[0])',
    't.Literal[True]',
    't.Literal[True, False]',
    '(t.Literal[True, False], t.Literal[True, False], t.Literal[True, False], t.Literal[True, False], t.Literal[True, False])',
    't.Literal[0]',
    't.Literal[0, 1]',
    't.List[t.Literal[1,2,3]]',
    't.Union[float, complex]',
    't.Union[str, bytes, float, complex, t.Literal[1]]',
    't.NewType("NT", int)',
    't.Literal[1,2,3,4,5,6,7,8]',
    't.Dict[t.AnyStr, complex]',
    't.Set[t.AnyStr]',
    't.Set[t.Union[float, int]]',
    't.AbstractSet[t.Union[float, int]]',
    't.Sequence[t.Union[float, int]]',
    't.NamedTuple("A", x=int)',
    't.NamedTuple("B", x=int,y=float)',
    't.NamedTuple("C", x=int,y=float,z=complex)',
)
repeats = 300
exec_number = 1000

for stmt in stmts:
    # each repeat runs the statement exec_number times
    times = tt.repeat(
        stmt,
        globals={'t': t},
        number=exec_number,
        repeat=repeats,
        timer=time.perf_counter,
    )
    avg = sum(times) / repeats
    print(f'{stmt} => {avg * 1000.0:.5f}ms')
```

Indeed, typed=True has a performance impact, but the worst results can be observed when there are two caches, one with typed=True and the other with typed=False, i.e.:

```python
# inside _tp_cache; func and _cleanups as in typing.py
normal_cache = functools.lru_cache(typed=False)(func)
literal_cached = functools.lru_cache(typed=True)(func)

_cleanups.append(normal_cache.cache_clear)
_cleanups.append(literal_cached.cache_clear)
```
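Presumably the "2 caches" variant also has to pick a cache per call, continuing the snippet above roughly like this reconstruction (my guess at the benchmarked setup, not code from the PR; it assumes the special form exposes its name via `_name`, as `_SpecialForm` does in typing.py). The extra dispatch branch on top of the typed lookup would explain why it measures slowest:

```python
@functools.wraps(func)
def inner(*args, **kwds):
    # route Literal[...] to the typed cache, everything else to the untyped one
    if args and getattr(args[0], '_name', None) == 'Literal':
        cached = literal_cached
    else:
        cached = normal_cache
    try:
        return cached(*args, **kwds)
    except TypeError:
        pass  # unhashable arguments fall through to the uncached call
    return func(*args, **kwds)
```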
gvanrossum (Member) commented Jan 13, 2020

kornicameister (Author) commented Jan 13, 2020

> [...] a bit worried about the performance impact. This is relevant only for Literal[...] but will affect all generics like List[int]. If the performance impact is small, then it is OK; otherwise I would use the typed cache version only for Literal[...].

@ilevkivskyi suggested having the typed=True cache just for Literal, as sketched below.
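One way to scope it, sketched under the assumption that `_tp_cache` keeps its current try/except shape in typing.py (this is an illustration of the suggestion, not this PR's diff): give `_tp_cache` a `typed` flag and opt in only at the `Literal` call site:

```python
import functools

_cleanups = []

def _tp_cache(func=None, /, *, typed=False):
    """Cache __getitem__ of generic types; typed=True also keys on argument types."""
    def decorator(func):
        cached = functools.lru_cache(typed=typed)(func)
        _cleanups.append(cached.cache_clear)

        @functools.wraps(func)
        def inner(*args, **kwds):
            try:
                return cached(*args, **kwds)
            except TypeError:
                pass  # unhashable arguments, e.g. Literal[[1, 2]]
            return func(*args, **kwds)

        return inner

    if func is not None:
        return decorator(func)  # used as a bare @_tp_cache: untyped, as today
    return decorator            # used as @_tp_cache(typed=True): only for Literal
```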

kornicameister (Author) commented Jan 13, 2020

I was hoping there might be a way to affect __eq__ or __hash__ for Literal, but given how it is constructed, I don't see how that can be achieved at a reasonable cost.
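For the record, the alternative being ruled out would look roughly like this hypothetical wrapper (not in the PR): make equality type-aware at the argument level before values reach the cache, at the price of an extra allocation per parameter on every `Literal[...]` lookup:

```python
class _TypedKey:
    """Hypothetical wrapper making 1, True and 1.0 distinct as cache keys."""
    __slots__ = ('value',)

    def __init__(self, value):
        self.value = value

    def __eq__(self, other):
        if not isinstance(other, _TypedKey):
            return NotImplemented
        return (type(self.value) is type(other.value)
                and self.value == other.value)

    def __hash__(self):
        # (int, 1) and (bool, True) hash differently, so 1 and True no longer collide
        return hash((type(self.value), self.value))

assert _TypedKey(1) == _TypedKey(1)
assert _TypedKey(1) != _TypedKey(True)
```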
