Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Change in semantics and much worse performance for enum members. #93910

Open
markshannon opened this issue Jun 16, 2022 · 76 comments
Open

Change in semantics and much worse performance for enum members. #93910

markshannon opened this issue Jun 16, 2022 · 76 comments
Assignees
Labels
3.12 new features, bug and security fixes deferred-blocker performance Performance or resource usage stdlib Python modules in the Lib dir

Comments

@markshannon
Copy link
Member

markshannon commented Jun 16, 2022

Given the enum:

class Colours(Enum):
    RED = 1

In Python 3.9 and 3.10:

>>> Colours.__dict__["RED"] is Colours.RED
True
>>> Colours.RED.RED
<Colours.RED: 1>

In Python 3.11:

>>> Colours.__dict__["RED"] is Colours.RED
False
>>> Colours.RED.RED
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/mark/repos/cpython/Lib/enum.py", line 198, in __get__
    raise AttributeError(
    ^^^^^^^^^^^^^^^^^^^^^
AttributeError: <enum 'Colours'> member has no attribute 'RED'

While these might seem like minor semantic changes, there is also a large performance impact.
Lookup of Colours.RED is simple and efficient in 3.10, but involves a lot of indirection and dispatching through the enum.property class in 3.11.
The performance impact is likely to get worse in 3.12, as we optimize more kinds of attributes.

Introduced in c314e60, I believe.

@pablogsal
@ethanfurman

@markshannon markshannon added type-bug An unexpected behavior, bug, or error 3.11 bug and security fixes 3.12 new features, bug and security fixes labels Jun 16, 2022
@markshannon markshannon added deferred-blocker performance Performance or resource usage labels Jun 16, 2022
@AlexWaygood AlexWaygood added the stdlib Python modules in the Lib dir label Jun 16, 2022
@ethanfurman ethanfurman removed type-bug An unexpected behavior, bug, or error deferred-blocker labels Jun 22, 2022
@ethanfurman
Copy link
Member

Those changes are intentional, and remedy an issue present since 3.5. As for performance, the work @markshannon and others have done has already had a considerable impact on making enum faster in 3.12.

@ethanfurman ethanfurman self-assigned this Jun 22, 2022
@markshannon
Copy link
Member Author

What issue does the change remedy?

I'm puzzled by your claim that the work we've done has sped up enums for 3.12? It hasn't. Without c314e60, it would have.

@markshannon
Copy link
Member Author

It looks like the change in Colours.RED.RED has been brought up before: #87328
Failing on 3.11 seems to be in violation of PEP 387. A warning should be issued until 3.12.

What I don't understand is why this change is necessary at all.
What benefits does it bring? Accessing class attributes from instances is fairly universal, why should enums be special?

@ethanfurman
Copy link
Member

The rudimentary timings I did on my system for member access showed enums steadily getting faster over the releases, with a big jump in performance between 3.11 and 3.12.

The Colors.RED.RED behavior was illegal/missing in 3.4, added in 3.5 for performance (but warned against), and SC permission recieved for skipping the programmatic warning in 3.11

Enums are special, and this change brings them back into line with the original, intended, specification.

@markshannon
Copy link
Member Author

The performance of enum lookup is much slower than it should be in 3.11.
It would be a shame to have to advise people not to use enums if they care about performance.

The SC permission pertains to changes in repr(), str(), and format(), AFAICT. No mention is made of the the change in semantics of attribute lookup and performance.

@warsaw Could you confirm?

@markshannon
Copy link
Member Author

markshannon commented Jun 24, 2022

I'm seeing an approx x9 slowdown on attribute access.

class Color(Enum):
    RED = "Red"
    BLUE = "Blue"
    GREEN = "Green"

class FastColor:
    RED = Color.RED
    BLUE = Color.BLUE
    GREEN = Color.GREEN
    
def f():
    for _ in range(1000):
        Color.RED
        Color.BLUE
        Color.GREEN
        
import timeit
print(timeit.timeit('f()', number=10000, globals=globals()))

Color = FastColor
print(timeit.timeit('f()', number=10000, globals=globals()))

With an optimized, but non pgo, no lto, build.

0.8739701807498932
0.09454824822023511

@markshannon
Copy link
Member Author

OOI, why is it necessary that code like if self == self.START: no longer works?

@pablogsal
Copy link
Member

I am marking this as a release blocker so we can discuss this properly. If we don't reach consensus quickly, please, let's send an email to python-dev so we can collectively reach an agreement.

@hroncok
Copy link
Contributor

hroncok commented Jun 24, 2022

For the record, in #87328 (comment) @ethanfurman said:

DeprecationWarning will be active in 3.10 and 3.11 with removal in 3.12.

This was not followed, for reasons I don't understand.

This once again breaks real projects, e.g. googleapis/proto-plus-python#326

@JelleZijlstra
Copy link
Member

The performance of enum lookup is much slower than it should be in 3.11. It would be a shame to have to advise people not to use enums if they care about performance.

The SC permission pertains to changes in repr(), str(), and format(), AFAICT. No mention is made of the the change in semantics of attribute lookup and performance.

@warsaw Could you confirm?

The permission was granted in python/steering-council#80 (linked in the last line of the linked message).

But I don't think the SC's permission means we must make the change in 3.11. A 9x performance regression is very severe and doesn't seem worth the minor benefit of disallowing Color.RED.GREEN.

@gpshead
Copy link
Member

gpshead commented Jun 25, 2022

Agreed, the existing checked in solution to getting rid of Color.RED.BLUE behavior wart makes the default case performance in all code worse. So we shouldn't "fix" the wart that way. Within 3.11-beta the best thing to do is revert that change.

Finding a well performing attribute of an attribute wart fix remains for a future version, if it ever gets fixed.

@warsaw
Copy link
Member

warsaw commented Jun 25, 2022

The SC permission pertains to changes in repr(), str(), and format(), AFAICT. No mention is made of the the change in semantics of attribute lookup and performance.
@warsaw Could you confirm?

That SC exception very specifically was for removing Color.RED.BLUE in 3.11 without a deprecation warning in the code. At the time, there was no discussion about any potential performance impact.

The SC rejection notice for PEP 663 had this to say about str() and format():

One aspect of the PEP we do agree with is the alignment of IntEnum’s str() and format(). It is confusing that they give different results, and that is worth the small compatibility breakage to fix. We are uncertain whether it’s better to change str to be more like format or vice versa. Our recommendation is to take this discussion to python-dev and see if consensus on this topic can be reached.

This one's a little less definitive (others on that edition of the SC can also chime in; this was a unanimous decision and agreed upon communication). This was saying that we thought it would be good to align IntEnum's str() and format() output in 3.11, although we urged consensus building on python-dev in order to decide how exactly to resolve that misalignment. We thought it was worth a compatibility break (implied: without warning or transition period) to align those in 3.11.

But I don't think the SC's permission means we must make the change in 3.11. A 9x performance regression is very severe and doesn't seem worth the minor benefit of disallowing Color.RED.GREEN.

Personally, I agree with this, but I don't have a vote on the SC any more. I would argue to not incur the performance loss in 3.11 and take the time to remove the wart performantly in 3.12, if that's even possible.

@ethanfurman
Copy link
Member

@markshannon How did you measure the performance? If you give me the same steps so I have something to work towards I'll try to remedy the performance impact.

Out of curiosity, are regular Python properties slower, or not speeding up, with the work being done? The new Enum members are basically properties, so I'm a bit surprised they're so much slower

@markshannon
Copy link
Member Author

Using the script above building with ./configure (non-debug, no-pgo, no-lto build) for 3.10 and 3.11 on ubuntu linux.

@markshannon
Copy link
Member Author

We haven't sped up regular properties up for 3.11, but have for 3.12. #93912

@markshannon
Copy link
Member Author

Could someone point me to a reference explaining why accessing one enum value from another is considered a wart?

I'm curious what is wrong with code like if state == state.START: or while state != state.END: that one might write in generic state machine code.

@AlexWaygood
Copy link
Member

Could someone point me to a reference explaining why accessing one enum value from another is considered a wart?

It can lead to confusing behaviour, as is detailed in the documentation here:

>>> from enum import Enum
>>> class FieldTypes(Enum):
...     name = 0
...     value = 1
...     size = 2
...
>>> FieldTypes.value.size
<FieldTypes.size: 2>
>>> FieldTypes.size.value
2

@markshannon
Copy link
Member Author

That a.b.c != a.c.b is only surprising if you expect the . operator to be commutative for some reason.
It would be better to recommend avoid using value as an enum member, IMO. Possibly even rejecting value at class creation time.

If you pick silly enough names, weird things will happen.

>>> class FieldTypes(Enum):
...     __class__ = 0

ethanfurman added a commit that referenced this issue Jul 18, 2022
gpshead pushed a commit that referenced this issue Jul 18, 2022
@gpshead
Copy link
Member

gpshead commented Jul 18, 2022

Thanks for the revert PRs! ❤️

When Enum was first introduced in 3.4, member.member access was prohibited and I want to restore that behavior at some point.

While I understand the desire not to have that behavior, what 3.4 had is no longer relevant. People are writing code to use actual observed behaviors of 3.6-3.10 today. So we shouldn't change this without a 2 release deprecation warning cycle, and if either adding a warning or changing the behavior cause a noteworthy performance regression in normal cases... That is reason enough to never change it and just live with the "maybe unusual depending on how you look at" it behavioral oddity.

@ethanfurman
Copy link
Member

@markshannon wrote:

has wrecked the performance of accessing value, which was already terrible.

__getattribute__ is now gone. Looking more at value and member access, they have (unsurprisingly) the same performance level, while the builtin property is only slightly slower than a normal class attribute -- in other words, any custom descriptor written in Python is going to have terrible performance.

Is there any hope for such descriptors?

@JelleZijlstra
Copy link
Member

This issue still has the "deferred blocker" label. Is there any remaining problem that warrants that label? As far as I can see, 3.11 is now in a good state. Any possible future changes should go into 3.12 and be discussed in a separate issue.

@Fidget-Spinner
Copy link
Member

__getattribute__ is now gone. Looking more at value and member access, they have (unsurprisingly) the same performance level, while the builtin property is only slightly slower than a normal class attribute -- in other words, any custom descriptor written in Python is going to have terrible performance.

Is there any hope for such descriptors?

Yes. In 3.12 we specialize for instance, class, and builtin property attribute accesses. Without which property would be very slow. I'm working on __getattribute__ and __getattr__ specialisations that will speed some others up. Descriptors will always be slower though.

@brandtbucher
Copy link
Member

brandtbucher commented Jul 18, 2022

Is there any hope for such descriptors?

I believe we could specialize for these using a similar technique to property if necessary (cache the descriptor lookup on the class, cache the __get__ lookup on the descriptor, and inline the __get__ call). I think the main things blocking this are:

  • Custom descriptors are quite rare.
  • The obvious implementation would require allocating larger inline caches (at least an additional 2-4 bytes) at the site of every single attribute or method load, which is a lot of wasted memory for the common case.

I personally think that our efforts are better spent writing specializations for things like classmethod, staticmethod, and metaclass property lookups, which are much more common and don't require additional cache space.

@markshannon
Copy link
Member Author

@ethanfurman

Is there any hope for such descriptors?

Yes, if you keep them simple enough.

@property
def value(self):
    return self._value_

will give good enough performance for 3.12 as we specialize property accesses.
Later versions will have larger region optimizations allowing us to inline the property call (3.14 at the earliest).

@steffann
Copy link

I just wish to express my gratitude for how professional this community works on these issues (both backwards compatibility and performance). THANK YOU!

@markshannon
Copy link
Member Author

Accessing attributes (Color.RED) using the code from #93910 (comment) we have:

Version Enum "Fast Enum" Normal class
3.10.5+ 3.22 1.08 1.10
3.11.0b4+ 3.15 1.00 1.00
3.12.0a0 3.04 0.97 0.30

We expect accessing attributes of enums to be as fast as "normal" class attributes in 3.12, see #95004, about x10 as fast as for 3.11.

Accessing self.value relative to an attribute of a normal class:

Version Time
3.10 ~7
3.11/main ~12
3.12 (projected) ~4

The number for 3.12 assumes that value is implemented as a simple property.

I'm happy to close this for 3.11 if the current performance is acceptable and focus on fixing performance for 3.12.

@kumaraditya303
Copy link
Contributor

kumaraditya303 commented Jul 19, 2022

Custom descriptors are quite rare.

They are rare in user code but lots of libraries use descriptors. It would be worth analyzing the use of descriptors in pyperformance but note that pyperformance currently underrepresents some workloads.

The obvious implementation would require allocating larger inline caches (at least an additional 2-4 bytes) at the site of every single attribute or method load, which is a lot of wasted memory for the common case.

I agree that it would require more inline cache space but at a high level in a real application, the memory used by bytecode is nothing compared to the memory used by strings, tuples, lists etc and user created objects so IMO we should focus on reducing memory use of these objects rather than the already small bytecode.

@kumaraditya303
Copy link
Contributor

I personally think that our efforts are better spent writing specializations for things like classmethod, staticmethod, and metaclass property lookups, which are much more common and don't require additional cache space.

If we optimize for all Python descriptors then we can implement these descriptors in pure Python and that would benefit both custom descriptors and classmethod, staticmethod...

@pablogsal pablogsal removed the 3.11 bug and security fixes label Aug 5, 2022
@davfsa
Copy link

davfsa commented Oct 11, 2022

Would like to chime in to the conversation and share some of my personal experience.

Performance for enum and intenum used to be quite bad back in 3.8 until an effort was made to push for greater performance by the Faster CPython group (greatly appreciated btw!). This lead us to create a more performant implementation of the CPython enum that kept the same semantics while gaining performance and implementing some strictness parameters that we wanted.

This lead us to creating this https://github.com/hikari-py/hikari/blob/master/hikari/internal/enums.py#L38-L297 which gave a massive performance compared to 3.8.

Results (3.8)
BasicPyEnum.__call__('25') 0.34792722999918624 µs
BasicHikariEnum.__call__('25') 0.09627478299989889 µs
BasicPyEnum._value2member_map_['25'] 0.08696765999957279 µs
BasicHikariEnum._value_to_member_map['25'] 0.03552190700065694 µs
BasicPyEnum.__getitem__['z'] 0.13542789199891558 µs
BasicHikariEnum.__getitem__['z'] 0.07373986300081015 µs
BasicPyEnum.__call__ profile
         2000003 function calls in 0.596 seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.199    0.199    0.596    0.596 <string>:1(<module>)
  1000000    0.228    0.000    0.397    0.000 enum.py:358(__call__)
  1000000    0.169    0.000    0.169    0.000 enum.py:670(__new__)
        1    0.000    0.000    0.596    0.596 {built-in method builtins.exec}
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}


BasicHikariEnum.__call__ profile
         1000003 function calls in 0.224 seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.152    0.152    0.224    0.224 <string>:1(<module>)
  1000000    0.072    0.000    0.072    0.000 enums.py:103(__call__)
        1    0.000    0.000    0.224    0.224 {built-in method builtins.exec}
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}


BasicPyEnum.__getitem__ profile
         1000003 function calls in 0.244 seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.129    0.129    0.244    0.244 <string>:1(<module>)
  1000000    0.115    0.000    0.115    0.000 enum.py:431(__getitem__)
        1    0.000    0.000    0.244    0.244 {built-in method builtins.exec}
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}


BasicHikariEnum.__getitem__ profile
         1000003 function calls in 0.197 seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.130    0.130    0.197    0.197 <string>:1(<module>)
  1000000    0.067    0.000    0.067    0.000 enums.py:111(__getitem__)
        1    0.000    0.000    0.197    0.197 {built-in method builtins.exec}
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}

After running into this issue, I tried running the same performance benchmark but against 3.12 and was quite pleased to find that the performance is almost matched 1-1.

Results (3.12)
BasicPyEnum.__call__('25') 0.4155333540002175 µs
BasicHikariEnum.__call__('25') 0.2095957300007285 µs
BasicPyEnum._value2member_map_['25'] 0.06382369200036919 µs
BasicHikariEnum._value_to_member_map['25'] 0.06354223100061063 µs
BasicPyEnum.__getitem__['z'] 0.12423047000083898 µs
BasicHikariEnum.__getitem__['z'] 0.12400906000038958 µs
BasicPyEnum.__call__ profile
         2000003 function calls in 1.447 seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.712    0.712    1.447    1.447 <string>:1(<module>)
  1000000    0.261    0.000    0.261    0.000 enum.py:1051(__new__)
  1000000    0.473    0.000    0.734    0.000 enum.py:675(__call__)
        1    0.000    0.000    1.447    1.447 {built-in method builtins.exec}
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}


BasicHikariEnum.__call__ profile
         1000003 function calls in 0.810 seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.656    0.656    0.810    0.810 <string>:1(<module>)
  1000000    0.155    0.000    0.155    0.000 enums.py:103(__call__)
        1    0.000    0.000    0.810    0.810 {built-in method builtins.exec}
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}


BasicPyEnum.__getitem__ profile
         1000003 function calls in 0.658 seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.519    0.519    0.658    0.658 <string>:1(<module>)
  1000000    0.139    0.000    0.139    0.000 enum.py:749(__getitem__)
        1    0.000    0.000    0.658    0.658 {built-in method builtins.exec}
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}


BasicHikariEnum.__getitem__ profile
         1000003 function calls in 0.670 seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.529    0.529    0.670    0.670 <string>:1(<module>)
  1000000    0.141    0.000    0.141    0.000 enums.py:111(__getitem__)
        1    0.000    0.000    0.670    0.670 {built-in method builtins.exec}
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}

We did manage to keep the same schematics as was seen in 3.8 and that @markshannon pointed out in the beginning of this issue:

Python 3.12.0a0 (heads/main-dirty:e0ae9dd, Oct 11 2022, 10:38:47) [GCC 11.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import hikari
>>> 
>>> # The following is just an example of an enum used in the library
>>> hikari.GuildFeature
<enum GuildFeature>
>>> hikari.GuildFeature.__dict__["VERIFIED"] is hikari.GuildFeature.VERIFIED
True
>>> hikari.GuildFeature.VERIFIED.VERIFIED
<GuildFeature.VERIFIED: 'VERIFIED'>
>>> 

Would there be any interest in me opening a PR to port our implementation of enum to CPython? I will of course make sure that all standard CPython functionality is maintained!

"Benchmarking" script used: https://github.com/hikari-py/hikari/blob/master/scripts/enum_benchmark.py


Also wanted to leave a note to further thank the Faster CPython project. They managed to not only improve the performance of IntFlag, but even outperform ours in some cases! Really really appreciate the efforts being made here

@Fidget-Spinner
Copy link
Member

Fidget-Spinner commented Oct 11, 2022

@davfsa I don't understand the results. Don't they show that almost everything slowed down?

Also you're running with cProfile. That should slow down. Almost none of the faster cpython optimizations are enabled when cProfile is enabled. If you really want to see speedups, don't use cProfile and instead time with something else (pyperf or time.perf_counter.)

@davfsa
Copy link

davfsa commented Oct 11, 2022

@davfsa I don't understand the results. Don't they show that almost everything slowed down?

The main focus of the results are this part (taken off the 3.8 example):

BasicPyEnum.__call__('25') 0.34792722999918624 µs
BasicHikariEnum.__call__('25') 0.09627478299989889 µs
BasicPyEnum._value2member_map_['25'] 0.08696765999957279 µs
BasicHikariEnum._value_to_member_map['25'] 0.03552190700065694 µs
BasicPyEnum.__getitem__['z'] 0.13542789199891558 µs
BasicHikariEnum.__getitem__['z'] 0.07373986300081015 µs

Where BasicPyEnum is the pure CPython implementation and BasicHikariEnum ours. They were obtained using timeit.timeit, which I thought would be enough for a basic benchmark (this was just a simple dirty script we made to test it back in the day).

Also you're running with cProfile. That should slow down. Almost none of the faster cpython optimizations are enabled when cProfile is enabled. If you really want to see speedups, don't use cProfile and instead time with something else (pyperf or time.perf_counter.)

cProfile runs are made after the timeit.timeit ones occur. They were used in the past to detect a little bottle neck in our implementation. Should have probably trimmed that part out for this posts sake. My bad. Removing the import and the code using cProfile yields the same results.


EDIT: Forgot to mention, but Ill re-make the script making use of pyperf to get more accurate data and post it here

@Fidget-Spinner
Copy link
Member

Fidget-Spinner commented Oct 11, 2022

@davfsa sorry I didn't read the script carefully. I think timeit is good enough for this (you don't need to re-run with pyperf). Thank you for taking the time to report your results to us.

Yes that was the part I was reading. Yet I'm still confused because it seems things slowed down. Maybe I'm too jetlagged to be reading numbers at the moment and I'm making a fool out of myself.

@davfsa
Copy link

davfsa commented Oct 11, 2022

I reran the benchmarks using pyperf and noticed that they were considerably slower to 3.8 (as you had correctly noticed), but then realized that I forgot to build python with PGO enabled, which I later did, and you can find the updated benchmarks bellow (thanks for the pyperf suggestion, it makes it much easier to read the results!)

Script
import pyperf

setup_basic_enum = [
    "import enum",
    "class BasicPyEnum(str, enum.Enum):",
    '    a = "0"',
    '    b = "1"',
    '    c = "2"',
    '    d = "3"',
    '    e = "4"',
    '    f = "5"',
    '    g = "6"',
    '    h = "7"',
    '    i = "8"',
    '    j = "9"',
    '    k = "10"',
    '    l = "11"',
    '    m = "12"',
    '    n = "13"',
    '    o = "14"',
    '    p = "15"',
    '    q = "16"',
    '    r = "17"',
    '    s = "18"',
    '    t = "19"',
    '    u = "20"',
    '    v = "21"',
    '    w = "22"',
    '    x = "23"',
    '    y = "24"',
    '    z = "25"',
]

setup_hikari_enum = [
    "from hikari.internal import enums",
    "class BasicHikariEnum(str, enums.Enum):",
    '    a = "0"',
    '    b = "1"',
    '    c = "2"',
    '    d = "3"',
    '    e = "4"',
    '    f = "5"',
    '    g = "6"',
    '    h = "7"',
    '    i = "8"',
    '    j = "9"',
    '    k = "10"',
    '    l = "11"',
    '    m = "12"',
    '    n = "13"',
    '    o = "14"',
    '    p = "15"',
    '    q = "16"',
    '    r = "17"',
    '    s = "18"',
    '    t = "19"',
    '    u = "20"',
    '    v = "21"',
    '    w = "22"',
    '    x = "23"',
    '    y = "24"',
    '    z = "25"',
]


runner = pyperf.Runner()


runner.timeit(name="BasicPyEnum.__call__('25')", stmt="BasicPyEnum('25')", setup=setup_basic_enum)
runner.timeit(name="BasicHikariEnum.__call__('25')", stmt="BasicHikariEnum('25')", setup=setup_hikari_enum)


runner.timeit(
    name="BasicPyEnum._value2member_map_['25']",
    stmt="BasicPyEnum._value2member_map_['25']",
    setup=setup_basic_enum,
)
runner.timeit(
    name="BasicHikariEnum._value_to_member_map_['25']",
    stmt="BasicHikariEnum._value_to_member_map_['25']",
    setup=setup_hikari_enum,
)

runner.timeit(
    name="BasicPyEnum.__getitem__['z']",
    stmt="BasicPyEnum['z']",
    setup=setup_basic_enum,
)
runner.timeit(
    name="BasicHikariEnum.__getitem__['z']",
    stmt="BasicHikariEnum['z']",
    setup=setup_hikari_enum,
)
3.8 using pyperf
.....................
BasicPyEnum.__call__('25'): Mean +- std dev: 340 ns +- 4 ns
.....................
BasicHikariEnum.__call__('25'): Mean +- std dev: 87.2 ns +- 2.6 ns
.....................
BasicPyEnum._value2member_map_['25']: Mean +- std dev: 66.1 ns +- 4.1 ns
.....................
BasicHikariEnum._value_to_member_map_['25']: Mean +- std dev: 26.6 ns +- 1.2 ns
.....................
BasicPyEnum.__getitem__['z']: Mean +- std dev: 114 ns +- 3 ns
.....................
BasicHikariEnum.__getitem__['z']: Mean +- std dev: 65.6 ns +- 1.9 ns
3.12 using pyperf
.....................
BasicPyEnum.__call__('25'): Mean +- std dev: 202 ns +- 6 ns
.....................
BasicHikariEnum.__call__('25'): Mean +- std dev: 99.5 ns +- 2.3 ns
.....................
BasicPyEnum._value2member_map_['25']: Mean +- std dev: 23.8 ns +- 1.1 ns
.....................
BasicHikariEnum._value_to_member_map_['25']: Mean +- std dev: 24.2 ns +- 1.3 ns
.....................
BasicPyEnum.__getitem__['z']: Mean +- std dev: 51.6 ns +- 1.4 ns
.....................
BasicHikariEnum.__getitem__['z']: Mean +- std dev: 52.9 ns +- 3.0 ns

@markshannon
Copy link
Member Author

What about the performance of attribute lookups? That is the one measure that really matters.

@Yhg1s
Copy link
Member

Yhg1s commented Oct 11, 2022

The performance of defining enums also matters, in particular for the startup time of programs.

@davfsa
Copy link

davfsa commented Oct 11, 2022

I'll get performance benchmarks for those two as soon as possible

@ethanfurman
Copy link
Member

ethanfurman commented Oct 11, 2022

We did manage to keep the same schematics as was seen in 3.8 and that @markshannon pointed out in the beginning of this issue:

Python 3.12.0a0 (heads/main-dirty:e0ae9dd, Oct 11 2022, 10:38:47) [GCC 11.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import hikari
>>> 
>>> # The following is just an example of an enum used in the library
>>> hikari.GuildFeature
<enum GuildFeature>
>>> hikari.GuildFeature.__dict__["VERIFIED"] is hikari.GuildFeature.VERIFIED
True

The __dict__ access truism is an implementation detail, and does not need to persist in the new version.

>>> hikari.GuildFeature.VERIFIED.VERIFIED
<GuildFeature.VERIFIED: 'VERIFIED'>
>>> 

The VERIFIED.VERIFIED behavior is not desired and will raise a DeprecationWarning in 3.12.

If your implementation passes the 3.12 test suite and is faster in attribute lookup and enum creation time, a PR would be welcome.

@davfsa
Copy link

davfsa commented Oct 12, 2022

@markshannon @Yhg1s Here are your requested benchmarks :)

Script
import pyperf

import_basic_enum = "import enum"
import_hikari_enum = "from hikari.internal import enums"

create_basic_enum = [
    "class BasicPyEnum(str, enum.Enum):",
    '    a = "0"',
    '    b = "1"',
    '    c = "2"',
    '    d = "3"',
    '    e = "4"',
    '    f = "5"',
    '    g = "6"',
    '    h = "7"',
    '    i = "8"',
    '    j = "9"',
    '    k = "10"',
    '    l = "11"',
    '    m = "12"',
    '    n = "13"',
    '    o = "14"',
    '    p = "15"',
    '    q = "16"',
    '    r = "17"',
    '    s = "18"',
    '    t = "19"',
    '    u = "20"',
    '    v = "21"',
    '    w = "22"',
    '    x = "23"',
    '    y = "24"',
    '    z = "25"',
]

create_hikari_enum = [
    "class BasicHikariEnum(str, enums.Enum):",
    '    a = "0"',
    '    b = "1"',
    '    c = "2"',
    '    d = "3"',
    '    e = "4"',
    '    f = "5"',
    '    g = "6"',
    '    h = "7"',
    '    i = "8"',
    '    j = "9"',
    '    k = "10"',
    '    l = "11"',
    '    m = "12"',
    '    n = "13"',
    '    o = "14"',
    '    p = "15"',
    '    q = "16"',
    '    r = "17"',
    '    s = "18"',
    '    t = "19"',
    '    u = "20"',
    '    v = "21"',
    '    w = "22"',
    '    x = "23"',
    '    y = "24"',
    '    z = "25"',
]

full_basic_setup = [import_basic_enum, *create_basic_enum]
full_hikari_setup = [import_hikari_enum, *create_hikari_enum]

runner = pyperf.Runner()


runner.timeit(
    name="BasicPyEnum creation",
    stmt=create_basic_enum,
    setup=import_basic_enum,
)
runner.timeit(
    name="BasicHikariEnum creation",
    stmt=create_hikari_enum,
    setup=import_hikari_enum,
)

runner.timeit(
    name="BasicPyEnum.z",
    stmt="BasicPyEnum.z",
    setup=full_basic_setup,
)
runner.timeit(
    name="BasicHikariEnum.z",
    stmt="BasicHikariEnum.z",
    setup=full_hikari_setup,
)

runner.timeit(
    name="BasicPyEnum.__call__('25')",
    stmt="BasicPyEnum('25')",
    setup=full_basic_setup,
)
runner.timeit(
    name="BasicHikariEnum.__call__('25')",
    stmt="BasicHikariEnum('25')",
    setup=full_hikari_setup,
)


runner.timeit(
    name="BasicPyEnum._value2member_map_['25']",
    stmt="BasicPyEnum._value2member_map_['25']",
    setup=full_basic_setup,
)
runner.timeit(
    name="BasicHikariEnum._value_to_member_map_['25']",
    stmt="BasicHikariEnum._value_to_member_map_['25']",
    setup=full_hikari_setup,
)

runner.timeit(
    name="BasicPyEnum.__getitem__['z']",
    stmt="BasicPyEnum['z']",
    setup=full_basic_setup,
)
runner.timeit(
    name="BasicHikariEnum.__getitem__['z']",
    stmt="BasicHikariEnum['z']",
    setup=full_hikari_setup,
)
Results (3.12, main branch)
.....................
BasicPyEnum creation: Mean +- std dev: 179 us +- 3 us
.....................
BasicHikariEnum creation: Mean +- std dev: 39.5 us +- 0.6 us
.....................
BasicPyEnum.z: Mean +- std dev: 73.8 ns +- 1.1 ns
.....................
BasicHikariEnum.z: Mean +- std dev: 15.3 ns +- 0.5 ns
.....................
BasicPyEnum.__call__('25'): Mean +- std dev: 207 ns +- 6 ns
.....................
BasicHikariEnum.__call__('25'): Mean +- std dev: 99.5 ns +- 2.1 ns
.....................
BasicPyEnum._value2member_map_['25']: Mean +- std dev: 24.6 ns +- 1.3 ns
.....................
BasicHikariEnum._value_to_member_map_['25']: Mean +- std dev: 24.1 ns +- 1.2 ns
.....................
BasicPyEnum.__getitem__['z']: Mean +- std dev: 52.4 ns +- 1.9 ns
.....................
BasicHikariEnum.__getitem__['z']: Mean +- std dev: 52.3 ns +- 1.5 ns

Ill start working on a PR to add this to main :)

@davfsa
Copy link

davfsa commented Oct 21, 2022

Little update. Been working on porting it to CPython and ran into a bunch of issues at the beginning, so I decided to just rewrite the implementation and merge both ideas. This left the following performance benchmarks:

Results (3.12)
.....................
BasicPyEnum creation: Mean +- std dev: 73.4 us +- 1.3 us
.....................
BasicPyEnum.z: Mean +- std dev: 15.4 ns +- 0.3 ns
.....................
BasicPyEnum.__call__('25'): Mean +- std dev: 149 ns +- 6 ns
.....................
BasicPyEnum._value2member_map_['25']: Mean +- std dev: 24.4 ns +- 1.8 ns
.....................
BasicPyEnum.__getitem__['z']: Mean +- std dev: 51.9 ns +- 2.3 ns

Results like the one for __call__ im not overly happy with, but the issue is that its purely due to the functional api taking in so many arguments that causes the slow down (the * makes up for 40ns alone).

There is still some features that need to be implemented, as well as making sure it passes the test suite, so still slowly but surely working on that.

I also wanted to have a second look at the creation logic and hopefully make it faster. This might not be possible due to the checks that are in place.

parthea added a commit to googleapis/proto-plus-python that referenced this issue Jan 5, 2023
* Adjust to enum changes in Python 3.11.0b3

There are two changes:

Changes in the actual code:

 - _member_names changed from a list to a dict in python/cpython#28907
    - we instance-check and remove by list-specific or dict-specific way

Change in the tests only:

 - accessing other enum members via instance attributes is no longer possible
    - we access them via the class instead
    - we leave the original test in a try-except block

Some of the Python enum changes might get reverted,
see python/cpython#93910
But the fix is backwards compatible.

Fixes #326

* ci: unit test session with python 3.11.0-beta.3

* ci: add python v3.11.0-beta.3 to noxfile.py

* another attempt to get python 3.11.0b3 working in github actions

* ci: use python 3.8 for docs check

* ci: fix docs build

* fix ci

* mark python 3.11 tests as required

* add python 3.11 to setup.py

* fix docs build

* remove python 3.11 test for unitcpp

* remove python 3.11 test for unitcpp

* remove python 3.11 test for unitcpp

* attempt to fix exclude in github action

Co-authored-by: Anthonios Partheniou <partheniou@google.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
3.12 new features, bug and security fixes deferred-blocker performance Performance or resource usage stdlib Python modules in the Lib dir
Projects
Status: In Progress
Development

No branches or pull requests