New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Change in semantics and much worse performance for enum members. #93910
Comments
Those changes are intentional, and remedy an issue present since 3.5. As for performance, the work @markshannon and others have done has already had a considerable impact on making enum faster in 3.12. |
What issue does the change remedy? I'm puzzled by your claim that the work we've done has sped up enums for 3.12? It hasn't. Without c314e60, it would have. |
It looks like the change in What I don't understand is why this change is necessary at all. |
The rudimentary timings I did on my system for member access showed enums steadily getting faster over the releases, with a big jump in performance between 3.11 and 3.12. The Enums are special, and this change brings them back into line with the original, intended, specification. |
The performance of enum lookup is much slower than it should be in 3.11. The SC permission pertains to changes in repr(), str(), and format(), AFAICT. No mention is made of the the change in semantics of attribute lookup and performance. @warsaw Could you confirm? |
I'm seeing an approx x9 slowdown on attribute access. class Color(Enum):
RED = "Red"
BLUE = "Blue"
GREEN = "Green"
class FastColor:
RED = Color.RED
BLUE = Color.BLUE
GREEN = Color.GREEN
def f():
for _ in range(1000):
Color.RED
Color.BLUE
Color.GREEN
import timeit
print(timeit.timeit('f()', number=10000, globals=globals()))
Color = FastColor
print(timeit.timeit('f()', number=10000, globals=globals())) With an optimized, but non pgo, no lto, build.
|
OOI, why is it necessary that code like |
I am marking this as a release blocker so we can discuss this properly. If we don't reach consensus quickly, please, let's send an email to python-dev so we can collectively reach an agreement. |
For the record, in #87328 (comment) @ethanfurman said:
This was not followed, for reasons I don't understand. This once again breaks real projects, e.g. googleapis/proto-plus-python#326 |
The permission was granted in python/steering-council#80 (linked in the last line of the linked message). But I don't think the SC's permission means we must make the change in 3.11. A 9x performance regression is very severe and doesn't seem worth the minor benefit of disallowing |
Agreed, the existing checked in solution to getting rid of Finding a well performing attribute of an attribute wart fix remains for a future version, if it ever gets fixed. |
That SC exception very specifically was for removing The SC rejection notice for PEP 663 had this to say about
This one's a little less definitive (others on that edition of the SC can also chime in; this was a unanimous decision and agreed upon communication). This was saying that we thought it would be good to align
Personally, I agree with this, but I don't have a vote on the SC any more. I would argue to not incur the performance loss in 3.11 and take the time to remove the wart performantly in 3.12, if that's even possible. |
@markshannon How did you measure the performance? If you give me the same steps so I have something to work towards I'll try to remedy the performance impact. Out of curiosity, are regular Python properties slower, or not speeding up, with the work being done? The new Enum members are basically properties, so I'm a bit surprised they're so much slower |
Using the script above building with |
We haven't sped up regular properties up for 3.11, but have for 3.12. #93912 |
Could someone point me to a reference explaining why accessing one enum value from another is considered a wart? I'm curious what is wrong with code like |
It can lead to confusing behaviour, as is detailed in the documentation here: >>> from enum import Enum
>>> class FieldTypes(Enum):
... name = 0
... value = 1
... size = 2
...
>>> FieldTypes.value.size
<FieldTypes.size: 2>
>>> FieldTypes.size.value
2 |
That If you pick silly enough names, weird things will happen. >>> class FieldTypes(Enum):
... __class__ = 0 |
Thanks for the revert PRs!
While I understand the desire not to have that behavior, what 3.4 had is no longer relevant. People are writing code to use actual observed behaviors of 3.6-3.10 today. So we shouldn't change this without a 2 release deprecation warning cycle, and if either adding a warning or changing the behavior cause a noteworthy performance regression in normal cases... That is reason enough to never change it and just live with the "maybe unusual depending on how you look at" it behavioral oddity. |
@markshannon wrote:
Is there any hope for such descriptors? |
This issue still has the "deferred blocker" label. Is there any remaining problem that warrants that label? As far as I can see, 3.11 is now in a good state. Any possible future changes should go into 3.12 and be discussed in a separate issue. |
Yes. In 3.12 we specialize for instance, class, and builtin |
I believe we could specialize for these using a similar technique to
I personally think that our efforts are better spent writing specializations for things like |
Yes, if you keep them simple enough. @property
def value(self):
return self._value_ will give good enough performance for 3.12 as we specialize property accesses. |
I just wish to express my gratitude for how professional this community works on these issues (both backwards compatibility and performance). THANK YOU! |
Accessing attributes (
We expect accessing attributes of enums to be as fast as "normal" class attributes in 3.12, see #95004, about x10 as fast as for 3.11. Accessing
The number for 3.12 assumes that I'm happy to close this for 3.11 if the current performance is acceptable and focus on fixing performance for 3.12. |
They are rare in user code but lots of libraries use descriptors. It would be worth analyzing the use of descriptors in pyperformance but note that pyperformance currently underrepresents some workloads.
I agree that it would require more inline cache space but at a high level in a real application, the memory used by bytecode is nothing compared to the memory used by strings, tuples, lists etc and user created objects so IMO we should focus on reducing memory use of these objects rather than the already small bytecode. |
If we optimize for all Python descriptors then we can implement these descriptors in pure Python and that would benefit both custom descriptors and |
Would like to chime in to the conversation and share some of my personal experience. Performance for enum and intenum used to be quite bad back in 3.8 until an effort was made to push for greater performance by the Faster CPython group (greatly appreciated btw!). This lead us to create a more performant implementation of the CPython enum that kept the same semantics while gaining performance and implementing some strictness parameters that we wanted. This lead us to creating this https://github.com/hikari-py/hikari/blob/master/hikari/internal/enums.py#L38-L297 which gave a massive performance compared to 3.8. Results (3.8)
After running into this issue, I tried running the same performance benchmark but against 3.12 and was quite pleased to find that the performance is almost matched 1-1. Results (3.12)
We did manage to keep the same schematics as was seen in 3.8 and that @markshannon pointed out in the beginning of this issue: Python 3.12.0a0 (heads/main-dirty:e0ae9dd, Oct 11 2022, 10:38:47) [GCC 11.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import hikari
>>>
>>> # The following is just an example of an enum used in the library
>>> hikari.GuildFeature
<enum GuildFeature>
>>> hikari.GuildFeature.__dict__["VERIFIED"] is hikari.GuildFeature.VERIFIED
True
>>> hikari.GuildFeature.VERIFIED.VERIFIED
<GuildFeature.VERIFIED: 'VERIFIED'>
>>> Would there be any interest in me opening a PR to port our implementation of enum to CPython? I will of course make sure that all standard CPython functionality is maintained! "Benchmarking" script used: https://github.com/hikari-py/hikari/blob/master/scripts/enum_benchmark.py Also wanted to leave a note to further thank the Faster CPython project. They managed to not only improve the performance of IntFlag, but even outperform ours in some cases! Really really appreciate the efforts being made here |
@davfsa I don't understand the results. Don't they show that almost everything slowed down? Also you're running with cProfile. That should slow down. Almost none of the faster cpython optimizations are enabled when cProfile is enabled. If you really want to see speedups, don't use cProfile and instead time with something else ( |
The main focus of the results are this part (taken off the 3.8 example):
Where
cProfile runs are made after the EDIT: Forgot to mention, but Ill re-make the script making use of |
@davfsa sorry I didn't read the script carefully. I think timeit is good enough for this (you don't need to re-run with pyperf). Thank you for taking the time to report your results to us. Yes that was the part I was reading. Yet I'm still confused because it seems things slowed down. Maybe I'm too jetlagged to be reading numbers at the moment and I'm making a fool out of myself. |
I reran the benchmarks using pyperf and noticed that they were considerably slower to 3.8 (as you had correctly noticed), but then realized that I forgot to build python with PGO enabled, which I later did, and you can find the updated benchmarks bellow (thanks for the pyperf suggestion, it makes it much easier to read the results!) Scriptimport pyperf
setup_basic_enum = [
"import enum",
"class BasicPyEnum(str, enum.Enum):",
' a = "0"',
' b = "1"',
' c = "2"',
' d = "3"',
' e = "4"',
' f = "5"',
' g = "6"',
' h = "7"',
' i = "8"',
' j = "9"',
' k = "10"',
' l = "11"',
' m = "12"',
' n = "13"',
' o = "14"',
' p = "15"',
' q = "16"',
' r = "17"',
' s = "18"',
' t = "19"',
' u = "20"',
' v = "21"',
' w = "22"',
' x = "23"',
' y = "24"',
' z = "25"',
]
setup_hikari_enum = [
"from hikari.internal import enums",
"class BasicHikariEnum(str, enums.Enum):",
' a = "0"',
' b = "1"',
' c = "2"',
' d = "3"',
' e = "4"',
' f = "5"',
' g = "6"',
' h = "7"',
' i = "8"',
' j = "9"',
' k = "10"',
' l = "11"',
' m = "12"',
' n = "13"',
' o = "14"',
' p = "15"',
' q = "16"',
' r = "17"',
' s = "18"',
' t = "19"',
' u = "20"',
' v = "21"',
' w = "22"',
' x = "23"',
' y = "24"',
' z = "25"',
]
runner = pyperf.Runner()
runner.timeit(name="BasicPyEnum.__call__('25')", stmt="BasicPyEnum('25')", setup=setup_basic_enum)
runner.timeit(name="BasicHikariEnum.__call__('25')", stmt="BasicHikariEnum('25')", setup=setup_hikari_enum)
runner.timeit(
name="BasicPyEnum._value2member_map_['25']",
stmt="BasicPyEnum._value2member_map_['25']",
setup=setup_basic_enum,
)
runner.timeit(
name="BasicHikariEnum._value_to_member_map_['25']",
stmt="BasicHikariEnum._value_to_member_map_['25']",
setup=setup_hikari_enum,
)
runner.timeit(
name="BasicPyEnum.__getitem__['z']",
stmt="BasicPyEnum['z']",
setup=setup_basic_enum,
)
runner.timeit(
name="BasicHikariEnum.__getitem__['z']",
stmt="BasicHikariEnum['z']",
setup=setup_hikari_enum,
) 3.8 using pyperf
3.12 using pyperf
|
What about the performance of attribute lookups? That is the one measure that really matters. |
The performance of defining enums also matters, in particular for the startup time of programs. |
I'll get performance benchmarks for those two as soon as possible |
Python 3.12.0a0 (heads/main-dirty:e0ae9dd, Oct 11 2022, 10:38:47) [GCC 11.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import hikari
>>>
>>> # The following is just an example of an enum used in the library
>>> hikari.GuildFeature
<enum GuildFeature>
>>> hikari.GuildFeature.__dict__["VERIFIED"] is hikari.GuildFeature.VERIFIED
True The >>> hikari.GuildFeature.VERIFIED.VERIFIED
<GuildFeature.VERIFIED: 'VERIFIED'>
>>> The If your implementation passes the 3.12 test suite and is faster in attribute lookup and enum creation time, a PR would be welcome. |
@markshannon @Yhg1s Here are your requested benchmarks :) Scriptimport pyperf
import_basic_enum = "import enum"
import_hikari_enum = "from hikari.internal import enums"
create_basic_enum = [
"class BasicPyEnum(str, enum.Enum):",
' a = "0"',
' b = "1"',
' c = "2"',
' d = "3"',
' e = "4"',
' f = "5"',
' g = "6"',
' h = "7"',
' i = "8"',
' j = "9"',
' k = "10"',
' l = "11"',
' m = "12"',
' n = "13"',
' o = "14"',
' p = "15"',
' q = "16"',
' r = "17"',
' s = "18"',
' t = "19"',
' u = "20"',
' v = "21"',
' w = "22"',
' x = "23"',
' y = "24"',
' z = "25"',
]
create_hikari_enum = [
"class BasicHikariEnum(str, enums.Enum):",
' a = "0"',
' b = "1"',
' c = "2"',
' d = "3"',
' e = "4"',
' f = "5"',
' g = "6"',
' h = "7"',
' i = "8"',
' j = "9"',
' k = "10"',
' l = "11"',
' m = "12"',
' n = "13"',
' o = "14"',
' p = "15"',
' q = "16"',
' r = "17"',
' s = "18"',
' t = "19"',
' u = "20"',
' v = "21"',
' w = "22"',
' x = "23"',
' y = "24"',
' z = "25"',
]
full_basic_setup = [import_basic_enum, *create_basic_enum]
full_hikari_setup = [import_hikari_enum, *create_hikari_enum]
runner = pyperf.Runner()
runner.timeit(
name="BasicPyEnum creation",
stmt=create_basic_enum,
setup=import_basic_enum,
)
runner.timeit(
name="BasicHikariEnum creation",
stmt=create_hikari_enum,
setup=import_hikari_enum,
)
runner.timeit(
name="BasicPyEnum.z",
stmt="BasicPyEnum.z",
setup=full_basic_setup,
)
runner.timeit(
name="BasicHikariEnum.z",
stmt="BasicHikariEnum.z",
setup=full_hikari_setup,
)
runner.timeit(
name="BasicPyEnum.__call__('25')",
stmt="BasicPyEnum('25')",
setup=full_basic_setup,
)
runner.timeit(
name="BasicHikariEnum.__call__('25')",
stmt="BasicHikariEnum('25')",
setup=full_hikari_setup,
)
runner.timeit(
name="BasicPyEnum._value2member_map_['25']",
stmt="BasicPyEnum._value2member_map_['25']",
setup=full_basic_setup,
)
runner.timeit(
name="BasicHikariEnum._value_to_member_map_['25']",
stmt="BasicHikariEnum._value_to_member_map_['25']",
setup=full_hikari_setup,
)
runner.timeit(
name="BasicPyEnum.__getitem__['z']",
stmt="BasicPyEnum['z']",
setup=full_basic_setup,
)
runner.timeit(
name="BasicHikariEnum.__getitem__['z']",
stmt="BasicHikariEnum['z']",
setup=full_hikari_setup,
) Results (3.12, main branch)
Ill start working on a PR to add this to main :) |
Little update. Been working on porting it to CPython and ran into a bunch of issues at the beginning, so I decided to just rewrite the implementation and merge both ideas. This left the following performance benchmarks: Results (3.12)
Results like the one for There is still some features that need to be implemented, as well as making sure it passes the test suite, so still slowly but surely working on that. I also wanted to have a second look at the creation logic and hopefully make it faster. This might not be possible due to the checks that are in place. |
* Adjust to enum changes in Python 3.11.0b3 There are two changes: Changes in the actual code: - _member_names changed from a list to a dict in python/cpython#28907 - we instance-check and remove by list-specific or dict-specific way Change in the tests only: - accessing other enum members via instance attributes is no longer possible - we access them via the class instead - we leave the original test in a try-except block Some of the Python enum changes might get reverted, see python/cpython#93910 But the fix is backwards compatible. Fixes #326 * ci: unit test session with python 3.11.0-beta.3 * ci: add python v3.11.0-beta.3 to noxfile.py * another attempt to get python 3.11.0b3 working in github actions * ci: use python 3.8 for docs check * ci: fix docs build * fix ci * mark python 3.11 tests as required * add python 3.11 to setup.py * fix docs build * remove python 3.11 test for unitcpp * remove python 3.11 test for unitcpp * remove python 3.11 test for unitcpp * attempt to fix exclude in github action Co-authored-by: Anthonios Partheniou <partheniou@google.com>
Given the enum:
In Python 3.9 and 3.10:
In Python 3.11:
While these might seem like minor semantic changes, there is also a large performance impact.
Lookup of
Colours.RED
is simple and efficient in 3.10, but involves a lot of indirection and dispatching through theenum.property
class in 3.11.The performance impact is likely to get worse in 3.12, as we optimize more kinds of attributes.
Introduced in c314e60, I believe.
@pablogsal
@ethanfurman
The text was updated successfully, but these errors were encountered: