Slower string concatenation in CPython 3.11 #99862
Comments
Python 3.11 has a specialized (à la PEP 659) instruction for this pattern (see Lines 2064 to 2098 in 5bbf8ed).
However, this optimization was restricted to local variables within a function, not global/module-scope variables. Does the regression go away if you put the code in a function?

```python
def f():
    a = ''
    for _ in range(1000000):
        a += 'a'
```
Indeed, this is significantly faster when the code is put in a function: about 90 ms in CPython 3.11.0, versus 140 ms in CPython 3.10.8. Thank you for this information. So the problem only affects global variables. Solving it could be useful for people working directly in an interactive interpreter console (like IPython).
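One way to observe the scope difference discussed above (a sketch; the iteration count and helper names are illustrative): code run through `exec` with a fresh globals dict is compiled as module-level code, so the accumulator is a global name, while the function version uses fast locals.

```python
import time

# Module-level code: 'a' is stored/loaded by name in the globals dict.
CODE = "a = ''\nfor _ in range(100_000):\n    a += 'a'\n"

def time_global():
    t0 = time.perf_counter()
    exec(CODE, {})  # compiled in "exec" mode -> global-scope semantics
    return time.perf_counter() - t0

def time_local():
    t0 = time.perf_counter()
    a = ''  # local variable: eligible for the in-place specialization
    for _ in range(100_000):
        a += 'a'
    return time.perf_counter() - t0

g, l = time_global(), time_local()
print(f"global-scope: {g:.3f}s  function-local: {l:.3f}s")
```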
That code is quadratic in nature; the only reason it is not quadratic in practice is that we do some hacks in the eval loop to reuse the same string. Because strings are immutable, the pattern remains quadratic by nature. As I see it, there is not much reason to spend a lot of resources speeding up this particular code, especially if the regression only involves globals. The correct idiom is to collect the pieces in a list and join them once with str.join.
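A minimal sketch of the linear-time `str.join` idiom (presumably what the comment's "correct idiom" refers to):

```python
# Collect the pieces in a list, then join once at the end.
# This is O(n) total, regardless of scope or interpreter version.
parts = []
for _ in range(1_000_000):
    parts.append('a')
a = ''.join(parts)
print(len(a))  # 1000000
```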
I am inclined to close this as "won't fix" unless someone feels strongly otherwise.
Closing this issue is fine with me as well, but I was surprised to find out that global/module-level code can be slower in 3.11 than local code. Is this behaviour documented somewhere? I could not find it in the 3.11 release notes or in https://github.com/python/cpython/blob/main/Python/adaptive.md.
There are many situations where globals are slower than locals, due to the nature of locals. This is nothing new. (Update: I initially wrote faster, should be slower.)
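One way to see why locals and globals behave differently (a sketch using the stdlib `dis` module; the function names here are illustrative): function locals compile to `LOAD_FAST` (an array index into the frame), while module-level code compiles to `LOAD_NAME` (a dict lookup).

```python
import dis
import io

def f():
    a = ''
    a += 'a'  # 'a' is a fast local here
    return a

def listing(target):
    """Return the disassembly of `target` as a string."""
    buf = io.StringIO()
    dis.dis(target, file=buf)
    return buf.getvalue()

# Function locals: LOAD_FAST. Module-level ("exec" mode) code: LOAD_NAME.
print("LOAD_FAST" in listing(f))
print("LOAD_NAME" in listing(compile("a = ''\na += 'a'", "<mod>", "exec")))
```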
zephyr111 commented Nov 29, 2022
Hello,
We have found a regression between CPython 3.10.8 and CPython 3.11 that makes string concatenation in loops significantly slower on Windows 10. This is described in detail in this StackOverflow post.
Here is a minimal, reproducible example of benchmarking code:
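The original benchmark code is not embedded here; a minimal sketch of the kind of module-level loop being described (the iteration count is illustrative):

```python
import time

# Module-level (global) string concatenation: the case reported as
# slow in CPython 3.11.
a = ''
start = time.perf_counter()
for _ in range(100_000):
    a += 'a'
elapsed = time.perf_counter() - start
print(f"built {len(a)} chars in {elapsed:.3f}s")
```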
CPython 3.11.0 is about 100 times slower than CPython 3.10.8 due to a quadratic running time (as opposed to a linear running time for CPython 3.10.8).
The analysis shows that CPython 3.10.8 generated an INPLACE_ADD instruction, so PyUnicode_Append is called at runtime, while CPython 3.11.0 now generates a BINARY_OP instruction, so PyUnicode_Concat is called instead. The latter function creates a new, bigger string, drastically reducing the performance of the string-appending loop in the provided code. This appears to be related to issue #89799. I think that if we want to replace INPLACE_ADD with BINARY_OP, then an optimization checking the number of references (so as to eventually do an in-place operation) is missing in CPython 3.11.0. What do you think about it?

My environment is an embedded CPython 3.10.8 and an embedded CPython 3.11.0, both running on Windows 10 (22H2) with an x86-64 processor (i5-9600KF).
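To check which opcode a given interpreter emits for `+=`, one can disassemble the statement (a sketch: CPython 3.11+ emits a generic BINARY_OP that is later specialized at runtime, while 3.10 and earlier emit a dedicated INPLACE_ADD):

```python
import dis
import io

buf = io.StringIO()
dis.dis(compile("a += 'a'", "<test>", "exec"), file=buf)
listing = buf.getvalue()

# 3.11+: BINARY_OP (adaptive, per PEP 659); <=3.10: INPLACE_ADD.
print("BINARY_OP" in listing or "INPLACE_ADD" in listing)
```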