Optimizing FOR_ITER bytecode #91432
What about?
Also, we would like to decouple the sense (jumps on exhaustion like |
If you only want one sense of jump, and to avoid the extra backwards branch, this should work (provided we have both a forward and backward version)
What is the advantage of having a forward version or a jump-if-exhausted version when the vast majority of executions of the opcode will be (backwards, jump-if-not-exhausted)? Is that symmetry worth the extra code in the ceval loop?
I proposed a similar idea (opcode I have not implemented it because:
What if we specialized Then we get the same old readable unquickened bytecode but reduce the number of dispatches per for-loop for quickened bytecode? It would add cache entries to |
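For anyone following along, here is a minimal sketch of how to look at the unquickened bytecode of a for-loop with the standard `dis` module. The function `loop` is a made-up example, not code from this thread, and the exact opcode list differs between CPython versions. On 3.11 and later, passing `adaptive=True` to `dis.dis` should additionally show specialized forms once the function has warmed up, but that is not exercised here.

```python
import dis


def loop(seq):
    # Hypothetical example function: a plain for-loop over any iterable.
    total = 0
    for x in seq:
        total += x
    return total


# Collect the opcode names of the (unquickened) bytecode.
opnames = [ins.opname for ins in dis.get_instructions(loop)]
print(opnames)

# FOR_ITER is the instruction this issue is about; the instruction that
# closes the loop body depends on the CPython version.
print("FOR_ITER" in opnames)
```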
After merging up with main and re-running everything, I get these results:
Microbenchmarks:
Pyperformance, all together in one table:
Just looking at the FOR_ITER specialization:
Just looking at the JUMP_BACKWARD specialization:
Comparing the two (specializing FOR_ITER --> specializing JUMP_BACKWARD):
Changing the issue title to be more general and include specializations without any FOR_END.
The 3.11 beta has a much slower for loop with range than 3.10 or 3.9. This piece of code:
executes 20% slower with 3.11-dev, 3.11.0b1, and 3.11.0b3. I am not sure if this is related to the above discussion.
@gahabana No changes have been made relating to this issue, so please open a new issue about a particular performance regression. Make sure to describe your operating system and how you installed Python.
Some more microbenchmark (empty for-loop) results for other
I collected opcode stats here a couple months ago, and I can't imagine much has changed since then. If we manage to merge some version of #91713 (specializing list and range), I would think to do dict items and enumerate next.
* Adds FOR_ITER_LIST and FOR_ITER_RANGE specializations.
* Adds _PyLong_AssignValue() internal function to avoid temporary boxing of ints.
We can execute one less opcode in every iteration (except for the first) of every for-loop by changing from
to
This was suggested by @markshannon here, but it appears to be similar in spirit to Loop Inversion.
There seems to be a small (on the order of 1%) benefit, but I imagine the benefit will be magnified after any specialization.
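To make the dispatch-count argument concrete, here is a toy interpreter sketch in Python. It is not CPython's ceval loop and not the layout from the original post: `BODY` is a placeholder for the whole loop body, jump targets are absolute indices for simplicity, `GET_ITER` is left out, and the semantics given to `FOR_END` (fetch the next item, jump backward if the iterator is not exhausted) are an assumption made for illustration.

```python
_EXHAUSTED = object()  # sentinel for an exhausted iterator


def run(program, iterable):
    """Run a toy program; return (items seen by the body, dispatch count)."""
    it = iter(iterable)
    seen, current = [], None
    pc = dispatches = 0
    while pc < len(program):
        op, target = program[pc]
        dispatches += 1
        if op == "FOR_ITER":        # forward jump when exhausted
            current = next(it, _EXHAUSTED)
            pc = target if current is _EXHAUSTED else pc + 1
        elif op == "FOR_END":       # assumed: backward jump when NOT exhausted
            current = next(it, _EXHAUSTED)
            pc = pc + 1 if current is _EXHAUSTED else target
        elif op == "JUMP_BACKWARD":  # unconditional jump to the loop top
            pc = target
        elif op == "BODY":          # stands in for the loop body
            seen.append(current)
            pc += 1
        else:
            raise ValueError(f"unknown op {op!r}")
    return seen, dispatches


# Test at the top, unconditional jump at the bottom.
CLASSIC = [("FOR_ITER", 3), ("BODY", None), ("JUMP_BACKWARD", 0)]
# Inverted: test once on entry, then test-and-jump at the bottom.
INVERTED = [("FOR_ITER", 3), ("BODY", None), ("FOR_END", 1)]

classic_result = run(CLASSIC, range(5))
inverted_result = run(INVERTED, range(5))
print(classic_result)
print(inverted_result)
```

In this toy model both layouts visit the same items, and the inverted layout needs one dispatch fewer per item because the bottom-of-loop instruction does the work of both the jump and the next `FOR_ITER`. How much of that shows up in real timings is a separate question.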