gh-91576: Speed up iteration of strings #91574
Happy to help review this; let me know when you're ready.
@JelleZijlstra Finished.
That would add one more branch instruction, which I was trying to avoid, and Latin-1 is rare compared to ASCII.
Given that this is a fixed cost (once per iterator construction), I think the extra branch won't be noticeable. Latin-1 may be rare compared to ASCII, but it still has some common characters, and supporting it would be essentially free.
No, the cost is a branch instruction on each iteration, because ASCII and Latin-1 use different structures.
Hm, couldn't you just store a pointer to the array of bytes (and another to the end) rather than an index? Or is it possible that the bytes move around somehow?
See the LATIN1 macro in unicodeobject.c.
It checks whether ch is less than 128 and then indexes a different array depending on the result of that comparison.
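To illustrate the point about the extra branch, here is a minimal Python model of a two-table lookup. It is a sketch, not the code in unicodeobject.c: the names `ASCII_TABLE`, `LATIN1_UPPER_TABLE`, `latin1_lookup` and `ascii_lookup` are hypothetical, and the table layout (one table for 0..127, another for 128..255) is an assumption based on the description above.

```python
# Hypothetical stand-ins for two caches of one-character strings.
# Assumption: one table covers code points 0..127, the other 128..255.
ASCII_TABLE = [chr(i) for i in range(128)]
LATIN1_UPPER_TABLE = [chr(i) for i in range(128, 256)]


def latin1_lookup(ch: int) -> str:
    """Look up a cached one-character string for a code point in 0..255.

    The comparison below is the per-character branch discussed above:
    it has to run for every character the iterator yields.
    """
    if ch < 128:
        return ASCII_TABLE[ch]
    return LATIN1_UPPER_TABLE[ch - 128]


def ascii_lookup(ch: int) -> str:
    """ASCII-only lookup: a single table, so no comparison is needed."""
    return ASCII_TABLE[ch]
```

With an ASCII-only iterator, every character can go through something like `ascii_lookup`, which is why extending the fast path to Latin-1 is not free per iteration.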
How does this affect performance when ASCII and non-ASCII characters are mixed together in the same string?
Oh, I see. That's a bit unfortunate but I see your point and I guess ASCII strings are somewhat special anyways.
In that case, the representation of the whole string will not use the "compact ASCII" format, and we'll be using the regular (slow) iterator. @kumaraditya303 Please address the other review comments.
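A quick way to see which strings could qualify is `str.isascii()`, which is true only when every character is in the range 0..127. The sketch below assumes, following the comment above, that only all-ASCII strings take the fast iterator; the helper name `could_use_ascii_iterator` is hypothetical and the function does not inspect the iterator itself.

```python
def could_use_ascii_iterator(s: str) -> bool:
    """Return True if every character of s is ASCII.

    Assumption: per the discussion above, a single non-ASCII character
    is enough to send the whole string down the regular iterator path.
    """
    return s.isascii()


# One non-ASCII character changes the answer for the whole string.
for sample in ("hello", "h\u00e9llo", ""):
    print(repr(sample), could_use_ascii_iterator(sample))
```

So a mostly-ASCII string with one accented character behaves like any other non-ASCII string here: it is not slower than before, it simply does not benefit.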
Added some tests and addressed comments.
If you want to schedule another build, you need to add the " |
Benchmark Script:
Results:
Closes #91576