gh-91576: Speed up iteration of strings #91574

kumaraditya303 · 2022-04-15T15:28:59Z

Benchmark Script:

from pyperf import Runner, perf_counter

def bench_str(loops, src, length):
    src = src * length
    t0 = perf_counter()
    for _ in range(loops):
        for i in src:
            pass
    return perf_counter() - t0

runner = Runner()
for src in ['a', 'é']:
    for n in [1_000, 10_000]:
        runner.bench_time_func(f"str {src} {n}", bench_str, src, n)

Results:

str a 1000: Mean +- std dev: [base] 9.72 us +- 0.62 us -> [patch] 6.54 us +- 1.05 us: 1.49x faster
str a 10000: Mean +- std dev: [base] 95.8 us +- 5.1 us -> [patch] 63.0 us +- 4.1 us: 1.52x faster
str é 1000: Mean +- std dev: [base] 10.3 us +- 1.6 us -> [patch] 8.60 us +- 0.56 us: 1.19x faster
str é 10000: Mean +- std dev: [base] 103 us +- 9 us -> [patch] 86.1 us +- 4.7 us: 1.19x faster

Geometric mean: 1.34x faster
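
As a sanity check on the summary line, the geometric mean can be recomputed from the four rounded speedup factors reported above. This is only a sketch: pyperf works from the raw timings, so the variable names and the use of rounded inputs here are assumptions made for illustration.

```python
from statistics import geometric_mean

# Speedup factors copied from the four result lines above (already rounded).
speedups = [1.49, 1.52, 1.19, 1.19]

# The geometric mean is the n-th root of the product of the n factors.
overall = geometric_mean(speedups)
print(f"Geometric mean: {overall:.2f}x faster")
```

The geometric mean is the usual choice for averaging ratios because it treats a 2x speedup and a 2x slowdown as cancelling out.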

Closes #91576

Objects/object.c

JelleZijlstra · 2022-04-15T16:34:47Z

Happy to help review this, let me know when you're ready

kumaraditya303 · 2022-04-15T16:50:06Z

> Happy to help review this, let me know when you're ready

@JelleZijlstra Finished.

gvanrossum

Why not use the specialized iterator for all Latin-1 strings?

kumaraditya303 · 2022-04-15T17:02:18Z

> Why not use the specialized iterator for all Latin-1 strings?

That would add one more branch instruction and I was trying to avoid it and LATIN1 is rare compared to ASCII.

Objects/unicodeobject.c

gvanrossum

You should just be able to test

PyUnicode_KIND(unicode) == PyUnicode_1BYTE_KIND

to decide which iterator to create, right? Or can kind be changed (once the object is "ready")?

Objects/unicodeobject.c

gvanrossum · 2022-04-15T18:23:09Z

> That would add one more branch instruction and I was trying to avoid it and LATIN1 is rare compared to ASCII.

Given that this is a fixed cost (once per iterator construction) I think the extra branch won't be noticeable. Latin-1 may be rare compared to ASCII but it's still got some common characters and it would be essentially free.

kumaraditya303 · 2022-04-15T18:42:16Z

> Given that this is a fixed cost (once per iterator construction) I think the extra branch won't be noticeable. Latin-1 may be rare compared to ASCII but it's still got some common characters and it would be essentially free.

No, the cost is a branch instruction on each iteration, as ASCII and Latin-1 use different structures.

gvanrossum · 2022-04-15T19:11:44Z

> No, the cost is a branch instruction on each iteration, as ASCII and Latin-1 use different structures.

Hm, couldn't you just store a pointer to the array of bytes (and another to the end) rather than an index? Or is it possible that the bytes move around somehow?

kumaraditya303 · 2022-04-15T19:19:36Z

See the LATIN1 macro in unicodeobject.c.

kumaraditya303 · 2022-04-15T19:21:18Z

It requires checking whether ch is less than 128, and then indexes a different array depending on the result of that comparison.
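
The lookup described in this comment can be modelled in Python. This is a rough sketch of the idea, not the CPython source: `ASCII_CACHE`, `LATIN1_CACHE`, `latin1_char` and `ascii_char` are hypothetical names, and the real macro works on C arrays of preallocated string objects.

```python
# Hypothetical stand-ins for the two caches of one-character strings.
ASCII_CACHE = [chr(i) for i in range(128)]
LATIN1_CACHE = [chr(i) for i in range(128, 256)]


def latin1_char(ch: int) -> str:
    """Return the cached one-character string for a code point below 256."""
    if not 0 <= ch < 256:
        raise ValueError("code point outside the Latin-1 range")
    if ch < 128:  # this comparison runs once per character
        return ASCII_CACHE[ch]
    return LATIN1_CACHE[ch - 128]


def ascii_char(ch: int) -> str:
    """ASCII-only variant: the caller guarantees ch < 128, so no branch."""
    return ASCII_CACHE[ch]
```

In this model the ASCII-only path skips the comparison entirely, which is the per-iteration saving the comment refers to.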

sweeneyde · 2022-04-15T23:23:25Z

How does this affect performance when ascii and non-ascii are mixed together in the same string?

gvanrossum · 2022-04-16T00:26:48Z

> It requires checking whether ch is less than 128, and then indexes a different array depending on the result of that comparison.

Oh, I see. That's a bit unfortunate but I see your point and I guess ASCII strings are somewhat special anyways.

> How does this affect performance when ascii and non-ascii are mixed together in the same string?

In that case the representation of the whole string will not use the "compact ASCII" format and we'll be using the regular (slow) iterator.

@kumaraditya303 Please address the other review comments.
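
The dispatch discussed in this thread can be illustrated with a small Python model. It is a sketch under assumptions: `pick_iterator`, `_ascii_iter` and `_generic_iter` are hypothetical names, and `str.isascii()` is used here only as an approximation of the "compact ASCII" check that the C code performs.

```python
from typing import Iterator, Tuple


def _ascii_iter(s: str) -> Iterator[str]:
    # Stand-in for the specialized iterator used for all-ASCII strings.
    for ch in s:
        yield ch


def _generic_iter(s: str) -> Iterator[str]:
    # Stand-in for the regular iterator used for every other string.
    for ch in s:
        yield ch


def pick_iterator(s: str) -> Tuple[str, Iterator[str]]:
    """Choose an iterator once, when iteration starts."""
    if s.isascii():
        return "ascii", _ascii_iter(s)
    # A single non-ASCII character sends the whole string down this path.
    return "generic", _generic_iter(s)


kind_plain, _ = pick_iterator("abc")
kind_mixed, it_mixed = pick_iterator("abc\u00e9")
print(kind_plain, kind_mixed)
```

The choice depends on the string as a whole rather than on each character, so a mixed string behaves the same as before the patch.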

kumaraditya303 · 2022-04-17T09:40:57Z

Added some tests and addressed comments.

bedevere-bot · 2022-04-17T09:57:04Z

🤖 New build scheduled with the buildbot fleet by @kumaraditya303 for commit ad2d676 🤖

If you want to schedule another build, you need to add the "🔨 test-with-buildbots" label again.

bedevere-bot · 2022-04-17T10:07:48Z

🤖 New build scheduled with the buildbot fleet by @kumaraditya303 for commit 56d110c 🤖

If you want to schedule another build, you need to add the "🔨 test-with-buildbots" label again.

Lib/test/test_unicode.py

erlend-aasland

Looks good!

Objects/unicodeobject.c

Lib/test/test_unicode.py

Objects/unicodeobject.c

gvanrossum

I'm happy now!

kumaraditya303 added 5 commits Apr 15, 2022

add fast iterator

1b92ef3

remove reduntant code

5e99408

microoptimize

f22ebb8

whitespace

4366046

make it static

ed09a6e

bedevere-bot added the awaiting review label Apr 15, 2022

Merge branch 'main' of https://github.com/python/cpython into fast-iter-str

c902362

kumaraditya303 changed the title from "Fast string iterator" to "gh-91576: Speed up iteration of strings" Apr 15, 2022

refcount

b59f0fc

JelleZijlstra self-requested a review Apr 15, 2022

finalize static type

0a84504

arhadthedev reviewed Apr 15, 2022

Objects/object.c (outdated, resolved)

kumaraditya303 force-pushed the fast-iter-str branch from 6eeeee0 to 0a84504 Compare Apr 15, 2022

kumaraditya303 marked this pull request as ready for review Apr 15, 2022

kumaraditya303 requested a review from gvanrossum Apr 15, 2022

blurb-it bot and others added 2 commits Apr 15, 2022

📜🤖 Added by blurb_it.

81ea999

Merge branch 'main' into fast-iter-str

2e80fe6

gvanrossum reviewed Apr 15, 2022

erlend-aasland reviewed Apr 15, 2022

Objects/unicodeobject.c (outdated, resolved)

Objects/unicodeobject.c (outdated, resolved)

Objects/unicodeobject.c (outdated, resolved)

gvanrossum reviewed Apr 15, 2022

Objects/unicodeobject.c (outdated, resolved)

bedevere-bot added the awaiting changes label Apr 16, 2022

kumaraditya303 added 2 commits Apr 17, 2022

move inc out of loop

b9f75d1

Merge branch 'main' of https://github.com/python/cpython into fast-iter-str

c443507

kumaraditya303 requested a review from gvanrossum Apr 17, 2022

add some tests

ad2d676

kumaraditya303 self-assigned this Apr 17, 2022

kumaraditya303 added the 🔨 test-with-buildbots Test PR w/ buildbots; report in status section label Apr 17, 2022

bedevere-bot removed the 🔨 test-with-buildbots Test PR w/ buildbots; report in status section label Apr 17, 2022

extern typeobject

56d110c

kumaraditya303 added the 🔨 test-with-buildbots Test PR w/ buildbots; report in status section label Apr 17, 2022

bedevere-bot removed the 🔨 test-with-buildbots Test PR w/ buildbots; report in status section label Apr 17, 2022

JelleZijlstra reviewed Apr 17, 2022

Lib/test/test_unicode.py (outdated, resolved)

fix test

fb90b7b

kumaraditya303 requested a review from JelleZijlstra Apr 17, 2022

erlend-aasland approved these changes Apr 17, 2022

Objects/unicodeobject.c (outdated, resolved)

Objects/unicodeobject.c (resolved)

sweeneyde reviewed Apr 17, 2022

Lib/test/test_unicode.py (resolved)

sweeneyde reviewed Apr 17, 2022

Objects/unicodeobject.c (resolved)

JelleZijlstra removed their request for review Apr 17, 2022

kumaraditya303 added 2 commits Apr 18, 2022

more tests

4321569

fix test

09b63ab

sweeneyde approved these changes Apr 18, 2022

bedevere-bot added awaiting merge and removed awaiting changes labels Apr 18, 2022

gvanrossum approved these changes Apr 18, 2022

gvanrossum merged commit 8c54c3d into python:main Apr 18, 2022
13 checks passed

bedevere-bot removed the awaiting merge label Apr 18, 2022

kumaraditya303 deleted the fast-iter-str branch Apr 18, 2022
