gh-100726: optimize construction of range object for medium sized integers #100727

eendebakpt · 2023-01-03T20:57:33Z

The python range object contains code to compute the length from the start, stop and step values. This calculation is performed with PyLong objects which is expensive. This PR adds a fast path for the common case where start, stop and step all fit into a long value.

Benchmark results:

range(0, 5): Mean +- std dev: [main_range] 110 ns +- 1 ns -> [pr_range] 56.8 ns +- 1.3 ns: 1.93x faster
list(range(0, 5)): Mean +- std dev: [main_range] 243 ns +- 1 ns -> [pr_range] 181 ns +- 3 ns: 1.34x faster
range(-2, 3): Mean +- std dev: [main_range] 110 ns +- 2 ns -> [pr_range] 56.6 ns +- 1.0 ns: 1.94x faster
list(range(-2, 3)): Mean +- std dev: [main_range] 244 ns +- 3 ns -> [pr_range] 182 ns +- 4 ns: 1.34x faster
range(2, 60): Mean +- std dev: [main_range] 112 ns +- 1 ns -> [pr_range] 57.1 ns +- 1.4 ns: 1.96x faster
list(range(2, 60)): Mean +- std dev: [main_range] 535 ns +- 28 ns -> [pr_range] 472 ns +- 27 ns: 1.13x faster
range(2, 60, 2): Mean +- std dev: [main_range] 115 ns +- 1 ns -> [pr_range] 59.4 ns +- 1.4 ns: 1.94x faster
list(range(2, 60, 2)): Mean +- std dev: [main_range] 382 ns +- 11 ns -> [pr_range] 317 ns +- 5 ns: 1.21x faster
range(73786976294838206464, 73786976294838206469, 2): Mean +- std dev: [main_range] 136 ns +- 1 ns -> [pr_range] 137 ns +- 3 ns: 1.01x slower
list(range(73786976294838206464, 73786976294838206469, 2)): Mean +- std dev: [main_range] 444 ns +- 3 ns -> [pr_range] 440 ns +- 6 ns: 1.01x faster
for loop with range): Mean +- std dev: [main_range] 878 ns +- 12 ns -> [pr_range] 815 ns +- 21 ns: 1.08x faster

Geometric mean: 1.39x faster

Benchmark script

import pyperf
runner = pyperf.Runner()

cases = [(0,5), (-2, 3), (2, 60)]
for s,t in cases:
    time = runner.timeit(name=f'range({s}, {t})', stmt=f"range({s}, {t})")
    time = runner.timeit(name=f'list(range({s}, {t}))', stmt=f"list(range({s}, {t}))")

time = runner.timeit(name=f'range({s}, {t}, 2)', stmt=f"range({s}, {t}, 2)")
time = runner.timeit(name=f'list(range({s}, {t}, 2))', stmt=f"list(range({s}, {t}, 2))")

big = 2**66
s,t=big, big+5
time = runner.timeit(name=f'range({s}, {t}, 2)', stmt=f"range({s}, {t}, 2)")
time = runner.timeit(name=f'list(range({s}, {t}, 2))', stmt=f"list(range({s}, {t}, 2))")
    
stmt="""
v=0
for ii in range(20):
    v += ii*ii
"""
time = runner.timeit(name='for loop with range)', stmt=stmt)

Notes

The macro and inline method IS_MEDIUM_VALUE and medium_value have been moved from longobject.c to the internal header files. The names have been kept the same to avoid too many changes. But since they are now visible outside longobject.c we can rename them to _PyLong_IsMediumValue and _PyLong_MediumValue.

Issue: Optimize Python range object for small integers #100726

mdickinson · 2023-01-04T08:24:20Z

Could you try this using PyLong_AsLong instead of MEDIUM_VALUE? This would cover more cases, be more consistent with the existing long-handling in rangeobject.c, and avoid the internal details of the long object representation leaking out too much. I'm not too comfortable with making the range implementation dependent on the longobject internals.

eendebakpt · 2023-01-04T10:14:19Z

Could you try this using PyLong_AsLong instead of MEDIUM_VALUE? This would cover more cases, be more consistent with the existing long-handling in rangeobject.c, and avoid the internal details of the long object representation leaking out too much. I'm not too comfortable with making the range implementation dependent on the longobject internals.

@mdickinson The PyLong_AsLong indeed has many advantages, but it internally sets error values and needs PyErr_Occurred() to disambiguate which impacts performance.

The PyLong_AsLongAndOverflow is also public and perform a bit less work, so I tried that method. The branch with PyLong_AsLongAndOverflow is also faster than main for the common case (but not as fast as this PR). For the case of large integers PyLong_AsLongAndOverflow is slower than main (but not a lot).

Benchmark results

Main:

range(0, 5): Mean +- std dev: 110 ns +- 1 ns
list(range(0, 5)): Mean +- std dev: 243 ns +- 1 ns
range(-2, 3): Mean +- std dev: 110 ns +- 2 ns
list(range(-2, 3)): Mean +- std dev: 244 ns +- 3 ns
range(2, 60): Mean +- std dev: 112 ns +- 1 ns
list(range(2, 60)): Mean +- std dev: 535 ns +- 28 ns
range(2, 60, 2): Mean +- std dev: 115 ns +- 1 ns
list(range(2, 60, 2)): Mean +- std dev: 382 ns +- 11 ns
range(73786976294838206464, 73786976294838206469, 2): Mean +- std dev: 136 ns +- 1 ns
list(range(73786976294838206464, 73786976294838206469, 2)): Mean +- std dev: 444 ns +- 3 ns
for loop with range(20): Mean +- std dev: 878 ns +- 12 ns

This PR:

range(0, 5): Mean +- std dev: 56.8 ns +- 1.3 ns
list(range(0, 5)): Mean +- std dev: 181 ns +- 3 ns
range(-2, 3): Mean +- std dev: 56.6 ns +- 1.0 ns
list(range(-2, 3)): Mean +- std dev: 182 ns +- 4 ns
range(2, 60): Mean +- std dev: 57.1 ns +- 1.4 ns
list(range(2, 60)): Mean +- std dev: 472 ns +- 27 ns
range(2, 60, 2): Mean +- std dev: 59.4 ns +- 1.4 ns
list(range(2, 60, 2)): Mean +- std dev: 317 ns +- 5 ns
range(73786976294838206464, 73786976294838206469, 2): Mean +- std dev: 137 ns +- 3 ns
list(range(73786976294838206464, 73786976294838206469, 2)): Mean +- std dev: 440 ns +- 6 ns
for loop with range(20): Mean +- std dev: 815 ns +- 21 ns

Branch using PyLong_AsLongAndOverflow: main...eendebakpt:cpython:range_fast_path_v2

range(0, 5): Mean +- std dev: 63.3 ns +- 0.8 ns
list(range(0, 5)): Mean +- std dev: 193 ns +- 3 ns
range(-2, 3): Mean +- std dev: 63.9 ns +- 1.3 ns
list(range(-2, 3)): Mean +- std dev: 194 ns +- 3 ns
range(2, 60): Mean +- std dev: 64.0 ns +- 1.7 ns
list(range(2, 60)): Mean +- std dev: 478 ns +- 10 ns
range(2, 60, 2): Mean +- std dev: 66.8 ns +- 3.5 ns
list(range(2, 60, 2)): Mean +- std dev: 333 ns +- 13 ns
range(73786976294838206464, 73786976294838206469, 2): Mean +- std dev: 149 ns +- 6 ns
list(range(73786976294838206464, 73786976294838206469, 2)): Mean +- std dev: 455 ns +- 8 ns
for loop with range(20): Mean +- std dev: 831 ns +- 17 ns

The benchmark of list(range(73786976294838206464, 73786976294838206469, 2)) slows down from main to the PyLong_AsLongOverflow from 444 to 455 ns. (range(73786976294838206464, 73786976294838206469, 2) slows down as well, but a standalone range is hardly used).

I am fine with accepting a minor slowdown for the uncommon case, although perhaps there are more cases to consider than I included in the benchmark.

Which cases would PyLong_AsLong or PyLong_AsLongOverflow cover more than the IS_MEDIUM_VALUE? According to the documentation the input arguments to compute_range_length should satisfy PyLong_Check(arg)=1. So values of like numpy.int64(3) or Decimal(4,1) are converted at an earlier stage (e.g. in range_from_array).

mdickinson · 2023-01-04T11:13:36Z

Which cases would PyLong_AsLong or PyLong_AsLongOverflow cover more than the IS_MEDIUM_VALUE?

Using a PyLong_AsLong (or a variant) would allow anything that fits in a C long to use the fast path, rather than just those values that fit in a single digit. So on a typical 64-bit machine, that's all values between -2**63 and 2**63-1, rather than just values in the range [-2**30, 2**30] (the "medium" values).

IOW, thinking of the range implementation as a client user of the PyLong implementation, what that client needs is a way to say, efficiently, "tell me whether this object (which I already know is a PyLongObject) fits in a C long, and if so, give me its value as a C long". IS_MEDIUM_VALUE is doing something quite different, that's dependent on the internal representation of a Python long - the range implementation shouldn't know or care about "medium" values.

It's that information about digits, twodigits, medium values, number of bits per digit, etc., that I think we should keep encapsulated within the long implementation - it shouldn't be relevant to users of that implementation. If it starts leaking into other parts of the codebase, that's going to turn into a maintenance headache whenever the long implementation needs tweaking.

If none of the existing public or private long API functions does what we need efficiently (e.g., because it's doing index checks and the like), I'd be open to adding a new function that does answer the "tell me whether this PyLong fits in a C long" question efficiently.

eendebakpt · 2023-01-04T13:31:04Z

@mdickinson I refactored the PyLong_AsLongAndOverflow into a part with error checking and a new (private) _PyLong_AsLongAndOverflow that performs the actual conversion to long and overflow check.

Benchmark of the updated PR against main:

range(0, 5): Mean +- std dev: [main] 110 ns +- 1 ns -> [pr_v4] 61.2 ns +- 1.5 ns: 1.79x faster
list(range(0, 5)): Mean +- std dev: [main] 244 ns +- 2 ns -> [pr_v4] 186 ns +- 1 ns: 1.31x faster
range(-2, 3): Mean +- std dev: [main] 111 ns +- 4 ns -> [pr_v4] 61.6 ns +- 1.5 ns: 1.80x faster
list(range(-2, 3)): Mean +- std dev: [main] 244 ns +- 1 ns -> [pr_v4] 187 ns +- 1 ns: 1.30x faster
range(2, 60): Mean +- std dev: [main] 112 ns +- 1 ns -> [pr_v4] 62.3 ns +- 3.7 ns: 1.79x faster
list(range(2, 60)): Mean +- std dev: [main] 527 ns +- 7 ns -> [pr_v4] 466 ns +- 9 ns: 1.13x faster
range(2, 60, 2): Mean +- std dev: [main] 116 ns +- 1 ns -> [pr_v4] 64.2 ns +- 1.0 ns: 1.80x faster
list(range(2, 60, 2)): Mean +- std dev: [main] 379 ns +- 3 ns -> [pr_v4] 321 ns +- 8 ns: 1.18x faster
range(4398046511104, 4398046511109, 2): Mean +- std dev: [main] 135 ns +- 1 ns -> [pr_v4] 68.1 ns +- 1.9 ns: 1.98x faster
list(range(4398046511104, 4398046511109, 2)): Mean +- std dev: [main] 311 ns +- 4 ns -> [pr_v4] 235 ns +- 4 ns: 1.32x faster
range(73786976294838206464, 73786976294838206469, 2): Mean +- std dev: [main] 137 ns +- 6 ns -> [pr_v4] 148 ns +- 1 ns: 1.08x slower
list(range(73786976294838206464, 73786976294838206469, 2)): Mean +- std dev: [main] 444 ns +- 2 ns -> [pr_v4] 447 ns +- 12 ns: 1.01x slower
for loop with range(10): Mean +- std dev: [main] 879 ns +- 17 ns -> [pr_v4] 826 ns +- 16 ns: 1.06x faster
for loop with range(20): Mean +- std dev: [main] 881 ns +- 22 ns -> [pr_v4] 824 ns +- 19 ns: 1.07x faster

Geometric mean: 1.35x faster

Most test cases are faster, now also the cases with long values that are not medium. The new check _PyLong_AsLongAndOverflow is a bit slower than IS_MEDIUM_INT, because it has to do a bit more work.

mdickinson · 2023-01-06T14:19:03Z

Thanks for the updates! I should have time to take a proper look this weekend.

eendebakpt · 2023-01-06T21:01:07Z

@mdickinson Thanks for reviewing. The refactoring of PyLong_AsLongAndOverflow works well for this PR, but I am a little concerned than if the _PyLong_AsLongAndOverflow is not inlined in PyLong_AsLongAndOverflow there could be a minor slowdown for the other method calling PyLong_AsLongAndOverflow. I tried the pyperformance benchmark, but my machine is too unstable to give definite results.

I created #100810 as a simplified version of this PR. Benchmark:

range(0, 5): Mean +- std dev: 65.1 ns +- 1.1 ns
list(range(0, 5)): Mean +- std dev: 194 ns +- 2 ns
range(-2, 3): Mean +- std dev: 65.5 ns +- 0.9 ns
list(range(-2, 3)): Mean +- std dev: 195 ns +- 2 ns
range(2, 60): Mean +- std dev: 65.5 ns +- 0.9 ns
list(range(2, 60)): Mean +- std dev: 477 ns +- 16 ns
range(2, 60, 2): Mean +- std dev: 70.6 ns +- 1.0 ns
list(range(2, 60, 2)): Mean +- std dev: 328 ns +- 5 ns
range(4398046511104, 4398046511109, 2): Mean +- std dev: 76.0 ns +- 0.6 ns
list(range(4398046511104, 4398046511109, 2)): Mean +- std dev: 242 ns +- 1 ns
range(73786976294838206464, 73786976294838206469, 2): Mean +- std dev: 145 ns +- 3 ns
list(range(73786976294838206464, 73786976294838206469, 2)): Mean +- std dev: 450 ns +- 5 ns
range(-1, -1, -1): Mean +- std dev: 67.4 ns +- 1.2 ns
range(-1, -1, 2**66): Mean +- std dev: 248 ns +- 3 ns
for loop with range(10): Mean +- std dev: 826 ns +- 19 ns
for loop with range(20): Mean +- std dev: 827 ns +- 26 ns

The simplified version is not as fast as this PR, but still a solid improvement over current main. I have no strong preference for either one of the two.

The cases that slowdown compared to main (both this PR and the simplified one) involve very large integers that do not fit into a C long, e.g. range(73786976294838206464, 73786976294838206469, 2). I suspect these cases are rare, and if used the computation time will not be determined by the range object itself.

mdickinson · 2023-01-07T15:43:01Z

Thanks for the discussion. I confess that I'm much more comfortable with #100810; I'll continue review there.

optimize construction of range object for medium sized integers

88b601c

bedevere-bot added the awaiting review label Jan 3, 2023

bedevere-bot mentioned this pull request Jan 3, 2023

Optimize Python range object for small integers #100726

Closed

blurb-it bot and others added 4 commits January 3, 2023 20:59

📜🤖 Added by blurb_it.

477f341

fix news item

b755093

fix assert

7b3717c

Merge branch 'main' into range_fast_path

2f914bc

Fidget-Spinner added the performance Performance or resource usage label Jan 4, 2023

Fidget-Spinner requested a review from sweeneyde January 4, 2023 05:58

use PyLong_asLongAndOverflow

81fb4dd

refactor PyLong_AsLongAndOverflow

449b31e

eendebakpt changed the title ~~gh-100726: optimize construction of range object for medium sized integers~~ Draft: gh-100726: optimize construction of range object for medium sized integers Jan 4, 2023

eendebakpt added 2 commits January 4, 2023 15:28

fix error handling

388cc13

make method static

7e765a5

eendebakpt changed the title ~~Draft: gh-100726: optimize construction of range object for medium sized integers~~ gh-100726: optimize construction of range object for medium sized integers Jan 4, 2023

mdickinson self-requested a review January 6, 2023 14:18

eendebakpt closed this Jan 11, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Uh oh!

gh-100726: optimize construction of range object for medium sized integers #100727

gh-100726: optimize construction of range object for medium sized integers #100727

Uh oh!

eendebakpt commented Jan 3, 2023 •

edited by bedevere-bot

Loading

Uh oh!

mdickinson commented Jan 4, 2023

Uh oh!

eendebakpt commented Jan 4, 2023

Uh oh!

mdickinson commented Jan 4, 2023 •

edited

Loading

Uh oh!

eendebakpt commented Jan 4, 2023

Uh oh!

mdickinson commented Jan 6, 2023

Uh oh!

eendebakpt commented Jan 6, 2023

Uh oh!

mdickinson commented Jan 7, 2023

Uh oh!

Uh oh!

Uh oh!

gh-100726: optimize construction of range object for medium sized integers #100727

gh-100726: optimize construction of range object for medium sized integers #100727

Uh oh!

Conversation

eendebakpt commented Jan 3, 2023 • edited by bedevere-bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

mdickinson commented Jan 4, 2023

Uh oh!

eendebakpt commented Jan 4, 2023

Uh oh!

mdickinson commented Jan 4, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

eendebakpt commented Jan 4, 2023

Uh oh!

mdickinson commented Jan 6, 2023

Uh oh!

eendebakpt commented Jan 6, 2023

Uh oh!

mdickinson commented Jan 7, 2023

Uh oh!

Uh oh!

eendebakpt commented Jan 3, 2023 •

edited by bedevere-bot

Loading

mdickinson commented Jan 4, 2023 •

edited

Loading