bpo-26828: Add __length_hint__() to builtin map iterator #14432

nmusolino · 2019-06-27T20:30:18Z

Add __length_hint__() method to the iterator returned by builtin function map. This method returns a length hint if all the user-supplied arguments to map have length hints available.

https://bugs.python.org/issue26828

the-knights-who-say-ni · 2019-06-27T20:30:22Z

Hello, and thanks for your contribution!

I'm a bot set up to make sure that the project can legally accept your contribution by verifying you have signed the PSF contributor agreement (CLA).

Our records indicate we have not received your CLA. For legal reasons we need you to sign this before we can look at your contribution. Please follow the steps outlined in the CPython devguide to rectify this issue.

If you have recently signed the CLA, please wait at least one business day
before our records are updated.

You can check yourself to see if the CLA has been received.

Thanks again for your contribution, we look forward to reviewing it!

vstinner · 2019-06-27T21:35:04Z

This change is an optimization. We require benchmarks to prove that it makes Python faster. Test very shorts lists (ex: 0, 5 and 10 items), and some other lengthd (ex: 100, 1000, 10 000).

nmusolino · 2019-06-27T23:12:25Z

Thank you, @vstinner , for your comment. I can run some ad-hoc benchmarks with timeit, and also try running pyperformance.

It is conceivable that this change would significantly improve performance for large sizes, but make performance worse for very short iterators (under 8 elements), since more work is being done to compute the length hint, with no (re)allocation savings.

vstinner · 2019-06-27T23:19:25Z

Thank you, @vstinner , for your comment. I can run some ad-hoc benchmarks with timeit, and also try running pyperformance.

Micro-benchmarks using pyperf time would be acceptable for me ;-) (I don't expect any significant impact on pyperformance benchmarks.)

It is conceivable that this change would significantly improve performance for large sizes, but make performance worse for very short iterators (under 8 elements), since more work is being done to compute the length hint, with no (re)allocation savings.

Well, https://bugs.python.org/issue26828 has been closed...

nmusolino · 2019-06-29T22:20:42Z

Here are the results of a microbenchmark as suggested by @vstinner. It took me some time to get a PGO/LTO build working.

master: commit 97d15b1
with-length-hint: commit d6d90d1

Both were built with:

mkdir build_$(git log -n1 --format="%h")
cd build_$(git log -n1 --format="%h")
../configure --enable-optimizations --with-lto --with-openssl=$(brew --prefix openssl)
emacs -nw Makefile  # Add `-x test_os` to exclude test_os from PGO profile generation run
make -s -j20 profile-opt

System specs: iMac with 3.8 GHz Intel Core i5, 16 GB DDR4 memory, mac OS 10.13.6

Profiled the following expression with timeit: list(map(lambda x:x, range(n)))

# profiled expression:   
for i in 0 1 5 10 20 25 50 100 250 500 1000 10_000 100_000 1_000_000 5_000_000 10_000_000 50_000_000; do
    printf "%10s %24s %24s\n" $i \
        "$(./build_97d15b1ee0/python.exe -m timeit "list(map(lambda x:x, range($i)))" | cut -f2 -d':')" \
        "$(./build_d6d90d1d3b/python.exe -m timeit "list(map(lambda x:x, range($i)))" | cut -f2 -d':')";
done

         n           CPYTHON MASTER              THIS BRANCH
         0        398 nsec per loop        367 nsec per loop
         1        446 nsec per loop        486 nsec per loop
         5        646 nsec per loop        686 nsec per loop
        10        838 nsec per loop        900 nsec per loop
        20       1.33 usec per loop       1.33 usec per loop
        25       1.52 usec per loop       1.55 usec per loop
        50       2.66 usec per loop       2.56 usec per loop
       100       5.13 usec per loop       4.74 usec per loop
       250       11.8 usec per loop       10.9 usec per loop
       500       25.5 usec per loop       24.1 usec per loop
      1000       51.6 usec per loop       49.9 usec per loop
    10_000        519 usec per loop        519 usec per loop
   100_000       6.88 msec per loop       6.46 msec per loop
 1_000_000       72.5 msec per loop       70.4 msec per loop
 5_000_000        364 msec per loop        359 msec per loop
10_000_000        735 msec per loop        719 msec per loop
50_000_000        3.67 sec per loop        3.72 sec per loop

My takeaways are:

For very short lists (fewer than ten elements), using a length hint is not helpful. There is an additional cost of calling the length hint function, which seems to be around 50 ns.
- By default, if a length hint is not available list() allocates enough memory for 8 elements.
- if the __length_hint__ method for the iterator were implemented in pure Python, this would presumably be a higher cost.
The breakeven is iterables around ~25 elements.
For medium-length iterators (e.g. n = 50, 100, 250, 500), the implementation with __length_hint__ is around 5% faster.

Next step will be pyperformance benchmarks.

nmusolino · 2019-06-29T23:46:22Z

Here's output for the pyperformance comparison, filtering on only changes labeled "Significant":

$ pyperformance compare   ./build_97d15b1ee0/pyperformance_output_build_97d15b1ee0.json ./build_d6d90d1d3b/pyperformance_output_build_d6d90d1d3b.json  | grep -B3 'Significant'

Output shows two benchmarks slower and two faster.

### genshi_text ###
Mean +- std dev: 33.4 ms +- 0.4 ms -> 34.1 ms +- 1.1 ms: 1.02x slower
Significant (t=-4.67)
--

### richards ###
Mean +- std dev: 84.3 ms +- 1.2 ms -> 86.2 ms +- 1.6 ms: 1.02x slower
Significant (t=-7.29)
--

### sqlite_synth ###
Mean +- std dev: 3.15 us +- 0.04 us -> 3.05 us +- 0.05 us: 1.03x faster
Significant (t=11.38)
--

### unpickle_list ###
Mean +- std dev: 4.18 us +- 0.04 us -> 4.09 us +- 0.03 us: 1.02x faster
Significant (t=14.37)

vstinner · 2019-07-01T08:43:47Z

For very short lists (fewer than ten elements), using a length hint is not helpful.

That's why the idea of using PyObject_LengthHint() has been abandoned. Instead, _PyObject_HasLen() was introduced to check if it's possible to get an object length without having to call a Python function (a Python function call is "slow" in such context):

https://bugs.python.org/issue33234
commit 372d705

commit 372d705d958964289d762953d0a61622755f5386
Author: Pablo Galindo <Pablogsal@gmail.com>
Date:   Sun Oct 28 20:16:26 2018 +0000

    [bpo-33234](https://bugs.python.org/issue33234) Improve list() pre-sizing for inputs with known lengths (GH-9846)
    
    The list() constructor isn't taking full advantage of known input
    lengths or length hints. This commit makes the constructor
    pre-size and not over-allocate when the input size is known (the
    input collection implements __len__). One on the main advantages is
    that this provides 12% difference in memory savings due to the difference
    between overallocating and allocating exactly the input size.
    
    For efficiency purposes and to avoid a performance regression for small
    generators and collections, the size of the input object is calculated using
    __len__ and not __length_hint__, as the later is considerably slower.

As I wrote previously, if you would like to explore the length hint path, you should experiment to add a new tp_length_hint slot to PyTypeObject (or maybe to PySequenceMethods and/or PyMappingMethods). The overhead of a Python function call is too high.

vstinner · 2019-07-01T08:43:56Z

cc @pablogsal

iritkatriel · 2022-04-11T22:59:53Z

https://bugs.python.org/issue26828 was rejected. Can this be closed?

nmusolino · 2022-04-12T14:03:08Z

https://bugs.python.org/issue26828 was rejected. Can this be closed?

Yes, we can close this MR. I never had the change to add tp_length_hint to PyTypeObject or PySequenceMethods as suggested above by @vstinner .

bedevere-bot added the awaiting review label Jun 27, 2019

the-knights-who-say-ni added the CLA not signed label Jun 27, 2019

nmusolino force-pushed the bpo-37435-map-length-hint branch from ccdd8d9 to 3229bda Compare June 27, 2019 21:13

nmusolino changed the title ~~bpo-37435: Add __length_hint__() to builtin map iterator~~ bpo-26828: Add __length_hint__() to builtin map iterator Jun 27, 2019

bpo-26828: Add __length_hint__() to builtin map iterator

d6d90d1

nmusolino force-pushed the bpo-37435-map-length-hint branch from 3229bda to d6d90d1 Compare June 27, 2019 21:40

brettcannon removed the CLA not signed label Aug 8, 2019

the-knights-who-say-ni added the CLA signed label Aug 8, 2019

csabella requested a review from pablogsal December 1, 2019 19:13

csabella requested review from pablogsal and removed request for pablogsal January 17, 2020 12:20

nmusolino closed this Apr 12, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Uh oh!

bpo-26828: Add __length_hint__() to builtin map iterator #14432

bpo-26828: Add __length_hint__() to builtin map iterator #14432

Uh oh!

nmusolino commented Jun 27, 2019 •

edited

Loading

Uh oh!

the-knights-who-say-ni commented Jun 27, 2019

Uh oh!

vstinner commented Jun 27, 2019

Uh oh!

nmusolino commented Jun 27, 2019

Uh oh!

vstinner commented Jun 27, 2019

Uh oh!

nmusolino commented Jun 29, 2019 •

edited

Loading

Uh oh!

nmusolino commented Jun 29, 2019 •

edited

Loading

Uh oh!

vstinner commented Jul 1, 2019 •

edited by bedevere-bot

Loading

Uh oh!

vstinner commented Jul 1, 2019

Uh oh!

iritkatriel commented Apr 11, 2022

Uh oh!

nmusolino commented Apr 12, 2022

Uh oh!

Uh oh!

Uh oh!

bpo-26828: Add __length_hint__() to builtin map iterator #14432

bpo-26828: Add __length_hint__() to builtin map iterator #14432

Uh oh!

Conversation

nmusolino commented Jun 27, 2019 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

the-knights-who-say-ni commented Jun 27, 2019

Uh oh!

vstinner commented Jun 27, 2019

Uh oh!

nmusolino commented Jun 27, 2019

Uh oh!

vstinner commented Jun 27, 2019

Uh oh!

nmusolino commented Jun 29, 2019 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

nmusolino commented Jun 29, 2019 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

vstinner commented Jul 1, 2019 • edited by bedevere-bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

vstinner commented Jul 1, 2019

Uh oh!

iritkatriel commented Apr 11, 2022

Uh oh!

nmusolino commented Apr 12, 2022

Uh oh!

Uh oh!

nmusolino commented Jun 27, 2019 •

edited

Loading

nmusolino commented Jun 29, 2019 •

edited

Loading

nmusolino commented Jun 29, 2019 •

edited

Loading

vstinner commented Jul 1, 2019 •

edited by bedevere-bot

Loading