Skip to content

bpo-26828: Add __length_hint__() to builtin map iterator #14432

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 1 commit into from

Conversation

nmusolino
Copy link

@nmusolino nmusolino commented Jun 27, 2019

Add __length_hint__() method to the iterator returned by builtin function map. This method returns a length hint if all the user-supplied arguments to map have length hints available.

https://bugs.python.org/issue26828

@the-knights-who-say-ni
Copy link

Hello, and thanks for your contribution!

I'm a bot set up to make sure that the project can legally accept your contribution by verifying you have signed the PSF contributor agreement (CLA).

Our records indicate we have not received your CLA. For legal reasons we need you to sign this before we can look at your contribution. Please follow the steps outlined in the CPython devguide to rectify this issue.

If you have recently signed the CLA, please wait at least one business day
before our records are updated.

You can check yourself to see if the CLA has been received.

Thanks again for your contribution, we look forward to reviewing it!

@nmusolino nmusolino force-pushed the bpo-37435-map-length-hint branch from ccdd8d9 to 3229bda Compare June 27, 2019 21:13
@nmusolino nmusolino changed the title bpo-37435: Add __length_hint__() to builtin map iterator bpo-26828: Add __length_hint__() to builtin map iterator Jun 27, 2019
@vstinner
Copy link
Member

This change is an optimization. We require benchmarks to prove that it makes Python faster. Test very shorts lists (ex: 0, 5 and 10 items), and some other lengthd (ex: 100, 1000, 10 000).

@nmusolino nmusolino force-pushed the bpo-37435-map-length-hint branch from 3229bda to d6d90d1 Compare June 27, 2019 21:40
@nmusolino
Copy link
Author

Thank you, @vstinner , for your comment. I can run some ad-hoc benchmarks with timeit, and also try running pyperformance.

It is conceivable that this change would significantly improve performance for large sizes, but make performance worse for very short iterators (under 8 elements), since more work is being done to compute the length hint, with no (re)allocation savings.

@vstinner
Copy link
Member

Thank you, @vstinner , for your comment. I can run some ad-hoc benchmarks with timeit, and also try running pyperformance.

Micro-benchmarks using pyperf time would be acceptable for me ;-) (I don't expect any significant impact on pyperformance benchmarks.)

It is conceivable that this change would significantly improve performance for large sizes, but make performance worse for very short iterators (under 8 elements), since more work is being done to compute the length hint, with no (re)allocation savings.

Well, https://bugs.python.org/issue26828 has been closed...

@nmusolino
Copy link
Author

nmusolino commented Jun 29, 2019

Here are the results of a microbenchmark as suggested by @vstinner. It took me some time to get a PGO/LTO build working.

master: commit 97d15b1
with-length-hint: commit d6d90d1

Both were built with:

mkdir build_$(git log -n1 --format="%h")
cd build_$(git log -n1 --format="%h")
../configure --enable-optimizations --with-lto --with-openssl=$(brew --prefix openssl)
emacs -nw Makefile  # Add `-x test_os` to exclude test_os from PGO profile generation run
make -s -j20 profile-opt

System specs: iMac with 3.8 GHz Intel Core i5, 16 GB DDR4 memory, mac OS 10.13.6

Profiled the following expression with timeit: list(map(lambda x:x, range(n)))

# profiled expression:   
for i in 0 1 5 10 20 25 50 100 250 500 1000 10_000 100_000 1_000_000 5_000_000 10_000_000 50_000_000; do
    printf "%10s %24s %24s\n" $i \
        "$(./build_97d15b1ee0/python.exe -m timeit "list(map(lambda x:x, range($i)))" | cut -f2 -d':')" \
        "$(./build_d6d90d1d3b/python.exe -m timeit "list(map(lambda x:x, range($i)))" | cut -f2 -d':')";
done
         n           CPYTHON MASTER              THIS BRANCH
         0        398 nsec per loop        367 nsec per loop
         1        446 nsec per loop        486 nsec per loop
         5        646 nsec per loop        686 nsec per loop
        10        838 nsec per loop        900 nsec per loop
        20       1.33 usec per loop       1.33 usec per loop
        25       1.52 usec per loop       1.55 usec per loop
        50       2.66 usec per loop       2.56 usec per loop
       100       5.13 usec per loop       4.74 usec per loop
       250       11.8 usec per loop       10.9 usec per loop
       500       25.5 usec per loop       24.1 usec per loop
      1000       51.6 usec per loop       49.9 usec per loop
    10_000        519 usec per loop        519 usec per loop
   100_000       6.88 msec per loop       6.46 msec per loop
 1_000_000       72.5 msec per loop       70.4 msec per loop
 5_000_000        364 msec per loop        359 msec per loop
10_000_000        735 msec per loop        719 msec per loop
50_000_000        3.67 sec per loop        3.72 sec per loop

My takeaways are:

  • For very short lists (fewer than ten elements), using a length hint is not helpful. There is an additional cost of calling the length hint function, which seems to be around 50 ns.
    • By default, if a length hint is not available list() allocates enough memory for 8 elements.
    • if the __length_hint__ method for the iterator were implemented in pure Python, this would presumably be a higher cost.
  • The breakeven is iterables around ~25 elements.
  • For medium-length iterators (e.g. n = 50, 100, 250, 500), the implementation with __length_hint__ is around 5% faster.

Next step will be pyperformance benchmarks.

@nmusolino
Copy link
Author

nmusolino commented Jun 29, 2019

Here's output for the pyperformance comparison, filtering on only changes labeled "Significant":

$ pyperformance compare   ./build_97d15b1ee0/pyperformance_output_build_97d15b1ee0.json ./build_d6d90d1d3b/pyperformance_output_build_d6d90d1d3b.json  | grep -B3 'Significant'

Output shows two benchmarks slower and two faster.

### genshi_text ###
Mean +- std dev: 33.4 ms +- 0.4 ms -> 34.1 ms +- 1.1 ms: 1.02x slower
Significant (t=-4.67)
--

### richards ###
Mean +- std dev: 84.3 ms +- 1.2 ms -> 86.2 ms +- 1.6 ms: 1.02x slower
Significant (t=-7.29)
--

### sqlite_synth ###
Mean +- std dev: 3.15 us +- 0.04 us -> 3.05 us +- 0.05 us: 1.03x faster
Significant (t=11.38)
--

### unpickle_list ###
Mean +- std dev: 4.18 us +- 0.04 us -> 4.09 us +- 0.03 us: 1.02x faster
Significant (t=14.37)

@vstinner
Copy link
Member

vstinner commented Jul 1, 2019

For very short lists (fewer than ten elements), using a length hint is not helpful.

That's why the idea of using PyObject_LengthHint() has been abandoned. Instead, _PyObject_HasLen() was introduced to check if it's possible to get an object length without having to call a Python function (a Python function call is "slow" in such context):

commit 372d705d958964289d762953d0a61622755f5386
Author: Pablo Galindo <Pablogsal@gmail.com>
Date:   Sun Oct 28 20:16:26 2018 +0000

    [bpo-33234](https://bugs.python.org/issue33234) Improve list() pre-sizing for inputs with known lengths (GH-9846)
    
    The list() constructor isn't taking full advantage of known input
    lengths or length hints. This commit makes the constructor
    pre-size and not over-allocate when the input size is known (the
    input collection implements __len__). One on the main advantages is
    that this provides 12% difference in memory savings due to the difference
    between overallocating and allocating exactly the input size.
    
    For efficiency purposes and to avoid a performance regression for small
    generators and collections, the size of the input object is calculated using
    __len__ and not __length_hint__, as the later is considerably slower.

As I wrote previously, if you would like to explore the length hint path, you should experiment to add a new tp_length_hint slot to PyTypeObject (or maybe to PySequenceMethods and/or PyMappingMethods). The overhead of a Python function call is too high.

@vstinner
Copy link
Member

vstinner commented Jul 1, 2019

cc @pablogsal

@csabella csabella requested a review from pablogsal December 1, 2019 19:13
@csabella csabella requested review from pablogsal and removed request for pablogsal January 17, 2020 12:20
@iritkatriel
Copy link
Member

https://bugs.python.org/issue26828 was rejected. Can this be closed?

@nmusolino
Copy link
Author

https://bugs.python.org/issue26828 was rejected. Can this be closed?

Yes, we can close this MR. I never had the change to add tp_length_hint to PyTypeObject or PySequenceMethods as suggested above by @vstinner .

@nmusolino nmusolino closed this Apr 12, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants