bpo-47009: Streamline list.append for the common case #31864

sweeneyde · 2022-03-14T04:58:43Z

https://bugs.python.org/issue47009

sweeneyde · 2022-03-14T04:59:56Z

Benchmarks are good:

from pyperf import Runner, perf_counter

def bench_listcomp(loops, length):
    src = list(map(float, range(length)))
    t0 = perf_counter()
    for i in range(loops):
        [x for x in src]
    return perf_counter() - t0

def bench_append(loops, length):
    src = list(map(float, range(length)))
    t0 = perf_counter()
    for i in range(loops):
        arr = []
        for x in src:
            arr.append(x)
    return perf_counter() - t0

runner = Runner()
for n in [100, 1_000, 10_000, 100_000]:
    runner.bench_time_func(f"listcomp {n}", bench_listcomp, n)
    runner.bench_time_func(f"append {n}", bench_append, n)

Results from GCC on WSL with --enable-optimizations --with-lto

Faster (8):
- listcomp 10000: 118 us +- 2 us -> 92.6 us +- 1.5 us: 1.28x faster
- listcomp 100000: 1.16 ms +- 0.02 ms -> 916 us +- 26 us: 1.27x faster
- listcomp 1000: 12.3 us +- 0.2 us -> 9.89 us +- 0.41 us: 1.25x faster
- listcomp 100: 1.59 us +- 0.03 us -> 1.32 us +- 0.05 us: 1.21x faster
- append 100000: 1.69 ms +- 0.05 ms -> 1.45 ms +- 0.04 ms: 1.17x faster
- append 10000: 168 us +- 4 us -> 145 us +- 3 us: 1.16x faster
- append 1000: 17.4 us +- 0.3 us -> 15.2 us +- 0.6 us: 1.14x faster
- append 100: 2.03 us +- 0.06 us -> 1.81 us +- 0.08 us: 1.12x faster

Geometric mean: 1.20x faster

markshannon

I think this would be simpler, and just as fast, by renaming app1 to _PyList_AppendTakeRef (with appropriate refcount adjustments).
No need for the inline functions.

markshannon · 2022-03-18T11:47:34Z

Include/internal/pycore_list.h

+PyAPI_FUNC(int)
+_PyList_AppendTakeRefListResize(PyListObject *self, PyObject *newitem);
+
+static inline int


No need for an inline function.
Let the LTO pass decide if it wants to inline it.

markshannon · 2022-03-18T11:47:34Z

Objects/listobject.c

-app1(PyListObject *self, PyObject *v)
+/* internal, used by _PyList_AppendTakeRef */
+int
+_PyList_AppendTakeRefListResize(PyListObject *self, PyObject *newitem)


Do we need the extra function?
Converting app1 to _PyList_AppendTakeRef would seem to be enough.

I tried something like that and the results are on the bpo issue, but it seems there is a significant benefit to having the separate as-small-as-possible function that always gets inlined as opposed to a slightly longer version that we let the compiler figure out.

Presumably the compiler isn't inlining the function because it isn't hot enough.
PRECALL_NO_KW_LIST_APPEND and LIST_APPEND are only 0.3% of all instructions executed, so maybe it is best not to inline.

Do you have benchmark numbers for the standard suite?

https://gist.github.com/sweeneyde/6cbbe1c9d216d117370a809c704b6cfc

Geometric mean: 1.00x faster

Also, IMO there is a relative lack of list comprehensions in the pyperformance suite.

Include/internal/pycore_list.h

sweeneyde · 2022-03-31T21:17:31Z

@markshannon Since pyperformance results were negligible and this has a significant speedup on operations that live in hot loops (albeit not in pyperformance), is it okay if I merge this?

markshannon · 2022-04-01T10:17:18Z

Sorry, this dropped off my radar.

sweeneyde requested a review from markshannon as a code owner Mar 14, 2022

bedevere-bot added the awaiting core review label Mar 14, 2022

the-knights-who-say-ni added the CLA signed label Mar 14, 2022

sweeneyde requested a review from methane Mar 14, 2022

sweeneyde added the performance Performance or resource usage label Mar 14, 2022

Streamline list.append for the common case

a240df1

sweeneyde force-pushed the listappend branch from f1951b6 to a240df1 Compare Mar 14, 2022

📜🤖 Added by blurb_it.

e038176

markshannon reviewed Mar 18, 2022

View changes

kumaraditya303 reviewed Mar 21, 2022

View changes

Include/internal/pycore_list.h Outdated Show resolved Hide resolved

sweeneyde and others added 2 commits Mar 21, 2022

PyAPI_FUNC --> extern

8ad202f

Merge branch 'main' into listappend

ea3f634

markshannon merged commit a0ea7a1 into python:main Apr 1, 2022
12 checks passed

bedevere-bot removed the awaiting core review label Apr 1, 2022

sweeneyde deleted the listappend branch Apr 1, 2022

bpo-47009: Streamline list.append for the common case #31864

bpo-47009: Streamline list.append for the common case #31864

sweeneyde commented Mar 14, 2022 •

edited by bedevere-bot

sweeneyde commented Mar 14, 2022 •

edited

markshannon left a comment

markshannon Mar 18, 2022

markshannon Mar 18, 2022

sweeneyde Mar 18, 2022

markshannon Mar 18, 2022

sweeneyde Mar 18, 2022

sweeneyde Mar 18, 2022

sweeneyde commented Mar 31, 2022

markshannon commented Apr 1, 2022

bpo-47009: Streamline list.append for the common case #31864

bpo-47009: Streamline list.append for the common case #31864

Conversation

sweeneyde commented Mar 14, 2022 • edited by bedevere-bot

sweeneyde commented Mar 14, 2022 • edited

markshannon left a comment

markshannon Mar 18, 2022

Choose a reason for hiding this comment

markshannon Mar 18, 2022

Choose a reason for hiding this comment

sweeneyde Mar 18, 2022

Choose a reason for hiding this comment

markshannon Mar 18, 2022

Choose a reason for hiding this comment

sweeneyde Mar 18, 2022

Choose a reason for hiding this comment

sweeneyde Mar 18, 2022

Choose a reason for hiding this comment

sweeneyde commented Mar 31, 2022

markshannon commented Apr 1, 2022

sweeneyde commented Mar 14, 2022 •

edited by bedevere-bot

sweeneyde commented Mar 14, 2022 •

edited