bpo-47009: Let PRECALL_NO_KW_LIST_APPEND do its own POP_TOP #32239

Open: wants to merge 3 commits into main

@sweeneyde sweeneyde commented Apr 1, 2022

Most code discards the result of L.append(x) rather than doing something like y = L.append(x), so PRECALL_NO_KW_LIST_APPEND is almost always followed by POP_TOP. We can verify this at specialization time.
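As a quick illustration of that claim, here is a small sketch that uses the standard dis module to look at the instruction that follows the call. The helper name opname_after_call and the two sample functions are made up for this example. The exact call opcode differs between CPython versions, so the helper only looks for an opname starting with "CALL".

```python
import dis


def discards_result(L, x):
    L.append(x)  # expression statement: the returned None is thrown away


def keeps_result(L, x):
    y = L.append(x)  # the returned None is stored in a local instead


def opname_after_call(func):
    """Return the opname of the instruction right after the last call.

    Hypothetical helper for illustration; it matches any opname that
    starts with "CALL" because the call opcode is version-dependent.
    """
    names = [ins.opname for ins in dis.get_instructions(func)]
    last_call = max(i for i, name in enumerate(names) if name.startswith("CALL"))
    return names[last_call + 1]


print(opname_after_call(discards_result))
print(opname_after_call(keeps_result))
```

For the first function the instruction after the call should be POP_TOP. For the second it is a store to the local y, which is the uncommon case.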

This saves a Py_INCREF(Py_None), a SET_TOP(Py_None), and POP_TOP's Py_DECREF(POP()); DISPATCH();.
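To make the saving concrete, here is a toy stack machine in Python. It is only a rough model under stated assumptions, not CPython's eval loop: the opcode names PUSH and LIST_APPEND_CALL are hypothetical, and reference counting is not modeled at all. It only shows that the fused path skips pushing None and skips one instruction dispatch.

```python
def run(program, fused):
    """Execute a toy instruction list and return (stack, dispatch_count)."""
    stack, dispatches, pc = [], 0, 0
    while pc < len(program):
        op, arg = program[pc]
        dispatches += 1
        pc += 1
        if op == "PUSH":
            stack.append(arg)
        elif op == "LIST_APPEND_CALL":  # hypothetical stand-in for the specialized call
            item = stack.pop()
            target = stack.pop()
            target.append(item)
            if fused:
                # Checked ahead of time: the next instruction is POP_TOP,
                # so do not push None and jump straight past the POP_TOP.
                assert program[pc][0] == "POP_TOP"
                pc += 1
            else:
                stack.append(None)  # result that POP_TOP will discard
        elif op == "POP_TOP":
            stack.pop()
    return stack, dispatches


def demo(fused):
    data = []
    program = [
        ("PUSH", data),
        ("PUSH", 1.5),
        ("LIST_APPEND_CALL", None),
        ("POP_TOP", None),
    ]
    stack, dispatches = run(program, fused)
    return data, stack, dispatches


print(demo(fused=False))
print(demo(fused=True))
```

Both runs leave the list and the stack in the same state. The fused run gets there with one fewer dispatch.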

Some microbenchmarks:

from pyperf import Runner, perf_counter

def bench_append(loops, length):
    # Reuses one list: clear it, then append every item of src, once per loop.
    src = list(map(float, range(length)))
    arr = []
    t0 = perf_counter()

    for i in range(loops):
        arr.clear()
        for x in src:
            arr.append(x)

    return perf_counter() - t0

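def bench_append_less_gc(loops, length):
    # Builds a fresh list per loop and keeps each one alive in out,
    # so no list is freed inside the timed region.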
    src = list(map(float, range(length)))
    out = [None] * loops
    t0 = perf_counter()

    for i in range(loops):
        arr = []
        for x in src:
            arr.append(x)
        out[i] = arr

    return perf_counter() - t0

runner = Runner()
for n in [100, 1_000, 10_000, 100_000]:
    runner.bench_time_func(f"append {n}", bench_append, n, inner_loops=n)
    runner.bench_time_func(f"append-less-gc {n}", bench_append_less_gc, n, inner_loops=n)

Results from a GCC build configured with --enable-optimizations and --with-lto:

- append 100000: 14.9 ns +- 0.3 ns -> 13.3 ns +- 0.4 ns: 1.12x faster
- append 10000: 15.1 ns +- 0.3 ns -> 13.6 ns +- 0.5 ns: 1.11x faster
- append-less-gc 100000: 16.4 ns +- 0.5 ns -> 14.9 ns +- 0.4 ns: 1.10x faster
- append 1000: 15.6 ns +- 0.3 ns -> 14.2 ns +- 0.3 ns: 1.09x faster
- append 100: 18.9 ns +- 0.6 ns -> 17.3 ns +- 0.6 ns: 1.09x faster
- append-less-gc 100: 27.4 ns +- 1.1 ns -> 25.2 ns +- 1.2 ns: 1.09x faster
- append-less-gc 10000: 19.2 ns +- 0.3 ns -> 17.8 ns +- 0.2 ns: 1.08x faster
- append-less-gc 1000: 22.0 ns +- 0.6 ns -> 20.8 ns +- 0.3 ns: 1.06x faster

Geometric mean: 1.09x faster
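The geometric mean can be roughly reproduced from the rounded means listed above. This is only a sanity-check sketch: the original figure was presumably computed by the benchmarking tool from the raw samples, so small rounding differences are expected. The variable names are made up for this example.

```python
import math

# (before_ns, after_ns) mean timings copied from the results above.
timings = {
    "append 100000": (14.9, 13.3),
    "append 10000": (15.1, 13.6),
    "append-less-gc 100000": (16.4, 14.9),
    "append 1000": (15.6, 14.2),
    "append 100": (18.9, 17.3),
    "append-less-gc 100": (27.4, 25.2),
    "append-less-gc 10000": (19.2, 17.8),
    "append-less-gc 1000": (22.0, 20.8),
}

# Speedup of each benchmark is the ratio of old time to new time.
speedups = {name: before / after for name, (before, after) in timings.items()}

# Geometric mean: exponential of the average of the log speedups.
geo_mean = math.exp(sum(math.log(s) for s in speedups.values()) / len(speedups))

for name, s in speedups.items():
    print(f"{name}: {s:.2f}x")
print(f"Geometric mean: {geo_mean:.2f}x")
```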

https://bugs.python.org/issue47009
