New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
bpo-47009: Streamline list.append for the common case #31864
Conversation
Benchmarks are good: from pyperf import Runner, perf_counter
def bench_listcomp(loops, length):
src = list(map(float, range(length)))
t0 = perf_counter()
for i in range(loops):
[x for x in src]
return perf_counter() - t0
def bench_append(loops, length):
src = list(map(float, range(length)))
t0 = perf_counter()
for i in range(loops):
arr = []
for x in src:
arr.append(x)
return perf_counter() - t0
runner = Runner()
for n in [100, 1_000, 10_000, 100_000]:
runner.bench_time_func(f"listcomp {n}", bench_listcomp, n)
runner.bench_time_func(f"append {n}", bench_append, n) Results from GCC on WSL with
|
PyAPI_FUNC(int) | ||
_PyList_AppendTakeRefListResize(PyListObject *self, PyObject *newitem); | ||
|
||
static inline int |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No need for an inline function.
Let the LTO pass decide if it wants to inline it.
app1(PyListObject *self, PyObject *v) | ||
/* internal, used by _PyList_AppendTakeRef */ | ||
int | ||
_PyList_AppendTakeRefListResize(PyListObject *self, PyObject *newitem) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need the extra function?
Converting app1
to _PyList_AppendTakeRef
would seem to be enough.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried something like that and the results are on the bpo issue, but it seems there is a significant benefit to having the separate as-small-as-possible function that always gets inlined as opposed to a slightly longer version that we let the compiler figure out.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Presumably the compiler isn't inlining the function because it isn't hot enough.
PRECALL_NO_KW_LIST_APPEND
and LIST_APPEND
are only 0.3% of all instructions executed, so maybe it is best not to inline.
Do you have benchmark numbers for the standard suite?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
https://gist.github.com/sweeneyde/6cbbe1c9d216d117370a809c704b6cfc
Geometric mean: 1.00x faster
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, IMO there is a relative lack of list comprehensions in the pyperformance suite.
@markshannon Since pyperformance results were negligible and this has a significant speedup on operations that live in hot loops (albeit not in pyperformance), is it okay if I merge this? |
Sorry, this dropped off my radar. |
https://bugs.python.org/issue47009