Skip to content

gh-91247: improve performance of list and tuple repeat (with specialization for n=1) #91482

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged

Conversation

eendebakpt
Copy link
Contributor

@eendebakpt eendebakpt commented Apr 12, 2022

Compared to current main branch there are 3 improvements

  • For the list inplace repeat there is an optimized copy method
  • This PR solves the inefficient pointer chasing for small (size 2 to 7) lists and tuples
  • It shares code between the implementations of list repeat, tuple repeat and list inplace repeat

This PR keeps the specializations for repeats of lists and tuples with length 1. An alternative PR without specialization for n=1 (#32045) was rejected because the specializations might yield a speedup.

For performance comparisons to main and the version without specializations and more detailed discussion see issue #91247.

The inefficient pointer chasing can be seen in the following performance benchmark:

pc

(results obtained with runner.timeit(name=f"list({n}) repeat {r}", stmt=f"x=a*{r}", setup=f'a=list(range({n}))'))

@eendebakpt eendebakpt marked this pull request as draft April 12, 2022 19:14
@eendebakpt eendebakpt marked this pull request as ready for review April 12, 2022 21:26
@eendebakpt eendebakpt force-pushed the performance/list_repeat_v2_specialized branch from 05e2b70 to 5445cfb Compare April 21, 2022 18:20
@eendebakpt
Copy link
Contributor Author

@sweeneyde Could you have a look at this ticket? The approach is similar to #31856 and #31999 which you reviewed.

@eendebakpt eendebakpt force-pushed the performance/list_repeat_v2_specialized branch from 1491705 to 66cc8ae Compare May 11, 2022 08:39
@sweeneyde sweeneyde requested review from sweeneyde and vstinner May 24, 2022 12:28
@sweeneyde
Copy link
Member

@eendebakpt Sorry this slipped under my radar

@vstinner, would you mind reviewing? I think the change is good in principle but I want another reviewer and to make sure the header file changes are appropriate.

Copy link
Contributor

@kumaraditya303 kumaraditya303 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor nits

eendebakpt and others added 2 commits July 12, 2022 11:00
Co-authored-by: Kumar Aditya <59607654+kumaraditya303@users.noreply.github.com>
…91.tJcy-P.rst

Co-authored-by: Kumar Aditya <59607654+kumaraditya303@users.noreply.github.com>
eendebakpt and others added 2 commits July 22, 2022 22:16
Co-authored-by: Kumar Aditya <59607654+kumaraditya303@users.noreply.github.com>
Copy link
Member

@sweeneyde sweeneyde left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some minor nits, but this is looking good. Thanks for your patience and persistence on this.

eendebakpt and others added 3 commits July 24, 2022 14:51
Co-authored-by: Dennis Sweeney <36520290+sweeneyde@users.noreply.github.com>
@sweeneyde
Copy link
Member

I verified a benchmark on a Windows machine since this might be a bit compiler-specific.

pyperf timeit -s "a = [0,1]" "a * 10_000"
Mean +- std dev: [before] 32.9 us +- 0.3 us -> [after] 21.8 us +- 0.3 us: 1.51x faster

Looks great!

@sweeneyde sweeneyde added the 🔨 test-with-buildbots Test PR w/ buildbots; report in status section label Jul 25, 2022
@bedevere-bot
Copy link

🤖 New build scheduled with the buildbot fleet by @sweeneyde for commit 516f091 🤖

If you want to schedule another build, you need to add the ":hammer: test-with-buildbots" label again.

@bedevere-bot bedevere-bot removed the 🔨 test-with-buildbots Test PR w/ buildbots; report in status section label Jul 25, 2022
*dest++ = *src++;
}

_Py_memory_repeat((char *)np->ob_item, sizeof(PyObject *)*output_size,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note to self: this cannot overflow because list_new_prealloc uses PyMem_New, and so this multiplication already succeeded at some point.

@sweeneyde sweeneyde merged commit 2ef73be into python:main Jul 26, 2022
@eendebakpt eendebakpt deleted the performance/list_repeat_v2_specialized branch July 26, 2022 08:02
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants