Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bpo-37295: Use constant-time comb() for larger n depending on k #30305

Merged
merged 7 commits into from Jan 9, 2022

Conversation

serhiy-storchaka
Copy link
Member

@serhiy-storchaka serhiy-storchaka commented Dec 30, 2021

@serhiy-storchaka serhiy-storchaka marked this pull request as ready for review Dec 31, 2021
Modules/mathmodule.c Outdated Show resolved Hide resolved
@mdickinson
Copy link
Member

@mdickinson mdickinson commented Dec 31, 2021

Code changes look good to me (modulo the compiler warnings on Windows). I'm running some timings.

@mdickinson
Copy link
Member

@mdickinson mdickinson commented Dec 31, 2021

I'm running some timings.

Running the same timings scripts that are posted to the issue, I see roughly a 5% slowdown for small comb(n, k) computations (0 <= k <= n <= 67) on this branch, compared with master. But that's compensated for by the wider applicability of the fast path, so I'm fine with that. (EDIT: Fixed the upper limit for n.)

Copy link
Member

@mdickinson mdickinson left a comment

LGTM

Modules/mathmodule.c Outdated Show resolved Hide resolved
@tim-one
Copy link
Member

@tim-one tim-one commented Dec 31, 2021

Running the same timings scripts that are posted to the issue, I see roughly a 5% slowdown for small comb(n, k) computations (0 <= k <= n <= 67)

That's really hard to swallow, Mark. At. e.g., comb(50, 15), the pre-existing fast path did 14 64-bit multiplies and - more significantly - also 14 64-bit divides. How on Earth could that be faster than doing instead 2 multiplies and a shift, pretty much no matter how slow popcount is? A similar argument applies for almost all cases taken over by the newer fast path, the more astonishing to see a slowdown the larger k. For k == 1 the new fast path is clearly slower, and k == 2 is unclear without timing, but the new path seems a clear win for k > 2.

Related: it may or may not be a timing win to set fast_comb_limits1[1] to 0, so k==1 takes the second fast path, which just returns n without any arithmetic. Possibly also for fast_comb_limits1[2], depending on how slow division by 2 is.

@mdickinson
Copy link
Member

@mdickinson mdickinson commented Dec 31, 2021

That's really hard to swallow, Mark.

I'm comparing this branch with the main branch. (Sorry, I said "master" above, but I meant "main".) The main branch already has the fast mod-2**64 + popcount code in it. This branch has the same, but with some extra up-front indirection, so it's a bit slower.

@tim-one
Copy link
Member

@tim-one tim-one commented Dec 31, 2021

I'm comparing this branch with the main branch.

Ah, that explains it - sorry for the noise! 😃

Modules/mathmodule.c Outdated Show resolved Hide resolved
Modules/mathmodule.c Outdated Show resolved Hide resolved
Modules/mathmodule.c Outdated Show resolved Hide resolved
/* Maps k to the maximal n so that 2*k-1 <= n <= 127 and C(n, k)*k
* fits into a long long (which is at least 64 bit). Only contains
* items larger than in fast_comb_limits1. */
static const unsigned long long fast_comb_limits2[] = {
Copy link
Contributor

@arhadthedev arhadthedev Jan 1, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

which is at least 64 bit

C99 provides types like uint_least64_t that express such comments explicitly. So I think it will be better to #include <stdint.h> and use it:

Suggested change
/* Maps k to the maximal n so that 2*k-1 <= n <= 127 and C(n, k)*k
* fits into a long long (which is at least 64 bit). Only contains
* items larger than in fast_comb_limits1. */
static const unsigned long long fast_comb_limits2[] = {
/* Maps k to the maximal n so that 2*k-1 <= n <= 127 and C(n, k)*k
* fits into a long long. Only contains
* items larger than in fast_comb_limits1. */
static const uint_least64_t fast_comb_limits2[] = {

Copy link
Member Author

@serhiy-storchaka serhiy-storchaka Jan 1, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It should contain at least LLONG_MAX. There is no guarantee that uint_least64_t is at least so large as positive long long.

Copy link
Member

@mdickinson mdickinson left a comment

Still LGTM

@serhiy-storchaka serhiy-storchaka merged commit 2d78797 into python:main Jan 9, 2022
11 checks passed
@serhiy-storchaka serhiy-storchaka deleted the fast_comb branch Jan 9, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CLA signed performance skip news
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants