Description
Benchmark for computing the LCM of 1000 random numbers from 1 to 1,000,000:
```text
6.19 ms ± 0.09 ms  current_Python
6.14 ms ± 0.12 ms  current_C
3.46 ms ± 0.16 ms  proposal_Python
```
Code:
```python
def current_C(xs):
    return lcm(*xs)

def current_Python(xs):
    # re-implementation of the current C implementation
    res = 1
    for x in xs:
        res = lcm(res, x)
    return res

def proposal_Python(xs):
    res = 1
    for x in xs:
        res = lcm(x, res)
    return res
```
Full benchmark code:
```python
from math import lcm
from random import randint
from timeit import repeat
from statistics import mean, stdev

def current_C(xs):
    return lcm(*xs)

def current_Python(xs):
    res = 1
    for x in xs:
        res = lcm(res, x)
    return res

def proposal_Python(xs):
    res = 1
    for x in xs:
        res = lcm(x, res)
    return res

funcs = current_C, current_Python, proposal_Python
times = {f: [] for f in funcs}

def stats(f):
    ts = [t * 1e3 for t in times[f][10:]]
    return f'{mean(ts):4.2f} ms ± {stdev(ts):4.2f} ms '

for _ in range(20):
    xs = [randint(1, 10**6) for _ in range(1000)]
    for f in funcs:
        t = min(repeat(lambda: f(xs), number=1))
        times[f].append(t)

for f in sorted(funcs, key=stats, reverse=True):
    print(stats(f), f.__name__)
```
The difference is using `lcm(x, res)` instead of `lcm(res, x)`. When computing the LCM of multiple/many values, `res` tends to be(come) much larger than `x`, and then `lcm(x, res)` is faster than `lcm(res, x)`.
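To see how lopsided the operands get, one can track the accumulator's bit length on the same kind of input as the benchmark (a quick sketch of mine, not from the original measurements; the seed is arbitrary):

```python
from math import lcm
from random import randint, seed

seed(0)  # deterministic sketch data
xs = [randint(1, 10**6) for _ in range(1000)]

res = 1
for x in xs:
    res = lcm(x, res)

# Each input fits in at most 20 bits, but the running LCM ends up
# thousands of bits long, so it dominates every later call.
print(res.bit_length())
```

So after only a handful of iterations, one operand is tiny and the other is huge, which is exactly the asymmetric case analyzed below.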
Why? Because `lcm(a, b)` is computed as `(a // g) * b`, where `g = gcd(a, b)`. Let A, B and G be the respective bit lengths, and let's pretend there's no Karatsuba. Then `a // g` takes A·G time and results in a number with A−G bits. Multiplying that with `b` takes (A−G)·B time. So overall it's A·G + (A−G)·B = A·G + A·B − B·G = A·B + (A−B)·G time, so it's better to have B > A.
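Plugging concrete bit lengths into this cost model makes the asymmetry visible (a sketch under the schoolbook-arithmetic assumption above; the function name and the example sizes are mine):

```python
def naive_cost(a_bits, b_bits, g_bits):
    # Schoolbook model from the text: a // g costs A*G, then
    # multiplying the (A - G)-bit quotient by b costs (A - G)*B.
    return a_bits * g_bits + (a_bits - g_bits) * b_bits

# Say res has grown to 1000 bits, x has 20 bits, and the gcd has 5 bits.
A, B, G = 1000, 20, 5

print(naive_cost(A, B, G))  # lcm(res, x): divide the big number first
print(naive_cost(B, A, G))  # lcm(x, res): divide the small number first
```

With these numbers the model gives 24900 vs 15100 word operations, matching the observed speedup in direction if not in exact magnitude.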
The simple optimization would be to swap the arguments there (line 786 in 81bf10e).
In addition to this "likely we have `res > x`" rule of thumb, the two-argument `lcm` could check which one is longer and swap them if `b` is shorter, and then call `gcd(b, a)` instead of `gcd(a, b)` so that `gcd` doesn't have to swap them right back.
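A Python-level sketch of that swapping idea (the name `lcm_swapping` is hypothetical; the real change would be in the C implementation, which is also where the `gcd` argument order matters — Python's `math.gcd` hides that detail):

```python
from math import gcd

def lcm_swapping(a, b):
    # Sketch: make b the operand with more bits, so the
    # division a // g runs on the smaller number.
    a, b = abs(a), abs(b)
    if a.bit_length() > b.bit_length():
        a, b = b, a
    if a == 0 or b == 0:
        return 0
    # At the C level this would be gcd(b, a), per the text,
    # so gcd doesn't have to swap the operands right back.
    g = gcd(a, b)
    return (a // g) * b
```

The result agrees with `math.lcm(a, b)` for any pair of integers; only the internal argument order changes.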