
bpo-45225: use map function instead of genexpr in capwords #28342

Merged
merged 2 commits into from Sep 16, 2021


@speedrun-program speedrun-program commented Sep 14, 2021

In string.py, the capwords function passes str.join a generator expression, but the map function
could be used instead. This is how capwords is currently written:


def capwords(s, sep=None):
    """capwords(s [,sep]) -> string
    
    Split the argument into words using split, capitalize each
    word using capitalize, and join the capitalized words using
    join.  If the optional second argument sep is absent or None,
    runs of whitespace characters are replaced by a single space
    and leading and trailing whitespace are removed, otherwise
    sep is used to split and join the words.
    
    """
    return (sep or ' ').join(x.capitalize() for x in s.split(sep))

This is how capwords could be written:


def capwords(s, sep=None):
    """capwords(s [,sep]) -> string
    
    Split the argument into words using split, capitalize each
    word using capitalize, and join the capitalized words using
    join.  If the optional second argument sep is absent or None,
    runs of whitespace characters are replaced by a single space
    and leading and trailing whitespace are removed, otherwise
    sep is used to split and join the words.
    
    """
    return (sep or ' ').join(map(str.capitalize, s.split(sep)))
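
A quick way to check that the two versions agree is to run both on a handful of inputs. The sketch below is only an illustration: the names capwords_current and capwords_new mirror the ones used in the benchmark code, while CASES and find_mismatches are hypothetical helpers that are not part of string.py. It covers only str inputs.

```python
def capwords_current(s, sep=None):
    # Existing implementation: generator expression passed to join.
    return (sep or ' ').join(x.capitalize() for x in s.split(sep))


def capwords_new(s, sep=None):
    # Proposed implementation: map over the unbound str.capitalize method.
    return (sep or ' ').join(map(str.capitalize, s.split(sep)))


# Hypothetical sample inputs: (string, separator) pairs, str only.
CASES = [
    ("", None),
    ("hello world", None),
    ("  leading and   trailing  ", None),
    ("tab\tand\nnewline", None),
    ("a-b-c", "-"),
    ("ONE,two,,Three", ","),
]


def find_mismatches(cases=CASES):
    """Return the (string, separator) pairs on which the two versions differ."""
    return [
        (s, sep)
        for s, sep in cases
        if capwords_current(s, sep) != capwords_new(s, sep)
    ]


if __name__ == "__main__":
    print("mismatches:", find_mismatches())
```

An empty list from find_mismatches() means the two versions agreed on every sample input; it is a spot check, not a proof of equivalence.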

These are the benefits:

  1. Faster performance; the time saved grows with the number of times the str is split.

  2. Very slightly smaller .py and .pyc file sizes.

  3. Source code is slightly more concise.
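
The size claims can be examined with a small sketch like the one below. It compiles both one-line bodies and reports the source length, the marshalled code size, and the number of nested code objects. The helper names (describe, count_nested_code) and the "<sketch>" filename are made up for this illustration, and the marshalled size only approximates the .pyc contribution, since a real .pyc file also carries a header and the rest of string.py. On CPython the generator expression compiles to its own nested code object, which is a plausible reason for the difference.

```python
import marshal
import types

# The two function bodies, without the docstring, as source text.
GENEXPR_SRC = (
    "def capwords(s, sep=None):\n"
    "    return (sep or ' ').join(x.capitalize() for x in s.split(sep))\n"
)
MAP_SRC = (
    "def capwords(s, sep=None):\n"
    "    return (sep or ' ').join(map(str.capitalize, s.split(sep)))\n"
)


def count_nested_code(code):
    """Count code objects nested at any depth inside `code`."""
    total = 0
    for const in code.co_consts:
        if isinstance(const, types.CodeType):
            total += 1 + count_nested_code(const)
    return total


def describe(src):
    """Compile `src` as a module and summarize its size."""
    code = compile(src, "<sketch>", "exec")
    return {
        "source_bytes": len(src.encode()),
        "marshalled_bytes": len(marshal.dumps(code)),
        "nested_code_objects": count_nested_code(code),
    }


if __name__ == "__main__":
    print("genexpr:", describe(GENEXPR_SRC))
    print("map    :", describe(MAP_SRC))
```

The exact marshalled sizes depend on the Python version, so they are printed rather than assumed.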

This is the performance test code in ipython:


def capwords_current(s, sep=None):
    return (sep or ' ').join(x.capitalize() for x in s.split(sep))

def capwords_new(s, sep=None):
    return (sep or ' ').join(map(str.capitalize, s.split(sep)))

tests = ["a " * 10**n for n in range(9)]
tests.append("a " * (10**9 // 2)) # I only have 16GB of RAM

These are the results of a performance test using %timeit in ipython:


%timeit x = capwords_current("")
835 ns ± 15.2 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

%timeit x = capwords_new("")
758 ns ± 35.1 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)


%timeit x = capwords_current(tests[0])
977 ns ± 16.9 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

%timeit x = capwords_new(tests[0])
822 ns ± 30 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)


%timeit x = capwords_current(tests[1])
3.07 µs ± 88.8 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

%timeit x = capwords_new(tests[1])
2.17 µs ± 194 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)


%timeit x = capwords_current(tests[2])
28 µs ± 896 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

%timeit x = capwords_new(tests[2])
19.4 µs ± 352 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)


%timeit x = capwords_current(tests[3])
236 µs ± 14.5 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

%timeit x = capwords_new(tests[3])
153 µs ± 2 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)


%timeit x = capwords_current(tests[4])
2.12 ms ± 106 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

%timeit x = capwords_new(tests[4])
1.5 ms ± 9.61 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)


%timeit x = capwords_current(tests[5])
23.8 ms ± 1.38 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

%timeit x = capwords_new(tests[5])
15.6 ms ± 355 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)


%timeit x = capwords_current(tests[6])
271 ms ± 10.6 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

%timeit x = capwords_new(tests[6])
192 ms ± 807 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)


%timeit x = capwords_current(tests[7])
2.66 s ± 14.3 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

%timeit x = capwords_new(tests[7])
1.95 s ± 26.7 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)


%timeit x = capwords_current(tests[8])
25.9 s ± 80.2 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

%timeit x = capwords_new(tests[8])
18.4 s ± 123 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)


%timeit x = capwords_current(tests[9])
6min 17s ± 29 s per loop (mean ± std. dev. of 7 runs, 1 loop each)

%timeit x = capwords_new(tests[9])
5min 36s ± 24.8 s per loop (mean ± std. dev. of 7 runs, 1 loop each)
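
To make the comparison easier to read, the speedup implied by each pair of means above can be computed directly. The sketch below copies the reported means from the %timeit output (each pair in its own unit, so only the ratio is meaningful); REPORTED and speedups are names invented for this illustration.

```python
# (input, genexpr mean, map mean); both values of a pair share the same unit.
REPORTED = [
    ("empty", 835, 758),                        # ns
    ("10**0", 977, 822),                        # ns
    ("10**1", 3.07, 2.17),                      # µs
    ("10**2", 28, 19.4),                        # µs
    ("10**3", 236, 153),                        # µs
    ("10**4", 2.12, 1.5),                       # ms
    ("10**5", 23.8, 15.6),                      # ms
    ("10**6", 271, 192),                        # ms
    ("10**7", 2.66, 1.95),                      # s
    ("10**8", 25.9, 18.4),                      # s
    ("10**9 // 2", 6 * 60 + 17, 5 * 60 + 36),   # s
]


def speedups(rows=REPORTED):
    """Return {input: genexpr_time / map_time} for the reported means."""
    return {label: current / new for label, current, new in rows}


if __name__ == "__main__":
    for label, ratio in speedups().items():
        print(f"{label:>10}: {ratio:.2f}x")
```

On these numbers the map version is between about 1.1x and 1.5x faster. The ratio is smallest for the empty string and for the largest input, while the absolute time saved keeps growing with input size.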


https://bugs.python.org/issue45225


This is the performance test code as a standalone script using the timeit module:

--------------------

from timeit import timeit

setup = """
def capwords_current(s, sep=None):
    return (sep or ' ').join(x.capitalize() for x in s.split(sep))

def capwords_new(s, sep=None):
    return (sep or ' ').join(map(str.capitalize, s.split(sep)))

tests = ["a " * 10**n for n in range(9)]
tests.append("a " * (10**9 // 2)) # I only have 16GB of RAM
"""

print("empty str without map:", timeit(setup=setup, stmt="x = capwords_current('')", number=1))
print("empty str with map   :", timeit(setup=setup, stmt="x = capwords_new('')", number=1))
for n in range(9):
    print("- " * 20)
    print(f"10**{n} without map:", timeit(setup=setup, stmt=f"x = capwords_current(tests[{n}])", number=1))
    print(f"10**{n} with map   :", timeit(setup=setup, stmt=f"x = capwords_new(tests[{n}])", number=1))
print("- " * 20)
print("10**9 // 2 without map:", timeit(setup=setup, stmt="x = capwords_current(tests[9])", number=1))
print("10**9 // 2 with map   :", timeit(setup=setup, stmt="x = capwords_new(tests[9])", number=1))

print("done")

--------------------

These are the results of running that script (times in seconds, one call each):

--------------------

empty str without map: 2.0000000000020002e-05
empty str with map   : 1.8100000000020877e-05
- - - - - - - - - - - - - - - - - - - - 
10**0 without map: 1.6600000000033255e-05
10**0 with map   : 1.650000000008589e-05
- - - - - - - - - - - - - - - - - - - - 
10**1 without map: 2.0399999999920482e-05
10**1 with map   : 1.889999999993286e-05
- - - - - - - - - - - - - - - - - - - - 
10**2 without map: 5.489999999985784e-05
10**2 with map   : 4.6400000000001995e-05
- - - - - - - - - - - - - - - - - - - - 
10**3 without map: 0.00026530000000013487
10**3 with map   : 0.0001765000000002459
- - - - - - - - - - - - - - - - - - - - 
10**4 without map: 0.0026298000000002375
10**4 with map   : 0.0014880999999999922
- - - - - - - - - - - - - - - - - - - - 
10**5 without map: 0.023361799999999988
10**5 with map   : 0.016615499999999894
- - - - - - - - - - - - - - - - - - - - 
10**6 without map: 0.24672029999999978
10**6 with map   : 0.1923338999999995
- - - - - - - - - - - - - - - - - - - - 
10**7 without map: 2.562209
10**7 with map   : 1.8905919000000004
- - - - - - - - - - - - - - - - - - - - 
10**8 without map: 26.3537843
10**8 with map   : 18.781561099999998
- - - - - - - - - - - - - - - - - - - - 
10**9 // 2 without map: 349.0668948
10**9 // 2 with map   : 312.15139230000005
done

--------------------
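
Because the script above uses number=1 and rebuilds the tests list in setup for every measurement, single runs can be noisy and slow to set up. A possible variant, sketched here under the assumption that smaller inputs are enough to show the trend, builds each input once, passes it through the globals argument of timeit.repeat, and keeps the best of several repeats. The benchmark function and its defaults are hypothetical, not part of the PR.

```python
from timeit import repeat


def capwords_current(s, sep=None):
    return (sep or ' ').join(x.capitalize() for x in s.split(sep))


def capwords_new(s, sep=None):
    return (sep or ' ').join(map(str.capitalize, s.split(sep)))


def benchmark(exponents=range(5), repeats=5, number=10):
    """Return {n: (best_genexpr, best_map)} in seconds per call for "a " * 10**n."""
    results = {}
    for n in exponents:
        # Build the input once, outside the timed statement.
        namespace = {
            "text": "a " * 10**n,
            "capwords_current": capwords_current,
            "capwords_new": capwords_new,
        }
        best = []
        for name in ("capwords_current", "capwords_new"):
            timings = repeat(
                f"{name}(text)", globals=namespace, repeat=repeats, number=number
            )
            # The minimum is the run least disturbed by other system activity.
            best.append(min(timings) / number)
        results[n] = tuple(best)
    return results


if __name__ == "__main__":
    for n, (current, new) in benchmark().items():
        print(f"10**{n}: genexpr {current:.3e} s, map {new:.3e} s")
```

Timings from this sketch will differ by machine and Python version, so no expected output is shown.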

@the-knights-who-say-ni the-knights-who-say-ni commented Sep 14, 2021

Hello, and thanks for your contribution!

I'm a bot set up to make sure that the project can legally accept this contribution by verifying everyone involved has signed the PSF contributor agreement (CLA).

Recognized GitHub username

We couldn't find a bugs.python.org (b.p.o) account corresponding to the following GitHub usernames:

@speedrun-program

This might be simply due to a missing "GitHub Name" entry in one's b.p.o account settings. This is necessary for legal reasons before we can look at this contribution. Please follow the steps outlined in the CPython devguide to rectify this issue.

You can check yourself to see if the CLA has been received.

Thanks again for the contribution, we look forward to reviewing it!

@speedrun-program speedrun-program changed the title from "use map function instead of genexpr in capwords" to "bpo-45225: use map function instead of genexpr in capwords" Sep 16, 2021
@rhettinger rhettinger merged commit a59ede2 into python:main Sep 16, 2021
12 checks passed
@speedrun-program speedrun-program deleted the patch-1 branch Sep 16, 2021