bpo-45225: use map function instead of genexpr in capwords #28342
In string.py, the capwords function passes str.join a generator expression, but the map function could be used instead. This is how capwords is currently written:
--------------------
def capwords(s, sep=None):
    """ docstring text """
    return (sep or ' ').join(x.capitalize() for x in s.split(sep))
--------------------
This is how capwords could be written:
--------------------
def capwords(s, sep=None):
    """ docstring text """
    return (sep or ' ').join(map(str.capitalize, s.split(sep)))
--------------------
These are the benefits:
1. Faster performance. The time saved grows with the number of words that s.split(sep) produces.
2. Very slightly smaller .py and .pyc file sizes.
3. Source code is slightly more concise.

This is the performance test code:
--------------------
from timeit import timeit

# The setup code runs, untimed, once per timeit() call, so every test string
# is rebuilt before each measurement.
setup = """
def capwords_current(s, sep=None):
    return (sep or ' ').join(x.capitalize() for x in s.split(sep))

def capwords_new(s, sep=None):
    return (sep or ' ').join(map(str.capitalize, s.split(sep)))

tests = ["a " * 10**n for n in range(9)]
tests.append("a " * (10**9 // 2))  # I only have 16GB of RAM
"""

print("empty str without map:", timeit(setup=setup, stmt="x = capwords_current('')", number=1))
print("empty str with map :", timeit(setup=setup, stmt="x = capwords_new('')", number=1))
for n in range(9):
    print("- " * 20)
    print(f"10**{n} without map:", timeit(setup=setup, stmt=f"x = capwords_current(tests[{n}])", number=1))
    print(f"10**{n} with map :", timeit(setup=setup, stmt=f"x = capwords_new(tests[{n}])", number=1))
print("- " * 20)
print("10**9 // 2 without map:", timeit(setup=setup, stmt="x = capwords_current(tests[9])", number=1))
print("10**9 // 2 with map :", timeit(setup=setup, stmt="x = capwords_new(tests[9])", number=1))
print("done")
--------------------
These are the results of the performance test (seconds for a single call each):
--------------------
empty str without map: 2.0000000000020002e-05
empty str with map : 1.8100000000020877e-05
- - - - - - - - - - - - - - - - - - - -
10**0 without map: 1.6600000000033255e-05
10**0 with map : 1.650000000008589e-05
- - - - - - - - - - - - - - - - - - - -
10**1 without map: 2.0399999999920482e-05
10**1 with map : 1.889999999993286e-05
- - - - - - - - - - - - - - - - - - - -
10**2 without map: 5.489999999985784e-05
10**2 with map : 4.6400000000001995e-05
- - - - - - - - - - - - - - - - - - - -
10**3 without map: 0.00026530000000013487
10**3 with map : 0.0001765000000002459
- - - - - - - - - - - - - - - - - - - -
10**4 without map: 0.0026298000000002375
10**4 with map : 0.0014880999999999922
- - - - - - - - - - - - - - - - - - - -
10**5 without map: 0.023361799999999988
10**5 with map : 0.016615499999999894
- - - - - - - - - - - - - - - - - - - -
10**6 without map: 0.24672029999999978
10**6 with map : 0.1923338999999995
- - - - - - - - - - - - - - - - - - - -
10**7 without map: 2.562209
10**7 with map : 1.8905919000000004
- - - - - - - - - - - - - - - - - - - -
10**8 without map: 26.3537843
10**8 with map : 18.781561099999998
- - - - - - - - - - - - - - - - - - - -
10**9 // 2 without map: 349.0668948
10**9 // 2 with map : 312.15139230000005
done
--------------------
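For a stdlib change like this, the two forms must return identical results. A minimal sketch of such a check compares both against string.capwords on a few inputs. It is an assumed test, not part of the proposal: the names capwords_genexpr and capwords_map and the sample strings are hypothetical, and the comparison holds whichever of the two forms the installed Python's string.capwords uses. A likely reason for the speedup is that map calls str.capitalize directly from C for each word. The generator expression, by contrast, resumes a Python frame and looks up the bound method once per word.

```python
import string


def capwords_genexpr(s, sep=None):
    # Current form: the generator expression looks up and calls x.capitalize per word.
    return (sep or ' ').join(x.capitalize() for x in s.split(sep))


def capwords_map(s, sep=None):
    # Proposed form: map applies the unbound str.capitalize to each word.
    return (sep or ' ').join(map(str.capitalize, s.split(sep)))


# Hypothetical sample inputs: the default separator (whitespace runs collapse,
# leading/trailing whitespace is dropped) and an explicit separator.
CASES = [
    ("", None),
    ("hello world", None),
    ("  hello   WORLD\tagain\n", None),
    ("mIxEd cAsE", None),
    ("a-b--c", "-"),
    ("", "-"),
    ("ünïcode wörds", None),
]

for s, sep in CASES:
    expected = string.capwords(s, sep)
    assert capwords_genexpr(s, sep) == expected
    assert capwords_map(s, sep) == expected

print(capwords_map("  hello   WORLD\tagain\n"))  # Hello World Again
print(capwords_map("a-b--c", "-"))  # A-B--C
```

An empty separator (sep="") is left out because str.split raises ValueError for it, so both forms fail the same way.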
These are the results of a performance test of capwords_current and capwords_new using %timeit in IPython:
%timeit x = capwords_current("")
835 ns ± 15.2 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
%timeit x = capwords_new("")
758 ns ± 35.1 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
%timeit x = capwords_current(tests[0])
977 ns ± 16.9 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
%timeit x = capwords_new(tests[0])
822 ns ± 30 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
%timeit x = capwords_current(tests[1])
3.07 µs ± 88.8 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
%timeit x = capwords_new(tests[1])
2.17 µs ± 194 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
%timeit x = capwords_current(tests[2])
28 µs ± 896 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
%timeit x = capwords_new(tests[2])
19.4 µs ± 352 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
%timeit x = capwords_current(tests[3])
236 µs ± 14.5 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
%timeit x = capwords_new(tests[3])
153 µs ± 2 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
%timeit x = capwords_current(tests[4])
2.12 ms ± 106 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
%timeit x = capwords_new(tests[4])
1.5 ms ± 9.61 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
%timeit x = capwords_current(tests[5])
23.8 ms ± 1.38 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
%timeit x = capwords_new(tests[5])
15.6 ms ± 355 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
%timeit x = capwords_current(tests[6])
271 ms ± 10.6 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
%timeit x = capwords_new(tests[6])
192 ms ± 807 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
%timeit x = capwords_current(tests[7])
2.66 s ± 14.3 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
%timeit x = capwords_new(tests[7])
1.95 s ± 26.7 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
%timeit x = capwords_current(tests[8])
25.9 s ± 80.2 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
%timeit x = capwords_new(tests[8])
18.4 s ± 123 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
%timeit x = capwords_current(tests[9])
6min 17s ± 29 s per loop (mean ± std. dev. of 7 runs, 1 loop each)
%timeit x = capwords_new(tests[9])
5min 36s ± 24.8 s per loop (mean ± std. dev. of 7 runs, 1 loop each)
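Dividing each pair of reported means puts the %timeit results on a common scale, as a speedup ratio of current time over map time. This sketch only reuses the means printed above, rounded as %timeit showed them. RESULTS and speedup are illustrative names, not part of the proposal.

```python
# Mean times from the %timeit results above, converted to seconds.
# Each entry: (input, current genexpr mean, proposed map mean).
RESULTS = [
    ('""', 835e-9, 758e-9),
    ("tests[0]", 977e-9, 822e-9),
    ("tests[1]", 3.07e-6, 2.17e-6),
    ("tests[2]", 28e-6, 19.4e-6),
    ("tests[3]", 236e-6, 153e-6),
    ("tests[4]", 2.12e-3, 1.5e-3),
    ("tests[5]", 23.8e-3, 15.6e-3),
    ("tests[6]", 271e-3, 192e-3),
    ("tests[7]", 2.66, 1.95),
    ("tests[8]", 25.9, 18.4),
    ("tests[9]", 6 * 60 + 17, 5 * 60 + 36),  # 6min 17s vs 5min 36s
]


def speedup(current, proposed):
    """Ratio of the two mean times; a value above 1 means the map version was faster."""
    return current / proposed


for label, current, proposed in RESULTS:
    print(f"{label:>9}: {speedup(current, proposed):.2f}x")
```

Computed this way, the ratio is about 1.10 for the empty string and 1.19 for tests[0]. It lies between roughly 1.36 and 1.54 for tests[1] through tests[8] and drops to about 1.12 for tests[9]. The relative gain therefore peaks in the middle range, while the absolute time saved keeps growing with input size.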
https://bugs.python.org/issue45225