Skip to content

bpo-26415: reduce peak memory consumption by the parser #10995

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 1 commit into from

Conversation

tyomitch
Copy link
Contributor

@tyomitch tyomitch commented Dec 6, 2018

Compress some branchless stretches in the parse tree into a single node.
AST is not affected.

https://bugs.python.org/issue26415

@the-knights-who-say-ni

This comment has been minimized.

@tyomitch
Copy link
Contributor Author

Pyperformance (0.7.0) reports significant improvement of memory footprint, ranging from "1.03x smaller" to "1.44x smaller", for every benchmark apart from sympy_integrate:

### 2to3 ###
Mean +- std dev: 8597.4 kB +- 32.6 kB -> 6969.4 kB +- 13.3 kB: 1.23x smaller
Significant (t=206.57)

### chaos ###
Mean +- std dev: 9543.2 kB +- 60.5 kB -> 7886.2 kB +- 40.5 kB: 1.21x smaller
Significant (t=101.74)

### crypto_pyaes ###
Mean +- std dev: 9078.2 kB +- 83.4 kB -> 7434.6 kB +- 55.5 kB: 1.22x smaller
Significant (t=73.36)

### deltablue ###
Mean +- std dev: 9219.6 kB +- 45.4 kB -> 8025.4 kB +- 15.8 kB: 1.15x smaller
Significant (t=111.01)

### django_template ###
Mean +- std dev: 20.3 MB +- 97.8 kB -> 18.5 MB +- 32.3 kB: 1.10x smaller
Significant (t=79.60)

### dulwich_log ###
Mean +- std dev: 12.7 MB +- 71.5 kB -> 11.2 MB +- 29.9 kB: 1.14x smaller
Significant (t=90.93)

### fannkuch ###
Mean +- std dev: 8574.4 kB +- 27.8 kB -> 6969.8 kB +- 18.1 kB: 1.23x smaller
Significant (t=216.07)

### float ###
Mean +- std dev: 22.1 MB +- 44.8 kB -> 20.5 MB +- 25.8 kB: 1.08x smaller
Significant (t=138.63)

### go ###
Mean +- std dev: 10.2 MB +- 36.1 kB -> 8995.6 kB +- 39.7 kB: 1.16x smaller
Significant (t=122.30)

### hexiom ###
Mean +- std dev: 8778.2 kB +- 22.0 kB -> 7248.4 kB +- 21.5 kB: 1.21x smaller
Significant (t=222.50)

### html5lib ###
Mean +- std dev: 21.5 MB +- 67.3 kB -> 19.8 MB +- 37.5 kB: 1.09x smaller
Significant (t=104.50)

### json_dumps ###
Mean +- std dev: 9760.8 kB +- 56.1 kB -> 8188.2 kB +- 42.4 kB: 1.19x smaller
Significant (t=100.05)

### json_loads ###
Mean +- std dev: 9190.0 kB +- 43.3 kB -> 7607.4 kB +- 23.3 kB: 1.21x smaller
Significant (t=143.88)

### logging_format ###
Mean +- std dev: 9194.6 kB +- 34.0 kB -> 8670.2 kB +- 199.4 kB: 1.06x smaller
Significant (t=11.59)

### logging_silent ###
Mean +- std dev: 8906.8 kB +- 36.9 kB -> 7312.2 kB +- 23.8 kB: 1.22x smaller
Significant (t=162.35)

### logging_simple ###
Mean +- std dev: 9131.8 kB +- 40.6 kB -> 8290.8 kB +- 103.1 kB: 1.10x smaller
Significant (t=33.96)

### mako ###
Mean +- std dev: 16.4 MB +- 583.3 kB -> 14.9 MB +- 416.7 kB: 1.10x smaller
Significant (t=9.27)

### meteor_contest ###
Mean +- std dev: 10.5 MB +- 46.2 kB -> 9184.0 kB +- 37.4 kB: 1.17x smaller
Significant (t=115.84)

### nbody ###
Mean +- std dev: 8653.8 kB +- 61.6 kB -> 7232.4 kB +- 45.7 kB: 1.20x smaller
Significant (t=82.90)

### nqueens ###
Mean +- std dev: 8828.2 kB +- 42.8 kB -> 7417.8 kB +- 52.4 kB: 1.19x smaller
Significant (t=93.29)

### pathlib ###
Mean +- std dev: 10.3 MB +- 84.6 kB -> 9027.6 kB +- 68.3 kB: 1.17x smaller
Significant (t=61.66)

### pickle ###
Mean +- std dev: 9473.2 kB +- 64.0 kB -> 7970.4 kB +- 74.5 kB: 1.19x smaller
Significant (t=68.39)

### pickle_dict ###
Mean +- std dev: 9550.6 kB +- 107.8 kB -> 7859.0 kB +- 70.4 kB: 1.22x smaller
Significant (t=58.76)

### pickle_list ###
Mean +- std dev: 9490.6 kB +- 54.5 kB -> 7799.0 kB +- 25.8 kB: 1.22x smaller
Significant (t=125.48)

### pickle_pure_python ###
Mean +- std dev: 9429.2 kB +- 73.2 kB -> 7836.8 kB +- 19.6 kB: 1.20x smaller
Significant (t=94.00)

### pidigits ###
Mean +- std dev: 8747.4 kB +- 50.3 kB -> 7234.8 kB +- 43.2 kB: 1.21x smaller
Significant (t=102.04)

### python_startup ###
Mean +- std dev: 10.5 MB +- 22.7 kB -> 7604.2 kB +- 26.4 kB: 1.42x smaller
Significant (t=407.98)

### python_startup_no_site ###
Mean +- std dev: 10.0 MB +- 19.0 kB -> 7106.6 kB +- 29.3 kB: 1.44x smaller
Significant (t=404.06)

### raytrace ###
Mean +- std dev: 9054.4 kB +- 61.1 kB -> 7533.4 kB +- 86.7 kB: 1.20x smaller
Significant (t=64.12)

### regex_compile ###
Mean +- std dev: 9840.8 kB +- 23.3 kB -> 8361.4 kB +- 54.5 kB: 1.18x smaller
Significant (t=111.65)

### regex_dna ###
Mean +- std dev: 14.7 MB +- 51.6 kB -> 13.2 MB +- 72.1 kB: 1.11x smaller
Significant (t=74.04)

### regex_effbot ###
Mean +- std dev: 9273.6 kB +- 21.1 kB -> 7749.8 kB +- 28.3 kB: 1.20x smaller
Significant (t=193.05)

### regex_v8 ###
Mean +- std dev: 9785.2 kB +- 54.8 kB -> 8162.8 kB +- 39.7 kB: 1.20x smaller
Significant (t=107.14)

### richards ###
Mean +- std dev: 8731.4 kB +- 43.6 kB -> 7262.2 kB +- 56.1 kB: 1.20x smaller
Significant (t=92.50)

### scimark_fft ###
Mean +- std dev: 8784.6 kB +- 19.9 kB -> 7247.8 kB +- 34.4 kB: 1.21x smaller
Significant (t=173.03)

### scimark_lu ###
Mean +- std dev: 8807.6 kB +- 23.2 kB -> 7222.8 kB +- 34.2 kB: 1.22x smaller
Significant (t=171.59)

### scimark_monte_carlo ###
Mean +- std dev: 8745.2 kB +- 21.1 kB -> 7147.2 kB +- 32.5 kB: 1.22x smaller
Significant (t=184.62)

### scimark_sor ###
Mean +- std dev: 8726.8 kB +- 16.0 kB -> 7151.6 kB +- 30.5 kB: 1.22x smaller
Significant (t=204.58)

### scimark_sparse_mat_mult ###
Mean +- std dev: 9240.2 kB +- 30.5 kB -> 7652.8 kB +- 29.3 kB: 1.21x smaller
Significant (t=168.01)

### spectral_norm ###
Mean +- std dev: 8583.4 kB +- 27.5 kB -> 7090.2 kB +- 39.4 kB: 1.21x smaller
Significant (t=139.03)

### sqlalchemy_declarative ###
Mean +- std dev: 19.5 MB +- 34.6 kB -> 17.9 MB +- 21.6 kB: 1.09x smaller
Significant (t=177.78)

### sqlalchemy_imperative ###
Mean +- std dev: 18.9 MB +- 67.8 kB -> 17.2 MB +- 35.5 kB: 1.10x smaller
Significant (t=100.06)

### sqlite_synth ###
Mean +- std dev: 8818.4 kB +- 42.1 kB -> 7376.6 kB +- 15.4 kB: 1.20x smaller
Significant (t=143.69)

### sympy_expand ###
Mean +- std dev: 33.0 MB +- 60.3 kB -> 31.5 MB +- 33.3 kB: 1.05x smaller
Significant (t=103.54)

### sympy_integrate ###
Mean +- std dev: 32.2 MB +- 47.6 kB -> 31.6 MB +- 61.4 kB: 1.02x smaller
Not significant

### sympy_str ###
Mean +- std dev: 33.8 MB +- 43.1 kB -> 32.2 MB +- 68.0 kB: 1.05x smaller
Significant (t=91.21)

### sympy_sum ###
Mean +- std dev: 57.0 MB +- 58.1 kB -> 55.4 MB +- 63.6 kB: 1.03x smaller
Significant (t=86.36)

### telco ###
Mean +- std dev: 9172.2 kB +- 30.2 kB -> 7515.8 kB +- 68.0 kB: 1.22x smaller
Significant (t=99.61)

### tornado_http ###
Mean +- std dev: 25.7 MB +- 267.7 kB -> 24.1 MB +- 102.6 kB: 1.07x smaller
Significant (t=25.77)

### unpack_sequence ###
Mean +- std dev: 8913.4 kB +- 38.7 kB -> 7315.2 kB +- 13.1 kB: 1.22x smaller
Significant (t=174.87)

### unpickle ###
Mean +- std dev: 9483.8 kB +- 47.4 kB -> 7769.6 kB +- 23.2 kB: 1.22x smaller
Significant (t=145.30)

### unpickle_list ###
Mean +- std dev: 9474.8 kB +- 55.5 kB -> 7773.4 kB +- 27.2 kB: 1.22x smaller
Significant (t=123.16)

### unpickle_pure_python ###
Mean +- std dev: 9489.4 kB +- 92.1 kB -> 7857.8 kB +- 27.8 kB: 1.21x smaller
Significant (t=75.81)

### xml_etree_generate ###
Mean +- std dev: 14.1 MB +- 85.2 kB -> 12.2 MB +- 123.0 kB: 1.15x smaller
Significant (t=56.98)

### xml_etree_iterparse ###
Mean +- std dev: 13.5 MB +- 30.3 kB -> 11.8 MB +- 87.5 kB: 1.15x smaller
Significant (t=85.90)

### xml_etree_parse ###
Mean +- std dev: 13.6 MB +- 60.6 kB -> 11.8 MB +- 66.8 kB: 1.15x smaller
Significant (t=91.00)

### xml_etree_process ###
Mean +- std dev: 14.2 MB +- 29.9 kB -> 12.5 MB +- 88.0 kB: 1.13x smaller
Significant (t=83.13)

Effects on benchmark times are less substantial, and range from "1.18x faster" for python_startup_no_site, to "1.22x slower" for unpack_sequence. The latter is likely pure noise because the measured times are in nanoseconds range.

@tyomitch
Copy link
Contributor Author

When I leave my machine alone while it's running the benchmarks, the results are even more exciting: all significant differences are in favour of the patched parser, ranging from "1.03x faster" to "1.26x faster".

### 2to3 ###
Mean +- std dev: 709 ms +- 62 ms -> 674 ms +- 59 ms: 1.05x faster
Significant (t=3.17)

### chaos ###
Mean +- std dev: 308 ms +- 36 ms -> 292 ms +- 39 ms: 1.06x faster
Significant (t=2.45)

### crypto_pyaes ###
Mean +- std dev: 318 ms +- 31 ms -> 280 ms +- 43 ms: 1.14x faster
Significant (t=5.55)

### deltablue ###
Mean +- std dev: 18.5 ms +- 2.0 ms -> 16.3 ms +- 2.5 ms: 1.13x faster
Significant (t=5.13)

### django_template ###
Mean +- std dev: 286 ms +- 16 ms -> 265 ms +- 25 ms: 1.08x faster
Significant (t=5.34)

### dulwich_log ###
Mean +- std dev: 133 ms +- 11 ms -> 131 ms +- 10 ms: 1.01x faster
Not significant

### fannkuch ###
Mean +- std dev: 1.52 sec +- 0.08 sec -> 1.36 sec +- 0.11 sec: 1.12x faster
Significant (t=8.86)

### float ###
Mean +- std dev: 353 ms +- 55 ms -> 298 ms +- 72 ms: 1.18x faster
Significant (t=4.72)

### go ###
Mean +- std dev: 658 ms +- 48 ms -> 562 ms +- 69 ms: 1.17x faster
Significant (t=8.86)

### hexiom ###
Mean +- std dev: 26.8 ms +- 6.8 ms -> 24.9 ms +- 5.1 ms: 1.08x faster
Not significant

### html5lib ###
Mean +- std dev: 223 ms +- 28 ms -> 198 ms +- 29 ms: 1.12x faster
Significant (t=4.74)

### json_dumps ###
Mean +- std dev: 34.3 ms +- 4.3 ms -> 31.9 ms +- 4.7 ms: 1.08x faster
Significant (t=2.98)

### json_loads ###
Mean +- std dev: 59.8 us +- 4.4 us -> 56.8 us +- 7.4 us: 1.05x faster
Significant (t=2.74)

### logging_format ###
Mean +- std dev: 22.4 us +- 1.6 us -> 21.3 us +- 2.6 us: 1.05x faster
Significant (t=2.82)

### logging_silent ###
Mean +- std dev: 511 ns +- 70 ns -> 475 ns +- 79 ns: 1.07x faster
Significant (t=2.60)

### logging_simple ###
Mean +- std dev: 20.9 us +- 1.4 us -> 19.7 us +- 1.9 us: 1.06x faster
Significant (t=3.64)

### mako ###
Mean +- std dev: 47.9 ms +- 6.7 ms -> 41.2 ms +- 8.0 ms: 1.16x faster
Significant (t=4.95)

### meteor_contest ###
Mean +- std dev: 247 ms +- 31 ms -> 227 ms +- 33 ms: 1.08x faster
Significant (t=3.24)

### nbody ###
Mean +- std dev: 464 ms +- 65 ms -> 370 ms +- 82 ms: 1.26x faster
Significant (t=7.01)

### nqueens ###
Mean +- std dev: 307 ms +- 28 ms -> 266 ms +- 48 ms: 1.16x faster
Significant (t=5.82)

### pathlib ###
Mean +- std dev: 47.4 ms +- 5.2 ms -> 44.4 ms +- 5.9 ms: 1.07x faster
Significant (t=2.89)

### pickle ###
Mean +- std dev: 18.8 us +- 2.3 us -> 17.6 us +- 2.4 us: 1.07x faster
Significant (t=2.77)

### pickle_dict ###
Mean +- std dev: 62.9 us +- 4.9 us -> 58.3 us +- 7.6 us: 1.08x faster
Significant (t=3.97)

### pickle_list ###
Mean +- std dev: 8.65 us +- 0.53 us -> 8.22 us +- 0.77 us: 1.05x faster
Significant (t=3.59)

### pickle_pure_python ###
Mean +- std dev: 1.22 ms +- 0.14 ms -> 1.12 ms +- 0.16 ms: 1.08x faster
Significant (t=3.51)

### pidigits ###
Mean +- std dev: 202 ms +- 4 ms -> 205 ms +- 9 ms: 1.02x slower
Not significant

### python_startup ###
Mean +- std dev: 12.5 ms +- 1.1 ms -> 12.2 ms +- 1.1 ms: 1.02x faster
Not significant

### python_startup_no_site ###
Mean +- std dev: 8.74 ms +- 0.79 ms -> 8.86 ms +- 0.70 ms: 1.01x slower
Not significant

### raytrace ###
Mean +- std dev: 1.30 sec +- 0.08 sec -> 1.18 sec +- 0.11 sec: 1.11x faster
Significant (t=7.10)

### regex_compile ###
Mean +- std dev: 449 ms +- 42 ms -> 427 ms +- 46 ms: 1.05x faster
Significant (t=2.72)

### regex_dna ###
Mean +- std dev: 546 ms +- 43 ms -> 520 ms +- 51 ms: 1.05x faster
Significant (t=3.03)

### regex_effbot ###
Mean +- std dev: 12.1 ms +- 1.2 ms -> 11.1 ms +- 1.4 ms: 1.09x faster
Significant (t=4.10)

### regex_v8 ###
Mean +- std dev: 77.9 ms +- 6.4 ms -> 74.5 ms +- 8.6 ms: 1.05x faster
Significant (t=2.50)

### richards ###
Mean +- std dev: 179 ms +- 20 ms -> 169 ms +- 26 ms: 1.06x faster
Significant (t=2.42)

### scimark_fft ###
Mean +- std dev: 1.32 sec +- 0.09 sec -> 1.21 sec +- 0.11 sec: 1.10x faster
Significant (t=6.25)

### scimark_lu ###
Mean +- std dev: 519 ms +- 55 ms -> 485 ms +- 55 ms: 1.07x faster
Significant (t=3.34)

### scimark_monte_carlo ###
Mean +- std dev: 339 ms +- 44 ms -> 305 ms +- 47 ms: 1.11x faster
Significant (t=4.11)

### scimark_sor ###
Mean +- std dev: 643 ms +- 39 ms -> 577 ms +- 58 ms: 1.11x faster
Significant (t=7.27)

### scimark_sparse_mat_mult ###
Mean +- std dev: 20.9 ms +- 2.7 ms -> 18.9 ms +- 4.1 ms: 1.10x faster
Significant (t=3.10)

### spectral_norm ###
Mean +- std dev: 484 ms +- 52 ms -> 433 ms +- 64 ms: 1.12x faster
Significant (t=4.80)

### sqlalchemy_declarative ###
Mean +- std dev: 297 ms +- 23 ms -> 291 ms +- 28 ms: 1.02x faster
Not significant

### sqlalchemy_imperative ###
Mean +- std dev: 47.4 ms +- 3.6 ms -> 45.8 ms +- 4.3 ms: 1.03x faster
Significant (t=2.18)

### sqlite_synth ###
Mean +- std dev: 9.07 us +- 1.43 us -> 8.52 us +- 1.16 us: 1.06x faster
Significant (t=2.31)

### sympy_expand ###
Mean +- std dev: 923 ms +- 62 ms -> 859 ms +- 71 ms: 1.07x faster
Significant (t=5.28)

### sympy_integrate ###
Mean +- std dev: 40.8 ms +- 4.2 ms -> 38.3 ms +- 5.2 ms: 1.07x faster
Significant (t=2.95)

### sympy_str ###
Mean +- std dev: 430 ms +- 35 ms -> 394 ms +- 47 ms: 1.09x faster
Significant (t=4.77)

### sympy_sum ###
Mean +- std dev: 195 ms +- 12 ms -> 180 ms +- 19 ms: 1.09x faster
Significant (t=5.42)

### telco ###
Mean +- std dev: 17.4 ms +- 1.8 ms -> 16.6 ms +- 1.6 ms: 1.05x faster
Significant (t=2.61)

### tornado_http ###
Mean +- std dev: 319 ms +- 14 ms -> 310 ms +- 21 ms: 1.03x faster
Significant (t=2.63)

### unpack_sequence ###
Mean +- std dev: 216 ns +- 46 ns -> 185 ns +- 50 ns: 1.17x faster
Significant (t=3.57)

### unpickle ###
Mean +- std dev: 31.9 us +- 4.0 us -> 31.3 us +- 4.6 us: 1.02x faster
Not significant

### unpickle_list ###
Mean +- std dev: 9.72 us +- 0.94 us -> 9.33 us +- 0.98 us: 1.04x faster
Significant (t=2.22)

### unpickle_pure_python ###
Mean +- std dev: 1.01 ms +- 0.13 ms -> 0.91 ms +- 0.14 ms: 1.11x faster
Significant (t=4.07)

### xml_etree_generate ###
Mean +- std dev: 293 ms +- 32 ms -> 278 ms +- 34 ms: 1.05x faster
Significant (t=2.37)

### xml_etree_iterparse ###
Mean +- std dev: 312 ms +- 29 ms -> 297 ms +- 37 ms: 1.05x faster
Significant (t=2.44)

### xml_etree_parse ###
Mean +- std dev: 445 ms +- 40 ms -> 430 ms +- 43 ms: 1.03x faster
Not significant

### xml_etree_process ###
Mean +- std dev: 237 ms +- 18 ms -> 224 ms +- 27 ms: 1.06x faster
Significant (t=3.29)

@tyomitch tyomitch force-pushed the master branch 3 times, most recently from 2c88029 to 57a74b7 Compare March 26, 2019 11:13
@tyomitch tyomitch requested a review from a team as a code owner March 26, 2019 11:13
@methane methane added the performance Performance or resource usage label Apr 4, 2019
@methane
Copy link
Member

methane commented Apr 5, 2019

I can not believe this affects runtime performance. I tested this PR on my machine (dedicated Linux box).

$ pyperf compare_to master.json 10995.json -G --min-speed=1
Slower (2):
- pickle_dict: 53.2 us +- 0.2 us -> 54.0 us +- 2.2 us: 1.02x slower (+2%)
- scimark_monte_carlo: 262 ms +- 2 ms -> 265 ms +- 4 ms: 1.01x slower (+1%)

Faster (1):
- unpickle: 30.2 us +- 2.3 us -> 28.7 us +- 2.2 us: 1.05x faster (-5%)

Benchmark hidden because not significant (54):

@tyomitch
Copy link
Contributor Author

tyomitch commented Apr 5, 2019

I can not believe this affects runtime performance. I tested this PR on my machine (dedicated Linux box).

Even if it doesn't, it still affects the memory footprint.

Btw, now that @pablogsal has rewritten the parser generator into Python, this patch should be a bit easier to review: the changes to parsermodule.c have reduced from 183 to just 3 LOC.

@methane
Copy link
Member

methane commented Apr 5, 2019

Could you use static const instead of static in gramvalid.c?

@tyomitch
Copy link
Contributor Author

tyomitch commented Apr 7, 2019

Could you use static const instead of static in gramvalid.c?

Thank you for the suggestion!
Submitted for review at #12713

Compress some branchless stretches in the parse tree into a single node.
AST is not affected.
@csabella
Copy link
Contributor

@tyomitch, please resolve the merge conflicts. Thank you!

@csabella
Copy link
Contributor

@tyomitch ping

@pablogsal
Copy link
Member

Given that we have a new parser in 3.9 and the old parser will be removed in 3.10, this PR is obsolete so I will proceed to close it. Thanks for thinking about this problem, though!

@pablogsal pablogsal closed this May 23, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
awaiting changes performance Performance or resource usage
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants