bpo-46841: Quicken code in-place #31888

brandtbucher · 2022-03-15T05:35:59Z

This moves the bytecode to the end of the corresponding PyCodeObject, and quickens it in-place.

Related changes:

* PyCodeObject is now more compact. I've removed the almost-always-unused co_varnames, co_freevars, and co_cellvars member caches, and rearranged some int members to fill some holes in the struct on 64-bit builds. co_code has been removed and replaced with _PyCode_GetCode, and co_quickened and co_firstinstr have been replaced with _PyCode_CODE.
* _PyOpcode_Deopt is a new mapping from all opcodes to their un-quickened forms.
* _PyOpcode_InlineCacheEntries is renamed to _PyOpcode_Caches, _Py_IncrementCountAndMaybeQuicken is renamed to _PyCode_Warmup, _Py_Quicken is renamed to _PyCode_Quicken, and _co_quickened is renamed to _co_code_adaptive (and is now a read-only memoryview).
* We don't emit unused nonzero opargs anymore in the compiler.
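
To make the idea of in-place quickening and the deopt mapping concrete, here is a minimal toy model in Python. It is only a sketch: the opcode names and numbers, the `ToyCode` class, and the warmup threshold are all hypothetical and are not CPython's actual values or implementation. It shows a single mutable bytecode buffer that is rewritten in place after a warmup period, and how the original bytecode can be recovered by mapping every opcode back to its un-quickened form.

```python
# Toy model only: these opcode names and numbers are hypothetical, not CPython's.
LOAD_ATTR, LOAD_ATTR_FAST, RETURN = 1, 2, 3

# Every opcode maps to its un-quickened form; generic opcodes map to themselves.
DEOPT = {LOAD_ATTR: LOAD_ATTR, LOAD_ATTR_FAST: LOAD_ATTR, RETURN: RETURN}
# Hypothetical table of which generic opcodes have a quickened variant.
QUICK = {LOAD_ATTR: LOAD_ATTR_FAST}


class ToyCode:
    """A code object that owns exactly one, mutable copy of its bytecode."""

    def __init__(self, code: bytes, warmup: int = 2):
        self.adaptive = bytearray(code)  # rewritten in place when quickened
        self.warmup = warmup             # calls remaining before quickening

    def call(self) -> None:
        """Count one call and quicken once the warmup counter reaches zero."""
        if self.warmup > 0:
            self.warmup -= 1
            if self.warmup == 0:
                self.quicken()

    def quicken(self) -> None:
        """Overwrite opcodes in place; opargs (odd bytes) are left untouched."""
        for i in range(0, len(self.adaptive), 2):
            op = self.adaptive[i]
            self.adaptive[i] = QUICK.get(op, op)

    def get_code(self) -> bytes:
        """Rebuild the un-quickened bytecode on demand from the deopt table."""
        out = bytearray(self.adaptive)
        for i in range(0, len(out), 2):
            out[i] = DEOPT[out[i]]
        return bytes(out)


original = bytes([LOAD_ATTR, 0, RETURN, 0])
code = ToyCode(original)
code.call()
code.call()  # the second call triggers quickening in this toy model
```

Because the deopt table covers every opcode, no separate pristine copy of the bytecode needs to be stored, which is the source of the space saving in this toy version.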

https://bugs.python.org/issue46841

brandtbucher · 2022-03-15T22:51:39Z

It looks like this results in a 3% memory improvement across all benchmarks:

Slower (1):
- regex_dna: 13.9 MB +- 374.4 kB -> 14.8 MB +- 311.7 kB: 1.07x slower

Faster (49):
- xml_etree_generate: 12.8 MB +- 340.8 kB -> 11.5 MB +- 329.9 kB: 1.11x faster
- logging_simple: 16.4 MB +- 1428.4 kB -> 14.9 MB +- 1906.2 kB: 1.10x faster
- html5lib: 27.4 MB +- 1000.0 kB -> 25.6 MB +- 1662.1 kB: 1.07x faster
- xml_etree_process: 12.8 MB +- 467.9 kB -> 12.1 MB +- 371.3 kB: 1.06x faster
- pidigits: 7594.4 kB +- 233.7 kB -> 7238.7 kB +- 100.3 kB: 1.05x faster
- regex_compile: 8885.3 kB +- 424.5 kB -> 8524.6 kB +- 159.0 kB: 1.04x faster
- unpickle: 8016.9 kB +- 384.9 kB -> 7694.8 kB +- 178.5 kB: 1.04x faster
- telco: 8009.3 kB +- 358.4 kB -> 7690.8 kB +- 327.4 kB: 1.04x faster
- pickle_pure_python: 8018.5 kB +- 390.8 kB -> 7706.8 kB +- 231.7 kB: 1.04x faster
- json_loads: 7637.5 kB +- 239.3 kB -> 7350.9 kB +- 186.3 kB: 1.04x faster
- chaos: 8442.3 kB +- 116.6 kB -> 8125.8 kB +- 39.7 kB: 1.04x faster
- logging_silent: 8057.6 kB +- 264.3 kB -> 7761.4 kB +- 169.1 kB: 1.04x faster
- fannkuch: 7315.6 kB +- 213.7 kB -> 7061.8 kB +- 186.3 kB: 1.04x faster
- pickle_dict: 7948.3 kB +- 257.2 kB -> 7677.0 kB +- 219.3 kB: 1.04x faster
- scimark_lu: 8120.4 kB +- 340.0 kB -> 7843.8 kB +- 348.3 kB: 1.04x faster
- sympy_integrate: 60.9 MB +- 424.5 kB -> 58.8 MB +- 456.4 kB: 1.04x faster
- meteor_contest: 9609.0 kB +- 374.0 kB -> 9283.7 kB +- 125.0 kB: 1.04x faster
- scimark_fft: 8204.3 kB +- 217.3 kB -> 7927.2 kB +- 227.9 kB: 1.03x faster
- spectral_norm: 7379.3 kB +- 241.6 kB -> 7147.8 kB +- 271.3 kB: 1.03x faster
- nbody: 7564.3 kB +- 252.4 kB -> 7327.0 kB +- 259.3 kB: 1.03x faster
- scimark_sor: 8102.5 kB +- 196.4 kB -> 7852.7 kB +- 352.7 kB: 1.03x faster
- crypto_pyaes: 7926.6 kB +- 120.6 kB -> 7689.8 kB +- 206.4 kB: 1.03x faster
- sympy_str: 61.3 MB +- 39.1 kB -> 59.4 MB +- 51.4 kB: 1.03x faster
- richards: 7951.8 kB +- 286.5 kB -> 7715.4 kB +- 244.9 kB: 1.03x faster
- unpack_sequence: 9026.1 kB +- 296.4 kB -> 8760.5 kB +- 387.8 kB: 1.03x faster
- django_template: 37.8 MB +- 116.9 kB -> 36.7 MB +- 109.5 kB: 1.03x faster
- sympy_expand: 60.1 MB +- 44.5 kB -> 58.4 MB +- 50.4 kB: 1.03x faster
- sympy_sum: 71.9 MB +- 2189.4 kB -> 69.9 MB +- 2175.3 kB: 1.03x faster
- raytrace: 8276.0 kB +- 152.7 kB -> 8050.4 kB +- 161.2 kB: 1.03x faster
- dulwich_log: 15.3 MB +- 94.2 kB -> 14.9 MB +- 99.3 kB: 1.03x faster
- pickle_list: 7874.5 kB +- 191.8 kB -> 7666.5 kB +- 173.0 kB: 1.03x faster
- pickle: 7937.0 kB +- 203.7 kB -> 7731.4 kB +- 386.8 kB: 1.03x faster
- regex_effbot: 8133.9 kB +- 222.4 kB -> 7924.1 kB +- 237.5 kB: 1.03x faster
- unpickle_list: 7853.8 kB +- 211.0 kB -> 7656.6 kB +- 201.3 kB: 1.03x faster
- deltablue: 9946.5 kB +- 169.6 kB -> 9698.5 kB +- 98.9 kB: 1.03x faster
- sqlite_synth: 9487.7 kB +- 41.4 kB -> 9255.3 kB +- 40.3 kB: 1.03x faster
- unpickle_pure_python: 7912.4 kB +- 187.2 kB -> 7720.2 kB +- 209.8 kB: 1.02x faster
- tornado_http: 29.9 MB +- 899.9 kB -> 29.2 MB +- 831.3 kB: 1.02x faster
- scimark_sparse_mat_mult: 8554.4 kB +- 114.1 kB -> 8358.8 kB +- 340.0 kB: 1.02x faster
- go: 9354.8 kB +- 372.5 kB -> 9142.5 kB +- 367.4 kB: 1.02x faster
- xml_etree_iterparse: 12.4 MB +- 233.6 kB -> 12.1 MB +- 307.8 kB: 1.02x faster
- pathlib: 9057.6 kB +- 173.9 kB -> 8863.2 kB +- 166.3 kB: 1.02x faster
- chameleon: 20.0 MB +- 346.1 kB -> 19.6 MB +- 316.4 kB: 1.02x faster
- regex_v8: 13.2 MB +- 143.0 kB -> 12.9 MB +- 199.2 kB: 1.02x faster
- python_startup_no_site: 11.3 MB +- 34.3 kB -> 11.1 MB +- 23.3 kB: 1.02x faster
- python_startup: 11.3 MB +- 39.6 kB -> 11.1 MB +- 24.2 kB: 1.02x faster
- xml_etree_parse: 11.7 MB +- 90.7 kB -> 11.5 MB +- 464.2 kB: 1.02x faster
- 2to3: 22.6 MB +- 39.0 kB -> 22.3 MB +- 45.5 kB: 1.01x faster
- json_dumps: 9413.9 kB +- 48.8 kB -> 9294.5 kB +- 325.8 kB: 1.01x faster

Benchmark hidden because not significant (7): float, hexiom, logging_format, mako, nqueens, pyflate, scimark_monte_carlo

Geometric mean: 1.03x faster

Ignore pyperf's incorrect "faster"/"slower" terminology... we're measuring memory usage here. I'm still waiting on actual performance numbers for this.
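
As a reading aid for the table above, the following sketch recomputes the ratio for three rows copied from it and expresses each as a memory saving. The helper names `ratio` and `percent_saved` are made up for illustration. The geometric mean computed here covers only these three rows, so it is not expected to match the figure reported for the full suite.

```python
from statistics import geometric_mean

# (before_kB, after_kB) copied from three rows of the memory table above.
samples = {
    "pidigits": (7594.4, 7238.7),
    "chaos": (8442.3, 8125.8),
    "sqlite_synth": (9487.7, 9255.3),
}


def ratio(before: float, after: float) -> float:
    """The 'Nx faster' figure: for memory it means N times less memory."""
    return before / after


def percent_saved(before: float, after: float) -> float:
    """The same comparison expressed as a percentage of the baseline."""
    return 100.0 * (before - after) / before


for name, (before, after) in samples.items():
    print(f"{name}: {ratio(before, after):.2f}x less memory, "
          f"{percent_saved(before, after):.1f}% saved")

# Geometric mean of this three-row subset only, not of the whole suite.
subset_mean = geometric_mean([ratio(b, a) for b, a in samples.values()])
print(f"subset geometric mean: {subset_mean:.2f}x")
```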

brandtbucher · 2022-03-16T21:08:06Z

1% perf improvement too:

Slower (11):
- pickle_dict: 27.6 us +- 0.2 us -> 28.4 us +- 0.3 us: 1.03x slower
- html5lib: 65.3 ms +- 2.7 ms -> 67.2 ms +- 2.8 ms: 1.03x slower
- pickle_list: 4.33 us +- 0.05 us -> 4.46 us +- 0.05 us: 1.03x slower
- regex_v8: 23.1 ms +- 0.2 ms -> 23.8 ms +- 0.2 ms: 1.03x slower
- regex_dna: 217 ms +- 1 ms -> 223 ms +- 4 ms: 1.03x slower
- scimark_lu: 111 ms +- 2 ms -> 113 ms +- 2 ms: 1.02x slower
- regex_effbot: 3.46 ms +- 0.06 ms -> 3.50 ms +- 0.05 ms: 1.01x slower
- json_dumps: 12.6 ms +- 0.1 ms -> 12.8 ms +- 0.2 ms: 1.01x slower
- fannkuch: 397 ms +- 3 ms -> 400 ms +- 5 ms: 1.01x slower
- json_loads: 28.1 us +- 0.3 us -> 28.3 us +- 0.3 us: 1.01x slower
- xml_etree_iterparse: 105 ms +- 1 ms -> 105 ms +- 1 ms: 1.01x slower

Faster (43):
- go: 149 ms +- 1 ms -> 139 ms +- 1 ms: 1.07x faster
- logging_simple: 5.38 us +- 0.10 us -> 5.16 us +- 0.08 us: 1.04x faster
- pickle: 9.89 us +- 0.14 us -> 9.53 us +- 0.10 us: 1.04x faster
- pycparser: 1.24 sec +- 0.02 sec -> 1.19 sec +- 0.02 sec: 1.04x faster
- thrift: 780 us +- 13 us -> 754 us +- 8 us: 1.03x faster
- deltablue: 3.85 ms +- 0.05 ms -> 3.73 ms +- 0.05 ms: 1.03x faster
- unpack_sequence: 48.5 ns +- 0.5 ns -> 47.1 ns +- 0.9 ns: 1.03x faster
- scimark_sparse_mat_mult: 4.97 ms +- 0.15 ms -> 4.83 ms +- 0.11 ms: 1.03x faster
- pyflate: 451 ms +- 3 ms -> 438 ms +- 4 ms: 1.03x faster
- xml_etree_process: 56.7 ms +- 0.8 ms -> 55.2 ms +- 0.7 ms: 1.03x faster
- pickle_pure_python: 323 us +- 3 us -> 314 us +- 3 us: 1.03x faster
- telco: 6.80 ms +- 0.09 ms -> 6.65 ms +- 0.16 ms: 1.02x faster
- scimark_sor: 120 ms +- 1 ms -> 118 ms +- 1 ms: 1.02x faster
- pidigits: 194 ms +- 0 ms -> 190 ms +- 0 ms: 1.02x faster
- logging_format: 5.87 us +- 0.08 us -> 5.74 us +- 0.09 us: 1.02x faster
- unpickle_pure_python: 238 us +- 2 us -> 233 us +- 2 us: 1.02x faster
- xml_etree_generate: 80.0 ms +- 0.6 ms -> 78.4 ms +- 0.7 ms: 1.02x faster
- meteor_contest: 108 ms +- 3 ms -> 106 ms +- 1 ms: 1.02x faster
- regex_compile: 139 ms +- 1 ms -> 136 ms +- 1 ms: 1.02x faster
- hexiom: 6.96 ms +- 0.03 ms -> 6.83 ms +- 0.02 ms: 1.02x faster
- sympy_sum: 163 ms +- 2 ms -> 160 ms +- 1 ms: 1.02x faster
- tornado_http: 98.2 ms +- 1.3 ms -> 96.5 ms +- 1.4 ms: 1.02x faster
- dulwich_log: 65.8 ms +- 0.4 ms -> 64.7 ms +- 0.5 ms: 1.02x faster
- sympy_integrate: 20.9 ms +- 0.1 ms -> 20.6 ms +- 0.1 ms: 1.02x faster
- scimark_fft: 340 ms +- 4 ms -> 334 ms +- 4 ms: 1.02x faster
- 2to3: 267 ms +- 1 ms -> 263 ms +- 1 ms: 1.02x faster
- scimark_monte_carlo: 69.7 ms +- 1.2 ms -> 68.7 ms +- 0.8 ms: 1.01x faster
- django_template: 35.0 ms +- 0.5 ms -> 34.5 ms +- 0.5 ms: 1.01x faster
- chaos: 71.7 ms +- 0.6 ms -> 70.7 ms +- 0.6 ms: 1.01x faster
- nbody: 94.0 ms +- 1.7 ms -> 92.8 ms +- 1.8 ms: 1.01x faster
- raytrace: 310 ms +- 2 ms -> 306 ms +- 3 ms: 1.01x faster
- sqlalchemy_declarative: 141 ms +- 3 ms -> 140 ms +- 3 ms: 1.01x faster
- float: 76.7 ms +- 0.8 ms -> 75.8 ms +- 1.0 ms: 1.01x faster
- sympy_str: 291 ms +- 2 ms -> 287 ms +- 3 ms: 1.01x faster
- richards: 47.5 ms +- 1.2 ms -> 47.0 ms +- 1.1 ms: 1.01x faster
- sympy_expand: 485 ms +- 6 ms -> 480 ms +- 3 ms: 1.01x faster
- python_startup_no_site: 6.02 ms +- 0.00 ms -> 5.96 ms +- 0.00 ms: 1.01x faster
- chameleon: 6.63 ms +- 0.07 ms -> 6.57 ms +- 0.06 ms: 1.01x faster
- crypto_pyaes: 83.9 ms +- 0.7 ms -> 83.2 ms +- 1.1 ms: 1.01x faster
- spectral_norm: 102 ms +- 1 ms -> 101 ms +- 1 ms: 1.01x faster
- python_startup: 8.41 ms +- 0.01 ms -> 8.34 ms +- 0.01 ms: 1.01x faster
- nqueens: 86.1 ms +- 1.2 ms -> 85.5 ms +- 0.8 ms: 1.01x faster
- pathlib: 18.3 ms +- 0.2 ms -> 18.2 ms +- 0.3 ms: 1.01x faster

Benchmark hidden because not significant (8): json, logging_silent, mako, sqlalchemy_imperative, sqlite_synth, unpickle, unpickle_list, xml_etree_parse

Geometric mean: 1.01x faster

markshannon · 2022-03-17T09:42:40Z

This is still marked as draft, what is left to do?

brandtbucher · 2022-03-17T14:56:16Z

> This is still marked as draft, what is left to do?

There is still an awkward spot in _gen_throw where we walk back f_lasti to the previous SEND instruction and perform the jump ourselves when in a yield from (which is sort of a strange control-flow path that isn’t reflected in the CFG/bytecode/dis). I also don’t think the current implementation handles EXTENDED_ARGs correctly.

I’m still trying to understand the code better and figure out a cleaner way of doing this. Any ideas?
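
To illustrate why EXTENDED_ARG makes stepping backwards through bytecode awkward, here is a small Python sketch that decodes two-byte code units and folds any EXTENDED_ARG prefixes into the instruction that follows them. It is an assumption-laden illustration, not the code in _gen_throw: the `decode` helper is hypothetical, the byte string is synthetic, and inline cache entries are ignored. The point is that an instruction's real start is the offset of its first prefix, so walking back by one code unit can land in the middle of a logical instruction.

```python
import dis

EXTENDED_ARG = dis.opmap["EXTENDED_ARG"]
NOP = dis.opmap["NOP"]  # used here only as a stand-in opcode


def decode(code: bytes):
    """Yield (start_offset, opcode, oparg) for each logical instruction.

    start_offset points at the first EXTENDED_ARG prefix, if there is one.
    This toy decoder ignores inline cache entries.
    """
    extended = 0
    start = None
    for offset in range(0, len(code), 2):
        opcode, oparg = code[offset], code[offset + 1]
        if start is None:
            start = offset
        oparg |= extended
        if opcode == EXTENDED_ARG:
            # Shift the accumulated bits up and keep scanning.
            extended = oparg << 8
            continue
        yield start, opcode, oparg
        extended = 0
        start = None


# Synthetic bytecode: EXTENDED_ARG 1, NOP 2, then a plain NOP 3.
synthetic = bytes([EXTENDED_ARG, 1, NOP, 2, NOP, 3])
instructions = list(decode(synthetic))
```

In this synthetic example the first logical instruction spans two code units, so the instruction preceding offset 4 starts at offset 0, not offset 2.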

brandtbucher · 2022-03-17T20:17:02Z

> There is still an awkward spot in _gen_throw where we walk back f_lasti to the previous SEND instruction and perform the jump ourselves when in a yield from (which is sort of a strange control-flow path that isn’t reflected in the CFG/bytecode/dis). I also don’t think the current implementation handles EXTENDED_ARGs correctly.
>
> I’m still trying to understand the code better and figure out a cleaner way of doing this. Any ideas?

#31968 should help.

markshannon · 2022-03-18T17:29:36Z

I think it is OK to sort out the peculiarities of gen.throw() after this PR is merged.

bedevere-bot · 2022-03-18T21:30:17Z

🤖 New build scheduled with the buildbot fleet by @brandtbucher for commit c8054b9 🤖

If you want to schedule another build, you need to add the "🔨 test-with-buildbots" label again.

markshannon · 2022-03-21T11:02:55Z

Buildbot failures are all pre-existing failures and unrelated to this PR.

* Moves the bytecode to the end of the corresponding PyCodeObject, and quickens it in-place.
* Removes the almost-always-unused co_varnames, co_freevars, and co_cellvars member caches
* _PyOpcode_Deopt is a new mapping from all opcodes to their un-quickened forms.
* _PyOpcode_InlineCacheEntries is renamed to _PyOpcode_Caches
* _Py_IncrementCountAndMaybeQuicken is renamed to _PyCode_Warmup
* _Py_Quicken is renamed to _PyCode_Quicken
* _co_quickened is renamed to _co_code_adaptive (and is now a read-only memoryview).
* Do not emit unused nonzero opargs anymore in the compiler.

brandtbucher added 26 commits Mar 10, 2022

Move bytecode into the code object

6ca0d42

Clean things up a bit

a77a124

Bump the magic number

975b8d1

co_bytecode -> _co_code

40ddf39

Generate specialization table

bfcba6d

Clean things up a bit

0376822

Pack code objects more efficiently

3e77b8d

Fix typo

0a598a7

More cleanup

2fda3b8

Try a different approach

42810dd

Clean up the diff

7df4934

Support equality comparisons again

5fa0ca2

Never un-quicken!

1fc2282

More renaming and cleanup

b40e300

Revert marshal format changes

af27670

More cleanup

629bf8b

Clean up the diff

59cda59

Catch up with main

73c33c1

Miscellaneous cleanup

ecfb193

Remove outdated comment

824b2da

Properly skip over EXTENDED_ARG instructions

8164f41

Make sure that f_lasti is always valid

932a3f2

Add some comments

c0c5498

Catch up with main

f62a395

Check opargs during size calculations

e7464a3

Add another TODO

4f51fdd

brandtbucher added performance DO-NOT-MERGE labels Mar 15, 2022

bedevere-bot added the awaiting core review label Mar 15, 2022

the-knights-who-say-ni added the CLA signed label Mar 15, 2022

brandtbucher added 2 commits Mar 16, 2022

Naming is hard

e70819f

Catch up with main

001eb53

markshannon mentioned this pull request Mar 16, 2022

Lazily create code object co_code attribute. faster-cpython/ideas#85

Closed

brandtbucher requested a review from gvanrossum Mar 16, 2022

brandtbucher added 2 commits Mar 16, 2022

make patchcheck

6b96204

blurb add

3087025

gvanrossum reviewed Mar 16, 2022

Tools/scripts/deepfreeze.py (outdated, resolved)

Reuse the PyCodeObject definition for deepfreeze

6f3bc38

gvanrossum reviewed Mar 17, 2022

Include/cpython/code.h (resolved)

Clean up TODO

c8054b9

brandtbucher marked this pull request as ready for review Mar 18, 2022

brandtbucher requested review from tiran and 1st1 as code owners Mar 18, 2022

brandtbucher added 🔨 test-with-buildbots and removed DO-NOT-MERGE labels Mar 18, 2022

bedevere-bot removed the 🔨 test-with-buildbots label Mar 18, 2022

markshannon reviewed Mar 21, 2022

Objects/codeobject.c (resolved)

markshannon reviewed Mar 21, 2022

Objects/codeobject.c (resolved)

markshannon merged commit 2bde682 into python:main Mar 21, 2022
83 of 87 checks passed

bedevere-bot removed the awaiting core review label Mar 21, 2022

brandtbucher mentioned this pull request Apr 10, 2022

Inline bytecode caches #90997

Open
