Implement `dataclass` code caching #92650

brandtbucher · 2022-05-11T00:14:59Z

This is a minimal working implementation of "code-caching" for dataclasses. It's heavily inspired by https://github.com/dabeaz/dataklasses, and works by reusing generated code objects for dataclasses that differ only in the names of their fields. "Template" code objects are lazily created with placeholder names (`__field_0__`, `__field_1__`) that are patched at method-generation time using the code object's `replace` method. Annotations and default arguments for `__init__` methods are assigned manually as well.

I thought I would stop here and gather feedback/review before going further. A bit more information:

For microbenchmarks on "simple" dataclasses with 1-10 fields and no "special" fields, this branch generates classes 2x-3x faster. The `test_dataclasses` suite, which contains lots of examples of advanced use-cases and actually does some real work with them, runs about 40% faster than on main.

I've also included some counters for measuring cache stats. These indicate that when running `test_dataclasses`, 1,428 methods are generated, but only 112 lack a suitable template in the code cache and need to be constructed using `exec`. So even for the wide range of dataclasses present in this test suite, we're still able to maintain a hit rate above 90% (`__init__` methods are, predictably, the source of most of the misses).
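As a rough illustration of how such counters could be kept and how the reported figures translate into a hit rate, here is a small sketch. The names `stats`, `get_template`, and `hit_rate` are hypothetical and are not the counters used in this branch.

```python
from collections import Counter

stats = Counter()  # hypothetical hit/miss counters
_cache = {}        # hypothetical template cache


def get_template(key, build):
    """Return the cached template for key, building and counting on a miss."""
    if key in _cache:
        stats["hits"] += 1
    else:
        stats["misses"] += 1
        _cache[key] = build()
    return _cache[key]


def hit_rate(generated, misses):
    """Fraction of generated methods served from the cache."""
    return (generated - misses) / generated


# Figures reported above for test_dataclasses: 1,428 generated, 112 misses.
reported = hit_rate(generated=1428, misses=112)
print(f"{reported:.1%}")
```

With the numbers quoted above, this works out to roughly 92%, consistent with the "above 90%" claim.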

gpshead · 2022-05-14T17:25:34Z

Could this be sped up further by shipping dataclasses.py with a pre-seeded code cache, built from inlined code that would already be part of the .pyc file, thus avoiding runtime calls to `exec()` entirely for dataclasses that match the pre-seeded shapes?

brandtbucher added 8 commits May 10, 2022

Initial implementation of code-caching

1f7ffa8

More cleanup

17dd0d0

More more cleanup

7096466

Smaller diff

52113c5

More diff cleanup

62d0c87

Revert unnecessary changes

af64d69

Remove unnecessary code

e377393

Simplify tuple patching

9d01e5c

brandtbucher added performance stdlib 3.12 labels May 11, 2022

brandtbucher requested a review from ericvsmith May 11, 2022

bedevere-bot added the awaiting core review label May 11, 2022

brandtbucher added 2 commits May 11, 2022

Further improve tuple patching

c9acd33

Simplify cache keying

82f1c75
