Implement dataclass code caching
#92650
Draft
+106
−65
This is a minimal working implementation of "code-caching" for `dataclasses`. It's heavily inspired by https://github.com/dabeaz/dataklasses, and works by reusing generated code objects for dataclasses that differ only in the names of their fields. "Template" code objects are lazily created with placeholder values (`__field_0__`, `__field_1__`) that are patched at method generation time using their `replace` method. Annotations and default arguments for `__init__` methods are assigned manually as well.

I thought I would stop here and gather feedback/review before going further. A bit more information:
- For microbenchmarks on "simple" dataclasses with 1-10 elements and no "special" fields, this branch results in 2x-3x faster class generation. The `test_dataclasses` suite, which contains lots of examples of advanced use-cases and actually does some real work with them, runs about 40% faster vs. `main`.
- I've also included some counters for measuring cache stats. These indicate that when running `test_dataclasses`, 1,428 methods are generated, but only 112 don't have suitable templates in the code cache yet and need to be constructed using `exec`. So even for the wide range of dataclasses present in this program, we're still able to maintain a hit rate above 90% (1,316 of 1,428, roughly 92%). `__init__` methods are, predictably, the source of most of the misses.