New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
GH-99554: marshal
bytecode more efficiently
#99555
base: main
Are you sure you want to change the base?
Conversation
Hm, that's annoying. It appears that I might need to find a |
Python/marshal.c
Outdated
assert(0x00 <= oparg && oparg < 0x100); | ||
buffer[i++] = _Py_MAKECODEUNIT(opcode, oparg); | ||
for (int j = 0; j < _PyOpcode_Caches[opcode]; j++) { | ||
buffer[i++] = _Py_MAKECODEUNIT(CACHE, oparg); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe assert that i < size
here, since there is potential for a buffer overrun here.
TBH, I'm a bit torn. This looks brilliant, but there's one thing that worries me -- unless you have the correct table of inline cache entries and know the value of HAVE_ARGUMENT
you can't even skip the bytecode, which means you can't do anything else with the code object (e.g. looking at co_consts
or co_names
or even co_firstlineno
).
That would be easier if the size you wrote was the number of bytes written instead of the number of code units including cache; but then r_bytecode()
would have to make a guess at the size of the bytes
object it is allocating and perhaps reallocate it later, which isn't ideal either.
How would you feel about writing both numbers to the file? That wastes 4 bytes per code object, but makes decoding more robust.
FWIW: Here's a wild scheme for avoiding the extra copy of the bytecode (which would presumably save a little time and memory on all imports). Once you have read the size of the bytecode you could just allocate a blank PyCodeObject
object of the right size, and then you could deposit the expanded bytecode directly into that. Then the struct _PyCodeConstructor
-based API would have to allow a field where you pass in that object and it would just initialize the rest based on the various given fields and return it (or dealloc it if there's another error). A little fiddly and not something for this PR, but given how heroic this all already sounds, perhaps worth the effort in the future.
bytecode = bytearray() | ||
while len(bytecode) < nbytes: | ||
opcode_byte = self.r_byte() | ||
if opcode.HAVE_ARGUMENT <= opcode_byte: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks weird. I'm so used to if x >= N
that seeing if N <= x
just feels wrong.
if opcode.HAVE_ARGUMENT <= opcode_byte: | |
if opcode_byte >= opcode.HAVE_ARGUMENT: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
const == variable
is a C-ism to prevent accidentally writing =
instead of ==
. No need in Python.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have a weird habit where I always use <
and <=
for comparisons, regardless of the placement of constants, etc. I guess I like the parallel with chained comparisons like lo <= x < hi
, which almost always use <
and <=
.
I'll change it, though.
@@ -17,13 +17,13 @@ | |||
from typing import Dict, FrozenSet, TextIO, Tuple | |||
|
|||
import umarshal | |||
import opcode_for_build as opcode |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This isn't a normal import so it shouldn't look like one. Maybe something like opcode = opcode_finder.get_opcodes()
bytecode = bytearray() | ||
while len(bytecode) < nbytes: | ||
opcode_byte = self.r_byte() | ||
if opcode.HAVE_ARGUMENT <= opcode_byte: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
const == variable
is a C-ism to prevent accidentally writing =
instead of ==
. No need in Python.
buffer[i++] = _Py_MAKECODEUNIT(opcode, oparg); | ||
buffer[i].opcode = opcode; | ||
buffer[i++].oparg = oparg; | ||
for (int j = 0; j < _PyOpcode_Caches[opcode]; j++) { | ||
buffer[i++] = _Py_MAKECODEUNIT(CACHE, oparg); | ||
buffer[i].opcode = CACHE; | ||
buffer[i++].oparg = 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Honestly I'd rather see the i++
on separate lines instead of this hybrid approach.
This shrinks the size of
.pyc
files by about 25%. It also removes an intermediate copy of the bytecode created during marshalling.Next steps will be removing the intermediate
bytes
object created during unmarshalling and performing the work of_PyCode_Quicken
as part of this same move..pyc
files are larger than they need to be #99554