New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[3.11] gh-93382: Cache result of PyCode_GetCode
in codeobject (GH-93383)
#93493
base: 3.11
Are you sure you want to change the base?
[3.11] gh-93382: Cache result of PyCode_GetCode
in codeobject (GH-93383)
#93493
Conversation
…nGH-93383) Co-authored-by: Kumar Aditya <59607654+kumaraditya303@users.noreply.github.com> Co-authored-by: Dennis Sweeney <36520290+sweeneyde@users.noreply.github.com>
PyCode_GetCode
in codeobject (GH-93383)PyCode_GetCode
in codeobject (GH-93383)
@pablogsal and @nedbat This PR improves coverage performance on 3.11 by 5-7%. Using Ned's benchmarks
An observation: bm_spectral_norm improved less than bm_sudoku because the size of its |
Unfortunately this means that whatever is making Python 3.11 slower with coverage, unfortunately is not only this :( Great investigation @Fidget-Spinner and thanks for working on this I will try to review ASAP but it would be great if @markshannon @iritkatriel @brandtbucher or @ericsnowcurrently can take a look. |
I'm on mobile right now (probably can't get to a computer today), but here are a few thoughts based on a first look:
@@ -88,6 +88,7 @@ typedef uint16_t _Py_CODEUNIT; | |||
PyObject *co_qualname; /* unicode (qualname, for reference) */ \ | |||
PyObject *co_linetable; /* bytes object that holds location info */ \ | |||
PyObject *co_weakreflist; /* to support weakrefs to code objects */ \ | |||
void *_co_code; /* cached co_code object/attribute */ \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not PyObject *
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same reason as why co_extra
is void *
:
/*
Type is a void* to keep the format private in codeobject.c to force \
people to go through the proper APIs. */
I also didn't want people thinking they can just replace their current co_code
accesses with _co_code
. So this should force them to use PyCode_GetCode
or PyObject_GetAttrString(co, "co_code")
Objects/codeobject.c
Outdated
PyObject *code = PyBytes_FromStringAndSize((const char *)_PyCode_CODE(co), | ||
_PyCode_NBYTES(co)); | ||
if (code == NULL) { | ||
return NULL; | ||
} | ||
deopt_code((_Py_CODEUNIT *)PyBytes_AS_STRING(code), Py_SIZE(co)); | ||
assert(co->_co_code == NULL); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure this is correct... could the allocation above have triggered a garbage collection that could have ended up populating this member? I'd be more confident just removing this assert and using Py_XSETREF
or something instead.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Frankly, I'm not sure how that would work. But I see your point.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
bytes
objects are not gc tracked (no PyGC_Head
) so its allocation cannot trigger a gc collection as it uses PyObject_Malloc
not PyObject_GC_New
.
Also this PR is a backport so it would be better to not deviate it from the 3.12 PR else both should be changed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The backport part isn't an issue because I can just fix it on the main branch.
Anyways, I'll wait for Brandt's reply to see what he says.
I also ran the benchmarks using #93493:
It's a slight improvement, but isn't solving the problem. |
Yeah I'm aware. I sent this PR in because 10% improvement on macrobenchmarks is still something! Even cProfile slowed down by 60% when profiling code in 3.11. My hunch is that accessing the full |
(cherry picked from commit d52ffc1)