bpo-45256: Avoid C calls for more Python to Python calls. #28937
Conversation
Did anyone discover what caused the slight slowdown in the initial PR's benchmark? |
…onsume the argument references regardless of whether they succeed or fail.
Maybe the additional tests for consuming references? |
If you want to schedule another build, you need to add the " |
If you want to schedule another build, you need to add the " |
@@ -5480,7 +5459,7 @@ initialize_locals(PyThreadState *tstate, PyFrameConstructor *con, | |||
_PyErr_Format(tstate, PyExc_TypeError, | |||
"%U() keywords must be strings", | |||
con->fc_qualname); | |||
goto fail; | |||
goto kw_fail; |
I find this a bit confusing. How this is this cleaning the positional arguments in the case where some positional args have been copied and we fail to copy some of the keywords?
All the positional arguments have been copied at this point.
Yeah exactly, so if the call fails you need to increment those references so the cleanup succeeds, no? This is because if the references have been stolen, we own the references back to the positional arguments
If steal_args
is true
, then we always consume the references. #28937 (comment)
By this point the references to all positional arguments have been consumed, as have all references of kwargs up to i
(exclusive).
kw_fail
consumes the references to kwargs from i
(inclusive) to kwcount
(exclusive) so it can them jump to fail_late
as all references will have been consumed.
Ah, wait a minute, I misread what we are doing! We are consuming references now, while before we were increasing them so the stack cleans them.
Ah, we are avoiding the double free by shrinking the stack here, no?
STACK_SHRINK(stackadj);
// The frame has stolen all the arguments from the stack,
// so there is no need to clean them up.
Py_XDECREF(kwnames);
Py_DECREF(function);
@@ -1702,6 +1702,11 @@ _PyEval_EvalFrameDefault(PyThreadState *tstate, InterpreterFrame *frame, int thr | |||
switch (opcode) { | |||
#endif | |||
|
|||
/* Variables used for making calls */ | |||
PyObject *kwnames; |
Could we put this into a struct instead of having more locals? It makes it a bit more clean to read and contextualize and it will have the same performance as long as the struct is stack allocated
Do we want to rely on all compilers being able to do perfect escape analysis here?
I'd rather leave these as local variables so the compiler can easily allocate them to registers?
There is no way that given the size of this function this is going to end in registers. Indeed: I checked with GCC, clang, ICC and xlc in all different optimization level and not a single one places these locals on registers.
Up to you anyway, I don't feel super strongly about it but I think it helps with clarity and organization.
How did you determine which of these were in registers? Once in SSA form, there are 15 (I think) variables here.
gdb/dbx + breakpoint for this function + disas
+ info registers
. I checked if any of the values in these locals are stored or partially stored in registers.
I only checked x86-64 thought.
I left some minor comments, otherwise LGTM.
Great work! I have not checked for refleaks, will do this afterwards but I also scheduled a buildbot run
I ran the buildbots earlier. All good except the Gentoo ones failing for tkinter stuff. |
Extends the approach used for
CALL_FUNCTION
toCALL_FUNCTION_KW
,CALL_METHOD
andCALL_METHOD_KW
.Also modifies
initialize_locals
and_PyTuple_FromArraySteal
to have the same behavior w.r.t. reference counting regardless of whether they succeed or fail.https://bugs.python.org/issue45256
The text was updated successfully, but these errors were encountered: