bpo-45256: Remove the usage of the C stack in Python to Python calls #28488
Conversation
If you want to schedule another build, you need to add the " |
@@ -31,6 +31,7 @@ typedef struct _interpreter_frame { | |||
int f_lasti; /* Last instruction if called */ | |||
int stacktop; /* Offset of TOS from localsplus */ | |||
PyFrameState f_state; /* What state the frame is in */ | |||
int depth; /* Depth of the frame in a ceval loop */ |
The depth
is on the frame object and not on the cframe
because this is much easier to fetch from gdb and other debugging tools.
Why is tstate->cframe->depth
any harder than tstate->frame->depth
?
Because is not tstate->cframe->depth
or tstate->frame->depth
what gdb fetches. gdb
gets the frame from a local variable from the stack so you just start with the frame pointer. In general tools fetch the frame because that is what they are interested into. If we place it in the cframe now we need a way to correlate the frames with the number, and that relationship is one to many, which is more difficult to maintain correctly. Gdb also constructs the frame from a pointer, but if we now need the cframe
this forces it to also take it as an argument, and that is not always available in the places where frames are constructed, like when you walk the list backwards or forwards, creating new objects.
Also, if you place it in tstate->cframe->depth
now you need to fetch both (cframe
and frame
) and match them accordingly, which is much more complicated that it needs to be, and this also forces the tool to keep the cframe
struct up to date (when not using gdb, but when using DWARF or the offsets in the struct).
Looking good. A few minor issues.
static InterpreterFrame* | ||
_PyEval_FrameFromPyFunctionAndArgs(PyThreadState *tstate, PyObject* const *args, int nargs, PyObject *function) { | ||
assert(PyFunction_Check(function)); | ||
size_t nargsf = nargs | PY_VECTORCALL_ARGUMENTS_OFFSET; |
nargsf
and vector_nargs
below are only used in asserts.
I think all the asserts and assignments to nargsf
and vector_nargs
could be replaced with assert(nargs == 0 || args != NULL);
@@ -31,6 +31,7 @@ typedef struct _interpreter_frame { | |||
int f_lasti; /* Last instruction if called */ | |||
int stacktop; /* Offset of TOS from localsplus */ | |||
PyFrameState f_state; /* What state the frame is in */ | |||
int depth; /* Depth of the frame in a ceval loop */ |
This is not really a property of the frame, but its place in the call stack. A generator frame could have different depths during its lifetime.
It also takes space and time on the fast path of calls.
Could you move the depth back to the CFrame
?
No, check this comment:
This is here because using this information from debugging tools is much much easier this way. While I was fixing the gdb stuff, it became clear that having it into the cframe generates a considerable amount of complexity.
I profiled this and it has no measurable impact on runtime performance.
That's probably more a comment on our ability to measure small changes
We can always change it later if it is slower.
What I'm more worried about is the redundancy and bugs due to out of date depth information on frames that outlive the call.
What we could do is, instead of having a depth
value, have an entry
flag.
On entry to _PyEval_EvalFrameDefault
, set the entry
flag, and unset it on exit.
I think that would be safer, and no slower.
What I'm more worried about is the redundancy and bugs due to out of date depth information on frames that outlive the call.
As in generators or coroutines? Those will always have depth=0
At the moment sure, but we will want to flatten next/yield
in the same way as we are doing with call/return
here.
but we will want to flatten
next/yield
in the same way as we are doing withcall/return
here.
But then it will be easy: on the exit path set it to -1
or 0
.
@@ -5309,7 +5382,7 @@ get_exception_handler(PyCodeObject *code, int index, int *level, int *handler, i | |||
static int | |||
initialize_locals(PyThreadState *tstate, PyFrameConstructor *con, | |||
PyObject **localsplus, PyObject *const *args, | |||
Py_ssize_t argcount, PyObject *kwnames) | |||
Py_ssize_t argcount, PyObject *kwnames, int steal_args) |
When do we not steal references? Would it be cleaner to have this function always steal the references, and increment them in the caller when necessary?
The cleanup would be more complex, see #28488 (comment)
I don't agree.
Cleanup is simpler if initialize_locals
is consistent and always steals the references to args
regardless of whether it succeeds or fails.
initialize_locals
(or any other function for that matter) should leave things in an consistent state.
But then what do you suggest? That initialize_locals
always steals and _PyEvalFramePushAndInit
is the one that increases or decreases the refcount?
I'm suggesting that both functions have consistent behavior w.r.t. reference counts regardless of whether they succeed or fail.
Otherwise they are just traps for the unwary.
I think it will make the code more efficient, and probably cleaner, if you make both steal the references to args
.
I am not fully convinced of what you want to achieve here. Who is in charge of deciding if the caller owns back the references? The caller of _PyEvalFramePushAndInit
?
Also, the code to retake ownership will be a bit complex because part of what we steal is not flattened in frame->localsplus
but sometimes in a dictionary on frame->localsplus
(in case of the variables passed by keywords).
So what you propose forces us to increment the reference count on the frame->localsplus
array + handle the keyword dict and iterate over that using the dict API, which is not optimal.
Also, this function doesn't leave anything "inconsistent". The contract is easy:
On failure, the caller always owns all references to the arguments. On success, the caller owns them if steal is false.
No other function has such a contract. They either steal all the references, or they steal none of them.
Anything more complicated than that is too easy to misuse.
These functions will get more use as we extend this to CALL_FUNCTION_KW
, CALL_METHOD
and CALL_METHOD_KW
, so they need an easy to use interface.
Ok, I trust your judgement but I would like to discuss this together so we are on the same page because I still have some concerns. Let's schedule a call.
May I add my 2 cent? I have been bitten very much with such inconsistencies, a few (20) years ago. Consistency with simplicity is a very good concept.
if (initialize_locals(tstate, con, localsarray, args, argcount, kwnames, steal_args)) { | ||
if (steal_args) { | ||
for (int i = 0; i < frame->stacktop; i++) { | ||
Py_XINCREF(frame->localsplus[i]); |
Why is this necessary?
So if the function fails the caller still owns all references. This simplifies a lot of failure scenarios. For instance, imagine that yo have a function that takes 5 parameters but you pass 10. In this case, the function will return NULL but it would have only stolen 5 parameters, while the caller is still in charge of decrementing the other 5. But the problem is that the caller needs to know if this was the problem or something else (because the error could have been something different). THis way, on failure, the caller always have to decrement the references, which is very easy to do in the normal paths
When you say "the function" do you mean _PyEvalFramePushAndInit
or initialize_locals
?
_PyEvalFramePushAndInit
with steal = 1
But the argument can be extended to initialize_locals
because _PyEvalFramePushAndInit
doesn't know what went wrong if it returns NULL
When you're done making the requested changes, leave the comment: |
@markshannon Damn, unfortunately something is going on with Windows:
Apparently that is a |
@markshannon Oh, I found the source of the Windows failure: it was an existing bug in the That was challenging to find indeed! |
If you want to schedule another build, you need to add the " |
This is really cool. I just have a few comments on readability. No idea why ASAN is failing tbh.
One other concern I have is that this slows down non-python function calls very slightly (with the two additional checks). I highly doubt it's measurable on pyperformance though.
Python/ceval.c
Outdated
Py_DECREF(function); | ||
_PyFrame_SetStackPointer(frame, stack_pointer); | ||
new_frame->depth = frame->depth + 1; | ||
tstate->frame = frame = new_frame; |
Note to self, no need to pop frame here because it's done on eval frame exit at exit_eval_frame
.
I cannot reproduce the ASAN bug. :( |
There're refleaks but I can't pinpoint where :(
|
Ths commit inlines calls to Python functions in the eval loop and steals all the arguments in the call from the caller for performance.
Updated |
The refleaks are on the main branch, opened: #28493 |
Co-authored-by: Ken Jin <28750310+Fidget-Spinner@users.noreply.github.com> Co-authored-by: Dennis Sweeney <36520290+sweeneyde@users.noreply.github.com>
|
Drop test for tracing and inline coroutine creation
if ((code->co_flags & (CO_GENERATOR | CO_COROUTINE | CO_ASYNC_GENERATOR)) == 0) { | ||
InterpreterFrame *new_frame = _PyEval_FrameFromPyFunctionAndArgs(tstate, stack_pointer+1, oparg, function); | ||
if (new_frame == NULL) { | ||
// When we exit here, we own all variables in the stack (the frame creation has not stolen |
Wait, since you called STACK_SHRINK(oparg+1)
above, don't you need to Py_DECREF
all the function arguments and function objects too? I don't think error
will decref them automatically for you since it looks from TOS down.
That's true (and my fault).
We need to make functions that steal references, always steal those references.
Otherwise this sort of error is just too easy to make.
Co-authored-by: Brett Cannon <brett@python.org>
https://bugs.python.org/issue45256
The text was updated successfully, but these errors were encountered: