New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
gh-98003: Inline call frames for CALL_FUNCTION_EX #98004
base: main
Are you sure you want to change the base?
gh-98003: Inline call frames for CALL_FUNCTION_EX #98004
Conversation
Fidget-Spinner
commented
Oct 6, 2022
•
edited by bedevere-bot
edited by bedevere-bot
- Issue: Inline call frames for CALL_FUNCTION_EX #98003
Looks like not much difference:
Perhaps we should try inlining all |
@markshannon gentle ping for a review please. |
The approach I would prefer is to:
There is a lot unnecessary dismantling of tuples and dicts in the current approach. This is not a problem with this PR, but I think it needs to be sorted out before "inlining" the call in Ideally
Where |
I'm confused. The first step is to create a frame, but then in your example code you show it creating another frame again. Why does it do that? |
Because we don't really create a frame. We push a frame and then initialize it. Pushing a frame is simple. It is initializing that is complex and we want to factor out. |
Just curious, has this stalled out for now? At the very least, the changes in Totally get it if you're busy with school or something right now though! :) Just making sure that you aren't blocked on reviews or design discussions. If so, maybe we can move this forward (I'd really like to see it in)! |
Yeah my exams end next week. I plan to start contributing actively again after then! |
@brandtbucher and @markshannon I think this is ready for your reviews (when you have time of course). |
This approach creates a lot of intermediate objects.
I'm not sure if that is a result of the transformations in this PR, or if the original code also did created those intermediate objects.
If the objects were already being created, then maybe we should streamline it in another PR.
@Fidget-Spinner what do you think?
int code_flags = ((PyCodeObject *)PyFunction_GET_CODE(func))->co_flags; | ||
PyObject *locals = code_flags & CO_OPTIMIZED ? NULL : Py_NewRef(PyFunction_GET_GLOBALS(func)); | ||
|
||
_PyInterpreterFrame *new_frame = _PyEvalFramePushAndInit_Ex(tstate, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it would make sense to move the unpacking of the tuple into _PyEvalFramePushAndInit_Ex
.
_PyInterpreterFrame *new_frame = _PyEvalFramePushAndInit_Ex(tstate, func, locals, callargs, kwargs).
CALL_FUNCTION_EX
is quite a large instruction. Pushing more code into _PyEvalFramePushAndInit_Ex
would keep it smaller.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unpacking which tuple?
If you mean move the code above in, that would break convention with the rest of the code base. The other inline calls do these
int code_flags = ((PyCodeObject *)PyFunction_GET_CODE(func))->co_flags;
PyObject *locals = code_flags & CO_OPTIMIZED ? NULL : Py_NewRef(PyFunction_GET_GLOBALS(func));
before creating the new frame.
Py_INCREF(PyTuple_GET_ITEM(callargs, i)); | ||
} | ||
} | ||
_PyInterpreterFrame *new_frame = _PyEvalFramePushAndInit( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Rather than transforming callargs
and kwargs
into a shape suitable for _PyEvalFramePushAndInit
, it might make sense to push the frame, then unpack the tuple and dict into the the frame without the intermediate objects.
It complicates the errror handling a bit, and the frame will need to be cleaned up if there is an error.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can work on this in another PR.
Thanks for this PR. Avoiding C stack consumption for these calls is definitely something we want. |
Yes these are the semantics of the original CALL_FUNCTION_EX. Most of these were copy pasted. I was surprised at how allocation heavy CALL_FUNCTION_EX is and have been thinking of ways to improve it. |