-
-
Notifications
You must be signed in to change notification settings - Fork 32.2k
bpo-36839: Support the buffer protocol in code objects #13177
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Objects/frameobject.c
Outdated
@@ -316,10 +324,17 @@ frame_setlineno(PyFrameObject *f, PyObject* p_new_lineno, void *Py_UNUSED(ignore | |||
delta--; | |||
} | |||
|
|||
if (view.obj != NULL) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: The latest C code style recommends using {}
always, even for single line clauses.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall, LGTM. I've left a number of comments about things that seemed to be missing.
I'm going to approve the PR, but with the understanding that you will address them (or dismiss them as misunderstandings). :)
Lib/test/test_code.py
Outdated
|
||
code = f.__code__ | ||
ct = type(f.__code__) | ||
c = ct(code.co_argcount, code.co_posonlyargcount, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
much respect! :)
Lib/test/test_code.py
Outdated
ct = type(f.__code__) | ||
c = ct(code.co_argcount, code.co_posonlyargcount, | ||
code.co_kwonlyargcount, code.co_nlocals, code.co_stacksize, | ||
code.co_flags, array.array('B', code.co_code), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FWIW, inlining the array hides it. I knew to look for it due to the context of this PR. So it would be worth making a separate variable for it above.
Lib/test/test_dis.py
Outdated
except ImportError: | ||
return | ||
from tempfile import NamedTemporaryFile | ||
with NamedTemporaryFile('wb', suffix='.code') as f: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FWIW, "f" as a name is a bit ambiguous in this file. :) However, it isn't confusing here.
Objects/codeobject.c
Outdated
code_check_buffer(PyObject *code) { | ||
Py_buffer codebuffer = {}; | ||
if (!PyBytes_Check(code)) { | ||
if (PyObject_GetBuffer(code, &codebuffer, PyBUF_SIMPLE)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
missing braces (see PEP 7)
@@ -367,7 +382,8 @@ code_new(PyTypeObject *type, PyObject *args, PyObject *kw) | |||
int firstlineno; | |||
PyObject *lnotab; | |||
|
|||
if (!PyArg_ParseTuple(args, "iiiiiiSO!O!O!UUiS|O!O!:code", | |||
|
|||
if (!PyArg_ParseTuple(args, "iiiiiiOO!O!O!UUiS|O!O!:code", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just to be sure I understood, you only changed the first "S" to "O" and nothing else, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Correct, I wish github had some inline highlighting here
code = (unsigned char *)view.buf; | ||
code_len = view.len; | ||
} else { | ||
return -1; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You need to set an error here, no?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Which test is checking the cases you've added here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should really be an unreachable condition - the construction of code objects checks and makes sure we have a reasonable buffer. It should definitely be setting an error, I'll see if I can poke around and produce an error through extra evil measures.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, another possible way to hit this would be in low memory conditions, but there's not great infrastructure to test that either.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And this one seems to be generally unreachable by a well-behaved buffer protocol object.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I usually add an assert(condition)
to the code in these cases just so it shows up if we ever manage to trigger such a condition. (Our non-release builds at work for unittests use an assertion enabled interpreter - so it isn't just the python.org buildbots that would notice)
Objects/frameobject.c
Outdated
@@ -199,15 +199,23 @@ frame_setlineno(PyFrameObject *f, PyObject* p_new_lineno, void *Py_UNUSED(ignore | |||
} | |||
|
|||
/* We're now ready to look at the bytecode. */ | |||
PyBytes_AsStringAndSize(f->f_code->co_code, (char **)&code, &code_len); | |||
Py_buffer view = {}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it make sense to factor this out to something like a unsigned char * _PyCode_GetBytes(PyObject *)
helper? We could use it here and in genobject.c, etc.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let me think about this... I think the sig needs to actually be unsigned char* _PyCode_GetBytes(PyObject *, Py_ssize_t *len, Py_buffer *view) which I feel like is getting complicated enough that it's not going to save much. But let me see if it actually helps much.
assert(_Py_IS_ALIGNED(code_view.buf, sizeof(_Py_CODEUNIT))); | ||
first_instr = (_Py_CODEUNIT *) code_view.buf; | ||
} else { | ||
goto exit_eval_frame; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As before, don't we need to set an error here? Where are the two new code paths tested?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we even need this case? In the original code this was covered by an assert... I'm not opposed to an exception instead of an insert though if you think this give us a better experience. :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should handle the case where PyObject_GetBuffer fails as I think it could require allocating, so will definitely add the missing error.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And I was able to hit this case, by closing the memory mapped file and then re-running the function.
assert(PyBytes_GET_SIZE(co->co_code) % sizeof(_Py_CODEUNIT) == 0); | ||
assert(_Py_IS_ALIGNED(PyBytes_AS_STRING(co->co_code), sizeof(_Py_CODEUNIT))); | ||
first_instr = (_Py_CODEUNIT *) PyBytes_AS_STRING(co->co_code); | ||
} else if (!PyObject_GetBuffer(co->co_code, &code_view, PyBUF_SIMPLE)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I realize this isn't your fault, but I keep getting thrown off by the return value of PyObject_GetBuffer()
. :)
Objects/genobject.c
Outdated
@@ -345,7 +354,11 @@ _PyGen_yf(PyGenObject *gen) | |||
return NULL; | |||
} | |||
|
|||
if (code[f->f_lasti + sizeof(_Py_CODEUNIT)] != YIELD_FROM) | |||
char opcode = code[f->f_lasti + sizeof(_Py_CODEUNIT)]; | |||
if (view.obj != NULL) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also needs braces
Objects/codeobject.c
Outdated
if (PyObject_GetBuffer(code, &codebuffer, PyBUF_SIMPLE)) | ||
return 0; | ||
|
||
int contiguous = PyBuffer_IsContiguous(&codebuffer, 'C'); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This doesn't seem right.
The buffer should be read-only and one dimensional, not just contiguous.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well tighten this up to:
codebuffer.readonly &&
codebuffer.ndim == 1 &&
codebuffer.strides == NULL;
I think that should satisfy the readonly check because per the documentation it needs to be consistent for all consumers: https://docs.python.org/3/c-api/buffer.html#c.PyBUF_WRITABLE
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For a well-behaved exporter, PyBUF_SIMPLE
implies C
, ndim==1
, format "B"
and readonly
.
I know the code base has these (redundant) checks in other places, but at some point we may want to use assert()
. I'm not suggesting to eliminate the extra safety check here and now, just clarifying.
Hello, and thanks for your contribution! I'm a bot set up to make sure that the project can legally accept your contribution by verifying you have signed the PSF contributor agreement (CLA). Unfortunately we couldn't find an account corresponding to your GitHub username on bugs.python.org (b.p.o) to verify you have signed the CLA (this might be simply due to a missing "GitHub Name" entry in your b.p.o account settings). This is necessary for legal reasons before we can look at your contribution. Please follow the steps outlined in the CPython devguide to rectify this issue. You can check yourself to see if the CLA has been received. Thanks again for your contribution, we look forward to reviewing it! |
1788c75
to
3a93c27
Compare
4e817e0
to
6b4380b
Compare
d33b039
to
f7c2e36
Compare
object supporting the buffer protocol. This allows environments which are heavily based on fork and exec to share memory between code objects rather than having it end up on pages which get copy-on-written
Hi @DinoV Do you want to fix the conflicts? Thanks |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please don't merge this until the benefit is proofed. (see b.p.o)
Currently, the idea is just armchair philosophy.
When you're done making the requested changes, leave the comment: |
bpo-36839: Support the buffer protocol in code objects
Adds support for code objects to be backed by any objects which implement the buffer protocol. This relatively small core interpreter change makes it possible to build importer/loaders which can store their code objects in memory mapped files. When used in an environment with fork/exec'd processes this can allow the memory to be successfully shared between all of the processes.
https://bugs.python.org/issue36839