Skip to content

bpo-36839: Support the buffer protocol in code objects #13177

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 3 commits into from

Conversation

DinoV
Copy link
Contributor

@DinoV DinoV commented May 7, 2019

bpo-36839: Support the buffer protocol in code objects

Adds support for code objects to be backed by any objects which implement the buffer protocol. This relatively small core interpreter change makes it possible to build importer/loaders which can store their code objects in memory mapped files. When used in an environment with fork/exec'd processes this can allow the memory to be successfully shared between all of the processes.

https://bugs.python.org/issue36839

@DinoV DinoV requested a review from 1st1 as a code owner May 7, 2019 20:56
@DinoV DinoV changed the title Support the buffer protocol in code objects Support the buffer protocol in code objects bpo-36839 May 7, 2019
@@ -316,10 +324,17 @@ frame_setlineno(PyFrameObject *f, PyObject* p_new_lineno, void *Py_UNUSED(ignore
delta--;
}

if (view.obj != NULL)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: The latest C code style recommends using {} always, even for single line clauses.

Copy link
Member

@ericsnowcurrently ericsnowcurrently left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall, LGTM. I've left a number of comments about things that seemed to be missing.

I'm going to approve the PR, but with the understanding that you will address them (or dismiss them as misunderstandings). :)


code = f.__code__
ct = type(f.__code__)
c = ct(code.co_argcount, code.co_posonlyargcount,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

much respect! :)

ct = type(f.__code__)
c = ct(code.co_argcount, code.co_posonlyargcount,
code.co_kwonlyargcount, code.co_nlocals, code.co_stacksize,
code.co_flags, array.array('B', code.co_code),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FWIW, inlining the array hides it. I knew to look for it due to the context of this PR. So it would be worth making a separate variable for it above.

except ImportError:
return
from tempfile import NamedTemporaryFile
with NamedTemporaryFile('wb', suffix='.code') as f:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FWIW, "f" as a name is a bit ambiguous in this file. :) However, it isn't confusing here.

code_check_buffer(PyObject *code) {
Py_buffer codebuffer = {};
if (!PyBytes_Check(code)) {
if (PyObject_GetBuffer(code, &codebuffer, PyBUF_SIMPLE))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

missing braces (see PEP 7)

@@ -367,7 +382,8 @@ code_new(PyTypeObject *type, PyObject *args, PyObject *kw)
int firstlineno;
PyObject *lnotab;

if (!PyArg_ParseTuple(args, "iiiiiiSO!O!O!UUiS|O!O!:code",

if (!PyArg_ParseTuple(args, "iiiiiiOO!O!O!UUiS|O!O!:code",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just to be sure I understood, you only changed the first "S" to "O" and nothing else, right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correct, I wish github had some inline highlighting here

code = (unsigned char *)view.buf;
code_len = view.len;
} else {
return -1;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You need to set an error here, no?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Which test is checking the cases you've added here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should really be an unreachable condition - the construction of code objects checks and makes sure we have a reasonable buffer. It should definitely be setting an error, I'll see if I can poke around and produce an error through extra evil measures.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, another possible way to hit this would be in low memory conditions, but there's not great infrastructure to test that either.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And this one seems to be generally unreachable by a well-behaved buffer protocol object.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I usually add an assert(condition) to the code in these cases just so it shows up if we ever manage to trigger such a condition. (Our non-release builds at work for unittests use an assertion enabled interpreter - so it isn't just the python.org buildbots that would notice)

@@ -199,15 +199,23 @@ frame_setlineno(PyFrameObject *f, PyObject* p_new_lineno, void *Py_UNUSED(ignore
}

/* We're now ready to look at the bytecode. */
PyBytes_AsStringAndSize(f->f_code->co_code, (char **)&code, &code_len);
Py_buffer view = {};
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it make sense to factor this out to something like a unsigned char * _PyCode_GetBytes(PyObject *) helper? We could use it here and in genobject.c, etc.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let me think about this... I think the sig needs to actually be unsigned char* _PyCode_GetBytes(PyObject *, Py_ssize_t *len, Py_buffer *view) which I feel like is getting complicated enough that it's not going to save much. But let me see if it actually helps much.

assert(_Py_IS_ALIGNED(code_view.buf, sizeof(_Py_CODEUNIT)));
first_instr = (_Py_CODEUNIT *) code_view.buf;
} else {
goto exit_eval_frame;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As before, don't we need to set an error here? Where are the two new code paths tested?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we even need this case? In the original code this was covered by an assert... I'm not opposed to an exception instead of an insert though if you think this give us a better experience. :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should handle the case where PyObject_GetBuffer fails as I think it could require allocating, so will definitely add the missing error.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And I was able to hit this case, by closing the memory mapped file and then re-running the function.

assert(PyBytes_GET_SIZE(co->co_code) % sizeof(_Py_CODEUNIT) == 0);
assert(_Py_IS_ALIGNED(PyBytes_AS_STRING(co->co_code), sizeof(_Py_CODEUNIT)));
first_instr = (_Py_CODEUNIT *) PyBytes_AS_STRING(co->co_code);
} else if (!PyObject_GetBuffer(co->co_code, &code_view, PyBUF_SIMPLE)) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I realize this isn't your fault, but I keep getting thrown off by the return value of PyObject_GetBuffer(). :)

@@ -345,7 +354,11 @@ _PyGen_yf(PyGenObject *gen)
return NULL;
}

if (code[f->f_lasti + sizeof(_Py_CODEUNIT)] != YIELD_FROM)
char opcode = code[f->f_lasti + sizeof(_Py_CODEUNIT)];
if (view.obj != NULL)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also needs braces

if (PyObject_GetBuffer(code, &codebuffer, PyBUF_SIMPLE))
return 0;

int contiguous = PyBuffer_IsContiguous(&codebuffer, 'C');
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't seem right.
The buffer should be read-only and one dimensional, not just contiguous.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well tighten this up to:

codebuffer.readonly &&
codebuffer.ndim == 1 &&
codebuffer.strides == NULL;

I think that should satisfy the readonly check because per the documentation it needs to be consistent for all consumers: https://docs.python.org/3/c-api/buffer.html#c.PyBUF_WRITABLE

Copy link
Contributor

@skrah skrah May 29, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For a well-behaved exporter, PyBUF_SIMPLE implies C, ndim==1, format "B" and readonly.

I know the code base has these (redundant) checks in other places, but at some point we may want to use assert(). I'm not suggesting to eliminate the extra safety check here and now, just clarifying.

@the-knights-who-say-ni
Copy link

Hello, and thanks for your contribution!

I'm a bot set up to make sure that the project can legally accept your contribution by verifying you have signed the PSF contributor agreement (CLA).

Unfortunately we couldn't find an account corresponding to your GitHub username on bugs.python.org (b.p.o) to verify you have signed the CLA (this might be simply due to a missing "GitHub Name" entry in your b.p.o account settings). This is necessary for legal reasons before we can look at your contribution. Please follow the steps outlined in the CPython devguide to rectify this issue.

You can check yourself to see if the CLA has been received.

Thanks again for your contribution, we look forward to reviewing it!

DinoV added 2 commits May 23, 2019 13:09
object supporting the buffer protocol.  This allows environments
which are heavily based on fork and exec to share memory
between code objects rather than having it end up on pages
which get copy-on-written
@brettcannon brettcannon changed the title Support the buffer protocol in code objects bpo-36839 bpo-36839: Support the buffer protocol in code objects May 30, 2019
@matrixise
Copy link
Member

Hi @DinoV

Do you want to fix the conflicts? Thanks

Copy link
Member

@methane methane left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please don't merge this until the benefit is proofed. (see b.p.o)
Currently, the idea is just armchair philosophy.

@bedevere-bot
Copy link

When you're done making the requested changes, leave the comment: I have made the requested changes; please review again.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.