bpo-36839: Support the buffer protocol in code objects #13177

DinoV · 2019-05-07T20:56:31Z

Adds support for code objects backed by any object that implements the buffer protocol. This relatively small core interpreter change makes it possible to build importers/loaders that store their code objects in memory-mapped files. In an environment with fork/exec'd processes, this allows the memory to be shared between all of the processes.

https://bugs.python.org/issue36839
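As a rough illustration of the idea in the description, the sketch below writes a function's bytecode to a temporary file and maps it read-only. It only shows that a memory-mapped file exports a read-only buffer holding the same bytes as `co_code`. It does not construct a code object from the mapping, since that depends on the change proposed in this PR. The function name `sample` and the variable names are made up for the example.

```python
import mmap
import tempfile


def sample(a, b):
    return a + b


raw = sample.__code__.co_code  # bytecode as an ordinary bytes object

with tempfile.TemporaryFile() as fh:
    fh.write(raw)
    fh.flush()  # make sure the file has a non-zero size before mapping
    # Map the whole file read-only; mmap objects export the buffer protocol.
    with mmap.mmap(fh.fileno(), 0, access=mmap.ACCESS_READ) as mm:
        with memoryview(mm) as view:
            mapped_readonly = view.readonly
            mapped_matches = view.tobytes() == raw

print(mapped_readonly, mapped_matches)
```

The memoryview is released before the mapping is closed, because an mmap with outstanding exports cannot be closed.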

1st1 · 2019-05-07T20:59:54Z

Objects/frameobject.c

@@ -316,10 +324,17 @@ frame_setlineno(PyFrameObject *f, PyObject* p_new_lineno, void *Py_UNUSED(ignore
        delta--;
    }

+    if (view.obj != NULL)


nit: The latest C code style recommends using {} always, even for single line clauses.

ericsnowcurrently

Overall, LGTM. I've left a number of comments about things that seemed to be missing.

I'm going to approve the PR, but with the understanding that you will address them (or dismiss them as misunderstandings). :)

Lib/dis.py

ericsnowcurrently · 2019-05-08T13:35:03Z

Lib/test/test_code.py

+
+        code = f.__code__
+        ct = type(f.__code__)
+        c = ct(code.co_argcount, code.co_posonlyargcount,


much respect! :)

ericsnowcurrently · 2019-05-08T13:37:01Z

Lib/test/test_code.py

+        ct = type(f.__code__)
+        c = ct(code.co_argcount, code.co_posonlyargcount,
+               code.co_kwonlyargcount, code.co_nlocals, code.co_stacksize,
+               code.co_flags, array.array('B', code.co_code),


FWIW, inlining the array hides it. I knew to look for it due to the context of this PR. So it would be worth making a separate variable for it above.
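A minimal sketch of that suggestion, assuming a stand-in function `f` and a hypothetical variable name `code_buffer` (neither is taken from the PR's test). It only builds the array and confirms it exposes the same bytes; passing it to the code constructor would require the change under review.

```python
import array


def f(x):
    return x + 1


code = f.__code__
# A dedicated, named variable makes it obvious that the bytecode is being
# supplied as a non-bytes buffer rather than as code.co_code itself.
code_buffer = array.array('B', code.co_code)

with memoryview(code_buffer) as view:
    same_bytes = view.tobytes() == code.co_code

print(type(code_buffer).__name__, same_bytes)
```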

ericsnowcurrently · 2019-05-08T13:40:58Z

Lib/test/test_dis.py

+        except ImportError:
+            return
+        from tempfile import NamedTemporaryFile
+        with NamedTemporaryFile('wb', suffix='.code') as f:


FWIW, "f" as a name is a bit ambiguous in this file. :) However, it isn't confusing here.

ericsnowcurrently · 2019-05-08T13:42:59Z

Objects/codeobject.c

+code_check_buffer(PyObject *code) {
+    Py_buffer codebuffer = {};
+    if (!PyBytes_Check(code)) {
+        if (PyObject_GetBuffer(code, &codebuffer, PyBUF_SIMPLE))


missing braces (see PEP 7)

ericsnowcurrently · 2019-05-08T13:45:02Z

Objects/codeobject.c

@@ -367,7 +382,8 @@ code_new(PyTypeObject *type, PyObject *args, PyObject *kw)
    int firstlineno;
    PyObject *lnotab;

-    if (!PyArg_ParseTuple(args, "iiiiiiSO!O!O!UUiS|O!O!:code",
+
+    if (!PyArg_ParseTuple(args, "iiiiiiOO!O!O!UUiS|O!O!:code",


Just to be sure I understood, you only changed the first "S" to "O" and nothing else, right?

Correct. I wish GitHub had some inline highlighting here.

ericsnowcurrently · 2019-05-08T13:49:15Z

Objects/frameobject.c

+        code = (unsigned char *)view.buf;
+        code_len = view.len;
+    } else {
+        return -1;


You need to set an error here, no?

Which test is checking the cases you've added here?

This should really be an unreachable condition: the construction of code objects checks and makes sure we have a reasonable buffer. It should definitely be setting an error, though; I'll see if I can poke around and produce an error through extra evil measures.

Oh, another possible way to hit this would be in low memory conditions, but there's not great infrastructure to test that either.

And this one seems to be generally unreachable by a well-behaved buffer protocol object.

I usually add an assert(condition) to the code in these cases just so it shows up if we ever manage to trigger such a condition. (Our non-release builds at work for unittests use an assertion enabled interpreter - so it isn't just the python.org buildbots that would notice)

ericsnowcurrently · 2019-05-08T13:54:44Z

Objects/frameobject.c

@@ -199,15 +199,23 @@ frame_setlineno(PyFrameObject *f, PyObject* p_new_lineno, void *Py_UNUSED(ignore
    }

    /* We're now ready to look at the bytecode. */
-    PyBytes_AsStringAndSize(f->f_code->co_code, (char **)&code, &code_len);
+    Py_buffer view = {};


Would it make sense to factor this out to something like an unsigned char * _PyCode_GetBytes(PyObject *) helper? We could use it here and in genobject.c, etc.

Let me think about this... I think the sig needs to actually be unsigned char* _PyCode_GetBytes(PyObject *, Py_ssize_t *len, Py_buffer *view) which I feel like is getting complicated enough that it's not going to save much. But let me see if it actually helps much.

ericsnowcurrently · 2019-05-08T13:57:29Z

Python/ceval.c

+        assert(_Py_IS_ALIGNED(code_view.buf, sizeof(_Py_CODEUNIT)));
+        first_instr = (_Py_CODEUNIT *) code_view.buf;
+    } else {
+        goto exit_eval_frame;


As before, don't we need to set an error here? Where are the two new code paths tested?

Do we even need this case? In the original code this was covered by an assert... I'm not opposed to an exception instead of an assert, though, if you think this gives us a better experience. :)

I think we should handle the case where PyObject_GetBuffer fails as I think it could require allocating, so will definitely add the missing error.

And I was able to hit this case, by closing the memory mapped file and then re-running the function.
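The failure mode described here can be approximated from pure Python. This is a sketch of the buffer-level behaviour only, not of the interpreter path in ceval.c: once an mmap object is closed, a new request for its buffer fails, which is the situation the error branch has to handle.

```python
import mmap
import tempfile

with tempfile.TemporaryFile() as fh:
    fh.write(b"\x00" * 16)
    fh.flush()
    mm = mmap.mmap(fh.fileno(), 0, access=mmap.ACCESS_READ)

    # First acquisition succeeds while the mapping is open.
    view = memoryview(mm)
    acquired_len = len(view)
    view.release()  # an mmap cannot be closed while exports are outstanding

    mm.close()

    # Re-acquiring the buffer after close() raises instead of returning data.
    try:
        memoryview(mm)
        reacquire_failed = False
    except ValueError:
        reacquire_failed = True

print(acquired_len, reacquire_failed)
```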

ericsnowcurrently · 2019-05-08T14:01:20Z

Python/ceval.c

+        assert(PyBytes_GET_SIZE(co->co_code) % sizeof(_Py_CODEUNIT) == 0);
+        assert(_Py_IS_ALIGNED(PyBytes_AS_STRING(co->co_code), sizeof(_Py_CODEUNIT)));
+        first_instr = (_Py_CODEUNIT *) PyBytes_AS_STRING(co->co_code);
+    } else if (!PyObject_GetBuffer(co->co_code, &code_view, PyBUF_SIMPLE)) {


I realize this isn't your fault, but I keep getting thrown off by the return value of PyObject_GetBuffer(). :)

markshannon · 2019-05-08T11:39:36Z

Objects/genobject.c

@@ -345,7 +354,11 @@ _PyGen_yf(PyGenObject *gen)
            return NULL;
        }

-        if (code[f->f_lasti + sizeof(_Py_CODEUNIT)] != YIELD_FROM)
+        char opcode = code[f->f_lasti + sizeof(_Py_CODEUNIT)];
+        if (view.obj != NULL)


Also needs braces

markshannon · 2019-05-08T12:04:14Z

Objects/codeobject.c

+        if (PyObject_GetBuffer(code, &codebuffer, PyBUF_SIMPLE))
+            return 0;
+
+        int contiguous = PyBuffer_IsContiguous(&codebuffer, 'C');


This doesn't seem right.
The buffer should be read-only and one dimensional, not just contiguous.

We'll tighten this up to:

codebuffer.readonly &&
codebuffer.ndim == 1 &&
codebuffer.strides == NULL;

I think that should satisfy the readonly check because per the documentation it needs to be consistent for all consumers: https://docs.python.org/3/c-api/buffer.html#c.PyBUF_WRITABLE

For a well-behaved exporter, PyBUF_SIMPLE implies C, ndim==1, format "B" and readonly.

I know the code base has these (redundant) checks in other places, but at some point we may want to use assert(). I'm not suggesting to eliminate the extra safety check here and now, just clarifying.
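The checks discussed in this thread can be approximated in Python with memoryview attributes. This is a sketch under assumptions: the helper name `looks_like_code_buffer` is hypothetical, and memoryview always requests full buffer information, so `strides` is never NULL at this level. `c_contiguous` is used here as a stand-in for the C-level `strides == NULL` test made after a PyBUF_SIMPLE request.

```python
def looks_like_code_buffer(obj):
    """Approximate the review's checks: read-only, 1-D, unsigned bytes, contiguous."""
    try:
        view = memoryview(obj)
    except TypeError:
        return False  # object does not support the buffer protocol
    with view:
        return (
            view.readonly
            and view.ndim == 1
            and view.format == "B"
            and view.c_contiguous
        )


results = {
    "bytes": looks_like_code_buffer(b"\x64\x00\x53\x00"),
    "bytearray": looks_like_code_buffer(bytearray(b"\x64\x00")),   # writable
    "strided": looks_like_code_buffer(memoryview(b"abcdef")[::2]),  # not contiguous
    "int": looks_like_code_buffer(12345),                           # no buffer
}
print(results)
```

Only the plain bytes object passes; the writable, strided, and non-buffer inputs are rejected.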

the-knights-who-say-ni · 2019-05-22T05:01:47Z

Hello, and thanks for your contribution!

I'm a bot set up to make sure that the project can legally accept your contribution by verifying you have signed the PSF contributor agreement (CLA).

Unfortunately we couldn't find an account corresponding to your GitHub username on bugs.python.org (b.p.o) to verify you have signed the CLA (this might be simply due to a missing "GitHub Name" entry in your b.p.o account settings). This is necessary for legal reasons before we can look at your contribution. Please follow the steps outlined in the CPython devguide to rectify this issue.

You can check yourself to see if the CLA has been received.

Thanks again for your contribution, we look forward to reviewing it!

matrixise · 2019-09-11T17:58:23Z

Hi @DinoV

Do you want to fix the conflicts? Thanks

methane

Please don't merge this until the benefit is proven (see b.p.o).
Currently, the idea is just armchair philosophy.

bedevere-bot · 2019-12-25T13:22:29Z

When you're done making the requested changes, leave the comment: I have made the requested changes; please review again.

DinoV requested a review from 1st1 as a code owner May 7, 2019 20:56

the-knights-who-say-ni added the CLA signed label May 7, 2019

bedevere-bot added the awaiting core review label May 7, 2019

DinoV changed the title ~~Support the buffer protocol in code objects~~ Support the buffer protocol in code objects bpo-36839 May 7, 2019

1st1 reviewed May 7, 2019

1st1 approved these changes May 7, 2019

bedevere-bot added awaiting merge and removed awaiting core review labels May 7, 2019

ericsnowcurrently approved these changes May 8, 2019

markshannon reviewed May 8, 2019

DinoV force-pushed the master branch from 31ddb7f to cff6f1e Compare May 22, 2019 05:01

the-knights-who-say-ni removed the CLA signed label May 22, 2019

the-knights-who-say-ni added the CLA not signed label May 22, 2019

DinoV force-pushed the master branch 3 times, most recently from 1788c75 to 3a93c27 Compare May 22, 2019 05:06

the-knights-who-say-ni added CLA not signed and removed CLA not signed labels May 22, 2019

DinoV force-pushed the master branch 2 times, most recently from 4e817e0 to 6b4380b Compare May 22, 2019 23:33

the-knights-who-say-ni added CLA not signed and removed CLA not signed labels May 22, 2019

the-knights-who-say-ni added the CLA not signed label May 23, 2019

DinoV force-pushed the master branch from 6b4380b to b324893 Compare May 23, 2019 18:56

the-knights-who-say-ni added CLA not signed and removed CLA not signed labels May 23, 2019

DinoV force-pushed the master branch 2 times, most recently from d33b039 to f7c2e36 Compare May 23, 2019 19:04

the-knights-who-say-ni added CLA signed and removed CLA not signed labels May 23, 2019

DinoV added 2 commits May 23, 2019 13:09

Adds support for code objects which are backed by any object supporting the buffer protocol. This allows environments which are heavily based on fork and exec to share memory between code objects rather than having it end up on pages which get copy-on-written

e000285

Cleaning up code review feedback

985e7d0

DinoV force-pushed the master branch from f7c2e36 to 985e7d0 Compare May 23, 2019 20:50

📜🤖 Added by blurb_it.

d6442c3

brettcannon changed the title ~~Support the buffer protocol in code objects bpo-36839~~ bpo-36839: Support the buffer protocol in code objects May 30, 2019

gpshead assigned DinoV Sep 28, 2019

methane requested changes Dec 25, 2019

bedevere-bot removed the awaiting merge label Dec 25, 2019

bedevere-bot added the awaiting changes label Dec 25, 2019

ezio-melotti removed the CLA signed label Jul 13, 2022

DinoV closed this Jan 11, 2024

DinoV mentioned this pull request Apr 10, 2022

Support the buffer protocol in code objects #81020

Closed

DinoV deleted the master branch May 31, 2024 18:24

skrah · May 29, 2019 (edited)
