Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[3.11] gh-93382: Cache result of PyCode_GetCode in codeobject (GH-93383) #93493

Open
wants to merge 3 commits into
base: 3.11
Choose a base branch
from

Conversation

Fidget-Spinner
Copy link
Member

@Fidget-Spinner Fidget-Spinner commented Jun 4, 2022

(cherry picked from commit d52ffc1)

Fidget-Spinner and others added 2 commits Jun 4, 2022
…nGH-93383)

Co-authored-by: Kumar Aditya <59607654+kumaraditya303@users.noreply.github.com>
Co-authored-by: Dennis Sweeney <36520290+sweeneyde@users.noreply.github.com>
@Fidget-Spinner Fidget-Spinner requested a review from markshannon as a code owner Jun 4, 2022
@Fidget-Spinner Fidget-Spinner changed the title [3.11] Cache result of PyCode_GetCode in codeobject (GH-93383) [3.11] gh-93382: Cache result of PyCode_GetCode in codeobject (GH-93383) Jun 4, 2022
@Fidget-Spinner Fidget-Spinner requested a review from brandtbucher Jun 4, 2022
@Fidget-Spinner
Copy link
Member Author

@Fidget-Spinner Fidget-Spinner commented Jun 4, 2022

@pablogsal and @nedbat

This PR improves coverage performance on 3.11 by 5-7%. Using Ned's benchmarks

# 3.11 branch at 1497d7fdefff8207b8ccde82e96b6f471416c284
Median for bm_sudoku.py, python3.11, cov=none: 10.476s
Median for bm_sudoku.py, python3.11, cov=6.4.1: 69.124s
Median for bm_spectral_norm.py, python3.11, cov=none: 9.072s
Median for bm_spectral_norm.py, python3.11, cov=6.4.1: 72.143s

# 3.11 co_code_cached
Median for bm_sudoku.py, python3.11, cov=none: 10.363s
Median for bm_sudoku.py, python3.11, cov=6.4.1: 64.726s
Median for bm_spectral_norm.py, python3.11, cov=none: 9.325s
Median for bm_spectral_norm.py, python3.11, cov=6.4.1: 69.713s

An observation: bm_spectral_norm improved less than bm_sudoku because the size of its co_code is smaller. So the cost of getting a new one every time isn't as high. I think real-world code sizes are more likely to be like bm_sudoku or larger. So I'd guestimate we will see ~10% improvement in real-world code running coverage in 3.11.

@pablogsal
Copy link
Member

@pablogsal pablogsal commented Jun 4, 2022

Unfortunately this means that whatever is making Python 3.11 slower with coverage, unfortunately is not only this :(

Great investigation @Fidget-Spinner and thanks for working on this ♥️

I will try to review ASAP but it would be great if @markshannon @iritkatriel @brandtbucher or @ericsnowcurrently can take a look.

Copy link
Member

@brandtbucher brandtbucher left a comment

I'm on mobile right now (probably can't get to a computer today), but here are a few thoughts based on a first look:

@@ -88,6 +88,7 @@ typedef uint16_t _Py_CODEUNIT;
PyObject *co_qualname; /* unicode (qualname, for reference) */ \
PyObject *co_linetable; /* bytes object that holds location info */ \
PyObject *co_weakreflist; /* to support weakrefs to code objects */ \
void *_co_code; /* cached co_code object/attribute */ \
Copy link
Member

@brandtbucher brandtbucher Jun 4, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not PyObject *?

Copy link
Member Author

@Fidget-Spinner Fidget-Spinner Jun 5, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same reason as why co_extra is void *:

/*
       Type is a void* to keep the format private in codeobject.c to force     \
       people to go through the proper APIs. */

I also didn't want people thinking they can just replace their current co_code accesses with _co_code. So this should force them to use PyCode_GetCode or PyObject_GetAttrString(co, "co_code")

PyObject *code = PyBytes_FromStringAndSize((const char *)_PyCode_CODE(co),
_PyCode_NBYTES(co));
if (code == NULL) {
return NULL;
}
deopt_code((_Py_CODEUNIT *)PyBytes_AS_STRING(code), Py_SIZE(co));
assert(co->_co_code == NULL);
Copy link
Member

@brandtbucher brandtbucher Jun 4, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure this is correct... could the allocation above have triggered a garbage collection that could have ended up populating this member? I'd be more confident just removing this assert and using Py_XSETREF or something instead.

Copy link
Member Author

@Fidget-Spinner Fidget-Spinner Jun 5, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Frankly, I'm not sure how that would work. But I see your point.

Copy link
Contributor

@kumaraditya303 kumaraditya303 Jun 5, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

bytes objects are not gc tracked (no PyGC_Head) so its allocation cannot trigger a gc collection as it uses PyObject_Malloc not PyObject_GC_New.
Also this PR is a backport so it would be better to not deviate it from the 3.12 PR else both should be changed.

Copy link
Member Author

@Fidget-Spinner Fidget-Spinner Jun 5, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The backport part isn't an issue because I can just fix it on the main branch.

Anyways, I'll wait for Brandt's reply to see what he says.

@nedbat
Copy link
Member

@nedbat nedbat commented Jun 5, 2022

I also ran the benchmarks using #93493:

cov proj python3.10 python3.11 gh93493 3.11 vs 3.10 gh93493 vs 3.10
none bug1339.py 0.193 s 0.155 s 0.143 s 0.803 0.743
none bm_sudoku.py 10.686 s 10.393 s 11.867 s 0.973 1.111
none bm_spectral_norm.py 16.051 s 10.940 s 10.987 s 0.682 0.684
6.4.1 bug1339.py 0.439 s 0.842 s 0.771 s 1.918 1.757
6.4.1 bm_sudoku.py 30.148 s 61.392 s 61.606 s 2.036 2.043
6.4.1 bm_spectral_norm.py 40.672 s 79.562 s 73.221 s 1.956 1.800

It's a slight improvement, but isn't solving the problem.

@Fidget-Spinner
Copy link
Member Author

@Fidget-Spinner Fidget-Spinner commented Jun 5, 2022

It's a slight improvement, but isn't solving the problem.

Yeah I'm aware. I sent this PR in because 10% improvement on macrobenchmarks is still something! Even cProfile slowed down by 60% when profiling code in 3.11. My hunch is that accessing the full PyFrameObject is signifcantly more expensive now in 3.11. However, I can't fix that because it's part of the tracing Py_tracefunc C API.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants