Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bpo-44187: Quickening infrastructure #26264

Merged
merged 30 commits into from Jun 7, 2021

Conversation

@markshannon
Copy link
Contributor

@markshannon markshannon commented May 20, 2021

First step toward implementing PEP 659.

https://bugs.python.org/issue44187

@markshannon markshannon requested a review from python/windows-team as a code owner May 20, 2021
Python/specialize.c Outdated Show resolved Hide resolved
Objects/codeobject.c Outdated Show resolved Hide resolved
@bratao
Copy link

@bratao bratao commented May 22, 2021

@markshannon one thing I noticed is the introduction of HotPy prefix that do not exist on CPython. I understand that the heritage of this code based on some previous projects.
But would not be better to simply call PyCacheEntry, for example? I can see some developer trying to make sense of the Hot prefix, as it have a meaning in JITs.

Include/internal/pycore_code.h Show resolved Hide resolved
Python/specialize.c Outdated Show resolved Hide resolved
Python/specialize.c Outdated Show resolved Hide resolved
Python/specialize.c Outdated Show resolved Hide resolved
Copy link
Member

@gvanrossum gvanrossum left a comment

Here's my first round of comments; I haven't gotten to specialize.c yet, but I figured I'd send this in case I don't get to that tonight (which is likely).

Include/cpython/code.h Outdated Show resolved Hide resolved
Include/cpython/code.h Outdated Show resolved Hide resolved
Include/cpython/code.h Show resolved Hide resolved
Include/cpython/code.h Outdated Show resolved Hide resolved
Include/internal/pycore_code.h Outdated Show resolved Hide resolved
Include/internal/pycore_code.h Show resolved Hide resolved
Include/internal/pycore_code.h Outdated Show resolved Hide resolved
Python/ceval.c Outdated Show resolved Hide resolved
Python/specialize.c Outdated Show resolved Hide resolved
Copy link
Member

@gvanrossum gvanrossum left a comment

Okay, I got through everything this time, and I think I understand it. Some suggestions to make it easier to understand for new readers.

Python/specialize.c Show resolved Hide resolved
Python/specialize.c Show resolved Hide resolved
static uint8_t adaptive[256] = { 0 };

static uint8_t cache_requirements[256] = { 0 };
Comment on lines 40 to 42

This comment has been minimized.

@gvanrossum

gvanrossum May 26, 2021
Member

These variables could use some comment explaining their purpose. (Also, maybe we should plan to generate these from info added to opcode.py, like opcode_targets.h?

cache_offset = i/2;
}
else if (oparg > 255) {
/* Cannot access required cache_offset */

This comment has been minimized.

@gvanrossum

gvanrossum May 26, 2021
Member

Maybe in some kind of debug mode it would be nice to report whether this happens at all? If we see this frequently we need to change the strategy. OTOH maybe we never expect it and we could put assert(0) here???

Python/specialize.c Show resolved Hide resolved
/* Cannot access required cache_offset */
continue;
}
cache_offset += need;

This comment has been minimized.

@gvanrossum

gvanrossum May 26, 2021
Member

Looks like this will over-count if there are eligible opcodes with an EXTENDED_ARG prefix.

Python/specialize.c Show resolved Hide resolved
Python/ceval.c Show resolved Hide resolved
@@ -1343,6 +1343,14 @@ eval_frame_handle_pending(PyThreadState *tstate)
#define JUMPTO(x) (next_instr = first_instr + (x))
#define JUMPBY(x) (next_instr += (x))

/* Get opcode and opcode from original instructions, not quickened form. */

This comment has been minimized.

@iritkatriel

iritkatriel May 26, 2021
Member

Suggested change
/* Get opcode and opcode from original instructions, not quickened form. */
/* Get opcode and oparg from original instructions, not quickened form. */
Include/internal/pycore_code.h Show resolved Hide resolved
return &last_cache_plus_one[-1-n].entry;
}

/* Following two functions determine the index of a cache entry from the

This comment has been minimized.

@iritkatriel

iritkatriel May 26, 2021
Member

This comment doesn't seem correct - they take the index and return oparg or offset.

@markshannon
Copy link
Contributor Author

@markshannon markshannon commented May 27, 2021

@gvanrossum , @iritkatriel I think I've addressed all your comments.
Specifically, I've:

  • Factored out the common code in optimize and entries_needed
  • Made sure that previous_opcode is correct to avoid super-instructions stomping on adaptive ones
  • Added more checks for EXTENDED_ARG to avoid wasting memory
  • Fixed up lots of comments.
  • Added an extended comment showing the layout of the quickening data
Include/internal/pycore_code.h Outdated Show resolved Hide resolved
Include/internal/pycore_code.h Outdated Show resolved Hide resolved
Copy link
Member

@gvanrossum gvanrossum left a comment

Looking good. Maybe we should ask Dino to have a look? (Between this PR and PEP 659 he should have enough to judge the design, right?)

Also, please look at the compiler warnings found by GitHub Actions for Windows (x64).

Include/internal/pycore_code.h Show resolved Hide resolved
<instr 0> <instr 1> <instr 2> <instr 3> <--- co->co_first_instr
<instr 4> <instr 5> <instr 6> <instr 7>
...
<instr N-1>

This comment has been minimized.

@gvanrossum

gvanrossum May 28, 2021
Member

Maybe M-1? The number of cache entries doesn't have to match the number of instructions.

Python/specialize.c Show resolved Hide resolved
}
previous_opcode = opcode;
}
return cache_offset+1;

This comment has been minimized.

@gvanrossum

gvanrossum May 28, 2021
Member

Suggested change
return cache_offset+1;
return cache_offset + 1; // One extra for the count entry
@markshannon
Copy link
Contributor Author

@markshannon markshannon commented Jun 1, 2021

I've been implementing some specialization of LOAD_ATTR in another branch and from that it became clear that two more things were needed:

  1. We need to use the next index (index + 1) for efficiency as that is what is available in the interpreter.
  2. We need to initialize the adaptive cache for adaptive instructions (or they crash).

I've implemented those in the last three commits.

gvanrossum added a commit to faster-cpython/tools that referenced this pull request Jun 1, 2021
}
Py_ssize_t size = PyBytes_GET_SIZE(code->co_code);
int instr_count = (int)(size/sizeof(_Py_CODEUNIT));
if (instr_count > MAX_SIZE_TO_QUICKEN) {

This comment has been minimized.

@iritkatriel

iritkatriel Jun 5, 2021
Member

Would it be possible instead to quicken the first 5000 instructions and then exit? (That would avoid a cliff where a minor change in the code tips it over the no-optimization limit and makes it run much slower.)

This comment has been minimized.

@bluss

bluss Jun 6, 2021

There's a memory cost to creating a duplicate code array, so it would risk wasting memory unproportionally.

@bedevere-bot
Copy link

@bedevere-bot bedevere-bot commented Jun 6, 2021

🤖 New build scheduled with the buildbot fleet by @markshannon for commit ec50298 🤖

If you want to schedule another build, you need to add the "🔨 test-with-buildbots" label again.

@bedevere-bot
Copy link

@bedevere-bot bedevere-bot commented Jun 7, 2021

🤖 New build scheduled with the buildbot fleet by @markshannon for commit ab3a30b 🤖

If you want to schedule another build, you need to add the "🔨 test-with-buildbots" label again.

@markshannon
Copy link
Contributor Author

@markshannon markshannon commented Jun 7, 2021

Windows tests were failing before this PR.
We seem to be right on the edge of running out of stack on Windows when up against the recursion limit.

@markshannon markshannon merged commit 001eb52 into python:main Jun 7, 2021
69 of 72 checks passed
69 of 72 checks passed
@github-actions
Docs
Details
@github-actions
Check for source changes
Details
@github-actions
Check if generated files are up to date
Details
@github-actions
Windows (x86)
Details
@github-actions
Windows (x64)
Details
@github-actions
macOS
Details
@github-actions
Ubuntu
Details
@github-actions
Ubuntu SSL tests with OpenSSL
Details
@bedevere-bot
buildbot/AMD64 Windows10 PR Build done.
Details
@bedevere-bot
buildbot/AMD64 Windows10 Pro PR Build done.
Details
@bedevere-bot
buildbot/AMD64 Windows8.1 Refleaks PR Build started.
Details
Azure Pipelines PR #20210607.16 succeeded
Details
@travis-ci
Travis CI - Pull Request Build Passed
Details
@bedevere-bot
bedevere/issue-number Issue number 44187 found
Details
@bedevere-bot
bedevere/news News entry found in Misc/NEWS.d
@bedevere-bot
buildbot/AMD64 Arch Linux Asan Debug PR Build done.
Details
@bedevere-bot
buildbot/AMD64 Arch Linux Asan PR Build done.
Details
@bedevere-bot
buildbot/AMD64 Arch Linux TraceRefs PR Build done.
Details
@bedevere-bot
buildbot/AMD64 Debian PGO PR Build done.
Details
@bedevere-bot
buildbot/AMD64 Debian root PR Build done.
Details
@bedevere-bot
buildbot/AMD64 Fedora Stable Clang Installed PR Build done.
Details
@bedevere-bot
buildbot/AMD64 Fedora Stable Clang PR Build done.
Details
@bedevere-bot
buildbot/AMD64 Fedora Stable LTO + PGO PR Build done.
Details
@bedevere-bot
buildbot/AMD64 Fedora Stable LTO PR Build done.
Details
@bedevere-bot
buildbot/AMD64 Fedora Stable PR Build done.
Details
@bedevere-bot
buildbot/AMD64 Fedora Stable Refleaks PR Build done.
Details
@bedevere-bot
buildbot/AMD64 FreeBSD Non-Debug PR Build done.
Details
@bedevere-bot
buildbot/AMD64 FreeBSD Shared PR Build done.
Details
@bedevere-bot
buildbot/AMD64 RHEL7 LTO + PGO PR Build done.
Details
@bedevere-bot
buildbot/AMD64 RHEL7 LTO PR Build done.
Details
@bedevere-bot
buildbot/AMD64 RHEL7 PR Build done.
Details
@bedevere-bot
buildbot/AMD64 RHEL7 Refleaks PR Build done.
Details
@bedevere-bot
buildbot/AMD64 RHEL8 LTO + PGO PR Build done.
Details
@bedevere-bot
buildbot/AMD64 RHEL8 LTO PR Build done.
Details
@bedevere-bot
buildbot/AMD64 RHEL8 PR Build done.
Details
@bedevere-bot
buildbot/AMD64 RHEL8 Refleaks PR Build done.
Details
@bedevere-bot
buildbot/AMD64 Ubuntu Shared PR Build done.
Details
@bedevere-bot
buildbot/AMD64 Windows8.1 Non-Debug PR Build done.
Details
@bedevere-bot
buildbot/PPC64 Fedora PR Build done.
Details
@bedevere-bot
buildbot/PPC64LE Fedora Stable Clang Installed PR Build done.
Details
@bedevere-bot
buildbot/PPC64LE Fedora Stable Clang PR Build done.
Details
@bedevere-bot
buildbot/PPC64LE Fedora Stable LTO + PGO PR Build done.
Details
@bedevere-bot
buildbot/PPC64LE Fedora Stable LTO PR Build done.
Details
@bedevere-bot
buildbot/PPC64LE Fedora Stable PR Build done.
Details
@bedevere-bot
buildbot/PPC64LE Fedora Stable Refleaks PR Build done.
Details
@bedevere-bot
buildbot/PPC64LE RHEL7 LTO + PGO PR Build done.
Details
@bedevere-bot
buildbot/PPC64LE RHEL7 LTO PR Build done.
Details
@bedevere-bot
buildbot/PPC64LE RHEL7 PR Build done.
Details
@bedevere-bot
buildbot/PPC64LE RHEL7 Refleaks PR Build done.
Details
@bedevere-bot
buildbot/PPC64LE RHEL8 LTO + PGO PR Build done.
Details
@bedevere-bot
buildbot/PPC64LE RHEL8 LTO PR Build done.
Details
@bedevere-bot
buildbot/PPC64LE RHEL8 PR Build done.
Details
@bedevere-bot
buildbot/PPC64LE RHEL8 Refleaks PR Build done.
Details
@bedevere-bot
buildbot/s390x Debian PR Build done.
Details
@bedevere-bot
buildbot/s390x Fedora Clang Installed PR Build done.
Details
@bedevere-bot
buildbot/s390x Fedora Clang PR Build done.
Details
@bedevere-bot
buildbot/s390x Fedora LTO + PGO PR Build done.
Details
@bedevere-bot
buildbot/s390x Fedora LTO PR Build done.
Details
@bedevere-bot
buildbot/s390x Fedora PR Build done.
Details
@bedevere-bot
buildbot/s390x Fedora Refleaks PR Build done.
Details
@bedevere-bot
buildbot/s390x RHEL7 LTO + PGO PR Build done.
Details
@bedevere-bot
buildbot/s390x RHEL7 LTO PR Build done.
Details
@bedevere-bot
buildbot/s390x RHEL7 PR Build done.
Details
@bedevere-bot
buildbot/s390x RHEL7 Refleaks PR Build done.
Details
@bedevere-bot
buildbot/s390x RHEL8 LTO + PGO PR Build done.
Details
@bedevere-bot
buildbot/s390x RHEL8 LTO PR Build done.
Details
@bedevere-bot
buildbot/s390x RHEL8 PR Build done.
Details
@bedevere-bot
buildbot/s390x RHEL8 Refleaks PR Build done.
Details
@bedevere-bot
buildbot/s390x SLES PR Build done.
Details
@bedevere-bot
buildbot/x86 Gentoo Installed with X PR Build done.
Details
@bedevere-bot
buildbot/x86 Gentoo Non-Debug with X PR Build done.
Details
@bedevere-bot
buildbot/x86-64 macOS PR Build done.
Details
@markshannon markshannon deleted the faster-cpython:quickening-infrastructure branch Jun 7, 2021
@@ -73,9 +73,10 @@ def get_pooled_int(value):
alloc_deltas = [0] * repcount
fd_deltas = [0] * repcount
getallocatedblocks = sys.getallocatedblocks
getallocatedblocks = sys.getallocatedblocks

This comment has been minimized.

This comment has been minimized.

@markshannon

markshannon Jun 9, 2021
Author Contributor

Typo. #26624

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
9 participants