Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Generate the interpreter #98831

Open
gvanrossum opened this issue Oct 28, 2022 · 0 comments
Open

Generate the interpreter #98831

gvanrossum opened this issue Oct 28, 2022 · 0 comments
Assignees
Labels
3.12 interpreter-core (Objects, Python, Grammar, and Parser dirs) type-feature A feature request or enhancement

Comments

@gvanrossum
Copy link
Member

gvanrossum commented Oct 28, 2022

Overview

Over in faster-cpython/ideas we've been exploring the idea of generating the interpreter from a set of instruction definitions. This will eventually enable having multiple versions of the interpreter (e.g. with and without tracing enabled), and it will allow us to automatically combine instructions using a powerful notation (e.g. super(LOAD_FAST__LOAD_FAST) = LOAD_FAST + LOAD_FAST;). It will also let us auto-generate things like stack_effect().

We are planning to land at least an early version of this work in 3.12. We have a tentative grammar for instruction definition DSL -- which will undoubtedly undergo several iterations before we've settled. We have a first draft of the tooling ready for review (which currently reproduces the status quo) done.

Once the first version of the tooling has landed we expect to iterate quickly, using this issue an umbrella issue for our PRs to link to.

References

@gvanrossum gvanrossum added the type-feature A feature request or enhancement label Oct 28, 2022
@kumaraditya303 kumaraditya303 added interpreter-core (Objects, Python, Grammar, and Parser dirs) 3.12 labels Oct 29, 2022
@gvanrossum gvanrossum self-assigned this Oct 31, 2022
@gvanrossum gvanrossum changed the title Generating the interpreter Generate the interpreter Oct 31, 2022
gvanrossum added a commit that referenced this issue Nov 3, 2022
The switch cases (really TARGET(opcode) macros) have been moved from ceval.c to generated_cases.c.h. That file is generated from instruction definitions in bytecodes.c (which impersonates a C file so the C code it contains can be edited without custom support in e.g. VS Code).

The code generator lives in Tools/cases_generator (it has a README.md explaining how it works). The DSL used to describe the instructions is a work in progress, described in https://github.com/faster-cpython/ideas/blob/main/3.12/interpreter_definition.md.

This is surely a work-in-progress. An easy next step could be auto-generating super-instructions.

**IMPORTANT: Merge Conflicts**

If you get a merge conflict for instruction implementations in ceval.c, your best bet is to port your changes to bytecodes.c. That file looks almost the same as the original cases, except instead of `TARGET(NAME)` it uses `inst(NAME)`, and the trailing `DISPATCH()` call is omitted (the code generator adds it automatically).
gvanrossum added a commit that referenced this issue Nov 3, 2022
gvanrossum added a commit that referenced this issue Nov 4, 2022
Co-authored-by: Brandt Bucher <brandtbucher@gmail.com>
gvanrossum added a commit that referenced this issue Nov 6, 2022
Co-authored-by: C.A.M. Gerlach <CAM.Gerlach@Gerlach.CAM>
gvanrossum added a commit that referenced this issue Nov 10, 2022
)

Also mark those opcodes that have no stack effect as such.

Co-authored-by: Brandt Bucher <brandtbucher@gmail.com>
gvanrossum added a commit to gvanrossum/cpython that referenced this issue Nov 10, 2022
python#99271)

Also mark those opcodes that have no stack effect as such.

Co-authored-by: Brandt Bucher <brandtbucher@gmail.com>
ethanfurman pushed a commit to ethanfurman/cpython that referenced this issue Nov 12, 2022
python#99271)

Also mark those opcodes that have no stack effect as such.

Co-authored-by: Brandt Bucher <brandtbucher@gmail.com>
gvanrossum added a commit that referenced this issue Nov 18, 2022
Also complete cache effects for BINARY_SUBSCR family.
gvanrossum added a commit that referenced this issue Nov 23, 2022
Newly supported interpreter definition syntax:
- `op(NAME, (input_stack_effects -- output_stack_effects)) { ... }`
- `macro(NAME) = OP1 + OP2;`

Also some other random improvements:
- Convert `WITH_EXCEPT_START` to use stack effects
- Fix lexer to balk at unrecognized characters, e.g. `@`
- Fix moved output names; support object pointers in cache
- Introduce `error()` method to print errors
- Introduce read_uint16(p) as equivalent to `*p`

Co-authored-by: Brandt Bucher <brandtbucher@gmail.com>
gvanrossum added a commit that referenced this issue Dec 8, 2022
Stack effects can now have a type, e.g. `inst(X, (left, right -- jump/uint64_t)) { ... }`.

Instructions converted to the non-legacy format:

* COMPARE_OP
* COMPARE_OP_FLOAT_JUMP
* COMPARE_OP_INT_JUMP
* COMPARE_OP_STR_JUMP
* STORE_ATTR
* DELETE_ATTR
* STORE_GLOBAL
* STORE_ATTR_INSTANCE_VALUE
* STORE_ATTR_WITH_HINT
* STORE_ATTR_SLOT, and complete the store_attr family
* Complete the store_subscr family: STORE_SUBSCR{,DICT,LIST_INT}
  (STORE_SUBSCR was alread half converted,
  but wasn't using cache effects yet.)
* DELETE_SUBSCR
* PRINT_EXPR
* INTERPRETER_EXIT (a bit weird, ends in return)
* RETURN_VALUE
* GET_AITER (had to restructure it some)
  The original had mysterious `SET_TOP(NULL)` before `goto error`.
  I assume those just account for `obj` having been decref'ed,
  so I got rid of them in favor of the cleanup implied by `ERROR_IF()`.
* LIST_APPEND (a bit unhappy with it)
* SET_ADD (also a bit unhappy with it)

Various other improvements/refactorings as well.
gvanrossum added a commit that referenced this issue Dec 8, 2022
This makes it easier to see what changed in the generated code
when converting an instruction to super or macro.
gvanrossum added a commit that referenced this issue Dec 17, 2022
…#100205)

The presence of this macro indicates that a particular instruction
may be considered for conversion to a register-based format
(see faster-cpython/ideas#485).

An invariant (currently unchecked) is that `DEOPT_IF()` may only
occur *before* `DECREF_INPUTS()`, and `ERROR_IF()` may only occur
*after* it. One reason not to check this is that there are a few
places where we insert *two* `DECREF_INPUTS()` calls, in different
branches of the code. The invariant checking would have to be able
to do some flow control analysis to understand this.

Note that many instructions, especially specialized ones,
can't be converted to use this macro straightforwardly.
This is because the generator currently only generates plain
`Py_DECREF(variable)` statements, and cannot generate
things like `_Py_DECREF_SPECIALIZED()` let alone deal with
`_PyList_AppendTakeRef()`.
gvanrossum added a commit to gvanrossum/cpython that referenced this issue Dec 17, 2022
… input (python#100205)

The presence of this macro indicates that a particular instruction
may be considered for conversion to a register-based format
(see faster-cpython/ideas#485).

An invariant (currently unchecked) is that `DEOPT_IF()` may only
occur *before* `DECREF_INPUTS()`, and `ERROR_IF()` may only occur
*after* it. One reason not to check this is that there are a few
places where we insert *two* `DECREF_INPUTS()` calls, in different
branches of the code. The invariant checking would have to be able
to do some flow control analysis to understand this.

Note that many instructions, especially specialized ones,
can't be converted to use this macro straightforwardly.
This is because the generator currently only generates plain
`Py_DECREF(variable)` statements, and cannot generate
things like `_Py_DECREF_SPECIALIZED()` let alone deal with
`_PyList_AppendTakeRef()`.
shihai1991 added a commit to shihai1991/cpython that referenced this issue Dec 18, 2022
* origin/main: (1306 commits)
  Correct CVE-2020-10735 documentation (python#100306)
  pythongh-100272: Fix JSON serialization of OrderedDict (pythonGH-100273)
  pythongh-93649: Split tracemalloc tests from _testcapimodule.c (python#99551)
  Docs: Use `PY_VERSION_HEX` for version comparison (python#100179)
  pythongh-97909: Fix markup for `PyMethodDef` members (python#100089)
  pythongh-99240: Reset pointer to NULL when the pointed memory is freed in argument parsing (python#99890)
  pythongh-99240: Reset pointer to NULL when the pointed memory is freed in argument parsing (python#99890)
  pythonGH-98831: Add DECREF_INPUTS(), expanding to DECREF() each stack input (python#100205)
  pythongh-78707: deprecate passing >1 argument to `PurePath.[is_]relative_to()` (pythonGH-94469)
  pythongh-99540: Constant hash for _PyNone_Type to aid reproducibility (pythonGH-99541)
  pythongh-100039: enhance __signature__ to work with str and callables (pythonGH-100168)
  pythongh-99830: asyncio: Document returns of remove_{reader,writer} (python#100302)
  "Compound statement" docs: Fix with-statement step indexing (python#100286)
  pythonGH-90043: Handle NaNs in COMPARE_OP_FLOAT_JUMP (pythonGH-100278)
  Improve stats presentation for calls. (pythonGH-100274)
  Better stats for `LOAD_ATTR` and `STORE_ATTR` (pythonGH-100295)
  pythongh-81057: Move the Cached Parser Dummy Name to _PyRuntimeState (python#100277)
  Document that zipfile's pwd parameter is a `bytes` object (python#100209)
  pythongh-99767: mark `PyTypeObject.tp_watched` as internal use only in table (python#100271)
  Fix typo in introduction.rst (python#100266)
  ...
carljm added a commit to carljm/cpython that referenced this issue Dec 19, 2022
* main:
  pythongh-89727: Fix os.walk RecursionError on deep trees (python#99803)
  Docs: Don't upload CI artifacts (python#100330)
  pythongh-94912: Added marker for non-standard coroutine function detection (python#99247)
  Correct CVE-2020-10735 documentation (python#100306)
  pythongh-100272: Fix JSON serialization of OrderedDict (pythonGH-100273)
  pythongh-93649: Split tracemalloc tests from _testcapimodule.c (python#99551)
  Docs: Use `PY_VERSION_HEX` for version comparison (python#100179)
  pythongh-97909: Fix markup for `PyMethodDef` members (python#100089)
  pythongh-99240: Reset pointer to NULL when the pointed memory is freed in argument parsing (python#99890)
  pythongh-99240: Reset pointer to NULL when the pointed memory is freed in argument parsing (python#99890)
  pythonGH-98831: Add DECREF_INPUTS(), expanding to DECREF() each stack input (python#100205)
  pythongh-78707: deprecate passing >1 argument to `PurePath.[is_]relative_to()` (pythonGH-94469)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
3.12 interpreter-core (Objects, Python, Grammar, and Parser dirs) type-feature A feature request or enhancement
Projects
None yet
Development

No branches or pull requests

2 participants