Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Split compiler into code-gen, optimizer and assembler. #87092

Open
markshannon opened this issue Jan 13, 2021 · 9 comments
Open

Split compiler into code-gen, optimizer and assembler. #87092

markshannon opened this issue Jan 13, 2021 · 9 comments
Assignees

Comments

@markshannon
Copy link
Member

markshannon commented Jan 13, 2021

BPO 42926
Nosy @gvanrossum, @markshannon, @corona10, @brandtbucher, @iritkatriel

Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

Show more details

GitHub fields:

assignee = None
closed_at = None
created_at = <Date 2021-01-13.15:03:23.987>
labels = []
title = 'Split compiler into code-gen, optimizer and assembler.'
updated_at = <Date 2022-02-03.22:11:26.525>
user = 'https://github.com/markshannon'

bugs.python.org fields:

activity = <Date 2022-02-03.22:11:26.525>
actor = 'iritkatriel'
assignee = 'none'
closed = False
closed_date = None
closer = None
components = []
creation = <Date 2021-01-13.15:03:23.987>
creator = 'Mark.Shannon'
dependencies = []
files = []
hgrepos = []
issue_num = 42926
keywords = ['patch']
message_count = 2.0
messages = ['385033', '385043']
nosy_count = 5.0
nosy_names = ['gvanrossum', 'Mark.Shannon', 'corona10', 'brandtbucher', 'iritkatriel']
pr_nums = ['31116']
priority = 'normal'
resolution = None
stage = 'patch review'
status = 'open'
superseder = None
type = None
url = 'https://bugs.python.org/issue42926'
versions = []

@markshannon
Copy link
Member Author

Currently the compiler operates in three main passes:

Code-gen
Optimize
Assemble

The problem is that these passes use the same basic-block based CFG, leading to unnecessary coupling and inefficiencies.
A basic block CFG is awkward and error-prone for the code-gen, but not very efficient for the optimizer and assembler.

A better design would be for the code-gen to create a single linear sequence of instructions. The optimizer would take this and produce a list of extended-blocks for the assembler to consume.

code-gen -> (list of instructions) -> optimizer
optimizer -> (list of extended blocks) -> assembler

(Extended blocks have a single entry and multiple exits, unlike basic blocks which have a single entry and single exit)

This would:

  1. Reduce memory use considerably (the size of instruction and block data structures would be about 60% of current)
  2. Be faster (Less CFG management).
  3. Produce better code (extended blocks are a better unit for optimization that basic blocks).
  4. Be easier to maintain:
    a) Code-gen wouldn't have to worry about creating a correct CFG.
    b) The optimizer wouldn't need to handle empty blocks and track which basic blocks form an extended block.

Apart from the changes to the compiler, it would help if we made all branch instructions absolute (or have a backward dual) to accommodate free reordering of blocks in the optimizer.

@gvanrossum
Copy link
Member

SGTM. But I’m not the one who has to work with it.

@ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
iritkatriel added a commit to iritkatriel/cpython that referenced this issue Jul 28, 2022
…compiler's codegen stage does not work directly with basic blocks
@iritkatriel iritkatriel self-assigned this Jul 28, 2022
markshannon pushed a commit that referenced this issue Aug 4, 2022
…er's codegen stage does not work directly with basic blocks (GH-95398)
iritkatriel added a commit that referenced this issue Aug 11, 2022
…he target pointer is only calculated just before optimization stage (GH-95655)
iritkatriel added a commit to iritkatriel/cpython that referenced this issue Aug 24, 2022
@iritkatriel
Copy link
Member

I've been struggling to determine where to draw the line between the optimization stage and the assembly stage.

I think the solution is to add a fourth stage, between optimization and assembly, which prepares the CFG for assembly. It does all the normalisation of pseudo-stuff (instructions, targets, etc) into actual stuff (real opcodes, offsets, etc).

I don't know if there is a name for this stage in compilers parlance. We could call it "resolve instructions" or something like that.

It will include everything to do with:

(1) calculating stackdepth and except targets, replacing exception related opcodes by NOP
(2) moving cold blocks to end of block-list
(3) everything related to making sure line numbers are correct
(4) replacing pseudo-op jumps by actual jumps, including figuring out the direction and translating conditional backwards jumps into real jumps
(5) Finally, calculating the jump offsets

@gvanrossum
Copy link
Member

I don't know what that should be called either, but after this is done, are EXTENDED_ARG prefixes for jumps all set? I'm guessing, maybe make a similar list of responsibilities for the assembler? Because I'm not sure what those are.

@iritkatriel
Copy link
Member

The responsibility of the assembler is to turn a list of instructions to a code object:

(1) write the bytecode for the instructions
(2) create the lineno table
(3) create the exception table
(4) create a code object from those + metadata about the compilation unit.

Bytecode generation in stage (1) adds EXTENDED_ARG bytecodes when an instruction has a large oparg that requires it. The part that calculates jump offsets ((5) in the "resolve" list) takes into account the EXTENDED_ARGs when calculating block sizes.

The idea is that all the complex calculations would be in the new stage, and we can write tests for them. Then the assembler's job is to transform the instructions representation to what we need in the code object (but all the exception/jump targets, lineno, block order etc is already resolved before). Then we can write tests just for this translation stage.

@gvanrossum
Copy link
Member

SG. May the name can be inspired by “settle” or “tidy” or “clean”? Or maybe “resolve” is fine.

@markshannon
Copy link
Member Author

From the list above:
(1) calculating stackdepth and except targets, replacing exception related opcodes by NOP
(2) moving cold blocks to end of block-list
(3) everything related to making sure line numbers are correct
(4) replacing pseudo-op jumps by actual jumps, including figuring out the direction and translating conditional backwards jumps into real jumps
(5) Finally, calculating the jump offsets

And splitting (1) into 1x (exception targets) and 1s (stack depth calculation) and adding O (other cfg optimizations)

In a normal compiler, there are de-sugaring and semantic analysis passes after parsing, but before optimization.
(3) and (1x) belongs in there, as does computing except targets.

(2) belongs in the optimizer

(1s), (4) and (5) belong in the assembler.

Is this a good order of passes? 3, 1x, 2, O, 4, 5, 1s

@iritkatriel
Copy link
Member

I like the idea or moving some of the resolution to before optimisations. I think the order you suggest is fine, by and large. I’ll need to check whether all the line number business is safe to move up front. Some of it involves duplicating blocks,etc.

@iritkatriel
Copy link
Member

The lineno calculation needs to duplicate blocks that have no lineno but more than one predecessors. So it needs to come after mark_reachable, which is currently somewhere in the middle of the optimization stage. So that part of Mark's suggested reordering might be a problem. At least it needs to be done carefully.

iritkatriel added a commit that referenced this issue Sep 20, 2022
carljm added a commit to carljm/cpython that referenced this issue Oct 6, 2022
* main: (66 commits)
  pythongh-65961: Raise `DeprecationWarning` when `__package__` differs from `__spec__.parent` (python#97879)
  docs(typing): add "see PEP 675" to LiteralString (python#97926)
  pythongh-97850: Remove all known instances of module_repr() (python#97876)
  I changed my surname early this year (python#96671)
  pythongh-93738: Documentation C syntax (:c:type:<C type> -> :c:expr:<C type>) (python#97768)
  pythongh-91539: improve performance of get_proxies_environment  (python#91566)
  build(deps): bump actions/stale from 5 to 6 (python#97701)
  pythonGH-95172 Make the same version `versionadded` oneline (python#95172)
  pythongh-88050: Fix asyncio subprocess to kill process cleanly when process is blocked (python#32073)
  pythongh-93738: Documentation C syntax (Function glob patterns -> literal markup) (python#97774)
  pythongh-93357: Port test cases to IsolatedAsyncioTestCase, part 2 (python#97896)
  pythongh-95196: Disable incorrect pickling of the C implemented classmethod descriptors (pythonGH-96383)
  pythongh-97758: Fix a crash in getpath_joinpath() called without arguments (pythonGH-97759)
  pythongh-74696: Pass root_dir to custom archivers which support it (pythonGH-94251)
  pythongh-97661: Improve accuracy of sqlite3.Cursor.fetchone docs (python#97662)
  pythongh-87092: bring compiler code closer to a preprocessing-opt-assembler organisation (pythonGH-97644)
  pythonGH-96704: Add {Task,Handle}.get_context(), use it in call_exception_handler() (python#96756)
  pythongh-93738: Documentation C syntax (:c:type:`PyTypeObject*` -> :c:expr:`PyTypeObject*`) (python#97778)
  pythongh-97825: fix AttributeError when calling subprocess.check_output(input=None) with encoding or errors args (python#97826)
  Add re.VERBOSE flag documentation example (python#97678)
  ...
mpage pushed a commit to mpage/cpython that referenced this issue Oct 11, 2022
iritkatriel added a commit to iritkatriel/cpython that referenced this issue Oct 31, 2022
iritkatriel added a commit to iritkatriel/cpython that referenced this issue Mar 24, 2023
iritkatriel added a commit to iritkatriel/cpython that referenced this issue Mar 24, 2023
iritkatriel added a commit to iritkatriel/cpython that referenced this issue Mar 24, 2023
iritkatriel added a commit to iritkatriel/cpython that referenced this issue Mar 24, 2023
iritkatriel added a commit to iritkatriel/cpython that referenced this issue Mar 24, 2023
Fidget-Spinner pushed a commit to Fidget-Spinner/cpython that referenced this issue Mar 27, 2023
warsaw pushed a commit to warsaw/cpython that referenced this issue Apr 11, 2023
warsaw pushed a commit to warsaw/cpython that referenced this issue Apr 11, 2023
iritkatriel added a commit to iritkatriel/cpython that referenced this issue Apr 11, 2023
iritkatriel added a commit to iritkatriel/cpython that referenced this issue Apr 11, 2023
iritkatriel added a commit that referenced this issue Apr 24, 2023
carljm added a commit to carljm/cpython that referenced this issue Apr 24, 2023
* main: (53 commits)
  pythongh-102498 Clean up unused variables and imports in the email module  (python#102482)
  pythongh-99184: Bypass instance attribute access in `repr` of `weakref.ref` (python#99244)
  pythongh-99032: datetime docs: Encoding is no longer relevant (python#93365)
  pythongh-94300: Update datetime.strptime documentation (python#95318)
  pythongh-103776: Remove explicit uses of $(SHELL) from Makefile (pythonGH-103778)
  pythongh-87092: fix a few cases of incorrect error handling in compiler (python#103456)
  pythonGH-103727: Avoid advancing tokenizer too far in f-string mode (pythonGH-103775)
  Revert "Add tests for empty range equality (python#103751)" (python#103770)
  pythongh-94518: Port 23-argument `_posixsubprocess.fork_exec` to Argument Clinic (python#94519)
  pythonGH-65022: Fix description of copyreg.pickle function (python#102656)
  pythongh-103323: Get the "Current" Thread State from a Thread-Local Variable (pythongh-103324)
  pythongh-91687: modernize dataclass example typing (python#103773)
  pythongh-103746: Test `types.UnionType` and `Literal` types together (python#103747)
  pythongh-103765: Fix 'Warning: py:class reference target not found: ModuleSpec' (pythonGH-103769)
  pythongh-87452: Improve the Popen.returncode docs
  Removed unnecessary escaping of asterisks (python#103714)
  pythonGH-102973: Slim down Fedora packages in the dev container (python#103283)
  pythongh-103091: Add PyUnstable_Type_AssignVersionTag (python#103095)
  Add tests for empty range equality (python#103751)
  pythongh-103712: Increase the length of the type name in AttributeError messages (python#103713)
  ...
carljm added a commit to carljm/cpython that referenced this issue Apr 24, 2023
* superopt: (82 commits)
  pythongh-101517: fix line number propagation in code generated for except* (python#103550)
  pythongh-103780: Use patch instead of mock in asyncio unix events test (python#103782)
  pythongh-102498 Clean up unused variables and imports in the email module  (python#102482)
  pythongh-99184: Bypass instance attribute access in `repr` of `weakref.ref` (python#99244)
  pythongh-99032: datetime docs: Encoding is no longer relevant (python#93365)
  pythongh-94300: Update datetime.strptime documentation (python#95318)
  pythongh-103776: Remove explicit uses of $(SHELL) from Makefile (pythonGH-103778)
  pythongh-87092: fix a few cases of incorrect error handling in compiler (python#103456)
  pythonGH-103727: Avoid advancing tokenizer too far in f-string mode (pythonGH-103775)
  Revert "Add tests for empty range equality (python#103751)" (python#103770)
  pythongh-94518: Port 23-argument `_posixsubprocess.fork_exec` to Argument Clinic (python#94519)
  pythonGH-65022: Fix description of copyreg.pickle function (python#102656)
  pythongh-103323: Get the "Current" Thread State from a Thread-Local Variable (pythongh-103324)
  pythongh-91687: modernize dataclass example typing (python#103773)
  pythongh-103746: Test `types.UnionType` and `Literal` types together (python#103747)
  pythongh-103765: Fix 'Warning: py:class reference target not found: ModuleSpec' (pythonGH-103769)
  pythongh-87452: Improve the Popen.returncode docs
  Removed unnecessary escaping of asterisks (python#103714)
  pythonGH-102973: Slim down Fedora packages in the dev container (python#103283)
  pythongh-103091: Add PyUnstable_Type_AssignVersionTag (python#103095)
  ...
iritkatriel added a commit to iritkatriel/cpython that referenced this issue Apr 27, 2023
iritkatriel added a commit to iritkatriel/cpython that referenced this issue Apr 27, 2023
iritkatriel added a commit to iritkatriel/cpython that referenced this issue Apr 27, 2023
iritkatriel added a commit to iritkatriel/cpython that referenced this issue Apr 28, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants