New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Split compiler into code-gen, optimizer and assembler. #87092
Comments
Currently the compiler operates in three main passes: Code-gen The problem is that these passes use the same basic-block based CFG, leading to unnecessary coupling and inefficiencies. A better design would be for the code-gen to create a single linear sequence of instructions. The optimizer would take this and produce a list of extended-blocks for the assembler to consume. code-gen -> (list of instructions) -> optimizer (Extended blocks have a single entry and multiple exits, unlike basic blocks which have a single entry and single exit) This would:
Apart from the changes to the compiler, it would help if we made all branch instructions absolute (or have a backward dual) to accommodate free reordering of blocks in the optimizer. |
SGTM. But I’m not the one who has to work with it. |
…compiler's codegen stage does not work directly with basic blocks
…er's codegen stage does not work directly with basic blocks (GH-95398)
…he target pointer is only calculated just before optimization stage (GH-95655)
I've been struggling to determine where to draw the line between the optimization stage and the assembly stage. I think the solution is to add a fourth stage, between optimization and assembly, which prepares the CFG for assembly. It does all the normalisation of pseudo-stuff (instructions, targets, etc) into actual stuff (real opcodes, offsets, etc). I don't know if there is a name for this stage in compilers parlance. We could call it "resolve instructions" or something like that. It will include everything to do with: (1) calculating stackdepth and except targets, replacing exception related opcodes by NOP |
I don't know what that should be called either, but after this is done, are EXTENDED_ARG prefixes for jumps all set? I'm guessing, maybe make a similar list of responsibilities for the assembler? Because I'm not sure what those are. |
The responsibility of the assembler is to turn a list of instructions to a code object: (1) write the bytecode for the instructions Bytecode generation in stage (1) adds EXTENDED_ARG bytecodes when an instruction has a large oparg that requires it. The part that calculates jump offsets ((5) in the "resolve" list) takes into account the EXTENDED_ARGs when calculating block sizes. The idea is that all the complex calculations would be in the new stage, and we can write tests for them. Then the assembler's job is to transform the instructions representation to what we need in the code object (but all the exception/jump targets, lineno, block order etc is already resolved before). Then we can write tests just for this translation stage. |
SG. May the name can be inspired by “settle” or “tidy” or “clean”? Or maybe “resolve” is fine. |
From the list above: And splitting (1) into 1x (exception targets) and 1s (stack depth calculation) and adding In a normal compiler, there are de-sugaring and semantic analysis passes after parsing, but before optimization. (2) belongs in the optimizer (1s), (4) and (5) belong in the assembler. Is this a good order of passes? |
I like the idea or moving some of the resolution to before optimisations. I think the order you suggest is fine, by and large. I’ll need to check whether all the line number business is safe to move up front. Some of it involves duplicating blocks,etc. |
The lineno calculation needs to duplicate blocks that have no lineno but more than one predecessors. So it needs to come after mark_reachable, which is currently somewhere in the middle of the optimization stage. So that part of Mark's suggested reordering might be a problem. At least it needs to be done carefully. |
…e the CFG optimization stage (GH-96935)
* main: (66 commits) pythongh-65961: Raise `DeprecationWarning` when `__package__` differs from `__spec__.parent` (python#97879) docs(typing): add "see PEP 675" to LiteralString (python#97926) pythongh-97850: Remove all known instances of module_repr() (python#97876) I changed my surname early this year (python#96671) pythongh-93738: Documentation C syntax (:c:type:<C type> -> :c:expr:<C type>) (python#97768) pythongh-91539: improve performance of get_proxies_environment (python#91566) build(deps): bump actions/stale from 5 to 6 (python#97701) pythonGH-95172 Make the same version `versionadded` oneline (python#95172) pythongh-88050: Fix asyncio subprocess to kill process cleanly when process is blocked (python#32073) pythongh-93738: Documentation C syntax (Function glob patterns -> literal markup) (python#97774) pythongh-93357: Port test cases to IsolatedAsyncioTestCase, part 2 (python#97896) pythongh-95196: Disable incorrect pickling of the C implemented classmethod descriptors (pythonGH-96383) pythongh-97758: Fix a crash in getpath_joinpath() called without arguments (pythonGH-97759) pythongh-74696: Pass root_dir to custom archivers which support it (pythonGH-94251) pythongh-97661: Improve accuracy of sqlite3.Cursor.fetchone docs (python#97662) pythongh-87092: bring compiler code closer to a preprocessing-opt-assembler organisation (pythonGH-97644) pythonGH-96704: Add {Task,Handle}.get_context(), use it in call_exception_handler() (python#96756) pythongh-93738: Documentation C syntax (:c:type:`PyTypeObject*` -> :c:expr:`PyTypeObject*`) (python#97778) pythongh-97825: fix AttributeError when calling subprocess.check_output(input=None) with encoding or errors args (python#97826) Add re.VERBOSE flag documentation example (python#97678) ...
…embler organisation (pythonGH-97644)
…ompile.c there, so they can be shared
…, which do not need the compiler struct (python#102562)
…, which do not need the compiler struct (python#102562)
* main: (53 commits) pythongh-102498 Clean up unused variables and imports in the email module (python#102482) pythongh-99184: Bypass instance attribute access in `repr` of `weakref.ref` (python#99244) pythongh-99032: datetime docs: Encoding is no longer relevant (python#93365) pythongh-94300: Update datetime.strptime documentation (python#95318) pythongh-103776: Remove explicit uses of $(SHELL) from Makefile (pythonGH-103778) pythongh-87092: fix a few cases of incorrect error handling in compiler (python#103456) pythonGH-103727: Avoid advancing tokenizer too far in f-string mode (pythonGH-103775) Revert "Add tests for empty range equality (python#103751)" (python#103770) pythongh-94518: Port 23-argument `_posixsubprocess.fork_exec` to Argument Clinic (python#94519) pythonGH-65022: Fix description of copyreg.pickle function (python#102656) pythongh-103323: Get the "Current" Thread State from a Thread-Local Variable (pythongh-103324) pythongh-91687: modernize dataclass example typing (python#103773) pythongh-103746: Test `types.UnionType` and `Literal` types together (python#103747) pythongh-103765: Fix 'Warning: py:class reference target not found: ModuleSpec' (pythonGH-103769) pythongh-87452: Improve the Popen.returncode docs Removed unnecessary escaping of asterisks (python#103714) pythonGH-102973: Slim down Fedora packages in the dev container (python#103283) pythongh-103091: Add PyUnstable_Type_AssignVersionTag (python#103095) Add tests for empty range equality (python#103751) pythongh-103712: Increase the length of the type name in AttributeError messages (python#103713) ...
* superopt: (82 commits) pythongh-101517: fix line number propagation in code generated for except* (python#103550) pythongh-103780: Use patch instead of mock in asyncio unix events test (python#103782) pythongh-102498 Clean up unused variables and imports in the email module (python#102482) pythongh-99184: Bypass instance attribute access in `repr` of `weakref.ref` (python#99244) pythongh-99032: datetime docs: Encoding is no longer relevant (python#93365) pythongh-94300: Update datetime.strptime documentation (python#95318) pythongh-103776: Remove explicit uses of $(SHELL) from Makefile (pythonGH-103778) pythongh-87092: fix a few cases of incorrect error handling in compiler (python#103456) pythonGH-103727: Avoid advancing tokenizer too far in f-string mode (pythonGH-103775) Revert "Add tests for empty range equality (python#103751)" (python#103770) pythongh-94518: Port 23-argument `_posixsubprocess.fork_exec` to Argument Clinic (python#94519) pythonGH-65022: Fix description of copyreg.pickle function (python#102656) pythongh-103323: Get the "Current" Thread State from a Thread-Local Variable (pythongh-103324) pythongh-91687: modernize dataclass example typing (python#103773) pythongh-103746: Test `types.UnionType` and `Literal` types together (python#103747) pythongh-103765: Fix 'Warning: py:class reference target not found: ModuleSpec' (pythonGH-103769) pythongh-87452: Improve the Popen.returncode docs Removed unnecessary escaping of asterisks (python#103714) pythonGH-102973: Slim down Fedora packages in the dev container (python#103283) pythongh-103091: Add PyUnstable_Type_AssignVersionTag (python#103095) ...
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.
Show more details
GitHub fields:
bugs.python.org fields:
The text was updated successfully, but these errors were encountered: