Split compiler into code-gen, optimizer and assembler. #87092
Currently the compiler operates in three main passes:

1. Code-gen
2. Optimizer
3. Assembler

The problem is that these passes all use the same basic-block-based CFG, which leads to unnecessary coupling and inefficiencies.

A better design would be for the code-gen to create a single linear sequence of instructions. The optimizer would take this and produce a list of extended blocks for the assembler to consume:

code-gen -> (list of instructions) -> optimizer -> (list of extended blocks) -> assembler

(Extended blocks have a single entry and multiple exits, unlike basic blocks, which have a single entry and a single exit.)

This would:
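To make the difference between basic blocks and extended blocks concrete, here is a minimal sketch of how a linear instruction list might be cut into extended blocks. The `Instr` and `ExtendedBlock` classes, the `to_extended_blocks` function and the opcode set are hypothetical, chosen for illustration only; this is not CPython's code. A block starts at every jump target and after every unconditional transfer, while conditional branches stay inside the block as extra exits.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Illustrative set of opnames after which control cannot fall through;
# not CPython's full instruction set.
UNCONDITIONAL = {"JUMP", "RETURN_VALUE", "RAISE_VARARGS"}


@dataclass
class Instr:
    opname: str
    target: Optional[str] = None  # label this instruction may branch to
    label: Optional[str] = None   # set if this instruction is a jump target


@dataclass
class ExtendedBlock:
    instrs: List[Instr] = field(default_factory=list)

    def exits(self):
        """Labels this block can branch to (fall-through not included)."""
        return [i.target for i in self.instrs if i.target is not None]


def to_extended_blocks(instrs):
    """Split a linear instruction list into extended blocks.

    A new block starts at every jump target, so control can only enter a
    block at its top. Conditional branches do not end a block, so one block
    may have several exits. After an unconditional transfer, control cannot
    fall through, so the next instruction starts a new block.
    """
    blocks = []
    current = None
    for ins in instrs:
        if current is None or ins.label is not None:
            current = ExtendedBlock()
            blocks.append(current)
        current.instrs.append(ins)
        if ins.opname in UNCONDITIONAL:
            current = None
    return blocks


# Usage: roughly `if x: return a` followed by `return b`.
code = [
    Instr("LOAD_FAST"),
    Instr("POP_JUMP_IF_FALSE", target="else"),
    Instr("LOAD_CONST"),
    Instr("RETURN_VALUE"),
    Instr("LOAD_CONST", label="else"),
    Instr("RETURN_VALUE"),
]
blocks = to_extended_blocks(code)
print([len(b.instrs) for b in blocks], [b.exits() for b in blocks])
# prints [4, 2] [['else'], []]
```

A basic-block split of the same code would produce three blocks, because the conditional branch would end the first one; the extended-block split keeps it inside the first block as an exit to `else`.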
Apart from the changes to the compiler, it would help if we made all branch instructions absolute (or gave each one a backward dual) to allow free reordering of blocks in the optimizer.
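The point about absolute branches or backward duals can be illustrated with a minimal sketch: once the optimizer picks a block order, each jump has to be lowered to an encoding that can reach its target from its new position. The block layout, the `(opname, target)` tuples and the functions `lay_out` and `encode_jumps` are hypothetical. The opcode names echo the `JUMP_FORWARD`/`JUMP_BACKWARD` pair that CPython 3.11 introduced, but the arithmetic is simplified: every instruction is assumed to occupy one code unit (no `EXTENDED_ARG` prefixes or inline caches), and deltas are measured from the instruction after the jump.

```python
def lay_out(blocks, order):
    """Map each block label to its starting offset (in code units) for `order`."""
    starts, offset = {}, 0
    for label in order:
        starts[label] = offset
        offset += len(blocks[label])
    return starts


def encode_jumps(blocks, order):
    """Lower each JUMP pseudo-instruction to a forward or backward relative jump."""
    starts = lay_out(blocks, order)
    out = []
    for label in order:
        for i, (opname, target) in enumerate(blocks[label]):
            if opname != "JUMP":
                out.append((opname, None))
                continue
            next_offset = starts[label] + i + 1  # offset just after the jump
            delta = starts[target] - next_offset
            if delta >= 0:
                out.append(("JUMP_FORWARD", delta))
            else:
                out.append(("JUMP_BACKWARD", -delta))
    return out


blocks = {
    "a": [("LOAD_FAST", None), ("JUMP", "c")],
    "b": [("RETURN_VALUE", None)],
    "c": [("NOP", None), ("JUMP", "b")],
}
# Same blocks, two layouts: the c -> b jump changes direction.
print(encode_jumps(blocks, ["a", "b", "c"]))
print(encode_jumps(blocks, ["a", "c", "b"]))
```

In the first layout the `c -> b` jump must go backwards, so with only a forward relative jump it would have no encoding; with both directions available (or absolute targets), the optimizer can choose either layout and leave the encoding choice to this lowering step.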
SGTM. But I’m not the one who has to work with it.
…er's codegen stage does not work directly with basic blocks (GH-95398)
…he target pointer is only calculated just before optimization stage (GH-95655)
I've been struggling to determine where to draw the line between the optimization stage and the assembly stage. I think the solution is to add a fourth stage, between optimization and assembly, which prepares the CFG for assembly. It does all the normalisation of pseudo-stuff (instructions, targets, etc.) into actual stuff (real opcodes, offsets, etc.). I don't know if there is a name for this stage in compiler parlance; we could call it "resolve instructions" or something like that. It will include everything to do with:

(1) calculating stack depth and except targets, and replacing exception-related opcodes with NOPs
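As an illustration of the stack-depth part of item (1), here is a minimal sketch of stack-depth calculation as a worklist pass over the CFG. It assumes a hypothetical representation: each block is a list of `(opname, stack_effect, target)` tuples, a block falls through to the next block in `order` unless it ends in an unconditional transfer, and a branch has the same stack effect on both edges (a simplification; in CPython the effect can differ between the jump and fall-through edges). Exception targets are not modelled, and `max_stack_depth` is a hypothetical name.

```python
from collections import deque

# Illustrative: opnames after which control never falls through.
NO_FALLTHROUGH = {"JUMP", "RETURN_VALUE"}


def max_stack_depth(blocks, order):
    """Return the maximum stack depth reached anywhere in the CFG."""
    position = {label: i for i, label in enumerate(order)}
    depth_at_entry = {order[0]: 0}
    worklist = deque([order[0]])
    max_depth = 0

    def propagate(label, depth):
        # Every path into a block must arrive with the same stack depth.
        known = depth_at_entry.get(label)
        if known is None:
            depth_at_entry[label] = depth
            worklist.append(label)
        elif known != depth:
            raise ValueError(f"inconsistent depth at {label!r}: {known} vs {depth}")

    while worklist:
        label = worklist.popleft()
        depth = depth_at_entry[label]
        falls_through = True
        for opname, effect, target in blocks[label]:
            depth += effect
            if depth < 0:
                raise ValueError(f"stack underflow in block {label!r}")
            max_depth = max(max_depth, depth)
            if target is not None:
                propagate(target, depth)
            if opname in NO_FALLTHROUGH:
                falls_through = False
                break
        if falls_through and position[label] + 1 < len(order):
            propagate(order[position[label] + 1], depth)
    return max_depth


blocks = {
    "entry": [
        ("LOAD_FAST", 1, None),
        ("LOAD_FAST", 1, None),
        ("BINARY_OP", -1, None),
        ("POP_JUMP_IF_FALSE", -1, "else"),
        ("LOAD_CONST", 1, None),
        ("RETURN_VALUE", -1, None),
    ],
    "else": [("LOAD_CONST", 1, None), ("RETURN_VALUE", -1, None)],
}
print(max_stack_depth(blocks, ["entry", "else"]))  # prints 2
```

Each block is visited once, with its entry depth fixed by the first path that reaches it; any later path arriving with a different depth is reported as an error, which is the kind of check that is easy to unit-test once it lives in its own stage.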
I don't know what that should be called either, but once this is done, are the EXTENDED_ARG prefixes for jumps all set? Maybe you could make a similar list of responsibilities for the assembler? I'm not sure what those are.
The responsibility of the assembler is to turn a list of instructions into a code object:

(1) write the bytecode for the instructions

Bytecode generation in stage (1) adds EXTENDED_ARG bytecodes when an instruction has an oparg too large to fit in a single byte. The part that calculates jump offsets ((5) in the "resolve" list) takes the EXTENDED_ARGs into account when calculating block sizes.

The idea is that all the complex calculations would be in the new stage, and we can write tests for them. The assembler's job is then to transform the instruction representation into what we need in the code object (all the exception/jump targets, line numbers, block order, etc. are already resolved by then). Then we can write tests just for this translation stage.
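The interaction described above, where jump offsets depend on instruction sizes and instruction sizes depend on how many EXTENDED_ARG prefixes the opargs need, can be sketched as a fixed-point iteration. This is a simplified illustration, not CPython's implementation: it assumes each code unit is one opcode byte plus one argument byte, that jump opargs are absolute targets measured in code units, and it ignores inline cache entries; for brevity it sizes individual instructions rather than whole blocks. `dis.opmap` is the real opcode table from the standard library; `extended_arg_count`, `encode` and `resolve_jump_offsets` are hypothetical helpers.

```python
import dis

EXTENDED_ARG = dis.opmap["EXTENDED_ARG"]


def extended_arg_count(oparg):
    """Number of EXTENDED_ARG prefixes needed for a non-negative oparg."""
    if oparg < 0:
        raise ValueError("oparg must be non-negative")
    count = 0
    while oparg > 0xFF:  # each code unit carries only one byte of argument
        oparg >>= 8
        count += 1
    return count


def encode(opcode, oparg):
    """Emit (opcode, arg_byte) code units, high-order argument bytes first."""
    units = [(EXTENDED_ARG, (oparg >> shift) & 0xFF)
             for shift in range(8 * extended_arg_count(oparg), 0, -8)]
    units.append((opcode, oparg & 0xFF))
    return units


def resolve_jump_offsets(instrs):
    """instrs: list of (opname, oparg, target_index); target_index is None
    for non-jumps, and jump opargs start as a placeholder 0.
    Returns (opargs, offsets) once instruction sizes stop changing."""
    opargs = [arg for _, arg, _ in instrs]
    while True:
        offsets, pos = [], 0
        for arg in opargs:
            offsets.append(pos)
            pos += 1 + extended_arg_count(arg)  # size in code units
        new_opargs = [offsets[target] if target is not None else arg
                      for _, arg, target in instrs]
        if new_opargs == opargs:
            return opargs, offsets
        opargs = new_opargs


# A jump over 300 NOPs: its target no longer fits in one byte, so the jump
# gains an EXTENDED_ARG prefix, which in turn shifts the target by one unit.
instrs = [("JUMP", 0, 301)] + [("NOP", 0, None)] * 300 + [("RETURN_VALUE", 0, None)]
opargs, offsets = resolve_jump_offsets(instrs)
print(opargs[0], offsets[301])  # prints 302 302
units = encode(dis.opmap["NOP"], opargs[0])
print([arg for _, arg in units])  # prints [1, 46]  (302 == 0x012E)
```

Starting from placeholder targets of 0, jump opargs can only grow between iterations, so instruction sizes only grow and the loop terminates. Keeping this calculation separate from `encode` is what lets the byte-writing step stay a simple translation.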
SG. Maybe the name could be inspired by “settle”, “tidy” or “clean”? Or maybe “resolve” is fine.
From the list above, splitting (1) into 1x (exception targets) and 1s (stack depth calculation), and adding …

In a normal compiler, there are de-sugaring and semantic analysis passes after parsing but before optimization. (2) belongs in the optimizer; (1s), (4) and (5) belong in the assembler. Is this a good order of passes?
I like the idea of moving some of the resolution to before optimisations. I think the order you suggest is fine, by and large. I’ll need to check whether all the line number business is safe to move up front. Some of it involves duplicating blocks, etc.
The lineno calculation needs to duplicate blocks that have no lineno but more than one predecessor, so it needs to come after mark_reachable, which is currently somewhere in the middle of the optimization stage. That part of Mark's suggested reordering might therefore be a problem; at the very least, it needs to be done carefully.
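A minimal sketch of why the duplication depends on reachability: predecessors should be counted among reachable blocks only, otherwise a dead block can trigger a needless copy. The `Block` class and `duplicate_lineless_exits` are hypothetical, and `mark_reachable` here is only a simplified stand-in for the step of that name mentioned above. The sketch deliberately handles only exit blocks (blocks with no successors), so copies never add predecessors to other blocks; it is not CPython's implementation.

```python
import copy
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Block:
    name: str
    lineno: Optional[int]  # None means the block has no line number of its own
    successors: List[str] = field(default_factory=list)


def mark_reachable(blocks, entry):
    """Return the names of all blocks reachable from `entry`."""
    seen, stack = set(), [entry]
    while stack:
        name = stack.pop()
        if name not in seen:
            seen.add(name)
            stack.extend(blocks[name].successors)
    return seen


def duplicate_lineless_exits(blocks, entry):
    """Give each predecessor of a shared, line-number-less exit its own copy."""
    reachable = mark_reachable(blocks, entry)
    # Count predecessors among reachable blocks only; iterating the dict
    # (not the set) keeps predecessor order deterministic.
    live = [name for name in blocks if name in reachable]
    preds = {name: [] for name in live}
    for name in live:
        for succ in blocks[name].successors:
            preds[succ].append(name)
    for name in live:
        block = blocks[name]
        if block.successors or block.lineno is not None or len(preds[name]) < 2:
            continue
        first, *others = preds[name]
        for i, pred in enumerate(others, start=1):
            clone = copy.deepcopy(block)
            clone.name = f"{name}_copy{i}"
            clone.lineno = blocks[pred].lineno  # inherit the predecessor's line
            blocks[clone.name] = clone
            blocks[pred].successors = [clone.name if s == name else s
                                       for s in blocks[pred].successors]
        block.lineno = blocks[first].lineno
    return blocks


blocks = {
    "entry": Block("entry", 1, ["then", "else"]),
    "then": Block("then", 2, ["exit"]),
    "else": Block("else", 4, ["exit"]),
    "exit": Block("exit", None, []),     # shared exit with no line number
    "dead": Block("dead", 9, ["exit"]),  # unreachable, must not count
}
duplicate_lineless_exits(blocks, "entry")
print(blocks["exit"].lineno, blocks["exit_copy1"].lineno, blocks["else"].successors)
# prints 2 4 ['exit_copy1']
```

Without the reachability step, the dead block would count as a third predecessor of `exit` and receive a copy of its own, which is why the line-number pass cannot simply be moved ahead of dead-code detection.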
…e the CFG optimization stage (GH-96935)
…embler organisation (pythonGH-97644)
…ed from assembler stage (GH-99869)
markshannon commented Jan 13, 2021
edited by bedevere-bot
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.
GitHub fields:
bugs.python.org fields: