We need to be consistent in our use of instruction/codeunit/bytecode/opcode, etc.

**Documentation**

We use the terms opcode, bytecode, instruction, and codeunit in the code, comments, and documentation.

However, we aren't consistent, nor do we define those terms properly anywhere. The best documentation is in `dis.rst`, which is the wrong place for it.

#### A glossary

First of all, we want a glossary, something like this:
* Instruction: The unit of execution used by the front end to describe code. All instructions have a *name*. Most, but not all, also have an *operand*.
* Execution-Unit: These can be considered the "real instructions" used by the interpreter. The assembler converts each *instruction* into zero or more *execution-units*. Instructions that are converted to anything other than exactly one *execution-unit* with the same name are called *pseudo-instructions*.
* Code-Unit: A pair of bytes consisting of an *opcode* followed by an *oparg*. In the *bytecode*, each *execution-unit* is represented by one or more *code-units*.
* Bytecode: A sequence of *code-units* that represents the code of a function, class, module, or other code entity.
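One way to pin these terms down is to model them as types. This is a minimal sketch, assuming hypothetical class names (`Instruction`, `ExecutionUnit`, `CodeUnit`, `Bytecode`) that do not exist in CPython or the `dis` module:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Instruction:
    """What the front end emits: a name plus an optional operand."""
    name: str
    operand: Optional[int] = None


@dataclass(frozen=True)
class ExecutionUnit:
    """A "real" instruction as seen by the interpreter.

    The operand may exceed 255; it is split across code-units later.
    """
    name: str
    operand: int = 0


@dataclass(frozen=True)
class CodeUnit:
    """Two bytes of bytecode: an opcode followed by an oparg."""
    opcode: int
    oparg: int

    def __post_init__(self) -> None:
        # Each field must fit in a single byte.
        if not (0 <= self.opcode <= 255 and 0 <= self.oparg <= 255):
            raise ValueError("opcode and oparg must each fit in one byte")


# Bytecode is then simply a sequence of code-units.
Bytecode = list[CodeUnit]
```

Keeping `ExecutionUnit.operand` unbounded while `CodeUnit` is limited to bytes makes it explicit where operand extension has to happen.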

Representation of instructions at runtime:
The assembler converts each instruction to zero or more *execution-units*, and each of those is converted to one or more *code-units*.
An *execution-unit* is composed of:
* Zero or more operand extensions. These are code-units whose `opcode == EXTENDED_ARG` and whose `oparg` carries one of the higher-order bytes of the *execution-unit*'s operand, most significant byte first.
* One core code-unit, whose `opcode` represents the *name* of the *execution-unit*, and whose `oparg == (operand & 255)`.
* Zero or more cache entries. Their number is determined entirely by the *execution-unit*'s *name*.
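A minimal sketch of how an *execution-unit* might be lowered to *code-units* under the rules above. The function `encode` and the `CACHE_ENTRIES` table are hypothetical; the count of 6 for `LOAD_ATTR` is borrowed from the example in this issue, and real cache counts are defined inside CPython and differ between versions. Code-units are shown as `(name, oparg)` pairs rather than raw bytes so the sketch does not depend on opcode numbering.

```python
# Hypothetical cache-size table; only LOAD_ATTR is listed, using the
# count from this issue's example. Unlisted names get no caches.
CACHE_ENTRIES = {"LOAD_ATTR": 6}


def encode(name: str, operand: int) -> list[tuple[str, int]]:
    """Lower one execution-unit to a list of (opname, oparg) code-units."""
    if operand < 0:
        raise ValueError("operand must be non-negative")
    # Split the operand into bytes, least significant first...
    chunks = []
    while True:
        chunks.append(operand & 0xFF)
        operand >>= 8
        if operand == 0:
            break
    # ...then reverse so the most significant byte comes first.
    chunks.reverse()
    # Every byte except the lowest becomes an EXTENDED_ARG prefix.
    units = [("EXTENDED_ARG", byte) for byte in chunks[:-1]]
    # The core code-unit carries the low 8 bits of the operand.
    units.append((name, chunks[-1]))
    # Cache entries: the count depends only on the name.
    units.extend([("CACHE", 0)] * CACHE_ENTRIES.get(name, 0))
    return units
```

For example, `encode("LOAD_ATTR", 515)` produces `EXTENDED_ARG 2`, `LOAD_ATTR 3`, and six `CACHE 0` entries.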

Although the bytecode, `co.co_code`, is presented as a sequence of bytes, it should be viewed as a sequence of code-units, each an *opcode* followed by its *oparg*. The `dis` module disassembles *bytecode* into a list of *code-units*.
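To make the "sequence of code-units" view concrete, here is a sketch that pairs up the bytes of `co_code` and then folds `EXTENDED_ARG` prefixes back into full operands, skipping cache entries. It relies only on `dis.opmap` and `dis.opname`; the helper names `code_units` and `execution_units` are hypothetical. It assumes the 2-byte code-unit layout used since CPython 3.6, and that cache entries appear in `co_code` with the `CACHE` opcode (3.11+; earlier versions have no caches).

```python
import dis
from typing import Iterator

EXTENDED_ARG = dis.opmap["EXTENDED_ARG"]
CACHE = dis.opmap.get("CACHE")  # None before 3.11


def code_units(co_code: bytes) -> Iterator[tuple[int, int]]:
    """View raw bytecode as (opcode, oparg) code-units."""
    return zip(co_code[0::2], co_code[1::2])


def execution_units(co_code: bytes) -> Iterator[tuple[str, int]]:
    """Fold EXTENDED_ARG prefixes and drop caches, yielding (name, operand)."""
    ext = 0
    for opcode, oparg in code_units(co_code):
        if CACHE is not None and opcode == CACHE:
            continue  # cache entries are not part of the operand
        if opcode == EXTENDED_ARG:
            # Shift the accumulated high bytes up to make room for the next.
            ext = (ext | oparg) << 8
            continue
        yield dis.opname[opcode], ext | oparg
        ext = 0
```

Applied to the two code-units `EXTENDED_ARG 2`, `LOAD_ATTR 3`, `execution_units` yields the single pair `("LOAD_ATTR", 515)`.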

#### Why do this?

Doing this will expose inconsistencies in our terminology and tools and allow us to consider better tooling in the future.

For example, shouldn't `dis` output a list of **instructions**, not code-units?

Could we support an assembler that accepts backwards-compatible assembly code?
For example, we could convert a list of 3.10 instructions to 3.11 bytecode. At the instruction level the two versions aren't so different, even though the bytecode is quite different.

The set of instruction names is unbounded (it is not limited to the 256 values an opcode byte can hold), giving us more flexibility to add new instructions and to keep supporting old ones.

#### Examples

The `BINARY_ADD` instruction is also an execution-unit in 3.10, but could be a pseudo-instruction in 3.11+.
The same goes for `SETUP_FINALLY`. The difference is that the 3.11 front end emits `SETUP_FINALLY` but not `BINARY_ADD`.
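The relationship between pseudo-instructions and execution-units can be sketched as a lowering table. Everything here is illustrative: `PSEUDO`, `lower`, and `is_pseudo` are hypothetical, and `NB_ADD = 0` is an assumption about the `BINARY_OP` operand that means `+` in 3.11. A real assembler would also have to record the `SETUP_FINALLY` handler in the exception table, which this sketch omits.

```python
from typing import Callable, Optional

NB_ADD = 0  # assumed BINARY_OP operand for "+" in 3.11

# Maps a pseudo-instruction's name to a function producing its execution-units.
Lowering = Callable[[Optional[int]], list[tuple[str, int]]]

PSEUDO: dict[str, Lowering] = {
    # A 3.10-style BINARY_ADD lowers to the generic BINARY_OP.
    "BINARY_ADD": lambda _operand: [("BINARY_OP", NB_ADD)],
    # SETUP_FINALLY lowers to no execution-units at all.
    "SETUP_FINALLY": lambda _operand: [],
}


def lower(name: str, operand: Optional[int] = None) -> list[tuple[str, int]]:
    """Convert one instruction into zero or more execution-units."""
    if name in PSEUDO:
        return PSEUDO[name](operand)
    # Ordinary instruction: exactly one execution-unit with the same name.
    return [(name, operand or 0)]


def is_pseudo(name: str, operand: Optional[int] = None) -> bool:
    """Glossary rule: anything but one execution-unit with the same name."""
    units = lower(name, operand)
    return not (len(units) == 1 and units[0][0] == name)
```

Under this model, `BINARY_ADD` becomes `[("BINARY_OP", 0)]`, `SETUP_FINALLY` becomes `[]`, and both count as pseudo-instructions.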

* Instruction: `LOAD_METHOD "spam"`
* Execution-unit: `LOAD_ATTR 515`
* Code-units: `EXTENDED_ARG 2`, `LOAD_ATTR 3`, then `CACHE 0` repeated 6 times
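The numbers in this example can be checked by hand. The sketch assumes the 3.12 encoding of `LOAD_ATTR`, in which the low bit of the operand marks a method load and the remaining bits index `co_names`; under that assumption `"spam"` would sit at index 257 of `co_names`, a detail the example implies rather than states.

```python
operand = 515

# Split the operand across the EXTENDED_ARG prefix and the core code-unit.
high, low = divmod(operand, 256)
print(high, low)  # 2 3  ->  EXTENDED_ARG 2, LOAD_ATTR 3

# Assumed 3.12 LOAD_ATTR encoding: low bit set means "load a method",
# the remaining bits index co_names.
is_method = operand & 1
name_index = operand >> 1
print(is_method, name_index)  # 1 257

# 1 EXTENDED_ARG + 1 core code-unit + 6 caches, two bytes each.
size_in_bytes = (1 + 1 + 6) * 2
print(size_in_bytes)  # 16
```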

We need to be consistent in our use of instruction/codeunit/bytecode/opcode, etc. #94437