GH-91719: Make MSVC generate somewhat faster switch code #91718

gvanrossum · 2022-04-20T01:28:20Z

Apparently a switch on an 8-bit quantity where all cases are
present generates a more efficient jump (doing only one indexed
memory load instead of two).

See faster-cpython/ideas#321 (comment)
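
To make the one-load versus two-load distinction concrete, here is a minimal C sketch. It is a model of the idea, not MSVC's actual output: the handler functions, table names, and opcode values are all hypothetical, and function-pointer tables stand in for the compiler's jump tables. The assumption is that for a switch with gaps the compiler first maps the value to a compact case index and then looks up the target, whereas with all 256 values present it can index the target table directly.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical handlers standing in for the bodies of switch cases. */
typedef int (*handler_fn)(void);

static int op_nop(void)     { return 100; }
static int op_load(void)    { return 200; }
static int op_default(void) { return -1; }

enum { CASE_NOP, CASE_LOAD, CASE_DEFAULT, N_CASES };

/* Two-load scheme: opcode -> compact case index -> target. */
static uint8_t case_index[256];
static const handler_fn compact_targets[N_CASES] = { op_nop, op_load, op_default };

/* One-load scheme: a dense 256-entry table, opcode -> target. */
static handler_fn direct_targets[256];

static void init_tables(void) {
    for (int i = 0; i < 256; i++) {
        case_index[i] = CASE_DEFAULT;   /* values without a case share one target */
    }
    case_index[0] = CASE_NOP;           /* illustrative opcode numbers only */
    case_index[1] = CASE_LOAD;
    for (int i = 0; i < 256; i++) {
        direct_targets[i] = compact_targets[case_index[i]];
        assert(direct_targets[i] != 0); /* every 8-bit value has a target */
    }
}

static int dispatch_two_loads(uint8_t opcode) {
    uint8_t idx = case_index[opcode];   /* first indexed load */
    return compact_targets[idx]();      /* second indexed load, then jump */
}

static int dispatch_one_load(uint8_t opcode) {
    return direct_targets[opcode]();    /* single indexed load, then jump */
}
```

After calling init_tables(), both functions return the same result for every 8-bit value; the dense version simply skips the intermediate lookup, at the cost of a larger table.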

markshannon · 2022-04-20T15:18:50Z

Would it make more sense to redefine opcode to be a uint8_t, rather than casting it?

We should probably make use_tracing an 8 bit unsigned integer as well.

gvanrossum · 2022-04-20T15:38:33Z

> Would it make more sense to redefine opcode to be a uint8_t, rather than casting it?

Yeah, I had considered that, it makes sense. I'll confirm that it has the same effect.

> We should probably make use_tracing an 8 bit unsigned integer as well.

I don't see why -- it's not used in a similar switch AFAICT, and it's not cramped for space in its struct. I assume for most other operations the cost of loading an int and loading a byte is effectively the same, since the CPU has to load a whole cache line (32 or 64 bytes) anyway.

gvanrossum · 2022-04-20T15:39:24Z

I don't believe this needs a news blurb.

markshannon · 2022-04-20T15:42:46Z

> I don't see why -- it's not used in a similar switch AFAICT, and it's not cramped for space in its struct. I assume for most other operations the cost of loading an int and loading a byte is effectively the same, since the CPU has to load a whole cache line (32 or 64 bytes) anyway.

The dispatch sequence includes opcode |= cframe.use_tracing.
If cframe.use_tracing is a 32 bit int, then the compiler needs to add a cast.
If it is the same type as opcode, it does not.
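
A small C sketch of the two variants, for illustration only. The struct and function names are hypothetical stand-ins, not CPython's real definitions, and the sketch assumes use_tracing only ever holds 0 or 255 so that the OR either leaves the opcode alone or forces it to 255. Note that C's integer promotions still apply to the 8-bit version in the abstract machine; the point is that the compiler has no wider operand to load and narrow.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical frame-state structs; only the field type differs. */
struct cframe_int { int     use_tracing; };  /* 32-bit flag */
struct cframe_u8  { uint8_t use_tracing; };  /* 8-bit flag, same type as opcode */

/* With an int flag, the OR is computed on ints and the result has to be
   narrowed back to 8 bits, written here as an explicit cast. */
static uint8_t next_opcode_int(uint8_t opcode, const struct cframe_int *cf) {
    assert(cf != 0);
    opcode = (uint8_t)(opcode | cf->use_tracing);
    return opcode;
}

/* With a uint8_t flag, both operands are already one byte wide, so a
   compiler is free to emit a single byte-wide OR. */
static uint8_t next_opcode_u8(uint8_t opcode, const struct cframe_u8 *cf) {
    assert(cf != 0);
    opcode |= cf->use_tracing;
    return opcode;
}
```

Both functions compute the same value; whether the generated code actually differs depends on the compiler and is worth checking in the disassembly.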

gvanrossum · 2022-04-20T18:14:01Z

> The dispatch sequence includes opcode |= cframe.use_tracing.
> If cframe.use_tracing is a 32 bit int, then the compiler needs to add a cast.
> If it is the same type as opcode, it does not.

Okay, I'll make that change.

gvanrossum · 2022-04-20T19:07:47Z

@markshannon, please re-review. I confirmed that the switch still uses a single indirection (goto *(base + offset_table[opcode])). I also found that the opcode |= use_tracing is a single instruction (but maybe it always was one?).

markshannon · 2022-04-21T09:12:26Z

Looks good to me

Make MSVC generate somewhat faster switch code

ece341c

gvanrossum requested a review from markshannon as a code owner Apr 20, 2022

bedevere-bot added the awaiting core review label Apr 20, 2022

gvanrossum changed the title ~~Make MSVC generate somewhat faster switch code~~ GH-91719: Make MSVC generate somewhat faster switch code Apr 20, 2022

gvanrossum mentioned this pull request Apr 20, 2022

Measure Windows performance (and improve if lacking) faster-cpython/ideas#321

Open

gvanrossum added the skip news label Apr 20, 2022

Make opcode and use_tracing uint8_t

51ec108

erlend-aasland reviewed Apr 20, 2022

Python/ceval.c (outdated, resolved)

Tools/scripts/generate_opcode_h.py (outdated, resolved)

gvanrossum added 2 commits Apr 20, 2022

Fix/move comment about opcode and switch

6603d85

No need to check for opcode 255 any more

558d2c4

gvanrossum merged commit f8dc618 into python:main Apr 21, 2022
12 checks passed

bedevere-bot removed the awaiting core review label Apr 21, 2022
