
gh-92777: Add LOAD_METHOD_LAZY_DICT #92778

Open · wants to merge 6 commits into base: main
Conversation

@Fidget-Spinner (Member) commented May 13, 2022

Fixes #92777. This specializes LOAD_METHOD for lazy dictionaries, which account for 40% of the misses.

I'm sad that I missed the 3.11 beta freeze for this specialization. It's straightforward and is likely to account for the majority of LOAD_METHOD calls in real-world code, since lazy __dict__ is now commonplace.
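For context, here is a minimal sketch of the pattern this specialization targets: repeated bound-method calls on instances whose __dict__ has not been materialized, versus code (like typing) that touches __dict__ directly. Point and norm2 are invented names for illustration, not anything from the patch.

```python
class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y

    def norm2(self):
        return self.x * self.x + self.y * self.y

p = Point(3, 4)
# In 3.11+, p starts with no materialized __dict__; repeated lookups like
# p.norm2 are the LOAD_METHOD pattern the lazy-dict specialization targets.
total = sum(p.norm2() for _ in range(10))
print(total)  # 250

# Touching __dict__ (as typing frequently does) forces the instance dict
# to be created, taking the object off the lazy-dict fast path.
print(vars(p))  # {'x': 3, 'y': 4}
```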

@Fidget-Spinner (Member, Author) commented May 13, 2022

Hah, looks like I was wrong; it wasn't that straightforward after all :).

@AlexWaygood added the type-feature and performance labels May 13, 2022
@markshannon (Member) left a comment

A few minor issues, but generally looks good.
What are the stats for the LOAD_METHOD_LAZY_DICT instruction?

Three review threads (outdated, resolved) on Python/ceval.c and Python/specialize.c.
@Fidget-Spinner (Member, Author) commented May 13, 2022

How do you collect stats for pyperformance and create that nice table on faster-cpython? I'm frankly clueless (I only know how to use the one that dumps stats out to the terminal or file). Sorry.

On test suite code, I get a 0.3% improvement on hits and 0.6% more misses. But I want to point out that typing loves messing around with __dict__, so test_typing may not be representative. Pyperformance will likely see a noticeable bump in hits with fewer misses.

./python -m test test_typing test_re test_dis test_zlib


Before:
opcode[160].specializable : 1
    opcode[160].specialization.success : 1395
    opcode[160].specialization.failure : 1146
    opcode[160].specialization.hit : 1026699
    opcode[160].specialization.deferred : 78612
    opcode[160].specialization.miss : 8664
    opcode[160].specialization.deopt : 156
    opcode[160].execution_count : 18435
    opcode[160].specialization.failure_kinds[0] : 63
    opcode[160].specialization.failure_kinds[1] : 39
    opcode[160].specialization.failure_kinds[2] : 505
    opcode[160].specialization.failure_kinds[4] : 406
    opcode[160].specialization.failure_kinds[9] : 4
    opcode[160].specialization.failure_kinds[10] : 3
    opcode[160].specialization.failure_kinds[17] : 33
    opcode[160].specialization.failure_kinds[18] : 23
    opcode[160].specialization.failure_kinds[19] : 1
    opcode[160].specialization.failure_kinds[22] : 69


After:
opcode[160].specializable : 1
    opcode[160].specialization.success : 1399
    opcode[160].specialization.failure : 1107
    opcode[160].specialization.hit : 1029041
    opcode[160].specialization.deferred : 76217
    opcode[160].specialization.miss : 8717
    opcode[160].specialization.deopt : 157
    opcode[160].execution_count : 18488
    opcode[160].specialization.failure_kinds[0] : 63
    opcode[160].specialization.failure_kinds[2] : 505
    opcode[160].specialization.failure_kinds[4] : 406
    opcode[160].specialization.failure_kinds[9] : 4
    opcode[160].specialization.failure_kinds[10] : 3
    opcode[160].specialization.failure_kinds[17] : 33
    opcode[160].specialization.failure_kinds[18] : 23
    opcode[160].specialization.failure_kinds[19] : 1
    opcode[160].specialization.failure_kinds[22] : 69

@Fidget-Spinner (Member, Author) commented May 13, 2022

Wow, looks like my expectations were proven wrong by the stats again: after removing test_typing, I get a 0.18% increase in hits at the expense of 0.45% more misses. So test_typing was actually bolstering the numbers!

The part of the stdlib I've found that frequently uses this instruction is the _io module's objects. But I don't know how to get stats on those, as their tests use subprocesses.

I'm not feeling too confident about this optimization now. It seems like something that would boost our pyperformance numbers but maybe not in the real world?

./python -m test test_re test_dis test_zlib


Before:
opcode[160].specializable : 1
    opcode[160].specialization.success : 2113
    opcode[160].specialization.failure : 4126
    opcode[160].specialization.hit : 5506338
    opcode[160].specialization.deferred : 306030
    opcode[160].specialization.miss : 44980
    opcode[160].specialization.deopt : 745
    opcode[160].execution_count : 59474
    opcode[160].specialization.failure_kinds[0] : 365
    opcode[160].specialization.failure_kinds[1] : 172
    opcode[160].specialization.failure_kinds[2] : 770
    opcode[160].specialization.failure_kinds[4] : 2417
    opcode[160].specialization.failure_kinds[9] : 12
    opcode[160].specialization.failure_kinds[10] : 4
    opcode[160].specialization.failure_kinds[17] : 150
    opcode[160].specialization.failure_kinds[18] : 58
    opcode[160].specialization.failure_kinds[19] : 2
    opcode[160].specialization.failure_kinds[22] : 139
    opcode[160].specialization.failure_kinds[23] : 37



After:
opcode[160].specializable : 1
    opcode[160].specialization.success : 2127
    opcode[160].specialization.failure : 3954
    opcode[160].specialization.hit : 5516541
    opcode[160].specialization.deferred : 295625
    opcode[160].specialization.miss : 45182
    opcode[160].specialization.deopt : 748
    opcode[160].execution_count : 59676
    opcode[160].specialization.failure_kinds[0] : 365
    opcode[160].specialization.failure_kinds[2] : 770
    opcode[160].specialization.failure_kinds[4] : 2417
    opcode[160].specialization.failure_kinds[9] : 12
    opcode[160].specialization.failure_kinds[10] : 4
    opcode[160].specialization.failure_kinds[17] : 150
    opcode[160].specialization.failure_kinds[18] : 58
    opcode[160].specialization.failure_kinds[19] : 2
    opcode[160].specialization.failure_kinds[22] : 139
    opcode[160].specialization.failure_kinds[23] : 37
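As a sanity check on the quoted percentages, the deltas can be recomputed from the hit and miss counts in the dumps above (a quick sketch; the numbers are copied from the "Before" and "After" blocks):

```python
# Hit/miss counts for LOAD_METHOD (opcode 160) from the dumps above.
before = {"hit": 5_506_338, "miss": 44_980}
after = {"hit": 5_516_541, "miss": 45_182}

def pct_change(b, a):
    """Percentage change from b to a."""
    return (a - b) / b * 100

hit_delta = pct_change(before["hit"], after["hit"])
miss_delta = pct_change(before["miss"], after["miss"])
print(f"hits: {hit_delta:+.3f}%  misses: {miss_delta:+.3f}%")
# hits: +0.185%  misses: +0.449%
```

These round to the 0.18% more hits and 0.45% more misses quoted in the comment.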

@markshannon (Member) commented May 13, 2022

Generating the table is somewhat manual and hacky. I mean to automate it, but for now here's the procedure:

  1. Create a new branch and cherry-pick this commit: faster-cpython@a9c92c0
  2. Run pyperformance compile on that branch. I use this config.ini file: https://gist.github.com/markshannon/26f4e8db2b715c991eee1508f430f6b2 You will need to modify it for your machine and repo.
  3. While it is in the installing phase, create /tmp/py_stats (if needed) and clear it out: rm -r /tmp/py_stats/*
  4. About the time that the installing phase finishes and the benchmarks start, run one final rm -r /tmp/py_stats/*

The table is created by running ./python Tools/scripts/summarize_stats.py
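For the curious, the summarization step is essentially a merge of per-process counter dumps. Here is a hedged sketch of that aggregation, assuming each file in /tmp/py_stats holds "key : value" lines like the dumps above; aggregate is an invented helper for illustration, not the actual script's API:

```python
import collections
import io

def aggregate(streams):
    """Sum 'key : value' counter lines across multiple stat dumps."""
    totals = collections.Counter()
    for stream in streams:
        for line in stream:
            if ":" not in line:
                continue
            key, _, value = line.partition(":")
            totals[key.strip()] += int(value.strip())
    return totals

# Two per-process dumps being merged:
a = io.StringIO(
    "opcode[160].specialization.hit : 100\n"
    "opcode[160].specialization.miss : 2\n"
)
b = io.StringIO("opcode[160].specialization.hit : 50\n")
totals = aggregate([a, b])
print(totals["opcode[160].specialization.hit"])  # 150
```

The real summarize_stats.py additionally formats the merged counters into the markdown table shown on faster-cpython.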

Labels: awaiting core review, performance, type-feature

4 participants