
Allow the linux perf profiler to see Python calls #96143

Open
pablogsal opened this issue Aug 20, 2022 · 5 comments
Labels: 3.12, type-feature (A feature request or enhancement)

Comments

@pablogsal
Member

pablogsal commented Aug 20, 2022

The Linux perf profiler is a very powerful tool, but unfortunately it is not able to see Python calls (only the C stack), and therefore neither it nor its very complete ecosystem can be used to profile Python applications and extensions.

It turns out that Node and the JVM have developed a way to leverage the perf profiler for Java and JavaScript frames. They use their JIT compilers to generate a unique area in memory where they place assembly code that in turn calls the frame evaluator function. These JIT-compiled areas are unique per function/code object. They then use perf maps: perf allows a process to place a map file at /tmp/perf-PID.map with information mapping the JIT-ed areas to a string that identifies them, which lets perf map Java/JavaScript names to those areas, basically showing the non-native function names on the stack.
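For reference, the map file perf reads is plain text, one line per JIT-ed region in the form `START SIZE name` with START and SIZE in hex. A hypothetical `/tmp/perf-1234.map` (the addresses and file names below are made up, using the `py::` naming the Python version adopts):

```
7f3a5c000000 b0 py::fibonacci:/home/user/fib.py
7f3a5c0000b0 b0 py::main:/home/user/fib.py
```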

We can do a simple version of this idea in Python by using a very simple JIT compiler that compiles an assembly template, which is then used to jump to PyEval_EvalFrameDefault, and by placing the code object names and filenames in the special perf map file. This allows perf to see Python calls as well:

[image: perf_names]
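To make the bookkeeping concrete, here is a minimal Python sketch of the map-writing side, assuming the trampolines are generated elsewhere (the real implementation lives in C inside the interpreter; `open_perf_map`, `write_map_entry`, `trampoline_addr` and `trampoline_size` are hypothetical names for illustration):

```python
import os

def open_perf_map():
    # perf looks for /tmp/perf-<PID>.map when symbolizing JIT-ed regions
    return open(f"/tmp/perf-{os.getpid()}.map", "a")

def write_map_entry(map_file, trampoline_addr, trampoline_size, code):
    # One "START SIZE name" line per trampoline; START and SIZE in hex.
    # The py:: prefix makes the Python frames easy to spot (and filter).
    name = f"py::{code.co_qualname}:{code.co_filename}"
    map_file.write(f"{trampoline_addr:x} {trampoline_size:x} {name}\n")
```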

And this works with all the tools in the perf ecosystem, like flamegraphs:

[image: perf_flame]

See also:
https://www.brendangregg.com/Slides/KernelRecipes_Perf_Events.pdf

@pablogsal
Member Author

pablogsal commented Aug 21, 2022

It is also very easy to transform these into Python-only flamegraphs by filtering on the py:: prefix:
[image: perf]
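A minimal sketch of such a filter, assuming the usual flamegraph pipeline (`perf script | stackcollapse-perf.pl | <filter> | flamegraph.pl`) and its collapsed `frame;frame;frame count` line format:

```python
import sys

# Keep only frames carrying the py:: prefix written into the perf map,
# dropping native frames so the resulting flamegraph is Python-only.
for line in sys.stdin:
    stack, _, count = line.rpartition(" ")
    frames = [f for f in stack.split(";") if f.startswith("py::")]
    if frames:
        print(f"{';'.join(frames)} {count.strip()}")
```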

miss-islington pushed a commit that referenced this issue Aug 30, 2022
⚠️  ⚠️ Note for reviewers, hackers and fellow systems/low-level/compiler engineers ⚠️ ⚠️ 

If you have a lot of experience with this kind of shenanigans and want to improve the **first** version, **please make a PR against my branch** or **reach out by email** or **suggest code changes directly on GitHub**. 

If you have any **refinements or optimizations**, please wait until the first version is merged before hacking on or proposing them, so we can keep this PR productive.
pablogsal added a commit that referenced this issue Aug 30, 2022
…#96433)

* gh-96132: Add some comments and minor fixes missed in the original PR

* Update Doc/using/cmdline.rst

Co-authored-by: Kumar Aditya <59607654+kumaraditya303@users.noreply.github.com>

erlend-aasland added a commit to erlend-aasland/cpython that referenced this issue Aug 30, 2022
@gpshead
Member

gpshead commented Aug 31, 2022

  • A Linux buildbot with PYTHONPERFSUPPORT=1 and the relevant CFLAGS=-fno-omit-frame-pointer needs to be set up.
  • Something needs to garbage collect its /tmp/perf-$pid.map files as well.

miss-islington pushed a commit that referenced this issue Sep 1, 2022
minor missed test cleanup to use the modern API from the big review.

Automerge-Triggered-By: GH:gpshead
@pablogsal
Member Author

pablogsal commented Sep 1, 2022

Something needs to garbage collect its /tmp/perf-$pid.map files as well.

That's really up to the user, unfortunately. The files must be available after the process finishes and at report time, so I don't see how we can clean these files up automatically: we don't know when the user is finished with them.


Or do you mean in the buildbot? In that case, the tests already delete the files they create, so they should not pollute the machine and won't pile up on the buildbots.

@gpshead
Member

gpshead commented Sep 2, 2022

Or do you mean in the buildbot? In that case, the tests already delete the files they create, so they should not pollute the machine and won't pile up on the buildbots.

Yes, I was talking about the desired buildbot config. A tmpwatcher of some form, set to remove tmp files more than a few hours old, is likely sufficient. With the environment variable set to enable perf everywhere, a single test run probably has hundreds if not thousands of PIDs. ;)

(This is where the buildbot design really shows age. A fresh container per buildbot worker test session would make sense.)
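For illustration, a minimal sketch of the kind of tmpwatcher described above, assuming a periodic (cron-style) job; the three-hour threshold is an arbitrary choice:

```python
import glob
import os
import time

MAX_AGE = 3 * 60 * 60  # assumed threshold: three hours

now = time.time()
for path in glob.glob("/tmp/perf-*.map"):
    try:
        if now - os.stat(path).st_mtime > MAX_AGE:
            os.remove(path)  # stale map from a long-finished process
    except FileNotFoundError:
        pass  # a parallel cleanup or the owner removed it first
```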

@pablogsal
Member Author

pablogsal commented Sep 2, 2022

Yes, I was talking about the desired buildbot config. A tmpwatcher of some form, set to remove tmp files more than a few hours old, is likely sufficient. With the environment variable set to enable perf everywhere, a single test run probably has hundreds if not thousands of PIDs. ;)

Ah, in that case I don't think it's needed. The tests check the perf files before and after running and delete any new file that matches a PID spawned during the tests. I will revise this logic to ensure it works correctly when running parallel test suites, but I think that should be enough.
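A hedged sketch of that before/after bookkeeping (the real logic lives in the CPython test suite; `perf_map_snapshot` is a made-up helper for illustration):

```python
import glob
import os

def perf_map_snapshot():
    # every perf map file currently present in /tmp
    return set(glob.glob("/tmp/perf-*.map"))

before = perf_map_snapshot()
# ... run the perf-support tests here, spawning child interpreters ...
for leftover in perf_map_snapshot() - before:
    os.remove(leftover)  # delete only the maps created during the run
```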
