Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

gh-96143: Allow Linux perf profiler to see Python calls #96123

Merged
merged 50 commits into from Aug 30, 2022

Conversation

pablogsal
Copy link
Member

@pablogsal pablogsal commented Aug 19, 2022

Automerge-Triggered-By: GH:pablogsal

Objects/codeobject.c Outdated Show resolved Hide resolved
Python/ceval.c Outdated Show resolved Hide resolved
Copy link
Member

@markshannon markshannon left a comment

I have no idea if that assembly code does what you say, but I've reviewed the rest.

Definitely an intriguing idea.

Include/cpython/code.h Outdated Show resolved Hide resolved
Include/internal/pycore_ceval.h Outdated Show resolved Hide resolved
Objects/codeobject.c Outdated Show resolved Hide resolved
Objects/codeobject.c Outdated Show resolved Hide resolved
Objects/codeobject.c Outdated Show resolved Hide resolved
Objects/codeobject.c Outdated Show resolved Hide resolved
Objects/codeobject.c Outdated Show resolved Hide resolved
Python/ceval.c Outdated Show resolved Hide resolved
@pablogsal pablogsal force-pushed the perf branch 2 times, most recently from 9b009f2 to 439ef28 Compare Aug 19, 2022
@tiran
Copy link
Member

tiran commented Aug 19, 2022

You can use preprocesor macros if you name the file Objects/asm_trampoline.S or .sx instead of Objects/asm_trampoline.s. The .sx form needs a makefile rule.

@pablogsal
Copy link
Member Author

pablogsal commented Aug 19, 2022

You can use preprocesor macros if you name the file Objects/asm_trampoline.S or .sx instead of Objects/asm_trampoline.s. The .sx form needs a makefile rule.

Hummm, not sure I follow, could you maybe show me an example of what we can achieve with this?

@tiran
Copy link
Member

tiran commented Aug 19, 2022

You can have multiple implementations in the same file:

    .text
    .globl	_Py_trampoline_func_start
_Py_trampoline_func_start:
#ifdef __x86_64__
    push   %rbp
    mov    %rsp,%rbp
    mov    %rdi,%rax
    mov    %rsi,%rdi
    mov    %rdx,%rsi
    mov    %ecx,%edx
    call   *%rax
    pop    %rbp
    ret
#endif // __x86_64__
#ifdef __aarch64__
    TODO
#endif
    .globl	_Py_trampoline_func_end
_Py_trampoline_func_end:

Include/internal/pycore_ceval.h Outdated Show resolved Hide resolved
@gpshead gpshead added type-feature A feature request or enhancement interpreter-core Interpreter core (Objects, Python, Grammar, and Parser dirs) labels Aug 20, 2022
@pablogsal pablogsal marked this pull request as ready for review Aug 20, 2022
@pablogsal pablogsal requested a review from a team as a code owner Aug 20, 2022
@pablogsal pablogsal force-pushed the perf branch 3 times, most recently from df2a40d to dda9e04 Compare Aug 20, 2022
@pablogsal pablogsal changed the title Allow Linux perf profiler to see Python calls gh-96143: Allow Linux perf profiler to see Python calls Aug 20, 2022
@pablogsal pablogsal requested a review from markshannon Aug 20, 2022
@pablogsal pablogsal force-pushed the perf branch 3 times, most recently from f38dfc2 to 22fc892 Compare Aug 20, 2022
@pablogsal pablogsal force-pushed the perf branch 2 times, most recently from a2f4182 to 1b35ed3 Compare Aug 20, 2022
@tiran
Copy link
Member

tiran commented Aug 20, 2022

Why did you add a Windows build file? How about we do not define _PyPerfTrampoline_Init unless HAVE_PERF_TRAMPOLINE is defined?

@pablogsal
Copy link
Member Author

pablogsal commented Aug 20, 2022

Why did you add a Windows build file? How about we do not define _PyPerfTrampoline_Init unless HAVE_PERF_TRAMPOLINE is defined?

Then we need to add more ifdef every place is used but that works as well for sure

Objects/perf_trampoline.c Outdated Show resolved Hide resolved
Objects/perf_trampoline.c Outdated Show resolved Hide resolved
Objects/perf_trampoline.c Outdated Show resolved Hide resolved
Python/sysmodule.c Outdated Show resolved Hide resolved
@pablogsal
Copy link
Member Author

pablogsal commented Aug 25, 2022

I have added some user-facing docs. Feel free to push to my PR if you want to add something

Doc/howto/instrumentation.rst Outdated Show resolved Hide resolved
Doc/howto/perf_profiling.rst Outdated Show resolved Hide resolved
Doc/howto/perf_profiling.rst Show resolved Hide resolved
Doc/howto/perf_profiling.rst Outdated Show resolved Hide resolved
tiran
tiran approved these changes Aug 25, 2022
they now match the current code.
Copy link
Member

@gpshead gpshead left a comment

Two thoughts have been running around in my mind:

(1) We've got the command line flag and sys API, but what about just enabling this system wide or at least task/job wide within a container? An environment variable that turns it on would make that easy without plumbing options through. PYTHONPERFSUPPORT=1 as another equivalent to -Xperf? That's usually done in initconfig.c alongside the -X processing I believe.

Second: multiprocessing spawn start method. If the -Xperf flag was passed to the parent process I assume the "spawn" method should also pass that to its children.

@pablogsal
Copy link
Member Author

pablogsal commented Aug 29, 2022

Two thoughts have been running around in my mind:

(1) We've got the command line flag and sys API, but what about just enabling this system wide or at least task/job wide within a container? An environment variable that turns it on would make that easy without plumbing options through. PYTHONPERFSUPPORT=1 as another equivalent to -Xperf? That's usually done in initconfig.c alongside the -X processing I believe.

Second: multiprocessing spawn start method. If the -Xperf flag was passed to the parent process I assume the "spawn" method should also pass that to its children.

I think (1) makes sense and is indeed coherent with how we handle some of the other options like tracemalloc so I will add an environment variable alongside the -X option.

Regarding (2) I am not that sure. We don't do this for the rest of the flags that we pass around and you can achieve the same with the environment variable if you want. In any case, as that would require some tests I would prefer to do that in a separate PR as this is already gigantic

@pablogsal
Copy link
Member Author

pablogsal commented Aug 29, 2022

I implemented the environment variable and solved a bunch of conflicts. Please check it out when you have time.

I also had to solve a bunch of conflicts. @gpshead if you are ok with the current status I would like us to land if everything looks good as the size of the PR is already attracting a bunch of merge conflicts in the build system, clinic and other files.

@pablogsal
Copy link
Member Author

pablogsal commented Aug 29, 2022

Ah wait, I need to document the environment variable. Will push a commit for that soon.

@pablogsal
Copy link
Member Author

pablogsal commented Aug 29, 2022

Ah wait, I need to document the environment variable. Will push a commit for that soon.

Done 👍

@pablogsal
Copy link
Member Author

pablogsal commented Aug 29, 2022

@erlend-aasland has mentioned that he was a bunch of docs improvements that he will do in a separate PR.

@pablogsal pablogsal requested a review from gpshead Aug 29, 2022
Copy link
Member

@gpshead gpshead left a comment

My comments are easy to address things that don't need further review - minor edits or added note/todo comments to our future selves.

Agreed that the docs can use some polishing up but all the important bits are there to seed that future work, thanks for writing them!

Thanks for taking on adding this feature! I expect we'll see how a backport fares internally.

Doc/howto/perf_profiling.rst Show resolved Hide resolved
Doc/using/cmdline.rst Show resolved Hide resolved
Lib/test/test_perf_profiler.py Show resolved Hide resolved
Lib/test/test_perf_profiler.py Show resolved Hide resolved
Objects/perf_trampoline.c Show resolved Hide resolved
Objects/perf_trampoline.c Show resolved Hide resolved
Objects/perf_trampoline.c Show resolved Hide resolved
@pablogsal pablogsal added the 🤖 automerge PR will be merged once it's been approved and all CI passed label Aug 30, 2022
@miss-islington miss-islington merged commit 6d791a9 into python:main Aug 30, 2022
14 checks passed
@pablogsal
Copy link
Member Author

pablogsal commented Aug 30, 2022

Damn, @miss-islington has landed the PR without waiting for the CI to build the last commit I pushed so I created #96433.

@pablogsal
Copy link
Member Author

pablogsal commented Aug 30, 2022

Thanks, everyone for the fantastic reviews and for helping to get this feature ready ❤️

You all rock 🤘

@pablogsal
Copy link
Member Author

pablogsal commented Aug 30, 2022

@erlend-aasland You can make the doc improvement PR after #96433 lands.

@jjerphan
Copy link
Contributor

jjerphan commented Aug 30, 2022

Thanks for all for pushing for better interoperability with perf(1). I highly appreciate this contribution. 🙏

@vstinner
Copy link
Member

vstinner commented Aug 30, 2022

Nice feature!

There is a typo in sys.deactivate_stack_trampoline() docstring: "Dectivate the perf profiler trampoline": Deactivate.


In the doc, python -m sysconfig | grep HAVE_PERF_TRAMPOLINE can be replaced with python -c "import sysconfig; print(bool(sysconfig.get_config_var('PY_HAVE_PERF_TRAMPOLINE')))" to not rely on the external grep command. At least, I suggest to add PY_ to fix the variable name:

python -m sysconfig | grep PY_HAVE_PERF_TRAMPOLINE

The second command can be replaced with python -c "import sysconfig; print('no-omit-frame-pointer' in sysconfig.get_config_var('PY_CORE_CFLAGS'))".

@erlend-aasland
Copy link
Contributor

erlend-aasland commented Aug 30, 2022

There is a typo in sys.deactivate_stack_trampoline() docstring: "Dectivate the perf profiler trampoline": Deactivate.

[...]

See #96445 :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
interpreter-core Interpreter core (Objects, Python, Grammar, and Parser dirs) 🤖 automerge PR will be merged once it's been approved and all CI passed type-feature A feature request or enhancement
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet