bpo-37958: Adding get_profile_dict to pstats #15495

Merged
merged 9 commits into from Jan 15, 2020

Olshansk commented Aug 26, 2019

pstats is really useful for profiling a block of code and printing the results, but on multiple occasions I've wanted to access that output directly as a dictionary that I can analyze or manipulate further.

The proposal is to add a function called get_profile_dict to pstats that returns this data in an easily accessible dict.

The following script:

import cProfile, pstats
import pprint
from pstats import f8

def fib(n):
    if n == 0:
        return 0
    if n == 1:
        return 1
    return fib(n-1) + fib(n-2)

pr = cProfile.Profile()
pr.enable()
fib(5)
pr.create_stats()

ps = pstats.Stats(pr).sort_stats('tottime', 'cumtime')

def get_profile_dict(self, keys_filter=None):
    """
        Returns a dict where the key is a function name and the value is a dict
        with the following keys:
            - ncalls
            - tottime
            - percall_tottime
            - cumtime
            - percall_cumtime
            - file_name
            - line_number

        keys_filter can be optionally set to limit the key-value pairs in the
        retrieved dict.
    """
    pstats_dict = {}
    func_list = self.fcn_list[:] if self.fcn_list else list(self.stats.keys())

    if not func_list:
        return pstats_dict

    pstats_dict["total_tt"] = float(f8(self.total_tt))
    for func in func_list:
        cc, nc, tt, ct, callers = self.stats[func]
        file, line, func_name = func
        ncalls = str(nc) if nc == cc else (str(nc) + '/' + str(cc))
        tottime = float(f8(tt))
        percall_tottime = -1 if nc == 0 else float(f8(tt/nc))
        cumtime = float(f8(ct))
        percall_cumtime = -1 if cc == 0 else float(f8(ct/cc))
        func_dict = {
            "ncalls": ncalls,
            "tottime": tottime,  # time spent in this function alone
            "percall_tottime": percall_tottime,
            "cumtime": cumtime,  # time spent in this function plus everything it called
            "percall_cumtime": percall_cumtime,
            "file_name": file,
            "line_number": line,
        }
        func_dict_filtered = func_dict if not keys_filter else { key: func_dict[key] for key in keys_filter }
        pstats_dict[func_name] = func_dict_filtered

    return pstats_dict

pp = pprint.PrettyPrinter(depth=6)
pp.pprint(get_profile_dict(ps))

will produce:

{"<method 'disable' of '_lsprof.Profiler' objects>": {'cumtime': 0.0,
                                                      'file_name': '~',
                                                      'line_number': 0,
                                                      'ncalls': '1',
                                                      'percall_cumtime': 0.0,
                                                      'percall_tottime': 0.0,
                                                      'tottime': 0.0},
 'create_stats': {'cumtime': 0.0,
                  'file_name': '/usr/local/Cellar/python/3.7.4/Frameworks/Python.framework/Versions/3.7/lib/python3.7/cProfile.py',
                  'line_number': 50,
                  'ncalls': '1',
                  'percall_cumtime': 0.0,
                  'percall_tottime': 0.0,
                  'tottime': 0.0},
 'fib': {'cumtime': 0.0,
         'file_name': 'get_profile_dict.py',
         'line_number': 5,
         'ncalls': '15/1',
         'percall_cumtime': 0.0,
         'percall_tottime': 0.0,
         'tottime': 0.0},
 'total_tt': 0.0}

For example, the resulting dict can be fed to a visualization tool to build a stacked column chart, which makes program bottlenecks easier to spot.
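
A minimal sketch of that kind of post-processing, assuming a dict in the shape printed above. The helper name rank_by_tottime and the sample timings are hypothetical and only illustrate the idea; the one detail taken from the proposal is that the scalar 'total_tt' entry sits next to the per-function dicts and has to be skipped.

```python
def rank_by_tottime(profile, top=3):
    """Return (function name, tottime) pairs, largest tottime first."""
    rows = [
        (name, entry["tottime"])
        for name, entry in profile.items()
        if isinstance(entry, dict)  # skip the scalar 'total_tt' entry
    ]
    rows.sort(key=lambda row: row[1], reverse=True)
    return rows[:top]


# Hypothetical sample in the same shape as the output above; timings are made up.
sample = {
    "total_tt": 0.35,
    "fib": {"ncalls": "15/1", "tottime": 0.30, "cumtime": 0.30},
    "create_stats": {"ncalls": "1", "tottime": 0.04, "cumtime": 0.05},
    "helper": {"ncalls": "2", "tottime": 0.01, "cumtime": 0.01},
}

ranking = rank_by_tottime(sample, top=2)
```

The ranked pairs could then be handed to whichever charting library is in use; none is assumed here.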

https://bugs.python.org/issue37958

Automerge-Triggered-By: @gpshead

the-knights-who-say-ni commented Aug 26, 2019

Hello, and thanks for your contribution!

I'm a bot set up to make sure that the project can legally accept your contribution by verifying you have signed the PSF contributor agreement (CLA).

Unfortunately we couldn't find an account corresponding to your GitHub username on bugs.python.org (b.p.o) to verify you have signed the CLA (this might be simply due to a missing "GitHub Name" entry in your b.p.o account settings). This is necessary for legal reasons before we can look at your contribution. Please follow the steps outlined in the CPython devguide to rectify this issue.

You can check yourself to see if the CLA has been received.

Thanks again for your contribution, we look forward to reviewing it!

pablogsal commented Aug 26, 2019

Hi @Olshansk and thank you for opening a PR. Before we can review, could you:

@Olshansk Olshansk changed the title Adding get_profile_dict to pstats bpo37958 - Adding get_profile_dict to pstats Aug 27, 2019

Olshansk commented Aug 27, 2019

@the-knights-who-say-ni I've signed the CLA

@pablogsal I've created https://bugs.python.org/issue37958, but I'm not sure whether I filled everything out correctly. I also mentioned it in the title and created a blurb (Olshansk@2bdb8ab).

Olshansk commented Aug 29, 2019

@pablogsal I was wondering whom I should message to get some feedback about this idea?

@Olshansk Olshansk changed the title bpo37958 - Adding get_profile_dict to pstats bpo-37958: Adding get_profile_dict to pstats Sep 23, 2019
Daniel Olshansky added 2 commits Sep 23, 2019

gpshead left a comment

This needs unittest coverage (see Lib/test/test_pstats.py).

bedevere-bot commented Sep 25, 2019

A Python core developer has requested some changes be made to your pull request before we can consider merging it. If you could please address their requests along with any other requests in other reviews from core developers that would be appreciated.

Once you have made the requested changes, please leave a comment on this pull request containing the phrase "I have made the requested changes; please review again". I will then notify any core developers who have left a review that you're ready for them to take another look at this pull request.

Daniel Olshansky and others added 2 commits Dec 8, 2019
WIP
- Move to using dataclasses
- Add comments/documentation
- Add tests
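
A rough sketch of what moving from plain dicts to dataclasses could look like, assuming the same seven per-function fields listed in the original docstring. The names FunctionEntry, ProfileSummary, and from_profile_dict are hypothetical illustrations, not the names used in the merged change.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class FunctionEntry:
    """Per-function numbers, mirroring the keys of the proposed dict."""
    ncalls: str
    tottime: float
    percall_tottime: float
    cumtime: float
    percall_cumtime: float
    file_name: str
    line_number: int


@dataclass
class ProfileSummary:
    """Total time plus one FunctionEntry per function name."""
    total_tt: float
    functions: Dict[str, FunctionEntry] = field(default_factory=dict)


def from_profile_dict(profile):
    """Convert the dict form into the typed form (hypothetical helper)."""
    functions = {
        name: FunctionEntry(**entry)
        for name, entry in profile.items()
        if name != "total_tt"  # the scalar total is stored separately
    }
    return ProfileSummary(total_tt=profile.get("total_tt", 0.0), functions=functions)


# The 'fib' entry copied from the output shown in the PR description.
sample = {
    "total_tt": 0.0,
    "fib": {
        "ncalls": "15/1",
        "tottime": 0.0,
        "percall_tottime": 0.0,
        "cumtime": 0.0,
        "percall_cumtime": 0.0,
        "file_name": "get_profile_dict.py",
        "line_number": 5,
    },
}

summary = from_profile_dict(sample)
```

Attribute access (summary.functions["fib"].tottime) replaces string keys, so a misspelled field name fails immediately instead of raising a KeyError deep inside analysis code.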

Olshansk commented Dec 8, 2019

I have made the requested changes; please review again.

Sorry for being MIA and taking so long to follow up on this. I've replied to all the comments and would appreciate it if you took another look!

bedevere-bot commented Dec 8, 2019

Thanks for making the requested changes!

@gpshead: please review the changes made to this pull request.

@bedevere-bot bedevere-bot requested a review from gpshead Dec 8, 2019

Olshansk commented Dec 29, 2019

@gpshead Just wanted to follow up on this.

bedevere-bot commented Jan 8, 2020

A Python core developer has requested some changes be made to your pull request before we can consider merging it. If you could please address their requests along with any other requests in other reviews from core developers that would be appreciated.

Once you have made the requested changes, please leave a comment on this pull request containing the phrase "I have made the requested changes; please review again". I will then notify any core developers who have left a review that you're ready for them to take another look at this pull request.

@gpshead gpshead self-assigned this Jan 8, 2020
- Return correct default value
- Make tests output better errors with assertIn
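
To illustrate why assertIn gives better errors than a bare truth check, here is a small self-contained sketch. The failure_message helper and the sample names are hypothetical and are not taken from Lib/test/test_pstats.py; only the unittest assertion methods themselves are real.

```python
import unittest


def failure_message(check):
    """Run one assertion on a throwaway TestCase and return its failure text."""
    case = unittest.TestCase()
    try:
        check(case)
    except AssertionError as exc:
        return str(exc)
    return ""


# Hypothetical function names that a profile is expected to contain.
names = ["fib", "create_stats"]

# assertIn reports both the missing member and the container it searched.
with_in = failure_message(lambda tc: tc.assertIn("missing", names))

# assertTrue only sees the already-evaluated boolean, so the context is lost.
with_true = failure_message(lambda tc: tc.assertTrue("missing" in names))
```

Here with_in names the missing key and lists the container, while with_true only reports that False is not true.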

Olshansk commented Jan 9, 2020

I have made the requested changes; please review again.

bedevere-bot commented Jan 9, 2020

Thanks for making the requested changes!

@gpshead: please review the changes made to this pull request.

@bedevere-bot bedevere-bot requested a review from gpshead Jan 9, 2020
gpshead added 2 commits Jan 15, 2020

miss-islington commented Jan 15, 2020

@Olshansk: Status check is done, and it's a failure.

miss-islington commented Jan 15, 2020

@Olshansk: Status check is done, and it's a success.

@miss-islington miss-islington merged commit 01602ae into python:master Jan 15, 2020
9 checks passed
- Docs
- Windows (x86)
- Windows (x64)
- macOS
- Ubuntu
- Azure Pipelines PR #20200115.28 succeeded
- bedevere/issue-number: Issue number 37958 found
- bedevere/news: News entry found in Misc/NEWS.d
- continuous-integration/travis-ci/pr: The Travis CI build passed
6 participants