Skip to content

gh-92041: Improve performance of inspect.getmodule #92042

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

mdeck
Copy link

@mdeck mdeck commented Apr 29, 2022

getmodule maintains a cache of module names & associated file names.
Whenever it encounters a module filename not in its cache, it iterates
over sys.modules and repopulates its cache.

An object which has no associated module filename will trigger this
repopulation loop every time getmodule is called. The cost is easily
seen with inspect.getstack

def run_test(module):
    # cols have stack depths
    # rows have len(sys.modules)

    def print_line(h, *vals):
        if not h:
            print("%5s" % h, *["%6i" % v for v in vals])
        else:
            print("%5s" % h, *["%6.1f" % v for v in vals])

    def add_modules(n):
        import random
        import sys
        for _ in range(n):
            sys.modules[f"foo_{random.randint(0,2**64)}"] = module

    def measure(depth):
        import inspect
        import timeit

        def nest(level):
            if level > 0:
                return nest(level-1)
            else:
                dur = timeit.timeit(lambda: inspect.stack(), number=1)
                return dur * 1000

        return nest(depth)

    import sys
    print_line("", *[2**e for e in range(7)])
    for i in range(1, 5):
        add_modules(n=10**i)
        times = [measure(depth=2**e) for e in range(7)]
        print_line(len(sys.modules), *times)

python 3.11.0a7+ optimized build ====

# sys has no __file__ and doesn't enter getmodule cache
>>> import sys
>>> run_test(sys)
           1      2      4      8     16     32     64
   87    2.1    0.8    1.0    1.3    2.0    3.3    6.1
  187    2.0    2.5    2.7    3.6    5.5    9.5   16.8
 1187   13.6   15.7   19.1   26.3   39.6   67.4  123.3
11187  134.1  150.3  183.4  252.0  383.5  750.3 1189.8
>>> exit()

>>> import random
>>> run_test(random)
           1      2      4      8     16     32     64
   87    2.1    0.7    0.9    1.2    1.7    2.9    5.6
  187    2.3    1.1    1.5    1.8    2.5    4.3    7.8
 1187   17.0    4.3    5.6    7.1   11.1   18.6   33.9
11187  164.6   37.4   45.7   61.6   94.7  161.0  297.7
>>> exit()

python 3.11.0a7+ optimized build, with changes ====

>>> import sys
>>> run_test(sys)
           1      2      4      8     16     32     64
   87    1.7    0.2    0.2    0.3    0.4    0.6    1.2
  187    0.2    0.2    0.2    0.2    0.3    0.5    0.9
 1187    0.2    0.2    0.2    0.3    0.3    0.5    0.9
11187    0.3    0.3    0.3    0.3    0.5    0.8    1.3
>>> exit()

>>> import random
>>> run_test(random)
           1      2      4      8     16     32     64
   87    1.8    0.2    0.2    0.3    0.4    0.6    1.2
  187    0.2    0.2    0.2    0.3    0.3    0.6    0.9
 1187    0.2    0.2    0.2    0.3    0.4    0.5    0.9
11187    0.3    0.2    0.2    0.3    0.3    0.6    1.0
>>> exit()

@ghost
Copy link

ghost commented Apr 29, 2022

All commit authors signed the Contributor License Agreement.
CLA signed

@bedevere-bot
Copy link

Every change to Python requires a NEWS entry.

Please, add it using the blurb_it Web app or the blurb command-line tool.

@mdeck mdeck changed the title Improve performance of inspect.getmodule gh-92041: Improve performance of inspect.getmodule Apr 29, 2022
@bedevere-bot
Copy link

Every change to Python requires a NEWS entry.

Please, add it using the blurb_it Web app or the blurb command-line tool.

Copy link
Contributor

@eendebakpt eendebakpt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good. Is the new case also covered by tests?

@mdeck
Copy link
Author

mdeck commented Apr 30, 2022

The recursion-break test suggested code objects should be distinguished by object instance rather than by their hash alone. I'm using id & hash together as best effort to uniquely identify code objects that have no module. I assume identity comparison with weakref would be the ideal solution, but the performance is O(N) rather than O(1), with N = len(sys.modules). The identity approximation seemed a reasonable trade-off to me.

@mdeck mdeck force-pushed the fix_inspect branch 6 times, most recently from 9bbf165 to 725a326 Compare April 30, 2022 18:26
@mdeck mdeck marked this pull request as draft April 30, 2022 18:43
@mdeck mdeck marked this pull request as ready for review April 30, 2022 20:49
@mdeck
Copy link
Author

mdeck commented May 1, 2022

Rebased past some bad upstream that broke CI.

You can view the weakref implementation I tested here..
I'm not sure why the performance degraded so much with weakref.

Also worth noting, the tests I've shown were run by pasting into interactive console. With same test code in a module that is executed, it will run faster (though still slower than with this fix) -- inspect.stack ends up calling getmodule on each stack frame. This is slower with interactive console frames, as they are moduleless and don't get cached.

@mdeck
Copy link
Author

mdeck commented May 1, 2022

Also it seems as though the CI has stalled since last update. Not sure if I need to make some random change and force a re-run, or if it's not running because I'm a "first-time contributor" -- the description is a bit vague. 4 expected checks, 2 workflows awaiting approval.. not sure what a 'workflow' vs a 'check' is.

@mdeck
Copy link
Author

mdeck commented May 1, 2022

run_test shows indirect changes via inspect.stack. Here are numbers with getmodule directly. One change to test:
dur = timeit.timeit(lambda: inspect.getmodule(inspect.currentframe()), number=1)

Reminder, columns are stack depth, rows are len(sys.modules), numbers are milliseconds.

Without changes:

>>> run_test(sys)
           1      2      4      8     16     32     64
   89    2.5    0.2    0.2    0.2    0.2    0.2    0.2
  189    0.7    0.6    0.6    0.6    0.6    0.6    0.6
 1189    5.1    5.1    5.0    5.1    5.5    5.0    5.1
11189   49.0   49.3   50.0   49.3   50.6   49.5   49.7
>>> 

With this PR changes:

>>> run_test(sys)
           1      2      4      8     16     32     64
   89    2.5    0.0    0.0    0.0    0.0    0.0    0.0
  189    0.0    0.0    0.0    0.0    0.0    0.0    0.0
 1189    0.0    0.0    0.0    0.0    0.0    0.0    0.0
11189    0.0    0.0    0.0    0.0    0.0    0.0    0.0
>>> 

Also testing with weakref version:

>>> run_test(sys)
           1      2      4      8     16     32     64
   91    2.5    0.0    0.0    0.0    0.0    0.0    0.0
  191    0.0    0.0    0.0    0.0    0.0    0.0    0.0
 1191    0.0    0.0    0.0    0.0    0.0    0.0    0.0
11191    0.0    0.0    0.0    0.0    0.0    0.0    0.0
>>> 

Interesting! getmodule performance looks equivalent with weakref! So why did I see slower inspect.stack times when using weakref? Let's revisit the inspect.stack times, using original run_test code.

With this PR changes:

>>> run_test(sys)
           1      2      4      8     16     32     64
   89    3.8    0.4    0.6    0.6    0.8    1.3    2.2
  189    0.4    0.4    0.5    0.6    0.8    1.6    2.2
 1189    0.4    0.4    0.4    0.5    1.0    1.3    2.2
11189    0.4    0.5    0.5    0.5    1.0    1.3    2.2
>>> 

With weakref version:

>>> run_test(sys)
           1      2      4      8     16     32     64
   91    4.1    0.5    0.6    0.7    1.0    1.5    2.6
  191    0.7    0.7    1.0    1.0    1.2    1.8    2.8
 1191    2.9    3.0    3.5    3.1    3.5    3.9    5.0
11191   26.4   25.3   25.2   25.4   25.6   26.2   27.3
>>> 

I will take a closer look at what else inspect.stack is doing, to try and explain this.

@mdeck
Copy link
Author

mdeck commented May 1, 2022

Also the getmodule times were all 0.0; Here are those numbers in microseconds, rather than milliseonds.

With this PR changes:

>>> run_test(sys)
           1      2      4      8     16     32     64
   89 2447.6    6.2    5.5    6.1    7.1    4.4    7.3
  189    5.8    4.2    4.2    4.7    4.2    4.2    3.9
 1189    4.8    4.4    3.9    4.0    3.7    4.5    4.1
11189   15.3    4.6    4.1    4.0    4.2    4.4    4.1

With weakref impl:

>>> run_test(sys)
           1      2      4      8     16     32     64
   91 2546.2    9.6    8.0    7.3    8.7    6.8   10.8
  191    8.4    6.6    6.1    6.4    5.9    6.5    5.9
 1191    7.1    6.1    6.0    5.8    5.9    5.9   12.9
11191   22.6    7.4    6.6    6.2    5.9    6.2    6.0

At higher module counts, either implementation uses less microseconds than the current implementation in milliseconds..

@mdeck
Copy link
Author

mdeck commented May 1, 2022

The docs indicate weakref doesn't work with code objects. They appear to work with simple tests in interactive console. However, debugging the weakref implementation I linked earlier, the _moduleless cache fills with dead weakrefs indexed by the same code object id(). The current use of id ^ hash seems optimal.

I have previously explored defining _moduleless as an LRU cache. When testing I got the sense getmodule returning None was rare, so the LRU complexity was unnecessary. I am curious if that was a safe assumption - perhaps there are use cases that involve dynamic generated code objects being queried. I may explore that again to see the additional complexity & cost. Interested in any thoughts here.

@mdeck mdeck closed this May 3, 2022
@mdeck mdeck reopened this May 3, 2022
@mdeck mdeck closed this May 3, 2022
@mdeck mdeck reopened this May 3, 2022
@mdeck
Copy link
Author

mdeck commented May 3, 2022

Hmm, looks like I need a maintainer to approve running tests. I can't seem to trigger them despite pressing all the buttons.

@AlexWaygood AlexWaygood added performance Performance or resource usage stdlib Python modules in the Lib dir labels May 4, 2022
@pablogsal pablogsal removed their request for review May 5, 2022 10:51
@mdeck mdeck mentioned this pull request May 8, 2022
@eendebakpt
Copy link
Contributor

Hmm, looks like I need a maintainer to approve running tests. I can't seem to trigger them despite pressing all the buttons.

@mdeck could you merge/rebase to current main? Perhaps that will trigger the tests to build

@bdraco
Copy link
Contributor

bdraco commented Jun 1, 2023

getmodule

Home Assistant startup has 7,184,347 calls to ismodule at startup from this function.

@bedevere-app
Copy link

bedevere-app bot commented Feb 12, 2024

Most changes to Python require a NEWS entry. Add one using the blurb_it web app or the blurb command-line tool.

If this change has little impact on Python users, wait for a maintainer to apply the skip news label instead.

@mdeck
Copy link
Author

mdeck commented Feb 12, 2024

Here, sys._getframe() is moduleless, and is cached:

$ git checkout main
Switched to branch 'main'
Your branch is up to date with 'upstream/main'.
$ ./python -m timeit -s 'import inspect, sys' 'inspect.getmodule(sys._getframe())'
2000 loops, best of 5: 115 usec per loop

$ git checkout fix_inspect
Switched to branch 'fix_inspect'
Your branch is up to date with 'origin/fix_inspect'.
$ ./python -m timeit -s 'import inspect, sys' 'inspect.getmodule(sys._getframe())'
100000 loops, best of 5: 3.36 usec per loop

inspect.stack() durations with various len(sys.modules) and call stack depths. We use -c to ensure a moduleless frame is in call stack (initial import test3 is <string> frame).

$ git checkout main
Switched to branch 'main'
Your branch is up to date with 'upstream/main'.
$ ./python -c 'import test3'
                        2         8        64  <- stack depth
            100   8110 us   2662 us   6045 us
           1000  72495 us  11043 us  12711 us
          10000 726973 us  96112 us 100057 us
len sys.modules
                        2         8        64  <- stack depth
            100   2433 us   2335 us   4473 us
           1000   9680 us  10715 us  13248 us
          10000  92595 us  89032 us  95080 us
len sys.modules
resource.struct_rusage(ru_utime=1.451359, ru_stime=0.04727, ru_maxrss=16676, ru_ixrss=0, ru_idrss=0, ru_isrss=0, ru_minflt=2974, ru_majflt=0, ru_nswap=0, ru_inblock=0, ru_oublock=72, ru_msgsnd=0, ru_msgrcv=0, ru_nsignals=0, ru_nvcsw=0, ru_nivcsw=3)

$ git checkout fix_inspect
Switched to branch 'fix_inspect'
Your branch is up to date with 'origin/fix_inspect'.
$ ./python -c 'import test3'
len moduleless: 0
                        2         8        64  <- stack depth
            100   8015 us   1117 us   3504 us
           1000    907 us   1066 us   3952 us
          10000   1833 us   1137 us   4820 us
len sys.modules
len moduleless: 6
                        2         8        64  <- stack depth
            100   1529 us   1083 us   3530 us
           1000   1073 us   1580 us   4290 us
          10000   1155 us   1106 us   4663 us
len sys.modules
resource.struct_rusage(ru_utime=0.220348, ru_stime=0.009914, ru_maxrss=18916, ru_ixrss=0, ru_idrss=0, ru_isrss=0, ru_minflt=3466, ru_majflt=0, ru_nswap=0, ru_inblock=0, ru_oublock=272, ru_msgsnd=0, ru_msgrcv=0, ru_nsignals=0, ru_nvcsw=0, ru_nivcsw=0)
# test3.py
import gc
import inspect
import random
import time
import sys 

def run_test(module):
    def print_line(h, *vals):
        if not h:
            print("%15s" % h, *["%9i" % v for v in vals], " <- stack depth")
        else:
            print("%15s" % h, *["%6.0f us" % v for v in vals])
    #
    len_sys_modules = [100, 1_000, 10_000]
    def add_modules(n):
        while len(sys.modules) < n:
            sys.modules[f"foo_{random.randint(0,2**64)}"] = module
    #
    depths = [2, 8, 64]
    def nest(depth):
        if depth > 0:
            return nest(depth-1)
        t = time.time()
        _ = inspect.stack()
        dur = time.time() - t
        return dur * 1000 * 1000
    #
    gc.collect()
    if hasattr(inspect, "_moduleless"):
        print(f"len moduleless: {len(inspect._moduleless)}")
        inspect._moduleless.clear()
    #
    orig_modules = sys.modules.copy()
    print_line("", *depths)
    for len_modules in len_sys_modules:
        add_modules(n=len_modules)
        times = [nest(depth) for depth in depths]
        print_line(len(sys.modules), *times)
        sys.modules = orig_modules.copy()
    print("len sys.modules")

run_test(random)
run_test(sys)

import resource
print(resource.getrusage(resource.RUSAGE_SELF))

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
awaiting core review performance Performance or resource usage stdlib Python modules in the Lib dir
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants