
Substantial Performance Regression of Dict operations in Python 3.12.0rc1 versus Python 3.11.4 #109049

Open
chrisgmorton opened this issue Sep 7, 2023 · 15 comments
Labels: 3.12 (bugs and security fixes), 3.13 (new features, bugs and security fixes), interpreter-core (Objects, Python, Grammar, and Parser dirs), performance (Performance or resource usage), type-bug (An unexpected behavior, bug, or error)

Comments

@chrisgmorton

chrisgmorton commented Sep 7, 2023

Bug report

Bug description:

Comparing Python 3.11.4 with 3.12.0rc1, I see a substantial slowdown in dictionary operations (update, copy, items, get). Both builds are from source on Ubuntu 20.04 using gcc/g++ 9.4.0, with configure options --enable-shared and --enable-optimizations. The NumPy version in both cases is 1.25.2, linked against MKL 2023.2.0.

Profiling results for my application are as follows:
Python 3.12.0rc1:

Wed Sep  6 18:55:23 2023    profile.binfile

         27271926 function calls (26072328 primitive calls) in 20.862 seconds

   Ordered by: internal time
   List reduced from 520 to 50 due to restriction <50>

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
     7045    4.000    0.001    5.031    0.001 digraph.py:469(add_nodes_from)
     7045    3.003    0.000    5.198    0.001 digraph.py:713(add_edges_from)
  7056013    1.952    0.000    2.093    0.000 {method 'update' of 'dict' objects}
     1765    1.304    0.001    1.351    0.001 core.py:7090(concatenate)
  1232480    0.752    0.000    1.121    0.000 reportviews.py:788(<genexpr>)
    91026    0.699    0.000    1.069    0.000 resolver.py:92(Define)
   203489    0.645    0.000    1.167    0.000 core.py:2952(_update_from)
    41601    0.568    0.000    0.581    0.000 device.py:157(Default)
    41601    0.479    0.000    0.917    0.000 mnaindexer.py:238(SetIndexing)
        2    0.405    0.203    0.469    0.234 mnaloader.py:95(SetLinearTerms)
  1495228    0.326    0.000    0.326    0.000 {method 'copy' of 'dict' objects}
    83202    0.320    0.000    0.558    0.000 device.py:40(SetSocket)
  1496872    0.311    0.000    0.311    0.000 {method 'items' of 'dict' objects}
  1626268    0.308    0.000    0.308    0.000 {method 'get' of 'dict' objects}

Python 3.11.4:

Wed Sep  6 18:54:04 2023    profile.binfile

         27569104 function calls (26369506 primitive calls) in 16.836 seconds

   Ordered by: internal time
   List reduced from 541 to 50 due to restriction <50>

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
     7045    4.409    0.001    4.802    0.001 digraph.py:469(add_nodes_from)
     7045    1.576    0.000    2.972    0.000 digraph.py:713(add_edges_from)
     1765    1.337    0.001    1.380    0.001 core.py:7090(concatenate)
  1232480    0.767    0.000    0.961    0.000 reportviews.py:788(<genexpr>)
  7056013    0.765    0.000    0.881    0.000 {method 'update' of 'dict' objects}
    91026    0.520    0.000    0.787    0.000 resolver.py:92(Define)
   203489    0.486    0.000    0.721    0.000 core.py:2952(_update_from)
    41601    0.464    0.000    0.464    0.000 device.py:37(<dictcomp>)
    41601    0.369    0.000    0.704    0.000 mnaindexer.py:238(SetIndexing)
        2    0.331    0.165    0.413    0.206 mnaloader.py:95(SetLinearTerms)
    83202    0.312    0.000    0.402    0.000 device.py:40(SetSocket)
    93800    0.270    0.000    1.559    0.000 core.py:3217(__getitem__)
    43368    0.206    0.000    1.698    0.000 resolver.py:233(Connect)
1332528/332816    0.186    0.000    0.187    0.000 cell.py:20(GetDevices)
   267312    0.182    0.000    0.182    0.000 {built-in method numpy.array}
   104392    0.178    0.000    0.722    0.000 core.py:2978(__array_finalize__)
        1    0.148    0.148    0.921    0.921 mnamethod.py:276(SetLinearDeviceRulesIndexing)
   351240    0.129    0.000    0.839    0.000 {function MaskedArray.view at 0x7fc262658540}
  1227200    0.112    0.000    0.112    0.000 reportviews.py:774(<lambda>)
   1765/1    0.110    0.000   12.163   12.163 resolver.py:350(Build)
  1663235    0.109    0.000    0.109    0.000 {built-in method builtins.getattr}
    41601    0.105    0.000    0.162    0.000 mnaindexer.py:268(SetMatrixConstructionIndexing)
  1496872    0.101    0.000    0.101    0.000 {method 'items' of 'dict' objects}
  1643874    0.099    0.000    0.099    0.000 {method 'get' of 'dict' objects}
  5292/12    0.098    0.000    0.743    0.062 cell.py:275(SetParameters)
  1495228    0.098    0.000    0.098    0.000 {method 'copy' of 'dict' objects}

The slowdowns in the networkx DiGraph methods add_edges_from() (in particular) and add_nodes_from() are likely caused by the performance degradation of the Python dictionary methods.

CPython versions tested on:

3.12

Operating systems tested on:

Linux

@chrisgmorton chrisgmorton added the type-bug An unexpected behavior, bug, or error label Sep 7, 2023
@gaogaotiantian
Contributor

gaogaotiantian commented Sep 7, 2023

Sorry but I don't think this proves anything.

There could be a LOT of reasons why a complicated function slows down on a different interpreter. If this is indeed a performance regression in dict, you should be able to reproduce the result with ONLY dictionary operations. As it stands, the issue is speculation.

In fact, cProfile itself changed between 3.11 and 3.12, and it's a deterministic profiler, which can introduce significant overhead if you have many small function calls.

Like I said, if you believe this is a dictionary performance regression, you should be able to isolate the dict operation, do it in a loop, and stop-watch it - it should show the regression. Otherwise, I don't think this is worth investigating from CPython's perspective.
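A minimal sketch of that kind of isolated measurement, using timeit rather than manual wall-clock calls (the loop count and key scheme here are illustrative assumptions, not prescribed by the thread):

import timeit

# Isolate dict.update() in a tight loop. The setup runs once; the statement
# runs `number` times in the same namespace, so `d` and `i` persist.
elapsed = timeit.timeit(
    "d.update({str(i): i}); i += 1",
    setup="d = {}; i = 0",
    number=1_000_000,
)
print(f"dict.update(): {elapsed:.3f}s")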

@hugovk hugovk added performance Performance or resource usage 3.12 bugs and security fixes 3.13 new features, bugs and security fixes labels Sep 7, 2023
@chrisgmorton
Author

chrisgmorton commented Sep 10, 2023

import time
from array import array
import sys

print("\nPython", sys.version.replace("\n",""), "\n")

a = array('d')
start = time.time()
for i in range(100000000): a.append(i)
end = time.time()

print('\tarray.append(): {0:3.1f}s'.format(end-start))

l = []
start = time.time()
for i in range(200000000): l.append(i)
end = time.time()

print('\tlist.append():  {0:3.1f}s'.format(end-start))

d = {}
start = time.time()
for i in range(25000000): d.update({str(i):i})
end = time.time()

print('\tdict.update():  {0:3.1f}s'.format(end-start))

s = set()
start = time.time()
for i in range(160000000): s.add(i)
end = time.time()

print('\tset.add():      {0:3.1f}s'.format(end-start))

Five manual, consecutive runs with Python 3.11:

Python 3.11.4 (main, Sep 10 2023, 06:39:02) [GCC 9.4.0]

        array.append(): 9.4s
        list.append():  10.0s
        dict.update():  11.8s
        set.add():      9.2s

Python 3.11.4 (main, Sep 10 2023, 06:39:02) [GCC 9.4.0]

        array.append(): 10.6s
        list.append():  9.4s
        dict.update():  12.2s
        set.add():      10.2s

Python 3.11.4 (main, Sep 10 2023, 06:39:02) [GCC 9.4.0]

        array.append(): 10.4s
        list.append():  9.0s
        dict.update():  11.9s
        set.add():      10.0s

Python 3.11.4 (main, Sep 10 2023, 06:39:02) [GCC 9.4.0]

        array.append(): 9.7s
        list.append():  8.7s
        dict.update():  12.0s
        set.add():      9.2s

Python 3.11.4 (main, Sep 10 2023, 06:39:02) [GCC 9.4.0]

        array.append(): 9.5s
        list.append():  9.4s
        dict.update():  12.2s
        set.add():      9.6s

Five manual, consecutive runs with Python 3.12:

Python 3.12.0rc2 (main, Sep 10 2023, 07:51:16) [GCC 9.4.0]

        array.append(): 11.5s
        list.append():  10.4s
        dict.update():  13.1s
        set.add():      10.3s

Python 3.12.0rc2 (main, Sep 10 2023, 07:51:16) [GCC 9.4.0]

        array.append(): 10.9s
        list.append():  11.9s
        dict.update():  12.7s
        set.add():      11.1s

Python 3.12.0rc2 (main, Sep 10 2023, 07:51:16) [GCC 9.4.0]

        array.append(): 11.7s
        list.append():  10.8s
        dict.update():  13.3s
        set.add():      10.5s

Python 3.12.0rc2 (main, Sep 10 2023, 07:51:16) [GCC 9.4.0]

        array.append(): 11.4s
        list.append():  11.2s
        dict.update():  12.9s
        set.add():      10.7s

Python 3.12.0rc2 (main, Sep 10 2023, 07:51:16) [GCC 9.4.0]

        array.append(): 10.8s
        list.append():  9.6s
        dict.update():  12.8s
        set.add():      11.2s

I would say the results are statistically significant, and the slowdown in Python 3.12 versus Python 3.11 is evident here. It seems plausible that the root cause is related to memory management rather than being specific to the dict built-in, as I had originally thought: this test case consumes 20GB+ of memory.
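A quick way to confirm the footprint claim on Linux (a sketch; note that ru_maxrss is reported in kilobytes on Linux, in bytes on macOS):

import resource

# Peak resident set size of the current process so far; Linux reports KiB.
peak_kib = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
print('peak RSS: {0:.1f} GiB'.format(peak_kib / (1024 * 1024)))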

@chrisgmorton
Author

chrisgmorton commented Sep 10, 2023

I should check the performance of the other dict methods I referenced in the original submission, as there may be multiple issues here. I'll wait for feedback on these tests first.

@gaogaotiantian
Contributor

Yes, this is an observable slowdown. What's the result when the loop is smaller? Or when the memory consumption is smaller (same loop count, but without accumulating that much memory)?

Accumulating very small pieces of memory into a very large structure is not a common pattern (not saying that the slowdown means nothing). It could be a tradeoff that makes common patterns faster.

It could be a memory-management-related cause, but it could also be something else. Not sure who the expert in this area is, but it would be nice to have more information (the tests mentioned above) so we can narrow down the cause.
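One way to realize the "same loop count, no accumulation" variant of the dict test (a sketch; reusing a single key is an assumption about what's being asked for):

import time

# Same number of dict.update() calls as the original test, but the dict
# never grows because the same key is overwritten on every iteration.
d = {}
start = time.time()
for i in range(25000000):
    d.update({"k": i})
end = time.time()
print('\tdict.update() (no growth): {0:3.1f}s'.format(end - start))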

@chrisgmorton
Author

chrisgmorton commented Sep 11, 2023

Here's a quick response for the case where the loop is 100 times smaller for each datatype (again, a slowdown is observed):

Python 3.11.4 (main, Sep 10 2023, 06:39:02) [GCC 9.4.0]

        array.append(): 95.3ms
        list.append():  77.9ms
        dict.update():  74.0ms
        set.add():      99.2ms

Python 3.11.4 (main, Sep 10 2023, 06:39:02) [GCC 9.4.0]

        array.append(): 97.6ms
        list.append():  82.8ms
        dict.update():  83.3ms
        set.add():      105.2ms

Python 3.11.4 (main, Sep 10 2023, 06:39:02) [GCC 9.4.0]

        array.append(): 112.2ms
        list.append():  86.6ms
        dict.update():  75.9ms
        set.add():      104.6ms

Python 3.11.4 (main, Sep 10 2023, 06:39:02) [GCC 9.4.0]

        array.append(): 95.3ms
        list.append():  82.1ms
        dict.update():  72.0ms
        set.add():      99.8ms

Python 3.11.4 (main, Sep 10 2023, 06:39:02) [GCC 9.4.0]

        array.append(): 98.5ms
        list.append():  79.1ms
        dict.update():  74.8ms
        set.add():      108.0ms

Python 3.12.0rc2 (main, Sep 10 2023, 07:51:16) [GCC 9.4.0]

        array.append(): 112.6ms
        list.append():  93.8ms
        dict.update():  82.1ms
        set.add():      112.5ms

Python 3.12.0rc2 (main, Sep 10 2023, 07:51:16) [GCC 9.4.0]

        array.append(): 112.6ms
        list.append():  105.6ms
        dict.update():  82.7ms
        set.add():      115.7ms

Python 3.12.0rc2 (main, Sep 10 2023, 07:51:16) [GCC 9.4.0]

        array.append(): 109.3ms
        list.append():  90.1ms
        dict.update():  83.5ms
        set.add():      117.1ms

Python 3.12.0rc2 (main, Sep 10 2023, 07:51:16) [GCC 9.4.0]

        array.append(): 108.1ms
        list.append():  90.6ms
        dict.update():  80.6ms
        set.add():      106.5ms

Python 3.12.0rc2 (main, Sep 10 2023, 07:51:16) [GCC 9.4.0]

        array.append(): 110.5ms
        list.append():  91.9ms
        dict.update():  84.2ms
        set.add():      110.8ms

@chrisgmorton
Author

chrisgmorton commented Sep 11, 2023

Here's the original test case but with immediate removal of each data entry inside the loop, either with pop() or del d[key]. In this case there is no growth in memory usage (as expected). It's unclear to me that this tells us much more; the previous tests suggest we simply have a per-operation slowdown.
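The modified loop isn't shown above; a minimal reconstruction of the dict case, assuming this is what's meant:

import time

# Insert and immediately delete each entry, so the dict stays empty and
# memory usage stays flat while the call count matches the original test.
d = {}
start = time.time()
for i in range(25000000):
    key = str(i)
    d.update({key: i})
    del d[key]  # or: d.pop(key)
end = time.time()
print('\tdict.update():  {0:3.1f}s'.format(end - start))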

Python 3.11.4 (main, Sep 10 2023, 06:39:02) [GCC 9.4.0]

        array.append(): 12.3s
        list.append():  15.7s
        dict.update():   9.7s
        set.add():      10.7s

Python 3.11.4 (main, Sep 10 2023, 06:39:02) [GCC 9.4.0]

        array.append(): 12.9s
        list.append():  16.8s
        dict.update():   9.9s
        set.add():      12.1s

Python 3.11.4 (main, Sep 10 2023, 06:39:02) [GCC 9.4.0]

        array.append(): 12.3s
        list.append():  15.2s
        dict.update():   9.6s
        set.add():      11.9s

Python 3.11.4 (main, Sep 10 2023, 06:39:02) [GCC 9.4.0]

        array.append(): 12.9s
        list.append():  15.1s
        dict.update():   9.6s
        set.add():      12.1s

Python 3.11.4 (main, Sep 10 2023, 06:39:02) [GCC 9.4.0]

        array.append(): 12.3s
        list.append():  14.8s
        dict.update():   9.4s
        set.add():      12.5s

Python 3.12.0rc2 (main, Sep 10 2023, 07:51:16) [GCC 9.4.0]

        array.append(): 14.9s
        list.append():  16.8s
        dict.update():  11.6s
        set.add():      12.2s

Python 3.12.0rc2 (main, Sep 10 2023, 07:51:16) [GCC 9.4.0]

        array.append(): 15.4s
        list.append():  17.6s
        dict.update():  12.1s
        set.add():      13.3s

Python 3.12.0rc2 (main, Sep 10 2023, 07:51:16) [GCC 9.4.0]

        array.append(): 14.3s
        list.append():  16.1s
        dict.update():  11.7s
        set.add():      14.1s

Python 3.12.0rc2 (main, Sep 10 2023, 07:51:16) [GCC 9.4.0]

        array.append(): 16.2s
        list.append():  16.2s
        dict.update():  12.1s
        set.add():      12.9s

Python 3.12.0rc2 (main, Sep 10 2023, 07:51:16) [GCC 9.4.0]

        array.append(): 15.4s
        list.append():  19.3s
        dict.update():  11.7s
        set.add():      15.3s

@chrisgmorton
Author

The performance regression seems to occur between 3.12.0a7 and 3.12.0b1. I'm not sure it's a single commit; it may be an accumulation of a number of commits, possibly in the area of the sub-interpreter work.

@eendebakpt
Contributor

I can confirm there is some kind of regression. Performing this benchmark:

import pyperf

runner = pyperf.Runner()

setup = """
import copy

a = {'list': [1, 2, 3, 43], 't': (1, 2, 3), 'str': 'hello', 'subdict': {'a': True}}
"""

runner.timeit(name="deepcopy dict", stmt="b = copy.deepcopy(a)", setup=setup)

results in

Mean +- std dev: [3114] 7.18 us +- 0.09 us -> [main] 7.91 us +- 0.22 us: 1.10x slower

This is for comparison of 3.11.4 with current main.
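(For reference: output in this [label] -> [label] form comes from pyperf's compare_to mode. Assuming each run was saved with -o, e.g. python3.11 bench.py -o 3114.json and ./python bench.py -o main.json, a comparison like the above can be reproduced with python -m pyperf compare_to 3114.json main.json.)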

@eendebakpt
Contributor

From speed.python.org:

[Chart from speed.python.org: benchmark timings across recent CPython revisions]

The performance regression seems to occur around the merge of #19474, which is known to reduce performance a bit.

@chrisgmorton
Author

For the bigger picture, my application belongs to a class of scientific-computing applications that numerically solve coupled non-linear PDEs, making extensive use of NumPy, SciPy, H5Py, and sparse-matrix linear algebra.

The 3.11.4 and 3.12.0rc2 installations use identical Python module versions (NumPy, SciPy, H5Py/HDF5 built from source for each Python version).

These are my performance test results, benchmarked against Python 3.8:

Python 3.11.4:

 > Testing circperformance

Test 0: Time is 3.20771s, reference is 4.10, performance is 1.28x or 27.8% speed-up. PASS
Test 1: Time is 1.11444s, reference is 1.40, performance is 1.26x or 25.6% speed-up. PASS
Test 2: Time is 1.45542s, reference is 1.70, performance is 1.17x or 16.8% speed-up. PASS
Test 3: Time is 2.01886s, reference is 2.40, performance is 1.19x or 18.9% speed-up. PASS

100% pass: 4 Tests, failures[], skipped[]

Python 3.12.0rc2:

 > Testing circperformance

Test 0: Time is 3.78364s, reference is 4.10, performance is 1.08x or 8.4% speed-up. PASS
Test 1: Time is 1.30262s, reference is 1.40, performance is 1.07x or 7.5% speed-up. PASS
Test 2: Time is 1.61533s, reference is 1.70, performance is 1.05x or 5.2% speed-up. PASS
Test 3: Time is 2.41445s, reference is 2.40, performance is 0.99x or 0.6% slow-down. FAIL

75% pass: 4 Tests, failures[3], skipped[]

Python 3.11.4:

 > Testing devperformance

Test 0: Time is 1.64307s, reference is 1.85, performance is 1.13x or 12.6% speed-up. PASS
Test 1: Time is 2.28619s, reference is 2.60, performance is 1.14x or 13.7% speed-up. PASS
Test 2: Time is 1.62358s, reference is 1.90, performance is 1.17x or 17.0% speed-up. PASS
Test 3: Time is 1.75078s, reference is 2.00, performance is 1.14x or 14.2% speed-up. PASS
Test 4: Time is 2.06646s, reference is 2.20, performance is 1.06x or 6.5% speed-up. PASS
Test 5: Time is 1.57825s, reference is 1.75, performance is 1.11x or 10.9% speed-up. PASS
Test 6: Time is 2.35982s, reference is 2.80, performance is 1.19x or 18.7% speed-up. PASS
Test 7: Time is 2.03836s, reference is 2.20, performance is 1.08x or 7.9% speed-up. PASS
Test 8: Time is 4.01285s, reference is 4.50, performance is 1.12x or 12.1% speed-up. PASS

100% pass: 9 Tests, failures[], skipped[]

Python 3.12.0rc2

> Testing devperformance

Test 0: Time is 1.85779s, reference is 1.85, performance is 1.00x or 0.4% slow-down. FAIL
Test 1: Time is 2.53469s, reference is 2.60, performance is 1.03x or 2.6% speed-up. PASS
Test 2: Time is 1.89236s, reference is 1.90, performance is 1.00x or 0.4% speed-up. PASS
Test 3: Time is 1.98677s, reference is 2.00, performance is 1.01x or 0.7% speed-up. PASS
Test 4: Time is 2.25508s, reference is 2.20, performance is 0.98x or 2.5% slow-down. FAIL
Test 5: Time is 1.77294s, reference is 1.75, performance is 0.99x or 1.3% slow-down. FAIL
Test 6: Time is 2.61213s, reference is 2.80, performance is 1.07x or 7.2% speed-up. PASS
Test 7: Time is 2.19335s, reference is 2.20, performance is 1.00x or 0.3% speed-up. PASS
Test 8: Time is 4.52551s, reference is 4.50, performance is 0.99x or 0.6% slow-down. FAIL

55% pass: 9 Tests, failures[0, 4, 5, 8], skipped[]

As expected, we see substantial performance improvements for Python 3.11 over Python 3.8. However, Python 3.12.0rc2 appears to give back much of those gains.

@eendebakpt
Contributor

eendebakpt commented Sep 22, 2023

I did some more tests with publicly available packages (sympy, lmfit, and lark) and compared Python 3.10, 3.11, and 3.12. The results are similar to the OP's: a significant performance improvement from 3.10 to 3.11, but a regression from 3.11 to 3.12 in two of the three test cases. For sympy the performance loss is about 10%.

3.10.8 (tags/v3.10.8-dirty:aaaf517424, May  9 2023, 23:40:53) [GCC 9.4.0]
sympy 0.35402853199775564
lmfit 0.3148488170008932
lark 0.3982844579986704
3.11.5 (main, Sep 22 2023, 16:56:56) [GCC 9.4.0]
sympy 0.2964389949993347
lmfit 0.281992337997508
lark 0.3138663429999724
3.12.0rc3 (main, Sep 22 2023, 21:36:02) [GCC 9.4.0]
sympy 0.32691953900211956
lmfit 0.29783570200015674
lark 0.2835550689997035
Full code for benchmarks

Source was downloaded from python.org and compiled with ./configure --enable-optimizations. Required packages were installed with

python -m pip install sympy==1.12 numpy==1.26.0 scipy==1.11.2 lmfit==1.2.2 pyperformance==1.0.9 lark==1.1.7

Benchmark script

import sys
import time

import sympy
from lark import Lark, Transformer, v_args
from lmfit import Model
from numpy import exp, linspace, random

print(sys.version)


# %%


def gaussian(x, amp, cen, wid):
    return amp * exp(-(x-cen)**2 / wid)


def bench_sympy():
    x = sympy.symbols('x')

    for n in [1, 10, 20]:
        expr = x**n*sympy.exp(x)
        integral = sympy.Integral(expr, x)
        r = integral.doit()

        expr = sympy.log(sympy.Add(*[sympy.exp(i*x)
                         for i in range(n)])).diff(x)
        integral = sympy.Integral(expr, x)
        r = integral.doit()


# activate caches in sympy
bench_sympy()

t0 = time.perf_counter()
bench_sympy()
dt = time.perf_counter()-t0
print(f'sympy {dt}')

# %%


def bench_lmfit():

    x = linspace(-10, 10, 101)
    y = gaussian(x, 2.33, 0.21, 1.51) + random.normal(0, 0.2, x.size)

    gmodel = Model(gaussian)
    params = gmodel.make_params(cen=0.3, amp=3, wid=1.25)
    result = gmodel.fit(y, params, x=x)

    gmodel.set_param_hint('amp', min=2.34)
    result = gmodel.fit(y, params, x=x)


t0 = time.perf_counter()
for ii in range(40):
    bench_lmfit()
dt = time.perf_counter()-t0
print(f'lmfit {dt}')


# %%


json_grammar = r"""
    ?start: value

    ?value: object
          | array
          | string
          | SIGNED_NUMBER      -> number
          | "true"             -> true
          | "false"            -> false
          | "null"             -> null

    array  : "[" [value ("," value)*] "]"
    object : "{" [pair ("," pair)*] "}"
    pair   : string ":" value

    string : ESCAPED_STRING

    %import common.ESCAPED_STRING
    %import common.SIGNED_NUMBER
    %import common.WS

    %ignore WS
"""


class TreeToJson(Transformer):
    @v_args(inline=True)
    def string(self, s):
        return s[1:-1].replace('\\"', '"')

    array = list
    pair = tuple
    object = dict
    number = v_args(inline=True)(float)

    def null(self, _): return None
    def true(self, _): return True
    def false(self, _): return False


def bench_lark():
    json_parser = Lark(json_grammar, parser='lalr',
                       lexer='basic',
                       propagate_positions=False,
                       maybe_placeholders=False,
                       transformer=TreeToJson())
    parse = json_parser.parse

    test_json = '''
        {
            "empty_object" : {},
            "empty_array"  : [],
            "booleans"     : { "YES" : true, "NO" : false },
            "numbers"      : [ 0, 1, -2, 3.3, 4.4e5, 6.6e-7 ],
            "strings"      : [ "This", [ "And" , "That", "And a \\"b" ] ],
            "nothing"      : null
        }
    '''

    j = parse(test_json)


t0 = time.perf_counter()
for ii in range(20):
    bench_lark()
dt = time.perf_counter()-t0
print(f'lark {dt}')

Update: I benchmarked sympy for ea2c001 (the immortal instances PR) and 916de04 (the commit before).
The performance regression is 5% (averaged over 10 runs: base 0.29103303970023264, immortal instances 0.30650054839989027). That means the other ~5% of the regression is due to other commits between 3.11 and 3.12.

@nijel
Contributor

nijel commented Oct 12, 2023

I found this issue while noticing that our CI tests occasionally time out on GitHub with Python 3.12 while they worked fine on 3.11. We saw performance gains over several Python releases, but 3.12 seems to go back to the performance of 3.9 (at least CI-wise).

The slowdown can be seen on other projects as well. For example, for the above-mentioned sympy, some parts of the test suite (they run it split into 4 parts) are 20% slower on Python 3.12 compared to 3.11, and Django tests seem about 10% slower. I know that CI timing can vary a lot, but the regression seems to happen across projects, and this issue shows some synthetic benchmarks as well.

Links to CI runs

@Voultapher

Voultapher commented Oct 27, 2023

The user sahnehaeubchen linked this issue to PEP 683 in this discussion; so far that possible connection hasn't been discussed here. I don't know what they are basing the claim on, but it seems plausible to me that an additional branch on every object refcount check could have wide-ranging impacts. The PEP claims:

A naive implementation of the approach described below makes CPython roughly 4% slower. However, the implementation is performance-neutral once known mitigations are applied.

I could imagine that the benchmarks performed might not account for branch misprediction due to BTB misses, for generally larger code sizes that no longer fit into the L1 i-cache and L0/L1 BTB, and for other cache effects of larger programs that increase branch-prediction cost.
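A quick way to observe the PEP 683 change from Python itself (a sketch; the exact saturated value is an implementation detail and may differ between builds):

import sys

# Immortal objects (PEP 683, new in 3.12) report a fixed, saturated refcount;
# on 3.11 the same calls return small counts that change as references are taken.
print(sys.version)
print(sys.getrefcount(None))  # 3.12: a large constant such as 4294967295
print(sys.getrefcount(42))    # small ints are immortal in 3.12 as well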

@iritkatriel iritkatriel added the interpreter-core (Objects, Python, Grammar, and Parser dirs) label Nov 28, 2023
@ericsnowcurrently
Member

CC @eduardo-elizondo

@eduardo-elizondo
Contributor

Hey, just catching up here! Thanks for the ping @ericsnowcurrently. Following on the above, there is a link between the regression and the implementation of PEP 683. There are benchmarks that perform better and benchmarks that perform worse; for the full list, take a look here, which shows a ~1.02x regression on the geometric mean.

One thing to add, though: the build here uses gcc/g++ 9.4, which I empirically found to perform slightly worse with PEP 683. Instead, I'd recommend trying this out again with GCC 11.4 (or LLVM 15+), which seems to fare much better with PEP 683 than GCC 9.4.

Sorry for the delay in replies and happy to answer any follow-up questions 🙂
