Free list for longobject medium value #91912

Closed
@penguin-wwy

static PyObject *
_PyLong_FromMedium(sdigit x)
{
    assert(!IS_SMALL_INT(x));
    assert(is_medium_int(x));
    /* We could use a freelist here */
    PyLongObject *v = PyObject_Malloc(sizeof(PyLongObject));
    if (v == NULL) {
        PyErr_NoMemory();
        return NULL;
    }

Medium values are allocated and deallocated very frequently.
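To make the idea concrete, here is a minimal standalone sketch of a bounded free list. It is not the actual patch: `medium_obj`, `medium_new`, `medium_dealloc` and `MEDIUM_MAXFREELIST` are hypothetical names, the struct is a toy stand-in rather than the real `PyLongObject` layout, and plain `malloc`/`free` replace `PyObject_Malloc`. It only illustrates how a `numfree` counter can cap the number of cached objects.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a one-digit int object; NOT CPython's layout. */
typedef struct medium_obj {
    struct medium_obj *next_free;  /* link, only meaningful while cached */
    int value;
} medium_obj;

#define MEDIUM_MAXFREELIST 100     /* hypothetical cap on cached objects */

static medium_obj *free_list = NULL;  /* head of the singly linked cache */
static int numfree = 0;               /* number of objects currently cached */

/* Allocate: reuse a cached object if one exists, else fall back to malloc. */
static medium_obj *
medium_new(int value)
{
    medium_obj *v = free_list;
    if (v != NULL) {
        free_list = v->next_free;
        numfree--;
    }
    else {
        v = malloc(sizeof(medium_obj));
        if (v == NULL) {
            return NULL;
        }
    }
    v->next_free = NULL;
    v->value = value;
    return v;
}

/* Deallocate: cache the object unless the free list is already full. */
static void
medium_dealloc(medium_obj *v)
{
    if (numfree < MEDIUM_MAXFREELIST) {
        v->next_free = free_list;
        free_list = v;
        numfree++;
    }
    else {
        free(v);
    }
}
```

With this shape, an allocate/deallocate/allocate sequence hands back the same block without touching the allocator, which is where the saving would come from. The cap keeps the cached memory bounded.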

I tested several numfree values (the maximum size of the free list), and the results show that a free list can improve performance.

./main/bin/python3 -m pyperf compare_to --min-speed 5 -G --table --table-format md main.json free_list100.json 
| Benchmark | main | numfree 100 |
|---|:---:|:---:|
| spectral_norm | 112 ms | 96.7 ms: 1.16x faster |
| json_loads | 29.8 us | 25.9 us: 1.15x faster |
| logging_silent | 112 ns | 102 ns: 1.10x faster |
| regex_v8 | 25.4 ms | 23.7 ms: 1.07x faster |
| mako | 10.6 ms | 10.1 ms: 1.06x faster |
| pyflate | 454 ms | 430 ms: 1.06x faster |
| regex_dna | 228 ms | 217 ms: 1.05x faster |
| crypto_pyaes | 87.0 ms | 91.6 ms: 1.05x slower |
| chaos | 76.6 ms | 82.4 ms: 1.08x slower |
| Geometric mean | (ref) | 1.01x faster |

./main/bin/python3 -m pyperf compare_to --min-speed 5 -G --table --table-format md main.json free_list500.json

| Benchmark | main | numfree 500 |
|---|:---:|:---:|
| json_loads | 29.8 us | 26.3 us: 1.13x faster |
| regex_v8 | 25.4 ms | 23.5 ms: 1.08x faster |
| scimark_sparse_mat_mult | 4.63 ms | 4.34 ms: 1.07x faster |
| spectral_norm | 112 ms | 106 ms: 1.06x faster |
| scimark_fft | 361 ms | 343 ms: 1.05x faster |
| regex_dna | 228 ms | 217 ms: 1.05x faster |
| scimark_lu | 105 ms | 112 ms: 1.07x slower |
| crypto_pyaes | 87.0 ms | 95.3 ms: 1.10x slower |
| Geometric mean | (ref) | 1.00x faster |

./main/bin/python3 -m pyperf compare_to --min-speed 5 -G --table --table-format md main.json free_list1000.json

| Benchmark | main | numfree 1000 |
|---|:---:|:---:|
| regex_dna | 228 ms | 184 ms: 1.24x faster |
| spectral_norm | 112 ms | 96.9 ms: 1.16x faster |
| json_loads | 29.8 us | 26.2 us: 1.13x faster |
| scimark_sparse_mat_mult | 4.63 ms | 4.17 ms: 1.11x faster |
| scimark_sor | 129 ms | 121 ms: 1.07x faster |
| regex_effbot | 3.02 ms | 2.82 ms: 1.07x faster |
| scimark_fft | 361 ms | 341 ms: 1.06x faster |
| pickle_dict | 30.4 us | 28.8 us: 1.06x faster |
| logging_silent | 112 ns | 106 ns: 1.06x faster |
| telco | 7.23 ms | 6.85 ms: 1.06x faster |
| crypto_pyaes | 87.0 ms | 99.0 ms: 1.14x slower |
| Geometric mean | (ref) | 1.02x faster |

The current results show that a numfree of 1000 gives about a 2% speedup on the pyperformance benchmarks (geometric mean 1.02x faster). In terms of memory, each medium value needs sizeof(PyLongObject) + 4 bytes. In the worst case, every thread may have to pay about 36 KB of extra memory (if numfree == 1000).
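The worst-case figure is just the cap multiplied by the per-object size. A small sketch of that arithmetic, where `freelist_worst_case_bytes` is a hypothetical helper and the 36 bytes per object is inferred from the 36 KB estimate above rather than measured:

```c
#include <stddef.h>

/* Hypothetical helper: upper bound on memory held by a completely full
 * free list. bytes_per_object is an assumption supplied by the caller
 * (36 here, inferred from the estimate above, not measured). */
static size_t
freelist_worst_case_bytes(size_t numfree, size_t bytes_per_object)
{
    return numfree * bytes_per_object;
}
```

Under the same assumption, the three caps benchmarked above (100, 500 and 1000) would hold at most 3,600, 18,000 and 36,000 bytes respectively.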

Labels: 3.12 (only security fixes), interpreter-core (Objects, Python, Grammar, and Parser dirs), performance (Performance or resource usage)
