Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

gh-91049: Introduce set vectorcall field API for PyFunctionObject #92257

Open
wants to merge 6 commits into
base: main
Choose a base branch
from

Conversation

adphrost
Copy link

@adphrost adphrost commented May 3, 2022

Avoid specializing functions with overridden vectorcall field

@adphrost adphrost requested a review from markshannon as a code owner May 3, 2022
@bedevere-bot
Copy link

@bedevere-bot bedevere-bot commented May 3, 2022

Every change to Python requires a NEWS entry.

Please, add it using the blurb_it Web app or the blurb command-line tool.

@adphrost
Copy link
Author

@adphrost adphrost commented May 3, 2022

@markshannon Can you let me know if there should be any other checks in ceval.c besides the one added in this PR?

For context, this check fixes an edge case where calling function with a set vectorfield only worked with a unpacked tuple, e.g. f(1) failed but f(*(1,)) succeeded (now both succeed)

Copy link
Contributor

@itamaro itamaro left a comment

thanks for working on this @adphrost !

Lib/test/test_call.py Outdated Show resolved Hide resolved
Lib/test/test_call.py Outdated Show resolved Hide resolved
Python/ceval.c Outdated
@@ -4792,7 +4792,7 @@ _PyEval_EvalFrameDefault(PyThreadState *tstate, _PyInterpreterFrame *frame, int
PyObject *function = PEEK(total_args + 1);
int positional_args = total_args - KWNAMES_LEN();
// Check if the call can be inlined or not
if (Py_TYPE(function) == &PyFunction_Type && tstate->interp->eval_frame == NULL) {
if (Py_TYPE(function) == &PyFunction_Type && tstate->interp->eval_frame == NULL && ((PyFunctionObject *)function)->vectorcall == _PyFunction_Vectorcall) {
Copy link
Contributor

@itamaro itamaro May 3, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this probably exceeds line length

Copy link
Author

@adphrost adphrost May 3, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the max line length allowed? And is there a linter I can run to handle this for me?


This PR introduces an API call where vectorcall field can be modified like so:

`void PyFunction_SetVectorcall(PyFunctionObject *func, vectorcallfunc vectorcall);`
Copy link
Member

@vstinner vstinner May 3, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please document the function in Doc/c-api/function.rst and in the NEWS entry, refer to use with:

:c:func:`PyFunction_SetVectorcall`

New public functions should also be documented in What's New in Python 3.11 > C API > New features: Doc/whatsnew/3.11.rst.

Copy link
Member

@markshannon markshannon left a comment

A few comments.
Have you attempted to measure the performance impact of this? (I'd expect it to be small or negligible).

Objects/funcobject.c Outdated Show resolved Hide resolved
Modules/_testcapimodule.c Outdated Show resolved Hide resolved
Modules/_testcapimodule.c Outdated Show resolved Hide resolved
Change how assertion is done

Co-authored-by: Itamar Ostricher <itamarost@gmail.com>
@cpython-cla-bot
Copy link

@cpython-cla-bot cpython-cla-bot bot commented May 3, 2022

All commit authors signed the Contributor License Agreement.
CLA signed

@itamaro
Copy link
Contributor

@itamaro itamaro commented May 4, 2022

Have you attempted to measure the performance impact of this? (I'd expect it to be small or negligible).

@markshannon is there a simple way to ask a buildbot to run pyperformance comparing this against the base revision? we can do that manually, but a buildbot doing it for us would be nice :)

@itamaro
Copy link
Contributor

@itamaro itamaro commented May 19, 2022

circling back to @markshannon 's performance impact question. -- tldr Geometric mean: 1.00x faster

some more details:

rebased this PR onto a4460f2 and ran pyperformance on a4460f2 and on the rebased branch

both baseline and branch were built using ./configure --enable-optimizations --with-lto && make -j

used pyperformance 1.0.5 (installed directly from GitHub) on bare metal AWS machine (Linux-5.13.0-1023-aws-x86_64-with-glibc2.31 with 72 logical CPUs), and excluded the sqlalchemy benchmarks because they tried building greenlet and failed

pyperformance run --benchmarks=-sqlalchemy_declarative,-sqlalchemy_imperative --python builds/3.12-opt/python -o main.json --rigorous
pyperformance run --benchmarks=-sqlalchemy_declarative,-sqlalchemy_imperative --python builds/3.12-gh-91049-vectorcall-opt/python -o gh-91049.json --rigorous

compare results:

pyperf compare_to main.json gh-91049.json -G
Slower (20):
- regex_effbot: 3.02 ms +- 0.01 ms -> 3.17 ms +- 0.01 ms: 1.05x slower
- meteor_contest: 110 ms +- 2 ms -> 114 ms +- 4 ms: 1.04x slower
- pidigits: 200 ms +- 0 ms -> 208 ms +- 0 ms: 1.04x slower
- telco: 7.17 ms +- 0.21 ms -> 7.39 ms +- 0.34 ms: 1.03x slower
- html5lib: 64.0 ms +- 2.7 ms -> 65.6 ms +- 2.6 ms: 1.02x slower
- unpickle_list: 5.30 us +- 0.08 us -> 5.40 us +- 0.05 us: 1.02x slower
- pyflate: 430 ms +- 3 ms -> 438 ms +- 4 ms: 1.02x slower
- sympy_sum: 168 ms +- 2 ms -> 171 ms +- 2 ms: 1.02x slower
- xml_etree_iterparse: 110 ms +- 1 ms -> 112 ms +- 3 ms: 1.02x slower
- async_tree_memoization: 640 ms +- 25 ms -> 650 ms +- 25 ms: 1.01x slower
- chaos: 71.8 ms +- 0.7 ms -> 72.8 ms +- 0.6 ms: 1.01x slower
- hexiom: 6.71 ms +- 0.06 ms -> 6.80 ms +- 0.06 ms: 1.01x slower
- sympy_str: 299 ms +- 3 ms -> 302 ms +- 4 ms: 1.01x slower
- spectral_norm: 101 ms +- 1 ms -> 103 ms +- 1 ms: 1.01x slower
- sympy_integrate: 21.5 ms +- 0.2 ms -> 21.8 ms +- 0.2 ms: 1.01x slower
- logging_silent: 105 ns +- 1 ns -> 107 ns +- 3 ns: 1.01x slower
- raytrace: 303 ms +- 2 ms -> 306 ms +- 4 ms: 1.01x slower
- regex_v8: 23.1 ms +- 0.3 ms -> 23.3 ms +- 0.1 ms: 1.01x slower
- unpickle_pure_python: 239 us +- 2 us -> 240 us +- 2 us: 1.01x slower
- 2to3: 282 ms +- 2 ms -> 283 ms +- 1 ms: 1.00x slower

Faster (25):
- genshi_xml: 57.1 ms +- 2.7 ms -> 54.3 ms +- 1.8 ms: 1.05x faster
- scimark_sparse_mat_mult: 4.93 ms +- 0.06 ms -> 4.70 ms +- 0.10 ms: 1.05x faster
- scimark_lu: 112 ms +- 2 ms -> 109 ms +- 2 ms: 1.03x faster
- json_dumps: 13.3 ms +- 0.2 ms -> 12.9 ms +- 0.1 ms: 1.03x faster
- scimark_fft: 337 ms +- 3 ms -> 331 ms +- 3 ms: 1.02x faster
- nbody: 97.6 ms +- 1.8 ms -> 95.9 ms +- 1.8 ms: 1.02x faster
- pathlib: 23.0 ms +- 1.2 ms -> 22.6 ms +- 0.8 ms: 1.02x faster
- pickle_pure_python: 324 us +- 3 us -> 318 us +- 3 us: 1.02x faster
- genshi_text: 23.2 ms +- 0.4 ms -> 22.8 ms +- 0.3 ms: 1.02x faster
- unpack_sequence: 46.5 ns +- 0.6 ns -> 45.7 ns +- 0.7 ns: 1.02x faster
- fannkuch: 401 ms +- 6 ms -> 396 ms +- 4 ms: 1.01x faster
- chameleon: 7.06 ms +- 0.06 ms -> 6.98 ms +- 0.08 ms: 1.01x faster
- pickle_dict: 27.5 us +- 0.1 us -> 27.2 us +- 1.0 us: 1.01x faster
- pickle: 10.2 us +- 0.1 us -> 10.1 us +- 0.1 us: 1.01x faster
- xml_etree_parse: 170 ms +- 2 ms -> 168 ms +- 3 ms: 1.01x faster
- xml_etree_process: 56.8 ms +- 0.5 ms -> 56.3 ms +- 0.5 ms: 1.01x faster
- nqueens: 88.0 ms +- 1.1 ms -> 87.2 ms +- 1.1 ms: 1.01x faster
- json_loads: 26.3 us +- 0.3 us -> 26.1 us +- 0.2 us: 1.01x faster
- async_tree_none: 549 ms +- 15 ms -> 544 ms +- 12 ms: 1.01x faster
- async_tree_cpu_io_mixed: 771 ms +- 13 ms -> 766 ms +- 10 ms: 1.01x faster
- mako: 10.5 ms +- 0.0 ms -> 10.4 ms +- 0.1 ms: 1.01x faster
- crypto_pyaes: 77.2 ms +- 0.8 ms -> 76.8 ms +- 0.9 ms: 1.01x faster
- xml_etree_generate: 81.2 ms +- 0.6 ms -> 80.8 ms +- 0.5 ms: 1.00x faster
- regex_compile: 142 ms +- 1 ms -> 141 ms +- 1 ms: 1.00x faster
- scimark_sor: 121 ms +- 1 ms -> 120 ms +- 1 ms: 1.00x faster

Benchmark hidden because not significant (18): async_tree_io, deltablue, django_template, dulwich_log, float, go, logging_format, logging_simple, pickle_list, python_startup, python_startup_no_site, regex_dna, richards, scimark_monte_carlo, sqlite_synth, sympy_expand, tornado_http, unpickle

Geometric mean: 1.00x faster

thanks @ansley for doing the perf analysis on this :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants