Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bpo-46212: Avoid temporary varargs tuple creation in argument passing #30312

Open
wants to merge 13 commits into
base: main
Choose a base branch
from

Conversation

colorfulappl
Copy link
Contributor

@colorfulappl colorfulappl commented Dec 31, 2021

When "Augument Clinic generated code" are parsing arguments, the args are packed to a tuple before passing to callee. This may be unnecessary.

Pass a raw pointer which points to on-stack varargs, and a varargssize integer to indicate how many varargs are passed, can save the time of tuple creation/destruction and value copy.

https://bugs.python.org/issue46212

@the-knights-who-say-ni
Copy link

the-knights-who-say-ni commented Dec 31, 2021

Hello, and thanks for your contribution!

I'm a bot set up to make sure that the project can legally accept this contribution by verifying everyone involved has signed the PSF contributor agreement (CLA).

CLA Missing

Our records indicate the following people have not signed the CLA:

@colorfulappl

For legal reasons we need all the people listed to sign the CLA before we can look at your contribution. Please follow the steps outlined in the CPython devguide to rectify this issue.

If you have recently signed the CLA, please wait at least one business day
before our records are updated.

You can check yourself to see if the CLA has been received.

Thanks again for the contribution, we look forward to reviewing it!

Copy link
Member

@sweeneyde sweeneyde left a comment

Just a couple of comments here.

I wonder if it would be simpler to restrict the scope of such a change to only signatures like def f(*args, kw1=<default1>, kw2=<default2>, ...), like print()/min()/max() have.

@@ -3341,16 +3342,16 @@ test_vararg_and_posonly(PyObject *module, PyObject *const *args, Py_ssize_t narg
for (Py_ssize_t i = 0; i < nargs - 1; ++i) {
PyTuple_SET_ITEM(__clinic_args, i, args[1 + i]);
Copy link
Member

@sweeneyde sweeneyde Jan 5, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems suspicious -- isn't __clinic_args a pointer-to-PyObject-pointer, not a pointer-to-tuple?

Copy link
Contributor Author

@colorfulappl colorfulappl Jan 7, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My fault. The tuple doesn't need to be created.

if (!args) {
goto exit;
}
a = args[0];
__clinic_args = args[1];
return_value = test_vararg_impl(module, a, __clinic_args);
__clinic_args = (PyObject *const *)args[1];
Copy link
Member

@sweeneyde sweeneyde Jan 5, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be nice to avoid such unsafe casts. Maybe passing a PyObject *** into _PyArg_UnpackKeywordsWithVarargFast instead?

Copy link
Contributor Author

@colorfulappl colorfulappl Jan 7, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This copy and cast are unnecessary.
Maybe __clinic_args = args + 1; is enough, note the args refers to the 2nd argument of function test_vararg, assuming it's not assigned in line 3389.

Python/getargs.c Outdated
buf[vararg] = (PyObject *)&args[vararg];

/* copy required positional args */
for (i = 0; i < vararg; i++) {
Copy link
Member

@sweeneyde sweeneyde Jan 5, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't it possible to avoid copying at all by directly passing some sub-buffer of the *args buffer into the ..._impl function?

Copy link
Contributor Author

@colorfulappl colorfulappl Jan 7, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are right. None of the positional args or variable-length args need to be copied. I will let _PyArg_UnpackKeywordsWithVarargFast parse default and keyword args only.

@isidentical isidentical self-requested a review Jan 21, 2022
@isidentical
Copy link
Sponsor Member

isidentical commented Jan 21, 2022

Thanks for the PR and benchmarks @colorfulappl, I am current blocked a bit but plan to review this soon!

@colorfulappl
Copy link
Contributor Author

colorfulappl commented Jan 21, 2022

Thanks for the PR and benchmarks @colorfulappl, I am current blocked a bit but plan to review this soon!

Thank you @isidentical !
With @sweeneyde 's comment I have found that my patch is actually buggy.
I am writing a new patch but blocked by some affairs last a few days.
You can review after I push a new commit and ignore this version. Sorry :)

@isidentical
Copy link
Sponsor Member

isidentical commented Jan 21, 2022

Sure, then just ping me whenever the new patch ready and I'll try to take a look at it then!

@colorfulappl
Copy link
Contributor Author

colorfulappl commented Mar 23, 2022

Sorry for such a long delay.

When generate parsing code for function with varargs, this PR:

  1. Introduce a new function _PyArg_UnpackKeywordsWithVarargKwonly.
    It's a copy of _PyArg_UnpackKeywordsWithVararg but copying of positional args and packing of varargs are deleted.
    _PyArg_UnpackKeywordsWithVarargKwonly returns an array pointer of parsed keyword args only.
  2. Positional args are not copied to argsbuf. argsbuf only saves parsed keyword args.
  3. Pass varargsize and vararg array pointer to XXXX_impl, this avoid pack/unpack of varargs.

I'd really appreciate it if you could take a look and give me some advice @isidentical .

@colorfulappl
Copy link
Contributor Author

colorfulappl commented Mar 23, 2022

Benchmark Result

Benchmark:
https://bugs.python.org/file50533

Environment:
OS: macOS 12.3
CPU: Apple M1 Pro
Compiler: Apple clang version 13.1.6
Build args: --enable-optimizations --with-lto

+-----------------------------------------------------+---------+-----------------------+
| Benchmark                                           | base    | opt                   |
+=====================================================+=========+=======================+
| print(a)                                            | 133 ns  | 126 ns: 1.06x faster  |
+-----------------------------------------------------+---------+-----------------------+
| print(a, b, c)                                      | 337 ns  | 326 ns: 1.04x faster  |
+-----------------------------------------------------+---------+-----------------------+
| print(a, b, c, *v)                                  | 679 ns  | 661 ns: 1.03x faster  |
+-----------------------------------------------------+---------+-----------------------+
| print(a, sep='', file=stdout)                       | 134 ns  | 125 ns: 1.07x faster  |
+-----------------------------------------------------+---------+-----------------------+
| print(a, sep='', flush=True, file=stdout)           | 571 ns  | 563 ns: 1.01x faster  |
+-----------------------------------------------------+---------+-----------------------+
| print(*v, sep='', flush=True, file=stdout)          | 833 ns  | 819 ns: 1.02x faster  |
+-----------------------------------------------------+---------+-----------------------+
| print(a, b, c, *v, sep='', flush=True, file=stdout) | 1.18 us | 1.16 us: 1.02x faster |
+-----------------------------------------------------+---------+-----------------------+
| Geometric mean                                      | (ref)   | 1.03x faster          |
+-----------------------------------------------------+---------+-----------------------+

@colorfulappl
Copy link
Contributor Author

colorfulappl commented Mar 24, 2022

I found there are some bugs when argument clinic processes varargs and made a fix in #32092.
Maybe this patch should wait until these bugs are fixed?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

8 participants