
Provide C implementation for asyncio.current_task #100344

Closed

itamaro opened this issue Dec 19, 2022 · 12 comments

Labels
3.12 expert-asyncio performance Performance or resource usage

Comments

@itamaro
Contributor

itamaro commented Dec 19, 2022

Feature or enhancement

Providing a C implementation of asyncio.current_task would improve its performance.

Pitch

Performance improvement.

From Instagram profiling data, we've found that this function is called frequently, and a C implementation (in Cinder 3.8) showed more than 4x speedup in a microbenchmark.
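
For reference, the pure-Python implementation being replaced boils down to a per-loop dict lookup. A minimal sketch mirroring Lib/asyncio/tasks.py (in the stdlib, `_current_tasks` is populated by the task machinery whenever a task starts or stops running):

```
from asyncio import events

# Maps each running event loop to the task currently executing on it.
_current_tasks = {}

def current_task(loop=None):
    """Return a currently executed task."""
    if loop is None:
        loop = events.get_running_loop()
    return _current_tasks.get(loop)
```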

Previous discussion

N/A

Linked PRs

itamaro added the type-feature (A feature request or enhancement) label Dec 19, 2022
itamaro added a commit to itamaro/cpython that referenced this issue Dec 19, 2022
Implementing it in C makes it about 4x-6x faster

Microbenchmark:

```
# bench.py

import time
import timeit
import asyncio

ITERS: int = 10**6
NANO: int = 10**9
NANO_PER_ITER: float = NANO / ITERS


async def main():
    # Bind the implementations to locals to avoid attribute-lookup noise.
    py_current_task = asyncio.tasks._py_current_task
    c_current_task = asyncio.tasks._c_current_task

    asyncio.current_task()  # warmup
    py_current_task()  # warmup
    c_current_task()  # warmup

    print(
        "[py] current_task: {}ns".format(
            timeit.timeit(py_current_task, number=ITERS, timer=time.process_time)
            * NANO_PER_ITER
        )
    )
    print(
        "[c] current_task: {}ns".format(
            timeit.timeit(c_current_task, number=ITERS, timer=time.process_time)
            * NANO_PER_ITER
        )
    )


asyncio.run(main())
```
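
(The `_py_current_task` / `_c_current_task` names used above follow asyncio's usual accelerator-module pattern; a rough sketch of the import dance at the bottom of Lib/asyncio/tasks.py, with the exact code in the linked PR:)

```
# Keep a reference to the pure-Python function, then let the C extension
# module override the public name if it is available.
_py_current_task = current_task

try:
    from _asyncio import current_task
except ImportError:
    pass
else:
    _c_current_task = current_task
```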

A few runs on a MacBook Pro (2.4 GHz 8-core Intel Core i9, 64 GB 2667 MHz DDR4):

debug build:

```
~/work/pyexe/main-dbg  9:57:34
$ ./python.exe bench.py
[py] current_task: 606.234ns
[c] current_task: 104.47699999999993ns

~/work/pyexe/main-dbg  9:57:59
$ ./python.exe bench.py
[py] current_task: 631.856ns
[c] current_task: 110.22500000000002ns

~/work/pyexe/main-dbg  9:58:08
$ ./python.exe bench.py
[py] current_task: 637.746ns
[c] current_task: 105.03899999999999ns

~/work/pyexe/main-dbg  9:58:16
$ ./python.exe bench.py
[py] current_task: 621.3169999999999ns
[c] current_task: 103.01300000000002ns
```

opt build:

```
~/work/pyexe/main-opt  10:33:17
$ ./python.exe bench.py
[py] current_task: 128.743ns
[c] current_task: 31.997999999999998ns

~/work/pyexe/main-opt  10:33:24
$ ./python.exe bench.py
[py] current_task: 126.388ns
[c] current_task: 32.64599999999998ns

~/work/pyexe/main-opt  10:33:26
$ ./python.exe bench.py
[py] current_task: 137.053ns
[c] current_task: 32.066999999999986ns

~/work/pyexe/main-opt  10:33:28
$ ./python.exe bench.py
[py] current_task: 131.17800000000003ns
[c] current_task: 32.06600000000001ns
```
@kumaraditya303
Contributor

kumaraditya303 commented Dec 19, 2022

Can you post benchmarks with pyperf?

kumaraditya303 added the performance (Performance or resource usage), expert-asyncio, and 3.12 labels and removed the type-feature (A feature request or enhancement) label Dec 19, 2022
@itamaro
Contributor Author

itamaro commented Dec 19, 2022

> Can you post benchmarks with pyperf?

You mean the pyperformance suite with and without the C acceleration?

@kumaraditya303
Contributor

kumaraditya303 commented Dec 19, 2022

> You mean the pyperformance suite with and without the C acceleration?

No, this microbenchmark with pyperf.

@bluetech

bluetech commented Dec 19, 2022

As a note of support for making this faster, current_task is used in the asgiref.local implementation, which is used a lot by Django, and it shows up in profiles.
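
(For context, typical usage of the API under discussion; a minimal, self-contained example:)

```
import asyncio

async def main():
    # Inside a coroutine driven by a task, current_task() returns that task.
    task = asyncio.current_task()
    print(task.get_name())  # "Task-1" under the default naming scheme

asyncio.run(main())
```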

@itamaro
Contributor Author

itamaro commented Dec 19, 2022

> No, this microbenchmark with pyperf.

Thanks for the clarification :)

I don't know how to isolate the overhead of the event loop when using pyperf, so I hope this is helpful:

C implementation:

```
$ python -m pyperf timeit -s 'from asyncio.tasks import _c_current_task as current_task' -s 'from asyncio import run' -s '
async def main():
  for _ in range(10**6):
    current_task()
' 'run(main())'
.....................
Mean +- std dev: 33.4 ms +- 1.0 ms
```

Python implementation:

```
$ python -m pyperf timeit -s 'from asyncio.tasks import _py_current_task as current_task' -s 'from asyncio import run' -s '
async def main():
  for _ in range(10**6):
    current_task()
' 'run(main())'
.....................
Mean +- std dev: 133 ms +- 8 ms
```

@kumaraditya303
Contributor

kumaraditya303 commented Dec 20, 2022

The numbers look interesting; it seems to be because there is no fast path for dict.get in ceval like there is for list.append.

@markshannon Do you have plans to optimize this?
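
(One way to eyeball the gap being described here, a sketch; absolute numbers vary by build, but the subscript benefits from the adaptive interpreter's specialized dict-subscript instruction while the method call does not:)

```
$ python -m pyperf timeit -s 'd = {"k": 1}' 'd["k"]'
$ python -m pyperf timeit -s 'd = {"k": 1}' 'd.get("k")'
```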

@markshannon
Member

markshannon commented Dec 20, 2022

@kumaraditya303 No plans at the moment.

I'd be interested to see how this compares:

```
def current_task(loop=None):
    """Return a currently executed task."""
    if loop is None:
        loop = events.get_running_loop()
    try:
        return _current_tasks[loop]
    except KeyError:
        return None
```

I assume that current_task() is expected to return a task, not None, most of the time.

@itamaro
Contributor Author

itamaro commented Dec 20, 2022

> I assume that current_task() is expected to return a task, not None, most of the time.

Makes sense!

This optimization makes the Python impl about 40% faster!

```
$ python -m pyperf timeit -s 'from asyncio.tasks import _py_current_task as current_task' -s 'from asyncio import run' -s '
async def main():
  for _ in range(10**6):
    current_task()
' 'run(main())'
.....................
Mean +- std dev: 81.1 ms +- 4.9 ms
```

The C impl is still more than 2x faster than this, so maybe do both?

@gvanrossum
Member

gvanrossum commented Dec 21, 2022

Why bother speeding up the Python version if we have the C version? There really aren't any interesting situations where the C accelerator is unavailable (that I know of).

@itamaro
Contributor Author

itamaro commented Dec 21, 2022

> Why bother speeding up the Python version if we have the C version? There really aren't any interesting situations where the C accelerator is unavailable (that I know of).

In my mind it's "why not speed up the Python version?"
I'm not aware of situations where it matters for CPython users, but maybe for alternative implementations that use CPython's stdlib it would be valuable?
Anyway, I don't feel strongly about it. Happy to revert to the existing Python implementation if you'd prefer!

@gvanrossum
Member

gvanrossum commented Dec 21, 2022

> In my mind it's "why not speed up the Python version?"

Because you're replacing one line of code using a well-known idiom with four lines of code that require the reader to carefully follow what's going on and why. For me, reading the version with .get() is much quicker than the try/except version.

If we wrote hyper-optimized code like that everywhere, even when speed doesn't matter, we'd end up with considerably less readable code. So, in my mind the question very much needs to be "why speed it up".

@carljm
Contributor

carljm commented Dec 22, 2022

The optimization in the Python version is also very much tuned for the current performance characteristics of the adaptive interpreter (i.e., that dict subscripts are much better optimized than dict method calls). If alternate Python implementations are the only likely users of the Python implementation, this code change won't necessarily give similar speedups for them.

kumaraditya303 pushed a commit that referenced this issue Dec 22, 2022