
Provide C implementation for asyncio.current_task #100344

Closed

itamaro opened this issue Dec 19, 2022 · 12 comments

Labels
3.12 expert-asyncio performance Performance or resource usage

Comments

@itamaro
Contributor

itamaro commented Dec 19, 2022

Feature or enhancement

Providing a C implementation of asyncio.current_task would improve its performance.

Pitch

Performance improvement.

From Instagram profiling data, we've found that this function is called frequently, and a C implementation (in Cinder 3.8) showed more than 4x speedup in a microbenchmark.
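
For reference, the pure-Python implementation being replaced boils down to a per-loop dict lookup. A minimal sketch mirroring Lib/asyncio/tasks.py (in the stdlib, `_current_tasks` is populated by the task machinery whenever a task starts or stops running):

```
from asyncio import events

# Maps each running event loop to the task currently executing on it.
_current_tasks = {}

def current_task(loop=None):
    """Return a currently executed task."""
    if loop is None:
        loop = events.get_running_loop()
    return _current_tasks.get(loop)
```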

Previous discussion

N/A

Linked PRs

itamaro added the type-feature (A feature request or enhancement) label Dec 19, 2022
itamaro added a commit to itamaro/cpython that referenced this issue Dec 19, 2022
Implementing it in C makes it about 4x-6x faster

Microbenchmark:

```
# bench.py

import time
import timeit
import asyncio

ITERS: int = 10**6
NANO: int = 10**9
NANO_PER_ITER: float = NANO / ITERS


async def main():
    # Bind the implementations to locals to avoid attribute-lookup noise.
    py_current_task = asyncio.tasks._py_current_task
    c_current_task = asyncio.tasks._c_current_task

    asyncio.current_task()  # warmup
    py_current_task()  # warmup
    c_current_task()  # warmup

    print(
        "[py] current_task: {}ns".format(
            timeit.timeit(py_current_task, number=ITERS, timer=time.process_time)
            * NANO_PER_ITER
        )
    )
    print(
        "[c] current_task: {}ns".format(
            timeit.timeit(c_current_task, number=ITERS, timer=time.process_time)
            * NANO_PER_ITER
        )
    )


asyncio.run(main())
```
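
(The `_py_current_task` / `_c_current_task` names used above follow asyncio's usual accelerator-module pattern; a rough sketch of the import dance at the bottom of Lib/asyncio/tasks.py, with the exact code in the linked PR:)

```
# Keep a reference to the pure-Python function, then let the C extension
# module override the public name if it is available.
_py_current_task = current_task

try:
    from _asyncio import current_task
except ImportError:
    pass
else:
    _c_current_task = current_task
```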

A few runs on a MacBook Pro (2.4 GHz 8-core Intel Core i9, 64 GB 2667 MHz DDR4):

debug build:

```
~/work/pyexe/main-dbg  9:57:34
$ ./python.exe bench.py
[py] current_task: 606.234ns
[c] current_task: 104.47699999999993ns

~/work/pyexe/main-dbg  9:57:59
$ ./python.exe bench.py
[py] current_task: 631.856ns
[c] current_task: 110.22500000000002ns

~/work/pyexe/main-dbg  9:58:08
$ ./python.exe bench.py
[py] current_task: 637.746ns
[c] current_task: 105.03899999999999ns

~/work/pyexe/main-dbg  9:58:16
$ ./python.exe bench.py
[py] current_task: 621.3169999999999ns
[c] current_task: 103.01300000000002ns
```

opt build:

```
~/work/pyexe/main-opt  10:33:17
$ ./python.exe bench.py
[py] current_task: 128.743ns
[c] current_task: 31.997999999999998ns

~/work/pyexe/main-opt  10:33:24
$ ./python.exe bench.py
[py] current_task: 126.388ns
[c] current_task: 32.64599999999998ns

~/work/pyexe/main-opt  10:33:26
$ ./python.exe bench.py
[py] current_task: 137.053ns
[c] current_task: 32.066999999999986ns

~/work/pyexe/main-opt  10:33:28
$ ./python.exe bench.py
[py] current_task: 131.17800000000003ns
[c] current_task: 32.06600000000001ns
```
@kumaraditya303
Contributor

kumaraditya303 commented Dec 19, 2022

Can you post benchmarks with pyperf?

kumaraditya303 added the performance (Performance or resource usage), expert-asyncio, and 3.12 labels and removed the type-feature (A feature request or enhancement) label Dec 19, 2022
@itamaro
Contributor Author

itamaro commented Dec 19, 2022

> Can you post benchmarks with pyperf?

You mean the pyperformance suite with and without the C acceleration?

@kumaraditya303
Contributor

kumaraditya303 commented Dec 19, 2022

> You mean the pyperformance suite with and without the C acceleration?

No, this microbenchmark with pyperf.

@bluetech

bluetech commented Dec 19, 2022

As a note of support for making this faster, current_task is used in the asgiref.local implementation, which is used a lot by Django, and it shows up in profiles.
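
(For context, typical usage of the API under discussion; a minimal, self-contained example:)

```
import asyncio

async def main():
    # Inside a coroutine driven by a task, current_task() returns that task.
    task = asyncio.current_task()
    print(task.get_name())  # "Task-1" under the default naming scheme

asyncio.run(main())
```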

@itamaro
Contributor Author

itamaro commented Dec 19, 2022

> No, this microbenchmark with pyperf.

Thanks for the clarification :)

I don't know how to isolate the overhead of the event loop when using pyperf, so I hope this is helpful:

C implementation:

```
$ python -m pyperf timeit -s 'from asyncio.tasks import _c_current_task as current_task' -s 'from asyncio import run' -s '
async def main():
  for _ in range(10**6):
    current_task()
' 'run(main())'
.....................
Mean +- std dev: 33.4 ms +- 1.0 ms
```

Python implementation:

```
$ python -m pyperf timeit -s 'from asyncio.tasks import _py_current_task as current_task' -s 'from asyncio import run' -s '
async def main():
  for _ in range(10**6):
    current_task()
' 'run(main())'
.....................
Mean +- std dev: 133 ms +- 8 ms
```

@kumaraditya303
Contributor

kumaraditya303 commented Dec 20, 2022

The numbers look interesting; it seems to be because there is no fast path for dict.get in ceval like there is for list.append.

@markshannon Do you have plans to optimize this?
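
(One way to eyeball the gap being described here, a sketch; absolute numbers vary by build, but the subscript benefits from the adaptive interpreter's specialized dict-subscript instruction while the method call does not:)

```
$ python -m pyperf timeit -s 'd = {"k": 1}' 'd["k"]'
$ python -m pyperf timeit -s 'd = {"k": 1}' 'd.get("k")'
```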

@markshannon
Member

markshannon commented Dec 20, 2022

@kumaraditya303 No plans at the moment.

I'd be interested to see how this compares:

```
def current_task(loop=None):
    """Return a currently executed task."""
    if loop is None:
        loop = events.get_running_loop()
    try:
        return _current_tasks[loop]
    except KeyError:
        return None
```

I assume that current_task() is expected to return a task, not None, most of the time.

@itamaro
Contributor Author

itamaro commented Dec 20, 2022

> I assume that current_task() is expected to return a task, not None, most of the time.

Makes sense!

This optimization makes the Python impl about 40% faster!

```
$ python -m pyperf timeit -s 'from asyncio.tasks import _py_current_task as current_task' -s 'from asyncio import run' -s '
async def main():
  for _ in range(10**6):
    current_task()
' 'run(main())'
.....................
Mean +- std dev: 81.1 ms +- 4.9 ms
```

The C impl is still more than 2x faster than this, so maybe do both?

@gvanrossum
Member

gvanrossum commented Dec 21, 2022

Why bother speeding up the Python version if we have the C version? There really aren't any interesting situations where the C accelerator is unavailable (that I know of).

@itamaro
Contributor Author

itamaro commented Dec 21, 2022

> Why bother speeding up the Python version if we have the C version? There really aren't any interesting situations where the C accelerator is unavailable (that I know of).

In my mind it's "why not speed up the Python version?"
I'm not aware of situations where it matters for CPython users, but maybe for alternative implementations that use CPython's stdlib it would be valuable?
Anyway, I don't feel strongly about it. Happy to revert to the existing Python implementation if you'd prefer!

@gvanrossum
Member

gvanrossum commented Dec 21, 2022

> In my mind it's "why not speed up the Python version?"

Because you're replacing one line of code using a well-known idiom with four lines of code that require the reader to carefully follow what's going on and why. For me, reading the version with .get() is much quicker than the try/except version.

If we wrote hyper-optimized code like that everywhere, even when speed doesn't matter, we'd end up with considerably less readable code. So, in my mind the question very much needs to be "why speed it up".

@carljm
Contributor

carljm commented Dec 22, 2022

The optimization in the Python version is also very much tuned for the current performance characteristics of the adaptive interpreter (i.e., that dict subscripts are much better optimized than dict method calls). If alternate Python implementations are the only likely users of the Python implementation, this code change won't necessarily give similar speedups for them.

kumaraditya303 pushed a commit that referenced this issue Dec 22, 2022