Provide C implementation for asyncio.current_task #100344
Comments
Implementing it in C makes it about 4x-6x faster.

Microbenchmark:

```python
# bench.py
import time
import timeit
import asyncio

ITERS: int = 10**6
NANO: int = 10**9
NANO_PER_ITER: float = NANO / ITERS


async def main():
    # avoid load-attr noise
    py_current_task = asyncio.tasks._py_current_task
    c_current_task = asyncio.tasks._c_current_task

    asyncio.current_task()  # warmup
    py_current_task()  # warmup
    c_current_task()  # warmup

    print(
        "[py] current_task: {}ns".format(
            timeit.timeit(py_current_task, number=ITERS, timer=time.process_time)
            * NANO_PER_ITER
        )
    )
    print(
        "[c] current_task: {}ns".format(
            timeit.timeit(c_current_task, number=ITERS, timer=time.process_time)
            * NANO_PER_ITER
        )
    )


asyncio.run(main())
```

A few runs on a MacBook Pro (2.4 GHz 8-Core Intel Core i9, 64 GB 2667 MHz DDR4):

debug build:

```
~/work/pyexe/main-dbg⌚ 9:57:34 $ ./python.exe bench.py
[py] current_task: 606.234ns
[c] current_task: 104.47699999999993ns
~/work/pyexe/main-dbg⌚ 9:57:59 $ ./python.exe bench.py
[py] current_task: 631.856ns
[c] current_task: 110.22500000000002ns
~/work/pyexe/main-dbg⌚ 9:58:08 $ ./python.exe bench.py
[py] current_task: 637.746ns
[c] current_task: 105.03899999999999ns
~/work/pyexe/main-dbg⌚ 9:58:16 $ ./python.exe bench.py
[py] current_task: 621.3169999999999ns
[c] current_task: 103.01300000000002ns
```

opt build:

```
~/work/pyexe/main-opt⌚ 10:33:17 $ ./python.exe bench.py
[py] current_task: 128.743ns
[c] current_task: 31.997999999999998ns
~/work/pyexe/main-opt⌚ 10:33:24 $ ./python.exe bench.py
[py] current_task: 126.388ns
[c] current_task: 32.64599999999998ns
~/work/pyexe/main-opt⌚ 10:33:26 $ ./python.exe bench.py
[py] current_task: 137.053ns
[c] current_task: 32.066999999999986ns
~/work/pyexe/main-opt⌚ 10:33:28 $ ./python.exe bench.py
[py] current_task: 131.17800000000003ns
[c] current_task: 32.06600000000001ns
```
Can you post benchmarks with pyperf?

You mean the pyperformance suite with and without the C acceleration?

No, this microbenchmark with pyperf.
As a note of support for making this faster, |
Thanks for the clarification :) I don't know how to isolate the overhead of the event loop when using pyperf, so I hope this is helpful:

C implementation:

Python implementation:
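As an aside, a rough mean ± stdev in the style of pyperf's output can also be obtained with the stdlib alone, by running `timeit.repeat` inside an already-running coroutine so that event-loop startup and teardown are excluded from the measurement. This is a sketch, not what pyperf itself does (pyperf calibrates iteration counts and spawns worker processes); the `iters` value here is illustrative:

```python
import asyncio
import statistics
import timeit


async def main() -> float:
    # Measure inside a running coroutine so only the call itself is
    # timed, not event-loop startup/teardown.
    iters = 100_000  # illustrative; pyperf calibrates this automatically
    samples = timeit.repeat(asyncio.current_task, number=iters, repeat=5)
    per_call_ns = [s / iters * 1e9 for s in samples]
    mean = statistics.mean(per_call_ns)
    stdev = statistics.stdev(per_call_ns)
    print(f"current_task: {mean:.1f}ns +- {stdev:.1f}ns per call")
    return mean


mean_ns = asyncio.run(main())
```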
The numbers look interesting; it seems to be because there is no fastpath for

@markshannon Do you have plans to optimize this?
@kumaraditya303 No plans at the moment. I'd be interested to see how this compared:

```python
def current_task(loop=None):
    """Return a currently executed task."""
    if loop is None:
        loop = events.get_running_loop()
    try:
        return _current_tasks[loop]
    except KeyError:
        return None
```

I assume that
Makes sense! This optimization makes the Python impl about 40% faster!

The C impl is still more than 2x faster than this, so maybe do both?
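The two pure-Python variants can be compared head-to-head without asyncio at all, since the cost being measured is just the dictionary lookup. A standalone sketch, with a plain dict standing in for `_current_tasks` and a sentinel object standing in for the running loop (all names here are illustrative):

```python
import timeit

_current_tasks = {}  # stands in for asyncio.tasks._current_tasks
loop = object()      # stands in for a running event loop
_current_tasks[loop] = "task"


def current_task_get(loop=loop):
    # original idiom: a single line, but a dict *method* call
    return _current_tasks.get(loop)


def current_task_try(loop=loop):
    # suggested idiom: a dict subscript, which the adaptive
    # interpreter specializes more aggressively
    try:
        return _current_tasks[loop]
    except KeyError:
        return None


# Both variants must agree, for present and missing keys.
assert current_task_get() == current_task_try() == "task"
assert current_task_get(object()) is current_task_try(object()) is None

for fn in (current_task_get, current_task_try):
    t = timeit.timeit(fn, number=100_000)
    print(f"{fn.__name__}: {t / 100_000 * 1e9:.1f}ns/call")
```

The relative timings will vary by build and interpreter version, so no specific speedup is asserted here.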
Why bother speeding up the Python version if we have the C version? There really aren't any interesting situations where the C accelerator is unavailable (that I know of).
In my mind it's "why not speed up the Python version?"
Because you're replacing one line of code with a well-known idiom with four lines of code that require the reader to follow carefully what's going on and why. For me, reading the version with

If we wrote hyper-optimized code like that everywhere, even when speed doesn't matter, we'd end up with considerably less readable code. So, in my mind the question very much needs to be "why speed it up".
The optimization in the Python version is also very much tuned for the current performance characteristics of the adaptive interpreter (i.e., that dict subscripts are much better optimized than dict method calls). If alternate Python implementations are the only likely users of the Python implementation, this code change won't necessarily give similar speedups for them.
itamaro commented Dec 19, 2022 (edited by bedevere-bot)
Feature or enhancement
By providing a C implementation for `asyncio.current_task`, its performance can be improved.

Pitch
Performance improvement.
From Instagram profiling data, we've found that this function is called frequently, and a C implementation (in Cinder 3.8) showed more than 4x speedup in a microbenchmark.
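For context, asyncio already wires up its C accelerators with an import-override idiom: the pure-Python definition stays reachable under a `_py_*` alias and is shadowed by the `_asyncio` version when that module provides it. A hedged sketch of that pattern (the fallback body and the exact aliases are illustrative; pre-accelerator builds won't have `current_task` in `_asyncio`, in which case the import simply fails and the Python version is used):

```python
import asyncio


def current_task(loop=None):
    """Pure-Python fallback: return the currently running Task."""
    if loop is None:
        loop = asyncio.get_running_loop()
    return asyncio.tasks._current_tasks.get(loop)


_py_current_task = current_task  # keep the Python version reachable

try:
    # Shadow with the C accelerator when available; whether _asyncio
    # exports this name depends on the interpreter version.
    from _asyncio import current_task as _c_current_task
except ImportError:
    pass
else:
    current_task = _c_current_task


async def main():
    task = current_task()
    # Either implementation must agree with the stdlib's own.
    assert task is asyncio.current_task()
    return task


task = asyncio.run(main())
```

This keeps both implementations testable side by side, which is exactly what the benchmark above relies on.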
Previous discussion
N/A
Linked PRs