New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add eager task creation API to asyncio #97696
Comments
If I didn't already understand the changes to async functions you are proposing I wouldn't understand what this means (or the code fragment above it). What does Separately -- Can you remind me of the reason to make this a TaskGroup method instead of a new flag on create_task()? And the enqueue() name feels odd, especially since it may immediately execute some of the coroutine, which seems the opposite of putting something in a queue. To be clear, I have no objection to the feature, and I think making this an opt-in part of task creation avoids worries of changing semantics for something as simple as |
This came up in the conversation we had with @DinoV during the Language Summit. We started out with a flag on |
Right. Yeah, this is more contrived than needed, I'll fix up the text.
I wasn't in this conversation but when writing this issue it occurred to me that even with eager execution we could still return something Having said this, I think eager execution should be the default behavior users reach for going forward. So it might still be better to have a new method. |
Thanks for fixing up the description, it's clear now.
It could return a Future, which has many of the same methods. If the coroutine returns an immediate result that could be the Future's result. However, if the result is generally not used though the need to return something with appropriate methods would just reduce the performance benefit.
Yeah, that is definitely an advantage of a new method name. I find Have the performance results been repeated with |
The benchmark showing an 8x improvement (etc.) is with Dino's prototype of
Dino's prototype currently returns a newly constructed
Indeed, I have misspelled it a number of times already. I like |
For a new API, returning either a I hadn't realized the prototype doesn't even need C changes -- I had assumed it depended on the changes to vectorcall to pass through a bit indicating it's being awaited immediately. Is that change even needed once we have this? (Probably we resolved that already during discussions at PyCon, it's just so long ago I can't remember anything.) I think we need to bikeshed some more on the name, but you're right about the impression the name ought to give. Maybe |
I think it'd be better to document it as returning a
That is an interesting question, and again I don't think I was around if that was discussed in detail. For this feature, that flag is not needed. If we want to add an optimization for the large body of existing code that uses
Fine with me. Maybe we should discuss further once a PR is up. |
From a static type POV it's always a Future, not always a Task. If you want to cancel it you take your chances -- cancel() will return a bool telling you whether it worked. The docs should just explain the situation without lying. I will wait for the PR (that you're hopefully working on?). I recommend using |
Whoops, I just realized
We should have something up soon. Really wanted to get an issue open first for early feedback on the plan. |
We had a fruitful discussion on this topic at the core dev sprint. @1st1 and @carljm were present (and others). Yury recommends not adding a new API like proposed here, but instead solving this globally (per event loop). We can have an optional task factory that people can install using The new This requires an extension of the |
I'm working on a prototype implementation in #98137. |
I'm hoping that one of the Instagram folks can work on this contribution? I don't have the bandwidth to finish up that PR -- it was just meant as a quick proof of concept of what we discussed at the sprint. |
Yep, we'll pick that up! Just need to find a moment to look in more detail. |
@jbower-fb Have you ever found the time to look into this more? |
@gvanrossum I have not but I think @itamaro is going to pick this up. Sorry for the confusion, it was a bit unclear last year who was going to have time to do what. |
Cool. Do I need to keep this PR open? |
already pulled that PR locally, so you can close it! probably going to take me a bit to learn enough asyncio internals to get this working and tested |
Let me know (e.g. by posting questions here) if you need any help! |
GH-101613 has a working prototype. still much left to do. TODO items and additional discussion items on the draft PR (let me know if you prefer to keep the discussion on the issue). |
isn't this going to cause issues with structured concurrency objects (those that use eg in async def some_task():
async with asyncio.timeout(1):
await asyncio.Event().wait() # never gets cancelled here
opening a timeout or TaskGroup in the first step of an async function doesn't seem like a rare occurrence |
yes, this is a real problem. one possible solution is to modify would that resolve the issue and be an acceptable solution? |
Not in detail. I think at some point during development I found eager tasks were executing more slowly than I was expecting. When I compared with/without eager execution for the same benchmark I noticed GC routines popping up in Linux I just took another quick look at this for the
So, clearing the coroutine is a significant improvement. The reason I asked what applications you had in mind for the coroutines of finished tasks is I did think about this a bit when I wrote the code. It occurred to me the main things someone might want would be specific/limited metadata about the coroutine, e.g. its (qual)name. These could be captured and stored explicitly by a Task with much less system impact. Keeping around completed coroutines and their completed frames seems a bit sketchy in general. It could cause unbounded amounts of data to avoid GC longer than expected.
Today eagerly completing tasks are an optional feature. The reason for this is they can cause subtle differences in behavior. I would argue that coroutines being cleared on task completion could be one of these differences and we can document it as such. After the feature has been in use publicly for a while we might get some idea of what people want zombie coroutines for and come up with a more efficient way of providing targeted functionality. |
The numbers are hard to trust considering that the stddev is 80ms as pyperf is correctly warning that the benchmark is so unstable but I take it. Anyways since you feel so strongly about this I accept it but with proper documentation and that @gvanrossum agrees. I'll approve the PR and wait for Guido to approve. |
- Document `eager_start` argument to Task constructor - Document usage of `create_eager_task_factory` - Fix docs nit from https://github.com/python/cpython/pull/104140/files#r1186714354
I have proposed documentation improvements in itamaro@ac1ee82 (PR depends on gh-104251 first) |
…ing check based on feedback from python#102853 (review)
- Document `eager_start` argument to Task constructor - Document usage of `create_eager_task_factory` - Fix docs nit from https://github.com/python/cpython/pull/104140/files#r1186714354
…check in `_asyncio` (#104255)
… eager task factory
Remove "#include cpython/context.h"` from `_asynciomodule.c`. It's already included in `Python.h`.
…nning check in `_asyncio` (python#104255)
Instead, add docstring to create_eager_task_factory.
* main: (47 commits) pythongh-97696 Remove unnecessary check for eager_start kwarg (python#104188) pythonGH-104308: socket.getnameinfo should release the GIL (python#104307) pythongh-104310: Add importlib.util.allowing_all_extensions() (pythongh-104311) pythongh-99113: A Per-Interpreter GIL! (pythongh-104210) pythonGH-104284: Fix documentation gettext build (python#104296) pythongh-89550: Buffer GzipFile.write to reduce execution time by ~15% (python#101251) pythongh-104223: Fix issues with inheriting from buffer classes (python#104227) pythongh-99108: fix typo in Modules/Setup (python#104293) pythonGH-104145: Use fully-qualified cross reference types for the bisect module (python#104172) pythongh-103193: Improve `getattr_static` test coverage (python#104286) Trim trailing whitespace and test on CI (python#104275) pythongh-102500: Remove mention of bytes shorthand (python#104281) pythongh-97696: Improve and fix documentation for asyncio eager tasks (python#104256) pythongh-99108: Replace SHA3 implementation HACL* version (python#103597) pythongh-104273: Remove redundant len() calls in argparse function (python#104274) pythongh-64660: Don't hardcode Argument Clinic return converter result variable name (python#104200) pythongh-104265 Disallow instantiation of `_csv.Reader` and `_csv.Writer` (python#104266) pythonGH-102613: Improve performance of `pathlib.Path.rglob()` (pythonGH-104244) pythongh-103650: Fix perf maps address format (python#103651) pythonGH-89812: Churn `pathlib.Path` methods (pythonGH-104243) ...
* main: pythongh-104276: Make `_struct.unpack_iterator` type use type flag instead of custom constructor (python#104277) pythongh-97696: Move around and update the whatsnew entry for asyncio eager task factory (python#104298) pythongh-103193: Fix refleaks in `test_inspect` and `test_typing` (python#104320) require-pr-label.yml: Add missing "permissions:" (python#104309) pythongh-90656: Add platform triplets for 64-bit LoongArch (LA64) (python#30939) pythongh-104180: Read SOCKS proxies from macOS System Configuration (python#104181)
What’s left? Is there a what’s new entry? |
I think this is done now! |
* main: (29 commits) pythongh-104276: Make `_struct.unpack_iterator` type use type flag instead of custom constructor (python#104277) pythongh-97696: Move around and update the whatsnew entry for asyncio eager task factory (python#104298) pythongh-103193: Fix refleaks in `test_inspect` and `test_typing` (python#104320) require-pr-label.yml: Add missing "permissions:" (python#104309) pythongh-90656: Add platform triplets for 64-bit LoongArch (LA64) (python#30939) pythongh-104180: Read SOCKS proxies from macOS System Configuration (python#104181) pythongh-97696 Remove unnecessary check for eager_start kwarg (python#104188) pythonGH-104308: socket.getnameinfo should release the GIL (python#104307) pythongh-104310: Add importlib.util.allowing_all_extensions() (pythongh-104311) pythongh-99113: A Per-Interpreter GIL! (pythongh-104210) pythonGH-104284: Fix documentation gettext build (python#104296) pythongh-89550: Buffer GzipFile.write to reduce execution time by ~15% (python#101251) pythongh-104223: Fix issues with inheriting from buffer classes (python#104227) pythongh-99108: fix typo in Modules/Setup (python#104293) pythonGH-104145: Use fully-qualified cross reference types for the bisect module (python#104172) pythongh-103193: Improve `getattr_static` test coverage (python#104286) Trim trailing whitespace and test on CI (python#104275) pythongh-102500: Remove mention of bytes shorthand (python#104281) pythongh-97696: Improve and fix documentation for asyncio eager tasks (python#104256) pythongh-99108: Replace SHA3 implementation HACL* version (python#103597) ...
* main: pythongh-97696 Add documentation for get_coro() behavior with eager tasks (python#104304) pythongh-97933: (PEP 709) inline list/dict/set comprehensions (python#101441) pythongh-99889: Fix directory traversal security flaw in uu.decode() (python#104096) pythongh-104184: fix building --with-pydebug --enable-pystats (python#104217) pythongh-104139: Add itms-services to uses_netloc urllib.parse. (python#104312) pythongh-104240: return code unit metadata from codegen (python#104300)
* main: (156 commits) pythongh-97696 Add documentation for get_coro() behavior with eager tasks (python#104304) pythongh-97933: (PEP 709) inline list/dict/set comprehensions (python#101441) pythongh-99889: Fix directory traversal security flaw in uu.decode() (python#104096) pythongh-104184: fix building --with-pydebug --enable-pystats (python#104217) pythongh-104139: Add itms-services to uses_netloc urllib.parse. (python#104312) pythongh-104240: return code unit metadata from codegen (python#104300) pythongh-104276: Make `_struct.unpack_iterator` type use type flag instead of custom constructor (python#104277) pythongh-97696: Move around and update the whatsnew entry for asyncio eager task factory (python#104298) pythongh-103193: Fix refleaks in `test_inspect` and `test_typing` (python#104320) require-pr-label.yml: Add missing "permissions:" (python#104309) pythongh-90656: Add platform triplets for 64-bit LoongArch (LA64) (python#30939) pythongh-104180: Read SOCKS proxies from macOS System Configuration (python#104181) pythongh-97696 Remove unnecessary check for eager_start kwarg (python#104188) pythonGH-104308: socket.getnameinfo should release the GIL (python#104307) pythongh-104310: Add importlib.util.allowing_all_extensions() (pythongh-104311) pythongh-99113: A Per-Interpreter GIL! (pythongh-104210) pythonGH-104284: Fix documentation gettext build (python#104296) pythongh-89550: Buffer GzipFile.write to reduce execution time by ~15% (python#101251) pythongh-104223: Fix issues with inheriting from buffer classes (python#104227) pythongh-99108: fix typo in Modules/Setup (python#104293) ...
* main: (35 commits) pythongh-97696 Add documentation for get_coro() behavior with eager tasks (python#104304) pythongh-97933: (PEP 709) inline list/dict/set comprehensions (python#101441) pythongh-99889: Fix directory traversal security flaw in uu.decode() (python#104096) pythongh-104184: fix building --with-pydebug --enable-pystats (python#104217) pythongh-104139: Add itms-services to uses_netloc urllib.parse. (python#104312) pythongh-104240: return code unit metadata from codegen (python#104300) pythongh-104276: Make `_struct.unpack_iterator` type use type flag instead of custom constructor (python#104277) pythongh-97696: Move around and update the whatsnew entry for asyncio eager task factory (python#104298) pythongh-103193: Fix refleaks in `test_inspect` and `test_typing` (python#104320) require-pr-label.yml: Add missing "permissions:" (python#104309) pythongh-90656: Add platform triplets for 64-bit LoongArch (LA64) (python#30939) pythongh-104180: Read SOCKS proxies from macOS System Configuration (python#104181) pythongh-97696 Remove unnecessary check for eager_start kwarg (python#104188) pythonGH-104308: socket.getnameinfo should release the GIL (python#104307) pythongh-104310: Add importlib.util.allowing_all_extensions() (pythongh-104311) pythongh-99113: A Per-Interpreter GIL! (pythongh-104210) pythonGH-104284: Fix documentation gettext build (python#104296) pythongh-89550: Buffer GzipFile.write to reduce execution time by ~15% (python#101251) pythongh-104223: Fix issues with inheriting from buffer classes (python#104227) pythongh-99108: fix typo in Modules/Setup (python#104293) ...
Feature or enhancement
We propose adding “eager” coroutine execution support to
asyncio.TaskGroup
via a new methodenqueue()
[1].TaskGroup.enqueue()
would have the same signature asTaskGroup.create_task()
but eagerly perform the first step of the passedcoroutine
’s execution immediately. If thecoroutine
completes without yielding, the result ofenqueue()
would be an object which behaves like a completedasyncio.Task
. Otherwise,enqueue()
behaves the same asTaskGroup.create_task()
, returning a pendingasyncio.Task
.The reason for a new method, rather than changing the implementation of
TaskGroup.create_task()
is this new method introduces a small semantic difference. For example in:The exception will cancel everthing scheduled in
tg
, but if some or all ofcoro()
completes eagerly any side-effects of this will be observable in further execution. Iftg.create_task()
is used instead no part ofcoro()
will be executed.Pitch
At Instagram we’ve observed ~70% of coroutine instances passed to
asyncio.gather()
can run fully synchronously i.e. without performing any I/O which would suspend execution. This typically happens when there is a local cache which can elide actual I/O. We exploit this in Cinder with a modifiedasyncio.gather()
that eagerly executescoroutine
args and skips scheduling aasyncio.Task
object to an event loop if no yield occurs. Overall this optimization saved ~4% CPU on our Django webservers.In a prototype implementation of this proposed feature [2] the overhead when scheduling
TaskGroup
s with all fully-synchronous coroutines was decreased by ~8x. When scheduling a mixture of synchronous and asynchronouscoroutine
s, performance is improved by ~1.4x, and when nocoroutine
s can complete synchronously there is still a small improvement.We anticipate code relying on any semantics which change between
TaskGroup.create_task()
andTaskGroup.enqueue()
will be rare. So, as the TaskGroup interface is new in 3.11, we hopeenqueue()
and its performance benefits can be promoted as the preferred method for scheduling coroutines in 3.12+.Previous discussion
This new API was discussed informally at PyCon 2022, with at least some of this being between @gvanrossum, @DinoV, @markshannon, and /or @jbower-fb.
[1] The name "
enqueue
" came out of a discussion between @gvanrossum and @DinoV.[2] Prototype implementation (some features missing, e.g. specifying Context), and benchmark.
Linked PRs
The text was updated successfully, but these errors were encountered: