New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
asyncio: Use strong references for free-flying tasks #91887
Comments
You can use the new |
It looks to me like it would be a trivial code change (keeping all active tasks in a separate, non-weak set) to gain a large benefit here. This behavior is inherited from Futures (where it makes sense). If a future is not being kept track of, it's a bug (the developer forgot to await). However, tasks start executing, if not immediately, on their own, so not awaiting a task is not as obviously a mistake. Interrupting actively running code because someone didn't maintain a reference (on the gc's terms and timing) is a very different beast than not executing code that the desire and timing of which is unclear (and an exception can be somewhat promptly triggered for). I'll have time this evening to open a PR, if nobody else wants to get to it. |
For impact perspective, discord.py drops the task after calling create_task on it. This means (effectively) every python based discord bot is affected by this, and every event dispatch appears to be technically at the whims of the GC. |
Rather than rhetoric about impact or how trivial the fix would be, we need a discussion on why we designed tasks this way in the first place. It doesn't look to me like it was inherited from Futures -- the variable is called That said, I'm not sure why we designed it this way, and (from private email) @1st1 doesn't seem to recall either. But it was definitely designed with some purpose in mind -- the code is quite complex and was updated every time an issue with it was discovered, and we've had many opportunities to change this behavior but instead chose to update the docs (gh-29163), even when asyncio itself suffered (gh-90467). I also don't understand why the dict of all tasks is a global, since the only API that uses this ( FWIW if we simply make |
Agree that this should be approved by someone who is familiar with the code's design objectives, which is why I brought up two alternatives and did not create a PR right away. |
Sorry, I've been doing more reading into what the current state is and other related issues since I wrote that comment and it is absolutely a lot more detailed behind the scenes (like this comment here I only got to today: #80788 (comment) ) which mirrors some of what you've sent here. I was working with the idea that Tasks were inherited from Futures because that was how they were introduced in the original PEP-3156. I think I have found the origin of why it was made a weakset in the first place: https://groups.google.com/g/python-tulip/c/13hfgbKrIyY/m/HpwWPGHKT6IJ Specifically, this patch: So it was originally intended to be a registry to help recover stuck tasks and get stack frames for? |
(Sorry, I hit some wrong buttons.)
But apparently at that point it was already the case that tasks had to be kept alive by their owner -- ISTM a weakset was used specifically to avoid keeping tasks alive longer than necessary. So this has always been part of the design, it just wasn't made explicit in the docs. I'm still curious why we originally designed it that way. It's possible that we never consciously realized this constraint. It's also possible that, since we did make Task a subclass of Future, we assumed that tasks would always be awaited. I am still not convinced, despite this being a common stumbling point, that we can fix this without consequences for use code. |
#65362 made this make a little more sense to me. Tasks can't be dropped if they're actively executing or scheduled to execute (_ready) because there's strong references held by the event loop, and there's a comment that notes a bit about this at the start of the Task class, although I'm not sure if that's intended to apply to execution only or for reference management. Tasks only appear to be GCable if they're blocking on another future which ends up being GCable itself. If the blocking future is _ready/executing, then it, also, is kept alive through the above invariant, and there's no chance of losing the dependent task. It makes sense to error and garbage collect if the chain is broken, because we've got a "proven" memory leak (the task cannot be woken again if it's dependent future is unreachable). |
Yes, it is possible to fix this by making sure there's always a chain of strong references from the most basic futures to the running tasks awaiting something. But I have a feeling that this would be a somewhat fragile approach, both considering future changes in the stdlib and futures that a user creates. A year ago I Haven't thought of it like causing a provable memory leak, though. Good point. |
So how about adding a keyword argument to |
In #88831 @vincentbernat pointed out that CPython only keeps weak references in
_all_tasks
, so a reference to aTask
returned byloop.create_task
has to be kept to be sure the task will not be killed with a "Task was destroyed but it is pending!" at some random point in time.When shielding a task from cancellation with
await shield(something())
,something
continues to run when the containing coroutine is cancelled. As soon as that happens,something()
is free-flying, i.e. there's no reference from user code anymore.shield
itself has a bunch of circular strong references, but these shouldn't keep CPython from garbage-collecting the task. Hence, here the same problem occurs and the task might be killed unpredictably. Additionally, when running coroutines in parallel withgather
andreturn_exceptions=False
, an exception in one of the coroutines will leave remaining tasks free-flying. Also in this case, the remaining tasks might be killed unpredictably.Hence, a warning in the documentation for
create_task
unfortunately does not suffice to solve the problem. Additionally, it has been brought up in #88831 that an API for fire-and-forget tasks (i.e. when the user doesn't want to keep a reference) would be nice.As solution, I suggest to either
(1) introduce a further
_pending_tasks
set to keep strong references to all pending tasks. This would be the simplest solution also with respect to the API. In fact, a lot of dicussions on Stack Overflow (e.g., here, here, here) already rely on this behavior (throwing away the reference returned bycreate_task
), although it's wrong currently. Since the behaviour for free-flying tasks is unpredictable currently, it should not introduce any compatibility issues when making it predictable by preventing them from being garbage-collected.(2) make sure there's always a chain of strong references from the most basic futures to the running tasks awaiting something. A quick
grep
resulted in potential problems, e.g., here and here. This does not seem like a very robust approach, though.(3) introduce the concept of background tasks, i.e., tasks the user does not want to hold references to. The interface could look like suggested in #88831 (comment) . Tasks from
shield
andgather
could be automatically converted to such background tasks. Clearly, it would add complexity to the API, but the distinction between normal tasks and background tasks might potentially be beneficial also for other purposes. E.g., one might add an API call that waits for all background tasks to be completed.My preferred solution would be (1).
The text was updated successfully, but these errors were encountered: