
asyncio: Use strong references for free-flying tasks #91887

Open
alexhartl opened this issue Apr 24, 2022 · 10 comments
Labels
expert-asyncio · type-bug (an unexpected behavior, bug, or error)

Comments

@alexhartl

In #88831 @vincentbernat pointed out that CPython only keeps weak references in _all_tasks, so a reference to a Task returned by loop.create_task has to be kept to be sure the task will not be killed with a "Task was destroyed but it is pending!" at some random point in time.
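For illustration, a minimal sketch of the hazard (the coroutine name is made up, and whether the warning actually fires depends on GC timing and on what the task is awaiting):

```python
import asyncio
import gc

async def waits_forever():
    # Await a future that only this task references. Once the caller drops
    # its Task reference, the task <-> future cycle has no external strong
    # references left (asyncio's _all_tasks only holds a weak one).
    await asyncio.get_running_loop().create_future()

async def main():
    asyncio.create_task(waits_forever())  # return value deliberately discarded
    await asyncio.sleep(0)                # let the task start and suspend
    gc.collect()                          # may reclaim the pending task and log
                                          # "Task was destroyed but it is pending!"
    await asyncio.sleep(1)

asyncio.run(main())
```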

When shielding a task from cancellation with await shield(something()), something continues to run when the containing coroutine is cancelled. As soon as that happens, something() is free-flying, i.e. there's no reference from user code anymore. shield itself has a bunch of circular strong references, but these shouldn't keep CPython from garbage-collecting the task. Hence, here the same problem occurs and the task might be killed unpredictably. Additionally, when running coroutines in parallel with gather and return_exceptions=False, an exception in one of the coroutines will leave remaining tasks free-flying. Also in this case, the remaining tasks might be killed unpredictably.

Hence, a warning in the documentation for create_task unfortunately does not suffice to solve the problem. Additionally, it has been brought up in #88831 that an API for fire-and-forget tasks (i.e. when the user doesn't want to keep a reference) would be nice.

As a solution, I suggest we either

(1) introduce an additional _pending_tasks set to keep strong references to all pending tasks. This would be the simplest solution, also with respect to the API. In fact, a lot of discussions on Stack Overflow (e.g., here, here, here) already rely on this behavior (throwing away the reference returned by create_task), even though that is currently incorrect. Since the behaviour of free-flying tasks is currently unpredictable, making it predictable by preventing them from being garbage-collected should not introduce any compatibility issues.

(2) make sure there's always a chain of strong references from the most basic futures to the running tasks awaiting something. A quick grep turned up potential problems, e.g., here and here. This does not seem like a very robust approach, though.

(3) introduce the concept of background tasks, i.e., tasks the user does not want to hold references to. The interface could look like the one suggested in #88831 (comment). Tasks from shield and gather could be converted to such background tasks automatically. Clearly, this would add complexity to the API, but the distinction between normal tasks and background tasks might also be beneficial for other purposes; e.g., one might add an API call that waits for all background tasks to complete.

My preferred solution would be (1).
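Not a patch against asyncio itself, just a rough user-level emulation of what (1) would amount to; the helper name spawn and the module-level set are made up for illustration:

```python
import asyncio

_pending_tasks: set[asyncio.Task] = set()

def spawn(coro) -> asyncio.Task:
    """Create a task and hold a strong reference to it until it finishes.

    Must be called from within a running event loop.
    """
    task = asyncio.get_running_loop().create_task(coro)
    _pending_tasks.add(task)                        # strong reference while pending
    task.add_done_callback(_pending_tasks.discard)  # drop it on completion
    return task
```

Callers could then drop the returned task without risking garbage collection while it is still pending; the cost is one set insertion and one done callback per task.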

@kumaraditya303
Contributor

You can use the new asyncio.TaskGroup to avoid this in 3.11+
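For reference, a short usage sketch (worker is a made-up coroutine); in 3.11+ the group holds strong references to its child tasks and awaits them before the async with block exits:

```python
import asyncio

async def worker(n: int) -> None:
    await asyncio.sleep(n)

async def main() -> None:
    async with asyncio.TaskGroup() as tg:
        tg.create_task(worker(1))  # no need to keep the returned Task
        tg.create_task(worker(2))  # the group references and awaits both
    # all child tasks are done at this point

asyncio.run(main())
```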

@bast0006

It looks to me like it would be a trivial code change (keeping all active tasks in a separate, non-weak set) to gain a large benefit here.

This behavior is inherited from Futures (where it makes sense). If a future is not being kept track of, that's a bug (the developer forgot to await it). However, tasks start executing on their own, if not immediately, so not awaiting a task is not as obviously a mistake.

Interrupting actively running code because someone didn't maintain a reference (on the GC's terms and timing) is a very different beast from never executing code whose intended execution and timing are unclear anyway (and for which an exception can be raised fairly promptly).

I'll have time this evening to open a PR, if nobody else wants to get to it.

@bast0006

bast0006 commented Feb 11, 2023

For impact perspective, discord.py drops the task after calling create_task on it.

https://github.com/Rapptz/discord.py/blob/742630f1441d4b0b12a5fd9a751ab5cd1b39a5c6/discord/client.py#L499

This means that (effectively) every Python-based Discord bot is affected by this, and every event dispatch appears, technically, to be at the whims of the GC.

@gvanrossum
Member

Rather than rhetoric about impact or how trivial the fix would be, we need a discussion on why we designed tasks this way in the first place. It doesn't look to me like it was inherited from Futures -- the variable is called _all_tasks, not _all_futures. Without understanding this, we risk making things worse with a hasty "fix".

That said, I'm not sure why we designed it this way, and (from private email) @1st1 doesn't seem to recall either. But it was definitely designed with some purpose in mind -- the code is quite complex and was updated every time an issue with it was discovered, and we've had many opportunities to change this behavior but instead chose to update the docs (gh-29163), even when asyncio itself suffered (gh-90467).

I also don't understand why the dict of all tasks is a global, since the only API that uses this (all_tasks()) filters out tasks belonging to a different event loop. Again, I assume there's a reason for the complexity, but I don't know what it is. Is it 3rd party event loops like uvloop? Or frameworks like aiohttp?

FWIW if we simply make _all_tasks a plain dict, we would have to arrange for tasks to remove themselves from the list when they exit, possibly in a done-callback (though there's a cost to that). There's already an API to do that (_unregister_task) but it doesn't seem to be called.

@alexhartl
Author

Agree that this should be approved by someone who is familiar with the code's design objectives, which is why I brought up two alternatives and did not create a PR right away.
I would not turn _all_tasks into a plain dict (or set), since this might break code that assumes that finished tasks (that have a user reference somewhere) are still returned by all_tasks(). The suggested change is to have an additional _pending_tasks set. Yes, we would need a done-callback, which would introduce a slight performance penalty.
In any case, imho the current behavior is somewhat annoying or misses some fundamental functionality.

@bast0006

bast0006 commented Feb 13, 2023

Sorry, since I wrote that comment I've been reading more into the current state of things and other related issues, and there is a lot more detail behind the scenes (like this comment, which I only got to today: #80788 (comment)), which mirrors some of what you've said here.

I was working from the idea that Tasks inherited this from Futures, because that was how they were introduced in the original PEP 3156.

I think I have found the origin of why it was made a weakset in the first place: https://groups.google.com/g/python-tulip/c/13hfgbKrIyY/m/HpwWPGHKT6IJ

Specifically, this patch:
https://codereview.appspot.com/14295043/diff/1/tulip/tasks.py

So it was originally intended to be a registry to help recover stuck tasks and get stack frames for?

@gvanrossum
Member

(Sorry, I hit some wrong buttons.)

So it was originally intended to be a registry to help recover stuck tasks and get stack frames for?

But apparently at that point it was already the case that tasks had to be kept alive by their owner -- ISTM a weakset was used specifically to avoid keeping tasks alive longer than necessary.

So this has always been part of the design, it just wasn't made explicit in the docs. I'm still curious why we originally designed it that way. It's possible that we never consciously realized this constraint. It's also possible that, since we did make Task a subclass of Future, we assumed that tasks would always be awaited.

I am still not convinced, despite this being a common stumbling point, that we can fix this without consequences for user code.

@bast0006

bast0006 commented Feb 17, 2023

#65362 made this make a little more sense to me.

Tasks can't be dropped while they're actively executing or scheduled to execute (_ready), because the event loop holds strong references to them; there's a comment at the start of the Task class that notes a bit about this, although I'm not sure whether it is intended to cover execution only or reference management as well.

Tasks only appear to be GCable if they're blocked on another future which ends up being GCable itself. If the blocking future is _ready/executing, then it, too, is kept alive through the above invariant, and there's no chance of losing the dependent task.

It makes sense to error and garbage-collect if the chain is broken, because we've got a "proven" memory leak (the task can never be woken again if its dependent future is unreachable).
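A sketch of that invariant with two fire-and-forget tasks (the coroutine names are made up, and the comments describe current CPython behavior as I understand it, so treat this as illustrative rather than guaranteed):

```python
import asyncio

async def kept_alive_by_loop():
    # asyncio.sleep() schedules a timer handle on the loop; the handle
    # references the sleep future, whose done callback references this task,
    # so the event loop itself keeps the task reachable while it sleeps.
    await asyncio.sleep(60)

async def only_reachable_from_itself():
    # This bare future is referenced only by this task (and vice versa), so
    # once the creator drops its Task reference the whole cycle is
    # unreachable: the task can never be woken again, and collecting it
    # turns a memory leak into a warning.
    await asyncio.get_running_loop().create_future()

async def main():
    asyncio.create_task(kept_alive_by_loop())          # safe from GC while sleeping
    asyncio.create_task(only_reachable_from_itself())  # may be GC'd while pending
    await asyncio.sleep(1)

asyncio.run(main())
```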

@alexhartl
Author

Yes, it is possible to fix this by making sure there's always a chain of strong references from the most basic futures up to the running tasks awaiting something. But I have a feeling this would be a somewhat fragile approach, both with respect to future changes in the stdlib and to futures that a user creates. A year ago I grepped asyncio's source and found two spots where weak references were used that might cause problems. I don't know whether those have been fixed or modified in the meantime.

I hadn't thought of it as causing a provable memory leak, though. Good point.

@alexhartl
Author

So how about adding a keyword argument to create_task that allows adding the task to a _pending_tasks set? We could add a note in the documentation warning about potential memory leaks.
