
asyncio: Use strong references for free-flying tasks #91887

Open
alexhartl opened this issue Apr 24, 2022 · 10 comments
Labels
expert-asyncio · type-bug (an unexpected behavior, bug, or error)

Comments

@alexhartl

In #88831 @vincentbernat pointed out that CPython only keeps weak references in _all_tasks, so a reference to a Task returned by loop.create_task has to be kept to be sure the task will not be killed with a "Task was destroyed but it is pending!" at some random point in time.
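For illustration, a minimal sketch of the hazard (the coroutine name is made up, and whether the warning actually fires depends on GC timing and on what the task is awaiting):

```python
import asyncio
import gc

async def waits_forever():
    # Await a future that only this task references. Once the caller drops
    # its Task reference, the task <-> future cycle has no external strong
    # references left (asyncio's _all_tasks only holds a weak one).
    await asyncio.get_running_loop().create_future()

async def main():
    asyncio.create_task(waits_forever())  # return value deliberately discarded
    await asyncio.sleep(0)                # let the task start and suspend
    gc.collect()                          # may reclaim the pending task and log
                                          # "Task was destroyed but it is pending!"
    await asyncio.sleep(1)

asyncio.run(main())
```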

When shielding a task from cancellation with await shield(something()), something continues to run when the containing coroutine is cancelled. As soon as that happens, something() is free-flying, i.e. there's no reference from user code anymore. shield itself has a bunch of circular strong references, but these shouldn't keep CPython from garbage-collecting the task. Hence, here the same problem occurs and the task might be killed unpredictably. Additionally, when running coroutines in parallel with gather and return_exceptions=False, an exception in one of the coroutines will leave remaining tasks free-flying. Also in this case, the remaining tasks might be killed unpredictably.

Hence, a warning in the documentation for create_task unfortunately does not suffice to solve the problem. Additionally, it has been brought up in #88831 that an API for fire-and-forget tasks (i.e. when the user doesn't want to keep a reference) would be nice.

As a solution, I suggest we either

(1) introduce an additional _pending_tasks set to keep strong references to all pending tasks. This would be the simplest solution, also with respect to the API. In fact, a lot of discussions on Stack Overflow (e.g., here, here, here) already rely on this behavior (throwing away the reference returned by create_task), even though that is currently incorrect. Since the behaviour of free-flying tasks is currently unpredictable, making it predictable by preventing them from being garbage-collected should not introduce any compatibility issues.

(2) make sure there's always a chain of strong references from the most basic futures to the running tasks awaiting something. A quick grep turned up potential problems, e.g., here and here. This does not seem like a very robust approach, though.

(3) introduce the concept of background tasks, i.e., tasks the user does not want to hold references to. The interface could look like the one suggested in #88831 (comment). Tasks from shield and gather could be converted to such background tasks automatically. Clearly, this would add complexity to the API, but the distinction between normal tasks and background tasks might also be beneficial for other purposes; e.g., one might add an API call that waits for all background tasks to complete.

My preferred solution would be (1).
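Not a patch against asyncio itself, just a rough user-level emulation of what (1) would amount to; the helper name spawn and the module-level set are made up for illustration:

```python
import asyncio

_pending_tasks: set[asyncio.Task] = set()

def spawn(coro) -> asyncio.Task:
    """Create a task and hold a strong reference to it until it finishes.

    Must be called from within a running event loop.
    """
    task = asyncio.get_running_loop().create_task(coro)
    _pending_tasks.add(task)                        # strong reference while pending
    task.add_done_callback(_pending_tasks.discard)  # drop it on completion
    return task
```

Callers could then drop the returned task without risking garbage collection while it is still pending; the cost is one set insertion and one done callback per task.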

@kumaraditya303
Contributor

You can use the new asyncio.TaskGroup to avoid this in 3.11+
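For reference, a short usage sketch (worker is a made-up coroutine); in 3.11+ the group holds strong references to its child tasks and awaits them before the async with block exits:

```python
import asyncio

async def worker(n: int) -> None:
    await asyncio.sleep(n)

async def main() -> None:
    async with asyncio.TaskGroup() as tg:
        tg.create_task(worker(1))  # no need to keep the returned Task
        tg.create_task(worker(2))  # the group references and awaits both
    # all child tasks are done at this point

asyncio.run(main())
```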

@bast0006

It looks to me like it would be a trivial code change (keeping all active tasks in a separate, non-weak set) to gain a large benefit here.

This behavior is inherited from Futures (where it makes sense). If a future is not being kept track of, that's a bug (the developer forgot to await it). However, tasks start executing on their own, if not immediately, so not awaiting a task is not as obviously a mistake.

Interrupting actively running code because someone didn't maintain a reference (on the GC's terms and timing) is a very different beast from never executing code whose intended execution and timing are unclear anyway (and for which an exception can be raised fairly promptly).

I'll have time this evening to open a PR, if nobody else wants to get to it.

@bast0006

bast0006 commented Feb 11, 2023

For impact perspective, discord.py drops the task after calling create_task on it.

https://github.com/Rapptz/discord.py/blob/742630f1441d4b0b12a5fd9a751ab5cd1b39a5c6/discord/client.py#L499

This means that (effectively) every Python-based Discord bot is affected by this, and every event dispatch appears, technically, to be at the whims of the GC.

@gvanrossum
Member

Rather than rhetoric about impact or how trivial the fix would be, we need a discussion on why we designed tasks this way in the first place. It doesn't look to me like it was inherited from Futures -- the variable is called _all_tasks, not _all_futures. Without understanding this, we risk making things worse with a hasty "fix".

That said, I'm not sure why we designed it this way, and (from private email) @1st1 doesn't seem to recall either. But it was definitely designed with some purpose in mind -- the code is quite complex and was updated every time an issue with it was discovered, and we've had many opportunities to change this behavior but instead chose to update the docs (gh-29163), even when asyncio itself suffered (gh-90467).

I also don't understand why the dict of all tasks is a global, since the only API that uses this (all_tasks()) filters out tasks belonging to a different event loop. Again, I assume there's a reason for the complexity, but I don't know what it is. Is it 3rd party event loops like uvloop? Or frameworks like aiohttp?

FWIW if we simply make _all_tasks a plain dict, we would have to arrange for tasks to remove themselves from the list when they exit, possibly in a done-callback (though there's a cost to that). There's already an API to do that (_unregister_task) but it doesn't seem to be called.

@alexhartl
Author

Agree that this should be approved by someone who is familiar with the code's design objectives, which is why I brought up two alternatives and did not create a PR right away.
I would not turn _all_tasks into a plain dict (or set), since this might break code that assumes that finished tasks (that have a user reference somewhere) are still returned by all_tasks(). The suggested change is to have an additional _pending_tasks set. Yes, we would need a done-callback, which would introduce a slight performance penalty.
In any case, imho the current behavior is somewhat annoying or misses some fundamental functionality.

@bast0006

bast0006 commented Feb 13, 2023

Sorry, since I wrote that comment I've been reading more into the current state of things and other related issues, and there is a lot more detail behind the scenes (like this comment, which I only got to today: #80788 (comment)), which mirrors some of what you've said here.

I was working from the idea that Tasks inherited this from Futures, because that was how they were introduced in the original PEP 3156.

I think I have found the origin of why it was made a weakset in the first place: https://groups.google.com/g/python-tulip/c/13hfgbKrIyY/m/HpwWPGHKT6IJ

Specifically, this patch:
https://codereview.appspot.com/14295043/diff/1/tulip/tasks.py

So it was originally intended to be a registry to help recover stuck tasks and get stack frames for?

@gvanrossum
Member

(Sorry, I hit some wrong buttons.)

So it was originally intended to be a registry to help recover stuck tasks and get stack frames for?

But apparently at that point it was already the case that tasks had to be kept alive by their owner -- ISTM a weakset was used specifically to avoid keeping tasks alive longer than necessary.

So this has always been part of the design, it just wasn't made explicit in the docs. I'm still curious why we originally designed it that way. It's possible that we never consciously realized this constraint. It's also possible that, since we did make Task a subclass of Future, we assumed that tasks would always be awaited.

I am still not convinced, despite this being a common stumbling point, that we can fix this without consequences for user code.

@bast0006

bast0006 commented Feb 17, 2023

#65362 made this make a little more sense to me.

Tasks can't be dropped while they're actively executing or scheduled to execute (_ready), because the event loop holds strong references to them; there's a comment at the start of the Task class that notes a bit about this, although I'm not sure whether it is intended to cover execution only or reference management as well.

Tasks only appear to be GCable if they're blocked on another future which ends up being GCable itself. If the blocking future is _ready/executing, then it, too, is kept alive through the above invariant, and there's no chance of losing the dependent task.

It makes sense to error and garbage-collect if the chain is broken, because we've got a "proven" memory leak (the task can never be woken again if its dependent future is unreachable).
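A sketch of that invariant with two fire-and-forget tasks (the coroutine names are made up, and the comments describe current CPython behavior as I understand it, so treat this as illustrative rather than guaranteed):

```python
import asyncio

async def kept_alive_by_loop():
    # asyncio.sleep() schedules a timer handle on the loop; the handle
    # references the sleep future, whose done callback references this task,
    # so the event loop itself keeps the task reachable while it sleeps.
    await asyncio.sleep(60)

async def only_reachable_from_itself():
    # This bare future is referenced only by this task (and vice versa), so
    # once the creator drops its Task reference the whole cycle is
    # unreachable: the task can never be woken again, and collecting it
    # turns a memory leak into a warning.
    await asyncio.get_running_loop().create_future()

async def main():
    asyncio.create_task(kept_alive_by_loop())          # safe from GC while sleeping
    asyncio.create_task(only_reachable_from_itself())  # may be GC'd while pending
    await asyncio.sleep(1)

asyncio.run(main())
```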

@alexhartl
Author

Yes, it is possible to fix this by making sure there's always a chain of strong references from the most basic futures up to the running tasks awaiting something. But I have a feeling this would be a somewhat fragile approach, both with respect to future changes in the stdlib and to futures that a user creates. A year ago I grepped asyncio's source and found two spots where weak references were used that might cause problems. I don't know whether those have been fixed or modified in the meantime.

I hadn't thought of it as causing a provable memory leak, though. Good point.

@alexhartl
Author

So how about adding a keyword argument to create_task that allows adding the task to a _pending_tasks set? We could add a note in the documentation warning about potential memory leaks.
