bpo-46771: Implement asyncio context managers for handling timeouts #31394
base: main
Conversation
@Tinche this is the first sketch. It has no tests yet, sorry.

The idea is: let's use a CancelScope class. The class has a state. If a timeout occurs, the state is switched to cancelled and task.cancel() is called. The fact of timeout occurrence is detected by the state check (self._state == _State.CANCELLED). The explicit state machine is very useful: it simplifies both designing and debugging.

The move_on_* family is renamed to cancel_scope + cancel_scope_at. The fail_* family is renamed to timeout + timeout_at.
Sync context managers are used to visually guarantee no suspension points.
I didn't implement CancelScope.cancel() because it is practically equal to task.cancel(). It can be added if we decide to support this.

Ironically, the cancellation message is not needed for the implementation, because the cancellation flag is set by the call_at() callback. If something swallows the raised CancelledError, the flag is still set. We could actually add two flags: cancellation was raised, and cancellation was caught; I'm not sure it is really needed. If we decide 'yes', a .cancelling() flag can be added along with .cancelled().
I'll try to add tests later.
Sorry, tomorrow I'll be traveling and inactive most likely.
@gvanrossum you may be interested in this preliminary design. It is a mix of trio-styled cancellation scopes and asyncio specifics.
The code is pretty simple and understandable, at least from my point of view.
Sure, need to write a comprehensive test suite.
I'll gather tests from both async-timeout and quattro, plus add tests for mentioned edge cases.
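To make the shape of the design described above concrete, here is a minimal, self-contained sketch of a state-machine-based timeout context manager. The class, state, and method names here are illustrative assumptions, not the PR's actual code; the key idea matches the comment: expiry is detected via the state, not by inspecting the CancelledError.

```python
import asyncio
import enum


class _State(enum.Enum):
    CREATED = enum.auto()
    ENTERED = enum.auto()
    EXPIRED = enum.auto()
    EXITED = enum.auto()


class Timeout:
    """Illustrative sketch: the call_later callback flips the state and
    cancels the task; __aexit__ consults the state to decide whether the
    incoming CancelledError was caused by this timeout."""

    def __init__(self, delay):
        self._delay = delay
        self._state = _State.CREATED
        self._handler = None
        self._task = None

    async def __aenter__(self):
        self._state = _State.ENTERED
        self._task = asyncio.current_task()
        if self._delay is not None:
            loop = asyncio.get_running_loop()
            self._handler = loop.call_later(self._delay, self._on_timeout)
        return self

    async def __aexit__(self, exc_type, exc, tb):
        if self._handler is not None:
            self._handler.cancel()
        if self._state is _State.EXPIRED:
            if exc_type is asyncio.CancelledError:
                # Our deadline fired and cancelled the task: translate.
                raise TimeoutError from exc
        else:
            self._state = _State.EXITED
        return False

    def expired(self):
        return self._state is _State.EXPIRED

    def _on_timeout(self):
        # Timer callback: record expiry in the state machine, then cancel.
        self._state = _State.EXPIRED
        self._task.cancel()


async def demo():
    try:
        async with Timeout(0.01):
            await asyncio.sleep(1)
    except TimeoutError:
        return "timed out"
    return "completed"


print(asyncio.run(demo()))  # prints: timed out
```

Because the expiry flag lives in the state machine, the fact of the timeout survives even if some intermediate code swallows the CancelledError, which is exactly the property the comment above highlights.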
@asvetlov Thanks for starting this, I will dedicate some time over the weekend. How can I contribute tests to this though? You invited me to asvetlov/cpython, but this PR is from python/cpython. Sorry, I have not done this before :)
Clone my repo locally. BTW, I use GitHub CLI (https://cli.github.com/); it makes checking out forks very easy. Click the 'copy-to-clipboard' button right of the branch name. The tool sets up the remote, checks out the branch, and switches to it.
Sorry, I was over-optimistic :(
Thanks for the explanation. The asvetlov repo has no Can
@asvetlov Ok, for the discussion about more sophisticated nonce handling. Would you like me to do a writeup here, or open an issue on the asvetlov repo? I will try to explain the issue and a proposed solution. You were right saying on the mailing list that asyncio doesn't have priorities, but I think with a little logic we can do better and maybe save our users from a potential footgun.

Can you add a test that can only be fixed with a nonce? I'd like to take a look, because I'm still -1 on the nonce idea. We should figure out another way.
I'm not sure if the discussion about cancel edge cases should move here, but my comments on that so far are:
@1st1 Here's a test that fails without nonces:

```python
async def test_nested_timeouts_concurrent(self):
    with self.assertRaises(TimeoutError):
        with asyncio.timeout(0.002):
            try:
                with asyncio.timeout(0.003):
                    # Pretend we crunch some numbers.
                    time.sleep(0.005)
                    await asyncio.sleep(1)
            except asyncio.TimeoutError:
                pass
```

Both timeouts are marked as cancelled, but the inner timeout swallows the cancellation for the outer timeout. So the outer timeout never triggers.
@Tinche up to you.
Thinking about what happens here: when we reach the await, both deadlines have already passed, so both timer callbacks run and both scopes are marked as cancelled. The expectation is that the outer cancel scope should also raise TimeoutError, because its deadline is also exceeded. So perhaps the outer scope's exit could check whether its own deadline has passed.

But it looks like there's a variant of the example that isn't fixed by such a check -- for example, we could add await asyncio.sleep(2) after the try/except block (or even in the except clause), inside the outer cancel scope. Since the CancelledError exception has been wholly swallowed by the inner scope, the outer scope never sees it.

I'm guessing the fix for that would be to use a nonce (either using an explicit nonce API or via the cancel message). However, that still depends on the order in which the callbacks run. With the new cancel semantics (where extra cancels are ignored) whoever runs first wins, while with the old cancel semantics (where the most recent cancel gets to set the cancel message / nonce) whoever runs last wins -- but we don't want to rely on the order in which the callbacks run (since I have no idea in which order the current implementation runs them, given that they both become "ready" simultaneously, and we shouldn't depend on that). So that's why Tin is proposing to use timestamps (or a cancel stack, or some other mechanism that lets us arbitrate between the two cancel calls).

Note that this case is slightly different from the "web server cancels" problem, which was solvable without resorting to timestamps (the cancel scope callback would have to check whether the task was already being cancelled, using t.cancelling()). I'm not sure yet what the right solution would be, but I'm glad that we're thinking about this scenario carefully.
@gvanrossum thanks for the great write-up, I believe you hit the nail on the head. I think to get this 100% correct, we need a way to throw the highest priority cancellation into the task on its iteration. The highest priority is the cancel scope that was entered first. It doesn't have to be a timestamp, we could have a global (or thread-local, but that's overthinking it) counter that the cancel scope takes a value from on enter. The cancellation data could also be in a context var instead of on the CancelledError, I suppose? Would that be cleaner?
I have a new proposal. Thinking about @asvetlov's diagrams some more, I think we can make everything work if the cancel state of a task kept a count of cancellations. This would replace the current boolean cancel state. Now any context manager cleanup handler that absorbs a cancellation meant for itself could decrement the count.

The only thing that's painful with this scheme is how to handle cancel messages. However (this is for @cjerdonek to disagree with :-) I think cancel messages were a bad idea to begin with -- if I read the bpo where it started, it came from a user who misunderstood how to print exceptions. It seems they are not always preserved regardless. I believe that any scheme that allows for multiple pending cancellations will have a hard time keeping track of which cancel message should be allowed to bubble out. I'm guessing it could be solved by making the cancel state keep a stack of messages, but I doubt that's worth it.
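As a rough model of this counting scheme (a toy class with hypothetical names; the mechanism eventually landed in Python 3.11 as `Task.cancelling()` / `Task.uncancel()`):

```python
class CancelCount:
    """Toy model of a task's cancellation-request count.
    Not asyncio's actual Task class."""

    def __init__(self):
        self._num_cancels_requested = 0

    def cancel(self):
        # Every cancellation request bumps the count.
        self._num_cancels_requested += 1

    def uncancel(self):
        # A cleanup handler that caught the CancelledError it itself
        # provoked decrements the count; returns what remains pending.
        if self._num_cancels_requested > 0:
            self._num_cancels_requested -= 1
        return self._num_cancels_requested

    def cancelling(self):
        return self._num_cancels_requested


# Two nested scopes both cancel the task; the inner scope swallows its
# own CancelledError and uncancels. The nonzero remaining count tells
# the outer scope that a cancellation is still pending, so it must not
# swallow the exception too.
task = CancelCount()
task.cancel()                 # outer timeout fires
task.cancel()                 # inner timeout fires
remaining = task.uncancel()   # inner scope handles its cancellation
assert remaining == 1 and task.cancelling() == 1
```

This is how the count arbitrates between the two cancel calls without any ordering assumptions: each scope only accounts for the cancellation it requested itself.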
@gvanrossum what happens to the count when the exception is thrown? Does it reset to 0 (I guess this would have to happen after the entire iteration is done), or does it stay, and only an explicit call decrements it? If it stays, I think essentially you have level-triggered cancellation then. Which I like, but it would be a compatibility break, right?
It stays, but this is not the flag that causes the exception to be thrown -- for that we have a separate mechanism. So I don't think it's quite the same as level-triggered, IIUC (that would keep interrupting every await governed by the cancel scope until the scope is exited).
Cool, I'm on board with this proposal then. (I got a little confused along the way.) Your understanding of level-based cancellation is correct. In practice it helps remove a few footguns, especially around cleanup.
Sure, go ahead and send a draft PR! |
Got a working version here: #31434.
My intention is to first get the cancel counts PR agreed upon and landed, and then timeouts -- so the latter can depend on the former. (I haven't found your new cancel count PR yet -- can you link to it when you have it?)
I'm still not fully convinced that the cancel counter system works in every imaginable scenario. I will attempt to poke holes in it by constructing an extremely pathological scenario, either tonight or tomorrow. If I can't make it fail, I'll give the thumbs up.
@gvanrossum @asvetlov #31508 here we go, let me know if we want a different bpo in the PR title.
Looks like there had been a misunderstanding, the new PR is here: #31513.
Hey @asvetlov, did you just git rebase instead of git merge?
It's taken me a while but I've been looking at cancel counters and I was not able to make them fail when all the involved context managers support uncancellation properly. There are other considerations, however, which make me still favor semantic changes to the current main branch. To understand everything perfectly I would need to know the exact motivations for adding uncancellation to tasks in the first place -- what do task groups use it for (asking sincerely)? I looked at the code but I wasn't really sure.
Suppose the task group user does something like this:

```python
async def parent():
    async with asyncio.TaskGroup() as g:
        g.create_task(some_coro())
        await asyncio.sleep(1)
    print("Done")
```

This will normally print "Done" after the task group finishes. Several special cases exist -- e.g. a child task crashing, or the parent task being cancelled from outside. All these events will cause the parent task to be cancelled so that the await in the task group's body is interrupted.

All in all these lines are crucial -- here we call uncancel() after catching the CancelledError that the task group itself provoked. Now, instead of calling cancel() with a message and inspecting it later, the count tells us whether our own cancellation is the only one pending. The loser here is cancel messages -- but the winner is the simplicity of the scheme, and I think deprecating cancel messages will improve simplicity again over the long run.
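The "was this cancellation caused by me?" bookkeeping being discussed can be sketched as follows. `Scope` is a hypothetical class for illustration; it uses a boolean flag where the Python 3.11 implementation uses the cancel count via `uncancel()`:

```python
import asyncio


class Scope:
    """Hypothetical sketch: swallow a CancelledError only if this scope
    itself requested the cancellation of its task."""

    async def __aenter__(self):
        self._task = asyncio.current_task()
        self._self_cancelled = False
        return self

    def cancel(self):
        # Remember that *we* provoked the cancellation before requesting it.
        self._self_cancelled = True
        self._task.cancel()

    async def __aexit__(self, exc_type, exc, tb):
        if exc_type is asyncio.CancelledError and self._self_cancelled:
            # Our own cancel() caused this: absorb it so the code after
            # the async with block keeps running.
            return True
        # A foreign cancellation (or other exception): let it propagate.
        return False


async def main():
    async with Scope() as s:
        asyncio.get_running_loop().call_soon(s.cancel)
        await asyncio.sleep(10)  # interrupted by our own cancel()
    return "recovered"


print(asyncio.run(main()))  # prints: recovered
```

A boolean flag like this is exactly what breaks down when two scopes cancel concurrently, which is why the discussion moves to a count that can represent multiple pending cancellations.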
What you just said echoes the need for a cancel counter/scope scheme, but not for the cancellation state changes. All of this cancel stuff could be done on a per-iteration basis. We just need to move the counter/scope logic to the point where the exception is thrown into the task. Now, as for the differences between cancel counters and tokens/nonces, I didn't find anything that would favor one solution over the other, so I'll be content with counters so long as we move the logic to be per-iteration.
I am just guessing that when you're pleading for a "per-iteration" scheme you're talking about the two lines added to the task's step function. And the counter-example you use is something like

```python
try:
    async with timeout(A):
        try:
            await asyncio.sleep(1)
        finally:
            async with timeout(B):
                await asyncio.sleep(2)
except TimeoutError:
    pass
```

Where if timeout(A) cancels sleep(1), timeout(B) is powerless to interrupt sleep(2). Right?
As I recall, the backwards incompatible change that was made in the cancel counts PR was that cancel() now returns False when a cancellation is already pending.
Exactly.
When I remove the two offending lines, the task group tests still pass. So that suggests that task groups don't actually depend on this behavior.

I would like to wait for @asvetlov to respond to the counter-example. IIUC he's got more important stuff to worry about for now.
Also from the _asyncio C accelerator module, and adjust one test that the change caused to fail. For more discussion see the discussion starting here: #31394 (comment) (Basically, @asvetlov proposed to return False from cancel() when there is already a pending cancellation, and I went along, even though it wasn't necessary for the task group implementation, and @agronholm has come up with a counterexample that fails because of this change. So now I'm changing it back to the old semantics (but still bumping the counter) until we can have a proper discussion about this.)
@agronholm, @Tinche, @asvetlov, @1st1 -- I've submitted a PR (#31623) that restores the old semantics of cancel() while still bumping the counter. Since @asvetlov is in a war zone we should be patient. (Andrew, I hope you and your family are safe!)
The PR is ready. P.S. My family and I are good, thanks.
This looks good. I have left some comments, will do a full review by tomorrow. Thanks
A few nits. I'm getting some local test failures on Windows, will investigate later.
```python
import enum

from types import TracebackType
from typing import final, Any, Dict, Optional, Type
```

Suggested change:

```diff
-from typing import final, Any, Dict, Optional, Type
+from typing import final, Optional, Type
```
```python
self._state = _State.CREATED

self._timeout_handler: Optional[events.TimerHandle] = None
self._task: Optional[tasks.Task[Any]] = None
```

Suggested change:

```diff
-self._task: Optional[tasks.Task[Any]] = None
+self._task: Optional[tasks.Task] = None
```

I got a complaint from pyright (VS Code + pylance + pyright in strict mode) because Task is generic in typeshed but not in the stdlib, and this being part of the stdlib it resolves to the stdlib Task. We can make it slightly more happy by leaving out the `[Any]` -- that's the default anyways.
```python
def __repr__(self) -> str:
    info = ['']
    if self._state is _State.ENTERED:
        info.append(f"when={self._when:.3f}")
```

This will crash if `when` is None. Test:

```python
async with asyncio.timeout(None) as t:
    print(repr(t))
```
```python
def timeout_at(when: Optional[float]) -> Timeout:
    """Schedule the timeout at absolute time.

    when argument points on the time in the same clock system
```

Suggested change:

```diff
-    when argument points on the time in the same clock system
+    Like `timeout()` but argument gives absolute time in the same clock system
```
```python
Useful in cases when you want to apply timeout logic around block
of code or in cases when asyncio.wait_for is not suitable. For example:

>>> with timeout(10):  # 10 seconds timeout
...     await long_running_task()
```

Maybe explain that `long_running_task()` will receive `CancelledError`, but the context manager will raise `TimeoutError`.
```python
Useful in cases when you want to apply timeout logic around block
of code or in cases when asyncio.wait_for is not suitable. For example:

>>> with timeout(10):  # 10 seconds timeout
```

Suggested change:

```diff
->>> with timeout(10):  # 10 seconds timeout
+>>> async with timeout(10):  # 10 seconds timeout
```
```python
def timeout(delay: Optional[float]) -> Timeout:
    """timeout context manager.
```

Suggested change:

```diff
-    """timeout context manager.
+    """timeout async context manager.
```
Never mind about the Windows errors, they were because I hadn't recompiled the `_asyncio` extension. Sorry for the false alarm.
Here are a lot of nits about the tests. Let me know if you want me to just fix everything and I will.
```python
"The only topmost timed out context manager "
"raises TimeoutError"
```

This error message confused me. Maybe "Only the topmost context manager should raise TimeoutError"?
```python
t1 = loop.time()

self.assertFalse(cm.expired())
# finised fast. Very busy CI box requires high enough limit,
```

Is the first line of this comment missing? (Also, `s/finised/finished/g`.)
```python
async def test_timeout_at_disabled(self):
    loop = asyncio.get_running_loop()
    t0 = loop.time()
    async with asyncio.timeout(None) as cm:
```

Did you mean

```diff
-    async with asyncio.timeout(None) as cm:
+    async with asyncio.timeout_at(None) as cm:
```
```python
async def test_foreign_cancel_doesnt_timeout_if_not_expired(self):
    with self.assertRaises(asyncio.CancelledError):
        async with asyncio.timeout(10) as cm:
            raise asyncio.CancelledError
```

This test would be more convincing if it used an actual cancel() call.

```diff
-            raise asyncio.CancelledError
+            asyncio.current_task().cancel()
+            await asyncio.sleep(0.01)
```
```python
assert has_timeout
assert not task.cancelled()
assert task.done()
```

Why not use `self.assertTrue()`?
```python
except asyncio.TimeoutError:
    pass
```

Why not use assertRaises? That way you'd know if the TimeoutError didn't happen.
```
After the inner timeout is an expensive operation which should
be stopped by the outer timeout.

Note: this fails for now.
```

It passes now. :-)
```python
async with asyncio.timeout(0.001):
    # Pretend the loop is busy for a while.
    time.sleep(0.010)
    await asyncio.sleep(0.001)
```

For robustness maybe sleep(1)? It'll get interrupted right away anyways.
```python
except asyncio.TimeoutError:
    # This sleep should be interrupted.
    await asyncio.sleep(10)
except asyncio.TimeoutError:
```

Again, assertRaises?
```
Note: this fails for now.
"""
start = time.perf_counter()
```

Why perf_counter instead of loop.time()?
I'd like to add this test:

```python
async def test_nested_timeout_in_finally(self):
    with self.assertRaises(TimeoutError):
        async with asyncio.timeout(0.01):
            try:
                await asyncio.sleep(1)
            finally:
                with self.assertRaises(TimeoutError):
                    async with asyncio.timeout(0.01):
                        await asyncio.sleep(10)
```

This is the counterexample from above. But! If the inner timeout() doesn't interrupt the sleep() it'll still appear to pass! It'll raise AssertionError from the inner assertRaises(), then the outer timeout(0.01) will catch that and raise TimeoutError on top of it, which is approved by the outer assertRaises(). I think this means that our tests can't rely on nested assertRaises() alone here.
https://bugs.python.org/issue46771