-
-
Notifications
You must be signed in to change notification settings - Fork 32.1k
gh-88241: Add asyncio.stalled audit hook #25990
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -1866,19 +1866,19 @@ def _run_once(self): | |
handle = self._ready.popleft() | ||
if handle._cancelled: | ||
continue | ||
if self._debug: | ||
try: | ||
self._current_handle = handle | ||
t0 = self.time() | ||
handle._run() | ||
dt = self.time() - t0 | ||
if dt >= self.slow_callback_duration: | ||
logger.warning('Executing %s took %.3f seconds', | ||
_format_handle(handle), dt) | ||
finally: | ||
self._current_handle = None | ||
else: | ||
try: | ||
self._current_handle = handle | ||
t0 = self.time() | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I'm not familiar enough to know whether these are okay to do even when not debugging, particularly the There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The report should be in debug mode only, sure.
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Is there a better way of detecting a stall? I did a quick bit of bechmarking and a raw, non-lookup'd call to A 100ns overhead per callback leaves a lot of breathing room. |
||
handle._run() | ||
dt = self.time() - t0 | ||
if dt >= self.slow_callback_duration: | ||
formatted_handle = _format_handle(handle) | ||
sys.audit("asyncio.stalled", dt, handle, formatted_handle, self) | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Why all three arguments (handle, formatted handle, and loop) are passed? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Agreed. It might also help to have an event at the point where a handle is created, so there's another event to correlate with, but you certainly don't want to be eagerly formatting anything. Assume the audit hook is merely There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Interesting. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Calculating the handle callback execution time seems really quite cheap, unless I'm missing something. In fact it seems it would be far cheaper to calculate the time difference and invoke an audit hook once rather than invoke an audit hook twice:
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. People who care about maximum performance won't have any audit hooks (or they'll have native ones, which are considerable quicker than pure Python ones), so it's not Still, I'd rather audit events be used for unusual/unexpected events rather than tracing. If this is a tight loop operation where 0.1us is important enough for performance to override functionality concerns, then we don't want audit events being raised that frequently. If it's fairly uncommon then may as well do the calculation and only raise relevant events to keep the noise down. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I'd argue that an event loop stall should be both unusual and unexpected. But you're right, I'm not sure audit events are a perfect fit here. The whole point behind this is that it's hard, or even impossible, to get metrics about event loop stalls in production right now without running your entire app in async debug mode. Which is interesting, because people who care about maximum performance would surely want accurate metrics on stalls whilst being totally adverse to running things in debug mode. Some way to do something in response to a stall would be ideal. We maybe even don't need an exact time, just an idea that a stall took place and a way to consume that to do custom stuff ™️. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I see different numbers with the fresh main branch:
Note: there is no audit hook subscribed. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Looking up the I'm using 3.10, so maybe things have changed, but There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I suggest measuring Python 3.11 and the real scenario. |
||
if self._debug: | ||
logger.warning('Executing %s took %.3f seconds', | ||
formatted_handle, dt) | ||
finally: | ||
self._current_handle = None | ||
handle = None # Needed to break cycles when an exception occurs. | ||
|
||
def _set_coroutine_origin_tracking(self, enabled): | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1 @@ | ||
Add audit events for Asyncio loop stalling. Patch by Tom Forbes. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This also needs a third argument that is the anchor that should be on the link to get back here (probably
asyncio-debug-mode
but you can look at the existing docs on the web site to see which section is best)