bpo-44075: Add asyncio.stalled audit hook #25990

orf · 2021-05-08T12:46:45Z

https://bugs.python.org/issue44075

github-actions · 2021-06-14T00:04:34Z

This PR is stale because it has been open for 30 days with no activity.

zooba · 2021-07-06T23:30:13Z

Doc/library/asyncio-dev.rst

@@ -62,6 +62,8 @@ When the debug mode is enabled:
  minimum execution duration in seconds that is considered "slow".


+.. audit-event:: asyncio.stalled time,handle,formatted_handle,loop


This also needs a third argument that is the anchor that should be on the link to get back here (probably asyncio-debug-mode but you can look at the existing docs on the web site to see which section is best)

zooba · 2021-07-06T23:30:13Z

Lib/asyncio/base_events.py

-            else:
+            try:
+                self._current_handle = handle
+                t0 = self.time()


I'm not familiar enough to know whether these are okay to do even when not debugging, particularly the _current_handle assignment.

The report should be in debug mode only, sure.
Two timer calls per callback are not free.

Is there a better way of detecting a stall? I did a quick bit of benchmarking and a raw, non-lookup'd call to time.time() or time.monotonic() is about as fast as you can get (~50ns), on par with invoking lambda: None or def foo(): pass.

A 100ns overhead per callback leaves a lot of breathing room.

asvetlov · 2022-03-18T00:22:29Z

Lib/asyncio/base_events.py

-            else:
+            try:
+                self._current_handle = handle
+                t0 = self.time()


The report should be in debug mode only, sure.
Two timer calls per callback are not free.

asvetlov · 2022-03-18T00:22:29Z

Lib/asyncio/base_events.py

+                dt = self.time() - t0
+                if dt >= self.slow_callback_duration:
+                    formatted_handle = _format_handle(handle)
+                    sys.audit("asyncio.stalled", dt, handle, formatted_handle, self)


Why are all three arguments (handle, formatted handle, and loop) passed?
I suspect the handle alone is enough for a low-level tool.

Agreed. It might also help to have an event at the point where a handle is created, so there's another event to correlate with, but you certainly don't want to be eagerly formatting anything.

Assume the audit hook merely prints to a log that's going to be interpreted by a human (or a separate machine). It's not an opportunity for an application to change its behaviour.

Interesting.
If we add two events (handle started and handle finished) asyncio doesn't need to calculate the handle callback execution time.
The audit hook can do it itself if needed.
Also, it means that both hooks can be called even in non-debug mode.
The call itself is cheap; the analysis can be expensive, but that is not an asyncio problem.
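
To make the two-event idea concrete, here is a minimal sketch of a hook that pairs a start event with a finish event and does the timing itself. The event names `example.handle_started` and `example.handle_finished`, the `run_callback` helper, and the 50 ms threshold are all hypothetical; asyncio raises no such events today, so the sketch raises them by hand.

```python
import sys
import time

# Hypothetical event names; asyncio does not raise these today.
START_EVENT = "example.handle_started"
FINISH_EVENT = "example.handle_finished"
STALL_THRESHOLD = 0.05  # seconds, an assumed value for illustration

_start_times = {}
stalls = []


def stall_hook(event, args):
    # Audit hooks see every event in the process, so ignore unrelated ones.
    if event == START_EVENT:
        _start_times[args[0]] = time.monotonic()
    elif event == FINISH_EVENT:
        started = _start_times.pop(args[0], None)
        if started is not None:
            elapsed = time.monotonic() - started
            if elapsed >= STALL_THRESHOLD:
                stalls.append((args[0], elapsed))


sys.addaudithook(stall_hook)


def run_callback(handle_id, callback):
    # Stand-in for the event loop running one handle.
    sys.audit(START_EVENT, handle_id)
    try:
        callback()
    finally:
        sys.audit(FINISH_EVENT, handle_id)


run_callback(1, lambda: None)             # fast, not recorded
run_callback(2, lambda: time.sleep(0.1))  # slow, recorded as a stall
print(stalls)
```

With this shape the loop does no timing of its own, at the price of two `sys.audit()` calls per callback.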

Calculating the handle callback execution time seems really quite cheap, unless I'm missing something. In fact it seems it would be far cheaper to calculate the time difference and invoke an audit hook once rather than invoke an audit hook twice:

    In [2]: sys.addaudithook(lambda *f: None)
    In [3]: a = sys.audit
    In [5]: %timeit a("foo")
    174 ns ± 1.16 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)
    In [7]: m = time.monotonic
    In [8]: %timeit m()
    60.3 ns ± 1.18 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)

People who care about maximum performance won't have any audit hooks (or they'll have native ones, which are considerably quicker than pure Python ones), so it's not ~~a fair~~ the right comparison.

Still, I'd rather audit events be used for unusual/unexpected events rather than tracing. If this is a tight loop operation where 0.1us is important enough for performance to override functionality concerns, then we don't want audit events being raised that frequently. If it's fairly uncommon then may as well do the calculation and only raise relevant events to keep the noise down.

I'd argue that an event loop stall should be both unusual and unexpected. But you're right, I'm not sure audit events are a perfect fit here.

The whole point behind this is that it's hard, or even impossible, to get metrics about event loop stalls in production right now without running your entire app in async debug mode. Which is interesting, because people who care about maximum performance would surely want accurate metrics on stalls whilst being totally averse to running things in debug mode.
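
For context, this is roughly what the debug-mode-only route looks like today: the loop logs a warning on the `asyncio` logger when a callback runs longer than `loop.slow_callback_duration`. The sketch below is only an illustration; the `ListHandler` class and the 0.05 second threshold are choices made for the example.

```python
import asyncio
import logging
import time


class ListHandler(logging.Handler):
    """Collects formatted log messages in memory (helper for this example)."""

    def __init__(self):
        super().__init__()
        self.messages = []

    def emit(self, record):
        self.messages.append(record.getMessage())


handler = ListHandler()
logging.getLogger("asyncio").addHandler(handler)


async def main():
    loop = asyncio.get_running_loop()
    loop.slow_callback_duration = 0.05  # the default is 0.1 seconds
    time.sleep(0.2)  # deliberately block the event loop


# The slow-callback report is only produced when debug mode is on.
asyncio.run(main(), debug=True)

slow_reports = [m for m in handler.messages if "took" in m]
print(slow_reports)
```

Parsing log messages like this is the only signal available without the hook proposed here, and it requires debug mode for the whole loop.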

Some way to do something in response to a stall would be ideal. We may not even need an exact time, just a signal that a stall took place and a way to consume it to do custom stuff ™️.

I see different numbers with the fresh main branch:

    >>> import timeit
    >>> timeit.repeat("sys.audit(1); sys.audit(1)", "import sys")
    [0.09353699500206858, 0.07102967600076227, 0.07030134299566271, 0.07094913300534245, 0.06989944599627052]
    >>> timeit.repeat("time.monotonic(); time.monotonic()", "import time")
    [0.19186055000318447, 0.16878722599358298, 0.16965712599630933, 0.1693656099960208, 0.16873898899939377]

Note: there is no audit hook subscribed.

Looking up the monotonic attribute dominates the benchmark. You get much better speeds by assigning time.monotonic to a local variable once and using that.
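
A quick way to check the lookup effect is to time both spellings side by side. This is only a sketch; the iteration count is arbitrary and the absolute numbers depend on the machine and the Python version.

```python
import timeit

NUMBER = 200_000  # arbitrary iteration count for the example

# The attribute lookup on the module happens on every call.
with_lookup = min(
    timeit.repeat("time.monotonic()", "import time", number=NUMBER, repeat=5)
)

# Bind the function once in setup, then call it through a plain name.
pre_bound = min(
    timeit.repeat("m()", "import time; m = time.monotonic", number=NUMBER, repeat=5)
)

print(f"time.monotonic(): {with_lookup / NUMBER * 1e9:.1f} ns per call")
print(f"m():              {pre_bound / NUMBER * 1e9:.1f} ns per call")
```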

I'm using 3.10, so maybe things have changed, but sys.audit(1) throws an exception for me.

I suggest measuring Python 3.11 and the real scenario.
E.g. two loop.time() calls plus one sys.audit() vs two sys.audit() calls.
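
A sketch of that comparison might look like the following. The event names are made up for the benchmark, and variant A always raises its event, which is the worst case for the approach in this PR (the real code only raises once the threshold is exceeded). Since audit hooks cannot be removed once added, the "no hook" numbers are taken first and are only meaningful in a process that has no hooks installed already.

```python
import asyncio
import sys
import timeit

loop = asyncio.new_event_loop()
namespace = {"loop": loop, "audit": sys.audit}

# Variant A: asyncio times the callback and raises a single event.
timed_once = """
t0 = loop.time()
dt = loop.time() - t0
audit("example.stalled", dt)
"""

# Variant B: asyncio raises start/finish events and the hook does the timing.
two_events = """
audit("example.started")
audit("example.finished")
"""


def bench(stmt, number=100_000):
    # Best-of-five average time per iteration, in seconds.
    best = min(timeit.repeat(stmt, globals=namespace, number=number, repeat=5))
    return best / number


results = {"no hook": (bench(timed_once), bench(two_events))}

sys.addaudithook(lambda event, args: None)  # cannot be removed afterwards
results["no-op Python hook"] = (bench(timed_once), bench(two_events))

loop.close()

for label, (a, b) in results.items():
    print(f"{label}: timed_once={a * 1e9:.0f} ns, two_events={b * 1e9:.0f} ns")
```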

bedevere-bot · 2022-03-18T00:22:32Z

A Python core developer has requested some changes be made to your pull request before we can consider merging it. If you could please address their requests along with any other requests in other reviews from core developers that would be appreciated.

Once you have made the requested changes, please leave a comment on this pull request containing the phrase "I have made the requested changes; please review again". I will then notify any core developers who have left a review that you're ready for them to take another look at this pull request.

bpo-44075: Add asyncio.stalled audit hook

f587294

orf requested review from 1st1 and asvetlov as code owners May 8, 2021 12:46

bedevere-bot added the awaiting review label May 8, 2021

the-knights-who-say-ni added the CLA not signed label May 8, 2021

orf added 2 commits May 8, 2021 17:30

Fix tests

1e59585

Add docs

1d6240d

the-knights-who-say-ni added CLA signed and removed CLA not signed labels May 14, 2021

terryjreedy requested a review from zooba May 14, 2021 21:38

github-actions bot added the stale Stale PR or inactive for long period of time. label Jun 14, 2021

zooba reviewed Jul 6, 2021

asvetlov requested changes Mar 18, 2022

bedevere-bot removed the awaiting review label Mar 18, 2022

bedevere-bot added the awaiting changes label Mar 18, 2022

orf mannequin mentioned this pull request Apr 10, 2022

Add a PEP578 audit hook for Asyncio loop stalls #88241

Open

ezio-melotti removed the CLA signed label Jul 13, 2022

github-actions bot removed the stale Stale PR or inactive for long period of time. label Aug 7, 2022
