Enable informing callee it's awaited via vector call flag #91121
The idea here is to add a new flag to the vectorcall nargs that indicates the call is being awaited: _Py_AWAITED_CALL_MARKER. This flag will allow the callee to know that it's being eagerly evaluated. When the call is eagerly evaluated the callee can potentially avoid various amounts of overhead.

For a coroutine, the function can avoid creating the coroutine object and instead return a singleton instance of a wait handle indicating eager execution has occurred: https://github.com/facebookincubator/cinder/blob/cinder/3.8/Python/ceval.c#L6617 This gives a small win by reducing the overhead of allocating the coroutine object.

For something like gather, much more significant wins can be achieved. If all of the inputs have already been computed, the creation of tasks and the scheduling of them on the event loop can be elided. An example implementation of this is available in Cinder: https://github.com/facebookincubator/cinder/blob/cinder/3.8/Modules/_asynciomodule.c#L7103 Again, the gather implementation uses the singleton wait handle object to return the value, indicating the computation completed synchronously.

We've used this elsewhere in Cinder as well, for example in an "AsyncLazyValue", which lazily performs a one-time computation of a value and caches it. The common case there is that the value is already available, and the await can be performed without allocating any intermediate values.
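To make the flag idea concrete, here is a minimal Python model of packing an "awaited" marker into the high bits of the nargs word. `PY_VECTORCALL_ARGUMENTS_OFFSET` mirrors the real CPython macro (the top bit of a `size_t`); the bit position chosen for `AWAITED_CALL_MARKER` and the helper names `vectorcall_nargs` and `is_awaited` are assumptions for illustration only, not taken from Cinder or CPython.

```python
import struct

# Width of a C size_t on this platform, in bits.
SIZE_T_BITS = 8 * struct.calcsize("N")

# Mirrors CPython's PY_VECTORCALL_ARGUMENTS_OFFSET: the top bit of size_t.
PY_VECTORCALL_ARGUMENTS_OFFSET = 1 << (SIZE_T_BITS - 1)

# Hypothetical position for the proposed marker (assumption: next bit down).
AWAITED_CALL_MARKER = 1 << (SIZE_T_BITS - 2)

_FLAG_MASK = PY_VECTORCALL_ARGUMENTS_OFFSET | AWAITED_CALL_MARKER


def vectorcall_nargs(nargsf: int) -> int:
    """Strip the flag bits and return the positional argument count."""
    return nargsf & ~_FLAG_MASK


def is_awaited(nargsf: int) -> bool:
    """Return True if the caller marked this call as immediately awaited."""
    return bool(nargsf & AWAITED_CALL_MARKER)


if __name__ == "__main__":
    nargsf = 3 | AWAITED_CALL_MARKER  # a 3-argument call that is awaited
    print(vectorcall_nargs(nargsf), is_awaited(nargsf))
```

A callee that does not care about the marker only needs to mask it off, which is why a flag bit is cheap to add to an existing calling convention.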
The link https://github.com/facebookincubator/cinder/blob/cinder/3.8/Python/ceval.c#L6617 points to something that I wouldn't associate with the subject. @dino, could you provide a new link (preferably a permalink)?

FWIW rather than dynamically checking what the next opcode is, maybe we could use a super-instruction for CALL + GET_AWAITABLE? (Understanding that there are a bunch of different CALL opcodes.)

The gather code you link to is all in C. Is rewriting gather in C the only way to benefit from this speedup? (I guess you could just add your gather impl to the existing _asynciomodule.c, in the same or a separate PR.)
Doh, sorry about that link, this one goes to a specific commit: https://github.com/facebookincubator/cinder/blob/6863212ada4b569c15cd95c4e7a838f254c8ccfb/Python/ceval.c#L6642

I do think a new opcode is a good way to go, and that could just be emitted by the compiler when it recognizes the pattern. I think we mainly avoided that because we had some issues around performance testing when we updated the bytecode version, and the cost of the peek was negligible, but with improved call performance in 3.11 that may not be the case anymore.

It's probably possible to keep most of gather in Python if necessary; there'd still need to be a C wrapper which could flow the wrapper in, and the wait handle creation would need to be possible from Python (which slightly scares me). There's probably also a perf win from the C implementation. I'll see if @v2m has any data on that.
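The pattern discussed above, a call whose result is immediately awaited, can be spotted from Python with the `dis` module. This is only a rough sketch of what a compiler pass or an interpreter peek would look for: the helper name `awaited_call_sites` is hypothetical, and matching on opcode names that start with `CALL` is an assumption that happens to cover the various call opcodes across recent CPython versions.

```python
import dis


def awaited_call_sites(func):
    """Return the call opcodes in func that are directly followed by GET_AWAITABLE."""
    # Drop inline cache entries in case they are reported as instructions.
    instructions = [
        ins for ins in dis.get_instructions(func) if ins.opname != "CACHE"
    ]
    sites = []
    for prev, cur in zip(instructions, instructions[1:]):
        if cur.opname == "GET_AWAITABLE" and prev.opname.startswith("CALL"):
            sites.append(prev.opname)
    return sites


async def helper():
    return 1


async def example():
    coro = helper()        # plain call: the coroutine object is stored first
    await coro             # awaits a name, not a call result
    return await helper()  # call immediately awaited: the interesting case


if __name__ == "__main__":
    print(awaited_call_sites(example))
```

Only the last line of `example` qualifies, because in the first two statements the call result is stored before it is awaited, so the callee cannot assume it will be awaited at all.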
I will wait until there is a draft PR to review, or until you ping me.
Before making gather await-aware, it always had to follow the standard process and convert awaitables into tasks that are queued on the event loop for execution. In our workload, task creation and queueing were adding noticeable overhead. With await-aware gather we can execute coroutine objects eagerly and, if they were not suspended, bypass task creation entirely.
Add a benchmark for testing async workloads, specifically an async tree workload that simulates simpler versions of a typical Instagram endpoint. (See python/cpython#91121.)