Enable informing callee it's awaited via vector call flag #91121

Status: Open
Assignee: DinoV
Labels: 3.11, performance

DinoV opened this issue Mar 9, 2022 · 5 comments

DinoV (Contributor) commented Mar 9, 2022

BPO 46965
Nosy @gvanrossum, @carljm, @DinoV, @markshannon, @corona10, @vladima, @brandtbucher, @itamaro

Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


GitHub fields:

assignee = 'https://github.com/DinoV'
closed_at = None
created_at = <Date 2022-03-09.00:37:24.565>
labels = ['3.11', 'performance']
title = "Enable informing callee it's awaited via vector call flag"
updated_at = <Date 2022-03-22.16:50:02.812>
user = 'https://github.com/DinoV'

bugs.python.org fields:

activity = <Date 2022-03-22.16:50:02.812>
actor = 'v2m'
assignee = 'dino.viehland'
closed = False
closed_date = None
closer = None
components = []
creation = <Date 2022-03-09.00:37:24.565>
creator = 'dino.viehland'
dependencies = []
files = []
hgrepos = []
issue_num = 46965
keywords = []
message_count = 5.0
messages = ['414782', '415301', '415433', '415640', '415795']
nosy_count = 8.0
nosy_names = ['gvanrossum', 'carljm', 'dino.viehland', 'Mark.Shannon', 'corona10', 'v2m', 'brandtbucher', 'itamaro']
pr_nums = []
priority = 'normal'
resolution = None
stage = 'needs patch'
status = 'open'
superseder = None
type = 'performance'
url = 'https://bugs.python.org/issue46965'
versions = ['Python 3.11']

DinoV (Contributor, Author) commented Mar 9, 2022

The idea here is to add a new flag to the vectorcall nargsf that indicates the call is being awaited: _Py_AWAITED_CALL_MARKER. This flag lets the callee know that it is immediately awaited and can therefore be eagerly evaluated. When the call is eagerly evaluated the callee can potentially avoid varying amounts of overhead. For a coroutine, the function can avoid creating the coroutine object and instead return a singleton instance of a wait handle indicating that eager execution has occurred:
https://github.com/facebookincubator/cinder/blob/cinder/3.8/Python/ceval.c#L6617

This gives a small win by reducing the overhead of allocating the coroutine object.
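As a rough pure-Python model of that fast path (all names here are illustrative, not the CPython or Cinder API; the real mechanism is a bit in the C-level vectorcall nargsf, and Cinder avoids creating the coroutine object altogether, which Python code cannot do):

import types

class _WaitHandle:
    # Stand-in for Cinder's singleton wait handle: carries a value that
    # was produced synchronously, so nothing needs to be scheduled.
    def __init__(self, value):
        self.value = value

def call_awaited(coro_func, *args):
    # A callee that knows it is immediately awaited can run the coroutine
    # eagerly; if it finishes without suspending, it hands back a completed
    # wait handle instead of a pending coroutine.
    coro = coro_func(*args)
    try:
        first = coro.send(None)
    except StopIteration as stop:
        return _WaitHandle(stop.value)   # completed synchronously
    return _resume(coro, first)          # suspended: await as usual

@types.coroutine
def _resume(coro, first):
    # Re-yield the value the eager start consumed, then keep delegating to
    # the original coroutine (exception injection elided for brevity).
    value = yield first
    try:
        while True:
            value = yield coro.send(value)
    except StopIteration as stop:
        return stop.value

The caller-side half of the optimization is the check for _WaitHandle at the await site, which is roughly what the interpreter's awaited-call handling would do when the marker is set.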

For something like gather much more significant wins can be achieved. If all of the inputs have already been computed the creation of tasks and scheduling of them to the event loop can be elided. An example implementation of this is available in Cinder: https://github.com/facebookincubator/cinder/blob/cinder/3.8/Modules/_asynciomodule.c#L7103

Again the gather implementation uses the singleton wait handle object to return the value indicating the computation completed synchronously.
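A rough pure-Python sketch of that gather fast path (illustrative only; Cinder's version is in C and signals eager completion via the singleton wait handle rather than a plain list):

import asyncio

async def gather_fast_path(*aws):
    # If every input is an already-completed future, collect the results
    # directly: no Task is created and nothing is queued to the event loop.
    if aws and all(asyncio.isfuture(aw) and aw.done() for aw in aws):
        return [aw.result() for aw in aws]
    # Otherwise fall back to the normal path.
    return await asyncio.gather(*aws)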

We've used this elsewhere in Cinder as well - for example, an "AsyncLazyValue" lazily performs a one-time computation of a value and caches it. The common case then becomes that the value is already available, and the await can be performed without allocating any intermediate values.
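A minimal sketch of the idea in pure Python (illustrative; Cinder's AsyncLazyValue is its own C-level type that integrates with the wait-handle mechanism):

import asyncio

_UNSET = object()

class AsyncLazyValue:
    # Compute a value once, cache it, and serve later awaits from the
    # cache; the common case then completes without creating tasks or
    # any intermediate objects beyond the coroutine itself.
    def __init__(self, coro_func, *args):
        self._coro_func = coro_func
        self._args = args
        self._value = _UNSET
        self._lock = asyncio.Lock()

    async def get(self):
        if self._value is not _UNSET:   # common case: already computed
            return self._value
        async with self._lock:          # first awaiter computes, others wait
            if self._value is _UNSET:
                self._value = await self._coro_func(*self._args)
        return self._value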

DinoV added the 3.11 and performance labels Mar 9, 2022
DinoV self-assigned this Mar 9, 2022
gvanrossum (Member) commented Mar 15, 2022

The link https://github.com/facebookincubator/cinder/blob/cinder/3.8/Python/ceval.c#L6617 points to something that I wouldn't associate with the subject. @dino, could you provide a new link (preferably a permalink)?

FWIW rather than dynamically checking what the next opcode is, maybe we could use a super-instruction for CALL + GET_AWAITABLE? (Understanding that there are a bunch of different CALL opcodes.)
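For reference, the opcode pair in question is easy to inspect (a minimal example; exact opcode names vary across versions):

import dis

async def demo(f):
    return await f()

# In CPython 3.11 the disassembly of `await f()` shows the CALL for f()
# immediately followed by GET_AWAITABLE -- the pair a super-instruction
# would fuse.
dis.dis(demo)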

The gather code you link to is all in C. Is rewriting gather in C the only way to benefit from this speedup? (I guess you could just add your gather impl to the existing _asynciomodule.c, in the same or a separate PR.)

DinoV (Contributor, Author) commented Mar 17, 2022

Doh, sorry about that link, this one goes to a specific commit: https://github.com/facebookincubator/cinder/blob/6863212ada4b569c15cd95c4e7a838f254c8ccfb/Python/ceval.c#L6642

I do think a new opcode is a good way to go, and it could just be emitted by the compiler when it recognizes the pattern. I think we mainly avoided that because we had some issues around performance testing whenever we updated the bytecode version, and the cost of peeking at the next opcode was negligible; with the improved call performance in 3.11 that may no longer be the case.

It's probably possible to keep most of gather in Python if necessary; there'd still need to be a C wrapper that could flow the awaited flag in, and wait handle creation would need to be possible from Python (which slightly scares me). There's probably also a perf win from the C implementation - I'll see if @v2m has any data on that.

gvanrossum (Member) commented Mar 20, 2022

I will wait until there is a draft PR to review, or until you ping me.

--Guido (mobile)

vladima (Mannequin) commented Mar 22, 2022

  • Introducing dedicated opcodes for each kind of awaited call is definitely an option. In fact, the first implementation used them; however, as Dino mentioned, it was more of a logistical issue (there were several spots that produced .pyc files, so the compiler needed to be up to date across all of them).
  • There was some perf win from rewriting gather in C; however, the main reason for us to do it was the ability to be await-aware, which we made available only in C (returning a completed wait handle is not exposed to Python either) to reduce the scope. Providing the ability to follow the await-aware protocol (read the indicator that a call is awaited + return a completed wait handle for eagerly completed calls) in pure Python is definitely possible.
  • To provide some context on why it was beneficial: a typical implementation of an endpoint in IG is an async function that in turn calls into numerous other async functions to generate an output.
    • gather is used all over the place when there are no sequential dependencies between calls.
    • The number of unique pieces of data ultimately fetched by async calls is not very big; i.e., the same fragment of information can be requested by different async calls, which makes memoization a very attractive strategy to reduce I/O and heavyweight computations.
    • Memoized pieces of data are effectively represented by completed futures, and it is very common for gather to accept either a memoized value or a coroutine object that will complete synchronously by awaiting a memoized value (see the sketch after this list).
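An illustrative sketch of that memoization pattern (hypothetical names; real_fetch stands in for whatever I/O actually produces the fragment):

import asyncio

_cache = {}  # key -> asyncio.Future

async def real_fetch(key):
    # Stand-in for the actual I/O (hypothetical).
    await asyncio.sleep(0)
    return f"data for {key}"

async def fetch_fragment(key):
    # Repeated requests for the same fragment share one future; once it has
    # completed, awaiting it resolves without touching I/O again (error
    # handling elided for brevity).
    if key in _cache:
        return await _cache[key]
    fut = asyncio.get_running_loop().create_future()
    _cache[key] = fut
    data = await real_fetch(key)
    fut.set_result(data)
    return data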

Before making gather await-aware, it always had to follow the standard process and convert awaitables into tasks that are queued to the event loop for execution. In our workload, task creation/queueing added noticeable overhead. With await-aware gather we can execute coroutine objects eagerly and, if they were not suspended, bypass task creation entirely:

import asyncio
import time

async def step(i):
    if i == 0:
        return
    await asyncio.gather(*[step(i - 1) for _ in range(6)])

async def main():
    t0 = time.perf_counter()
    await step(6)
    t1 = time.perf_counter()
    print(f"{t1 - t0} s")

N = 0
def create_task(loop, coro):
    global N
    N += 1
    return asyncio.Task(coro, loop=loop)

loop = asyncio.get_event_loop()
loop.set_task_factory(create_task)
loop.run_until_complete(main())
print(f"{N} tasks created")

# Cinder
# 0.028410961851477623 s
# 1 tasks created

# Python 3.8
# 1.2157012447714806 s
# 55987 tasks created

ezio-melotti transferred this issue from another repository Apr 10, 2022
ericsnowcurrently pushed a commit to python/pyperformance that referenced this issue May 16, 2022
Add a benchmark for testing async workloads, specifically an async tree workload that simulates simpler versions of a typical Instagram endpoint.  (See python/cpython#91121.)