Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add API to allow extensions to set callback function on creation, modification, and destruction of PyFunctionObject #91053

Closed
mpage mannequin opened this issue Mar 1, 2022 · 8 comments
Labels
3.12 expert-C-API type-feature A feature request or enhancement

Comments

@mpage
Copy link
Mannequin

mpage mannequin commented Mar 1, 2022

BPO 46897
Nosy @carljm, @DinoV, @itamaro, @mpage

Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

Show more details

GitHub fields:

assignee = None
closed_at = None
created_at = <Date 2022-03-01.22:19:44.618>
labels = ['expert-C-API', 'type-feature', '3.11']
title = 'Add API to allow extensions to set callback function on creation, modification, and destruction of PyFunctionObject'
updated_at = <Date 2022-03-01.22:19:44.618>
user = 'https://github.com/mpage'

bugs.python.org fields:

activity = <Date 2022-03-01.22:19:44.618>
actor = 'mpage'
assignee = 'none'
closed = False
closed_date = None
closer = None
components = ['C API']
creation = <Date 2022-03-01.22:19:44.618>
creator = 'mpage'
dependencies = []
files = []
hgrepos = []
issue_num = 46897
keywords = []
message_count = 1.0
messages = ['414308']
nosy_count = 4.0
nosy_names = ['carljm', 'dino.viehland', 'itamaro', 'mpage']
pr_nums = []
priority = 'normal'
resolution = None
stage = None
status = 'open'
superseder = None
type = 'enhancement'
url = 'https://bugs.python.org/issue46897'
versions = ['Python 3.11']

@mpage
Copy link
Mannequin Author

mpage mannequin commented Mar 1, 2022

CPython extensions providing optimized execution of Python bytecode (e.g. the Cinder JIT) may need to hook into the lifecycle of function objects to determine what to optimize, invalidate previously-optimized functions, or free resources allocated for functions that no longer exist. For example, when inlining a function, the Cinder JIT will use the bytecode of the inlined function that was known at compile-time. If the bytecode for the inlined function changes at runtime (i.e. if __code__ was reassigned) the JIT needs to invalidate any code into which the function was inlined. We propose adding an API to allow extensions to set callbacks that will be invoked whenever functions are created, modified, or destroyed.

Proposed API:

typedef enum {
  PYFUNC_LCEVT_CREATED,
  PYFUNC_LCEVT_MODIFIED,
  PYFUNC_LCEVT_DESTROYED
} PyFunction_LifecycleEvent;

typedef enum {
  PYFUNC_ATTR_CODE,
  PYFUNC_ATTR_GLOBALS,
  PYFUNC_ATTR_DEFAULTS,
  PYFUNC_ATTR_KWDEFAULTS,
  PYFUNC_ATTR_CLOSURE,
  PYFUNC_ATTR_NOT_APPLICABLE,
} PyFunction_AttrId;

// A callback to be called in response to events in a function's lifecycle.
//
// The callback is invoked after a function is created and before the function 
// is modified or destroyed.
//
// On modification the third argument indicates which attribute was modified
// and the fourth argument is the new value.
// Otherwise the third argument is PYFUNC_ATTR_NOT_APPLICABLE and the fourth
// argument is NULL.
typedef void(*PyFunction_LifecycleCallback)(
  PyFunction_LifecycleEvent event, 
  PyFunctionObject* func,
  PyFunction_AttrId attr,
  PyObject* new_value);

void PyFunction_SetLifecycleCallback(PyFunction_LifecycleCallback callback);
PyFunction_LifecycleCallback PyFunction_GetLifecycleCallback();

@mpage mpage mannequin added 3.11 expert-C-API type-feature A feature request or enhancement labels Mar 1, 2022
@ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
@gvanrossum
Copy link
Member

gvanrossum commented May 25, 2022

@markshannon How does this proposal from @mpage fit into our current plans?

@erlend-aasland erlend-aasland added 3.12 and removed 3.11 labels May 26, 2022
@markshannon
Copy link
Member

markshannon commented Jul 22, 2022

First we need to do something about comprehensions (and nested functions).
Function objects are created for each comprehension.
Ideally we would not create functions for list comprehensions, or treat them specially, as the function objects are inaccessible.
Closures are trickier, as they can be ephemeral, but are accessible.

Provided we can avoid hooking into the lifetime of these short live objects, then adding hooks makes sense.
We will need to handle much the same set of events as Cinder does.

I'd like to add these hooks in a principled way in the broader context of handling potential de-optimization events.
Maybe extending the API for dictionary watchers?

@carljm
Copy link
Contributor

carljm commented Jul 31, 2022

We have code in cinder's compiler to inline comprehensions instead of creating a function. It is a perf win but there is a semantic compromise in scoping / name visibility, not sure that would be acceptable.

Can you outline what you're thinking in terms of unified API? My intuition is that the needs and details of eg dict watching vs function watching are sufficiently different that separate APIs in PyDict_* and PyFunction_* will probably be simpler and clearer to use, but open to suggestions.

@gvanrossum
Copy link
Member

gvanrossum commented Sep 15, 2022

Ping? @markshannon ?

@markshannon
Copy link
Member

markshannon commented Sep 15, 2022

We have code in cinder's compiler to inline comprehensions instead of creating a function. It is a perf win but there is a semantic compromise in scoping / name visibility, not sure that would be acceptable.

I doubt that it would be in general, but for cases where the iteration variable is not shadowed or used outside the comprehension it might be. It would need a wider discussion.

Can you outline what you're thinking in terms of unified API?

It is a bit vague at the moment, but I do need to write it up properly. For now, I'm thinking that any object that is depended on, whether dict, function or class would be allocated an ID. Optimized code would depend on a set of IDs. If any object with an ID changes then the associated optimized code(s) would be invalidated. It should be possible to implement this reasonably efficiently with bloom filters, or radix trees, or some other suitable data structure.

How does Cinder handle this?

@markshannon
Copy link
Member

markshannon commented Sep 15, 2022

Why do functions need callbacks, rather than code objects?

Once we have implemented faster-cpython/ideas#446, we can effectively specialize calls to comprehensions and nested functions, without caring about the lifetimes of the individual function objects.

@carljm
Copy link
Contributor

carljm commented Sep 15, 2022

I'm thinking that any object that is depended on, whether dict, function or class would be allocated an ID. Optimized code would depend on a set of IDs. If any object with an ID changes then the associated optimized code(s) would be invalidated.

I don't think this type of API will be sufficient for us in general. We want to be able to do code-patching on many updates, not just invalidate all generated code on any change. One reason this matters is because it handles the problem of OSR for functions further up the stack. If, say, a global value changes, we want to patch the generated code at the point where that specific global value is loaded with an unconditional deopt instruction, which will result in the correct behavior even if that optimized function is already mid-execution somewhere up the stack. So e.g. in the dict watchers API, we need details about what changed in the dict, not just the fact that the dict changed. Another example is that we do granular invalidation of our "inline" caches if the type whose information we cached is changed, we won't just throw away the generated code (which is independent of the caches and is still valid.)

How does Cinder handle this?

We use callback functions on relevant modifications to dictionaries, types, and functions, with custom handling appropriate to each case.

Why do functions need callbacks, rather than code objects?

I think the only case in which we actually depend on func-modified hooks today is if the function's __code__ is changed. Because we change a function's vectorcall entrypoint when we compile it, if its __code__ changes we need to reset to the default vectorcall entrypoint. I'm not sure if we have use cases in mind for hooking into other changes to funcs; maybe @mpage or @swtaarrs can weigh in if I'm missing something. We do modify MAKE_FUNCTION today, which in the future we might want to use a func-created hook for instead.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
3.12 expert-C-API type-feature A feature request or enhancement
Projects
None yet
Development

No branches or pull requests

4 participants