Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Disambiguation of multiple Interpreters/VMs and associated data #5410

Open
kawogi opened this issue Sep 21, 2024 · 9 comments
Open

Disambiguation of multiple Interpreters/VMs and associated data #5410

kawogi opened this issue Sep 21, 2024 · 9 comments
Assignees
Labels
C-enhancement New feature or request

Comments

@kawogi
Copy link

kawogi commented Sep 21, 2024

Summary

I think there's a need for a way to give VMs an identity and associate user data with them. The user-data shall be accessible to the Rust-side that handles calls from a Python script.

(I'm very new to this project so I might have missed an existing feature that would solve my issue. Please let me know if that is the case.)

Expected use case

I'm working on a framework where multiple Python VMs shall run in parallel. All of them share the same API so every call from a Python Script ends up in the same Rust function. Imagine something simple such as fn increment_counter() where each script instance shall have its own counter.

fn increment_counter(vm: &VirtualMachine) {
   // where to store the counter and to select the one associated with `vm`?
}

As a workaround I abuse the wasm_id to disambiguate between multiple VMs and store all state in a global HashMap with the VM's id as key.

// pseudocode

thread_local! {
    pub(super) static COUNTERS: RefCell<HashMap<String, u32>> = RefCell::default();
}

fn increment_counter(vm: &VirtualMachine) {
    let vm_id = vm.wasm_id.as_ref().unwrap();
    API_CLIENTS.with_borrow_mut(|counters| {
        let mut counter = counters.get_mut(&self.id).expect("unknown VM");
        counter += 1;
    });
}

Alternatively: Assuming that only a single VM can be running per thread, I could use a global thread_local variable to indicate which VM is running and setting this whenever I plan to enter a VM.

Maybe a new field id or name would make this pattern a little more official. A name could also be used to name the VM in error messages and logs.

An even better approach might be to store the associated data itself within the VirtualMachine. IIUC this data would be the rust-counterpart of the Python context.

Suggested change (all names to be bike-shedded):

pub struct VirtualMachine<UserData = ()> {
    pub user_data: UserData,
}

The default should make it (mostly) backwards-compatible and also zero-cost for all existing implementations.

The above example would become:

fn increment_counter(vm: &VirtualMachine<AtomicU32>) {
    vm.user_data.fetch_add(1, Ordering::Relaxed);
}

As vm is passed immutably, one would have to rely on inner mutability for the UserData type.

I know VirtualMachine is a central structure with a lot of references so I don't expect this change to be accepted without a thorough discussion. If we can agree about a solution I would volunteer to do the implementation.

@kawogi kawogi added the C-enhancement New feature or request label Sep 21, 2024
@kawogi
Copy link
Author

kawogi commented Sep 21, 2024

Another (maybe less invasive and more versatile) approach would be to pass the aforementioned user-data as argument to the enter-method (maybe as enter_with<UserData>(user_data: UserData, …)).

The passed data would be added as another parameter next to the vm, when calling into a function:

let mut my_counter: u32 = 0;
interpreter.enter_with(&mut my_counter, |vm| {});

fn increment_counter(vm: &VirtualMachine, counter: &mut u32) {
    counter += 1;
}

A huge benefit would be that the user_data doesn't need to be stored anywhere, so that Mutex/RefCell could be avoided.

I don't know how hard that would be to implement in the macro so that there's no ambiguity between ordinary arguments coming from Python and UserData.

@youknowone
Copy link
Member

Adding a customizable user data field requires customizable area of VirtualMachine, which leads generic type parameter which will be not very useful for most of users.

On the other hand, adding a simple, probably incremental id for each VM can be easier and less fragile.

Typically a pointer value of VM object can be an id of it, but I am not sure it is safe in RustPython.

@coolreader18 Do you have a good idea?

@kawogi
Copy link
Author

kawogi commented Sep 22, 2024

I'm facing a new problem which looks like it would benefit from a solution that doesn't rely on shared state via global variables:

I have an ApplicationState which needs to be updated in a transactional manner. It can be accessed from multiple threads so I wrapped it in a Mutex. While holding the lock I call into an existing python VM which shall perform modification on that state. For this to work, I need to pass the MutexGuard to the callback function which is getting called by the script+vm. I did not find a way to accomplish this task, as the MutexGuard has an attached non-static lifetime - so I cannot store it in a global static variable.

some pseudocode to explain the problem:

fn update(state: &Arc<Mutex<ApplicationState>>) {
    let mut state_lock = state.lock().unwrap();

    // work with the state here

    interpreter.enter(|vm| {
        vm.call_into_some_python_function();
        // this function will in turn call `update_application_state` (see below)
    });    

    // work with the state here even more

    // state will be unlocked here; transaction completed
}

#[pymodule]
pub(crate) mod py_handlers {

    #[pyfunction]
    fn update_application_state(vm: &VirtualMachine) -> PyResult<()> {
        // ← here I need access to the `MutexGuard`
    }

}

If I was able to pass the MutexGuard into the vm-call I might be able to do:

fn update(state: &Arc<Mutex<ApplicationState>>) {
    let mut state_lock = state.lock().unwrap();

    // work with the state here

    interpreter.enter_with_userdata(&mut state_lock, |vm| {
        vm.call_into_some_python_function();
        // this function will in turn call `update_application_state` (see below)
    });    

    // work with the state here even more

    // state will be unlocked here; transaction completed
}

#[pymodule]
pub(crate) mod py_handlers {

    #[pyfunction]
    fn update_application_state(vm: &VirtualMachine, application_state: &mut MutexGuard<ApplicationState>) -> PyResult<()> {
        application_state.update();
    }

}

I could imagine to work around this problem by storing the entire Arc<Mutex<ApplicationState>> in a global variable and add another Mutex-mechanism somewhere to protect the surrounding transaction, but this would be very error-prone, complex and slower.

Is there an easier way to achieve that?

@kawogi
Copy link
Author

kawogi commented Sep 22, 2024

Typically a pointer value of VM object can be an id of it, but I am not sure it is safe in RustPython.

This would imply that a VM never gets moved in memory (in other words Pin IIUC). It's at least a workaround without involving any changes to RustPython - it would be the responsibility of the caller to make this safe.

@kawogi
Copy link
Author

kawogi commented Sep 22, 2024

I tried to implement the idea to get a feeling on how big the fallout would be … doesn't look good :(

Adding the user data to just enter doesn't help, as it needs to get passed into the closure anyway. Then one would have to add user_data to each call that might invoke python code. I tried that for import and had to give up as the generic type of the user data would have to be added to a lot of types, just to pass-through a single value.

I hope there's a way to keep user_data around for the duration of a call into the VM and then (when a rust-function is being called back) use that value as an additional argument. But I'm not sure if this is feasible without major changes.

I'll wait for more feedback now and try to find a hack to work around the Mutex-Problem shown above.

@youknowone
Copy link
Member

Is it possible to add a parameter to call_into_some_python_function?

Then making a #[pyclass] type including ApplicationState reference (not the & reference, but any kind of) will be possible to expose it as a Python object.

@youknowone
Copy link
Member

youknowone commented Sep 24, 2024

Here is a concept without a new type

fn update(state: &Arc<Mutex<ApplicationState>>) {
    let mut state_lock = state.lock().unwrap();

    // work with the state here
    let ptr = state_lock.deref_mut() as *mut u8;  // This is ugly, but safe as long as this is only used in the lock
    interpreter.enter(|vm| {
        let context = vm.ctx.new_int(ptr);
        vm.call_into_some_python_function(context);
        // this function will pass context to `update_application_state`
    });    

    // work with the state here even more

    // state will be unlocked here; transaction completed
}

@kawogi
Copy link
Author

kawogi commented Sep 26, 2024

call_into_some_python_function was meant as a placeholder to any vm that might trigger a callback. This could be vm.import(), vm.call_*(), etc. where I cannot just add a parameter by myself. (maybe should've expressed that more explicitly, sorry)

Also I guess I would have to access the context by using unsafe Rust, correct?

@coolreader18
Copy link
Member

An idea that's been in my head for a while is an AnyMap stored in the vm that users could insert into - this could also work as a interpreter-distinguishing substitute for the global variables that some cpython modules store things in.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C-enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

3 participants