Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bpo-28053: Allow custom reducer when using multiprocessing #15058

Open
wants to merge 32 commits into
base: main
Choose a base branch
from

Conversation

@pierreglaser
Copy link
Contributor

@pierreglaser pierreglaser commented Jul 31, 2019

This PR is a follow up on #9959. It implements the suggestions I made here #9959 (comment).

I'm making it a new PR to get the CI working now and gather separate feedback.

  • pass the current context to Process objects
  • split AbstractPickler into AbstractPickler and AbstractUnpickler
  • clean everything up (still WIP)

https://bugs.python.org/issue28053

@pierreglaser
Copy link
Contributor Author

@pierreglaser pierreglaser commented Aug 1, 2019

You will notice that for the Process class, I chose to pass the reducer (which is the only bit of information we need from the context for now) to the Process class at construction instead of the Context

    def process(self, *args, **kwargs):
        return self._Process(*args, reducer=self.get_reducer(), **kwargs)

This comes from the fact that theProcess is actually relying on the default context to be properly defined:

class Process(process.BaseProcess):
_start_method = None
@staticmethod
def _Popen(process_obj):
return _default_context.get_context().Process._Popen(process_obj)

For the process class, passing the context object a Process bind to would require changing it to something like:

class Process(process.BaseProcess):
    _start_method = None
    @staticmethod
    def _Popen(process_obj):
          if self._ctx is not None:
              return self._ctx.get_context()._Process._Popen(process_obj)
          return _default_context.get_context()._Process._Popen(process_obj)

with _Process being a Process class specified (or not, in this case the default is Process) by the user. You can see the infinite recursion here as if _Process is not specified or set to Process, self._ctx.get_context()._Process._Popen is Process._Popen... To prevent this situation we would need to add a bunch of guards, and this ends up being not so trivial.

So implementation-wise, it is easier to simply pass the reducer in BaseContext.process, and not the whole context. Also, one noticeable consequence is that the reducer in Process is static, as it is not called using something like self._ctx.get_reducer() as it is done elsewhere.

But I am opened to suggestions.

Lib/multiprocessing/context.py Outdated Show resolved Hide resolved
Lib/multiprocessing/context.py Outdated Show resolved Hide resolved
@reducer.setter
def reducer(self, reduction):
globals()['reduction'] = reduction
if not isinstance(reduction, original_reducer.AbstractReducer):
Copy link

@pppery pppery Aug 5, 2019

The checks below aren't strict enough; they pass for a .register()ed subclass of Abstract<Reducer|Pickler|Unpickler>, but the code won't actually work with such a class due to depending on undocumented APIs.

Copy link
Contributor Author

@pierreglaser pierreglaser Aug 7, 2019

I do not understand what you mean by "Undocumented API". This PR comes with documentation for the AbstractReducer, AbstractPickler, and AbstractUnpickler.

Copy link

@pppery pppery Aug 7, 2019

What I mean is that the documentation suggests that the below code should work:

class DummyReducer:
    def get_pickler_class():
        return pickle.Pickler
    def get_unpickler_class():
        return pickle.Unpickler
AbstractReducer.register(DummyReducer)
multiprocessing.set_reducer(DummyReducer())

However, in practice, it appears that running that code (and then doing something that uses reduction) will produce a (rather cryptic) error message in some cases. Either the documentation should be clear that virtual subclasses don't work and the code should raise an error when a virtual subclass is passed, or the code should be updated to handle virtual subclasses.

Copy link
Contributor Author

@pierreglaser pierreglaser Aug 7, 2019

What I mean is that the documentation suggests that the below code should work

Not really: the get_pickler_class API documentation specifies that get_pickler_class must return anAbstractPickler subclass, but your DummyReducer (which itself, should be an AbstractReducer subclass) simply returns a Pickler, which is not an AbstractPickler subclass. Or maybe I misunderstood your message?

Copy link

@pppery pppery Aug 7, 2019

You didn't misunderstand anything, my example was accidentally bogus, and I misunderstood the code. The actual bug I was trying to point out appears not to exist with reducers, but instead with picklers.

class DummyReducer(AbstractReducer):
    def get_pickler_class():
        return DummyPickler
class DummyPickler(pickle.Pickler):
    pass
AbstractPickler.register(DummyPickler)
multiprocessing.set_reducer(DummyReducer())

This seemingly harmless code appears to have the effect of breaking the passing of all of the custom types that multiprocessing registers, in an entirely non-intuitive way, which I think should either be fixed or documented.

Lib/multiprocessing/forkserver.py Outdated Show resolved Hide resolved
@pierreglaser
Copy link
Contributor Author

@pierreglaser pierreglaser commented Aug 7, 2019

@pitrou what could I do to ease the review process of this PR?

@pitrou
Copy link
Member

@pitrou pitrou commented Aug 7, 2019

@pierreglaser The first necessary condition is for me to find some time... sorry :-/

@pierreglaser
Copy link
Contributor Author

@pierreglaser pierreglaser commented Aug 7, 2019

@pitrou no worries :)

Lib/multiprocessing/context.py Outdated Show resolved Hide resolved
Lib/multiprocessing/context.py Outdated Show resolved Hide resolved
'''Abstract base class for use in implementing a Reduction class
suitable for use in replacing the standard reduction mechanism
used in multiprocessing.'''
ForkingPickler = ForkingPickler
register = register
dump = dump
send_handle = send_handle
Copy link

@pppery pppery Aug 7, 2019

None of the attributes defined between here and __init__ appear to be used.

Copy link
Contributor Author

@pierreglaser pierreglaser Feb 2, 2020

Same comment as below.


def __init__(self, *args):
register(type(_C().f), _reduce_method)
register(type(list.append), _reduce_method_descriptor)
register(type(int.__add__), _reduce_method_descriptor)
register(functools.partial, _reduce_partial)
register(socket.socket, _reduce_socket)

@property
def HAVE_SEND_HANDLE(self):
Copy link

@pppery pppery Aug 7, 2019

This property does not appear to be used.

Copy link
Contributor Author

@pierreglaser pierreglaser Feb 2, 2020

This is work taken from @pablogsal previous commits (see #9959). I assume there was a good reason for this, but he'll probably know better than me.

def get_unpickler_class(self):
return AbstractUnpickler

def register(self, type_, reduce_):
Copy link

@pppery pppery Aug 7, 2019

This method (a) doesn't appear to be used and (b) doesn't necessarily work, because nothing requires the get_pickler_class to define a register method.

Copy link
Contributor Author

@pierreglaser pierreglaser Feb 2, 2020

Regarding (b), this is stated in the documentation that get_pickler_class should return a subclass of an AbstractPickler, that itself (as a subclass of ForkingPickler), implements a register method. Regarding (a), same comment as above (this @pablogsal's work), but I'll think about whether this function can be removed.

@pierreglaser
Copy link
Contributor Author

@pierreglaser pierreglaser commented Feb 2, 2020

@pablogsal friendly ping: I am curious to know what are your thoughts on this.

@@ -0,0 +1 @@
Fix the implementation of custom reducers for in `multiprocessing`.
Copy link
Member

@terryjreedy terryjreedy Feb 3, 2020

'for in' Either a word is missing or there is an extra word.

Copy link
Member

@aeros aeros Feb 12, 2020

To me, it looks like "for" is the extra word. I'd personally go with:

Suggested change
Fix the implementation of custom reducers for in `multiprocessing`.
Fix the implementation of custom reducers in :mod:`multiprocessing`.

(Also adds the sphinx role :mod: to provide an inline link to the module. It's optional for Misc/NEWS.d, but we may as well include it with the typo fix. This makes it easier to move over to the "What's New" doc later.)

@terryjreedy
Copy link
Member

@terryjreedy terryjreedy commented Feb 3, 2020

(sorry, failed rebasing attempt)

Rebasing is, as evidenced by repeated failures like this, error prone. Besides the obvious noise, removing the spurious review requests does not unsubscribe people.

If you were just trying to update your branch to your current repository/fork master (once synchronized), rebasing is also not needed. After checking out the branch, git pull origin/master will, if there are no conflicts, pull, merge, and commit updates to the branch with a standard commit message.

@tiran tiran removed their request for review Apr 17, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
8 participants