Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bpo-35378: Fix multiprocessing.Pool references #11627

Merged
merged 7 commits into from Feb 11, 2019

Conversation

pablogsal
Copy link
Member

@pablogsal pablogsal commented Jan 21, 2019

  1. Use a strong reference between the Pool and associated iterators

  2. Rework PR bpo-34172: multiprocessing.Pool leaks resources after being deleted #8450 to eliminate a cycle in the Pool.

There is no test in this PR because any test that automatically tests this behaviour needs to eliminate the pool before joining the pool to check that the pool object is garbaged collected/does not hang. But doing this will potentially leak threads and processes (see https://bugs.python.org/issue35413).

https://bugs.python.org/issue35378

@pablogsal pablogsal added type-bug An unexpected behavior, bug, or error tests Tests in the Lib/test dir labels Jan 21, 2019
Copy link
Member

@pitrou pitrou left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

An implementation question below.

@@ -656,13 +682,14 @@ def __exit__(self, exc_type, exc_val, exc_tb):

class ApplyResult(object):

def __init__(self, cache, callback, error_callback):
def __init__(self, pool, callback, error_callback, cache=None):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why the separate cache argument?

Copy link
Member Author

@pablogsal pablogsal Jan 22, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a leftover from the last PR.

Originally, in the last PR, I made the cache as a keyword argument to decouple the cache from the pool, as this allows a broader set of strategies (in the future) to have these separated and therefore reducing dependency cycles and lifetime chains.

In this PR I am targeting the simplest solution, so I am going to remove the cache argument and just get the cache out of the pool.

In edee524 I have reverted to use always the cache from the pool.

@@ -701,16 +729,16 @@ def _set(self, i, obj):

class MapResult(ApplyResult):

def __init__(self, cache, chunksize, length, callback, error_callback):
ApplyResult.__init__(self, cache, callback,
def __init__(self, pool, chunksize, length, callback, error_callback, cache=None):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Check my answer in the previous comment.

@vstinner
Copy link
Member

It seems like this chang is more for https://bugs.python.org/issue34172 no ?

@vstinner
Copy link
Member

Does this change affect the following code?


import multiprocessing

def the_test():
    print("Begin")
    for x in multiprocessing.Pool().imap(int,
            ["4", "3"]):
        print(x)
    print("End")

the_test()

Ref: https://bugs.python.org/issue34172#msg330864

@pablogsal
Copy link
Member Author

pablogsal commented Jan 23, 2019

It seems like this chang is more for https://bugs.python.org/issue34172 no ?

This is combining https://bugs.python.org/issue35378 and https://bugs.python.org/issue34172 because both issues neet to be solved at the same time (one fix does not have total sense without the other). This PR breaks the reference cycle and ties the lifetime of the iterators so there is no change in behaviour.

Does this change affect the following code?

No, that's the point of solving both issues (bpos) at the same time: it preserves backwards compatibility:

This patch

Begin
/home/pablogsal/github/cpython/Lib/multiprocessing/pool.py:234: ResourceWarning: unclosed running multiprocessing pool <multiprocessing.pool.Pool state=RUN pool_size=12>
  _warn(f"unclosed running multiprocessing pool {self!r}",
ResourceWarning: Enable tracemalloc to get the object allocation traceback
4
3
End

python3.7

Begin
4
3
End

python3.6

Begin
4
3
End

@@ -0,0 +1,3 @@
Fix a reference issue inside :class:`multiprocessing.Pool` that caused
the pool to remain alive if it was deleted without being closed or
terminated explicitly.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand that your change also adds a strong reference to the pool in iterators. Would you mind to also explain that?

@pablogsal
Copy link
Member Author

@pitrou @vstinner I have added a note about the strong references in the NEWS entry. Could you review/accept the PR? I would like to be able to merge this early in the dev cycle so this fix can be tested better.

Copy link
Member

@pitrou pitrou left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The PR looks fine to me, though it would be better if there could be some tests.

@pablogsal
Copy link
Member Author

The PR looks fine to me, though it would be better if there could be some tests.

Thanks for the review! I agree that tests should be desirable but sadly, the reason there is no test is that any test that automatically checks this behaviour needs to eliminate the pool before joining the pool to check that the pool object is garbaged collected/does not hang but doing this will potentially leak threads and processes and is full of races.

I would advise against anything that risks more races on multiprocessing testing.

Being said that if someone knows of a reliable way of testing this, I am happy to implement it. :)

@pitrou
Copy link
Member

pitrou commented Feb 7, 2019

Well, if your fix is right, the threads and processes shouldn't leak, they should just finish after a while. So the test suite's helpers for reaping threads and processes should do their job, no? (assuming we call them)

@pablogsal
Copy link
Member Author

pablogsal commented Feb 7, 2019

Well, if your fix is right, the threads and processes shouldn't leak, they should just finish after a while. So the test suite's helpers for reaping threads and processes should do their job, no? (assuming we call them)

But the test forces a situation in which the leaks happen because you need to destroy the pool without joining it before. This PR prevents a deadlock situation and a big leak, not a perfect finalization of the pool in every situation. And the hang happens mostly when you don't finalize properly the pool.

@pitrou
Copy link
Member

pitrou commented Feb 7, 2019

Thanks for the explanation. Then I'd say it's good for merging :-)

@pablogsal pablogsal merged commit 3766f18 into python:master Feb 11, 2019
@pablogsal pablogsal deleted the bpo35378 branch February 11, 2019 17:29
return self._ctx.Process(*args, **kwds)
@staticmethod
def Process(ctx, *args, **kwds):
return ctx.Process(*args, **kwds)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why did this need to change? I feel like pool.Process existed solely as a convenience method, and this breaks any code that used it. Instead of changing this function, you could just use ctx.Process instead of self.Process below

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
tests Tests in the Lib/test dir type-bug An unexpected behavior, bug, or error
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants