Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bpo-27144: concurrent.futures as_complete and map iterators do not keep reference to returned object #1560

Merged
merged 2 commits into from Sep 1, 2017

Conversation

grzgrzgrz3
Copy link
Contributor

@grzgrzgrz3 grzgrzgrz3 commented May 12, 2017

@mention-bot
Copy link

mention-bot commented May 12, 2017

@grzgrzgrz3, thanks for your PR! By analyzing the history of the files in this pull request, we identified @brianquinlan, @asvetlov and @ezio-melotti to be potential reviewers.

@pitrou pitrou changed the title bpo-27144: concurrent.futures as_complie and map iterators do not keep bpo-27144: concurrent.futures as_complete and map iterators do not keep Jul 3, 2017
@pitrou pitrou changed the title bpo-27144: concurrent.futures as_complete and map iterators do not keep bpo-27144: concurrent.futures as_complete and map iterators do not keep reference to returned object Jul 3, 2017
@@ -170,6 +170,17 @@ def _create_and_install_waiters(fs, return_when):

return waiter


def _yield_future(fs, waiter, ref_collect=()):
Copy link
Member

@pitrou pitrou Jul 3, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you give this function a more descriptive name?


def _yield_future(fs, waiter, ref_collect=()):
while fs:
with fs[0]._condition:
Copy link
Member

@pitrou pitrou Jul 3, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you explain why you need to do this?

Copy link
Contributor Author

@grzgrzgrz3 grzgrzgrz3 Jul 3, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If your question is about line 176. Based on issue20319, changes on future waiters list should be locked.

Copy link
Member

@pitrou pitrou Jul 4, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess I don't understand why you need to remove the waiter. Previous code didn't do this.

Copy link
Contributor Author

@grzgrzgrz3 grzgrzgrz3 Jul 4, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we do not remove waiter, second result set on future will trigger waiter and cause KeyError, future is already returned and reference cleared.

For example:

>>> from concurrent.futures import Future, as_completed
>>> fs_finished = Future()
>>> fs = Future()
>>> fs_finished.set_result("")
>>> for x in as_completed([fs_finished, fs, Future()]):
...     fs.set_result(None)
... 
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/grzegorz/cpython/Lib/concurrent/futures/_base.py", line 235, in as_completed
    pending.remove(future)
KeyError: <Future at 0x7f566ff6fc20 state=finished returned NoneType>

This error occurs in both version.

However docstring for Future.set_result and Future.set_exception says:

Should only be used by Executor implementations and unit tests.

So maybe we should ignore this case?

Copy link
Member

@pitrou pitrou Jul 4, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I'm understanding correctly then, yes, I would certainly consider it an error to call set_result twice.

Copy link
Contributor Author

@grzgrzgrz3 grzgrzgrz3 Jul 4, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Anyway, this part is not relate to origin issue, so i ll remove discussed code.

Maybe in future someone encounters it and create new issue, so it can be discussed there.

fs[0]._waiters.remove(waiter)

for future_list in ref_collect:
future_list.remove(fs[0])
Copy link
Member

@pitrou pitrou Jul 3, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If future_list is a list, then this is O(n)...

Copy link
Contributor Author

@grzgrzgrz3 grzgrzgrz3 Jul 3, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this use case future_list will be always a set.
set().remove is O(1).
I will rename future_list to futures_collection

else:
yield future.result(end_time - time.time())
yield fs.pop(0).result(end_time - time.time())
Copy link
Member

@pitrou pitrou Jul 3, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If fs is a list, then this is O(n).


for future_list in ref_collect:
future_list.remove(fs[0])
yield fs.pop(0)
Copy link
Member

@pitrou pitrou Jul 3, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This too.

Copy link
Contributor Author

@grzgrzgrz3 grzgrzgrz3 Jul 3, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this case fs is always a list, however I can modify this function to yield last element. I think order here does not matter. If order matter, reversing list is the option. What do you think?

Copy link
Member

@pitrou pitrou Jul 4, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Either reverse the list or make it a deque, as you prefer.

@@ -357,6 +357,12 @@ def _check_system_limits():
raise NotImplementedError(_system_limited)


def _chain_from_iterable(iterable):
Copy link
Member

@pitrou pitrou Jul 3, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do you need this? Please add a comment and/or docstring.

@@ -59,6 +59,10 @@ def my_method(self):
pass


def _map_fn(_):
Copy link
Member

@pitrou pitrou Jul 3, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please give this function a more descriptive name, new_object perhaps?

@pitrou
Copy link
Member

pitrou commented Jul 3, 2017

Thanks for caring about this issue. I think the proposed fix needs improvements, see comments.

@grzgrzgrz3
Copy link
Contributor Author

grzgrzgrz3 commented Jul 3, 2017

Thank you for review 👍. To be honest when i was writing it i do not think about complexity at all.

Please review new version. I have pushed with force, delete branch before pulling.

@@ -170,6 +170,17 @@ def _create_and_install_waiters(fs, return_when):

return waiter


def _yield_and_decref(fs, waiter, ref_collect=()):
Copy link
Contributor Author

@grzgrzgrz3 grzgrzgrz3 Jul 3, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is my best, i could not came up with anything better. Maybe you can suggest name more descriptive.

Copy link
Member

@pitrou pitrou Jul 4, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps add a comment explaining why this function exists, then?

# We don't particularly care what the default name is, just that
# it has a default name implying that it is a ThreadPoolExecutor
# followed by what looks like a thread number.
self.assertRegex(t.name, r'^.*ThreadPoolExecutor.*_[0-4]$')
Copy link
Member

@pitrou pitrou Jul 4, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Out of curiosity, why does your PR affect this?

Copy link
Contributor Author

@grzgrzgrz3 grzgrzgrz3 Jul 4, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My mistake, when rebasing.

for result_object in self.executor.map(_dummy_object_fn, range(10)):
self.assertEqual(sys.getrefcount(result_object), 2)

def test_map_result_order(self):
Copy link
Member

@pitrou pitrou Jul 4, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't this already tested by test_map? Or perhaps you just want to augment test_map with a chunksize-using test.

@grzgrzgrz3
Copy link
Contributor Author

grzgrzgrz3 commented Jul 26, 2017

Any update on review. Please have a look at last revision.

@pitrou
Copy link
Member

pitrou commented Aug 29, 2017

Sorry for the delay @grzgrzgrz3. The PR now has conflicts, could you please resolve them?
You'll also need to add a NEWS entry using the blurb CLI tool.

@grzgrzgrz3
Copy link
Contributor Author

grzgrzgrz3 commented Aug 30, 2017

Done.

@pitrou
Copy link
Member

pitrou commented Sep 1, 2017

I'm trying to push some small changes to your branch. Crossing fingers.

@pitrou
Copy link
Member

pitrou commented Sep 1, 2017

I'm merging this. Thank you very much!

@pitrou pitrou merged commit 97e1b1c into python:master Sep 1, 2017
pitrou pushed a commit to pitrou/cpython that referenced this pull request Sep 1, 2017
…not keep reference to returned object (pythonGH-1560)

* bpo-27144: concurrent.futures as_complie and map iterators do not keep
reference to returned object

* Some nits.  Improve wordings in docstrings and comments, and avoid relying on
sys.getrefcount() in tests.
(cherry picked from commit 97e1b1c)
pitrou added a commit that referenced this pull request Sep 1, 2017
…not keep reference to returned object (GH-1560) (#3266)

bpo-27144: concurrent.futures as_complie and map iterators do not keep
reference to returned object

(cherry picked from commit 97e1b1c)
jimmylai pushed a commit to jimmylai/cpython that referenced this pull request Sep 4, 2017
* 'master' of https://github.com/python/cpython: (601 commits)
  remove check for bug last seem in Solaris 9 (python#3285)
  Change code owners for hashlib and ssl to the crypto team (python#3284)
  bpo-31281: Fix pathlib.Path incompatibility in fileinput (pythongh-3208)
  remove autoconf check for select() (python#3283)
  remove configure check for 'volatile' (python#3281)
  Add missing _sha3 module to Setup.dist (python#2395)
  bpo-12383: Also ignore __PYVENV_LAUNCHER__ (python#3278)
  bpo-9146: add the missing NEWS entry. (python#3275)
  Fix a c.f.as_completed() refleak previously introduced in bpo-27144 (python#3270)
  bpo-31185: Fixed miscellaneous errors in asyncio speedup module. (python#3076)
  remove a redundant lower in urllib.parse.urlsplit (python#3008)
  bpo-31323: Fix reference leak in test_ssl (python#3263)
  bpo-31250, test_asyncio: fix EventLoopTestsMixin.tearDown() (python#3264)
  bpo-31326: ProcessPoolExecutor waits for the call queue thread (python#3265)
  bpo-27144: concurrent.futures as_complete and map iterators do not keep reference to returned object (python#1560)
  bpo-31250, test_asyncio: fix dangling threads (python#3252)
  bpo-31217: Fix regrtest -R for small integer (python#3260)
  bpo-30096: Use ABC in abc reference examples (python#1220)
  bpo-30737: Update DevGuide links to new URL (pythonGH-3228)
  [Trivial] Remove now redundant assert (python#3245)
  ...
GadgetSteve pushed a commit to GadgetSteve/cpython that referenced this pull request Sep 10, 2017
…ep reference to returned object (python#1560)

* bpo-27144: concurrent.futures as_complie and map iterators do not keep
reference to returned object

* Some nits.  Improve wordings in docstrings and comments, and avoid relying on
sys.getrefcount() in tests.
@@ -191,16 +205,18 @@ def as_completed(fs, timeout=None):
if timeout is not None:
end_time = timeout + time.time()

total_futures = len(fs)

fs = set(fs)
Copy link
Contributor

@ambv ambv Sep 29, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a regression. The ordering of instructions here is wrong. Now fs must be a sequence to support the len(fs).

ambv pushed a commit to ambv/cpython that referenced this pull request Sep 29, 2017
…ted()`

This was possible before.  pythonGH-1560 introduced a regression after 3.6.2 got
released where only sequences were accepted now.  This commit addresses this
problem.
miss-islington pushed a commit to miss-islington/cpython that referenced this pull request Sep 29, 2017
…completed()` (pythonGH-3830)

This was possible before.  pythonGH-1560 introduced a regression after 3.6.2 got
released where only sequences were accepted now.  This commit addresses this
problem.
(cherry picked from commit 574562c)
ambv added a commit that referenced this pull request Sep 29, 2017
…ted()` (#3830)

This was possible before.  GH-1560 introduced a regression after 3.6.2 got
released where only sequences were accepted now.  This commit addresses this
problem.
ambv pushed a commit that referenced this pull request Sep 29, 2017
…completed()` (GH-3830) (#3831)

This was possible before.  GH-1560 introduced a regression after 3.6.2 got
released where only sequences were accepted now.  This commit addresses this
problem.
(cherry picked from commit 574562c)
ned-deily pushed a commit to ned-deily/cpython that referenced this pull request Oct 3, 2017
…completed()` (pythonGH-3830) (python#3831)

This was possible before.  pythonGH-1560 introduced a regression after 3.6.2 got
released where only sequences were accepted now.  This commit addresses this
problem.
(cherry picked from commit 574562c)
dalcinl added a commit to dalcinl/pythonfutures that referenced this pull request Oct 3, 2017
agronholm pushed a commit to agronholm/pythonfutures that referenced this pull request Nov 29, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants