bpo-38938: Added special case to heapq.merge for small numbers of iterables #17422


Closed
sweeneyde wants to merge 18 commits

Conversation

@sweeneyde (Member) commented Dec 1, 2019

@rhettinger (Contributor)

Please don't reformat the whole module. Make sure the diff touches a minimum number of lines.

Also, I had in mind just the special case for n<=2 (just a single variable for each iterable, no binary tree).
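
For context, a minimal sketch of what such a two-iterable fast path could look like, ignoring key= and reverse= and using a hypothetical helper name _merge2 (illustrative only, not the actual patch):

def _merge2(a, b):
    # Illustrative only: stable two-way merge of two sorted iterables,
    # buffering a single value per iterable and comparing with __lt__ only.
    a, b = iter(a), iter(b)
    try:
        a_val = next(a)
    except StopIteration:
        yield from b
        return
    for b_val in b:
        # Emit everything from a that sorts at or before b_val, then b_val.
        while not (b_val < a_val):      # ties go to a, keeping the merge stable
            yield a_val
            try:
                a_val = next(a)
            except StopIteration:
                yield b_val
                yield from b
                return
        yield b_val
    # b is exhausted; flush the buffered value and the rest of a.
    yield a_val
    yield from a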

Lib/heapq.py Outdated
@@ -329,6 +329,116 @@ def merge(*iterables, key=None, reverse=False):

'''

n = len(iterables)
if n == 0:
@rhettinger (Contributor)

Let's drop the n==0 and n==1 cases. They were already fast.

@bedevere-bot

A Python core developer has requested some changes be made to your pull request before we can consider merging it. If you could please address their requests along with any other requests in other reviews from core developers that would be appreciated.

Once you have made the requested changes, please leave a comment on this pull request containing the phrase I have made the requested changes; please review again. I will then notify any core developers who have left a review that you're ready for them to take another look at this pull request.

@sweeneyde (Member, Author)

I have made the requested changes; please review again.

@bedevere-bot

Thanks for making the requested changes!

@rhettinger: please review the changes made to this pull request.

@rhettinger (Contributor)

Stand by for a bit. Am still reviewing the code. The binary tree variant is looking better than expected.

Am trying out starting the function with:

def merge(*seq):
    if len(seq) > 2:
        n = (len(seq) + 1) // 2
        seq = merge(*seq[:n]), merge(*seq[n:])
    ...

So far in testing, the binary tree variant is making fewer comparisons than the heap variant when n > 3.

@rhettinger (Contributor)

Also, for the comparisons, we're going to need to flip them around and/or invert them so that only __lt__ is used, the same as we do for bisect(), the other heapq functions, and sorted().
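
Concretely, one way to arrange that (a sketch, not the patch; a_key, b_key, and take_b are illustrative names, while reverse is merge()'s keyword argument):

# Sketch: choosing the comparison for one step of a two-way merge so that
# only __lt__ is ever invoked, as bisect(), the heapq functions, and sorted() do.
if reverse:
    take_b = a_key < b_key    # descending: take b first only when it is strictly larger
else:
    take_b = b_key < a_key    # ascending: take b first only when it is strictly smaller
# Either way, ties go to a, which keeps the merge stable.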

@sweeneyde (Member, Author)

Using the binary-tree-of-iterables approach, is there a good way to prevent the re-computation of keys? The only way I've thought of is by taking iterables = [((key(x), i, x) for i, x in enumerate(it)) for it in iterables], combining using the same approach, then yielding only the last part of each tuple in the final step.

@tim-one (Member) commented Dec 3, 2019

is there a good way to prevent the re-computation of keys? The only way I've thought of is by taking iterables = [((key(x), i, x) for i, x in enumerate(it))

I don't think there's a need for enumerate(), because this other approach really doesn't need to use tuple comparisons at all. Comparing tuples is another layer of expense.

Instead each input iterable should be wrapped in a generator that yields (element, key(element)) pairs (or in the other order - doesn't matter). The keys should be compared directly; the elements should never be compared; and an index from enumerate() would be useless then.

@sweeneyde (Member, Author)

So then for the case where n > 2, to account for the key recursively, I think we could do something like the following:

# outside the function:
import operator
_SUB0 = operator.itemgetter(0)
_SUB1 = operator.itemgetter(1)

...

# after ensuring len(iterables) > 2:
n2 = len(iterables)//2
if key is None:
    result_1 = merge(*iterables[:n2], key=None, reverse=reverse)
    result_2 = merge(*iterables[n2:], key=None, reverse=reverse)
    yield from merge(result_1, result_2, key=None, reverse=reverse)
elif key is _SUB1:
    # recursive call, keys already computed
    result_1 = merge(*iterables[:n2], key=_SUB1, reverse=reverse)
    result_2 = merge(*iterables[n2:], key=_SUB1, reverse=reverse)
    yield from merge(result_1, result_2, key=_SUB1, reverse=reverse)
else:
    # Top-level call: zip all of the iterators up with their keys once and for all.
    key_iterables = [zip(it, map(key, it)) for it in iterables]
    # merge the zips, keyed by the second attribute (where we put the key)
    result_1 = merge(*key_iterables[:n2], key=_SUB1, reverse=reverse)
    result_2 = merge(*key_iterables[n2:], key=_SUB1, reverse=reverse)
    result = merge(result_1, result_2, key=_SUB1, reverse=reverse)
    # Now unpack the original values
    yield from map(_SUB0, result)

Are inter-module dependencies like this operator call okay? Or would we be better off doing
def _SUB0(x): return x[0], etc.?

@tim-one (Member) commented Dec 3, 2019

Using functions from other modules generally isn't a problem. I'm not exactly clear on what you have in mind, though. I had this in mind, just showing a typical chunk of what would be the path in the case where there is a key function and it's not reversed:

if a_key < b_key:
    yield a_tuple
    try:
        a_value, a_key = a_tuple = next_a()
    except StopIteration:
        yield b_tuple
        yield from b_iter
        return

I can't guess whether this would be slower or faster than using an itemgetter, and really don't care - I don't stress over micro-optimizations one way or the other, as they usually end up depending a whole lot on the specific compiler+options being used, and "the answer" can change over time anyway. If timing shows a "big" difference, then it may be worth giving up a bit of clarity.
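
Filling that fragment out into a complete keyed, non-reversed two-way merge might look like the following (all names are illustrative; the comparison is arranged so that a wins ties, which keeps the merge stable):

def _merge2_keyed(a_iter, b_iter):
    # a_iter and b_iter yield (value, key) pairs, as produced by the
    # wrapping generators discussed above; only the keys are ever compared.
    a_iter, b_iter = iter(a_iter), iter(b_iter)
    next_a, next_b = a_iter.__next__, b_iter.__next__
    try:
        a_value, a_key = a_tuple = next_a()
    except StopIteration:
        yield from b_iter
        return
    try:
        b_value, b_key = b_tuple = next_b()
    except StopIteration:
        yield a_tuple
        yield from a_iter
        return
    while True:
        if b_key < a_key:
            yield b_tuple
            try:
                b_value, b_key = b_tuple = next_b()
            except StopIteration:
                yield a_tuple
                yield from a_iter
                return
        else:
            yield a_tuple
            try:
                a_value, a_key = a_tuple = next_a()
            except StopIteration:
                yield b_tuple
                yield from b_iter
                return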

@sweeneyde (Member, Author)

I was suggesting leaving the n==2 case as I had implemented, and then for more iterables, reducing to this 2-case in a way that prevents the re-computation of keys. The current diff contains this alternative algorithm.

I did some rough benchmarking on this version using the code here (https://pastebin.com/qhKzcLzZ), and it showed better performance for the tree-of-iterables version in almost all situations, especially when no key is used, and most of all in the n==2 case.

I don't think there should be too much of a concern about a very unbalanced tree either, because each comparison advances some item toward the root of the tree, and each item must be advanced treedepth times, which is bounded above by ceil(log2(len(iterables))). The total number of comparisons needed is bounded above by (# of total items) * ceil(log2(len(iterables))), fewer than is typically used by repeated applications of heapreplace.
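
To make that bound concrete: with 8 iterables, the tree is ceil(log2(8)) = 3 levels deep, so m total items cost at most 3*m comparisons. A heap over 8 iterables is also 3 levels deep, but CPython's heapreplace compares the two children at each level on its way down and then sifts back up, so its per-item comparison count is typically somewhat above 3.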

@tim-one (Member) commented Dec 4, 2019

I don't believe that viewing this as a binary tree was helpful to begin with 😉. At the extreme, we're merging N 1-element iterables, and then the new approach is just an ordinary top-down recursive merge sort, but with lazy merging. As such, it "should" do fewer compares than a heap-based sort.

@tim-one (Member) commented Dec 4, 2019

Without trying it, my guess is that this kind of thing would be a relatively poor case for the newer approach:

>>> import heapq
>>> N = 2**20
>>> raw = [[0]] * N
>>> raw.append([1] * N)
>>> xs = heapq.merge(*raw)
>>> xs = list(xs)
>>> xs == [0] * N + [1] * N
True

There are over a million iterables here, so the heap grows to about 20 levels deep, and a recursive mergesort goes down about 20 levels. After about half the results are delivered, the heap is reduced to a single element, which pushes and pops 1 N times. The mergesort is similarly reduced to being fed by only the original [1] * N iterable, but that remains buried under about 20 levels of recursive yielding. Unless I'm picturing something wrong.

If I'm not, the layers of recursive yielding could be considerably more expensive than pushing and popping from a 1-element heap N times.

But this is all highly contrived.
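
To isolate just the cost of stacked generator layers (a toy illustration of the overhead described above, not a benchmark of either merge implementation; the real merge would use yield from, which is somewhat cheaper per hop than the genexp layers here):

import time

def chain_layers(it, depth):
    # Wrap `it` in `depth` nested pass-through generators, so every item
    # is re-yielded `depth` times before it reaches the consumer.
    for _ in range(depth):
        it = (x for x in it)
    return it

N = 2**20
t0 = time.perf_counter()
sum(chain_layers(range(N), 20))
t1 = time.perf_counter()
sum(range(N))
t2 = time.perf_counter()
print(f"20 layers: {t1 - t0:.3f}s    direct: {t2 - t1:.3f}s")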

@sweeneyde (Member, Author)

Oops--it looks like you're right: I didn't benchmark diverse enough inputs. I did a different benchmark (https://pastebin.com/88jkw5Yx, with results visualized at https://imgur.com/a/mtEO0uA) testing different levels of "balance" in the lengths of the iterables, and it looks like the 2-iterable case was the only one that showed a nontrivial improvement across the board. The rest of the time, I believe the automatic shrinking of the heap, and the heap's indifference to where its items come from, beat out any small constant-factor decrease in the number of comparisons from the recursive mergesort approach.

My apologies for the repeated changes, and thank you both for your guidance.

@tim-one (Member) commented Dec 5, 2019

I'm not making a judgment 😉. Comparisons can be arbitrarily expensive in Python, and when they are, an approach that cuts the number of compares can beat anything else.

Raymond needs to chime in too. What I guess is most common: no more than a few dozen iterable inputs, all of approximately the same length, with perhaps one unusually short oddball. That's what you naturally get when, e.g., you use merge() to sort a sequence too large to fit in RAM, and break it into sub-RAM sized pieces, each of which is .sort()ed independently, and written to disk. In such cases the heap never gets very deep, but neither does the recursion depth, and the heap doesn't get materially smaller until very near the end.
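
For reference, that pattern looks roughly like the following sketch (simplified: values are assumed to be newline-free strings, and the external_sort name, chunk size, and temp-file handling are all illustrative):

import heapq
import itertools
import tempfile

def external_sort(items, chunk_size=100_000):
    # Sort a large iterable of strings with bounded memory: sort fixed-size
    # chunks, spill each sorted run to its own temporary file, then lazily
    # merge the runs with heapq.merge.
    run_files = []
    it = iter(items)
    while True:
        chunk = sorted(itertools.islice(it, chunk_size))
        if not chunk:
            break
        f = tempfile.TemporaryFile("w+t")
        f.writelines(value + "\n" for value in chunk)
        f.seek(0)
        run_files.append(f)
    runs = [(line.rstrip("\n") for line in f) for f in run_files]
    return heapq.merge(*runs)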

Optimizing for a million iterables (as my "bad case" example did) would be plain insane 😉.

@rhettinger (Contributor)

Thanks for all the comments. I'll have more time for this one over the holidays and will bring it to fruition.

@sweeneyde (Member, Author)

https://pastebin.com/t048iBeL

Here's an interesting non-recursive pure-Python version of the mergesort-style algorithm. It seems to regularly beat the current heapq module when that module is forced to use its pure-Python implementation, and it was beating the heapq module across the board on my machine running PyPy, so maybe that is an indication that it would outperform the existing heapq.merge function if implemented in C.
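
The pastebin code itself isn't reproduced here; purely to illustrate the general shape of a non-recursive mergesort-style merge (not the linked implementation), one can repeatedly merge adjacent pairs of iterables until a single lazily merged stream remains:

def merge_tree(merge2, *iterables):
    # Illustration only: repeatedly merge adjacent pairs of iterables until
    # a single lazily merged stream remains.  `merge2` is any lazy two-way
    # merge generator, e.g. one of the sketches earlier in this thread.
    iterables = list(iterables)
    if not iterables:
        return iter(())
    while len(iterables) > 1:
        merged = [merge2(iterables[i], iterables[i + 1])
                  for i in range(0, len(iterables) - 1, 2)]
        if len(iterables) % 2:
            merged.append(iterables[-1])
        iterables = merged
    return iterables[0]

Called as merge_tree(_merge2, [1, 4], [2, 3], [0, 5]) with a two-way merge like the earlier _merge2 sketch, it would lazily yield 0, 1, 2, 3, 4, 5.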

@sweeneyde (Member, Author)

I was curious and wanted to get to know some cpython internals, so I took a crack at a C implementation: #17729.
