bpo-38938: Added special case to heapq.merge for small numbers of iterables #17422
Conversation
Please don't reformat the whole module. Make sure the diff touches a minimum number of lines. Also, I had in mind just the special case for n<=2 (just a single variable for each iterable, no binary tree).
Lib/heapq.py (Outdated)
@@ -329,6 +329,116 @@ def merge(*iterables, key=None, reverse=False):
    '''
    n = len(iterables)
    if n == 0:
Let's drop the n==0 and n==1 cases. They were already fast.
A Python core developer has requested some changes be made to your pull request before we can consider merging it. If you could please address their requests along with any other requests in other reviews from core developers, that would be appreciated. Once you have made the requested changes, please leave a comment on this pull request containing the phrase: I have made the requested changes; please review again.
I have made the requested changes; please review again.
Thanks for making the requested changes! @rhettinger: please review the changes made to this pull request.
Stand by for a bit. Am still reviewing the code. The binary tree variant is looking better than expected. Am trying out starting the function with:
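A minimal sketch of what such a prefix might look like, assuming the recursive two-way split discussed later in the thread rather than the exact code being tried:

def merge(*iterables, key=None, reverse=False):
    # ... docstring and setup unchanged ...
    n = len(iterables)
    if n > 2:
        # Split the inputs in half and merge the two recursive merges, so the
        # rest of the function only ever has to handle one or two iterables.
        mid = n // 2
        yield from merge(merge(*iterables[:mid], key=key, reverse=reverse),
                         merge(*iterables[mid:], key=key, reverse=reverse),
                         key=key, reverse=reverse)
        return
    # ... one- and two-iterable code paths would follow here ...

As written, the key function would be re-applied at every level of the split, which is exactly the re-computation question raised a few comments below.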
So far in testing, the binary tree variant is making fewer comparisons than the heap variant when …
Also, for the comparisons, we're going to need to flip them around and/or invert them so that only __lt__ is used.
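As a hedged illustration of that flipping, a hypothetical helper (not code from the PR) that expresses both sort directions with a single use of < on the keys:

def _take_a_first(a_key, b_key, reverse):
    # Hypothetical helper: decide whether the element from the first (earlier)
    # iterable should be yielded next, using only "<" on the keys.
    if reverse:
        return not (a_key < b_key)   # descending: a goes first unless it is strictly smaller
    return not (b_key < a_key)       # ascending: a goes first unless b is strictly smaller (ties keep a, preserving stability)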
Using the binary-tree-of-iterables approach, is there a good way to prevent the re-computation of keys? The only way I've thought of is by taking …
I don't think there's a need for that. Instead, each input iterable should be wrapped in a generator that yields each value paired with its key.
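A minimal sketch of that wrapping, using a hypothetical helper name:

def _with_keys(iterable, key):
    # Hypothetical helper: pair every value with its key up front so the
    # recursive merges can compare the stored keys and never call key() again.
    for value in iterable:
        yield value, key(value)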
So then, for the case where there are more than two iterables, something like:

# outside the function:
import operator
_SUB0 = operator.itemgetter(0)
_SUB1 = operator.itemgetter(1)
...
# after ensuring len(iterables) > 2:
n2 = len(iterables)//2
if key is None:
    result_1 = merge(*iterables[:n2], key=None, reverse=reverse)
    result_2 = merge(*iterables[n2:], key=None, reverse=reverse)
    yield from merge(result_1, result_2, key=None, reverse=reverse)
elif key is _SUB1:
    # recursive call, keys already computed
    result_1 = merge(*iterables[:n2], key=_SUB1, reverse=reverse)
    result_2 = merge(*iterables[n2:], key=_SUB1, reverse=reverse)
    yield from merge(result_1, result_2, key=_SUB1, reverse=reverse)
else:
    # Top-level call: pair every value with its key once and for all.
    key_iterables = [((value, key(value)) for value in it) for it in iterables]
    # Merge the keyed iterables, keyed by the second item (where we put the key).
    result_1 = merge(*key_iterables[:n2], key=_SUB1, reverse=reverse)
    result_2 = merge(*key_iterables[n2:], key=_SUB1, reverse=reverse)
    result = merge(result_1, result_2, key=_SUB1, reverse=reverse)
    # Now unpack the original values.
    yield from map(_SUB0, result)

Are inter-module dependencies like this a problem?
Using functions from other modules generally isn't a problem. I'm not exactly clear on what you have in mind, though. I had this in mind, just showing a typical chunk of what would be the path in the case where there is a key function and it's not reversed:

if a_key < b_key:
    yield a_tuple
    try:
        a_value, a_key = a_tuple = next_a()
    except StopIteration:
        yield b_tuple
        yield from b_iter
        return

I can't guess whether this would be slower or faster than using an itemgetter, and really don't care - I don't stress over micro-optimizations one way or the other, as they usually end up depending a whole lot on the specific compiler+options being used, and "the answer" can change over time anyway. If timing shows a "big" difference, then it may be worth giving up a bit of clarity.
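For context, a self-contained sketch of how that style expands into a complete two-way merge of (value, key) pairs (hypothetical _merge2_keyed name, ascending order only; not CPython's implementation):

def _merge2_keyed(a, b):
    # a and b each yield (value, key) pairs already sorted by key.
    # Yields merged (value, key) pairs stably, computing no keys and
    # comparing only with "<". The values ride along inside the tuples;
    # only the keys are ever compared.
    a_iter, b_iter = iter(a), iter(b)
    next_a, next_b = a_iter.__next__, b_iter.__next__
    try:
        a_value, a_key = a_tuple = next_a()
    except StopIteration:
        yield from b_iter
        return
    try:
        b_value, b_key = b_tuple = next_b()
    except StopIteration:
        yield a_tuple
        yield from a_iter
        return
    while True:
        if b_key < a_key:
            yield b_tuple
            try:
                b_value, b_key = b_tuple = next_b()
            except StopIteration:
                yield a_tuple
                yield from a_iter
                return
        else:
            yield a_tuple
            try:
                a_value, a_key = a_tuple = next_a()
            except StopIteration:
                yield b_tuple
                yield from b_iter
                return

A top-level wrapper would pair values with their keys on the way in and strip the keys back off on the way out, along the lines of the zip/itemgetter code earlier in the thread.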
I was suggesting leaving the n==2 case as I had implemented, and then for more iterables, reducing to this 2-case in a way that prevents the re-computation of keys. The current diff contains this alternative algorithm. I did some rough benchmarking on this version using the code here (https://pastebin.com/qhKzcLzZ), and it was finding better performance for the tree-of-iterables version in almost all situations, especially when no key is used, and most especially in the n==2 case. I don't think there should be too much of a concern about a very unbalanced tree either, because each comparison advances some item toward the root of the tree, and each item must be advanced at most treedepth times, which is bounded above by ceil(log2(len(iterables))). The total number of comparisons needed is therefore bounded above by the total number of items times ceil(log2(len(iterables))).
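A rough worked example of that bound, with made-up numbers rather than figures from the benchmark:

import math

# Say there are 8 iterables of 1000 items each.
iterable_count = 8
total_items = iterable_count * 1000
tree_depth = math.ceil(math.log2(iterable_count))   # 3 levels of two-way merges
max_comparisons = total_items * tree_depth
print(max_comparisons)   # at most 24000 comparisons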
I don't believe that viewing this as a binary tree was helpful to begin with 😉. At the extreme, we're merging …
Without trying it, my guess is that this kind of thing would be a relatively poor case for the newer approach:

>>> import heapq
>>> N = 2**20
>>> raw = [[0]] * N
>>> raw.append([1] * N)
>>> xs = heapq.merge(*raw)
>>> xs = list(xs)
>>> xs == [0] * N + [1] * N
True

There are over a million iterables here, so the heap grows to about 20 levels deep, and a recursive mergesort goes down about 20 levels. After about half the results are delivered, the heap is reduced to a single element, which just pushes and pops 1 over and over (unless I'm misremembering how the current code handles that case). If I'm not, the layers of recursive yielding could be considerably more expensive than pushing and popping from a 1-element heap. But this is all highly contrived.
This reverts commit 937d313.
Oops, it looks like you're right: I didn't benchmark diverse enough inputs. I did a different benchmark (https://pastebin.com/88jkw5Yx, with results visualized at https://imgur.com/a/mtEO0uA) testing different levels of "balance" for the lengths of the iterables, and it looks like the 2-iterable case was the only one that made a nontrivial improvement across the board. The rest of the time, in general, I believe the automatic shrinking of the heap and the indifference of the heap's performance to the sources of its items beat out any small constant-factor decrease in the number of comparisons from the recursive mergesort approach. My apologies for the repeated changes, and thank you both for your guidance.
I'm not making a judgment 😉. Comparisons can be arbitrarily expensive in Python, and when they are, an approach that cuts the number of compares can beat anything else. Raymond needs to chime in too. What I guess is most common: no more than a few dozen iterable inputs, all of approximately the same length, with perhaps one unusually short oddball. That's what you naturally get when, e.g., you use … Optimizing for a million iterables (as my "bad case" example did) would be plain insane 😉.
… to avoid masking StopIteration from key computation or comparison. This reverts commit c149212.
Thanks for all the comments. I'll have more time for this one over the holidays and will bring it to fruition.
Here's an interesting non-recursive pure-Python version of the mergesort-style algorithm. It seems to regularly beat out a forced pure-Python import of the current heapq module, and it was beating the heapq module across the board on my machine running PyPy, so maybe that is an indication that it would outperform the existing heapq.merge function if implemented in C.
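For illustration only, a minimal sketch of the general shape of a non-recursive, mergesort-style merge; merge_tree is a hypothetical name, and it leans on heapq.merge for the two-way steps purely to stay short:

from heapq import merge

def merge_tree(*iterables, key=None, reverse=False):
    # Hypothetical sketch: build the mergesort-style tree of two-way merges
    # with a loop instead of recursion, pairing adjacent runs until one remains.
    runs = list(iterables)
    if not runs:
        return iter(())
    while len(runs) > 1:
        paired = []
        for i in range(0, len(runs) - 1, 2):
            paired.append(merge(runs[i], runs[i + 1], key=key, reverse=reverse))
        if len(runs) % 2:            # odd number of runs: carry the last one up a level
            paired.append(runs[-1])
        runs = paired
    return iter(runs[0])

For example, list(merge_tree([1, 4], [2, 5], [3, 6])) gives [1, 2, 3, 4, 5, 6]; a real candidate would replace the pairwise heapq.merge calls with a dedicated two-way merge like the ones sketched earlier.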
I was curious and wanted to get to know some CPython internals, so I took a crack at a C implementation: #17729.
https://bugs.python.org/issue38938