
Enhance itertools.takewhile() to allow the failed transition element to be captured #113479

Closed
@rhettinger

Description


The current version of takewhile() has a problem: the first element that fails the predicate is consumed from the underlying iterator, and there is no way to access it. This is the problem the existing before_and_after() recipe was written to solve.
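For illustration, here is the current behavior of the standard itertools.takewhile(). To discover that 6 fails the predicate, takewhile() has to pull it from the input, and afterwards it is simply gone:

```python
from itertools import takewhile

input_it = iter([1, 4, 6, 4, 1])

# takewhile() stops at 6, but it had to read 6 to find that out.
under_five = list(takewhile(lambda x: x < 5, input_it))

# The 6 was consumed and discarded; only the elements after it remain.
rest = list(input_it)

print(under_five)  # [1, 4]
print(rest)        # [4, 1]
```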

I propose extending the current API so that this element can be captured. The change is fully backward compatible and addresses use cases that need all of the data not returned by the takewhile iterator.

Option 0:

In pure Python, the new takewhile() could look like this:

def takewhile(predicate, iterable, *, transition=None):
    # takewhile(lambda x: x<5, [1,4,6,4,1]) --> 1 4
    for x in iterable:
        if predicate(x):
            yield x
        else:
            if transition is not None:      # <-- This is the new part
                transition.append(x)        # <-- This is the new part
            break

It could be used like this:

>>> from itertools import chain
>>> input_it = iter([1, 4, 6, 4, 1])
>>> transition_list = []
>>> takewhile_it = takewhile(lambda x: x<5, input_it, transition=transition_list)
>>> print('Under five:', list(takewhile_it))
Under five: [1, 4]
>>> remainder = chain(transition_list, input_it)
>>> print('Remainder:', list(remainder))
Remainder: [6, 4, 1]

The API is a bit funky. Passing in a mutable list to receive an output value is a common pattern in C programming, but I rarely see anything like it in Python. Still, this may be the simplest way to access the last value (if any) consumed from the input. The keyword argument name transition accurately describes a list that receives the transition element when there is one, but another parameter name may be better.
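The "(if any)" matters: when no element fails the predicate, the transition list simply stays empty, so chaining it in front of the input still produces the correct remainder. A small sketch of the three cases, repeating the Option 0 takewhile() so it runs on its own; split_with_transition() is a hypothetical helper used only for this illustration:

```python
from itertools import chain

def takewhile(predicate, iterable, *, transition=None):
    # Option 0 sketch from the proposal (not the current itertools.takewhile()).
    for x in iterable:
        if predicate(x):
            yield x
        else:
            if transition is not None:
                transition.append(x)  # record the element that failed
            break

def split_with_transition(predicate, iterable):
    # Hypothetical helper: returns (prefix, remainder, transition) as lists.
    it = iter(iterable)
    transition = []
    prefix = list(takewhile(predicate, it, transition=transition))
    remainder = list(chain(transition, it))
    return prefix, remainder, transition

some_fail = split_with_transition(lambda x: x < 5, [1, 4, 6, 4, 1])
none_fail = split_with_transition(lambda x: x < 5, [1, 2, 3])
empty = split_with_transition(lambda x: x < 5, [])
print(some_fail)  # ([1, 4], [6, 4, 1], [6])
print(none_fail)  # ([1, 2, 3], [], [])
print(empty)      # ([], [], [])
```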

Option 1:

We could have a conditional signature that returns two iterators if a flag is set:

true_iterator = takewhile(predicate, iterable, remainder=False)
true_iterator, remainder_iterator = takewhile(predicate, iterable, remainder=True)
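A minimal sketch of how Option 1 might behave, assuming `remainder` is the keyword name (it is only the name used in the option above, not a real itertools parameter). As with the before_and_after() recipe, the remainder iterator only gives correct results after the first iterator has been fully consumed:

```python
from itertools import chain

def takewhile(predicate, iterable, *, remainder=False):
    # Hypothetical Option 1 signature; `remainder` is the flag proposed
    # above, not a parameter of the real itertools.takewhile().
    it = iter(iterable)
    transition = []  # receives the first failing element, if any

    def true_iterator():
        for x in it:
            if predicate(x):
                yield x
            else:
                transition.append(x)
                return

    if not remainder:
        return true_iterator()
    # chain() only starts reading `transition` when it is first advanced,
    # so the remainder is correct once true_iterator() has been exhausted.
    return true_iterator(), chain(transition, it)

only_true = list(takewhile(lambda x: x < 5, [1, 4, 6, 4, 1]))
under, rest = takewhile(lambda x: x < 5, [1, 4, 6, 4, 1], remainder=True)
under_list = list(under)  # consume the first iterator fully...
rest_list = list(rest)    # ...before reading the remainder
print(only_true, under_list, rest_list)  # [1, 4] [1, 4] [6, 4, 1]
```

One cost of this design is visible in the sketch: the return type depends on the flag, so callers (and type checkers) must know the flag's value to know whether they get one iterator or a pair.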

Option 2:

Create a completely separate itertool by promoting the before_and_after() recipe to be a real itertool:

true_iterator, remainder_iterator = before_and_after(predicate, iterable)

I don't really like Option 2 because it substantially duplicates takewhile(), leaving a permanent tension between the two.
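To illustrate the overlap, here is a minimal sketch assuming Option 0's transition keyword were available: before_and_after() then reduces to a thin wrapper around takewhile(). The takewhile() below repeats the Option 0 sketch so the example runs on its own; it is not the current itertools function:

```python
from itertools import chain

def takewhile(predicate, iterable, *, transition=None):
    # Option 0 sketch (proposed, not the current itertools.takewhile()).
    for x in iterable:
        if predicate(x):
            yield x
        else:
            if transition is not None:
                transition.append(x)
            break

def before_and_after(predicate, iterable):
    # As with the existing recipe, the first iterator must be fully
    # consumed before the second one yields valid results.
    it = iter(iterable)
    transition = []
    return takewhile(predicate, it, transition=transition), chain(transition, it)

all_upper, remainder = before_and_after(str.isupper, 'ABCdEfGhI')
upper_part = ''.join(all_upper)
rest_part = ''.join(remainder)
print(upper_part)  # ABC
print(rest_part)   # dEfGhI
```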
