More Memory-efficient Implementation of itertools.cycle #17783

Open: wants to merge 1 commit into base: master from

youkaichao commented Jan 1, 2020

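Makes itertools.cycle re-iterate its input instead of keeping an additional copy of the iterable's elements.

The pure-Python equivalent of cycle, as given in the itertools documentation, looks like: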

def cycle(iterable): # v1
    # cycle('ABCD') --> A B C D A B C D A B C D ...
    saved = []
    for element in iterable:
        yield element
        saved.append(element)
    while saved:
        for element in saved:
            yield element
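A quick way to check this v1 equivalent is to run it next to the built-in itertools.cycle on a one-shot generator, taking a finite prefix with itertools.islice. In this sketch the function is renamed cycle_v1 (a name chosen here for illustration) so that it does not shadow the built-in.

```python
from itertools import cycle as builtin_cycle, islice


def cycle_v1(iterable):
    # v1: yield the first pass while saving every element, then replay the saved list.
    saved = []
    for element in iterable:
        yield element
        saved.append(element)
    while saved:
        for element in saved:
            yield element


# A generator can be iterated only once, yet v1 still cycles it.
letters = (ch for ch in "ABCD")
first_ten = list(islice(cycle_v1(letters), 10))
expected = list(islice(builtin_cycle("ABCD"), 10))
print(first_ten)  # ['A', 'B', 'C', 'D', 'A', 'B', 'C', 'D', 'A', 'B']
```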

We propose changing it to:

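def cycle(iterable): # v2
    # cycle('ABCD') --> A B C D A B C D A B C D ...
    while True:
        produced = False
        for element in iterable:
            produced = True
            yield element
        if not produced:
            # Stop on an empty (or exhausted) iterable instead of looping forever.
            return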

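Drawback: slightly incompatible with the current cycle

v1 accepts an iterable that can be iterated only once (such as a generator), while v2 does not: after the first pass such an iterator is exhausted, so later passes produce nothing.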
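The incompatibility can be seen directly with a minimal sketch of v2 under one stated assumption: it stops when a pass yields nothing. Without such a guard, an exhausted iterator would make the while loop spin forever without yielding. The name cycle_v2 is chosen here for illustration.

```python
from itertools import islice


def cycle_v2(iterable):
    # v2: re-iterate the source on every pass instead of saving elements.
    # Assumed guard: stop if a pass yields nothing, so an empty or
    # exhausted iterable does not cause an endless loop.
    while True:
        produced = False
        for element in iterable:
            produced = True
            yield element
        if not produced:
            return


# A re-iterable source (a string) cycles as expected.
from_string = list(islice(cycle_v2("ABC"), 7))

# A one-shot generator is exhausted after its first pass, so only one pass comes out.
from_generator = list(islice(cycle_v2(ch for ch in "ABC"), 7))

print(from_string)     # ['A', 'B', 'C', 'A', 'B', 'C', 'A']
print(from_generator)  # ['A', 'B', 'C']
```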

Benefit: more memory-efficient

v1 keeps a reference to every element in saved, so its memory use grows with the length of the iterable. v2 stores nothing beyond the iterable itself.

I think v2 is better. If the caller wants to cycle through an iterable that can be iterated only once (call it it), they can use cycle(list(it)). In most cases, though, the caller holds a re-iterable object (such as a list, tuple, or range), so cycle(it) works without additional memory.
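One possible middle ground is to apply the cycle(list(it)) advice automatically. The sketch below uses a hypothetical helper, cycle_auto (not part of itertools), which relies on the fact that an iterator returns itself from iter(): one-shot iterators are materialized once, and everything else is re-iterated as in v2. Note that the iter() call used for the check may have side effects for some objects.

```python
from itertools import islice


def cycle_auto(iterable):
    """Hypothetical helper: re-iterate when possible, otherwise save elements."""
    if iter(iterable) is iterable:
        # One-shot iterator: materialize it once, as cycle(list(it)) would.
        iterable = list(iterable)
    while True:
        produced = False
        for element in iterable:
            produced = True
            yield element
        if not produced:
            return  # empty input: stop instead of looping forever


print(list(islice(cycle_auto(range(3)), 7)))        # [0, 1, 2, 0, 1, 2, 0]
print(list(islice(cycle_auto(iter(range(3))), 7)))  # [0, 1, 2, 0, 1, 2, 0]
print(list(cycle_auto([])))                         # []
```

The trade-off is that the caller no longer decides when memory is spent; a one-shot input silently costs a full copy.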

Benefit for deep learning

There are also cases where each call to iter() on an iterable returns a different iterator. A deep learning data loader is a perfect example: with shuffling enabled, every pass over it yields the samples in a new order:

# dl is a data loader that reshuffles on every pass
first = [(img, label) for img, label in dl]
second = [(img, label) for img, label in dl]
assert first[0] != second[0]  # usually true: each pass yields a new order
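To see why this matters for cycle, here is a sketch with a hypothetical stand-in, ShuffledLoader (not a real library class), whose __iter__ returns a freshly shuffled order each time. v1 iterates the loader once and replays that first order, while v2 asks the loader for a new iterator on every pass.

```python
import random
from itertools import islice


class ShuffledLoader:
    """Hypothetical stand-in for a data loader that reshuffles on every pass."""

    def __init__(self, data, seed=0):
        self.data = list(data)
        self.rng = random.Random(seed)

    def __iter__(self):
        order = self.data[:]
        self.rng.shuffle(order)  # a new order each time iter() is called
        return iter(order)


def cycle_v1(iterable):
    saved = []
    for element in iterable:
        yield element
        saved.append(element)
    while saved:
        for element in saved:
            yield element


def cycle_v2(iterable):
    while True:
        for element in iterable:
            yield element


n = 5
v1_out = list(islice(cycle_v1(ShuffledLoader(range(n))), 3 * n))
v2_out = list(islice(cycle_v2(ShuffledLoader(range(n))), 3 * n))

# Split each stream into three passes ("epochs") of n items.
v1_epochs = [v1_out[i:i + n] for i in range(0, 3 * n, n)]
v2_epochs = [v2_out[i:i + n] for i in range(0, 3 * n, n)]
print(v1_epochs)  # the same permutation three times
print(v2_epochs)  # typically three different permutations
```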

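What's more, in deep learning the whole dataset often does not fit in memory, but the data loader can be iterated as many times as needed, each time producing a fresh iterator. v1 would store every batch from the first pass and then replay that same order forever, while v2 simply iterates the loader again: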

# v1
cycle(dl)  # saves every batch from the first pass: runs out of memory

# v2
cycle(dl)  # re-iterates dl on each pass: no extra memory
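The memory difference can be measured with a sketch that uses the standard tracemalloc module to record peak allocations while consuming two passes of each version. Here range(n) stands in for the dataset (an assumption of this example); real batches would be far larger than small ints.

```python
import tracemalloc
from itertools import islice


def cycle_v1(iterable):
    saved = []
    for element in iterable:
        yield element
        saved.append(element)
    while saved:
        for element in saved:
            yield element


def cycle_v2(iterable):
    while True:
        for element in iterable:
            yield element


def peak_bytes(cycle_impl, n):
    """Peak traced allocation while consuming two passes over range(n)."""
    tracemalloc.start()
    for _ in islice(cycle_impl(range(n)), 2 * n):
        pass
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return peak


n = 200_000
peak_v1 = peak_bytes(cycle_v1, n)
peak_v2 = peak_bytes(cycle_v2, n)
print(f"v1 peak: {peak_v1:,} bytes")  # grows with n (the saved list and its items)
print(f"v2 peak: {peak_v2:,} bytes")  # stays small regardless of n
```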

So I hope we can adopt this more memory-efficient cycle :)

youkaichao requested a review from rhettinger as a code owner on Jan 1, 2020

the-knights-who-say-ni commented Jan 1, 2020

Hello, and thanks for your contribution!

I'm a bot set up to make sure that the project can legally accept this contribution by verifying everyone involved has signed the PSF contributor agreement (CLA).

Recognized GitHub username

We couldn't find a bugs.python.org (b.p.o) account corresponding to the following GitHub usernames:

@youkaichao

This might be simply due to a missing "GitHub Name" entry in one's b.p.o account settings. This is necessary for legal reasons before we can look at this contribution. Please follow the steps outlined in the CPython devguide to rectify this issue.

You can check yourself to see if the CLA has been received.

Thanks again for the contribution, we look forward to reviewing it!
