Skip to content

random_combination_with_replacement recipe has misleading docstring #102653

Closed
@pochmann

Description

@pochmann

Documentation

The random module has four recipes that are supposed to "efficiently make random selections from the combinatoric iterators in the itertools module". And their docstrings all say "Random selection from [iterator]". Both suggest they're equivalent to random.choice(list(iterator)), just efficiently.

For example, itertools.combinations_with_replacement([0, 1], r=4) produces these five combinations:

(0, 0, 0, 0)
(0, 0, 0, 1)
(0, 0, 1, 1)
(0, 1, 1, 1)
(1, 1, 1, 1)

So random.choice(list(iterator)) would return one of those five with 20% probability each.

But the random_combination_with_replacement recipe instead produces these probabilities:

(0, 0, 0, 0)  6.25%
(0, 0, 0, 1) 25.00%
(0, 0, 1, 1) 37.50%
(0, 1, 1, 1) 25.00%
(1, 1, 1, 1)  6.25%

Here's an implementation that is equivalent to random.choice(list(iterator)):

def random_combination_with_replacement(iterable, r):
    "Random selection from itertools.combinations_with_replacement(iterable, r)"
    pool = tuple(iterable)
    n = len(pool)
    indices = sorted(random.sample(range(n+r-1), k=r))
    return tuple(pool[i-j] for j, i in enumerate(indices))

One can view the combinations as the result of actually simulating r random draws with replacement, where the multiset {0,0,1,1} indeed occurs more often, namely as 0011, 0101, 0110, etc. But that is not the only valid view and isn't the view suggested by the documentation (as my first paragraph argued). Though if that view and the bias is the intention, then I suggest its documentation should mention the bias.

Test code

Attempt This Online!

import random
import itertools
from collections import Counter

iterable = [0, 1]
r = 4


#-- itertools ----------------------

print('itertools')
for comb in itertools.combinations_with_replacement(iterable, r):
    print(comb)


#-- from iterator ------------------

def random_combination_with_replacement_from_iterator(iterable, r):
    "Random selection from itertools.combinations_with_replacement(iterable, r)"
    iterator = itertools.combinations_with_replacement(iterable, r)
    return random.choice(list(iterator))


#-- current random recipe ----------

def random_combination_with_replacement(iterable, r):
    "Random selection from itertools.combinations_with_replacement(iterable, r)"
    pool = tuple(iterable)
    n = len(pool)
    indices = sorted(random.choices(range(n), k=r))
    return tuple(pool[i] for i in indices)


#-- proposed random recipe ---------

def random_combination_with_replacement_proposal(iterable, r):
    "Random selection from itertools.combinations_with_replacement(iterable, r)"
    pool = tuple(iterable)
    n = len(pool)
    indices = sorted(random.sample(range(n+r-1), k=r))
    return tuple(pool[i-j] for j, i in enumerate(indices))


#-- Comparison

for func in random_combination_with_replacement_from_iterator, random_combination_with_replacement, random_combination_with_replacement_proposal:
    print()
    print(func.__name__)
    N = 100000
    ctr = Counter(func(iterable, r) for _ in range(N))
    for comb, freq in sorted(ctr.items()):
        print(comb, f'{freq/N:6.2%}')
Test results
itertools
(0, 0, 0, 0)
(0, 0, 0, 1)
(0, 0, 1, 1)
(0, 1, 1, 1)
(1, 1, 1, 1)

random_combination_with_replacement_from_iterator
(0, 0, 0, 0) 19.89%
(0, 0, 0, 1) 20.08%
(0, 0, 1, 1) 20.01%
(0, 1, 1, 1) 19.88%
(1, 1, 1, 1) 20.14%

random_combination_with_replacement
(0, 0, 0, 0)  6.14%
(0, 0, 0, 1) 24.98%
(0, 0, 1, 1) 37.71%
(0, 1, 1, 1) 25.04%
(1, 1, 1, 1)  6.13%

random_combination_with_replacement_proposal
(0, 0, 0, 0) 20.17%
(0, 0, 0, 1) 19.82%
(0, 0, 1, 1) 20.18%
(0, 1, 1, 1) 19.88%
(1, 1, 1, 1) 19.95%

Linked PRs

Metadata

Metadata

Assignees

Labels

3.11only security fixes3.12only security fixesdocsDocumentation in the Doc dir

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions