Add optional weights to random.sample() #84749

rhettinger · 2020-05-08T19:56:48Z

BPO	40569
Nosy	@tim-one, @rhettinger, @mdickinson, @cryvate, @dhdavvie
PRs	#20014

^{Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.}

Show more details

GitHub fields:

assignee = None
closed_at = None
created_at = <Date 2020-05-08.19:56:47.608>
labels = ['type-feature', 'library', '3.9']
title = 'Add optional weights to random.sample()'
updated_at = <Date 2020-05-10.09:06:35.481>
user = 'https://github.com/rhettinger'

bugs.python.org fields:

activity = <Date 2020-05-10.09:06:35.481>
actor = 'dheiberg'
assignee = 'none'
closed = False
closed_date = None
closer = None
components = ['Library (Lib)']
creation = <Date 2020-05-08.19:56:47.608>
creator = 'rhettinger'
dependencies = []
files = []
hgrepos = []
issue_num = 40569
keywords = ['patch']
message_count = 5.0
messages = ['368458', '368559', '368562', '368565', '368573']
nosy_count = 5.0
nosy_names = ['tim.peters', 'rhettinger', 'mark.dickinson', 'cryvate', 'dheiberg']
pr_nums = ['20014']
priority = 'normal'
resolution = None
stage = 'patch review'
status = 'open'
superseder = None
type = 'enhancement'
url = 'https://bugs.python.org/issue40569'
versions = ['Python 3.9']

rhettinger · 2020-05-08T19:56:47Z

Weighted sampling without replacement isn't easy for a user to do with the current tooling. There is a StackOverflow question about how to do it. Also, this service is currently offered by numpy.

Use it like this:

    >>> sample(['katniss', 'prim', 'gale', 'peeta'] , weights=[20, 1, 42, 10], k=2)
    ['prim', 'peeta']

Here's an efficient implementation similar to how numpy does it:

--- a/Lib/random.py
+++ b/Lib/random.py
@@ -331,7 +331,7 @@ class Random(_random.Random):
j = _int(random() * (i+1))
x[i], x[j] = x[j], x[i]

def sample(self, population, k, *, counts=None):
+ def sample(self, population, k, *, counts=None, weights=None):
"""Chooses k unique random elements from a population sequence or set.

         Returns a new list containing elements from the population while
@@ -392,6 +392,18 @@ class Random(_random.Random):
         if not isinstance(population, _Sequence):
             raise TypeError("Population must be a sequence.  For dicts or sets, use sorted(d).")
         n = len(population)
+        if weights is not None:
+            if counts is not None:
+                raise TypeError('Cannot specify both counts and weights')
+            weights = list(weights)
+            positions = range(n)
+            indices = []
+            while (needed := k - len(indices)):
+                for i in choices(positions, weights, k=needed):
+                    if weights[i]:
+                        weights[i] = 0.0
+                        indices.append(i)
+            return [population[i] for i in indices]
         if counts is not None:
             cum_counts = list(_accumulate(counts))
             if len(cum_counts) != n:

dhdavvie · 2020-05-10T01:19:05Z

Just playing around with this and I think one thing to consider is how to handle negative weights. Should they even be allowed? One interesting behaviour I encountered with the current draft is the following:

    >>> r.sample(['katniss', 'prim', 'gale', 'peeta'] , weights=[-2,1,1,1], k=1)
    ['peeta']

This will always return ['peeta'], or whatever happens to be in the last index. I believe this is because in choices, the cum_weights list would be [-2, -1, 0, 1], causing it to only be able to select the last value in the population.

Looking into random.choices, it suggests that the weights were designed with negative ones in mind. However they were not accounted for and end up influencing the weights of the indexes around it. Another example is this:

    >>> r.choices(['alice', 'bob', 'camile', 'david'], weights=[1, 1, -2, 1])
    ['alice']

This will always return ['alice'], showing that the negative weight of 'camile' is affecting indexes around it.
Is there something I am missing? Otherwise this seems like it may warrant its own bug.

Cryvate · 2020-05-10T02:39:40Z

@rhettinger: I like this proposal.

@dheiberg: to me, it seems negative weights are explicitly not supported, from: https://docs.python.org/3/library/random.html#random.choices

"Weights are assumed to be non-negative."

Unless I am missing something. In my mind negative weights make no sense and the fact it produces garbage in your example is garbage-in-garbage-out.

Maybe an enhancement could be made (but it would be breaking for some abusing the behaviour you showed) for it to error (or do something sane? If such a thing exists) but that should be a separate issue.

rhettinger · 2020-05-10T03:41:56Z

Negative weights are undefined for choices() as well. Just like bisect() and merge() don't verify their inputs are sorted. Usually, full scan precondition checks are expensive in pure python and aren't worth penalizing correct code. As Henk-Jaap points-out, negative weights are just a special case of meaningless inputs.

dhdavvie · 2020-05-10T09:06:35Z

Ahh I see, thanks for clearing that up!

On Sun, May 10, 2020, 04:41 Raymond Hettinger <report@bugs.python.org>
wrote:

Raymond Hettinger <raymond.hettinger@gmail.com> added the comment:

Negative weights are undefined for choices() as well. Just like bisect()
and merge() don't verify their inputs are sorted. Usually, full scan
precondition checks are expensive in pure python and aren't worth
penalizing correct code. As Henk-Jaap points-out, negative weights are just
a special case of meaningless inputs.

----------

Python tracker <report@bugs.python.org>
<https://bugs.python.org/issue40569\>

rhettinger added 3.9 stdlib type-feature labels May 8, 2020

ezio-melotti transferred this issue from another repository Apr 10, 2022

python / cpython Public

Add optional weights to random.sample() #84749

Add optional weights to random.sample() #84749

rhettinger commented May 8, 2020

rhettinger commented May 8, 2020

dhdavvie mannequin commented May 10, 2020

Cryvate mannequin commented May 10, 2020

rhettinger commented May 10, 2020

dhdavvie mannequin commented May 10, 2020

python / cpython Public

Add optional weights to random.sample() #84749

Add optional weights to random.sample() #84749

Comments

rhettinger commented May 8, 2020

rhettinger commented May 8, 2020

dhdavvie mannequin commented May 10, 2020

Cryvate mannequin commented May 10, 2020

rhettinger commented May 10, 2020

dhdavvie mannequin commented May 10, 2020