New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add optional weights to random.sample() #84749
Comments
Weighted sampling without replacement isn't easy for a user to do with the current tooling. There is a StackOverflow question about how to do it. Also, this service is currently offered by numpy. Use it like this: >>> sample(['katniss', 'prim', 'gale', 'peeta'] , weights=[20, 1, 42, 10], k=2)
['prim', 'peeta'] Here's an efficient implementation similar to how numpy does it: --- a/Lib/random.py
Returns a new list containing elements from the population while
@@ -392,6 +392,18 @@ class Random(_random.Random):
if not isinstance(population, _Sequence):
raise TypeError("Population must be a sequence. For dicts or sets, use sorted(d).")
n = len(population)
+ if weights is not None:
+ if counts is not None:
+ raise TypeError('Cannot specify both counts and weights')
+ weights = list(weights)
+ positions = range(n)
+ indices = []
+ while (needed := k - len(indices)):
+ for i in choices(positions, weights, k=needed):
+ if weights[i]:
+ weights[i] = 0.0
+ indices.append(i)
+ return [population[i] for i in indices]
if counts is not None:
cum_counts = list(_accumulate(counts))
if len(cum_counts) != n: |
Just playing around with this and I think one thing to consider is how to handle negative weights. Should they even be allowed? One interesting behaviour I encountered with the current draft is the following: >>> r.sample(['katniss', 'prim', 'gale', 'peeta'] , weights=[-2,1,1,1], k=1)
['peeta'] This will always return ['peeta'], or whatever happens to be in the last index. I believe this is because in Looking into random.choices, it suggests that the weights were designed with negative ones in mind. However they were not accounted for and end up influencing the weights of the indexes around it. Another example is this: >>> r.choices(['alice', 'bob', 'camile', 'david'], weights=[1, 1, -2, 1])
['alice'] This will always return ['alice'], showing that the negative weight of 'camile' is affecting indexes around it. |
@rhettinger: I like this proposal. @dheiberg: to me, it seems negative weights are explicitly not supported, from: https://docs.python.org/3/library/random.html#random.choices "Weights are assumed to be non-negative." Unless I am missing something. In my mind negative weights make no sense and the fact it produces garbage in your example is garbage-in-garbage-out. Maybe an enhancement could be made (but it would be breaking for some abusing the behaviour you showed) for it to error (or do something sane? If such a thing exists) but that should be a separate issue. |
Negative weights are undefined for choices() as well. Just like bisect() and merge() don't verify their inputs are sorted. Usually, full scan precondition checks are expensive in pure python and aren't worth penalizing correct code. As Henk-Jaap points-out, negative weights are just a special case of meaningless inputs. |
Ahh I see, thanks for clearing that up! On Sun, May 10, 2020, 04:41 Raymond Hettinger <report@bugs.python.org>
|
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.
Show more details
GitHub fields:
bugs.python.org fields:
The text was updated successfully, but these errors were encountered: