Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add optional weights to random.sample() #84749

Open
rhettinger opened this issue May 8, 2020 · 5 comments
Open

Add optional weights to random.sample() #84749

rhettinger opened this issue May 8, 2020 · 5 comments
Labels
3.9 stdlib type-feature

Comments

@rhettinger
Copy link
Contributor

@rhettinger rhettinger commented May 8, 2020

BPO 40569
Nosy @tim-one, @rhettinger, @mdickinson, @cryvate, @dhdavvie
PRs
  • #20014
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = None
    closed_at = None
    created_at = <Date 2020-05-08.19:56:47.608>
    labels = ['type-feature', 'library', '3.9']
    title = 'Add optional weights to random.sample()'
    updated_at = <Date 2020-05-10.09:06:35.481>
    user = 'https://github.com/rhettinger'

    bugs.python.org fields:

    activity = <Date 2020-05-10.09:06:35.481>
    actor = 'dheiberg'
    assignee = 'none'
    closed = False
    closed_date = None
    closer = None
    components = ['Library (Lib)']
    creation = <Date 2020-05-08.19:56:47.608>
    creator = 'rhettinger'
    dependencies = []
    files = []
    hgrepos = []
    issue_num = 40569
    keywords = ['patch']
    message_count = 5.0
    messages = ['368458', '368559', '368562', '368565', '368573']
    nosy_count = 5.0
    nosy_names = ['tim.peters', 'rhettinger', 'mark.dickinson', 'cryvate', 'dheiberg']
    pr_nums = ['20014']
    priority = 'normal'
    resolution = None
    stage = 'patch review'
    status = 'open'
    superseder = None
    type = 'enhancement'
    url = 'https://bugs.python.org/issue40569'
    versions = ['Python 3.9']

    @rhettinger
    Copy link
    Contributor Author

    @rhettinger rhettinger commented May 8, 2020

    Weighted sampling without replacement isn't easy for a user to do with the current tooling. There is a StackOverflow question about how to do it. Also, this service is currently offered by numpy.

    Use it like this:

        >>> sample(['katniss', 'prim', 'gale', 'peeta'] , weights=[20, 1, 42, 10], k=2)
        ['prim', 'peeta']

    Here's an efficient implementation similar to how numpy does it:

    --- a/Lib/random.py
    +++ b/Lib/random.py
    @@ -331,7 +331,7 @@ class Random(_random.Random):
    j = _int(random() * (i+1))
    x[i], x[j] = x[j], x[i]

    • def sample(self, population, k, *, counts=None):
      + def sample(self, population, k, *, counts=None, weights=None):
      """Chooses k unique random elements from a population sequence or set.
             Returns a new list containing elements from the population while
    @@ -392,6 +392,18 @@ class Random(_random.Random):
             if not isinstance(population, _Sequence):
                 raise TypeError("Population must be a sequence.  For dicts or sets, use sorted(d).")
             n = len(population)
    +        if weights is not None:
    +            if counts is not None:
    +                raise TypeError('Cannot specify both counts and weights')
    +            weights = list(weights)
    +            positions = range(n)
    +            indices = []
    +            while (needed := k - len(indices)):
    +                for i in choices(positions, weights, k=needed):
    +                    if weights[i]:
    +                        weights[i] = 0.0
    +                        indices.append(i)
    +            return [population[i] for i in indices]
             if counts is not None:
                 cum_counts = list(_accumulate(counts))
                 if len(cum_counts) != n:

    @rhettinger rhettinger added 3.9 stdlib type-feature labels May 8, 2020
    @dhdavvie
    Copy link
    Mannequin

    @dhdavvie dhdavvie mannequin commented May 10, 2020

    Just playing around with this and I think one thing to consider is how to handle negative weights. Should they even be allowed? One interesting behaviour I encountered with the current draft is the following:

        >>> r.sample(['katniss', 'prim', 'gale', 'peeta'] , weights=[-2,1,1,1], k=1)
        ['peeta']

    This will always return ['peeta'], or whatever happens to be in the last index. I believe this is because in choices, the cum_weights list would be [-2, -1, 0, 1], causing it to only be able to select the last value in the population.

    Looking into random.choices, it suggests that the weights were designed with negative ones in mind. However they were not accounted for and end up influencing the weights of the indexes around it. Another example is this:

        >>> r.choices(['alice', 'bob', 'camile', 'david'], weights=[1, 1, -2, 1])
        ['alice']

    This will always return ['alice'], showing that the negative weight of 'camile' is affecting indexes around it.
    Is there something I am missing? Otherwise this seems like it may warrant its own bug.

    @Cryvate
    Copy link
    Mannequin

    @Cryvate Cryvate mannequin commented May 10, 2020

    @rhettinger: I like this proposal.

    @dheiberg: to me, it seems negative weights are explicitly not supported, from: https://docs.python.org/3/library/random.html#random.choices

    "Weights are assumed to be non-negative."

    Unless I am missing something. In my mind negative weights make no sense and the fact it produces garbage in your example is garbage-in-garbage-out.

    Maybe an enhancement could be made (but it would be breaking for some abusing the behaviour you showed) for it to error (or do something sane? If such a thing exists) but that should be a separate issue.

    @rhettinger
    Copy link
    Contributor Author

    @rhettinger rhettinger commented May 10, 2020

    Negative weights are undefined for choices() as well. Just like bisect() and merge() don't verify their inputs are sorted. Usually, full scan precondition checks are expensive in pure python and aren't worth penalizing correct code. As Henk-Jaap points-out, negative weights are just a special case of meaningless inputs.

    @dhdavvie
    Copy link
    Mannequin

    @dhdavvie dhdavvie mannequin commented May 10, 2020

    Ahh I see, thanks for clearing that up!

    On Sun, May 10, 2020, 04:41 Raymond Hettinger <report@bugs.python.org>
    wrote:

    Raymond Hettinger <raymond.hettinger@gmail.com> added the comment:

    Negative weights are undefined for choices() as well. Just like bisect()
    and merge() don't verify their inputs are sorted. Usually, full scan
    precondition checks are expensive in pure python and aren't worth
    penalizing correct code. As Henk-Jaap points-out, negative weights are just
    a special case of meaningless inputs.

    ----------


    Python tracker <report@bugs.python.org>
    <https://bugs.python.org/issue40569\>


    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    3.9 stdlib type-feature
    Projects
    None yet
    Development

    No branches or pull requests

    1 participant