[3.8] bpo-36018: Add another example for NormalDist() (GH-18191) #18192

Merged 1 commit on Jan 26, 2020
36 changes: 36 additions & 0 deletions Doc/library/statistics.rst
@@ -772,6 +772,42 @@ Carlo simulation <https://en.wikipedia.org/wiki/Monte_Carlo_method>`_:
>>> quantiles(map(model, X, Y, Z)) # doctest: +SKIP
[1.4591308524824727, 1.8035946855390597, 2.175091447274739]

Normal distributions can be used to approximate `Binomial
distributions <http://mathworld.wolfram.com/BinomialDistribution.html>`_
when the sample size is large and when the probability of a successful
trial is near 50%.
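
As an aside not taken from the patch itself: a common textbook rule of thumb
for "large enough" is that both ``n*p`` and ``n*(1 - p)`` are at least about
10 (some texts use 5). A minimal check, anticipating the conference numbers
used in the example below:

    >>> n, p = 750, 0.65          # values from the example below
    >>> n * p >= 10 and n * (1 - p) >= 10
    True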

For example, an open source conference has 750 attendees and two rooms with a
500 person capacity. There is a talk about Python and another about Ruby.
In previous conferences, 65% of the attendees preferred to listen to Python
talks. Assuming the population preferences haven't changed, what is the
probability that the rooms will stay within their capacity limits?

.. doctest::

>>> n = 750 # Sample size
>>> p = 0.65 # Preference for Python
>>> q = 1.0 - p # Preference for Ruby
>>> k = 500 # Room capacity

>>> # Approximation using the cumulative normal distribution
>>> from math import sqrt
>>> round(NormalDist(mu=n*p, sigma=sqrt(n*p*q)).cdf(k + 0.5), 4)
0.8402

>>> # Solution using the cumulative binomial distribution
>>> from math import comb, fsum
>>> round(fsum(comb(n, r) * p**r * q**(n-r) for r in range(k+1)), 4)
0.8402

>>> # Approximation using a simulation
>>> from random import seed, choices
>>> seed(8675309)
>>> def trial():
... return choices(('Python', 'Ruby'), (p, q), k=n).count('Python')
>>> mean(trial() <= k for i in range(10_000))
0.8398
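
The ``k + 0.5`` in the ``NormalDist`` line above is a continuity correction,
used because a discrete binomial distribution is being approximated by a
continuous normal curve. As a sketch only (the helper name
``binomial_cdf_approx`` is made up here and is not part of the patch or of the
``statistics`` module), the approximation can be packaged for reuse:

    from math import sqrt
    from statistics import NormalDist

    def binomial_cdf_approx(k, n, p):
        # Approximate P(X <= k) for X ~ Binomial(n, p) with a normal curve,
        # applying the same 0.5 continuity correction as the example above.
        mu = n * p
        sigma = sqrt(n * p * (1 - p))
        return NormalDist(mu=mu, sigma=sigma).cdf(k + 0.5)

    # For the conference example, binomial_cdf_approx(500, 750, 0.65)
    # returns roughly 0.8402, matching the figures above.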

Normal distributions commonly arise in machine learning problems.

Wikipedia has a `nice example of a Naive Bayesian Classifier