Description
The correctness of the Fraction.limit_denominator
method is not immediately obvious, and extracting a proof of correctness from external sources is nontrivial: Wikipedia has statements without proofs at https://en.wikipedia.org/wiki/Continued_fraction#Best_rational_approximations (and at the time of writing those statements aren't 100% correct), and a typical elementary number theory book will have you wade through a whole chapter of theory on continued fractions to get to something useful. In spite of that, a targeted correctness proof for limit_denominator
is straightforward, and doesn't require developing the whole theory of continued fractions (or even mentioning them).
For future maintainers of Fraction.limit_denominator
, it would be useful to have the correctness proof recorded somewhere: when I recently had to update the limit_denominator
code in #93730, I had to stare at the code for a good hour to figure out why it worked at all, despite the fact that I wrote that code in the first place (albeit a good number of years ago). Moreover, the new tie-break condition, while more efficient than the old one, is also less obvious.
I'm not sure what the best place to record the proof is: the fractions.py
source? A text file alongside the source? For now, I'll record the proof in this issue, which is at least better than nothing.
Note on originality: the proof below is my own, and not a transcription from another source. I haven't seen any proof along these lines in the literature, but this is an extremely well-traveled road, so it's very likely that there's something closely resembling it somewhere. References would be welcome.
Set-up: simplifying the code
We'll prove correctness for a slightly simpler version of the code than what's currently in the fractions
module. Here's that simpler version:
def limit_denominator(m: int, n: int, l: int) -> tuple[int, int]:
"""
Given a fraction m/n and a positive integer l, return integers r and s such
that r/s is the closest fraction to m/n with denominator bounded by l.
m/n need not be in lowest terms, but n must be positive.
On return, 0 < s ≤ l and gcd(r, s) = 1.
"""
a, b, p, q, r, s, v = n, m % n, 1, 0, m // n, 1, 1
while 0 < b and q + a // b * s <= l:
a, b, p, q, r, s, v = b, a % b, r, s, p + a // b * r, q + a // b * s, -v
t, u = p + (l - q) // s * r, q + (l - q) // s * s
return (r, s) if 2 * b * u <= n else (t, u)
The differences with the existing code are mostly cosmetic: there are some de-optimizations (like computing a//b
several times); we deal purely with integers to avoid the Fraction
unpacking and repacking; some variables are renamed, and we've removed the fast path where a Fraction
is returned unaltered if its denominator is already smaller than l
. The most significant difference is that we've unrolled the first loop-and-a-half of the while loop, with the side-effect of shifting the loop exit condition to the top of the loop. There's also an extra variable v
, which alternates between values 1
and -1
. It's not needed for computing the final result, but we'll be making use of it in the statements and proofs below. Finally, the original version of the code doesn't check that b
is positive before entering the loop - that check proves to be unnecessary if the fast path is present.
Proof overview
In a nutshell, the main loop is a variant of the extended Euclidean algorithm applied to
In more detail, we'll show in the sections below that after exiting the while loop and defining
$(ts - ru)v = 1$ $0 < s \le l$ $0 < u \le l$ $l < s + u$ -
$r/s \le m/n < t/u$ if$v=1$ , and$t/u < m/n \le r/s$ if$v = -1$
We now show that any fraction
Now
So in either case,
So either
Note that it follows from limit_denominator
code makes use of this to avoid doing extra unnecessary gcd
computations when creating the returned Fraction
instance.
Details: loop invariants
The initial state, and the state after every iteration of the while loop, satisfies the following six invariants:
$(ps - rq)v = 1$ $ar + bp = m$ $as + bq = n$ $0 \le b < a$ $0 \le q \le s \le l$ $0 < s$
These are all easily proved by induction: they hold for the initial state by inspection, and it's easy to see that the way that the variables are updated within the while loop keeps all of these properties invariant. Note that at the end of an iteration of the while loop, the condition
It also follows from these loop invariants that
To see these, simply expand the values of
Details: loop termination
For a complete proof of correctness we need to establish that our while
loop is not infinite - that it terminates after a finite number of iterations. But this follows immediately from the observation that the value of
Details: after the loop
Write
and
and
and finally,
By definition of floor,
We also note that since we've exited the loop at this point, either
Furthermore, since
Moreover, the above two inequalities also hold in the case
We've now established almost all the facts that were used in the proof overview above. The only thing that we're missing is that
Tie-breaking, part I
By the time we get to the final line of limit_denominator
, we know that either
Adding
which is the check that we use in the code.
In the case that limit_denominator
implementation claim that the bound with smaller denominator is returned in that case: i.e., that
Tie breaking, part II
There's a second part to the tie-breaking comment in the limit_denominator
source, that in the case of a tie, if both denominators are equal, then the lower bound is returned. (This can only happen in extremely limited circumstances, namely when
First note that in general it's true that at any point before, during, or after the loop, we have:
Since
Now in a tie-break situation we have
Now suppose that the denominators of our two candidates are equal:
But an examination of the while loop shows that the only time
Optimization: getting rid of the $0 < b$ check
Suppose that on entry to the limit_denominator
function, we already know that
So with