Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CVE-2020-10735: Prevent DoS by large int<->str conversions #95778

Open
gpshead opened this issue Aug 8, 2022 · 13 comments
Open

CVE-2020-10735: Prevent DoS by large int<->str conversions #95778

gpshead opened this issue Aug 8, 2022 · 13 comments
Assignees
Labels
3.7 3.8 3.9 3.10 3.11 3.12 type-bug An unexpected behavior, bug, or error type-feature A feature request or enhancement type-security A security issue

Comments

@gpshead
Copy link
Member

gpshead commented Aug 8, 2022

Problem

A Denial Of Service (DoS) issue was identified in CPython because we use binary bignum’s for our int implementation. A huge integer will always consume a near-quadratic amount of CPU time in conversion to or from a base 10 (decimal) string with a large number of digits. No efficient algorithm exists to do otherwise.

It is quite common for Python code implementing network protocols and data serialization to do int(untrusted_string_or_bytes_value) on input to get a numeric value, without having limited the input length or to do log("processing thing id %s", unknowingly_huge_integer) or any similar concept to convert an int to a string without first checking its magnitude. (http, json, xmlrpc, logging, loading large values into integer via linear-time conversions such as hexadecimal stored in yaml, or anything computing larger values based on user controlled inputs… which then wind up attempting to output as decimal later on). All of these can suffer a CPU consuming DoS in the face of untrusted data.

Everyone auditing all existing code for this, adding length guards, and maintaining that practice everywhere is not feasible nor is it what we deem the vast majority of our users want to do.

This issue has been reported to the Python Security Response Team multiple times by a few different people since early 2020, most recently a few weeks ago while I was in the middle of polishing up the PR so it’d be ready before 3.11.0rc2.

Mitigation

After discussion on the Python Security Response Team mailing list the conclusion was that we needed to limit the size of integer to string conversions for non-linear time conversions (anything not a power-of-2 base) by default. And offer the ability to configure or disable this limit.

The Python Steering Council is aware of this change and accepts it as necessary.

@gpshead gpshead added type-bug An unexpected behavior, bug, or error type-security A security issue 3.11 3.10 3.9 3.8 3.7 3.12 labels Aug 8, 2022
@gpshead gpshead self-assigned this Aug 8, 2022
@gpshead gpshead changed the title Placeholder issue for a specific security fix CVE-2020-10735: Prevent DoS by large int<->str conversions Sep 2, 2022
@gpshead gpshead added the type-feature A feature request or enhancement label Sep 2, 2022
gpshead added a commit that referenced this issue Sep 2, 2022
Integer to and from text conversions via CPython's bignum `int` type is not safe against denial of service attacks due to malicious input. Very large input strings with hundred thousands of digits can consume several CPU seconds.

This PR comes fresh from a pile of work done in our private PSRT security response team repo.

Signed-off-by: Christian Heimes [Red Hat] <christian@python.org>
Tons-of-polishing-up-by: Gregory P. Smith [Google] <greg@krypto.org>
Reviews via the private PSRT repo via many others (see the NEWS entry in the PR).

<!-- gh-issue-number: gh-95778 -->
* Issue: gh-95778
<!-- /gh-issue-number -->

I wrote up [a one pager for the release managers](https://docs.google.com/document/d/1KjuF_aXlzPUxTK4BMgezGJ2Pn7uevfX7g0_mvgHlL7Y/edit#). Much of that text wound up in the Issue. Backports PRs already exist. See the issue for links.
gpshead added a commit that referenced this issue Sep 2, 2022
)

Integer to and from text conversions via CPython's bignum `int` type is not safe against denial of service attacks due to malicious input. Very large input strings with hundred thousands of digits can consume several CPU seconds.

This PR comes fresh from a pile of work done in our private PSRT security response team repo.

This backports #96499 aka 511ca94

Signed-off-by: Christian Heimes [Red Hat] <christian@python.org>
Tons-of-polishing-up-by: Gregory P. Smith [Google] <greg@krypto.org>
Reviews via the private PSRT repo via many others (see the NEWS entry in the PR).

<!-- gh-issue-number: gh-95778 -->
* Issue: gh-95778
<!-- /gh-issue-number -->

I wrote up [a one pager for the release managers](https://docs.google.com/document/d/1KjuF_aXlzPUxTK4BMgezGJ2Pn7uevfX7g0_mvgHlL7Y/edit#).
gpshead added a commit that referenced this issue Sep 2, 2022
)

Integer to and from text conversions via CPython's bignum `int` type is not safe against denial of service attacks due to malicious input. Very large input strings with hundred thousands of digits can consume several CPU seconds.

This PR comes fresh from a pile of work done in our private PSRT security response team repo.

This backports #96499 aka 511ca94

Signed-off-by: Christian Heimes [Red Hat] <christian@python.org>
Tons-of-polishing-up-by: Gregory P. Smith [Google] <greg@krypto.org>
Reviews via the private PSRT repo via many others (see the NEWS entry in the PR).

<!-- gh-issue-number: gh-95778 -->
* Issue: gh-95778
<!-- /gh-issue-number -->

I wrote up [a one pager for the release managers](https://docs.google.com/document/d/1KjuF_aXlzPUxTK4BMgezGJ2Pn7uevfX7g0_mvgHlL7Y/edit#).
@mgorny
Copy link
Contributor

mgorny commented Sep 3, 2022

Thanks a lot for doing this, and especially for the backports to older versions. It's really nice I don't have to figure them all out myself ;-).

@mdickinson
Copy link
Member

mdickinson commented Sep 3, 2022

As commented on the PR, the current code in main doesn't prevent the potential DOS for the int to str direction: the size check is only performed after the quadratic-time algorithm has run. For example:

Python 3.12.0a0 (heads/main-dirty:ac4ddab405, Sep  3 2022, 16:35:07) [Clang 13.1.6 (clang-1316.0.21.2.5)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> n = 10**(10**7)  # takes a few seconds to run 
>>> s = str(n)  # expect instant ValueError, but takes many minutes

One possible fix would be to keep the existing post-loop check (so that we get the benefit of knowing exactly what the threshold is), but also adding a cruder pre-check that would catch the majority of bad cases before the loop is entered.

I think we want something like the following as a pre-check

    if (size_a >= 10 * _PY_LONG_MAX_STR_DIGITS_THRESHOLD / (3 * PyLong_SHIFT) + 2) {
        PyInterpreterState *interp = _PyInterpreterState_GET();
        int max_str_digits = interp->int_max_str_digits;
        if ((max_str_digits > 0) && (size_a >= 10 * max_str_digits / (3 * PyLong_SHIFT) + 2)) {
            <do error raising here>
        }
    }

Here 10 * _PY_LONG_MAX_STR_DIGITS_THRESHOLD / (3 * PyLong_SHIFT) + 2 is a constant, so in normal use we get an extra predictable(?) never-taken branch.

For the logic used in computing the constant: 10/3 is an upper bound for log10(2), so

  • if size_a >= 10 * _PY_LONG_MAX_STR_DIGITS_THRESHOLD / (3 * PyLong_SHIFT) + 2
  • then (size_a - 1) * (3 * PyLong_SHIFT) >= 10 * _PY_LONG_MAX_STR_DIGITS_THRESHOLD
  • so (size_a - 1) * PyLong_SHIFT >= log10(2) * _PY_LONG_MAX_STR_DIGITS_THRESHOLD
  • so abs(a) >= PyLong_BASE ** (size_a - 1) >= 10**_PY_LONG_MAX_STR_DIGITS_THRESHOLD

and similar logic applies for the size_a > 10 * max_str_digits / (3 * PyLong_SHIFT) + 2) expression within the loop.

One complication: we can no longer produce an error message that says exactly how many decimal digits there were, but I think that's going to be true no matter how we fix this.

@mdickinson
Copy link
Member

mdickinson commented Sep 3, 2022

I'll see if I can turn the above into a PR.

@mdickinson
Copy link
Member

mdickinson commented Sep 3, 2022

BTW, did the PR authors consider using type Py_ssize_t instead of int for the new int_max_str_digits field? The max value for int could potentially be as low as 32767, which could be quite limiting. Either Py_ssize_t or a fixed-size type like int32_t or int64_t might work better.

@merwok
Copy link
Member

merwok commented Sep 3, 2022

-X options and envvars are documented in 4 or 5 spots, I think a couple were missed by the PR.
I’ll make a PR to fix this little issue.

@gpshead
Copy link
Member Author

gpshead commented Sep 4, 2022

BTW, did the PR authors consider using type Py_ssize_t instead of int for the new int_max_str_digits field? The max value for int could potentially be as low as 32767, which could be quite limiting. Either Py_ssize_t or a fixed-size type like int32_t or int64_t might work better.

I did consider Py_ssize_t instead of int, but I'm unaware of any platform supported by CPython with a 16-bit int.

@mdickinson
Copy link
Member

mdickinson commented Sep 4, 2022

I did consider Py_ssize_t instead of int, but I'm unaware of any platform supported by CPython with a 16-bit int.

Okay. The rest of longobject.c tries hard not to make any assumptions about sizes of platform integers (beyond those guaranteed by the C standard), and it's possible that future platforms that we care about could have a 16-bit int. If we want to go the way of requiring that int has at least 32 bits, I think that should be a configure-time check, and should be clearly documented. (But it would be much less effort to use a different type instead of int here.)

gpshead added a commit that referenced this issue Sep 4, 2022
Per mdickinson@'s comment on the main branch PR.
miss-islington pushed a commit to miss-islington/cpython that referenced this issue Sep 4, 2022
…ythonGH-96553)

Per mdickinson@'s comment on the main branch PR.
(cherry picked from commit 69bb83c)

Co-authored-by: Gregory P. Smith <greg@krypto.org>
miss-islington pushed a commit to miss-islington/cpython that referenced this issue Sep 4, 2022
…ythonGH-96553)

Per mdickinson@'s comment on the main branch PR.
(cherry picked from commit 69bb83c)

Co-authored-by: Gregory P. Smith <greg@krypto.org>
miss-islington added a commit that referenced this issue Sep 4, 2022
Per mdickinson@'s comment on the main branch PR.
(cherry picked from commit 69bb83c)

Co-authored-by: Gregory P. Smith <greg@krypto.org>
miss-islington added a commit that referenced this issue Sep 4, 2022
Per mdickinson@'s comment on the main branch PR.
(cherry picked from commit 69bb83c)

Co-authored-by: Gregory P. Smith <greg@krypto.org>
@gpshead
Copy link
Member Author

gpshead commented Sep 4, 2022

it's possible that future platforms that we care about could have a 16-bit int.

No platform that wants to run real world existing C code would ever make such a decision.

The limit still works even if constrained to 32767 digits by a wacky 1980s worthy platform. Though some unit tests may assume higher limits are possible there is no point in changing those in any way until someone is actually offering up such a platform with a buildbot and credible desire for support.

@mdickinson
Copy link
Member

mdickinson commented Sep 4, 2022

Okay. I still think int is the wrong type here - is there any really good reason not to use Py_ssize_t? It's the obvious type for something that represents a count (of digits, bits, limbs, ...).

@gpshead
Copy link
Member Author

gpshead commented Sep 4, 2022

The computation for a billion digit conversion would never complete so a full Py_ssize_t isn't needed. Regardless, int32_t is pedantically correct so I've added switching to that as a main cleanup to do in #96512.

@mdickinson
Copy link
Member

mdickinson commented Sep 4, 2022

Yep, that works. That then makes it possible to tighten the check so that there are fewer false negatives - we can make 10 * int_max_str_digits safe from overflow by casting one of the multiplicands to int64_t first. (We couldn't do that with plain old int because of problems in the other direction: int could potentially be 64 bits.)

@mdickinson
Copy link
Member

mdickinson commented Sep 4, 2022

The computation for a billion digit conversion would never complete so a full Py_ssize_t isn't needed.

With current algorithms, that's true. But if some version of Tim's suggestions in #90716 went in, billion digit conversions might become feasible. :-) But at that point, maybe we'd be able to drop these limits entirely?

@mdickinson
Copy link
Member

mdickinson commented Sep 4, 2022

That then makes it possible to tighten the check [...]

To be clear, I can look at this later, for 3.12 only.

gpshead added a commit that referenced this issue Sep 4, 2022
Converting a large enough `int` to a decimal string raises `ValueError` as expected. However, the raise comes _after_ the quadratic-time base-conversion algorithm has run to completion. For effective DOS prevention, we need some kind of check before entering the quadratic-time loop. Oops! =)

The quick fix: essentially we catch _most_ values that exceed the threshold up front. Those that slip through will still be on the small side (read: sufficiently fast), and will get caught by the existing check so that the limit remains exact.

The justification for the current check. The C code check is:
```c
max_str_digits / (3 * PyLong_SHIFT) <= (size_a - 11) / 10
```

In GitHub markdown math-speak, writing $M$ for `max_str_digits`, $L$ for `PyLong_SHIFT` and $s$ for `size_a`, that check is:
$$\left\lfloor\frac{M}{3L}\right\rfloor \le \left\lfloor\frac{s - 11}{10}\right\rfloor$$

From this it follows that
$$\frac{M}{3L} < \frac{s-1}{10}$$
hence that
$$\frac{L(s-1)}{M} > \frac{10}{3} > \log_2(10).$$
So
$$2^{L(s-1)} > 10^M.$$
But our input integer $a$ satisfies $|a| \ge 2^{L(s-1)}$, so $|a|$ is larger than $10^M$. This shows that we don't accidentally capture anything _below_ the intended limit in the check.

<!-- gh-issue-number: gh-95778 -->
* Issue: gh-95778
<!-- /gh-issue-number -->

Co-authored-by: Gregory P. Smith [Google LLC] <greg@krypto.org>
miss-islington pushed a commit to miss-islington/cpython that referenced this issue Sep 4, 2022
…GH-96537)

Converting a large enough `int` to a decimal string raises `ValueError` as expected. However, the raise comes _after_ the quadratic-time base-conversion algorithm has run to completion. For effective DOS prevention, we need some kind of check before entering the quadratic-time loop. Oops! =)

The quick fix: essentially we catch _most_ values that exceed the threshold up front. Those that slip through will still be on the small side (read: sufficiently fast), and will get caught by the existing check so that the limit remains exact.

The justification for the current check. The C code check is:
```c
max_str_digits / (3 * PyLong_SHIFT) <= (size_a - 11) / 10
```

In GitHub markdown math-speak, writing $M$ for `max_str_digits`, $L$ for `PyLong_SHIFT` and $s$ for `size_a`, that check is:
$$\left\lfloor\frac{M}{3L}\right\rfloor \le \left\lfloor\frac{s - 11}{10}\right\rfloor$$

From this it follows that
$$\frac{M}{3L} < \frac{s-1}{10}$$
hence that
$$\frac{L(s-1)}{M} > \frac{10}{3} > \log_2(10).$$
So
$$2^{L(s-1)} > 10^M.$$
But our input integer $a$ satisfies $|a| \ge 2^{L(s-1)}$, so $|a|$ is larger than $10^M$. This shows that we don't accidentally capture anything _below_ the intended limit in the check.

<!-- gh-issue-number: pythongh-95778 -->
* Issue: pythongh-95778
<!-- /gh-issue-number -->

Co-authored-by: Gregory P. Smith [Google LLC] <greg@krypto.org>
(cherry picked from commit b126196838bbaf5f4d35120e0e6bcde435b0b480)

Co-authored-by: Mark Dickinson <dickinsm@gmail.com>
gpshead pushed a commit to gpshead/cpython that referenced this issue Sep 4, 2022
…ythonGH-96537)

Converting a large enough `int` to a decimal string raises `ValueError` as expected. However, the raise comes _after_ the quadratic-time base-conversion algorithm has run to completion. For effective DOS prevention, we need some kind of check before entering the quadratic-time loop. Oops! =)

The quick fix: essentially we catch _most_ values that exceed the threshold up front. Those that slip through will still be on the small side (read: sufficiently fast), and will get caught by the existing check so that the limit remains exact.

The justification for the current check. The C code check is:
```c
max_str_digits / (3 * PyLong_SHIFT) <= (size_a - 11) / 10
```

In GitHub markdown math-speak, writing $M$ for `max_str_digits`, $L$ for `PyLong_SHIFT` and $s$ for `size_a`, that check is:
$$\left\lfloor\frac{M}{3L}\right\rfloor \le \left\lfloor\frac{s - 11}{10}\right\rfloor$$

From this it follows that
$$\frac{M}{3L} < \frac{s-1}{10}$$
hence that
$$\frac{L(s-1)}{M} > \frac{10}{3} > \log_2(10).$$
So
$$2^{L(s-1)} > 10^M.$$
But our input integer $a$ satisfies $|a| \ge 2^{L(s-1)}$, so $|a|$ is larger than $10^M$. This shows that we don't accidentally capture anything _below_ the intended limit in the check.

<!-- gh-issue-number: pythongh-95778 -->
* Issue: pythongh-95778
<!-- /gh-issue-number -->

Co-authored-by: Gregory P. Smith [Google LLC] <greg@krypto.org>
(cherry picked from commit b126196)

Co-authored-by: Mark Dickinson <dickinsm@gmail.com>
gpshead added a commit to gpshead/cpython that referenced this issue Sep 4, 2022
…#96537)

Converting a large enough `int` to a decimal string raises `ValueError` as expected. However, the raise comes _after_ the quadratic-time base-conversion algorithm has run to completion. For effective DOS prevention, we need some kind of check before entering the quadratic-time loop. Oops! =)

The quick fix: essentially we catch _most_ values that exceed the threshold up front. Those that slip through will still be on the small side (read: sufficiently fast), and will get caught by the existing check so that the limit remains exact.

The justification for the current check. The C code check is:
```c
max_str_digits / (3 * PyLong_SHIFT) <= (size_a - 11) / 10
```

In GitHub markdown math-speak, writing $M$ for `max_str_digits`, $L$ for `PyLong_SHIFT` and $s$ for `size_a`, that check is:
$$\left\lfloor\frac{M}{3L}\right\rfloor \le \left\lfloor\frac{s - 11}{10}\right\rfloor$$

From this it follows that
$$\frac{M}{3L} < \frac{s-1}{10}$$
hence that
$$\frac{L(s-1)}{M} > \frac{10}{3} > \log_2(10).$$
So
$$2^{L(s-1)} > 10^M.$$
But our input integer $a$ satisfies $|a| \ge 2^{L(s-1)}$, so $|a|$ is larger than $10^M$. This shows that we don't accidentally capture anything _below_ the intended limit in the check.

<!-- gh-issue-number: pythongh-95778 -->
* Issue: pythongh-95778
<!-- /gh-issue-number -->

Co-authored-by: Gregory P. Smith [Google LLC] <greg@krypto.org>
miss-islington added a commit that referenced this issue Sep 4, 2022
Converting a large enough `int` to a decimal string raises `ValueError` as expected. However, the raise comes _after_ the quadratic-time base-conversion algorithm has run to completion. For effective DOS prevention, we need some kind of check before entering the quadratic-time loop. Oops! =)

The quick fix: essentially we catch _most_ values that exceed the threshold up front. Those that slip through will still be on the small side (read: sufficiently fast), and will get caught by the existing check so that the limit remains exact.

The justification for the current check. The C code check is:
```c
max_str_digits / (3 * PyLong_SHIFT) <= (size_a - 11) / 10
```

In GitHub markdown math-speak, writing $M$ for `max_str_digits`, $L$ for `PyLong_SHIFT` and $s$ for `size_a`, that check is:
$$\left\lfloor\frac{M}{3L}\right\rfloor \le \left\lfloor\frac{s - 11}{10}\right\rfloor$$

From this it follows that
$$\frac{M}{3L} < \frac{s-1}{10}$$
hence that
$$\frac{L(s-1)}{M} > \frac{10}{3} > \log_2(10).$$
So
$$2^{L(s-1)} > 10^M.$$
But our input integer $a$ satisfies $|a| \ge 2^{L(s-1)}$, so $|a|$ is larger than $10^M$. This shows that we don't accidentally capture anything _below_ the intended limit in the check.

<!-- gh-issue-number: gh-95778 -->
* Issue: gh-95778
<!-- /gh-issue-number -->

Co-authored-by: Gregory P. Smith [Google LLC] <greg@krypto.org>
(cherry picked from commit b126196)

Co-authored-by: Mark Dickinson <dickinsm@gmail.com>
gpshead added a commit to gpshead/cpython that referenced this issue Sep 4, 2022
…#96537)

Converting a large enough `int` to a decimal string raises `ValueError` as expected. However, the raise comes _after_ the quadratic-time base-conversion algorithm has run to completion. For effective DOS prevention, we need some kind of check before entering the quadratic-time loop. Oops! =)

The quick fix: essentially we catch _most_ values that exceed the threshold up front. Those that slip through will still be on the small side (read: sufficiently fast), and will get caught by the existing check so that the limit remains exact.

The justification for the current check. The C code check is:
```c
max_str_digits / (3 * PyLong_SHIFT) <= (size_a - 11) / 10
```

In GitHub markdown math-speak, writing $M$ for `max_str_digits`, $L$ for `PyLong_SHIFT` and $s$ for `size_a`, that check is:
$$\left\lfloor\frac{M}{3L}\right\rfloor \le \left\lfloor\frac{s - 11}{10}\right\rfloor$$

From this it follows that
$$\frac{M}{3L} < \frac{s-1}{10}$$
hence that
$$\frac{L(s-1)}{M} > \frac{10}{3} > \log_2(10).$$
So
$$2^{L(s-1)} > 10^M.$$
But our input integer $a$ satisfies $|a| \ge 2^{L(s-1)}$, so $|a|$ is larger than $10^M$. This shows that we don't accidentally capture anything _below_ the intended limit in the check.

<!-- gh-issue-number: pythongh-95778 -->
* Issue: pythongh-95778
<!-- /gh-issue-number -->

Co-authored-by: Gregory P. Smith [Google LLC] <greg@krypto.org>
gpshead added a commit to gpshead/cpython that referenced this issue Sep 4, 2022
…#96537)

Converting a large enough `int` to a decimal string raises `ValueError` as expected. However, the raise comes _after_ the quadratic-time base-conversion algorithm has run to completion. For effective DOS prevention, we need some kind of check before entering the quadratic-time loop. Oops! =)

The quick fix: essentially we catch _most_ values that exceed the threshold up front. Those that slip through will still be on the small side (read: sufficiently fast), and will get caught by the existing check so that the limit remains exact.

The justification for the current check. The C code check is:
```c
max_str_digits / (3 * PyLong_SHIFT) <= (size_a - 11) / 10
```

In GitHub markdown math-speak, writing $M$ for `max_str_digits`, $L$ for `PyLong_SHIFT` and $s$ for `size_a`, that check is:
$$\left\lfloor\frac{M}{3L}\right\rfloor \le \left\lfloor\frac{s - 11}{10}\right\rfloor$$

From this it follows that
$$\frac{M}{3L} < \frac{s-1}{10}$$
hence that
$$\frac{L(s-1)}{M} > \frac{10}{3} > \log_2(10).$$
So
$$2^{L(s-1)} > 10^M.$$
But our input integer $a$ satisfies $|a| \ge 2^{L(s-1)}$, so $|a|$ is larger than $10^M$. This shows that we don't accidentally capture anything _below_ the intended limit in the check.

<!-- gh-issue-number: pythongh-95778 -->
* Issue: pythongh-95778
<!-- /gh-issue-number -->

Co-authored-by: Gregory P. Smith [Google LLC] <greg@krypto.org>
gpshead added a commit that referenced this issue Sep 4, 2022
) (#96563)

Converting a large enough `int` to a decimal string raises `ValueError` as expected. However, the raise comes _after_ the quadratic-time base-conversion algorithm has run to completion. For effective DOS prevention, we need some kind of check before entering the quadratic-time loop. Oops! =)

The quick fix: essentially we catch _most_ values that exceed the threshold up front. Those that slip through will still be on the small side (read: sufficiently fast), and will get caught by the existing check so that the limit remains exact.

The justification for the current check. The C code check is:
```c
max_str_digits / (3 * PyLong_SHIFT) <= (size_a - 11) / 10
```

In GitHub markdown math-speak, writing $M$ for `max_str_digits`, $L$ for `PyLong_SHIFT` and $s$ for `size_a`, that check is:
$$\left\lfloor\frac{M}{3L}\right\rfloor \le \left\lfloor\frac{s - 11}{10}\right\rfloor$$

From this it follows that
$$\frac{M}{3L} < \frac{s-1}{10}$$
hence that
$$\frac{L(s-1)}{M} > \frac{10}{3} > \log_2(10).$$
So
$$2^{L(s-1)} > 10^M.$$
But our input integer $a$ satisfies $|a| \ge 2^{L(s-1)}$, so $|a|$ is larger than $10^M$. This shows that we don't accidentally capture anything _below_ the intended limit in the check.

<!-- gh-issue-number: gh-95778 -->
* Issue: gh-95778
<!-- /gh-issue-number -->

Co-authored-by: Gregory P. Smith [Google LLC] <greg@krypto.org>
(cherry picked from commit b126196)

Co-authored-by: Mark Dickinson <dickinsm@gmail.com>
ambv pushed a commit that referenced this issue Sep 5, 2022
* Correctly pre-check for int-to-str conversion (#96537)

Converting a large enough `int` to a decimal string raises `ValueError` as expected. However, the raise comes _after_ the quadratic-time base-conversion algorithm has run to completion. For effective DOS prevention, we need some kind of check before entering the quadratic-time loop. Oops! =)

The quick fix: essentially we catch _most_ values that exceed the threshold up front. Those that slip through will still be on the small side (read: sufficiently fast), and will get caught by the existing check so that the limit remains exact.

The justification for the current check. The C code check is:
```c
max_str_digits / (3 * PyLong_SHIFT) <= (size_a - 11) / 10
```

In GitHub markdown math-speak, writing $M$ for `max_str_digits`, $L$ for `PyLong_SHIFT` and $s$ for `size_a`, that check is:
$$\left\lfloor\frac{M}{3L}\right\rfloor \le \left\lfloor\frac{s - 11}{10}\right\rfloor$$

From this it follows that
$$\frac{M}{3L} < \frac{s-1}{10}$$
hence that
$$\frac{L(s-1)}{M} > \frac{10}{3} > \log_2(10).$$
So
$$2^{L(s-1)} > 10^M.$$
But our input integer $a$ satisfies $|a| \ge 2^{L(s-1)}$, so $|a|$ is larger than $10^M$. This shows that we don't accidentally capture anything _below_ the intended limit in the check.

<!-- gh-issue-number: gh-95778 -->
* Issue: gh-95778
<!-- /gh-issue-number -->

Co-authored-by: Gregory P. Smith [Google LLC] <greg@krypto.org>
Co-authored-by: Christian Heimes <christian@python.org>
Co-authored-by: Mark Dickinson <dickinsm@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
3.7 3.8 3.9 3.10 3.11 3.12 type-bug An unexpected behavior, bug, or error type-feature A feature request or enhancement type-security A security issue
Projects
None yet
Development

No branches or pull requests

5 participants