Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

gh-95778: CVE-2020-10735: Prevent DoS by very large int() #96499

Merged
merged 45 commits into from Sep 2, 2022

Conversation

gpshead
Copy link
Member

@gpshead gpshead commented Sep 2, 2022

Integer to and from text conversions via CPython's bignum int type is not safe against denial of service attacks due to malicious input. Very large input strings with hundred thousands of digits can consume several CPU seconds.

This PR comes fresh from a pile of work done in our private PSRT security response team repo.

Signed-off-by: Christian Heimes [Red Hat] christian@python.org
Tons-of-polishing-up-by: Gregory P. Smith [Google] greg@krypto.org
Reviews via the private PSRT repo via many others (see the NEWS entry in the PR).

I wrote up a one pager for the release managers. Much of that text wound up in the Issue. Backports PRs already exist. See the issue for links.

remaining TODOs (aka project management)

@tiran
Copy link
Member

tiran commented Sep 2, 2022

Closing and re-opening ticket. CI isn't starting.

@tiran tiran closed this Sep 2, 2022
@tiran tiran reopened this Sep 2, 2022
tiran and others added 18 commits Sep 2, 2022
The text to int parser of Python's int type is not safe against
malicious input. Very large input strings with hundred thousands of
digits can consume several seconds.

The int() now limit the maximum amount of an input string to
5,000 digits.

For comparison total amount of protons in the observable universe is
known as Eddington number. That number has 80 digits.

Signed-off-by: Christian Heimes <christian@python.org>
…e json module.

"It's advised to limit the input to a sensible length." isn't very helpful for the JSON module as, while technically true, the limit needed to avoid hitting things like the int<->str base 10 conversion this issue is about would make a significant percentage of actual JSON used in the real world impractical.

we're limiting the int/str conversion length, that is the best that can be done here - unless someone wants to implement a feature request for JSON to reject int looking fields at a much smaller base10 length limit.
The lumps way too many changes together in one commit, but extricating
them into a series of individual pieces doesn't seem worthwhile at this
stage, it could be done by picking and choosing into a new branch if
there is a reason to do so.  Summarizing off the top of my head after
rereading diffs too many times:

- Renamed the environment variable, -X flag, and names of the new APIs.
- Cleaned up the documentation.
- Refactored and improved some tests.
- Moved the definitions of the default and threshold to a non-public
  header.
- Set the default to 2000 instead of disabled as the Steering Council
  agreed having a default limit made the most sense for our users.
- Lowered the minimum threshold because there seems no harm in allowing
  people to have a lower limit.
- Fixed the other-bases base-10 digits equivalent estimate to be
  consistent with the actual potential base-10 value instead of wildly
  off. Aimed for consistency of number size here rather than any attempt
  to match the performance between bases as that is easier for users to
  understand and doesn't change if our algorithm implementations change.
- Left a couple of notes about what changes look good or not good for a
  security fix backported into a stable release.  It appears the PyConfig
  struct is a public API so we cannot change its definition mid-release.
  Backporting is going to involve something outside of the normal code
  path wise to plumb the new value through.  Ugh.

Questions:
 * [ ] `test_embed` is failing in an odd manner... what's up with that?
 * [ ] Is `_pydecimal` important?  This change makes some of its APIs
   not work based on the values being used.  The unittest suite that
   tests both the C and Python Decimal implementations revealed this
   and needed a higher limit set.  If it relies on ints for arbitrary
   precision Decimal numbers, limits are frequently going to make it
   practically unusuable.
 * [ ] Backporting... we should produce what we believe a 3.11/3.10/3.9
   backport will look like in its own PR branch before making a final
   decision. The most annoying difficulty being the `struct PyConfig`
   issue.
 * [ ] I pondered having values >0 but <threshold just silently be
   rounded up to the threshold value instead of causing an error at
   Python startup time, but we do have other environment variables and
   flags that cause startup time errors based on their value so it
   doesn't seem to matter either way.  If we did that, the setter
   should also do that instead of raising.
Renamed to `int_max_str_digits` and simplified the logic per @Y1hgs's
comments on the earlier revision.  Less code, less awkward, and simpler
to explain.

Underscores and the sign are uncounted because that makes for the
easiest implementation.
Playing it safe, if this lands in 3.11 before 3.11.0 these can be
updated to 3.11.
@gpshead gpshead requested a review from tiran Sep 2, 2022
@gpshead gpshead changed the title [gh-95778] CVE-2020-10735: Prevent DoS by very large int() gh-95778: CVE-2020-10735: Prevent DoS by very large int() Sep 2, 2022
@gpshead gpshead marked this pull request as ready for review Sep 2, 2022
@tiran tiran added the 🔨 test-with-buildbots Test the PR with the buildbot fleet and report in the status section label Sep 2, 2022
@bedevere-bot
Copy link

bedevere-bot commented Sep 2, 2022

🤖 New build scheduled with the buildbot fleet by @tiran for commit 0b91f65 🤖

If you want to schedule another build, you need to add the "🔨 test-with-buildbots" label again.

@bedevere-bot bedevere-bot removed the 🔨 test-with-buildbots Test the PR with the buildbot fleet and report in the status section label Sep 2, 2022
zooba
zooba approved these changes Sep 2, 2022
Copy link
Member

@zooba zooba left a comment

Approved with typo fix (as pointed out by Nathaniel on one of the backports)

Doc/library/stdtypes.rst Outdated Show resolved Hide resolved
Doc/library/stdtypes.rst Outdated Show resolved Hide resolved
Doc/library/stdtypes.rst Outdated Show resolved Hide resolved
Doc/whatsnew/3.12.rst Outdated Show resolved Hide resolved
Lib/test/test_ast.py Outdated Show resolved Hide resolved
Lib/test/test_compile.py Outdated Show resolved Hide resolved
Parser/pegen.c Outdated Show resolved Hide resolved
Doc/library/stdtypes.rst Outdated Show resolved Hide resolved
tiran and others added 2 commits Sep 2, 2022
Co-authored-by: Steve Dower <steve.dower@microsoft.com>
Co-authored-by: Steve Dower <steve.dower@microsoft.com>
@gpshead
Copy link
Member Author

gpshead commented Sep 2, 2022

LOL at least I managed a consistent misspelling?

@gpshead gpshead merged commit 511ca94 into python:main Sep 2, 2022
14 checks passed
gpshead added a commit that referenced this issue Sep 2, 2022
)

Integer to and from text conversions via CPython's bignum `int` type is not safe against denial of service attacks due to malicious input. Very large input strings with hundred thousands of digits can consume several CPU seconds.

This PR comes fresh from a pile of work done in our private PSRT security response team repo.

This backports #96499 aka 511ca94

Signed-off-by: Christian Heimes [Red Hat] <christian@python.org>
Tons-of-polishing-up-by: Gregory P. Smith [Google] <greg@krypto.org>
Reviews via the private PSRT repo via many others (see the NEWS entry in the PR).

<!-- gh-issue-number: gh-95778 -->
* Issue: gh-95778
<!-- /gh-issue-number -->

I wrote up [a one pager for the release managers](https://docs.google.com/document/d/1KjuF_aXlzPUxTK4BMgezGJ2Pn7uevfX7g0_mvgHlL7Y/edit#).
gpshead added a commit that referenced this issue Sep 2, 2022
)

Integer to and from text conversions via CPython's bignum `int` type is not safe against denial of service attacks due to malicious input. Very large input strings with hundred thousands of digits can consume several CPU seconds.

This PR comes fresh from a pile of work done in our private PSRT security response team repo.

This backports #96499 aka 511ca94

Signed-off-by: Christian Heimes [Red Hat] <christian@python.org>
Tons-of-polishing-up-by: Gregory P. Smith [Google] <greg@krypto.org>
Reviews via the private PSRT repo via many others (see the NEWS entry in the PR).

<!-- gh-issue-number: gh-95778 -->
* Issue: gh-95778
<!-- /gh-issue-number -->

I wrote up [a one pager for the release managers](https://docs.google.com/document/d/1KjuF_aXlzPUxTK4BMgezGJ2Pn7uevfX7g0_mvgHlL7Y/edit#).
Objects/longobject.c Show resolved Hide resolved
Copy link

@light-bringer light-bringer left a comment

good changes

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

8 participants