Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

gh-90716: Refactor PyLong_FromString to separate concerns #96808

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

oscarbenjamin
Copy link

@oscarbenjamin oscarbenjamin commented Sep 13, 2022

This is a preliminary PR to refactor PyLong_FromString which is currently quite messy and has spaghetti like code that mixes up different concerns as well as duplicating logic.

In particular:

  • PyLong_FromString now only handles sign, base and prefix detection and calls a new function long_from_string_base to parse the main body of the string.
  • The long_from_string_base function handles all string validation and then calls long_from_binary_base or a new function long_from_non_binary_base to construct the actual PyLong.
  • The existing long_from_binary_base function is simplified by factoring duplicated logic to long_from_string_base.
  • The new function long_from_non_binary_base factors out much of the code from PyLong_FromString including in particular the quadratic algorithm reffered to in CVE-2020-10735: Prevent DoS by large int<->str conversions #95778 so that this can be seen separately from unrelated concerns such as string validation.

I intend to follow up on this with a PR to improve the algorithm used for decimal and other non binary bases but I think that would be a lot easier to do after this refactoring. I could also submit that algorithm in the same PR but I thought it would be easier to review this refactoring separately from a change of algorithm.

@bedevere-bot
Copy link

bedevere-bot commented Sep 13, 2022

Most changes to Python require a NEWS entry.

Please add it using the blurb_it web app or the blurb command-line tool.

@cpython-cla-bot
Copy link

cpython-cla-bot bot commented Sep 13, 2022

All commit authors signed the Contributor License Agreement.
CLA signed

@bedevere-bot
Copy link

bedevere-bot commented Sep 13, 2022

Most changes to Python require a NEWS entry.

Please add it using the blurb_it web app or the blurb command-line tool.

@gpshead
Copy link
Member

gpshead commented Sep 14, 2022

Please create a new github feature request issue to track this work.

@gpshead gpshead changed the title gh-95778: Refactor PyLong_FromString to separate concerns Refactor PyLong_FromString to separate concerns Sep 14, 2022
@gpshead gpshead requested review from mdickinson and tim-one Sep 14, 2022
@gpshead gpshead added the interpreter-core Interpreter core (Objects, Python, Grammar, and Parser dirs) label Sep 14, 2022
@oscarbenjamin oscarbenjamin changed the title Refactor PyLong_FromString to separate concerns gh-96812: Refactor PyLong_FromString to separate concerns Sep 14, 2022
@oscarbenjamin
Copy link
Author

oscarbenjamin commented Sep 14, 2022

Please create a new github feature request issue to track this work.

I don't really understand this workflow but does #96812 work for this?

@gpshead gpshead changed the title gh-96812: Refactor PyLong_FromString to separate concerns gh-90716: Refactor PyLong_FromString to separate concerns Sep 14, 2022
@gpshead
Copy link
Member

gpshead commented Sep 14, 2022

yep, that makes sense, though we've already got an issue for that - edited/redirected to it. :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
awaiting review interpreter-core Interpreter core (Objects, Python, Grammar, and Parser dirs)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants