gh-85287: Convert UnicodeError to UnicodeEncodeError| UnicodeDecodeError #21165

Closed
wants to merge 4 commits into from

Conversation

srinivasreddy
Contributor

@srinivasreddy srinivasreddy commented Jun 26, 2020

@@ -1,4 +1,4 @@
-""" Codec for the Punicode encoding, as specified in RFC 3492
+""" Codec for the Punycode encoding, as specified in RFC 3492
Contributor

good catch

Contributor

@eamanu eamanu left a comment

LGTM

@srinivasreddy
Contributor Author

@pitrou Please review!

@doerwalter
Contributor

The patch gets the offsets in the exception object wrong in all non-trivial cases.

For example the following case:

>>> s = 'foo.' + 60*'\xff'
>>> s.encode('idna')

gives the stack trace:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/walter/checkouts/cpython-doerwalter/Lib/encodings/idna.py", line 187, in encode
    result.extend(ToASCII(label))
  File "/Users/walter/checkouts/cpython-doerwalter/Lib/encodings/idna.py", line 105, in ToASCII
    raise UnicodeEncodeError("punycode", label.decode("punycode"), 0,
UnicodeEncodeError: 'punycode' codec can't encode characters in position 0-62: label too long

However, the bad characters are not s[0:63] but s[4:64].

Fixing this would require tracking the correct offset across multiple function calls. I'm not sure whether that added complexity is justified. idna enforces strict encoding anyway.
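
For illustration, here is a minimal sketch of what tracking the offset across calls could look like. It is not the code in Lib/encodings/idna.py: `encode_label` and `encode_name` are hypothetical names, the per-label step skips nameprep, labels are split on "." only, and the 63-byte limit is assumed from the "label too long" error above.

```python
MAX_LABEL_LEN = 63  # assumed limit on the length of one encoded label


def encode_label(label):
    """Hypothetical, simplified stand-in for a per-label encoder.

    It only knows about the label, so any positions it reports are
    relative to the label, not to the full name.
    """
    if label.isascii():
        encoded = label.encode("ascii")
    else:
        # The stdlib "punycode" codec does the actual conversion.
        encoded = b"xn--" + label.encode("punycode")
    if len(encoded) > MAX_LABEL_LEN:
        raise UnicodeEncodeError("idna", label, 0, len(label), "label too long")
    return encoded


def encode_name(name):
    """Encode a dotted name, reporting error positions relative to `name`."""
    result = []
    offset = 0  # index in `name` where the current label starts
    for label in name.split("."):
        try:
            result.append(encode_label(label))
        except UnicodeEncodeError as exc:
            # Shift the label-relative positions so they index into `name`.
            raise UnicodeEncodeError(
                exc.encoding, name, offset + exc.start, offset + exc.end, exc.reason
            ) from None
        offset += len(label) + 1  # the label plus the dot that followed it
    return b".".join(result)
```

Under these assumptions, calling `encode_name` on the string `s` from the example raises an exception whose `start` and `end` are 4 and 64, which matches `s[4:64]`. The cost is that every layer between the caller and the per-label check has to carry or re-apply the offset, which is the added complexity in question.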

@arhadthedev arhadthedev changed the title from "bpo-41115: Convert UnicodeError to UnicodeEncodeError| UnicodeDecodeError" to "gh-85287: Convert UnicodeError to UnicodeEncodeError| UnicodeDecodeError" Apr 3, 2023
@arhadthedev
Member

A review by @doerwalter has gone unaddressed for three years, and the OP blocked their branch from external pushes, preventing us from applying the necessary fixes ourselves.

@arhadthedev arhadthedev closed this Apr 3, 2023

7 participants