gh-85287: Convert UnicodeError to UnicodeEncodeError| UnicodeDecodeError #21165

srinivasreddy · 2020-06-26T06:00:18Z

https://bugs.python.org/issue41115

Issue: Codecs should raise precise UnicodeDecodeError or UnicodeEncodeError #85287

…rror in idna.py, utf_16.py, utf_32.py, punycode.py, undefined.py modules.
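For context, a minimal sketch of what a precise exception gives the caller, using only the built-in `ascii` codec (none of the modules touched by this PR are involved). `UnicodeEncodeError` is a subclass of `UnicodeError` and reports which slice of the input failed:

```python
# Built-in behaviour only; the string and codec are chosen for illustration.
text = "caf\xe9"

try:
    text.encode("ascii")
except UnicodeEncodeError as exc:
    # The precise subclass records the codec name and the failing span.
    details = (exc.encoding, exc.start, exc.end, exc.object[exc.start:exc.end])

print(details)
print(issubclass(UnicodeEncodeError, UnicodeError))
```

Existing `except UnicodeError` handlers keep working, since the more specific classes derive from it.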

eamanu · 2020-06-26T16:29:11Z

Lib/encodings/punycode.py

@@ -1,4 +1,4 @@
-""" Codec for the Punicode encoding, as specified in RFC 3492
+""" Codec for the Punycode encoding, as specified in RFC 3492


eamanu

LGTM

srinivasreddy · 2020-06-29T09:48:29Z

@pitrou Please review!

doerwalter · 2020-06-29T16:57:10Z

The patch gets the offsets in the exception object wrong in all non-trivial cases.

For example the following case:

>>> s = 'foo.' + 60*'\xff'
>>> s.encode('idna')

gives the stack trace:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/walter/checkouts/cpython-doerwalter/Lib/encodings/idna.py", line 187, in encode
    result.extend(ToASCII(label))
  File "/Users/walter/checkouts/cpython-doerwalter/Lib/encodings/idna.py", line 105, in ToASCII
    raise UnicodeEncodeError("punycode", label.decode("punycode"), 0,
UnicodeEncodeError: 'punycode' codec can't encode characters in position 0-62: label too long

However, the bad characters are not s[0:63] but s[4:64].

Fixing this would require tracking the correct offset across multiple function calls. I'm not sure whether that added complexity is justified. idna enforces strict encoding anyway.
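A rough sketch of what that offset tracking could look like, for illustration only. The helpers `to_ascii_label` and `encode_name` are hypothetical and are not the functions in `Lib/encodings/idna.py`: they skip nameprep, split on `"."` only, and ignore trailing dots. The per-label step raises with label-relative positions, and the caller shifts them by the label's offset within the full name:

```python
def to_ascii_label(label):
    """Hypothetical, simplified stand-in for the per-label ToASCII step."""
    try:
        encoded = label.encode("ascii")
    except UnicodeEncodeError:
        encoded = b"xn--" + label.encode("punycode")
    if len(encoded) >= 64:
        # Positions here are relative to the label, not the whole name.
        raise UnicodeEncodeError("idna", label, 0, len(label), "label too long")
    return encoded


def encode_name(name):
    """Hypothetical caller that converts label offsets to name offsets."""
    parts = []
    offset = 0  # index of the current label's first character in `name`
    for label in name.split("."):
        try:
            parts.append(to_ascii_label(label))
        except UnicodeEncodeError as exc:
            # Re-raise against the full input with shifted positions.
            raise UnicodeEncodeError(
                "idna", name, offset + exc.start, offset + exc.end, exc.reason
            ) from None
        offset += len(label) + 1  # +1 skips the "." separator
    return b".".join(parts)


s = "foo." + 60 * "\xff"
try:
    encode_name(s)
except UnicodeEncodeError as exc:
    span = (exc.start, exc.end)

print(span)
```

With the input from the example above, the reported span covers `s[4:64]` rather than `s[0:63]`. The cost is the extra bookkeeping in every caller, which is the complexity the comment questions.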

arhadthedev · 2023-04-03T04:27:18Z

A review by @doerwalter has not been addressed for three years, and the OP blocked external pushes to their branch, which prevents us from applying the necessary fixes ourselves.

the-knights-who-say-ni added the CLA signed label Jun 26, 2020

bedevere-bot added the awaiting review label Jun 26, 2020

srinivasreddy added 4 commits June 26, 2020 15:12

bpo-41115: Convert UnicodeError to UnicodeEncodeError| UnicodeDecodeE…

6816eb3

…rror in idna.py, utf_16.py, utf_32.py, punycode.py, undefined.py modules.

bpo-41115: Convert label to str(label)

0d24207

bpo-41115: Fix build failures in idna.py

dd44d59

bpo-41115:Fix doc failure

64778d2

srinivasreddy mentioned this pull request Jun 26, 2020

bpo-41115: Modified src to raise rather Unicode{Encode, Decode}Error rather than plain UnicodeError #21170

Closed

eamanu reviewed Jun 26, 2020


eamanu approved these changes Jun 26, 2020


bedevere-bot added awaiting core review and removed awaiting review labels Jun 26, 2020

pitrou mentioned this pull request Apr 10, 2022

Codecs should raise precise UnicodeDecodeError or UnicodeEncodeError #85287

Closed

ezio-melotti removed the CLA signed label Jul 13, 2022

arhadthedev changed the title ~~bpo-41115: Convert UnicodeError to UnicodeEncodeError| UnicodeDecodeError~~ gh-85287: Convert UnicodeError to UnicodeEncodeError| UnicodeDecodeError Apr 3, 2023

arhadthedev closed this Apr 3, 2023
