bpo-24214: Fixed the UTF-8 and UTF-16 incremental decoders. #14304

Merged

serhiy-storchaka
Member

@serhiy-storchaka serhiy-storchaka commented Jun 22, 2019

  • The UTF-8 incremental decoder now fails fast if it encounters
    a sequence that can't be handled by the error handler.
  • The UTF-16 incremental decoder with the surrogatepass error
    handler now decodes a lone low surrogate with final=False.
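
The two behaviors listed above can be illustrated with a short sketch. It assumes an interpreter that includes this fix, and the helper name `feed_chunks` is hypothetical, introduced only for the illustration; it is not part of the PR.

```python
import codecs


def feed_chunks(encoding, chunks, errors="strict"):
    """Feed chunks to an incremental decoder, always with final=False.

    Hypothetical helper for illustration. Returns a pair
    (decoded_pieces, raised), where raised is True if a chunk was
    rejected with UnicodeDecodeError.
    """
    decoder = codecs.getincrementaldecoder(encoding)(errors)
    pieces = []
    for chunk in chunks:
        try:
            pieces.append(decoder.decode(chunk, False))
        except UnicodeDecodeError:
            return pieces, True
    return pieces, False


# A truncated but still valid UTF-8 prefix is buffered until more data arrives.
print(ascii(feed_chunks("utf-8", [b"\xe2\x82", b"\xac"])))

# 0xF1 followed by 0xF6 can never become a valid sequence, so the strict
# handler is expected to raise right away instead of waiting for final=True.
print(ascii(feed_chunks("utf-8", [b"f\xf1\xf6rd"])))

# A lone low surrogate (U+DC02) in UTF-16-LE, split across two chunks, is
# expected to be passed through by surrogatepass without final=True.
print(ascii(feed_chunks("utf-16-le", [b"\x02", b"\xdc"], errors="surrogatepass")))
```

`ascii()` is used for printing because a lone surrogate cannot be written to most output streams directly.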

https://bugs.python.org/issue24214

@tirkarthi
Member

tirkarthi commented Jun 22, 2019

Is there a test case in test_codecs.py, similar to the one in the wsproto project's tests, that checks for UnicodeDecodeError? With this PR, the test below raises UnicodeDecodeError, as in the older behavior, whereas it returns 'f' on master.

from codecs import getincrementaldecoder
decoder = getincrementaldecoder("utf-8")()
print(decoder.decode(b'f\xf1\xf6rd', False))
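
A regression test for the snippet above might look roughly like the following. This is only a sketch: the class and method names are hypothetical and it is not the test that was added to test_codecs.py.

```python
import codecs
import unittest


class IncrementalUTF8ErrorSketch(unittest.TestCase):
    # Hypothetical names; not the test from test_codecs.py.
    def test_invalid_sequence_fails_with_final_false(self):
        decoder = codecs.getincrementaldecoder("utf-8")()
        # b'\xf1' starts a 4-byte sequence, but b'\xf6' is not a valid
        # continuation byte, so the strict handler should raise even
        # though final=False.
        with self.assertRaises(UnicodeDecodeError):
            decoder.decode(b"f\xf1\xf6rd", False)
```

Saved in a module, it can be run with `python -m unittest`.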

@serhiy-storchaka
Member Author

serhiy-storchaka commented Jun 23, 2019

I was not sure that we should guarantee this behavior, but the new tests helped to narrow the fix.

@miss-islington
Contributor

miss-islington commented Jun 25, 2019

Thanks @serhiy-storchaka for the PR 🌮🎉.. I'm working now to backport this PR to: 3.7, 3.8.
🐍🍒🤖

@serhiy-storchaka serhiy-storchaka deleted the utf8-utf16-incremental-decoder branch Jun 25, 2019
miss-islington pushed a commit to miss-islington/cpython that referenced this pull request Jun 25, 2019
…-14304)

(cherry picked from commit 894263b)

Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>
@bedevere-bot

bedevere-bot commented Jun 25, 2019

GH-14368 is a backport of this pull request to the 3.8 branch.

@bedevere-bot

bedevere-bot commented Jun 25, 2019

GH-14369 is a backport of this pull request to the 3.7 branch.

miss-islington pushed a commit to miss-islington/cpython that referenced this pull request Jun 25, 2019
…-14304)

(cherry picked from commit 894263b)

Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>
miss-islington added a commit that referenced this pull request Jun 25, 2019
(cherry picked from commit 894263b)

Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>
vstinner pushed a commit that referenced this pull request Jun 25, 2019
…-14304) (GH-14369)

* bpo-24214: Fixed the UTF-8 and UTF-16 incremental decoders. (GH-14304)

(cherry picked from commit 894263b)

Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>
ned-deily pushed a commit to ned-deily/cpython that referenced this pull request Jul 2, 2019
…thonGH-14304) (pythonGH-14369)

* bpo-24214: Fixed the UTF-8 and UTF-16 incremental decoders. (pythonGH-14304)

(cherry picked from commit 894263b)

Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>
lisroach pushed a commit to lisroach/cpython that referenced this pull request Sep 10, 2019
…-14304)

DinoV pushed a commit to DinoV/cpython that referenced this pull request Jan 14, 2020
…-14304)

Labels
type-bug An unexpected behavior, bug, or error

6 participants