New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
bpo-41894: Fix UTF-8 errors loading native module #22466
Conversation
@aeros, why the "skip news" label? This is clearly a bug fix and those usually entail NEWS entries. |
I added a NEWS entry. |
Misc/NEWS.d/next/Core and Builtins/2020-10-02-11-35-33.bpo-41894.ffmtOt.rst
Outdated
Show resolved
Hide resolved
When running in a non-UTF-8 locale, if an error occurs while importing a native Python module (say because a dependent share library is missing), the error message string returned may contain non-ASCII code points causing a UnicodeDecodeError. PyUnicode_DecodeFSDefault is used for buffers which may contain filesystem paths. For consistency with os.strerror(), PyUnicode_DecodeLocale is used for buffers which contain system error messages. While the shortname parameter is always encoded in ASCII according to PEP 489, it is left decoded using PyUnicode_FromString to minimize the changes and since it should not affect the decoding (albeit _potentially_ slower). In dynload_hpux, since the error buffer contains a message generated from a static ASCII string and the module filesystem path, PyUnicode_DecodeFSDefault is used instead of PyUnicode_DecodeLocale as is used elsewhere.
For both dynload_aix and dynload_hpux, properly handle the possibility that decoding strings may return NULL and when such an error happens, properly decrement any previously decoded strings and return early. In addition, in dynload_aix, ensure that we pass the decoded string *object* pathname_ob to PyErr_SetImportError instead of the original pathname buffer. Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>
Ok, I believe all suggestions have been addressed. Wherever a filesystem path has been used, I have switched it to use PyUnicode_DecodeFSDefault and for any system error messages I have used PyUnicode_DecodeLocale. While PEP 489 dictates that the function name will always be ASCII, I have not switched to using PyUnicode_DecodeASCII since it does not solve any problem (other than being potentially less efficient), thus reducing the size of the changes. |
…GH-22466) When running in a non-UTF-8 locale, if an error occurs while importing a native Python module (say because a dependent share library is missing), the error message string returned may contain non-ASCII code points causing a UnicodeDecodeError. PyUnicode_DecodeFSDefault is used for buffers which may contain filesystem paths. For consistency with os.strerror(), PyUnicode_DecodeLocale is used for buffers which contain system error messages. While the shortname parameter is always encoded in ASCII according to PEP 489, it is left decoded using PyUnicode_FromString to minimize the changes and since it should not affect the decoding (albeit _potentially_ slower). In dynload_hpux, since the error buffer contains a message generated from a static ASCII string and the module filesystem path, PyUnicode_DecodeFSDefault is used instead of PyUnicode_DecodeLocale as is used elsewhere. * bpo-41894: Fix bugs in dynload error msg handling For both dynload_aix and dynload_hpux, properly handle the possibility that decoding strings may return NULL and when such an error happens, properly decrement any previously decoded strings and return early. In addition, in dynload_aix, ensure that we pass the decoded string *object* pathname_ob to PyErr_SetImportError instead of the original pathname buffer. Co-authored-by: Serhiy Storchaka <storchaka@gmail.com> (cherry picked from commit 2d2af32) Co-authored-by: Kevin Adler <kadler@us.ibm.com>
GH-22704 is a backport of this pull request to the 3.9 branch. |
…GH-22466) When running in a non-UTF-8 locale, if an error occurs while importing a native Python module (say because a dependent share library is missing), the error message string returned may contain non-ASCII code points causing a UnicodeDecodeError. PyUnicode_DecodeFSDefault is used for buffers which may contain filesystem paths. For consistency with os.strerror(), PyUnicode_DecodeLocale is used for buffers which contain system error messages. While the shortname parameter is always encoded in ASCII according to PEP 489, it is left decoded using PyUnicode_FromString to minimize the changes and since it should not affect the decoding (albeit _potentially_ slower). In dynload_hpux, since the error buffer contains a message generated from a static ASCII string and the module filesystem path, PyUnicode_DecodeFSDefault is used instead of PyUnicode_DecodeLocale as is used elsewhere. * bpo-41894: Fix bugs in dynload error msg handling For both dynload_aix and dynload_hpux, properly handle the possibility that decoding strings may return NULL and when such an error happens, properly decrement any previously decoded strings and return early. In addition, in dynload_aix, ensure that we pass the decoded string *object* pathname_ob to PyErr_SetImportError instead of the original pathname buffer. Co-authored-by: Serhiy Storchaka <storchaka@gmail.com> (cherry picked from commit 2d2af32) Co-authored-by: Kevin Adler <kadler@us.ibm.com>
GH-22705 is a backport of this pull request to the 3.8 branch. |
When running in a non-UTF-8 locale, if an error occurs while importing a native Python module (say because a dependent share library is missing), the error message string returned may contain non-ASCII code points causing a UnicodeDecodeError. PyUnicode_DecodeFSDefault is used for buffers which may contain filesystem paths. For consistency with os.strerror(), PyUnicode_DecodeLocale is used for buffers which contain system error messages. While the shortname parameter is always encoded in ASCII according to PEP 489, it is left decoded using PyUnicode_FromString to minimize the changes and since it should not affect the decoding (albeit _potentially_ slower). In dynload_hpux, since the error buffer contains a message generated from a static ASCII string and the module filesystem path, PyUnicode_DecodeFSDefault is used instead of PyUnicode_DecodeLocale as is used elsewhere. * bpo-41894: Fix bugs in dynload error msg handling For both dynload_aix and dynload_hpux, properly handle the possibility that decoding strings may return NULL and when such an error happens, properly decrement any previously decoded strings and return early. In addition, in dynload_aix, ensure that we pass the decoded string *object* pathname_ob to PyErr_SetImportError instead of the original pathname buffer. Co-authored-by: Serhiy Storchaka <storchaka@gmail.com> (cherry picked from commit 2d2af32) Co-authored-by: Kevin Adler <kadler@us.ibm.com>
When running in a non-UTF-8 locale, if an error occurs while importing a native Python module (say because a dependent share library is missing), the error message string returned may contain non-ASCII code points causing a UnicodeDecodeError. PyUnicode_DecodeFSDefault is used for buffers which may contain filesystem paths. For consistency with os.strerror(), PyUnicode_DecodeLocale is used for buffers which contain system error messages. While the shortname parameter is always encoded in ASCII according to PEP 489, it is left decoded using PyUnicode_FromString to minimize the changes and since it should not affect the decoding (albeit _potentially_ slower). In dynload_hpux, since the error buffer contains a message generated from a static ASCII string and the module filesystem path, PyUnicode_DecodeFSDefault is used instead of PyUnicode_DecodeLocale as is used elsewhere. * bpo-41894: Fix bugs in dynload error msg handling For both dynload_aix and dynload_hpux, properly handle the possibility that decoding strings may return NULL and when such an error happens, properly decrement any previously decoded strings and return early. In addition, in dynload_aix, ensure that we pass the decoded string *object* pathname_ob to PyErr_SetImportError instead of the original pathname buffer. Co-authored-by: Serhiy Storchaka <storchaka@gmail.com> (cherry picked from commit 2d2af32) Co-authored-by: Kevin Adler <kadler@us.ibm.com>
…GH-22466) When running in a non-UTF-8 locale, if an error occurs while importing a native Python module (say because a dependent share library is missing), the error message string returned may contain non-ASCII code points causing a UnicodeDecodeError. PyUnicode_DecodeFSDefault is used for buffers which may contain filesystem paths. For consistency with os.strerror(), PyUnicode_DecodeLocale is used for buffers which contain system error messages. While the shortname parameter is always encoded in ASCII according to PEP 489, it is left decoded using PyUnicode_FromString to minimize the changes and since it should not affect the decoding (albeit _potentially_ slower). In dynload_hpux, since the error buffer contains a message generated from a static ASCII string and the module filesystem path, PyUnicode_DecodeFSDefault is used instead of PyUnicode_DecodeLocale as is used elsewhere. * bpo-41894: Fix bugs in dynload error msg handling For both dynload_aix and dynload_hpux, properly handle the possibility that decoding strings may return NULL and when such an error happens, properly decrement any previously decoded strings and return early. In addition, in dynload_aix, ensure that we pass the decoded string *object* pathname_ob to PyErr_SetImportError instead of the original pathname buffer. Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>
…GH-22466) When running in a non-UTF-8 locale, if an error occurs while importing a native Python module (say because a dependent share library is missing), the error message string returned may contain non-ASCII code points causing a UnicodeDecodeError. PyUnicode_DecodeFSDefault is used for buffers which may contain filesystem paths. For consistency with os.strerror(), PyUnicode_DecodeLocale is used for buffers which contain system error messages. While the shortname parameter is always encoded in ASCII according to PEP 489, it is left decoded using PyUnicode_FromString to minimize the changes and since it should not affect the decoding (albeit _potentially_ slower). In dynload_hpux, since the error buffer contains a message generated from a static ASCII string and the module filesystem path, PyUnicode_DecodeFSDefault is used instead of PyUnicode_DecodeLocale as is used elsewhere. * bpo-41894: Fix bugs in dynload error msg handling For both dynload_aix and dynload_hpux, properly handle the possibility that decoding strings may return NULL and when such an error happens, properly decrement any previously decoded strings and return early. In addition, in dynload_aix, ensure that we pass the decoded string *object* pathname_ob to PyErr_SetImportError instead of the original pathname buffer. Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>
When running in a non-UTF-8 locale, if an error occurs while importing a
native Python module (say because a dependent share library is missing),
the error message string returned may contain non-ASCII code points
causing a UnicodeDecodeError.
While this is most likely to occur on AIX, which still uses non-UTF-8
locales by default, it could occur on any Unix-like system.
https://bugs.python.org/issue41894