bpo-39287: Doc: Add UTF-8 mode section in using/windows. #17935
.. versionadded:: 3.7

Windows doesn't use UTF-8 for the system encoding (the ANSI Code Page).
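
As a rough illustration of the sentence above, the following sketch reports which encodings a running Python process is using. The helper name `encoding_report` is hypothetical and not part of the PR; it relies only on `sys.flags.utf8_mode` (available since Python 3.7), `locale.getpreferredencoding` and `sys.getfilesystemencoding`.

```python
import locale
import sys


def encoding_report():
    """Collect the encoding settings that UTF-8 mode can influence.

    Hypothetical helper for illustration only.
    """
    return {
        # True when Python runs in UTF-8 mode (e.g. started with -X utf8).
        "utf8_mode": bool(sys.flags.utf8_mode),
        # Encoding used by default for text files opened without encoding=.
        "preferred_encoding": locale.getpreferredencoding(False),
        # Encoding used to convert between str filenames and bytes.
        "filesystem_encoding": sys.getfilesystemencoding(),
    }


if __name__ == "__main__":
    for name, value in encoding_report().items():
        print(f"{name}: {value}")
```

On Windows without UTF-8 mode, the preferred encoding is expected to be the name of the ANSI code page, which depends on the system locale. Starting the interpreter with `-X utf8` or with the `PYTHONUTF8=1` environment variable should switch it to UTF-8.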
eryksun
Jan 10, 2020
Contributor
Windows 10 supports setting the system locale's ANSI and OEM codepages to UTF-8 (65001), but it's not enabled by default.
There are still problems to be resolved with using UTF-8 at the system level. In particular, the console host (conhost.exe) doesn't support using UTF-8 as the input codepage for use with ReadFile and ReadConsoleA. It encodes the UTF-16 input buffer with an internal WideCharToMultiByte call that assumes one byte per encoded character (at least in a Western locale, for which a single-byte encoding is assumed). This fails for non-ASCII characters, which in turn end up as null bytes in the result of a ReadFile or ReadConsoleA call. Python is immune to this problem for the most part. The I/O stack detects a console file and uses wide-character ReadConsoleW instead, via io._WindowsConsoleIO. The problem affects low-level os.write and os.read, however, because they're not integrated with _WindowsConsoleIO.
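
The console detection described above can be observed from Python with a small check. This is only a sketch: the function name `uses_windows_console_io` is hypothetical, and it assumes that `io._WindowsConsoleIO` is present only on Windows builds, so it looks the class up defensively and returns False elsewhere.

```python
import io
import sys


def uses_windows_console_io(stream):
    """Return True if a text stream is backed by io._WindowsConsoleIO.

    Hypothetical helper for illustration only. The class is looked up with
    getattr() because it is not expected to exist on non-Windows platforms.
    """
    console_cls = getattr(io, "_WindowsConsoleIO", None)
    if console_cls is None:
        return False
    # A text stream normally wraps a buffered stream, which wraps the raw one.
    buffer = getattr(stream, "buffer", None)
    raw = getattr(buffer, "raw", None)
    return isinstance(raw, console_cls)


if __name__ == "__main__":
    for name in ("stdin", "stdout", "stderr"):
        print(name, uses_windows_console_io(getattr(sys, name)))
```

When a standard stream is redirected to a file or a pipe, the check should return False, since the stream is then no longer a console file. Direct calls to `os.read` and `os.write` on the underlying file descriptor bypass this layer in any case, as the comment notes.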
methane commented Jan 10, 2020 (edited by bedevere-bot)
https://bugs.python.org/issue39287