bpo-39287: Doc: Add UTF-8 mode section in using/windows. #17935

Status: Open (wants to merge 1 commit into base: master)


methane (Member) commented Jan 10, 2020

methane force-pushed the methane:win-utf8mode branch from 6cb32d7 to 812c0be on Jan 10, 2020

.. versionadded:: 3.7

Windows doesn't use UTF-8 for the system encoding (the ANSI Code Page).

eryksun (Contributor), Jan 10, 2020

Windows 10 supports setting the system locale's ANSI and OEM codepages to UTF-8 (65001), but it's not enabled by default.
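To see which of these settings is in effect on a given machine, something like the following may help. It is only a sketch: the helper name `describe_encodings` is made up for illustration, and the Windows branch assumes the Win32 `GetACP` and `GetOEMCP` functions are reachable through `ctypes.windll.kernel32`.

```python
import locale
import sys


def describe_encodings():
    """Collect the encodings the running interpreter reports, as a dict."""
    info = {
        # 1 when UTF-8 mode is enabled (python -X utf8 or PYTHONUTF8=1).
        "utf8_mode": sys.flags.utf8_mode,
        # Locale preferred encoding; reports UTF-8 when UTF-8 mode is on.
        "preferred": locale.getpreferredencoding(False),
        "filesystem": sys.getfilesystemencoding(),
    }
    if sys.platform == "win32":
        import ctypes

        kernel32 = ctypes.windll.kernel32
        # Active ANSI and OEM code page identifiers of the system locale.
        info["ansi_codepage"] = kernel32.GetACP()
        info["oem_codepage"] = kernel32.GetOEMCP()
        # 65001 is the code page identifier for UTF-8.
        info["system_is_utf8"] = info["ansi_codepage"] == 65001
    return info


if __name__ == "__main__":
    for name, value in describe_encodings().items():
        print(f"{name}: {value}")
```

On a default Windows install, `system_is_utf8` would be expected to be false, which is the situation the proposed documentation sentence describes.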

There are still problems to be resolved with using UTF-8 at the system level. In particular, the console host (conhost.exe) doesn't support using UTF-8 as the input codepage for use with ReadFile and ReadConsoleA. It encodes the UTF-16 input buffer with an internal WideCharToMultiByte call that assumes one byte per encoded character (at least in a Western locale, for which a single-byte encoding is assumed). This fails for non-ASCII characters, which in turn end up as null bytes in the result of a ReadFile or ReadConsoleA call. Python is immune to this problem for the most part. The I/O stack detects a console file and uses wide-character ReadConsoleW instead, via io._WindowsConsoleIO. The problem affects low-level os.write and os.read, however, because they're not integrated with _WindowsConsoleIO.
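As a rough way to check which path a stream takes, the sketch below tests whether a text stream is backed by `io._WindowsConsoleIO`. The helper name `uses_windows_console_io` is hypothetical, and `_WindowsConsoleIO` is a private class that exists only on Windows builds, so treat this as a diagnostic aid, not a supported API.

```python
import io
import sys


def uses_windows_console_io(stream):
    """Return True if a text stream is backed by io._WindowsConsoleIO."""
    # The class is only defined on Windows; elsewhere the lookup yields None.
    console_cls = getattr(io, "_WindowsConsoleIO", None)
    if console_cls is None:
        return False
    # Text layer -> buffered layer -> raw layer; any of them may be missing.
    buffered = getattr(stream, "buffer", None)
    raw = getattr(buffered, "raw", None)
    return isinstance(raw, console_cls)


if __name__ == "__main__":
    print("stdin uses the console class:", uses_windows_console_io(sys.stdin))
```

When this returns True for `sys.stdin`, reads through `sys.stdin` go through the wide-character path described above, while a direct `os.read(0, n)` call does not pass through that class.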

4 participants