Minor bug with decoding of EUC-JP character "㎝".
Bug report
The character "㎝" is part of the JIS X 0208 encoding. The Python standard library includes the EUC-JP codec, which represents the JIS X 0208, JIS X 0212, and JIS X 0201 encodings. However, attempting to decode the "㎝" character with the EUC-JP codec raises a decoding error.
Example
As taken from https://stackoverflow.com/questions/73255012/python-fails-to-decode-euc-jp-strings-with-the-character:
print(b"58\xad\xd1".decode("EUC-JP"))
raises
Traceback (most recent call last):
  File "<pyshell#53>", line 1, in <module>
    print(b"58\xad\xd1".decode("EUC-JP"))
UnicodeDecodeError: 'euc_jp' codec can't decode byte 0xad in position 2: illegal multibyte sequence
However, decoding the same bytes with an alternative codec such as euc_jisx0213 works:
content = b"\xa5\xb5\xa5\xa4\xa5\xba\xa1\xa7XL \xcc\xf377\xad\xd1\xa1\xdf\xcc\xf358\xad\xd1"
print(b"58\xad\xd1".decode("euc_jisx0213"))
>58㎝
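To make the comparison easier to reproduce, here is a minimal sketch that tries the same bytes against several Japanese EUC codecs shipped with CPython. The helper name `try_decode` and the choice of codecs to compare are my own for illustration; only the `euc_jp` failure and the `euc_jisx0213` success are confirmed by the report above.

```python
# Bytes from the report: ASCII "58" followed by the two-byte sequence 0xAD 0xD1.
SAMPLE = b"58\xad\xd1"


def try_decode(data, codecs=("euc_jp", "euc_jisx0213", "euc_jis_2004")):
    """Hypothetical helper: map each codec name to the decoded text,
    or to None if that codec raises UnicodeDecodeError."""
    results = {}
    for name in codecs:
        try:
            results[name] = data.decode(name)
        except UnicodeDecodeError:
            results[name] = None
    return results


if __name__ == "__main__":
    for name, text in try_decode(SAMPLE).items():
        status = "failed" if text is None else repr(text)
        print(f"{name}: {status}")
```

A result of None for a codec means the byte sequence is not mapped in that codec's table, which is the behaviour reported here for `euc_jp`.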
Your environment
- CPython versions tested on: 3.9, 3.10
- Operating system and architecture: Windows x64