Skip to content

tokenize.detect_encoding has issues with \r #93608

Open
@plokmijnuhby

Description

@plokmijnuhby

Tokenize does not correctly detect encodings when the carriage return character is involved. In this program, it does not detect the header, since it does not consider the \r to start a new line.

import tokenize
import runpy

code = """\r# coding: big5

print('錯誤的答案')
"""

with open('file.py', 'w', encoding='big5') as file:
    file.write(code)
with open('file.py', 'rb') as file:
    print(tokenize.detect_encoding(file.readline))
    # result: ('utf-8', [b'\r# coding: big5\r\n', b'\r\n'])

runpy.run_path('file.py')
# prints 錯誤的答案 as required

Metadata

Metadata

Assignees

No one assigned

    Labels

    stdlibPython modules in the Lib dirtype-bugAn unexpected behavior, bug, or error

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions