Tokenize does not roundtrip {{ after \n #125008
Comments
Furthermore, here is the output of the following code:

```python
import tokenize, io

source_code = r'f"\n{{test}}"'
tokens = tokenize.generate_tokens(io.StringIO(source_code).readline)
for t in tokens:
    print(t)
```

```
TokenInfo(type=61 (FSTRING_START), string='f"', start=(1, 0), end=(1, 2), line='f"\\n{{test}}"')
TokenInfo(type=62 (FSTRING_MIDDLE), string='\\n{', start=(1, 2), end=(1, 5), line='f"\\n{{test}}"')
TokenInfo(type=62 (FSTRING_MIDDLE), string='test}', start=(1, 6), end=(1, 11), line='f"\\n{{test}}"')
TokenInfo(type=63 (FSTRING_END), string='"', start=(1, 12), end=(1, 13), line='f"\\n{{test}}"')
TokenInfo(type=4 (NEWLINE), string='', start=(1, 13), end=(1, 14), line='f"\\n{{test}}"')
TokenInfo(type=0 (ENDMARKER), string='', start=(2, 0), end=(2, 0), line='')
```

So it seems that the line is getting in alright, but the `\n{{` is getting turned into `\n{` in the tokenizer somehow. Same erroneous output for the bytes version (with an rb-string, BytesIO, and tokenize.tokenize).
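For completeness, here is the roundtrip through `tokenize.untokenize` itself (a minimal sketch following the same pattern; on affected 3.12 releases one `{` is dropped, while patched versions return the source unchanged):

```python
import io
import tokenize

source = r'f"\n{{test}}"'
tokens = list(tokenize.generate_tokens(io.StringIO(source).readline))
# untokenize in 2-tuple ("compat") mode rebuilds source text from
# (type, string) pairs; this is where the brace re-escaping happens.
roundtripped = tokenize.untokenize((tok.type, tok.string) for tok in tokens)
print(repr(roundtripped))
```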
It looks like this was a regression in Python 3.12; I can't reproduce the behaviour with Python 3.11. I'm guessing it was caused by the PEP-701 changes.
Reproduced on the
This seems to happen with other escape characters as well:

```python
import tokenize, io

source_code = r'f"""\t{{test}}"""'
tokens = tokenize.generate_tokens(io.StringIO(source_code).readline)
x = tokenize.untokenize((t, s) for t, s, *_ in tokens)
print(x)  # f"""\t{test}}"""
```

```python
import tokenize, io

source_code = r'f"""\r{{test}}"""'
tokens = tokenize.generate_tokens(io.StringIO(source_code).readline)
x = tokenize.untokenize((t, s) for t, s, *_ in tokens)
print(x)  # f"""\r{test}}"""
```
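A quick sweep over a few single-character escapes before `{{` (a sketch, assuming the same generate_tokens/untokenize pattern as the snippets above) shows the failure is systematic, not specific to `\n`:

```python
import io
import tokenize

# Sweep several single-character escapes directly before "{{"; on
# affected 3.12 releases each of these loses one brace on roundtrip,
# while patched versions roundtrip unchanged.
results = []
for esc in (r"\n", r"\t", r"\r", r"\f"):
    src = 'f"""' + esc + '{{test}}"""'
    toks = tokenize.generate_tokens(io.StringIO(src).readline)
    out = tokenize.untokenize((t, s) for t, s, *_ in toks)
    results.append(out)
    print(src, "->", out)
```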
I think the issue is in this method (lines 187 to 208 at 16cd6cc):
This PR fixed the handling of Unicode literals (e.g. …):

```diff
 if character == "{":
     n_backslashes = sum(
         1 for char in _itertools.takewhile(
             "\\".__eq__,
             characters[-2::-1]
         )
     )
-    if n_backslashes % 2 == 0:
+    if n_backslashes % 2 == 0 or characters[-1] != "N":
         characters.append(character)
     else:
         consume_until_next_bracket = True
```
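To see why the added `characters[-1] != "N"` test matters: counting backslashes alone cannot distinguish `\n{` (a newline escape followed by a brace that must be doubled) from `\N{` (the opening of a named-character escape such as `\N{BULLET}`, whose braces must not be doubled). A standalone sketch of the corrected check (`is_named_escape_brace` is a hypothetical helper for illustration, not the CPython source):

```python
import itertools

def is_named_escape_brace(characters):
    """Return True if a '{' appended after `characters` opens a \\N{...} escape."""
    # Count backslashes immediately before the final character (the
    # candidate "N"); an even count means the backslash itself is escaped.
    n_backslashes = sum(
        1 for _ in itertools.takewhile("\\".__eq__, characters[-2::-1])
    )
    return n_backslashes % 2 == 1 and characters[-1:] == ["N"]

print(is_named_escape_brace(list(r"\N")))    # True  -> \N{ opens a named escape
print(is_named_escape_brace(list(r"\n")))    # False -> plain newline escape
print(is_named_escape_brace(list(r"\\N")))   # False -> the backslash is escaped
```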
…onGH-125013) (cherry picked from commit db23b8bb13863fcd88ff91bc22398f8e0312039e) Co-authored-by: Tomas R. <tomas.roun8@gmail.com>
Bug report
Bug description:
Expected:
Got:
Note the absence of a second { in the {{ after the \n — but in no other positions.
Unlike some other roundtrip failures of tokenize, some of which are minor infelicities, this one actually creates a syntactically invalid program on roundtrip, which is quite bad. You get a
SyntaxError: f-string: single '}' is not allowed
when trying to use the results.

CPython versions tested on:
3.12
Operating systems tested on:
Linux, Windows
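As the report notes, the mangled output is not merely different, it no longer compiles. A minimal check (using the broken string quoted in the report above):

```python
# Compiling the broken roundtrip output raises the SyntaxError quoted above.
broken = r'f"\n{test}}"'
err = None
try:
    compile(broken, "<roundtrip>", "eval")
except SyntaxError as exc:
    err = exc
    print("SyntaxError:", exc.msg)
```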
Linked PRs
- `tokenize.untokenize` roundtrip for `\n{{` #125013
- `tokenize.untokenize` roundtrip for `\n{{` (GH-125013) #125020
- `tokenize.untokenize` roundtrip for `\n{{` (GH-125013) #125021