
PEP 701 – Syntactic formalization of f-strings #102856

Open
3 of 4 tasks
pablogsal opened this issue Mar 20, 2023 · 48 comments
Comments

@pablogsal (Member) commented Mar 20, 2023

  • Changes in the C tokenizer
  • Categorize failing tests
  • Fix failing tests or modify/remove them as needed
  • Changes in Python tokenizer

Linked PRs

@pablogsal (Member Author)

See this for the latest report on errors from @isidentical

@pablogsal (Member Author)

Draft PR for the C tokenizer up: #102855

@pablogsal (Member Author) commented Mar 20, 2023

Things for the cleanup of #102855:

  • Cleaning up the grammar and the action helpers (the names are still ridiculous and there are multiple rules commented out).
  • Remove the old parsing code and check that we didn't break anything 😅
  • Clean/refactor the tokenizer struct (better names, factor stuff into its own structure as needed).
  • Consider factoring out tok_get_fstring_mode because it is a monster.

@pablogsal (Member Author) commented Mar 20, 2023

Ok with #102855 we have the following failing tests:

  • test_ast
  • test_cmd_line_script
  • test_eof
  • test_exceptions
  • test_fstring
  • test_tokenize
  • test_type_comments
  • test_unparse

Most of these just need updated error messages, line numbers and the like, but some may hide actual bugs, so we should check them. Please mention which ones you are working on so we don't clash with one another.

@mgmacias95 (Contributor)

Working on test_tokenize

@Eclips4 (Contributor) commented Mar 21, 2023

Hello, Pablo!
Can I work on test_ast?
I recently sent some PRs touching this file (for example, #102797), so I have some experience with it =)

@ramvikrams (Contributor)

I can work on test_type_comments and test_unparse.

@pablogsal (Member Author)

@Eclips4 @ramvikrams wonderful! Just make PRs against my fork!

Report here or ping any of us if you find something that could be a bug (don't just fix the tests blindly because there may be bugs lurking).

@pablogsal (Member Author)

@lysnikolaou can you work on cleaning up the grammar + the actions?

@isidentical can you work on cleaning up some of the tokenizer layers? (This is quite a lot so we can probably work together here).

@Eclips4 (Contributor) commented Mar 21, 2023

@pablogsal
About test_ast.py:
It seems like only one test fails there, and as I understand it, that's a bug:

with self.assertRaises(SyntaxError):
    ast.parse('f"{x=}"', feature_version=(3, 7))

I think there are two solutions (see the sketch after this list):

  1. Remove this test, because support for Python 3.7 will end soon.
  2. The errors are now raised by tokenizer.c instead of string_parser.c, so as I understand it we would have to change python.gram, right? (We need access to feature_version, which is inaccessible in the tokenizer.)
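For reference, here is a minimal standalone sketch of what the test expects, assuming the pre-PEP-701 behaviour described above: the old string parser rejected the self-documenting f"{x=}" form for feature_version below (3, 8), while the new tokenizer currently accepts it.

import ast

# Sketch of the expectation in test_ast: the '=' specifier was added in 3.8,
# so the old parser raised SyntaxError for feature_version=(3, 7).
try:
    ast.parse('f"{x=}"', feature_version=(3, 7))
except SyntaxError as exc:
    print("rejected, as the test expects:", exc)
else:
    print("accepted")  # the behaviour currently observed with the new tokenizer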

@pablogsal (Member Author)

2. The errors are now raised by tokenizer.c instead of string_parser.c, so as I understand it we would have to change python.gram, right? (We need access to feature_version, which is inaccessible in the tokenizer.)

We could probably do this, but on the other hand I would prefer not to overcomplicate things, so I think (1) is better.

@lysnikolaou (Contributor)

@lysnikolaou can you work on cleaning up the grammar + the actions?

Will do!

@Eclips4 (Contributor) commented Mar 21, 2023

Also, I can take a look at test_cmd_line_script. Seems easy.

@pablogsal (Member Author)

Also, I can take a look at test_cmd_line_script. Seems easy.

All yours!

@CharlieZhao95 (Contributor)

I found that no one has claimed test_eof yet, so I did some work on it. :)
Failing test case: test_eof.test_eof_with_line_continuation

I looked at its commit history. This test case is a regression test for a crash, so it seems like a good choice to keep the case and just update the error message.

def test_eof_with_line_continuation(self):
    expect = "unexpected EOF while parsing (<string>, line 1)"

Update unexpected EOF while parsing (<string>, line 1) to (unicode error) 'unicodeescape' codec can't decode bytes in position 0-1: truncated \xXX escape (<string>, line 1)
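A minimal sketch of what the updated test could look like, assuming the rest of the test body keeps its current shape (the compile() input here is illustrative, not copied from the test file):

def test_eof_with_line_continuation(self):
    expect = ("(unicode error) 'unicodeescape' codec can't decode bytes in "
              "position 0-1: truncated \\xXX escape (<string>, line 1)")
    try:
        compile('"\\Xhh" \\', '<string>', 'exec')   # illustrative input
    except SyntaxError as msg:
        self.assertEqual(str(msg), expect)
    else:
        self.fail("expected a SyntaxError")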

@ramvikrams (Contributor)

@pablogsal in test_type_comments we have not used any f-strings.

@pablogsal (Member Author) commented Mar 23, 2023

@pablogsal in test_type_comments we have not used any f-strings.

The one failing there is this problem:

======================================================================
FAIL: test_fstring (test.test_type_comments.TypeCommentTests.test_fstring)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/runner/work/cpython/cpython-ro-srcdir/Lib/test/test_type_comments.py", line 275, in test_fstring
    for tree in self.parse_all(fstring, minver=6):
  File "/home/runner/work/cpython/cpython-ro-srcdir/Lib/test/test_type_comments.py", line 239, in parse_all
    with self.assertRaisesRegex(SyntaxError, expected_regex,
AssertionError: SyntaxError not raised : feature_version=(3, 4)

----------------------------------------------------------------------

which I think is a feature version problem.

@Eclips4 (Contributor) commented Mar 23, 2023

@pablogsal in test_type_comments we have not used any f-strings.

The one failing there is this problem:

======================================================================
FAIL: test_fstring (test.test_type_comments.TypeCommentTests.test_fstring)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/runner/work/cpython/cpython-ro-srcdir/Lib/test/test_type_comments.py", line 275, in test_fstring
    for tree in self.parse_all(fstring, minver=6):
  File "/home/runner/work/cpython/cpython-ro-srcdir/Lib/test/test_type_comments.py", line 239, in parse_all
    with self.assertRaisesRegex(SyntaxError, expected_regex,
AssertionError: SyntaxError not raised : feature_version=(3, 4)

----------------------------------------------------------------------

which I think is a feature version problem.

lowest = 4  # Lowest minor version supported

We can just change this to 6, and this test will pass.
I haven't researched this problem, but this solution looks like the simplest one.
However... supporting Python 3.4 && 3.5 syntax looks kinda strange.

@pablogsal (Member Author)

We can just change this to 6, and this test will pass.
I haven't researched this problem, but this solution looks like the simplest one.
However... supporting Python 3.4 && 3.5 syntax looks kinda strange.

I don't think that will work because we are not doing version checking anymore. See previous comments. The fix is probably to not pass feature_version.

@sunmy2019 (Contributor)

Looks like no one is analyzing test_exceptions. I will look into it over the next two days.

4 platforms seem to hit the same problem here.

@pablogsal (Member Author)

@isidentical @lysnikolaou I have pushed some rules for error messages; please take a look and complete them with more if you have some spare cycles. With these, the failures in test_fstring have decreased notably.

@isidentical (Sponsor Member)

I can confirm that the total number of failures has decreased from 88 to 63. I'll try to see which are the highest-impact ones and submit a PR to clear them.

@isidentical (Sponsor Member)

If anyone intends to work on any of the remaining tasks in test_fstring, please double check with this PR (pablogsal#52) since it brings down the total failures to 30 with some explanations/required decisions for the rest.

@sunmy2019 (Contributor)

After looking into the failure in test_exceptions:

check(b'Python = "\xcf\xb3\xf2\xee\xed" +', 1, 18)

The old parser and the new parser raise the same exception (UnicodeDecodeError), but with a different col_offset. This is because it was raised at the wrong token.

I would consider it a bug in the old parser. Just as this comment mentions,

/* This is needed, in order for the SyntaxError to point to the token t,
   since _PyPegen_raise_error uses p->tokens[p->fill - 1] for the
   error location, if p->known_err_token is not set. */
p->known_err_token = t;
if (octal) {
    RAISE_SYNTAX_ERROR("invalid octal escape sequence '\\%.3s'",
                       first_invalid_escape);
}
else {
    RAISE_SYNTAX_ERROR("invalid escape sequence '\\%c'", c);
}

the error token was not correctly set in the old parser.

Maybe we should open an issue for the old parser? But the fix might be error-prone, since we would need to keep track of every possible code path.

As for the new parser, I think a change in the test case would be fine.
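For reference, a standalone sketch of what that check exercises, assuming test_exceptions' check helper compiles the source and asserts the reported location (here line 1, offset 18):

src = b'Python = "\xcf\xb3\xf2\xee\xed" +'
try:
    compile(src, '<string>', 'exec')
except (SyntaxError, UnicodeDecodeError) as exc:
    # The test asserts the error points at line 1, offset 18; the old and new
    # parsers disagree on the offset because the error token differs.
    print(type(exc).__name__, getattr(exc, 'lineno', None), getattr(exc, 'offset', None))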

@sunmy2019 (Contributor)

I am working on a PR to fix test_unparse this weekend.

_PyPegen_concatenate_strings did not implement concatenating an empty Constant with a FormattedValue, resulting in an unparse failure.
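A hedged illustration of the shape involved (this minimal input is an assumption, not taken from the PR): implicitly concatenating an empty string literal with an f-string asks _PyPegen_concatenate_strings to merge an empty Constant with a FormattedValue, and the result still has to round-trip through ast.unparse.

import ast

tree = ast.parse("x = '' f'{a}'")      # empty Constant next to a FormattedValue
print(ast.dump(tree.body[0].value))    # the merged JoinedStr
print(ast.unparse(tree))               # must round-trip without failing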

@sunmy2019 (Contributor) commented Mar 31, 2023

Hi, I got some bad news.

I have been testing for memory leaks with ./python -m test -j $(nproc) -R :
~30% of the tests failed on current head 270b661

For example,

0:01:01 load avg: 13.77 [157/433/42] test_unittest failed (reference leak)
beginning 9 repetitions
test_unittest leaked [89, 89, 89, 89] references, sum=356
test_unittest leaked [90, 89, 89, 89] memory blocks, sum=357
.......
0:01:16 load avg: 15.00 [185/433/49] test_inspect failed (reference leak)
beginning 9 repetitions
test_inspect leaked [429, 429, 429, 429] references, sum=1716
test_inspect leaked [318, 318, 318, 317] memory blocks, sum=1271

These references most likely leak during compilation (e.g. when using import, compile/exec/eval, or ast.parse).

We might need to look into that.


Update: memory leaks fixed by commit pablogsal@d8b12e2

The root cause is that _PyArena_AddPyObject was missing in 3 places.

This is very tricky because the _PyArena_AddPyObject calls are scattered across many subroutines. Sometimes you should add _PyArena_AddPyObject, but sometimes you should not (adding it would cause a negative ref count).

As the old saying goes, managing memory by hand is painful and error-prone. This could be checked automatically by analyzing the PyObject*s registered with the arena by the time the AST is created, but that is a totally different story.
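For anyone not familiar with the -R runs above, here is a rough sketch of what the refleak check boils down to (an assumption-level simplification of Lib/test/libregrtest; sys.gettotalrefcount only exists in --with-pydebug builds):

import sys

def refleak_deltas(workload, repeats=5):
    """Run `workload` repeatedly and report per-run total-refcount deltas."""
    counts = []
    for _ in range(repeats):
        workload()
        counts.append(sys.gettotalrefcount())
    # A healthy workload settles to deltas of 0; a constant positive delta
    # (like the [89, 89, 89, 89] above) points to a reference leak.
    return [after - before for before, after in zip(counts, counts[1:])]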

@pablogsal (Member Author)

Only 12 tests left in test_fstring and we are ready to go!

@pablogsal (Member Author)

I think we can make it before beta 1. Formalizing fstring error messages (see pablogsal#52 (comment)) can be left to the end, since it is non-functional.

Yes and no: there may be some design consequences and I am not a fan of deactivating tests to reactivate them later.

@lysnikolaou (Contributor)

There's a small discrepancy around the conversion character between our version and the string parser. The string parser does not allow spaces before or after the conversion character. For example:

>>> f"{'3'!r : >10s}"
  File "<stdin>", line 1
    f"{'3'!r : >10s}"
                     ^
SyntaxError: f-string: expecting '}'

Our version, while not allowing for spaces between ! and the character, allows spaces after.

>>> f"{'3'!r : >10s}"
"       '3'"

What do you think about this? I feel okay about it, although it isn't really consistent with our decision on spaces after the ! character.

@pablogsal (Member Author)

What do you think about this? I feel okay about it, although it isn't really consistent with our decision on spaces after the ! character.

I feel that this is fine, and it would honestly be very difficult to disallow because these are different tokens anyway. It won't affect existing code and I don't see any reason to forbid it going forward.

@pablogsal (Member Author)

7 tests to go thanks to @lysnikolaou ❤️

@lysnikolaou (Contributor)

With pablogsal#65, we only have two errors to go.

======================================================================
FAIL: test_format_specifier_expressions (test.test_fstring.TestCase.test_format_specifier_expressions) (str='f\'{"s"!r{":10"}}\'')
----------------------------------------------------------------------
  File "<string>", line 1
    f'{"s"!r{":10"}}'
            ^
SyntaxError: f-string: expecting '}'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/lysnikolaou/repos/python/cpython/Lib/test/test_fstring.py", line 32, in assertAllRaise
    with self.assertRaisesRegex(exception_type, regex):
AssertionError: "f-string: invalid conversion character 'r{"': expected 's', 'r', or 'a'" does not match "f-string: expecting '}' (<string>, line 1)"

======================================================================
FAIL: test_format_specifier_expressions (test.test_fstring.TestCase.test_format_specifier_expressions) (str="f'{4:{/5}}'")
----------------------------------------------------------------------
  File "<string>", line 1
    f'{4:{/5}}'
          ^
SyntaxError: f-string: expecting '}'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/lysnikolaou/repos/python/cpython/Lib/test/test_fstring.py", line 32, in assertAllRaise
    with self.assertRaisesRegex(exception_type, regex):
AssertionError: "f-string: invalid syntax" does not match "f-string: expecting '}' (<string>, line 1)"

These are hard to handle correctly if the invalid_replacement_field rule stays the same in order to catch those expecting '}' cases. There are the following 3 options, I guess:

  1. Take more time to figure out whether we can find a smart way to differentiate between these things (I haven't found one since yesterday).
  2. Be okay with sometimes emitting an expecting '}' error while there are other syntax errors within the f-string (keeping invalid_replacement_field as-is).
  3. Be okay with sometimes emitting an invalid syntax error when we could have produced a better one (probably by making invalid_replacement_field require both a conversion character and a valid format spec).

Thoughts?

@sunmy2019 (Contributor) commented Apr 7, 2023

Catching all errors seems unrealistic. But I think we do have an opportunity to improve.

By changing invalid_replacement_field, we get a better error message about what to expect, and we can capture the syntax error from the field (yield_expr | star_expressions).

invalid_replacement_field:
    | '{' a='=' { RAISE_SYNTAX_ERROR_KNOWN_LOCATION(a, "f-string: expression required before '='") }
    | '{' a=':' { RAISE_SYNTAX_ERROR_KNOWN_LOCATION(a, "f-string: expression required before ':'") }
    | '{' a='!' { RAISE_SYNTAX_ERROR_KNOWN_LOCATION(a, "f-string: expression required before '!'") }
    | '{' a='}' { RAISE_SYNTAX_ERROR_KNOWN_LOCATION(a, "f-string: empty expression not allowed") }
    | '{' (yield_expr | star_expressions) !('=' | '!' | ':' | '}') {
        PyErr_Occurred() ? NULL : RAISE_SYNTAX_ERROR("f-string: expecting '=', or '!', or ':', or '}'")
    }
    | '{' (yield_expr | star_expressions) "=" !('!' | ':' | '}') {
        PyErr_Occurred() ? NULL : RAISE_SYNTAX_ERROR("f-string: expecting '!', or ':', or '}'")
    }
    | '{' (yield_expr | star_expressions) "="? invalid_conversion_character
    | '{' (yield_expr | star_expressions) "="? ['!' NAME] !(':' | '}') {
        PyErr_Occurred() ? NULL : RAISE_SYNTAX_ERROR("f-string: expecting ':' or '}'")
    }
    | '{' (yield_expr | star_expressions) "="? ['!' NAME] [':' fstring_format_spec*] !'}' {
        PyErr_Occurred() ? NULL : RAISE_SYNTAX_ERROR("f-string: expecting '}'")
    }

@sunmy2019 (Contributor)

With this new grammar:

  File "a.py", line 1
    f'{"s"!r{":10"}}'
            ^
SyntaxError: f-string: expecting ':' or '}'

Much better, I think.

@lysnikolaou (Contributor)

Totally agree. Good idea! Will you open a PR?

@sunmy2019 (Contributor)

Sure

@sunmy2019 (Contributor)

Hi there. Will anyone take a look at pablogsal#67?

@pablogsal (Member Author)

We are almost there; it seems that we still have a small failure:

FAIL: test_random_files (test.test_tokenize.TestRoundtrip.test_random_files)

@isidentical this may be because we lack some changes in the untokenize function.

@sunmy2019 (Contributor)

@pablogsal We lack changes in both the tokenize and the untokenize functions; they cannot recognize the new f-string grammar yet. See pablogsal#67 (comment)
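For context, a rough sketch of the round-trip that test_random_files exercises (an assumed simplification of TestRoundtrip; sources that use the new nested-quote or multi-line f-string syntax currently break at the tokenize step):

import io
import tokenize

source = 'x = f"{1 + 1}"\n'   # swap in new-style f-string syntax to see the failure
tokens = list(tokenize.generate_tokens(io.StringIO(source).readline))
roundtrip = tokenize.untokenize(tokens)
retokenized = list(tokenize.generate_tokens(io.StringIO(roundtrip).readline))
# test_random_files checks that the two token streams agree (ignoring positions).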

@pablogsal (Member Author)

@pablogsal We lack changes in both the tokenize and the untokenize functions; they cannot recognize the new f-string grammar yet. See pablogsal#67 (comment)

We should be OK excluding the files that use the new funky syntax until we have the new tokenize implementation.

@pablogsal (Member Author)

Seems that we have a failing test on the buildbots:

https://buildbot.python.org/all/#/builders/802/builds/760

@lysnikolaou (Contributor)

Looking into the failure.

pablogsal added a commit that referenced this issue Apr 19, 2023
Co-authored-by: Lysandros Nikolaou <lisandrosnik@gmail.com>
Co-authored-by: Batuhan Taskaya <isidentical@gmail.com>
Co-authored-by: Marta Gómez Macías <mgmacias@google.com>
Co-authored-by: sunmy2019 <59365878+sunmy2019@users.noreply.github.com>
pablogsal added a commit to pablogsal/cpython that referenced this issue Apr 19, 2023
pablogsal added a commit to pablogsal/cpython that referenced this issue Apr 19, 2023

@pablogsal (Member Author) commented Apr 19, 2023

This code is kind of wrong:

cpython/Parser/tokenizer.c

Lines 2513 to 2517 in 6be7aee

tok->cur = (char *)current_tok->f_string_start;
tok->cur++;
tok->line_start = current_tok->f_string_multi_line_start;
int start = tok->lineno;
tok->lineno = tok->first_lineno;

This fails when the f-string closes on a different line because the expression spans multiple lines:

>>> f"blech{
... 1
... +
... 1
... }
... "

Or for example, if you do

>>> f"123{1}
...

In both cases the values of tok->cur and tok->line_start are garbage. I suspect the problem is that we are not updating current_tok->f_string_start and current_tok->f_string_multi_line_start when the f-string expression spans multiple lines and the string is not triple-quoted.

To check this, please compile with --with-address-sanitizer.
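A hedged standalone repro of the two cases above, just wrapping the REPL examples in compile() so they can be run from a script (on a build without the fix these exercised the garbage tok->cur / tok->line_start values, ideally under --with-address-sanitizer):

cases = [
    'f"blech{\n1\n+\n1\n}\n"',   # replacement field spans multiple lines
    'f"123{1}\n',                # f-string left unterminated on the next line
]
for src in cases:
    try:
        compile(src, "<string>", "exec")
    except SyntaxError as exc:
        print("SyntaxError:", exc)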

carljm added a commit to carljm/cpython that referenced this issue Apr 20, 2023
* main: (24 commits)
  pythongh-98040: Move the Single-Phase Init Tests Out of test_imp (pythongh-102561)
  pythongh-83861: Fix datetime.astimezone() method (pythonGH-101545)
  pythongh-102856: Clean some of the PEP 701 tokenizer implementation (python#103634)
  pythongh-102856: Skip test_mismatched_parens in WASI builds (python#103633)
  pythongh-102856: Initial implementation of PEP 701 (python#102855)
  pythongh-103583: Add ref. dependency between multibytecodec modules (python#103589)
  pythongh-83004: Harden msvcrt further (python#103420)
  pythonGH-88342: clarify that `asyncio.as_completed` accepts generators yielding tasks (python#103626)
  pythongh-102778: IDLE - make sys.last_exc available in Shell after traceback (python#103314)
  pythongh-103582: Remove last references to `argparse.REMAINDER` from docs (python#103586)
  pythongh-103583: Always pass multibyte codec structs as const (python#103588)
  pythongh-103617: Fix compiler warning in _iomodule.c (python#103618)
  pythongh-103596: [Enum] do not shadow mixed-in methods/attributes (pythonGH-103600)
  pythonGH-100530: Change the error message for non-class class patterns (pythonGH-103576)
  pythongh-95299: Remove lingering setuptools reference in installer scripts (pythonGH-103613)
  [Doc] Fix a typo in optparse.rst (python#103504)
  pythongh-101100: Fix broken reference `__format__` in `string.rst` (python#103531)
  pythongh-95299: Stop installing setuptools as a part of ensurepip and venv (python#101039)
  pythonGH-103484: Docs: add linkcheck allowed redirects entries for most cases (python#103569)
  pythongh-67230: update whatsnew note for csv changes (python#103598)
  ...

@pablogsal (Member Author) commented Apr 22, 2023

@cmaureir is going to work on allowing comments as per the specification.
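For reference, this is the kind of construct the specification allows once comments are supported (example assumed from the PEP, not taken from a linked PR): a comment inside a multi-line replacement field of a single-quoted f-string.

total = f"{
    1 + 1  # comments are allowed inside the replacement field under PEP 701
}"
print(total)  # prints 2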
