Grammar differences between Doc grammar and Pegen grammar for Python #100292

Open
kaby76 opened this issue Dec 16, 2022 · 0 comments
Labels
docs Documentation in the Doc dir

kaby76 commented Dec 16, 2022

This is a question about the differences between two versions of the Python grammar: the Pegen grammar at https://github.com/python/cpython/blob/9d2dcbbccdf4207249665b58c8bd28430c4f7afd/Grammar/python.gram and the Doc grammar at https://docs.python.org/3/reference/grammar.html.

I am mechanically scraping and refactoring the Pegen grammar. After removing the lookahead expressions, I diffed the refactored grammar against the Doc grammar. I expected the two to be the same, but they are not.

I noticed that the Doc grammar omits the lookahead expressions from some rules but keeps them in others.

For example, in the Doc grammar, the simple_stmt rule does not have the & lookahead expressions:

simple_stmt:
    | assignment
    | star_expressions 
    | return_stmt
    | import_stmt
    | raise_stmt
    | 'pass' 
    | del_stmt
    | yield_stmt
    | assert_stmt
    | 'break' 
    | 'continue' 
    | global_stmt
    | nonlocal_stmt

Similarly, for the compound_stmt rule, the & lookahead expressions have been deleted from the Doc grammar:

compound_stmt:
    | function_def
    | if_stmt
    | class_def
    | with_stmt
    | for_stmt
    | try_stmt
    | while_stmt
    | match_stmt

Of course, the simple_stmt rule in the .gram file contains the & lookahead expressions:

simple_stmt[stmt_ty] (memo):
    | assignment
    | e=star_expressions { _PyAST_Expr(e, EXTRA) }
    | &'return' return_stmt
    | &('import' | 'from') import_stmt
    | &'raise' raise_stmt
    | 'pass' { _PyAST_Pass(EXTRA) }
    | &'del' del_stmt
    | &'yield' yield_stmt
    | &'assert' assert_stmt
    | 'break' { _PyAST_Break(EXTRA) }
    | 'continue' { _PyAST_Continue(EXTRA) }
    | &'global' global_stmt
    | &'nonlocal' nonlocal_stmt

So far, this all makes sense: remove the lookahead because it is not needed in the CFG.
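
To make the mechanical step concrete, here is a minimal sketch of the kind of stripping described above. It is not the script I actually use and not how the Doc grammar is generated. The helper name strip_rule_line is hypothetical, and the regular expressions assume actions contain no nested braces and lookaheads contain no nested parentheses.

```python
import re

# Positive lookahead: & followed by a quoted token, a parenthesised group, or a name.
# The guards skip a quoted '&' operator token and the forced-token marker `&&`.
LOOKAHEAD = re.compile(r"(?<!['&])&(?!&)(?:'[^']*'|\([^()]*\)|\w+)\s*")


def strip_rule_line(line: str) -> str:
    """Reduce one line of a Pegen rule to a CFG-like form (hypothetical helper)."""
    # Drop C actions such as `{ _PyAST_Pass(EXTRA) }` (assumes no nested braces).
    line = re.sub(r"\{[^{}]*\}", "", line)
    # Drop the return type and memo flag in a rule header: `[stmt_ty] (memo)`.
    line = re.sub(r"^(\w+)\[[^\]]*\]\s*(\(memo\))?\s*:", r"\1:", line)
    # Drop variable bindings such as `e=`.
    line = re.sub(r"\b[A-Za-z_]\w*=", "", line)
    # Drop every positive lookahead, wherever it appears in the alternative.
    line = LOOKAHEAD.sub("", line)
    return line.rstrip()


if __name__ == "__main__":
    pegen_lines = [
        "simple_stmt[stmt_ty] (memo):",
        "    | e=star_expressions { _PyAST_Expr(e, EXTRA) }",
        "    | &'return' return_stmt",
        "    | &('import' | 'from') import_stmt",
    ]
    for pegen_line in pegen_lines:
        print(strip_rule_line(pegen_line))
```

Because this sketch removes every lookahead regardless of position, its output for simple_stmt matches the Doc grammar, but it would also remove the lookaheads that the Doc grammar keeps in the rules shown next.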

But further into the Doc grammar, some rules still list their lookahead:

del_stmt:
    | 'del' del_targets &(';' | NEWLINE) 
...
slash_no_default:
    | param_no_default+ '/' ',' 
    | param_no_default+ '/' &')' 
slash_with_default:
    | param_no_default* param_with_default+ '/' ',' 
    | param_no_default* param_with_default+ '/' &')' 

Can someone clarify why some lookahead expressions are deleted and others are not in the Doc grammar?
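
One pattern is visible in the snippets quoted in this issue: every lookahead missing from the Doc grammar is the first item of its alternative, while every lookahead that remains follows other items. This is only an observation from these examples, not a confirmed explanation. The sketch below classifies lookaheads by position so the pattern can be checked against the rest of the grammar. The function name lookahead_positions is hypothetical, and the regular expression assumes lookaheads contain no nested parentheses.

```python
import re

# Positive lookahead: & followed by a quoted token, a parenthesised group, or a name.
# The guards skip a quoted '&' operator token and the forced-token marker `&&`.
LOOKAHEAD = re.compile(r"(?<!['&])&(?!&)(?:'[^']*'|\([^()]*\)|\w+)")


def lookahead_positions(alternative: str) -> list:
    """Label each lookahead in one alternative as 'leading' or 'non-leading'."""
    # Remove indentation and the leading `|` so offset 0 is the first item.
    body = alternative.strip().lstrip("|").strip()
    positions = []
    for match in LOOKAHEAD.finditer(body):
        positions.append("leading" if match.start() == 0 else "non-leading")
    return positions


if __name__ == "__main__":
    samples = [
        "    | &'return' return_stmt",
        "    | 'del' del_targets &(';' | NEWLINE) ",
        "    | param_no_default+ '/' &')' ",
    ]
    for sample in samples:
        print(sample.strip(), lookahead_positions(sample))
```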

kaby76 added the docs (Documentation in the Doc dir) label Dec 16, 2022