New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add a new guide on how to understand and use the PEG parser #734
Conversation
dd6c82c
to
bd19385
I cannot review and make suggestions while you are pushing multiple changes. Let me know when done for a while. |
c582807
to
c05411d
Something is wrong with the CI, I am trying to understand what's going on first. Will mark it as draft meanwhile |
0b38c0a
to
e9217d1
Co-authored-by: Terry Jan Reedy <tjreedy@udel.edu>
Thanks for this, Pablo! It's truly amazing work! It'll be so helpful to newcomers that may be interested in working on parser-related tasks.
Also, I left some comments. Most of them are wording-related, so feel free to ignore them.
Co-authored-by: Lysandros Nikolaou <lisandrosnik@gmail.com> Co-authored-by: Petr Viktorin <encukou@gmail.com>
Co-authored-by: Terry Jan Reedy <tjreedy@udel.edu>
A few more nits here and there. Ignore them if you want to merge this ASAP.
Also, I think that it'd be nice if both of the example grammars generated the exact same parser, just in different languages. They're also somewhat outdated, so here's a suggestion for the ones to use, if you like:
C:
start[mod_ty]: a=expr_stmt* ENDMARKER { _PyAST_Module(a, NULL, p->arena) }
expr_stmt[stmt_ty]: a=expr NEWLINE { _PyAST_Expr(a, EXTRA) }
expr[expr_ty]:
| l=expr '+' r=term { _PyAST_BinOp(l, Add, r, EXTRA) }
| l=expr '-' r=term { _PyAST_BinOp(l, Sub, r, EXTRA) }
| term
term[expr_ty]:
| l=term '*' r=factor { _PyAST_BinOp(l, Mult, r, EXTRA) }
| l=term '/' r=factor { _PyAST_BinOp(l, Div, r, EXTRA) }
| factor
factor[expr_ty]:
| '(' e=expr ')' { e }
| atom
atom[expr_ty]:
| NAME
| NUMBER
Python:
start[ast.Module]: a=expr_stmt* ENDMARKER { ast.Module(body=a or [], type_ignores=[]) }
expr_stmt: a=expr NEWLINE { ast.Expr(value=a, LOCATIONS) }
expr:
| l=expr '+' r=term { ast.BinOp(left=l, op=ast.Add(), right=r, LOCATIONS) }
| l=expr '-' r=term { ast.BinOp(left=l, op=ast.Sub(), right=r, LOCATIONS) }
| term
term:
| l=term '*' r=factor { ast.BinOp(left=l, op=ast.Mult(), right=r, LOCATIONS) }
| l=term '/' r=factor { ast.BinOp(left=l, op=ast.Div(), right=r, LOCATIONS) }
| factor
factor:
| '(' e=expr ')' { e }
| atom
atom:
| NAME
| NUMBER
parser.rst
Outdated
* A "failure" result, in which case no input is consumed. | ||
|
||
Notice that "failure" results do not imply that the program is incorrect or a | ||
parsing failure because as the choice operator is ordered, a "failure" result |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Notice that "failure" results do not imply that the program is incorrect or a
parsing failure or a parsing failure
This is a bit hard to read at first and needs a second pass to understand correctly, I feel. Maybe something like this would work a bit better?
Notice that "failure" results do not imply that the program is incorrect, nor do they necessarily mean that the parsing has failed. Since the choice operator is ordered, a failure very often merely indicates "try the following option".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
parser.rst
Outdated
If the action is omitted and Python code is being generated, then a list | ||
with all the parsed expressions gets returned (this is meant for debugging). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think that's true anymore. I think I remember Matthieu (I guess the state of the python generator in the pegen repo is going to be the standard for this document as well as far as the Python generator goes) changing this to most closely resemble what the C generator does: If there's a single name it gets returned, otherwise the list with all parsed expressions gets returned. Do you think we should change it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, let's change it to the current behaviour
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, I still want to keep the Python generator in the CPython repo updated with the one in pegen
Co-authored-by: Lysandros Nikolaou <lisandrosnik@gmail.com>
Great point! I will update the text with a version of these grammars. I will tweak them a bit so they are a bit more similar (name of the macro and some other touches). |
Thanks |
No description provided.