Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add a new guide on how to understand and use the PEG parser #734

Merged
merged 19 commits into from Aug 4, 2021

Conversation

pablogsal
Copy link
Member

@pablogsal pablogsal commented Jul 31, 2021

No description provided.

@pablogsal pablogsal force-pushed the parser branch 8 times, most recently from dd6c82c to bd19385 Jul 31, 2021
@terryjreedy
Copy link
Member

@terryjreedy terryjreedy commented Jul 31, 2021

I cannot review and make suggestions while you are pushing multiple changes. Let me know when done for a while.

@pablogsal pablogsal force-pushed the parser branch 4 times, most recently from c582807 to c05411d Jul 31, 2021
@pablogsal
Copy link
Member Author

@pablogsal pablogsal commented Jul 31, 2021

I cannot review and make suggestions while you are pushing multiple changes. Let me know when done for a while.

Something is wrong with the CI, I am trying to understand what's going on first. Will mark it as draft meanwhile

@pablogsal pablogsal marked this pull request as draft Jul 31, 2021
@pablogsal pablogsal force-pushed the parser branch 14 times, most recently from 0b38c0a to e9217d1 Jul 31, 2021
parser.rst Outdated Show resolved Hide resolved
parser.rst Outdated Show resolved Hide resolved
Co-authored-by: Terry Jan Reedy <tjreedy@udel.edu>
Copy link
Contributor

@lysnikolaou lysnikolaou left a comment

Thanks for this, Pablo! It's truly amazing work! It'll be so helpful to newcomers that may be interested in working on parser-related tasks.

Also, I left some comments. Most of them are wording-related, so feel free to ignore them.

parser.rst Outdated Show resolved Hide resolved
parser.rst Outdated Show resolved Hide resolved
parser.rst Outdated Show resolved Hide resolved
parser.rst Outdated Show resolved Hide resolved
parser.rst Outdated Show resolved Hide resolved
parser.rst Outdated Show resolved Hide resolved
parser.rst Outdated Show resolved Hide resolved
parser.rst Outdated Show resolved Hide resolved
parser.rst Outdated Show resolved Hide resolved
parser.rst Outdated Show resolved Hide resolved
Copy link
Member

@encukou encukou left a comment

Thank you for writing this up!

parser.rst Outdated Show resolved Hide resolved
parser.rst Outdated Show resolved Hide resolved
parser.rst Outdated Show resolved Hide resolved
parser.rst Outdated Show resolved Hide resolved
parser.rst Outdated Show resolved Hide resolved
pablogsal and others added 3 commits Aug 2, 2021
Co-authored-by: Lysandros Nikolaou <lisandrosnik@gmail.com>
Co-authored-by: Petr Viktorin <encukou@gmail.com>
parser.rst Outdated Show resolved Hide resolved
Co-authored-by: Terry Jan Reedy <tjreedy@udel.edu>
parser.rst Outdated Show resolved Hide resolved
Copy link
Contributor

@lysnikolaou lysnikolaou left a comment

A few more nits here and there. Ignore them if you want to merge this ASAP.

Also, I think that it'd be nice if both of the example grammars generated the exact same parser, just in different languages. They're also somewhat outdated, so here's a suggestion for the ones to use, if you like:

C:

start[mod_ty]: a=expr_stmt* ENDMARKER { _PyAST_Module(a, NULL, p->arena) }
expr_stmt[stmt_ty]: a=expr NEWLINE { _PyAST_Expr(a, EXTRA) }

expr[expr_ty]:
    | l=expr '+' r=term { _PyAST_BinOp(l, Add, r, EXTRA) }
    | l=expr '-' r=term { _PyAST_BinOp(l, Sub, r, EXTRA) }
    | term

term[expr_ty]:
    | l=term '*' r=factor { _PyAST_BinOp(l, Mult, r, EXTRA) }
    | l=term '/' r=factor { _PyAST_BinOp(l, Div, r, EXTRA) }
    | factor

factor[expr_ty]:
    | '(' e=expr ')' { e }
    | atom

atom[expr_ty]:
    | NAME
    | NUMBER

Python:

start[ast.Module]: a=expr_stmt* ENDMARKER { ast.Module(body=a or [], type_ignores=[]) }
expr_stmt: a=expr NEWLINE { ast.Expr(value=a, LOCATIONS) }

expr:
    | l=expr '+' r=term { ast.BinOp(left=l, op=ast.Add(), right=r, LOCATIONS) }
    | l=expr '-' r=term { ast.BinOp(left=l, op=ast.Sub(), right=r, LOCATIONS) }
    | term

term:
    | l=term '*' r=factor { ast.BinOp(left=l, op=ast.Mult(), right=r, LOCATIONS) }
    | l=term '/' r=factor { ast.BinOp(left=l, op=ast.Div(), right=r, LOCATIONS) }
    | factor

factor:
    | '(' e=expr ')' { e }
    | atom

atom:
    | NAME
    | NUMBER

parser.rst Outdated
* A "failure" result, in which case no input is consumed.

Notice that "failure" results do not imply that the program is incorrect or a
parsing failure because as the choice operator is ordered, a "failure" result
Copy link
Contributor

@lysnikolaou lysnikolaou Aug 3, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Notice that "failure" results do not imply that the program is incorrect or a
parsing failure or a parsing failure

This is a bit hard to read at first and needs a second pass to understand correctly, I feel. Maybe something like this would work a bit better?

Notice that "failure" results do not imply that the program is incorrect, nor do they necessarily mean that the parsing has failed. Since the choice operator is ordered, a failure very often merely indicates "try the following option".

Copy link
Member Author

@pablogsal pablogsal Aug 3, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 Yeah, I agree, this version reads better. Thanks!

parser.rst Outdated Show resolved Hide resolved
parser.rst Outdated Show resolved Hide resolved
parser.rst Outdated Show resolved Hide resolved
parser.rst Outdated Show resolved Hide resolved
parser.rst Outdated Show resolved Hide resolved
parser.rst Outdated Show resolved Hide resolved
parser.rst Outdated Show resolved Hide resolved
parser.rst Outdated Show resolved Hide resolved
parser.rst Outdated
If the action is omitted and Python code is being generated, then a list
with all the parsed expressions gets returned (this is meant for debugging).
Copy link
Contributor

@lysnikolaou lysnikolaou Aug 3, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think that's true anymore. I think I remember Matthieu (I guess the state of the python generator in the pegen repo is going to be the standard for this document as well as far as the Python generator goes) changing this to most closely resemble what the C generator does: If there's a single name it gets returned, otherwise the list with all parsed expressions gets returned. Do you think we should change it?

Copy link
Member Author

@pablogsal pablogsal Aug 3, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, let's change it to the current behaviour

Copy link
Member Author

@pablogsal pablogsal Aug 3, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, I still want to keep the Python generator in the CPython repo updated with the one in pegen

Co-authored-by: Lysandros Nikolaou <lisandrosnik@gmail.com>
@pablogsal
Copy link
Member Author

@pablogsal pablogsal commented Aug 3, 2021

start[ast.Module]: a=expr_stmt* ENDMARKER { ast.Module(body=a or [], type_ignores=[]) }
expr_stmt: a=expr NEWLINE { ast.Expr(value=a, LOCATIONS) }

expr:
| l=expr '+' r=term { ast.BinOp(left=l, op=ast.Add(), right=r, LOCATIONS) }
| l=expr '-' r=term { ast.BinOp(left=l, op=ast.Sub(), right=r, LOCATIONS) }
| term

term:
| l=term '*' r=factor { ast.BinOp(left=l, op=ast.Mult(), right=r, LOCATIONS) }
| l=term '/' r=factor { ast.BinOp(left=l, op=ast.Div(), right=r, LOCATIONS) }
| factor

factor:
| '(' e=expr ')' { e }
| atom

atom:
| NAME
| NUMBER

Great point! I will update the text with a version of these grammars. I will tweak them a bit so they are a bit more similar (name of the macro and some other touches).

@pablogsal pablogsal requested a review from lysnikolaou Aug 3, 2021
Copy link
Contributor

@lysnikolaou lysnikolaou left a comment

Really great work, @pablogsal. Thanks!

@pablogsal
Copy link
Member Author

@pablogsal pablogsal commented Aug 4, 2021

Thanks ❤️

@pablogsal pablogsal merged commit 4edf354 into python:master Aug 4, 2021
2 checks passed
@pablogsal pablogsal deleted the parser branch Aug 4, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked issues

Successfully merging this pull request may close these issues.

None yet

5 participants