Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bpo-42128: Structural Pattern Matching (PEP 634) #22917

Draft
wants to merge 217 commits into
base: master
from

Conversation

@brandtbucher
Copy link
Member

@brandtbucher brandtbucher commented Oct 23, 2020

PEP 634 has not yet been accepted, but we'd like to hit the ground running and get this into alphas as soon as it (hopefully) is.

Several people have volunteered to review the implementation, since it's so huge. Other reviews are very welcome, if anybody has a bit of time to pitch in. This touches tons of stuff: the parser, the compiler, the VM, the builtins, the stdlib, the tests... I'd like as many eyeballs as possible!

https://bugs.python.org/issue42128

brandtbucher and others added 30 commits May 24, 2020
Python/ceval.c Outdated Show resolved Hide resolved
static int compiler_enter_scope(struct compiler *, identifier, int, void *, int);
static void compiler_free(struct compiler *);
static basicblock *compiler_new_block(struct compiler *);
static int compiler_next_instr(basicblock *);
static int compiler_addop(struct compiler *, int);
static int compiler_addop_i(struct compiler *, int, Py_ssize_t);
static int compiler_addop_j(struct compiler *, int, basicblock *);
static int compiler_error(struct compiler *, const char *);
static int compiler_error(struct compiler *, const char *, ...);

This comment has been minimized.

@brandtbucher

brandtbucher Oct 23, 2020
Author Member

One other change I made here: compiler_error takes varargs now, like compiler_warn. I got tired of building strings for helpful error messages. 🙃

#define CHECK(X) \
if (!(X)) { \
return 0; \
}
Comment on lines +1567 to +1570

This comment has been minimized.

@brandtbucher

brandtbucher Oct 23, 2020
Author Member

This comes in handy to keep visual clutter down in the nastier compound patterns.

// To keep things simple, all compiler_pattern_* routines follow the convention
// of preserving TOS (the subject for the given pattern) and pushing either True
// (match) or False (no match) on top of it. We do this even for irrefutable
// patterns; the idea is that it's much easier to smooth out any redundant
// pushing, popping, and jumping in the peephole optimizer than to detect or
// predict it here.
Comment on lines +5449 to +5454

This comment has been minimized.

@brandtbucher

brandtbucher Oct 23, 2020
Author Member

NB!

Also, note that the current pattern compiler is designed to be as simple and easy-to-review as possible. The PEP gives us broad freedom to do trickier stuff, but I'm not going to begin work on a rewrite until we have a couple of releases using this simpler approach.

@bedevere-bot
Copy link

@bedevere-bot bedevere-bot commented Oct 24, 2020

🤖 New build scheduled with the buildbot fleet by @brandtbucher for commit 5589f77 🤖

If you want to schedule another build, you need to add the "🔨 test-with-buildbots" label again.

@brandtbucher
Copy link
Member Author

@brandtbucher brandtbucher commented Oct 25, 2020

So I see three categories of buildbot failures:

  • Memory leaks in test_io (all Refleak jobs).
  • Undefined references to _Py_Mangle and PyAST_CompileObject during LTO (all Fedora Stable LTO jobs).
  • Segfaults in testlib2to3 (PPC64LE RHEL7 LTO, s390x Fedora LTO PR, s390x RHEL8 LTO + PGO).

I’ve verified that all of these same failures occur on the 3.x buildbots, so I think they can be safely ignored here.

@pablogsal
Copy link
Member

@pablogsal pablogsal commented Oct 25, 2020

So I see three categories of buildbot failures:

Thanks @brandtbucher!

I am going to try to fix the segfaults first, as we have a release soon of 3.10 alpha 2 and I don't want those segfaults to be there.

@brandtbucher brandtbucher requested a review from ncoghlan Oct 27, 2020
{
// Don't blindly optimize the pattern as an expr; it plays by its own rules!
// Currently, this is only used to form complex/negative numeric constants.
switch (node_->kind) {

This comment has been minimized.

@ncoghlan

ncoghlan Oct 29, 2020
Contributor

Something I noticed while working on PEP 642's reference implementation is that this ast_opt code is currently relying on the parser to keep the operators that it doesn't handle away from it, so it's easy to segfault by feeding it your own AST tree, without invoking the parser.

While filling in validate_pattern in ast.c will be the main answer to that, I think there are some genuinely missing cases here as well (e.g. +1 will fail the UnaryOp check).

This comment has been minimized.

@isidentical

isidentical Oct 29, 2020
Member

While filling in validate_pattern in ast.c will be the main answer to that, I think there are some genuinely missing cases here as well (e.g. +1 will fail the UnaryOp check).

The drafted AST validator should prevent +1, but I can't say for sure is whether a full coverage exists (I tried to implement it by looking at the grammar but it would be nice if @brandtbucher can do a second pass on it):

>>> t = ast.parse("match test:\n\tcase -1: pass")
>>> t.body[0].cases[0].pattern.op = ast.UAdd()
>>> print(ast.unparse(t))
match test:
    case +1:
        pass
>>> compile(t, "<test>", "exec")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: only USub operator is allowed for UnaryOp nodes inside of a pattern

This comment has been minimized.

@brandtbucher

brandtbucher Oct 29, 2020
Author Member

Something I noticed while working on PEP 642's reference implementation is that this ast_opt code is currently relying on the parser to keep the operators that it doesn't handle away from it, so it's easy to segfault by feeding it your own AST tree, without invoking the parser.

I'm not sure it's easy to actually segfault the compiler with malformed patterns (at least --with-pydebug); most of the assumptions it makes here and in compile.c about what the parser can emit are guarded either by assert(...) or Py_UNREACHABLE() calls. (I'm probably a heavier user of assert than the historical authors of these files). As I see it, the goal of PyAST_Validate is to make sure these checks cannot be hit by people playing with ast and compile in userland.

So checking those same assumptions in PyAST_Validate should be sufficient (and more should be added if missing).

While filling in validate_pattern in ast.c will be the main answer to that, I think there are some genuinely missing cases here as well (e.g. +1 will fail the UnaryOp check).

For what it's worth, +1 isn't valid. Full grammar here.

I can't say for sure is whether a full coverage exists (I tried to implement it by looking at the grammar but it would be nice if @brandtbucher can do a second pass on it)

Ha! I knew I was forgetting something. I promised you a review, then promptly got distracted with ceval.c and forgot. Thanks for reminding me.

This comment has been minimized.

@brandtbucher

brandtbucher Oct 29, 2020
Author Member

@isidentical can you open up a PR in your own repo (merging isidentical/cpython:patma-ast-validators into isidentical/cpython:master)? That way I can leave review comments.

This comment has been minimized.

This comment has been minimized.

@brandtbucher

brandtbucher Oct 29, 2020
Author Member

Great, thanks!

@brandtbucher
Copy link
Member Author

@brandtbucher brandtbucher commented Oct 29, 2020

Also, if any core devs are eavesdropping and haven't voted for Batuhan's promotion yet, please do so! He's absolutely earned it.

@markshannon
Copy link
Contributor

@markshannon markshannon commented Oct 30, 2020

def f(t):
    a, b, p, q, x, y = [None]*6
    match t:
        case []:
            pass
        case [x]:
            y = 2
        case [a, b] if a == b:
            pass
        case [p, q]:
            pass
    print(a,b,p,q,x,y)

f((1,2))

This prints 1 2 1 2 None None, instead of None None 1 2 None None.
Since [a, b] if a == b doesn't match, why are values assigned to a and b?
[x] doesn't match, and x is not assigned to.

@brandtbucher
Copy link
Member Author

@brandtbucher brandtbucher commented Oct 30, 2020

Hi Mark.

PEP 634 says (in regard to name bindings in patterns):

These may happen earlier or later depending on the implementation strategy, the only constraint being that capture variables must be set before guards that use them explicitly are evaluated.

In general, there are two points to keep in mind for cases like yours:

  • Cases consist of patterns and guards. These are separate conditions. While the language of the PEP allows checking guard conditions while matching the pattern (which this implementation doesn't do), the main constraint is that any names used by the guard have already been bound, as you experienced.

  • The implementation, while still quite naïve at this point, will quit matching as soon as a pattern is proven to be a non-match. In your first and second cases, we jump to the next case as soon as we observe a sequence of incompatible length.

@markshannon
Copy link
Contributor

@markshannon markshannon commented Oct 30, 2020

So the bug is in the PEP, not the implementation?

@brandtbucher
Copy link
Member Author

@brandtbucher brandtbucher commented Oct 30, 2020

So the bug is in the PEP, not the implementation?

It's not a bug, it's part of the spec. We even mention it twice:

During failed pattern matches, some subpatterns may succeed. For example, while matching the pattern (0, x, 1) with the value [0, 1, 2], the subpattern x may succeed if the list elements are matched from left to right. The implementation may choose to either make persistent bindings for those partial matches or not. User code including a match statement should not rely on the bindings being made for a failed match, but also shouldn't assume that variables are unchanged by a failed match. This part of the behavior is left intentionally unspecified so different implementations can add optimizations, and to prevent introducing semantic restrictions that could limit the extensibility of this feature.

@markshannon
Copy link
Contributor

@markshannon markshannon commented Oct 30, 2020

Why inflict that on users?
Why not say "all variables in the matching pattern will be assigned, and none in patterns that don't match."
It sounds like you are deliberately trying to trip people up.

And please don't use "optimization" as an excuse, that's just lazy.

@brandtbucher
Copy link
Member Author

@brandtbucher brandtbucher commented Oct 30, 2020

This isn't the place to discuss PEP 634.

Perhaps take it up in gvanrossum/patma (relevant discussion in gvanrossum/patma#110). It's been pretty quiet lately since the spec has matured.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked issues

Successfully merging this pull request may close these issues.

None yet

9 participants
You can’t perform that action at this time.