String annotations [PEP 563] #4390
Conversation
I'm guessing there are still crasher bugs in here... E.g.
There are crashes because the work is unfinished. Some parts are not implemented yet (in particular subscripting). Error checking is minimal, if present at all. All concatenation is quadratic time; I think it is worth implementing a simple accumulator that uses an overallocated array and makes concatenation linear, or you can reuse an existing one.

Yes, maybe unparsing the CST could be much simpler. But we don't have feature flags at this stage.
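To make the accumulator suggestion concrete, here is a minimal standalone sketch (names like Writer and writer_append are invented for illustration, not taken from the PR): appends stay linear in total because the buffer is overallocated geometrically instead of being resized to the exact length on every concatenation.

```c
#include <stdlib.h>
#include <string.h>

typedef struct {
    char *buf;      /* accumulated text, NUL-terminated */
    size_t len;     /* bytes used, excluding the NUL */
    size_t cap;     /* bytes allocated */
} Writer;           /* initialize with: Writer w = {NULL, 0, 0}; */

static int
writer_append(Writer *w, const char *s)
{
    size_t n = strlen(s);
    if (w->len + n + 1 > w->cap) {
        size_t newcap = w->cap ? w->cap * 2 : 64;
        while (newcap < w->len + n + 1)
            newcap *= 2;                    /* geometric overallocation */
        char *p = realloc(w->buf, newcap);
        if (p == NULL)
            return -1;                      /* caller propagates the error */
        w->buf = p;
        w->cap = newcap;
    }
    memcpy(w->buf + w->len, s, n + 1);      /* copy including the NUL */
    w->len += n;
    return 0;
}
```

An unparser can then call writer_append for every emitted token and pay only for occasional reallocations, instead of building a brand-new string for each concatenation.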
ann_as_str = PyUnicode_InternFromString(ann_as_charp);
if (!ann_as_str)
    return 0;
ADDOP_O(c, LOAD_CONST, ann_as_str, consts);
char *ann_as_charp;
static PyObject *ann_as_str;

ann_as_charp = PyAST_StrFromAstExpr(annotation, 1, 1);
wrap_parens(char * str)
{
    char *result;
    result = PyMem_RawMalloc(strlen(str) + 3);
    return str_for_ast_list(e);
case Tuple_kind:
    return str_for_ast_tuple(e, omit_parens);
}
If unparsing the AST, I would move the code into a separate file. There will be a lot of code, comparable in size to …
Alright, I'm going to:
Sounds great! I am pretty happy to see where this is going; I'd like to see it in a solid state by the time beta 1 comes around.
All comments from Serhiy's review have been acted upon, and the import is renamed back to "annotations" as discussed above. The only missing piece in the implementation is f-string support, but my battery will die soon so I wanted to push this out for you to look at. It shouldn't segfault anymore; trying to use f-strings raises an exception instead.
Hm, the clang Travis CI build is failing due to invalid whitespace. This diff is suggesting a lot of changes, but not on the single line that I modified.
I added a commit that fixes the whitespace according to the generated patch above so that I can see Travis CI passing. We can decide what to do with it later. AppVeyor is failing because we need to modify PCbuild/*, but I don't have access to a Windows box. The NEWS entry is not there yet since Blurb is tied to BPO issues, and I'm wondering whether creating dummy issues for PEP work isn't redundant. I asked @larryhastings; we can also deal with this later.
Changes:
This is ready for another review pass, @serhiy-storchaka. The only bit left is f-strings, which is going to be a bit tedious, so I'm holding off on it until after a new round of feedback :-)
How do we organize the updates needed so that this works?

    def f() -> List['int']:
        ...
    assert get_type_hints(f)['return'] == List[int]

I suppose this can be part of the same PR, since this is not necessary in the backported version on PyPI. Also my updates to …
@ilevkivskyi, I want to fix …
@ambv OK, I am fine with this as well.
It will take some time to review such a large change, but one comment I can make now. The unparser adds parentheses to group subexpressions. They are added even if not strictly needed, e.g. in a + (b * (c ** d)). The problem is not that redundant parentheses make an expression less readable; the problem is that they increase stack consumption when the expression is parsed again. It is possible that the original expression can be parsed, but parsing the unparsed expression will fail or even crash.

I already encountered a similar problem when working on the parser of plural-form expressions in gettext.py. A C-like syntax is parsed and converted to Python syntax, and the result is evaluated. I minimized the use of parentheses: if the subexpression operator has higher priority than the operator of the outer expression, parentheses are not added.

This is not a blocker, and we can solve this problem later, but you can think about it while I'm making the review.
Good observation. Also, mypy doesn't like redundant parentheses in type expressions. (Though it won't ever encounter these, since it parses the source, so maybe it doesn't matter.)
--
--Guido van Rossum (python.org/~guido)
ast_unparse.c is a close translation of the relevant parts of Tools/unparse.py. I didn't want to create an entire new thing from scratch as I'd miss edge cases that way for sure.
Tools/unparse.py uses parens liberally as this is the simplest way to ensure the order of operations is preserved. To know if it's safe to omit a paren, a sub-expression would need to know where it's being emitted (e.g. the super-expression). That's way more complicated than what is already done. So, Tools/unparse.py makes no effort to omit parens when they aren't needed.
This, as Guido points out, produces types that mypy is unable to parse (like "Dict[(str, int)]"). That won't fly for us, so my C implementation already allows omitting parens under certain circumstances (like the tuple index in the previous example). There are many tests around this. The goal was not to put any spurious parens in typical expressions used in typing. I didn't focus on minimizing parens in expressions which aren't valid types per PEP 484.
Two enhancements that are definitely possible:
- math operation ordering; and
- omitting comma-catching parens if there is no comma in the inner expression (like in comprehensions, dict literals, etc.)
I'll look into this next week. I am worried that such a cleanup effort is likely to lead to bugs (expressions that end up semantically different from their original form). A spurious pair of parens is way less harmful than that.
Speaking of harm, Serhiy, do you have a legit example where we could get a stack overflow due to spurious parens? While this is theoretically possible, I think we'd have to maliciously create an expression pathological enough to trigger this condition. Are there any other parse errors or crashes that spurious parens could induce?
Serhiy, also, thank you for spending your time on reviewing this. I appreciate it. Your first round of review was already super helpful! Let me know if I can do anything to make review easier for you.
--
Best regards,
Łukasz Langa
Yes, in the case of … . This problem can be solved by assigning a numerical priority level to expressions and omitting parentheses only when the current priority level is higher than the level required by the super-expression (the sub-expression, rather than the super-expression, is responsible for adding parentheses). I have two more questions.
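As a rough illustration of the priority-level idea (a self-contained sketch with invented names, not the PR's code): each node records its precedence, the parent passes down the minimum level at which a child may appear without parentheses, and parentheses are emitted only when the child's precedence falls below that level.

```c
#include <stdio.h>
#include <string.h>

enum { PR_ADD = 1, PR_MUL, PR_POW, PR_ATOM };   /* low -> high precedence */

typedef struct Expr {
    const char *op;                 /* NULL for a leaf */
    const char *name;               /* leaf text when op == NULL */
    int level;                      /* precedence of this node */
    const struct Expr *left, *right;
} Expr;

static void
append_expr(char *out, const Expr *e, int required)
{
    int parens = e->level < required;   /* parenthesize only when needed */
    if (parens)
        strcat(out, "(");
    if (e->op == NULL) {
        strcat(out, e->name);
    }
    else {
        /* Left-associative operators: the right child needs one level more.
           A right-associative operator like ** would mirror this. */
        append_expr(out, e->left, e->level);
        strcat(out, e->op);
        append_expr(out, e->right, e->level + 1);
    }
    if (parens)
        strcat(out, ")");
}

int
main(void)
{
    /* the tree for: a + b * c ** d */
    Expr a = {NULL, "a", PR_ATOM}, b = {NULL, "b", PR_ATOM};
    Expr c = {NULL, "c", PR_ATOM}, d = {NULL, "d", PR_ATOM};
    Expr e_pow = {" ** ", NULL, PR_POW, &c, &d};
    Expr e_mul = {" * ", NULL, PR_MUL, &b, &e_pow};
    Expr e_add = {" + ", NULL, PR_ADD, &a, &e_mul};

    char buf[128] = "";
    append_expr(buf, &e_add, 0);
    printf("%s\n", buf);            /* prints: a + b * c ** d */
    return 0;
}
```

Run on the tree for a + b * c ** d, this prints the expression without any of the redundant parentheses a naive unparser would add; this mirrors the gettext.py approach described earlier, with right-associative operators needing the mirrored treatment noted in the comment.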
1. The string will be embedded in code objects. I don't want to have to call out to Python code when creating code objects.
2. Yes, it should support anything that's currently legal in an expression, not just what we think are valid type expressions. Otherwise it's not backwards compatible. (Also, otherwise the whole thing about string literals was bogus.)
The added comments are mostly style comments (PEP 7) and suggestions for cleaning up the code, but there are several errors.
@@ -15,6 +15,10 @@ PyAPI_FUNC(mod_ty) PyAST_FromNodeObject(
    PyCompilerFlags *flags,
    PyObject *filename,
    PyArena *arena);
PyAPI_FUNC(PyObject *) PyAST_UnicodeFromAstExpr(
serhiy-storchaka (Member), Nov 27, 2017:
For uniformity with other names, it would be better to name this function something like PyAST_AsUnicode. And I think it would be better to make it private and not include it in the limited API for now.

serhiy-storchaka (Member), Dec 2, 2017:
Actually, since this function supports only ASTs corresponding to expressions, PyAST_AsUnicode is not a good name. Maybe just add an underscore to PyAST_UnicodeFromAstExpr? Or _PyAST_ExprAsUnicode?
int co_nlocals;     /* #local variables */
int co_stacksize;   /* #entries needed for evaluation stack */
int co_flags;       /* CO_..., see below */
int co_argcount;    /* #arguments, except *args */
serhiy-storchaka (Member), Nov 27, 2017:
Are all these differences just differences between tabs and spaces? It would be better to do this in a separate commit and leave only the related changes here.

ambv (Contributor), Dec 2, 2017:
I did make this a separate commit. As I commented on it, it's only to make "patchcheck" happy.

serhiy-storchaka (Member), Dec 2, 2017:
This was already done in #4583. You just need to merge with master.
template = dedent(
    """
    from __future__ import annotations
    def f() -> {ann}:
@@ -357,6 +357,7 @@
    <ClCompile Include="..\Python\_warnings.c" />
    <ClCompile Include="..\Python\asdl.c" />
    <ClCompile Include="..\Python\ast.c" />
    <ClCompile Include="..\Python\ast_unparse.c" />
static int
compiler_visit_annexpr(struct compiler *c, expr_ty annotation)
{
    static PyObject *ann_as_str;
if (ret == -1)
    return ret;

if (e->v.Yield.value)
serhiy-storchaka (Member), Nov 27, 2017:
The opening brace should be at the end of the same line as the condition (unless the condition is multiline). PEP 7.
}

static int
append_ast_str(_PyUnicodeWriter *writer, expr_ty e, bool omit_string_brackets)
serhiy-storchaka (Member), Nov 27, 2017:
Given the meaning of this function, it would be better to name the parameter raw.
case Num_kind:
    return append_repr(writer, e->v.Num.n);
case Str_kind:
    return append_ast_str(writer, e, omit_string_brackets);
ambv (Contributor), Dec 31, 2017:
Yes; however, this special handling of strings has since been dropped from the PEP. I'll remove it in the next commit.
case Bytes_kind:
    return append_repr(writer, e->v.Bytes.s);
case Ellipsis_kind:
    return append_charp(writer, "...");
serhiy-storchaka (Member), Nov 27, 2017:
Constant_kind can be the Ellipsis too, in which case this outputs Ellipsis instead of "...".
ambv (Contributor), Dec 31, 2017:
Well, yes, because apparently the user wrote "Ellipsis" and not "...". Do you think we should convert to "..." in this case, too?
While this is going on, let me continue to emphasize that I plan to accept PEP 563. I don't feel the need to wait until every nit in the code review is addressed, but I do think we need to have agreement on the issue of whether to generate redundant parentheses (e.g. …).
Today, I'm going to make a pass to address Serhiy's (very detailed!) feedback, esp. the bugs. I'll fix the tab confusion, too. Next, with those things out of the way, I'm going to attempt what Serhiy is suggesting (having a priority list for operators to avoid unnecessary parens). That's going to happen later in the week.

I do agree that having fewer rather than more parens is typically more readable. Apart from this reason, why do we want to avoid the extra parens? Note that it's not about preserving the original formatting; that one could very well have had spurious parens that we are no longer aware of. We are also not preserving whitespace. So, if it's not the original formatting, then what, the stack overflows? I really don't think we will practically hit a stack overflow due to parens in pathological expressions.

If you both consider leaving out the parens important, I'll work on it. It's not free though (it takes time and is to some degree risky, as I mentioned above). There are also other things that I must complete for PEP 563 to work as intended, like f-strings in this diff, or forward-ref handling in typing.py in a separate diff.
The readability issue will occasionally come up when people are starting to debug annotations. There's also the hypothetical issue that if someone were to generate stub files based on these, the redundant parentheses may elicit complaints from either mypy or pytype, both of which employ a limited syntax for annotations. That said, we can always improve this later, so prioritize as you see fit.
As I said in my comment on Nov 23rd, to the best of my knowledge, the current state of the diff already omits all cases of spurious parens that occur in valid type annotations.
Don't spend your time fighting the extra parens if it distracts you from higher-priority tasks. If you don't solve this problem in your PR, I'm going to do it after merging. I suppose it will not add too much complexity; your code already avoids producing the extra parens in many cases, and that is enough for the initial implementation.

One more consideration: could it help to introduce macros like the …?
I was thinking about this when originally writing the code, but I wasn't sure whether macros are reserved just for special usage, so I avoided them. If you'd like, I can refactor the file to use a macro instead; you're right, that should shorten it quite a bit.
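For example (a hypothetical name, and assuming the append_* helpers keep the -1-on-error convention they use elsewhere in the file), an error-propagating macro in the spirit of compile.c's ADDOP-style macros hides the repeated check-and-bail pattern:

```c
/* Hypothetical helper, illustrative only; not the PR's actual macro. */
#define APPEND_CHARP(writer, str)                       \
    do {                                                \
        if (append_charp((writer), (str)) < 0)          \
            return -1;                                  \
    } while (0)
```

Each call site then shrinks from an if/return pair to a single line, which is where most of the reduction in file length would come from.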
The string form is recovered by unparsing the AST.
This is required for PEP 563 and as such it implements only the part of the unparsing process that covers expressions.
Alright, this is fully rebased without conflicts and all comments from previous code review are addressed. Things left to do:
Special handling of strings removed. I plan to add the missing f-string support in the first week of January so that the implementation is hopefully mergeable in 3.7.0a4.
Alright, @serhiy-storchaka, this is complete now, including f-string support! I realize it's pretty last minute, sorry for that. A piece of useless statistics: this pull request was implemented in full during intercontinental flights. There's something very tranquil about sitting in one place for 10+ hours with no distractions.
class B:
    ...

Since this change breaks compatibility, the new behavior can be enabled per module using from __future__ import annotations.
Overall looks good. Code in …
This is unfinished work by @ambv. UPDATE: this is ready for review now. I'm adding it here because patching and reviewing are easier (for me anyway) when it's in PR form. Also, @serhiy-storchaka, your eye would be appreciated, esp. for the hairy AST unparsing code in C. (Also, if you had to do this from scratch, would it be easier to unparse the CST instead?)