Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

String annotations [PEP 563] #4390

Merged
merged 7 commits into from Jan 26, 2018
Merged

String annotations [PEP 563] #4390

merged 7 commits into from Jan 26, 2018

Conversation

@gvanrossum
Copy link
Member

@gvanrossum gvanrossum commented Nov 14, 2017

This is unfinished work by @ambv. UPDATE: this is ready for review now.

I'm adding it here because patching and reviewing are easier (for me anyway) when it's in PR form. Also, @serhiy-storchaka your eye would be appreciated, esp. for the hairy AST unparsing code in C. (Also, if you had to do this from scratch, would it be easier to unparse the CST instead?)

@gvanrossum
Copy link
Member Author

@gvanrossum gvanrossum commented Nov 14, 2017

I'm guessing there are still crasher bugs in here... E.g.

>>> from __future__ import string_annotations
>>> def f(a: list[str]): pass
... 
Segmentation fault: 11
Copy link
Member

@serhiy-storchaka serhiy-storchaka left a comment

There are crashes because the work is unfinished. Some parts still are not implemented (in particularly subscribing). Error checking is minimal if exists.

All concatenation has quadratic time. I think it is worth to implement simple accumulator that uses overallocated array and makes concatenations for linear time. Or you can reuse existing _PyBytesWriter or _PyUnicodeWriter.

Yes, maybe unparsing the CST could be much simpler. But we don't have feature flags at this stage.

ann_as_str = PyUnicode_InternFromString(ann_as_charp);
if (!ann_as_str)
return 0;
ADDOP_O(c, LOAD_CONST, ann_as_str, consts);

This comment has been minimized.

@serhiy-storchaka

serhiy-storchaka Nov 14, 2017
Member

ann_as_str is leaked. Use ADDOP_N.

char *ann_as_charp;
static PyObject *ann_as_str;

ann_as_charp = PyAST_StrFromAstExpr(annotation, 1, 1);

This comment has been minimized.

@serhiy-storchaka

serhiy-storchaka Nov 14, 2017
Member

Check if not NULL.

wrap_parens(char * str)
{
char *result;
result = PyMem_RawMalloc(strlen(str) + 3);

This comment has been minimized.

@serhiy-storchaka

serhiy-storchaka Nov 14, 2017
Member

Results of all allocations should be checked for NULL.

return str_for_ast_list(e);
case Tuple_kind:
return str_for_ast_tuple(e, omit_parens);
}

This comment has been minimized.

@serhiy-storchaka

serhiy-storchaka Nov 14, 2017
Member

Add the default case. At least for debugging.

@serhiy-storchaka
Copy link
Member

@serhiy-storchaka serhiy-storchaka commented Nov 14, 2017

If unparse the AST I would move the code into a separate file. There will be a lot of code, comparable with the size of compile.c.

@ambv
Copy link
Contributor

@ambv ambv commented Nov 18, 2017

Alright, I'm going to:

  • change the future name back to annotations (yay!)
  • stick to AST unparsing
  • move this work to a separate file
  • change concatenation to use the accumulator pattern (reusing _PyBytesWriter would be great)
  • finish the functionality
@gvanrossum
Copy link
Member Author

@gvanrossum gvanrossum commented Nov 18, 2017

Sounds great! I am pretty happy to see where this is going, I'd like to see it in a solid state by the time beta 1 comes around.

@ambv ambv force-pushed the ambv:string_annotations branch from b49dfa1 to c56d5bd Nov 18, 2017
@ambv ambv changed the title [WIP] String annotations String annotations [PEP 563] Nov 18, 2017
@ambv ambv self-assigned this Nov 18, 2017
@ambv ambv added the skip issue label Nov 18, 2017
@ambv
Copy link
Contributor

@ambv ambv commented Nov 18, 2017

All comments from Serhiy's review acted upon, the import renamed back to "annotations" like commented above. The only missing piece in the implementation is f-string support but my battery will die soon so I wanted to push this out for you to look at.

Shouldn't segfault anymore, trying to use f-strings raises an exception instead.

@ambv ambv force-pushed the ambv:string_annotations branch from c56d5bd to 1e71626 Nov 18, 2017
@ambv
Copy link
Contributor

@ambv ambv commented Nov 18, 2017

Hm, the clang Travis CI build is failing due to invalid whitespace. make patchcheck is complaining about Include/code.h, suggesting the following diff:
https://gist.github.com/ambv/9f0874d56c7cf0d5ce98dfef38de0ce6

This diff is suggesting a lot of changes but not on the single line that I modified ¯\_(ツ)_/¯

@ambv
Copy link
Contributor

@ambv ambv commented Nov 18, 2017

I added a commit that fixes the whitespace according to the generated patch above so that I can see Travis CI passing. We can decide what to do with it later.

AppVeyor is failing because we need to modify PCbuild/* but I don't have access to a Windows box.

NEWS entry is not there yet since Blurb is tied to BPO issues and I'm wondering whether creating dummy issues for PEP work isn't redundant? I asked @larryhastings, we can also deal with this later.

@gvanrossum gvanrossum requested a review from python/windows-team as a code owner Nov 18, 2017
@ambv ambv added the skip news label Nov 18, 2017
@ambv ambv force-pushed the ambv:string_annotations branch from 4cdb25d to 98231af Nov 20, 2017
@ambv ambv added skip news and removed skip news labels Nov 20, 2017
@ambv
Copy link
Contributor

@ambv ambv commented Nov 20, 2017

Changes:

  • Rebased on top of latest master
  • Added a NEWS entry (manually since this is a PEP; the bot doesn't recognize this, therefore the skip news label has to stay)
  • Moved the changes to PCbuild to the respective implementation commit ("Implement unparsing...")

This is ready for another review pass, @serhiy-storchaka. The only bit left is f-strings which is going to be a bit tedious so I'm waiting with it after a new round of feedback :-)

@ilevkivskyi
Copy link
Contributor

@ilevkivskyi ilevkivskyi commented Nov 22, 2017

How, do we organize the updates to typing.get_type_hints() to work with "doubly quoted" strings? I mean this should still work with the __future__ import:

def f() -> List['int']:
    ...

assert get_type_hints(f)['return'] == List[int]

I suppose this can be part of the same PR, since this is not necessary in the backported version on PyPI. Also my updates to typing following PEP 560 should not have merge conflicts.

@ambv
Copy link
Contributor

@ambv ambv commented Nov 22, 2017

@ilevkivskyi, I want to fix get_type_hints() separately since there's really no reason why accepting "List['int']" shouldn't be supported even today. Same with fixing the self-class reference as you suggested on python-dev (I think I'd do it with a ChainMap though), fixing the forward ref cache conflicts, etc.

@ilevkivskyi
Copy link
Contributor

@ilevkivskyi ilevkivskyi commented Nov 22, 2017

@ambv OK, I am fine with this as well.

@serhiy-storchaka
Copy link
Member

@serhiy-storchaka serhiy-storchaka commented Nov 22, 2017

It will take a time for making a review of such large change. But one comment I can say now.

The unparser adds parenthesis for grouping subexpression. They are added even if not strictly needed, e.g. in a + (b * (c ** d)). The problem is not that redundant parenthesis makes an expression less readable. The problem is that they increase the stack consumption when parse the expression again. It is possible that the original expression can be parsed, but parsing the unparsed expression will fail or even crash.

I already encountered with similar problem when worked on the parser of plural form expressions in gettext.py. A C-like syntax is parsed and converted to Python syntax, and the result is evaluated. I minimized the use of parenthesis. If the subexpression operator has higher priority than the operator of the outer expression, parenthesis are not added.

This is not a blocker, and we can solve this problem later, but you can think about this while I'm making a review.

@gvanrossum
Copy link
Member Author

@gvanrossum gvanrossum commented Nov 22, 2017

@ambv
Copy link
Contributor

@ambv ambv commented Nov 23, 2017

@serhiy-storchaka
Copy link
Member

@serhiy-storchaka serhiy-storchaka commented Nov 23, 2017

Yes, in case of gettext the one of purposes was to guard against a malicious input. In the case of annotations it is less likely. The other purpose -- speeding up the compilation.

This problem can be solved by assigning the numerical priority level to expressions and omitting parenthesis only if the current priority level is higher then the level of a super-expression (the sub-expression rather of a super-expression is responsible for adding parenthesis).

I have yet two questions.

  1. Is the performance important here (I afraid yes)? The Python implementation would be simpler and more reliable. But much slower.

  2. Do we need to support the full Python expression syntax? In particular arithmetic operations. Or it would be enough to support only the small subset used in type annotations? Names, attributes, indexing, tuples, what more?

@gvanrossum
Copy link
Member Author

@gvanrossum gvanrossum commented Nov 23, 2017

Copy link
Member

@serhiy-storchaka serhiy-storchaka left a comment

Added comments are mostly style comments (PEP 7) and suggestions for cleaning up the code, but there are several errors.

@@ -15,6 +15,10 @@ PyAPI_FUNC(mod_ty) PyAST_FromNodeObject(
PyCompilerFlags *flags,
PyObject *filename,
PyArena *arena);
PyAPI_FUNC(PyObject *) PyAST_UnicodeFromAstExpr(

This comment has been minimized.

@serhiy-storchaka

serhiy-storchaka Nov 27, 2017
Member

For uniformity with other names it would be better to name this function like PyAST_AsUnicode.

And I think it would be better to make it private and don't include in a limited API for now.

This comment has been minimized.

@serhiy-storchaka

serhiy-storchaka Dec 2, 2017
Member

Actually, since this functions supports only AST corresponding to expressions, PyAST_AsUnicode is not a good name. Maybe just add an underscore to PyAST_UnicodeFromAstExpr? Or _PyAST_ExprAsUnicode?

This comment has been minimized.

@ambv

ambv Dec 31, 2017
Contributor

OK, I like _PyAST_ExprAsUnicode.

int co_nlocals; /* #local variables */
int co_stacksize; /* #entries needed for evaluation stack */
int co_flags; /* CO_..., see below */
int co_argcount; /* #arguments, except *args */

This comment has been minimized.

@serhiy-storchaka

serhiy-storchaka Nov 27, 2017
Member

Are all these differences just differences between tabs and spaces? It would be better to make this in a separate commit, and left only related changes.

This comment has been minimized.

@ambv

ambv Dec 2, 2017
Contributor

I did make this a separate commit. As I commented on it, it's only to make "patchcheck" happy.

This comment has been minimized.

@serhiy-storchaka

serhiy-storchaka Dec 2, 2017
Member

This already was done in #4583. You need just merge with master.

This comment has been minimized.

@ambv

ambv Dec 2, 2017
Contributor

With your changes from bpo-32150 this is no longer necessary :-)

template = dedent(
"""
from __future__ import annotations
def f() -> {ann}:

This comment has been minimized.

@serhiy-storchaka

serhiy-storchaka Nov 27, 2017
Member

Should we test also an argument's annotation?

This comment has been minimized.

@ambv

ambv Dec 2, 2017
Contributor

That's the same code but sure, I can add tests with that, too.

This comment has been minimized.

@ambv

ambv Dec 31, 2017
Contributor

Done.

@@ -357,6 +357,7 @@
<ClCompile Include="..\Python\_warnings.c" />
<ClCompile Include="..\Python\asdl.c" />
<ClCompile Include="..\Python\ast.c" />
<ClCompile Include="..\Python\ast_unparse.c" />

This comment has been minimized.

@serhiy-storchaka

serhiy-storchaka Nov 27, 2017
Member

Update also PCbuild/pythoncore.vcxproj.filters

This comment has been minimized.

@ambv

ambv Dec 2, 2017
Contributor

Done.

static int
compiler_visit_annexpr(struct compiler *c, expr_ty annotation)
{
static PyObject *ann_as_str;

This comment has been minimized.

@serhiy-storchaka

serhiy-storchaka Nov 27, 2017
Member

Why static?

This comment has been minimized.

@ambv

ambv Dec 31, 2017
Contributor

A left-over from something else I tried before. Removed.

if (ret == -1)
return ret;

if (e->v.Yield.value)

This comment has been minimized.

@serhiy-storchaka

serhiy-storchaka Nov 27, 2017
Member

The opening branch should be at the end of the same line as a condition (unless a condition is a multiline). PEP 7.

This comment has been minimized.

@ambv

ambv Dec 31, 2017
Contributor

Fixed.

if (ret == -1)
return ret;

if (e->v.Yield.value)

This comment has been minimized.

@serhiy-storchaka

serhiy-storchaka Nov 27, 2017
Member

Await. Move the opening brace.

This comment has been minimized.

@ambv

ambv Dec 31, 2017
Contributor

Fixed.

}

static int
append_ast_str(_PyUnicodeWriter *writer, expr_ty e, bool omit_string_brackets)

This comment has been minimized.

@serhiy-storchaka

serhiy-storchaka Nov 27, 2017
Member

According to the meaning of this function it would be better to name the parameter raw.

This comment has been minimized.

@ambv

ambv Dec 31, 2017
Contributor

Function removed in "Strings are no longer treated special".

case Num_kind:
return append_repr(writer, e->v.Num.n);
case Str_kind:
return append_ast_str(writer, e, omit_string_brackets);

This comment has been minimized.

@serhiy-storchaka

serhiy-storchaka Nov 27, 2017
Member

Constant_kind can be a string too.

This comment has been minimized.

@ambv

ambv Dec 31, 2017
Contributor

Yes, however, this special handling of strings has since been dropped from the PEP, I'll remove it in the next commit.

This comment has been minimized.

@ambv

ambv Dec 31, 2017
Contributor

Removed in "Strings are no longer treated special".

case Bytes_kind:
return append_repr(writer, e->v.Bytes.s);
case Ellipsis_kind:
return append_charp(writer, "...");

This comment has been minimized.

@serhiy-storchaka

serhiy-storchaka Nov 27, 2017
Member

Constant_kind can be the Ellipsis too, in which case it outputs Ellipsis instead of ....

This comment has been minimized.

@ambv

ambv Dec 31, 2017
Contributor

Well, yes, because apparently the user wrote "Ellipsis" and not "...". Do you think we should convert to "..." in this case, too?

This comment has been minimized.

@ambv

ambv Dec 31, 2017
Contributor

FWIW:

>>> Ellipsis is ...
True

so this looks relatively low-pri.

@gvanrossum
Copy link
Member Author

@gvanrossum gvanrossum commented Nov 27, 2017

While this is going on, let me continue to emphasize that I plan to accept PEP 563. I don't feel the need to wait until every nit in the code review is addressed, but I do think we need to have agreement on the issue of whether to generate redundant parentheses (e.g. (x + y) + z).

@ambv
Copy link
Contributor

@ambv ambv commented Nov 27, 2017

Today, I'm going to make a pass to address Serhiy's (very detailed!) feedback, esp. the bugs. I'll fix the tab confusion, too.

Next, having those things out of the way, I'm going to attempt what Serhiy's is suggesting (having a priority list for operators to avoid unnecessary parens). That's going to happen later in the week.

I do agree that having fewer rather than more parens is typically more readable. Apart from this reason, why do we want to avoid the extra parens? Note that it's not about preserving original formatting, that one could very well have spurious parens that we are no longer aware of. We are also not preserving whitespace. So, if it's not original formatting then what, the stack overflows? I really don't think we will practically hit stack overflow due to parens in pathological expressions.

If you both consider leaving out the parens important, I'll work on this. It's not free though (takes time and is to some degree risky, as I mentioned above). There's also other things that I must complete for PEP 563 to work as intended, like f-strings in this diff, or forward-ref handling in typing.py in a separate diff.

@gvanrossum
Copy link
Member Author

@gvanrossum gvanrossum commented Nov 27, 2017

@ambv
Copy link
Contributor

@ambv ambv commented Nov 27, 2017

As I said in my comment on Nov 23rd, to the best of my knowledge, the current state of the diff already omits all cases of spurious parens that occur in valid type annotations.

@serhiy-storchaka
Copy link
Member

@serhiy-storchaka serhiy-storchaka commented Nov 27, 2017

Don't spend your time on fighting with the extra parens if this distracts you from more prioritized tasks. If you don't solve this problem in your PR I'm going to do this after its merging. I suppose this will not add too much complexity. Your code already avoid producing the extra parens in many cases. This is enough for the initial implementation.

Yet one consideration. Could it help if introduce macros like the VISIT macro in symtable.c and compile.c? They should call specified function with explicit and implicit arguments, check a result and return a failure if it is failed. Most functions could be just a short sequence of invocations of these macros. This technique is used in many places in CPython sources.

@ambv
Copy link
Contributor

@ambv ambv commented Nov 27, 2017

Could it help if introduce macros like the VISIT macro in symtable.c and compile.c?

I was thinking about this when I was originally writing this but wasn't sure if macros aren't reserved just for special usage so I avoided them. If you'd like, I can refactor the file to use a macro instead, you're right, that should shorten it quite a bit.

ambv and others added 4 commits Nov 10, 2017
The string form is recovered by unparsing the AST.
This is required for PEP 563 and as such only implements a part of the
unparsing process that covers expressions.
Lukasz Langa
@ambv ambv force-pushed the ambv:string_annotations branch from 98231af to 7f88115 Dec 31, 2017
@ambv
Copy link
Contributor

@ambv ambv commented Dec 31, 2017

Alright, this is fully rebased without conflicts and all comments from previous code review are addressed. Things left to do:

  • remove special handling of strings
  • add support for f-strings
  • (maybe?) refactor using a VISIT-style macro
@ambv
Copy link
Contributor

@ambv ambv commented Dec 31, 2017

Special handling of strings removed. I plan to add the missing f-string support in the first week of January so that the implementation is hopefully mergeable in 3.7.0a4.

@ambv
Copy link
Contributor

@ambv ambv commented Jan 16, 2018

Alright, @serhiy-storchaka, this is complete now, including f-string support! I realize it's pretty last minute, sorry for that.

A piece of useless statistics: this pull request was implemented in full during intercontinental flights. There's something very tranquil about sitting in one place for 10+ hours with no distractions.

@auvipy
auvipy approved these changes Jan 23, 2018
class B:
...

Since this change breaks compatibiltiy, the new behavior can be enabled

This comment has been minimized.

@emilyemorehouse

emilyemorehouse Jan 25, 2018
Member

Small typo - compatibiltiy -> compatibility

This comment has been minimized.

@ambv

ambv Jan 25, 2018
Contributor

Fixed, thanks!

@1st1
1st1 approved these changes Jan 25, 2018
Copy link
Member

@1st1 1st1 left a comment

Overall looks good. Code in Python/ast_unparse.c looks fine, I didn't see any refleaks or non-checked return values. I think we can go ahead with this one and merge it, we'll have plenty of time to catch any bugs during the beta/rc period.

@ambv ambv merged commit 95e4d58 into python:master Jan 26, 2018
4 checks passed
4 checks passed
bedevere/issue-number Issue report skipped
bedevere/news "skip news" label found
continuous-integration/appveyor/pr AppVeyor build succeeded
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked issues

Successfully merging this pull request may close these issues.

None yet

9 participants
You can’t perform that action at this time.