Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PEP 646: Decide on substitution behavior #91162

Open
JelleZijlstra opened this issue Mar 13, 2022 · 69 comments
Open

PEP 646: Decide on substitution behavior #91162

JelleZijlstra opened this issue Mar 13, 2022 · 69 comments
Assignees
Labels
3.11 expert-typing type-bug

Comments

@JelleZijlstra
Copy link
Member

@JelleZijlstra JelleZijlstra commented Mar 13, 2022

BPO 47006
Nosy @gvanrossum, @serhiy-storchaka, @JelleZijlstra, @Fidget-Spinner, @mrahtz, @mrahtz, @AlexWaygood
PRs
  • #32341
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = 'https://github.com/JelleZijlstra'
    closed_at = None
    created_at = <Date 2022-03-13.20:46:16.314>
    labels = ['type-bug', 'release-blocker', '3.11']
    title = 'PEP 646: Decide on substitution behavior'
    updated_at = <Date 2022-04-07.01:50:13.871>
    user = 'https://github.com/JelleZijlstra'

    bugs.python.org fields:

    activity = <Date 2022-04-07.01:50:13.871>
    actor = 'gvanrossum'
    assignee = 'JelleZijlstra'
    closed = False
    closed_date = None
    closer = None
    components = []
    creation = <Date 2022-03-13.20:46:16.314>
    creator = 'JelleZijlstra'
    dependencies = []
    files = []
    hgrepos = []
    issue_num = 47006
    keywords = ['patch']
    message_count = 17.0
    messages = ['415100', '415107', '415108', '415109', '415557', '415623', '415637', '415694', '415710', '415712', '415734', '415752', '415753', '416707', '416795', '416813', '416913']
    nosy_count = 7.0
    nosy_names = ['gvanrossum', 'serhiy.storchaka', 'JelleZijlstra', 'kj', 'matthew.rahtz', 'mrahtz', 'AlexWaygood']
    pr_nums = ['32341']
    priority = 'release blocker'
    resolution = None
    stage = 'patch review'
    status = 'open'
    superseder = None
    type = 'behavior'
    url = 'https://bugs.python.org/issue47006'
    versions = ['Python 3.11']

    @JelleZijlstra
    Copy link
    Member Author

    @JelleZijlstra JelleZijlstra commented Mar 13, 2022

    We've had some disagreement about the behavior of TypeVarTuple substitution related to PEP-646, and the discussion has now spilled around multiple PRs. I'd like to use this issue to come to an agreement so we don't have to chase through so many different places.

    Links:

    • #75204 (merged) implements PEP-646 in typing.py. Matthew initially implemented full TypeVar substitution support, but took it out after I suggested an alternative solution in #31021 (review).
    • #75981 (merged without review) implements TypeVarTuple substitution in Python.
    • #76009 implements TypeVarTuple substitution in C.
    • #76025 and #76027 add additional test cases.
    • #75985 (closed) implements my proposed solution.

    I'd like to ask that until we come to an agreement we hold off on making any more changes, so we don't have to go back and forth and we ensure that the eventual solution covers all edge cases.

    The disagreement is about what to do with TypeVarTuple substitution: the behavior when a generic type is subscripted, like tuple[*Ts][int, str].

    There are two possible extreme approaches:

    • Implement full substitution support, just as we have it for existing TypeVars. This is complicated because TypeVarTuple makes it much harder to match up the types correctly. However, it is consistent with the behavior for other TypeVar-like objects. My example would turn into GenericAlias(tuple, (int, str)).
    • Give up on substitution and just return a new GenericAlias object: GenericAlias(GenericAlias(tuple, Unpack[Ts]), (int, str). This avoids implementing any complex runtime behavior, but it inconsistent with existing behavior and less pretty when you print out the type. I prefer this approach because there's less risk that future enhancements to typing will break it. I also want to explore extending this approach to ParamSpec substitution.

    @JelleZijlstra JelleZijlstra self-assigned this Mar 13, 2022
    @JelleZijlstra JelleZijlstra self-assigned this Mar 13, 2022
    @mrahtz
    Copy link
    Mannequin

    @mrahtz mrahtz mannequin commented Mar 13, 2022

    Thanks for starting this, Jelle - I was a bit unsure about how to proceed here.

    Given that #31800 is already merged, I'd also propose something halfway between the two extremes: return a sensible substitution when the logic to compute that isn't too onerous, and a new GenericAlias object when it is. The upsides are that we'd probably be able to return reasonable substitutions for the vast majority of cases, and that we wouldn't have to remove what's already been merged. The downsides would be lack of consistency, and the potential for changing rules about what does and doesn't return a full substitution as time goes on and new features are added.

    @mrahtz
    Copy link
    Mannequin

    @mrahtz mrahtz mannequin commented Mar 13, 2022

    (Having said that, to be clear: my preferred solution currently would still be the solution where we just return a new GenericAlias for anything involving a TypeVarTuple. The crux is what Serhiy is happy with.)

    @JelleZijlstra
    Copy link
    Member Author

    @JelleZijlstra JelleZijlstra commented Mar 13, 2022

    Thanks Matthew! Merged PRs can still be reverted, and we have some time before the feature freeze. I'd like to hear what Guido and Ken think too.

    If we go with the GenericAlias substitution, we need to make sure that such aliases still work as base class. That would need some C work to make types.GenericAlias.__mro_entries__ recurse if the alias's origin is itself a GenericAlias. There's a few other subtleties to think about; I can work on that but don't have a ton of time today.

    @serhiy-storchaka
    Copy link
    Member

    @serhiy-storchaka serhiy-storchaka commented Mar 19, 2022

    I am for consistent behavior. If return GenericAlias(GenericAlias(tuple, Unpack[Ts]), (int, str)) for tuple[*Ts][int, str], we should also return GenericAlias(GenericAlias(list, T), int) for list[T][int], etc. And it will cause multiple problems:

    • A repr can be less readable.
    • It will break equality comparison and hashing. Good bye caching.
    • What about __origin__, __parameters__, __args__? How will they be calculated?
    • It can break code which uses annotations for something. For example it can break dataclasses.

    It may be that will need to use it as a fallback for cases like tuple[T, *Ts][*Ts2] (currently it is error). But I am not sure that such cases should be supported.

    @gvanrossum
    Copy link
    Member

    @gvanrossum gvanrossum commented Mar 20, 2022

    I think I'm with Serhiy, I don't understand the hesitance to transform tuple[*Ts][int, str] into tuple[int, str].

    What would be an example of a substitution that's too complex to do?

    @JelleZijlstra
    Copy link
    Member Author

    @JelleZijlstra JelleZijlstra commented Mar 20, 2022

    It's simple if you only look at simple examples.

    Here are some examples current main (with Serhiy's patch for the Python version of typing) gets wrong:

    >>> from typing import *
    >>> Ts = TypeVarTuple("Ts")
    >>> T1 = TypeVar("T1")
    >>> T2 = TypeVar("T2")
    >>> Tuple[T1, Unpack[Ts], T2][int, Unpack[tuple[int]]]  # expect error
    typing.Tuple[int, *tuple[int]]
    >>> Tuple[T1, Unpack[Ts], str, T2][int, Unpack[Ts]]  # expect error (T2 missing)
    typing.Tuple[int, str, *Ts]  # it put *Ts in the wrong place
    >>> Tuple[T1, Unpack[Ts], str, T2][int, Unpack[Ts], Unpack[Ts]]  # expect error (*Ts can't substitute T2)
    typing.Tuple[int, *Ts, str, *Ts]
    >>> class G(Generic[T1, Unpack[Ts], T2]): pass
    ... 
    >>> G[int]  # expect error
    __main__.G[int]

    We can probably fix that, but I'm not looking forward to implementing the fixed logic in both Python and C. Also, I'm worried that it won't work with future extensions to the type system (e.g., the rumored Map operator) that may go into 3.12 or later versions.

    @serhiy-storchaka
    Copy link
    Member

    @serhiy-storchaka serhiy-storchaka commented Mar 21, 2022

    The first case will be practically fixed by GH 32030 after chenging the grammar to allow unpacking in index tuple: A[*B].

    Two other cases will be fixed by GH 32031. It does not require any C code.

    In the last case no error is raised because some error checks are skipped if any of Generic arguments is a TypeVarTuple. We just need to add such checks. This is Python-only code too.

    Note that the alternative proposition is even more lenient to errors.

    @mrahtz
    Copy link
    Mannequin

    @mrahtz mrahtz mannequin commented Mar 21, 2022

    [Guido]

    What would be an example of a substitution that's too complex to do?

    We also need to remember the dreaded arbitrary-length tuple. For example, I think it should be the case that:

    T = TypeVar('T')
    Ts = TypeVarTuple('Ts')
    class C(Generic[*Ts]): pass
    Alias = C[T, *Ts]
    Alias2 = Alias[*tuple[int, ...]]
    # Alias2 should be C[int, *tuple[int, ...]]

    Ok, this is a bit of a silly example, but if we're committing to evaluating substitutions correctly, we should probably make even this kind of example behave correctly so that users who accidentally do something silly can debug what's gone wrong.

    [Serhiy]

    A repr can be less readable.

    Definitely true.

    It will break equality comparison and hashing. Good bye caching.

    Huh, I didn't know about this one. Fair enough, this is totally a downside.

    What about __origin__, __parameters__, __args__? How will they be calculated?

    This could admittedly be thorny. We'd have to think it through carefully. Admittedly also a downside.

    It can break code which uses annotations for something. For example it can break dataclasses.

    Oh, also interesting - I didn't know about this one either. Could you give an example?

    The first case will be practically fixed by GH 32030 after chenging the grammar to allow unpacking in index tuple: A[*B].

    We actually deliberately chose not to unpack concrete tuple types - see the description of #30398, under the heading 'Starred tuple types'. (If you see another way around it, though, let me know.)

    Two other cases will be fixed by GH 32031. It does not require any C code.

    I'm also not sure about this one; disallowing unpacked TypeVarTuples in argument lists to generic aliases completely (if I've understood right?) seems like too restrictive a solution. I can imagine there might be completely legitimate cases where the ability to do this would be important. For example:

    DType = TypeVar('DType')
    Shape = TypeVarTuple('Shape')
    class Tensor(Generic[DType, *Shape]): ...
    Uint8Tensor = Tensor[uint8, *Shape]
    Unit8BatchTensor = Uint8Tensor[Batch, *Shape]

    Note that the alternative proposition is even more lenient to errors.

    True, but at least it's predictably lenient to errors - I think the repr makes it very clear that "Woah, you're doing something advanced here. You're on your own!" I think it better fits the principle of least astonishment to have something that consistently lets through all errors of a certain class than something that sometimes catches errors and sometimes doesn't.

    @mrahtz
    Copy link
    Mannequin

    @mrahtz mrahtz mannequin commented Mar 21, 2022

    P.s. To be clear, (I think?) these are all substitutions that are computable. We *could* implement the logic to make all these evaluate correctly if we wanted to. It's just a matter of how much complexity we want to allow in typing.py (or in the runtime in general, if we say farmed some of this logic out to a separate module).

    @gvanrossum
    Copy link
    Member

    @gvanrossum gvanrossum commented Mar 22, 2022

    I'd like to look at this as a case of simplifying something to its simplest canonical form, but no simpler. This is what the existing fixed-typevar expansion does: e.g. tuple[str, T, T][int] becomes tuple[str, int, int].

    I propose that we try to agree on a set of rules for what can be simplified further and what cannot, when we have B = C[...]; A = B[...], (IOW A = C[...][...]), for various shapes of the subscripts to C and B. Note that what's relevant for the second subscript is C[...].__parameters__, so I'll call that "left" below.

    1. Some edge case seems to be that if *tuple[...] is involved on either side we will never simplify. Or perhaps a better rule is that *tuple[...] is never simplified away (but fixed items before and after it may be).

    2. Another edge case is that if neither side has any starred items we will always simplify (since this is the existing behavior in 3.10). This may raise an error if the number of subscripts on the right does not match the number of parameters on the left.

    3. If there's a single *Ts on the left but not on the right, we should be able to simplify, which again may raise an error if there are not enough values on the right, but if there are more than enough, the excess will be consumed by *Ts (in fact that's the only way *Ts is fed).

    4. If there's a *Ts on the right but not on the left, we should _not_ simplify, since whatever we have on the left serves as a constraint for *Ts. (E.g. tuple[int, int][*Ts] constrains *Ts to being (int, int).)

    5. If there's exactly one *Ts on the left and one on the right, we _might__ be able to simplify if the prefix and suffix of the __parameters__ match the prefix and suffix of the subscript on the right. E.g. C[int, T, *Ts, float][str, *Ts] can be simplified to C[int, str, *Ts, float]. OTOH C[int, T, *Ts, float][*Ts] cannot be simplified -- but we cannot flag it as an error either. Note that __parameters__ in this example is (T, Ts); we have to assume that typevartuples in __parameters__ are always used as *Ts (since the PEP recognizes no valid unstarred uses of Ts).

    TBH case 5 is the most complex and I may have overlooked something. I'm more sure of cases 1-4.

    @serhiy-storchaka
    Copy link
    Member

    @serhiy-storchaka serhiy-storchaka commented Mar 22, 2022

    Alias = C[T, *Ts]
    Alias2 = Alias[*tuple[int, ...]]

    Alias2 should be C[int, *tuple[int, ...]]

    tuple[int, ...] includes also an empty tuple, and in this case there is no value for T.

    Oh, also interesting - I didn't know about this one either. Could you give an example?

    If __origin__, __parameters__, __args__ are a mess, it will definitely break a code which use them.

    We actually deliberately chose not to unpack concrete tuple types - see the description of #30398, under the heading 'Starred tuple types'. (If you see another way around it, though, let me know.)

    You assumed that *tuple[str, bool] in def foo(*args: *tuple[str, bool]) should give foo.__annotations__['args'] = tuple[str, bool], but it should rather give (str, bool). No confusion with tuple[str, bool].

    And one of PEP-646 options is to implement star-syntax only in subscription, not in var-parameter type annotations.

    I'm also not sure about this one; disallowing unpacked TypeVarTuples in argument lists to generic aliases completely (if I've understood right?)

    No, it will only be disallowed in substitution of a VarType. Tuple[T][*Ts] -- error. Tuple[*Ts][*Ts2] -- ok.

    I propose to implement simple and strict rules, and later add support of new cases where it makes sense.

    @serhiy-storchaka
    Copy link
    Member

    @serhiy-storchaka serhiy-storchaka commented Mar 22, 2022

    1. Some edge case seems to be that if *tuple[...] is involved on either side we will never simplify. Or perhaps a better rule is that *tuple[...] is never simplified away (but fixed items before and after it may be).

    I do not understand this. Do you forbid simplifying of tuple[*Ts, float][str, *tuple[int, ...]] to tuple[str, *tuple[int, ...], float]?

    I think that the rule should be that *tuple[X, ...] cannot split between different variables. Or that it cannot substitute a TypeVar. A more strong variant of rule 4.

    1. ... but we cannot flag it as an error either.

    I think that it will better to flag it as an error now. Later, after all code be merged and all edge cases be handled we can return here and reconsider this.

    There are workarounds for this.

    • You should not use Generic[*Ts] if you require at least one item, but Generic[*Ts, T].
    • Instead of def foo(*args: *Ts) use def foo(*args: *tuple[*Ts, T]).

    These tricks are common in functional programming.

    The rest of the rules match my implementations more or less.

    @mrahtz
    Copy link
    Mannequin

    @mrahtz mrahtz mannequin commented Apr 4, 2022

    Apologies for the slow reply - coming back to this now that the docs and pickling issues are mostly sorted.

    [Serhiy]

    > Alias = C[T, *Ts]
    > Alias2 = Alias[*tuple[int, ...]]
    > # Alias2 should be C[int, *tuple[int, ...]]

    tuple[int, ...] includes also an empty tuple, and in this case there is no value for T.

    This was my initial intuition too, but Pradeep pointed out to me in #31021 (comment) that for tuple[int, ...], Python has chosen the opposite mindset: instead of assuming the worst-case scenario, we assume the best-case scenario. Thus, the following type-checks correctly with mypy (https://mypy-play.net/?mypy=latest&python=3.10&gist=b9ca66fb7d172f939951a741388836a6):

    def return_first(tup: tuple[int, ...]) -> int:
        return tup[0]
    tup: tuple[()] = ()
    return_first(tup)

    > We actually deliberately chose not to unpack concrete tuple types - see the description of #30398, under the heading 'Starred tuple types'. (If you see another way around it, though, let me know.)

    You assumed that *tuple[str, bool] in def foo(*args: *tuple[str, bool]) should give foo.__annotations__['args'] = tuple[str, bool], but it should rather give (str, bool). No confusion with tuple[str, bool].

    Fair point, we could technically distinguish between tuple[str, bool] and (str, bool). But if I was a naive user and I saw foo.__annotations__['args'] == (str, bool), I don't think it'd be immediately obvious to me that the type of args was *tuple[str, bool].

    Also though, there's a second reason mentioned in #30398 why (str, bool) wouldn't be the best choice. We decided that the runtime behaviour of *args: *something should be that we essentially do (*something,)[0]. If we made tuple[int, str] unpack to (int, str), then we'd end up with __annotations__['args'] == (int,).

    And one of PEP-646 options is to implement star-syntax only in subscription, not in var-parameter type annotations.

    As in, we would allow Generic[*Ts], but not *args: *Ts? That'd be a major change to the PEP - not an option I'm willing to consider at this stage in the process.

    > I'm also not sure about this one; disallowing unpacked TypeVarTuples in argument lists to generic aliases completely (if I've understood right?)

    No, it will only be disallowed in substitution of a VarType. Tuple[T][*Ts] -- error. Tuple[*Ts][*Ts2] -- ok.

    Ah, gotcha. My mistake.

    [Guido]

    I ran out of time this evening :) Will reply properly soon.

    @mrahtz
    Copy link
    Mannequin

    @mrahtz mrahtz mannequin commented Apr 5, 2022

    [Guido]

    1. Some edge case seems to be that if *tuple[...] is involved on either side we will never simplify.

    Alright, let me think this through with some examples to get my head round it.

    It would prohibit the following difficult case:

    class C(Generic[*Ts]): ...
    Alias = C[T, *Ts]
    Alias[*tuple[int, ...]]  # Does not simplify; stays C[T, *Ts][*tuple[int, ...]]

    That seems pretty reasonable. It would also prohibit these other relatively simple cases, but I guess that's fine:

    Alias = C[*Ts]
    Alias[*tuple[int, ...]]  # Does not simplify; stays C[*Ts][*tuple[int, ...]]
    
    Alias = C[T, *tuple[int, ...]]
    Alias[str]  # Does not simplify; stays C[T, *tuple[int, ...]][str]

    Or perhaps a better rule is that *tuple[...] is never simplified away (but fixed items before and after it may be).

    Is this to say that we effectively prohibit binding *tuple[...] to anything? If we can simplify without binding *tuple[...] to anything, then we do simplify, but otherwise, we don't simplify? So under this rule, the following WOULD work?

    Alias = C[T, *tuple[int, ...]]
    Alias[str]  # Simplifies to C[str, *tuple[int, ...]], because we didn't have to bind *tuple[int, ...] to do it
    1. Another edge case is that if neither side has any starred items we will always simplify (since this is the existing behavior in 3.10). This may raise an error if the number of subscripts on the right does not match the number of parameters on the left.

    Alright, so this is business as usual.

    1. If there's a single *Ts on the left but not on the right, we should be able to simplify, which again may raise an error if there are not enough values on the right, but if there are more than enough, the excess will be consumed by *Ts (in fact that's the only way *Ts is fed).

    So then:

    class C(Generic[*Ts]): ...
    Alias = C[T, *Ts]
    Alias[()]              # Raises error
    Alias[int]             # Simplifies to C[int, *Ts]
    Alias[int, str]        # Simplifies to C[int, str]
    Alias[int, str, bool]  # Simplifies to C[int, str, bool]

    Yup, seems straightforward.

    1. If there's a *Ts on the right but not on the left, we should _not_ simplify, since whatever we have on the left serves as a constraint for *Ts.

    Ok, so this is about the following situations:

    class C(Generic[*Ts]): ...
    Alias = C[T1, T2]
    Alias[*Ts]  # Does not simplify; stays C[T1, T2][*Ts]

    Yikes - in fact, this is actually super hairy; I hadn't thought about this edge case at all in the PEP.

    Agreed that it seems reasonable not to simplify here.

    E.g. tuple[int, int][*Ts] constrains *Ts to being (int, int).

    Was that a typo? Surely tuple[int, int][*Ts] isn't valid - since tuple[int, int] doesn't have any free parameters?

    1. If there's exactly one *Ts on the left and one on the right, we _might__ be able to simplify if the prefix and suffix of the __parameters__ match the prefix and suffix of the subscript on the right. E.g. C[int, T, *Ts, float][str, *Ts] can be simplified to C[int, str, *Ts, float]. OTOH C[int, T, *Ts, float][*Ts] cannot be simplified -- but we cannot flag it as an error either. Note that __parameters__ in this example is (T, Ts); we have to assume that typevartuples in __parameters__ are always used as *Ts (since the PEP recognizes no valid unstarred uses of Ts).

    Ok, this also makes sense.

    ---

    Still, though, doesn't the point that Serhiy brought up about __origin__, __parameters__ and __args__ still apply? In cases where we *don't* simplify, there'd still be the issue of what we'd set these things to be.

    This evening I'll also revisit the PRs adding tests for substitution to try and make them a comprehensive reference as to what's currently possible.

    @mrahtz
    Copy link
    Mannequin

    @mrahtz mrahtz mannequin commented Apr 5, 2022

    Ok, https://github.com/python/cpython/pull/32341/files is a reference of how the current implementation behaves. Fwiw, it *is* mostly correct - with a few minor tweaks it might be alright for at least the 3.11 release.

    In particular, instead of dealing with the thorny issue of what to do about splitting unpacked arbitrary-length tuples over multiple type variables - e.g. C[T, *Ts][*tuple[int, ...]] - instead either deciding to try and evaluate it properly and living with the complexity, or leaving it unsimplified and living with the __args__, __parameters__ and __origin__ problem - for now, we could just raise an exception for any substitutions which involve an unpacked arbitrary-length tuple, since I'd guess it's going to be an extremely rare use-case.

    @gvanrossum
    Copy link
    Member

    @gvanrossum gvanrossum commented Apr 7, 2022

    We need to move on this, because the outcome of this discussion is a release blocker for 3.11b1 -- the next release!

    @gvanrossum gvanrossum added release-blocker type-bug labels Apr 7, 2022
    @mrahtz
    Copy link
    Contributor

    @mrahtz mrahtz commented Apr 10, 2022

    Copying in correspondence by email while issues were being migrated:

    @mrahtz:

    (Temporarily moving this discussion to email, since my understanding is that the issue tracker will be down for ~24 hours while it's being migrated to GitHub, and it sounds like this might be getting urgent-ish.)

    [Guido]

    We need to move on this, because the outcome of this discussion is a release blocker for 3.11b1 -- the next release!

    Oh yikes, I didn't realise. To be clear, what's the timeframe? I see the release schedule for 3.11 says that 3.11b1 is scheduled for 2022-05-06. Is the point that we need to have this discussion resolved and all the implementation finished by? Or is the very fact that this discussion is still underway blocking other folks right now?

    (And for my general education: is the point of the remaining beta releases that they're supposed to be bugfix-only?)

    @JelleZijlstra:

    Oh yikes, I didn't realise. To be clear, what's the timeframe? I see the release schedule for 3.11 says that 3.11b1 is scheduled for 2022-05-06. Is the point that we need to have this discussion resolved and all the implementation finished by? Or is the very fact that this discussion is still underway blocking other folks right now?

    Yes, we should not be changing any features after the feature freeze in early May.

    (And for my general education: is the point of the remaining beta releases that they're supposed to be bugfix-only?)

    That's right. This should enable people to test without having to worry about new features.

    @mrahtz:

    Alright, by the power invested in me as lead author of the PEP, I say let's go with implementing full substitution support (or as close to full as we can reasonably get it).

    The reasoning is that a) I think we're closer to a working, consistent implementation with full substitution support compared to the "partially-evaluated substitution" approach; and b) Serhiy's point about what to do about args etc has got me worried; a "partially-evaluated substitution" would be a new kind of thing in the typing system, and I think it could easily have further implications we won't have time to discover/work through before 3.11b1. It's true that the logic will be a bit complicated, and that as Jelle says it could complicate implementation of future typing features, but I think the latter is partly an inevitable consequence of the fact that variadic generics simply are complicated, and overall I still think the tradeoff is worth it.

    So unless anyone has a reason to veto this, let's proceed as follows:

    1. Merge Serhiy's C code for substitution in #31828
    2. Discuss #32341 until we're happy with what we think the behaviour should be (there are still a few in there I'm not entirely sure about myself), and use that as a basis for fixing the remaining issues
    3. Reserve the right to raise NotImplementedError for the trickiest cases (e.g. splitting of *tuple[int, ...] across multiple type variables). Ideally it'd be nice to raise a message saying something like "If you think you need this, please leave a comment with your use case at " - is there any precedent for something like this being possible?

    @serhiy-storchaka:

    1. Discuss #32341 until we're happy with what we think the behaviour should be (there are still a few in there I'm not entirely sure about myself), and use that as a basis for fixing the remaining issues

    There were many changes in the code related to ParamSpec and Concatenate during the beta stage of 3.10 and even after the release. The current implementation of ParamSpec substitution is based on "analogies" and "reading between lines" rather of a concrete specification.

    1. Reserve the right to raise NotImplementedError for the trickiest cases (e.g. splitting of *tuple[int, ...] across multiple type variables). Ideally it'd be nice to raise a message saying something like "If you think you need this, please leave a comment with your use case at " - is there any precedent for something like this being possible?

    Callable[Concatenate[int, P], str][...] raises a type error because the result cannot be expressed in a form which would not contradict existing PEPs.

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    @gvanrossum
    Copy link
    Member

    @gvanrossum gvanrossum commented Apr 11, 2022

    I feel I need to add this same remark here:

    @mrahtz @JelleZijlstra @serhiy-storchaka Is it okay if I unsubscribe from these conversations and let you all come up with a compromise? I feel that while we ought to have a policy formulated and mostly implemented by beta 1 (May 6), tweaks of both the policy and the implementation during the beta period until RC 1 (Aug/Sept?) should be allowable.

    @mrahtz
    Copy link
    Contributor

    @mrahtz mrahtz commented Apr 15, 2022

    Ok, thinking about things more in #32341, I would propose the following spec:

    • In general, we aim to accurately evaluate type variable substitution, supporting as many valid aliases as we can.
    • Exception: if an unpacked arbitrary-length tuple (e.g. *tuple[int, ...]) is used in a generic alias type args list, then:
      • It must be the only type argument
        • Alias[str, *tuple[int, ...]] would not be valid
        • Alias[*tuple[int, ...], *tuple[str, ...]] would not be valid
        • Only Alias[*tuple[int, ...]] would be valid
      • The alias's sole type parameter must be an unpacked TypeVarTuple
        • Alias = tuple[T, *Ts]; Alias[*tuple[int, ...]] would not be valid
        • Alias = tuple[T1, T2]; Alias[*tuple[int, ...]] would not be valid
        • Only Alias = tuple[*Ts]; Alias[*tuple[int, ...]] would be valid

    @mrahtz
    Copy link
    Contributor

    @mrahtz mrahtz commented May 21, 2022

    [Pradeep]

    Sorry for the slow reply too. Busy week :(

    The PEP defines the behavior for (a) MultiDeviceArray by saying that it behaves the same as (b) MultiDeviceArray[*tuple[Any, ...]]

    I've just realised - is this actually what we said in the PEP? https://peps.python.org/pep-0646/#behaviour-when-type-parameters-are-not-specified says: (emphasis mine)

    When a generic class parameterised by a type variable tuple is used without any type parameters, it behaves as if the type variable tuple was substituted with Tuple[Any, ...].

    The section on aliases, https://peps.python.org/pep-0646/#aliases, says similarly:

    If the type parameter list is omitted entirely, the unspecified type variable tuples are treated as Tuple[Any, ...]

    That is, only the TypeVarTuple is substituted with Tuple[Any, ...]. Iiuc, this is different from saying that the whole argument list is set to *Tuple[Any, ...].

    Admittedly we maybe could have been clearer on this in the PEP. I don't think we ever say explicitly what type is assigned to the other type parameters in a case like this.

    But I think it's natural to assume that TypeVars in the parameter list should be assigned to individual Anys. So in this case:

    DType = TypeVar('DType')
    Shape = TypeVarTuple('Shape')
    class Array(Generic[DType, *Shape]): ...
    
    MultiDeviceArray = Array[DType, *Shape]

    I think that MultiDeviceArray should actually behave like Array[Any, *tuple[Any, ...]], rather than Array[*tuple[Any, ...]].

    Is this analysis correct?

    @gvanrossum
    Copy link
    Member

    @gvanrossum gvanrossum commented May 21, 2022

    Looks like the only interpretation that makes sense. If you write dict without parameters it has two Any parameters by default. It does not become dict[*Any].

    @pradeep90
    Copy link
    Contributor

    @pradeep90 pradeep90 commented May 21, 2022

    [Matthew]

    Is this analysis correct?

    Yes, that is indeed how Pyre handles aliases with missing type arguments. T is replaced with Any and Ts is replaced with tuple[Any, ...]. I misspoke about the analogy between MultiDeviceArray and MultiDeviceArray[*tuple[Any, ...]], even though they both produce the same result.


    [Matthew]

    Admittedly we maybe could have been clearer on this in the PEP. I don't think we ever say explicitly what type is assigned to the other type parameters in a case like this.

    That's not true. We explicitly mention the behavior for Foo[*tuple[Any, ...]] in the following paragraph and code snippet from the PEP:

    Array[*Tuple[Any, ...]] stands for an array with an arbitrary number of dimensions of type Any. This means that, in the call to expect_variadic_array, Batch is bound to Any and Shape is bound to Tuple[Any, ...]

    class Array(Generic[*Shape]): ...  # I'm including the definition of Array for completeness.
    
    y: Array[*Tuple[Any, ...]] = read_from_file()
    
    def expect_variadic_array(
        x: Array[Batch, *Shape]
    ) -> None: ...
    
    expect_variadic_array(y)  # OK
    
    1. In this case, the user sees that passing an Array[*tuple[Any, ...]] to an expected Array[Batch, *Shape] leads to Batch=Any, Shape=tuple[Any, ...]. So, the concept of splitting an unbounded tuple across Batch, *Shape is specified in the PEP.

      The exact same thing goes for passing an Array or an Array[*tuple[int, ...], str] to an expected Array[Batch, *Shape]. These too require splitting an unbounded tuple across a mix of TypeVars and TypeVarTuples.

    2. However, if the user defined a perfectly-valid alias, MultiDeviceArray = Array[Batch, *Shape], and tried to use MultiDeviceArray[*tuple[Any, ...]], then it would fail at runtime (as per your proposal).

      This would be really surprising. There is no justification why (1) is able to split *tuple[Any, ...] across Batch, *Shape, but (2) fails at runtime.

    In other words, the splitting behavior of unbounded tuples over a mix of TypeVar and TypeVarTuple is important and specified in general, not just for the niche case of runtime alias application. Restricting the splitting for aliases at runtime, while allowing it everywhere else would be a bad, inconsistent user experience.

    Lmk if that makes sense. I'm happy to hop on a VC call this week, if this is something you'd like to discuss further. Might be faster than waiting for each other's replies :)

    Road ahead

    We're back to the original question of what to do about Array[T, *Ts][*tuple[int, ...], str]. I see two choices ahead:

    1. Defer its implementation because it's too much work now - this seems reasonable.

      If you're curious about how Pyre (and TypeScript) handle cases like Array[T, *Ts, T2][*tuple[int, ...], bool, str], we bind the TypeVars from either side till we reach an unpacked element. So, we would bind T = int and T2 = str. Then we align the remaining middle portion - Ts = tuple[*tuple[int, ...], bool]. Happy to discuss in more detail if needed.

      (I hope we agree that this is doable at runtime but involved.)

    2. Forbid it in the PEP because the runtime computation would be involved - this seems less than ideal. We seem to be going in the opposite direction from the runtime being more lenient than the type checker.

      If we really are going that way, we should probably discuss what it implies for future PEPs. Will they also have to restrict things because the runtime is trying to compute exact values (for generic aliases, etc.)?

    I favor option (1). Curious to hear your thoughts, since it's not clear which way you are leaning.

    Footnote

    If the question is why would anyone specify MultiDeviceArray[*tuple[int, ...]] explicitly, we address that in the PEP:

    This allows users to handle dynamic code gracefully while still explicitly marking the code as unsafe (by using y: Array[*Tuple[Any, ...]]). Otherwise, users would face noisy errors from the type checker every time they tried to use the variable y, which would hinder them when migrating a legacy code base to use TypeVarTuple.

    @mrahtz
    Copy link
    Contributor

    @mrahtz mrahtz commented May 22, 2022

    [Pradeep]

    Lmk if that makes sense. I'm happy to hop on a VC call this week, if this is something you'd like to discuss further. Might be faster than waiting for each other's replies :)

    For this kind of thing, I think text works better for me - it's helpful to have time to think. Thanks for offering, though :)


    Admittedly we maybe could have been clearer on this in the PEP. I don't think we ever say explicitly what type is assigned to the other type parameters in a case like this.

    That's not true. We explicitly mention the behavior for Foo[*tuple[Any, ...]] in the following paragraph and code snippet from the PEP:

    Array[*Tuple[Any, ...]] stands for an array with an arbitrary number of dimensions of type Any. This means that, in the call to expect_variadic_array, Batch is bound to Any and Shape is bound to Tuple[Any, ...]

    class Array(Generic[*Shape]): ...  # I'm including the definition of Array for completeness.
    
    
    y: Array[*Tuple[Any, ...]] = read_from_file()
    
    def expect_variadic_array(
        x: Array[Batch, *Shape]
    ) -> None: ...
    
    expect_variadic_array(y)  # OK

    Isn't Batch here a type, rather than a TypeVar?

    If the question is why would anyone specify MultiDeviceArray[*tuple[int, ...]] explicitly, we address that in the PEP:

    This allows users to handle dynamic code gracefully while still explicitly marking the code as unsafe (by using y: Array[*Tuple[Any, ...]]). Otherwise, users would face noisy errors from the type checker every time they tried to use the variable y, which would hinder them when migrating a legacy code base to use TypeVarTuple.

    Didn't we write that assuming that Array was generic in only a TypeVarTuple? In the case of MultiDeviceArray, if we wanted to explicitly mark it as unsafe, wouldn't we write something like MultiDeviceArray[tf.dtype, *tuple[int, ...]]?


    We're back to the original question of what to do about Array[T, *Ts][*tuple[int, ...], str].
    ...
    I favor option (1). Curious to hear your thoughts, since it's not clear which way you are leaning.

    Ah, sorry if I wasn't explicit enough on this earlier - yeah, for this case, I'm also on board with deferring its implementation at runtime until someone finds a use-case for it - both because it could take a while to implement and test, and because I think Jelle has a remit to try and keep complexity out of typing.py.

    I definitely agree it's not worth banning in the PEP.

    @gvanrossum
    Copy link
    Member

    @gvanrossum gvanrossum commented May 22, 2022

    I hesitate to add anything, but here's something nevertheless.

    I feel that there's something missing from our notation, and that is how to indicate that the number of dimensions is Any (and shouldn't be complained about). This same problem occurs for varargs functions, e.g. this is accepted:

    def f(a: int, *b: int) -> int: ...
    
    t: tuple[int, ...] = ()
    
    f(*t)  # OK in mypy, fails at runtime

    even though f(*()) is rejected.

    We should probably just accept this. Certainly it seems PEP 646 accepted this for some cases already (the quote by Pradeep), so it seems fair to accept it in the case of a type alias as well.

    OT: While reading these deliberations, am beginning to find the notation Foo[A, *tuple[B, ...], C] pretty tedious. Maybe in a future PEP we can allow simplifying this to Foo[A, B, ..., C]? IIRC we never went there because we worried that there might be other interpretations (like B, ... might mean a B plus zero or more Anys, rather than zero or more Bs), but maybe we should just pick a meaning and stick to it. Users could also be confused by tuple[B, ...] but we've defined it and move on.

    @mrahtz
    Copy link
    Contributor

    @mrahtz mrahtz commented May 22, 2022

    [Guido]

    I feel that there's something missing from our notation, and that is how to indicate that the number of dimensions is Any (and shouldn't be complained about).

    Assuming that:

    DType = TypeVar('DType')
    Shape = TypeVarTuple('Shape')
    class ArrayShapeOnly(Generic[*Shape]): ...
    class ArrayDTypeAndShape(Generic[DType, *Shape]]): ...

    Doesn't ArrayShapeOnly[*tuple[SomeType, ...]] / ArrayDTypeAndShape[SomeDType, *tuple[SomeType, ...]] address this case?

    Or is your point that there should be some nicer syntax for specifying this? In practice, I'd expect most library authors to cater for this with an alias along the lines of:

    ArrayWithArbitraryNumDimensions = ArrayDTypeAndShape[DType, *tuple[Any, ...]]

    So maybe special syntax wouldn't be necessary?


    This same problem occurs for varargs functions, e.g. this is accepted:

    def f(a: int, *b: int) -> int: ...
    
    t: tuple[int, ...] = ()
    
    f(*t)  # OK in mypy, fails at runtime even though f(*()) is rejected.

    To check that I understand:

    • f(*t) fails at runtime, because there aren't enough elements in t to actually assign to a and b, but is ok in mypy, because mypy "tries to find a way to make it work" by assuming the arbitrary-length tuple is of the right size to proceed
    • f(*()) fails at runtime, but also in mypy, because here mypy doesn't have an arbitrary number of types to play with

    I might just be tired, but could you expand on the connection between this case and the "arbitrary number of dimensions" case?


    OT: While reading these deliberations, am beginning to find the notation Foo[A, *tuple[B, ...], C] pretty tedious. Maybe in a future PEP we can allow simplifying this to Foo[A, B, ..., C]? IIRC we never went there because we worried that there might be other interpretations (like B, ... might mean a B plus zero or more Anys, rather than zero or more Bs), but maybe we should just pick a meaning and stick to it. Users could also be confused by tuple[B, ...] but we've defined it and move on.

    I hear you that it's a bit of a tedious syntax. And yes, you're right - the reason I'm hesitant to use Foo[A, B, ..., C] is that I think we might want to make Foo[A, B, ..., C] mean "A, then B, then an arbitrary number of Any, then C". I continue to think the latter meaning would be much more useful for array typing purposes (and also more intuitive, but I appreciate that's subjective).

    But I don't think we should discuss it yet, because it's a decision that'll impact a lot of future users of array typing who don't exist yet and who therefore can't weigh in on the discussion. Ideally we'd talk about it once, say, NumPy has adopted PEP 646, and people have been using it for NumPy annotations for a good 6 months or so.

    @pradeep90
    Copy link
    Contributor

    @pradeep90 pradeep90 commented May 23, 2022

    [Pradeep]

    Lmk if that makes sense. I'm happy to hop on a VC call this week, if this is something you'd like to discuss further. Might be faster than waiting for each other's replies :)

    For this kind of thing, I think text works better for me - it's helpful to have time to think. Thanks for offering, though :)

    Fine by me!

    class Array(Generic[*Shape]): ...  # I'm including the definition of Array for completeness.
    
    
    y: Array[*Tuple[Any, ...]] = read_from_file()
    
    def expect_variadic_array(
        x: Array[Batch, *Shape]
    ) -> None: ...
    
    expect_variadic_array(y)  # OK

    Isn't Batch here a type, rather than a TypeVar?

    Batch is a TypeVar. The PEP text mentions that "Batch is bound to Any and Shape is bound to Tuple[Any, ...]". Maybe it should be called BatchVar to distinguish from the named dimension :|

    The idea I was getting across was that even if Array is only generic in *Shape, a function could expect y: Array[N1, N2, *Shape, N3]. Now, if someone passes y: Array to the function, we would need to split the unbounded tuple *tuple[Any, ...] across N1, N2, *Shape, N3. That would work.

    But if users tried to define a generic alias MyArray = Array[N1, N2, *Shape, N3] and tried to use y: MyArray[*tuple[Any, ...]], that would fail, even though it's the same splitting as above.

    That is why I was arguing against forbidding this. Since we agree on not forbidding this in the PEP, no worries.

    If the question is why would anyone specify MultiDeviceArray[*tuple[int, ...]] explicitly, we address that in the PEP:

    This allows users to handle dynamic code gracefully while still explicitly marking the code as unsafe (by using y: Array[*Tuple[Any, ...]]). Otherwise, users would face noisy errors from the type checker every time they tried to use the variable y, which would hinder them when migrating a legacy code base to use TypeVarTuple.
    Didn't we write that assuming that Array was generic in only a TypeVarTuple?
    In the case of MultiDeviceArray, if we wanted to explicitly mark it as unsafe, wouldn't we write something like MultiDeviceArray[tf.dtype, *tuple[int, ...]]?

    If we had an alias such as MyArray = Array[N1, N2, *Shape, N3], we would have to write MyArray[int, int, *tuple[int, ...], int] (as per the proposal in this thread). Otherwise, MyArray[*tuple[int, ...] would be a runtime error.

    So, yes, it is always feasible to add type arguments separately for each TypeVar in a generic alias. It would be a bad user experience to get a runtime error first, but I think it's reasonable to punt on the implementation for now.

    I favor option (1). Curious to hear your thoughts, since it's not clear which way you are leaning.

    Ah, sorry if I wasn't explicit enough on this earlier - yeah, for this case, I'm also on board with deferring its implementation at runtime until someone finds a use-case for it - both because it could take a while to implement and test, and because I think Jelle has a remit to try and keep complexity out of typing.py.

    I definitely agree it's not worth banning in the PEP.

    +1. Hope there's nothing else blocking this thread, then.

    @gvanrossum
    Copy link
    Member

    @gvanrossum gvanrossum commented May 23, 2022

    First responding to Pradeep: What does "punting on the implementation" mean in practice? Does it just mean that repr(MyArray[*tuple[int, ...]]) might be Array[N1, N2, *Shape, N3][*tuple[int, ...]] rather than Array[int, int, *tuple[int, ...], int]? If so, I am fine with that.

    Next to Matthew, in reverse order:

    (c) I will happily await discussing the semantics of A[T, ..., S] until there's actual experience with PEP 646 (i.e., well after 3.11 is released).

    (b) Yes, you understand my claim about f(*t) vs. f(*()). The reason I connect vararg variable substitutions with vararg function calls is that in both cases we have a single notation that means "arbitrary number" which is generally understood to mean "zero or more" but which is also acceptable in situations where the requirement is "N or more, for some N larger than zero". (We emphasize that an empty tuple is a valid match for tuple[int, ...].)

    (a) I would be happy with the syntax A[X, *tuple[X, ...]] to indicate "A with 1 or more X parameters". However, we accept some other type whose meaning is "A with 0 or more X parameters" as a valid substitution. So really it appears that as soon as we have a type whose parameter count has no upper bound, we appear to drop the lower bound -- we cannot distinguish between "count >= 0" and "count >= 1". I drew the analogy with function calls because they have the same issue and we seem to be fine with that.

    @pradeep90
    Copy link
    Contributor

    @pradeep90 pradeep90 commented May 23, 2022

    [Guido]

    First responding to Pradeep: What does "punting on the implementation" mean in practice? Does it just mean that repr(MyArray[*tuple[int, ...]]) might be Array[N1, N2, *Shape, N3][*tuple[int, ...]] rather than Array[int, int, *tuple[int, ...], int]? If so, I am fine with that.

    Right now, MyArray[*tuple[int, ...]] gives a runtime error. IIUC Matthew was suggesting that we leave it as such - giving a runtime error - until we get user requests for this form. The reasoning for deferring this was that this needs some more work to implement, and there exist long-form ways to represent the type MyArray[*tuple[int, ...]], i.e., Array[int, int, *tuple[int, ...], int].

    Will defer to Matthew in case he wants to clarify.

    @gvanrossum
    Copy link
    Member

    @gvanrossum gvanrossum commented May 23, 2022

    Leaving it a runtime error does go against the general trend to be more lenient at runtime than in the static checker, though. So if there's another approach that's not too much effort I'd rather not have the runtime error. If we leave it a runtime error in 3.11.0, could we fix it in 3.11.1, or would we have to wait until 3.12? If 3.11.1, that would be acceptable.

    @mrahtz
    Copy link
    Contributor

    @mrahtz mrahtz commented May 23, 2022

    Long thread is long

    The issue of tuple[T, *Ts][*tuple[int, ...]]

    [Pradeep]

    Isn't Batch here a type, rather than a TypeVar?

    Batch is a TypeVar. The PEP text mentions that "Batch is bound to Any and Shape is bound to Tuple[Any, ...]". Maybe it should be called BatchVar to distinguish from the named dimension :|

    Yeah, I was wondering about that. I think we made a typo. In the immediately preceding context, I'm pretty sure we were talking about a function that expected a shape beginning with the Batch type:

    def process_batch_channels(
        x: Array[Batch, *Tuple[Any, ...], Channels]
    ) -> None:
        ...
    
    x: Array[Batch, Height, Width, Channels]
    process_batch_channels(x)  # OK
    y: Array[Batch, Channels]
    process_batch_channels(y)  # OK
    z: Array[Batch]
    process_batch_channels(z)  # Error: Expected Channels.

    Thinking about it some more, I realise that in the example in question...

    y: Array[*Tuple[Any, ...]] = read_from_file()
    
    def expect_variadic_array(
        x: Array[Batch, *Shape]
    ) -> None: ...
    
    expect_variadic_array(y)  # OK

    ...you're right, in that even if we do assume Batch to be a type here, we do in some sense have to 'match up' the Batch with Any. But of course, this is a different thing from type substitution.

    So yeah, I think the current phrasing in the PEP is inaccurate:

    Array[*Tuple[Any, ...]] stands for an array with an arbitrary number of dimensions of type Any. This means that, in the call to expect_variadic_array, Batch is bound to Any and Shape is bound to Tuple[Any, ...]. In the call to expect_precise_array, the variables Batch, Height, Width, and Channels are all bound to Any.

    I think it should be something like:

    Array[*Tuple[Any, ...]] stands for an array with an arbitrary number of dimensions of type Any. This means that, in the call to expect_variadic_array, we're checking the Batch type for compatibility with Any, and binding Shape to Tuple[Any, ...]. In the call to expect_precise_array, we're similarly checking the Batch, Height, Width and Channels types for compatibility with Any.

    What do you reckon?


    [Pradeep]

    The idea I was getting across was that even if Array is only generic in *Shape, a function could expect y: Array[N1, N2, *Shape, N3]. Now, if someone passes y: Array to the function, we would need to split the unbounded tuple *tuple[Any, ...] across N1, N2, *Shape, N3. That would work.

    But if users tried to define a generic alias MyArray = Array[N1, N2, *Shape, N3] and tried to use y: MyArray[*tuple[Any, ...]], that would fail, even though it's the same splitting as above.

    To check whether I understand - I think we're talking about the following:

    Shape = TypeVarTuple('Ts')
    N1 = TypeVar('N1')
    N2 = TypeVar('N2')
    N3 = TypeVar('N3')
    
    class Array(Generic[*Shape]): ...
    
    def foo(y: Array[N1, N2, *Shape, N3]): ...
    
    y1: Array
    # This is fine conceptually, both because we've decreed that plain `Array`
    # should be backwards-compatible with `Array[anything]`, and because it's
    # the same as `Array[*tuple[Any, ...]]`, and we can happily assign `Any` to
    # `N1`, `N2` and `N3`, and `*tuple[Any, ...]` to `Shape`.
    # It's also fine in our current runtime implementation.
    foo(y1)
    
    y2: Array[*tuple[Any, ...]]
    # This is also fine conceptually, because it means the same thing as `y: Array`.
    # It's also fine in our current runtime implementation.
    foo(y2)
    
    MyArray = Array[N1, N2, *Shape, N3]
    # This is the same thing as what happens when we do foo(y2) -
    # in both cases, we're comparing `N1, N2, *Shape, N3` and `*tuple[Any, ...]` -
    # but this one won't work in our current runtime implementation.
    y3: MyArray[*tuple[Any, ...]]

    ...ok, this example has persuaded me. It's true that if a function which expects a generic type with type parameters N1, N2, *Shape, N3 can accept type arguments *Tuple[Any, ...], then a generic alias with type parameters N1, N2, *Shape, N3 should also be able to accept type arguments *Tuple[Any, ...]. The only reason the former works and not the latter in the runtime currently is that in the former case we don't actually need to compute the assignment of types to type parameters explicitly. They're both conceptually sound.

    So I'll work on getting my PR fixed up.

    The issue of tuple[T, *Ts][*tuple[int, ...], str]

    [Pradeep]

    IIUC Matthew was suggesting that we leave it as such - giving a runtime error - until we get user requests for this form. The reasoning for deferring this was that this needs some more work to implement, and there exist long-form ways to represent the type MyArray[*tuple[int, ...]], i.e., Array[int, int, *tuple[int, ...], int].

    Yeah, so I am suggesting that we leave tuple[T, *Ts][*tuple[int, ...], str] as a runtime error for now.

    [Guido]

    Leaving it a runtime error does go against the general trend to be more lenient at runtime than in the static checker, though. So if there's another approach that's not too much effort I'd rather not have the runtime error. If we leave it a runtime error in 3.11.0, could we fix it in 3.11.1, or would we have to wait until 3.12? If 3.11.1, that would be acceptable.

    I agree it's not ideal, but for this case specifically, I'm not sure there's a simple alternative. I don't think the idea of leaving it partially evaluated - such that tuple[T, *Ts][*tuple[int, ...], str] evaluated to tuple[T, *Ts][*tuple[int, ...], str] and nothing simpler - is possible, for reasons that Serhiy pointed out: what would __args__ and __params__ be in this case? What should things that rely on __args__ and __params__ (Serhiy mentioned dataclasses as one such example) do?

    I think we're pretty unlikely to see anyone do this though. It's a super weird thing to do. I wouldn't bet my life on it, but I'd be surprised.

    The issue of what ... should mean

    ...I'll reply to this one in the next few days :)

    @gvanrossum
    Copy link
    Member

    @gvanrossum gvanrossum commented May 23, 2022

    So yeah, I think the current phrasing in the PEP is inaccurate:

    Array[*Tuple[Any, ...]] stands for an array with an arbitrary number of dimensions of type Any. This means that, in the call to expect_variadic_array, Batch is bound to Any and Shape is bound to Tuple[Any, ...]. In the call to expect_precise_array, the variables Batch, Height, Width, and Channels are all bound to Any.

    I think it should be something like:

    Array[*Tuple[Any, ...]] stands for an array with an arbitrary number of dimensions of type Any. This means that, in the call to expect_variadic_array, we're checking the Batch type for compatibility with Any, and binding Shape to Tuple[Any, ...]. In the call to expect_precise_array, we're similarly checking the Batch, Height, Width and Channels types for compatibility with Any.

    What do you reckon?

    Agreed. Batch does not have to be a typevar here for the examples to clarify the issues.

    Regarding punting on the implementation, your proposal is to only punt on the case where there is an extra type (str in your example) in the actual parameters following *tuple[int, ...], right? I can live with that then, though if Serhiy manages to fix it I wouldn't stop him.

    @pradeep90
    Copy link
    Contributor

    @pradeep90 pradeep90 commented May 24, 2022

    ...ok, this example has persuaded me.

    🎉

    Batch is a TypeVar. The PEP text mentions that "Batch is bound to Any and Shape is bound to Tuple[Any, ...]". Maybe it should be called BatchVar to distinguish from the named dimension :|

    Yeah, I was wondering about that. I think we made a typo
    What do you reckon?

    Actually, I meant Batch to be a TypeVar in that example, precisely to demonstrate how unbounded tuples are split across TypeVars and TypeVarTuples. Otherwise, type checker implementers wouldn't have any code snippets showing what each variable is bound to in that use case, which is very common in Tensor code.

    The change would have to be making it BatchVar or just T.

    I'll put up a PR to clarify the code and wording (assuming it's ok to update the PEP, since this is an oversight in the naming).

    @mrahtz
    Copy link
    Contributor

    @mrahtz mrahtz commented May 28, 2022

    [Guido]

    (c) I will happily await discussing the semantics of A[T, ..., S] until there's actual experience with PEP 646 (i.e., well after 3.11 is released).

    Ok, great :)

    Lower bounds on the number of type arguments

    (b) Yes, you understand my claim about f(t) vs. f(()). The reason I connect vararg variable substitutions with vararg function calls is that in both cases we have a single notation that means "arbitrary number" which is generally understood to mean "zero or more" but which is also acceptable in situations where the requirement is "N or more, for some N larger than zero". (We emphasize that an empty tuple is a valid match for tuple[int, ...].)

    (a) I would be happy with the syntax A[X, *tuple[X, ...]] to indicate "A with 1 or more X parameters". However, we accept some other type whose meaning is "A with 0 or more X parameters" as a valid substitution. So really it appears that as soon as we have a type whose parameter count has no upper bound, we appear to drop the lower bound -- we cannot distinguish between "count >= 0" and "count >= 1". I drew the analogy with function calls because they have the same issue and we seem to be fine with that.

    Ok, so if I understand right, you're saying that, regarding whether we should allow...

    T = TypeVar('T')
    Shape = TypeVarTuple('Shape')
    
    class Array(Generic[*Shape]): ...
    def foo(x: Array[int, *tuple[int, ...]): ...
    
    x: Array[*tuple[int, ...]]
    foo(x)

    ...given that we're passing "Zero or more int" to something that seems to require "One or more int", you're arguing that it should be allowed, by analogy with the following, which is fine in mypy...

    def bar(a: int, *b: int): ...
    
    t: tuple[int, ...]
    
    foo(*t)

    ...and furthermore noting that, if we do allow this, one downside is that it prevents us from being able to properly enforce a lower bound of the number of type arguments in cases where we don't have an upper bound on the number of type arguments, like in foo above. Is that right?

    If so - hmm, interesting point. I agree the validity of the bar example suggests we should also be fine with corresponding foo example. But it hadn't occurred to me that this requires us to give up lower bound verification.

    I guess what that means for arrays is that we couldn't catch something like this:

    class Array(Generic[*Shape]): ...
    
    # Argument must have at least one dimension - scalar arrays like Array[()] not allowed!
    def only_accepts_arrays(x: Array[Any, *tuple[Any, ...]]): ...
    
    def returns_scalar_or_array() → Array[*tuple[Any, ...]]: ...
    
    x = returns_scalar_or_array()
    only_accepts_array(x)  # x might be Array[()], so this might not be ok!

    This is mildly annoying, but seems like an ok sacrifice to make in order to enable gradual typing - allowing Array == Array[*tuple[Any, ...]] to be a 'universal' array.

    What we're punting on

    Regarding punting on the implementation, your proposal is to only punt on the case where there is an extra type (str in your example) in the actual parameters following *tuple[int, ...], right? I can live with that then, though if Serhiy manages to fix it I wouldn't stop him.

    Close. There's a similar case where the unpacked arbitrary-length tuple lines up exactly with a TypeVarTuple and all other types line up exactly with TypeVars, which is easy to evaluate, and which the current runtime therefore evaluates fine:

    class Array(Generic[*Ts]): pass
    Alias = Array[*Ts, T]
    Alias[*tuple[int, ...], str]  # Evaluates to Array[*tuple[int, ...], str]

    So I think the actual logic would be something like:

    • If the type argument list consists of a single unpacked arbitrary-length tuple *tuple[int, ...], then we're fine.
    • Else if the type argument list "lines up exactly" with the type arguments - such that the type argument list is the same length as the type parameter list, and the TypeVarTuple in the type parameter list lines up with the unpacked arbitrary-length tuple in the type argument list (and there are no other unpacked arbitrary-length tuples in the type argument list), then we're fine.
    • Else, error.

    To put it concisely: we're punting on the case where it's a valid substitution, but the unpacked arbitrary-length tuple in the arguments doesn't line up with the TypeVarTuple in the parameters.

    @mrahtz
    Copy link
    Contributor

    @mrahtz mrahtz commented May 28, 2022

    [Pradeep]

    Actually, I meant Batch to be a TypeVar in that example, precisely to demonstrate how unbounded tuples are split across TypeVars and TypeVarTuples. Otherwise, type checker implementers wouldn't have any code snippets showing what each variable is bound to in that use case, which is very common in Tensor code.

    Oh! Sorry, I didn't realise you intended Batch to be a TypeVar here. Ok, a PR clarifying this sounds good :)

    @gvanrossum
    Copy link
    Member

    @gvanrossum gvanrossum commented May 28, 2022

    serhiy-storchaka added a commit to serhiy-storchaka/cpython that referenced this issue May 29, 2022
    …able-size tuple
    
    For example:
    
      A[T, *Ts][*tuple[int, ...]] -> A[int, *tuple[int, ...]]
      A[*Ts, T][*tuple[int, ...]] -> A[*tuple[int, ...], int]
    @serhiy-storchaka
    Copy link
    Member

    @serhiy-storchaka serhiy-storchaka commented May 29, 2022

    I have wrote the Python implementation and now working on the C code. Please look whether the Python implementation works as you expect.

    For simplicity, it forbids substitution of multiple unpacked var-tuples, e.g. A[*Ts][*tuple[int, ...], *tuple[str, ...]]. It was forbidden by PEP 646, but accepted at runtime for simplicity. I can allow this again, but it will make the code slightly more complicated.

    @serhiy-storchaka
    Copy link
    Member

    @serhiy-storchaka serhiy-storchaka commented May 29, 2022

    The following examples now work:

    A[T, *Ts][*tuple[int, ...]] -> A[int, *tuple[int, ...]]
    A[*Ts, T][*tuple[int, ...]] -> A[*tuple[int, ...], int]
    A[T, str, *Ts][*tuple[int, ...]] -> A[int, str, *tuple[int, ...]]
    A[*Ts, str, T][*tuple[int, ...]] -> A[*tuple[int, ...], str, int]
    A[list[T], *Ts][*tuple[int, ...]] -> A[list[int], *tuple[int, ...]]
    A[*Ts, list[T]][*tuple[int, ...]] -> A[*tuple[int, ...], list[int]]
    

    @mrahtz
    Copy link
    Contributor

    @mrahtz mrahtz commented May 29, 2022

    Oh no :( I already did a bunch of work on the Python side of this yesterday in #93318. I'm upset that I wasted a Saturday on this. Please check more carefully next time whether someone else is already working on it before starting.

    Edit: ah, sorry, I thought you'd merged it already - I didn't realise it was a WIP PR. Still, unfortunate that we ended up duplicating each other's work here :(

    But yes, the behaviour looks correct. And thanks for implementing the C version. I also looked at this yesterday but got a bit lost, so I appreciate you taking care of it.

    @serhiy-storchaka
    Copy link
    Member

    @serhiy-storchaka serhiy-storchaka commented May 30, 2022

    I am sorry. I promised to work on it almost a month ago, but I only had the time and inspiration to do it last weekend. GitHub does not send notifications about the linked PRs by email, so I was unaware of your work.

    #93330 is now ready for review. I have also another, simpler, version, which moves a lot of the C code to Python, but I need more time to polish it.

    pablogsal pushed a commit to miss-islington/cpython that referenced this issue May 31, 2022
    pythonGH-92335)
    
    (cherry picked from commit 9d25db9)
    
    Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>
    pablogsal added a commit that referenced this issue Jun 1, 2022
    …es (GH-92335) (#92484)
    
    * gh-91162: Fix substitution of unpacked tuples in generic aliases (GH-92335)
    (cherry picked from commit 9d25db9)
    
    Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>
    
    * Regenerate ABI file
    
    Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>
    Co-authored-by: Pablo Galindo <pablogsal@gmail.com>
    @serhiy-storchaka
    Copy link
    Member

    @serhiy-storchaka serhiy-storchaka commented Jun 1, 2022

    #93412 is an alternative implementation which does complex things in Python and calls the Python code from C. Seems it can also simplify the code of collections.abc.Callable (because the code is more generic now), but I left a clean up to a separate PR.

    @mrahtz
    Copy link
    Contributor

    @mrahtz mrahtz commented Jun 1, 2022

    I am sorry. I promised to work on it almost a month ago, but I only had the time and inspiration to do it last weekend. GitHub does not send notifications about the linked PRs by email, so I was unaware of your work.

    Oh, fair enough. In that case I'll just say: thank you for your continued work on this :)

    serhiy-storchaka added a commit that referenced this issue Jun 12, 2022
    …ypeVar and TypeVarTuple parameters (alt) (GH-93412)
    
    For example:
    
      A[T, *Ts][*tuple[int, ...]] -> A[int, *tuple[int, ...]]
      A[*Ts, T][*tuple[int, ...]] -> A[*tuple[int, ...], int]
    miss-islington pushed a commit to miss-islington/cpython that referenced this issue Jun 12, 2022
    …over TypeVar and TypeVarTuple parameters (alt) (pythonGH-93412)
    
    For example:
    
      A[T, *Ts][*tuple[int, ...]] -> A[int, *tuple[int, ...]]
      A[*Ts, T][*tuple[int, ...]] -> A[*tuple[int, ...], int]
    (cherry picked from commit 3473817)
    
    Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    3.11 expert-typing type-bug
    Projects
    None yet
    Development

    No branches or pull requests

    7 participants