PEP 646: Decide on substitution behavior #91162

JelleZijlstra · 2022-03-13T20:46:16Z

BPO	47006
Nosy	@gvanrossum, @serhiy-storchaka, @JelleZijlstra, @Fidget-Spinner, @mrahtz, @mrahtz, @AlexWaygood
PRs	#32341

^{Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.}

Show more details

GitHub fields:

assignee = 'https://github.com/JelleZijlstra'
closed_at = None
created_at = <Date 2022-03-13.20:46:16.314>
labels = ['type-bug', 'release-blocker', '3.11']
title = 'PEP 646: Decide on substitution behavior'
updated_at = <Date 2022-04-07.01:50:13.871>
user = 'https://github.com/JelleZijlstra'

bugs.python.org fields:

activity = <Date 2022-04-07.01:50:13.871>
actor = 'gvanrossum'
assignee = 'JelleZijlstra'
closed = False
closed_date = None
closer = None
components = []
creation = <Date 2022-03-13.20:46:16.314>
creator = 'JelleZijlstra'
dependencies = []
files = []
hgrepos = []
issue_num = 47006
keywords = ['patch']
message_count = 17.0
messages = ['415100', '415107', '415108', '415109', '415557', '415623', '415637', '415694', '415710', '415712', '415734', '415752', '415753', '416707', '416795', '416813', '416913']
nosy_count = 7.0
nosy_names = ['gvanrossum', 'serhiy.storchaka', 'JelleZijlstra', 'kj', 'matthew.rahtz', 'mrahtz', 'AlexWaygood']
pr_nums = ['32341']
priority = 'release blocker'
resolution = None
stage = 'patch review'
status = 'open'
superseder = None
type = 'behavior'
url = 'https://bugs.python.org/issue47006'
versions = ['Python 3.11']

JelleZijlstra · 2022-03-13T20:46:16Z

We've had some disagreement about the behavior of TypeVarTuple substitution related to PEP-646, and the discussion has now spilled around multiple PRs. I'd like to use this issue to come to an agreement so we don't have to chase through so many different places.

Links:

#75204 (merged) implements PEP-646 in typing.py. Matthew initially implemented full TypeVar substitution support, but took it out after I suggested an alternative solution in #31021 (review).
#75981 (merged without review) implements TypeVarTuple substitution in Python.
#76009 implements TypeVarTuple substitution in C.
#76025 and #76027 add additional test cases.
#75985 (closed) implements my proposed solution.

I'd like to ask that until we come to an agreement we hold off on making any more changes, so we don't have to go back and forth and we ensure that the eventual solution covers all edge cases.

The disagreement is about what to do with TypeVarTuple substitution: the behavior when a generic type is subscripted, like tuple[*Ts][int, str].

There are two possible extreme approaches:

Implement full substitution support, just as we have it for existing TypeVars. This is complicated because TypeVarTuple makes it much harder to match up the types correctly. However, it is consistent with the behavior for other TypeVar-like objects. My example would turn into GenericAlias(tuple, (int, str)).
Give up on substitution and just return a new GenericAlias object: GenericAlias(GenericAlias(tuple, Unpack[Ts]), (int, str). This avoids implementing any complex runtime behavior, but it inconsistent with existing behavior and less pretty when you print out the type. I prefer this approach because there's less risk that future enhancements to typing will break it. I also want to explore extending this approach to ParamSpec substitution.

mrahtz · 2022-03-13T21:59:29Z

Thanks for starting this, Jelle - I was a bit unsure about how to proceed here.

Given that #31800 is already merged, I'd also propose something halfway between the two extremes: return a sensible substitution when the logic to compute that isn't too onerous, and a new GenericAlias object when it is. The upsides are that we'd probably be able to return reasonable substitutions for the vast majority of cases, and that we wouldn't have to remove what's already been merged. The downsides would be lack of consistency, and the potential for changing rules about what does and doesn't return a full substitution as time goes on and new features are added.

mrahtz · 2022-03-13T22:01:18Z

(Having said that, to be clear: my preferred solution currently would still be the solution where we just return a new GenericAlias for anything involving a TypeVarTuple. The crux is what Serhiy is happy with.)

JelleZijlstra · 2022-03-13T22:10:03Z

Thanks Matthew! Merged PRs can still be reverted, and we have some time before the feature freeze. I'd like to hear what Guido and Ken think too.

If we go with the GenericAlias substitution, we need to make sure that such aliases still work as base class. That would need some C work to make types.GenericAlias.__mro_entries__ recurse if the alias's origin is itself a GenericAlias. There's a few other subtleties to think about; I can work on that but don't have a ton of time today.

serhiy-storchaka · 2022-03-19T15:32:54Z

I am for consistent behavior. If return GenericAlias(GenericAlias(tuple, Unpack[Ts]), (int, str)) for tuple[*Ts][int, str], we should also return GenericAlias(GenericAlias(list, T), int) for list[T][int], etc. And it will cause multiple problems:

A repr can be less readable.
It will break equality comparison and hashing. Good bye caching.
What about __origin__, __parameters__, __args__? How will they be calculated?
It can break code which uses annotations for something. For example it can break dataclasses.

It may be that will need to use it as a fallback for cases like tuple[T, *Ts][*Ts2] (currently it is error). But I am not sure that such cases should be supported.

gvanrossum · 2022-03-20T16:44:40Z

I think I'm with Serhiy, I don't understand the hesitance to transform tuple[*Ts][int, str] into tuple[int, str].

What would be an example of a substitution that's too complex to do?

JelleZijlstra · 2022-03-20T21:54:07Z

It's simple if you only look at simple examples.

Here are some examples current main (with Serhiy's patch for the Python version of typing) gets wrong:

>>> from typing import *
>>> Ts = TypeVarTuple("Ts")
>>> T1 = TypeVar("T1")
>>> T2 = TypeVar("T2")
>>> Tuple[T1, Unpack[Ts], T2][int, Unpack[tuple[int]]]  # expect error
typing.Tuple[int, *tuple[int]]
>>> Tuple[T1, Unpack[Ts], str, T2][int, Unpack[Ts]]  # expect error (T2 missing)
typing.Tuple[int, str, *Ts]  # it put *Ts in the wrong place
>>> Tuple[T1, Unpack[Ts], str, T2][int, Unpack[Ts], Unpack[Ts]]  # expect error (*Ts can't substitute T2)
typing.Tuple[int, *Ts, str, *Ts]
>>> class G(Generic[T1, Unpack[Ts], T2]): pass
... 
>>> G[int]  # expect error
__main__.G[int]

We can probably fix that, but I'm not looking forward to implementing the fixed logic in both Python and C. Also, I'm worried that it won't work with future extensions to the type system (e.g., the rumored Map operator) that may go into 3.12 or later versions.

serhiy-storchaka · 2022-03-21T18:17:54Z

The first case will be practically fixed by GH 32030 after chenging the grammar to allow unpacking in index tuple: A[*B].

Two other cases will be fixed by GH 32031. It does not require any C code.

In the last case no error is raised because some error checks are skipped if any of Generic arguments is a TypeVarTuple. We just need to add such checks. This is Python-only code too.

Note that the alternative proposition is even more lenient to errors.

mrahtz · 2022-03-21T21:33:50Z

[Guido]

What would be an example of a substitution that's too complex to do?

We also need to remember the dreaded arbitrary-length tuple. For example, I think it should be the case that:

T = TypeVar('T')
Ts = TypeVarTuple('Ts')
class C(Generic[*Ts]): pass
Alias = C[T, *Ts]
Alias2 = Alias[*tuple[int, ...]]
# Alias2 should be C[int, *tuple[int, ...]]

Ok, this is a bit of a silly example, but if we're committing to evaluating substitutions correctly, we should probably make even this kind of example behave correctly so that users who accidentally do something silly can debug what's gone wrong.

[Serhiy]

A repr can be less readable.

Definitely true.

It will break equality comparison and hashing. Good bye caching.

Huh, I didn't know about this one. Fair enough, this is totally a downside.

What about __origin__, __parameters__, __args__? How will they be calculated?

This could admittedly be thorny. We'd have to think it through carefully. Admittedly also a downside.

It can break code which uses annotations for something. For example it can break dataclasses.

Oh, also interesting - I didn't know about this one either. Could you give an example?

The first case will be practically fixed by GH 32030 after chenging the grammar to allow unpacking in index tuple: A[*B].

We actually deliberately chose not to unpack concrete tuple types - see the description of #30398, under the heading 'Starred tuple types'. (If you see another way around it, though, let me know.)

Two other cases will be fixed by GH 32031. It does not require any C code.

I'm also not sure about this one; disallowing unpacked TypeVarTuples in argument lists to generic aliases completely (if I've understood right?) seems like too restrictive a solution. I can imagine there might be completely legitimate cases where the ability to do this would be important. For example:

DType = TypeVar('DType')
Shape = TypeVarTuple('Shape')
class Tensor(Generic[DType, *Shape]): ...
Uint8Tensor = Tensor[uint8, *Shape]
Unit8BatchTensor = Uint8Tensor[Batch, *Shape]

Note that the alternative proposition is even more lenient to errors.

True, but at least it's predictably lenient to errors - I think the repr makes it very clear that "Woah, you're doing something advanced here. You're on your own!" I think it better fits the principle of least astonishment to have something that consistently lets through all errors of a certain class than something that sometimes catches errors and sometimes doesn't.

mrahtz · 2022-03-21T21:35:34Z

P.s. To be clear, (I think?) these are all substitutions that are computable. We *could* implement the logic to make all these evaluate correctly if we wanted to. It's just a matter of how much complexity we want to allow in typing.py (or in the runtime in general, if we say farmed some of this logic out to a separate module).

gvanrossum · 2022-03-22T01:01:21Z

I'd like to look at this as a case of simplifying something to its simplest canonical form, but no simpler. This is what the existing fixed-typevar expansion does: e.g. tuple[str, T, T][int] becomes tuple[str, int, int].

I propose that we try to agree on a set of rules for what can be simplified further and what cannot, when we have B = C[...]; A = B[...], (IOW A = C[...][...]), for various shapes of the subscripts to C and B. Note that what's relevant for the second subscript is C[...].__parameters__, so I'll call that "left" below.

Some edge case seems to be that if *tuple[...] is involved on either side we will never simplify. Or perhaps a better rule is that *tuple[...] is never simplified away (but fixed items before and after it may be).
Another edge case is that if neither side has any starred items we will always simplify (since this is the existing behavior in 3.10). This may raise an error if the number of subscripts on the right does not match the number of parameters on the left.
If there's a single *Ts on the left but not on the right, we should be able to simplify, which again may raise an error if there are not enough values on the right, but if there are more than enough, the excess will be consumed by *Ts (in fact that's the only way *Ts is fed).
If there's a *Ts on the right but not on the left, we should _not_ simplify, since whatever we have on the left serves as a constraint for *Ts. (E.g. tuple[int, int][*Ts] constrains *Ts to being (int, int).)
If there's exactly one *Ts on the left and one on the right, we _might__ be able to simplify if the prefix and suffix of the __parameters__ match the prefix and suffix of the subscript on the right. E.g. C[int, T, *Ts, float][str, *Ts] can be simplified to C[int, str, *Ts, float]. OTOH C[int, T, *Ts, float][*Ts] cannot be simplified -- but we cannot flag it as an error either. Note that __parameters__ in this example is (T, Ts); we have to assume that typevartuples in __parameters__ are always used as *Ts (since the PEP recognizes no valid unstarred uses of Ts).

TBH case 5 is the most complex and I may have overlooked something. I'm more sure of cases 1-4.

serhiy-storchaka · 2022-03-22T08:49:01Z

Alias = C[T, *Ts]
Alias2 = Alias[*tuple[int, ...]]

Alias2 should be C[int, *tuple[int, ...]]

tuple[int, ...] includes also an empty tuple, and in this case there is no value for T.

Oh, also interesting - I didn't know about this one either. Could you give an example?

If __origin__, __parameters__, __args__ are a mess, it will definitely break a code which use them.

We actually deliberately chose not to unpack concrete tuple types - see the description of #30398, under the heading 'Starred tuple types'. (If you see another way around it, though, let me know.)

You assumed that *tuple[str, bool] in def foo(*args: *tuple[str, bool]) should give foo.__annotations__['args'] = tuple[str, bool], but it should rather give (str, bool). No confusion with tuple[str, bool].

And one of PEP-646 options is to implement star-syntax only in subscription, not in var-parameter type annotations.

I'm also not sure about this one; disallowing unpacked TypeVarTuples in argument lists to generic aliases completely (if I've understood right?)

No, it will only be disallowed in substitution of a VarType. Tuple[T][*Ts] -- error. Tuple[*Ts][*Ts2] -- ok.

I propose to implement simple and strict rules, and later add support of new cases where it makes sense.

serhiy-storchaka · 2022-03-22T09:06:59Z

Some edge case seems to be that if *tuple[...] is involved on either side we will never simplify. Or perhaps a better rule is that *tuple[...] is never simplified away (but fixed items before and after it may be).

I do not understand this. Do you forbid simplifying of tuple[*Ts, float][str, *tuple[int, ...]] to tuple[str, *tuple[int, ...], float]?

I think that the rule should be that *tuple[X, ...] cannot split between different variables. Or that it cannot substitute a TypeVar. A more strong variant of rule 4.

... but we cannot flag it as an error either.

I think that it will better to flag it as an error now. Later, after all code be merged and all edge cases be handled we can return here and reconsider this.

There are workarounds for this.

You should not use Generic[*Ts] if you require at least one item, but Generic[*Ts, T].
Instead of def foo(*args: *Ts) use def foo(*args: *tuple[*Ts, T]).

These tricks are common in functional programming.

The rest of the rules match my implementations more or less.

mrahtz · 2022-04-04T23:28:09Z

Apologies for the slow reply - coming back to this now that the docs and pickling issues are mostly sorted.

[Serhiy]

> Alias = C[T, *Ts]
> Alias2 = Alias[*tuple[int, ...]]
> # Alias2 should be C[int, *tuple[int, ...]]

tuple[int, ...] includes also an empty tuple, and in this case there is no value for T.

This was my initial intuition too, but Pradeep pointed out to me in #31021 (comment) that for tuple[int, ...], Python has chosen the opposite mindset: instead of assuming the worst-case scenario, we assume the best-case scenario. Thus, the following type-checks correctly with mypy (https://mypy-play.net/?mypy=latest&python=3.10&gist=b9ca66fb7d172f939951a741388836a6):

def return_first(tup: tuple[int, ...]) -> int:
    return tup[0]
tup: tuple[()] = ()
return_first(tup)

> We actually deliberately chose not to unpack concrete tuple types - see the description of #30398, under the heading 'Starred tuple types'. (If you see another way around it, though, let me know.)

You assumed that *tuple[str, bool] in def foo(*args: *tuple[str, bool]) should give foo.__annotations__['args'] = tuple[str, bool], but it should rather give (str, bool). No confusion with tuple[str, bool].

Fair point, we could technically distinguish between tuple[str, bool] and (str, bool). But if I was a naive user and I saw foo.__annotations__['args'] == (str, bool), I don't think it'd be immediately obvious to me that the type of args was *tuple[str, bool].

Also though, there's a second reason mentioned in #30398 why (str, bool) wouldn't be the best choice. We decided that the runtime behaviour of *args: *something should be that we essentially do (*something,)[0]. If we made tuple[int, str] unpack to (int, str), then we'd end up with __annotations__['args'] == (int,).

And one of PEP-646 options is to implement star-syntax only in subscription, not in var-parameter type annotations.

As in, we would allow Generic[*Ts], but not *args: *Ts? That'd be a major change to the PEP - not an option I'm willing to consider at this stage in the process.

> I'm also not sure about this one; disallowing unpacked TypeVarTuples in argument lists to generic aliases completely (if I've understood right?)

No, it will only be disallowed in substitution of a VarType. Tuple[T][*Ts] -- error. Tuple[*Ts][*Ts2] -- ok.

Ah, gotcha. My mistake.

[Guido]

I ran out of time this evening :) Will reply properly soon.

mrahtz · 2022-04-05T18:21:12Z

[Guido]

Some edge case seems to be that if *tuple[...] is involved on either side we will never simplify.

Alright, let me think this through with some examples to get my head round it.

It would prohibit the following difficult case:

class C(Generic[*Ts]): ...
Alias = C[T, *Ts]
Alias[*tuple[int, ...]]  # Does not simplify; stays C[T, *Ts][*tuple[int, ...]]

That seems pretty reasonable. It would also prohibit these other relatively simple cases, but I guess that's fine:

Alias = C[*Ts]
Alias[*tuple[int, ...]]  # Does not simplify; stays C[*Ts][*tuple[int, ...]]

Alias = C[T, *tuple[int, ...]]
Alias[str]  # Does not simplify; stays C[T, *tuple[int, ...]][str]

Or perhaps a better rule is that *tuple[...] is never simplified away (but fixed items before and after it may be).

Is this to say that we effectively prohibit binding *tuple[...] to anything? If we can simplify without binding *tuple[...] to anything, then we do simplify, but otherwise, we don't simplify? So under this rule, the following WOULD work?

Alias = C[T, *tuple[int, ...]]
Alias[str]  # Simplifies to C[str, *tuple[int, ...]], because we didn't have to bind *tuple[int, ...] to do it

Another edge case is that if neither side has any starred items we will always simplify (since this is the existing behavior in 3.10). This may raise an error if the number of subscripts on the right does not match the number of parameters on the left.

Alright, so this is business as usual.

If there's a single *Ts on the left but not on the right, we should be able to simplify, which again may raise an error if there are not enough values on the right, but if there are more than enough, the excess will be consumed by *Ts (in fact that's the only way *Ts is fed).

So then:

class C(Generic[*Ts]): ...
Alias = C[T, *Ts]
Alias[()]              # Raises error
Alias[int]             # Simplifies to C[int, *Ts]
Alias[int, str]        # Simplifies to C[int, str]
Alias[int, str, bool]  # Simplifies to C[int, str, bool]

Yup, seems straightforward.

If there's a *Ts on the right but not on the left, we should _not_ simplify, since whatever we have on the left serves as a constraint for *Ts.

Ok, so this is about the following situations:

class C(Generic[*Ts]): ...
Alias = C[T1, T2]
Alias[*Ts]  # Does not simplify; stays C[T1, T2][*Ts]

Yikes - in fact, this is actually super hairy; I hadn't thought about this edge case at all in the PEP.

Agreed that it seems reasonable not to simplify here.

E.g. tuple[int, int][*Ts] constrains *Ts to being (int, int).

Was that a typo? Surely tuple[int, int][*Ts] isn't valid - since tuple[int, int] doesn't have any free parameters?

If there's exactly one *Ts on the left and one on the right, we _might__ be able to simplify if the prefix and suffix of the __parameters__ match the prefix and suffix of the subscript on the right. E.g. C[int, T, *Ts, float][str, *Ts] can be simplified to C[int, str, *Ts, float]. OTOH C[int, T, *Ts, float][*Ts] cannot be simplified -- but we cannot flag it as an error either. Note that __parameters__ in this example is (T, Ts); we have to assume that typevartuples in __parameters__ are always used as *Ts (since the PEP recognizes no valid unstarred uses of Ts).

Ok, this also makes sense.

---

Still, though, doesn't the point that Serhiy brought up about __origin__, __parameters__ and __args__ still apply? In cases where we *don't* simplify, there'd still be the issue of what we'd set these things to be.

This evening I'll also revisit the PRs adding tests for substitution to try and make them a comprehensive reference as to what's currently possible.

mrahtz · 2022-04-05T20:03:02Z

Ok, https://github.com/python/cpython/pull/32341/files is a reference of how the current implementation behaves. Fwiw, it *is* mostly correct - with a few minor tweaks it might be alright for at least the 3.11 release.

In particular, instead of dealing with the thorny issue of what to do about splitting unpacked arbitrary-length tuples over multiple type variables - e.g. C[T, *Ts][*tuple[int, ...]] - instead either deciding to try and evaluate it properly and living with the complexity, or leaving it unsimplified and living with the __args__, __parameters__ and __origin__ problem - for now, we could just raise an exception for any substitutions which involve an unpacked arbitrary-length tuple, since I'd guess it's going to be an extremely rare use-case.

gvanrossum · 2022-04-07T01:50:14Z

We need to move on this, because the outcome of this discussion is a release blocker for 3.11b1 -- the next release!

mrahtz · 2022-04-10T09:16:47Z

Copying in correspondence by email while issues were being migrated:

@mrahtz:

(Temporarily moving this discussion to email, since my understanding is that the issue tracker will be down for ~24 hours while it's being migrated to GitHub, and it sounds like this might be getting urgent-ish.)

[Guido]

We need to move on this, because the outcome of this discussion is a release blocker for 3.11b1 -- the next release!

Oh yikes, I didn't realise. To be clear, what's the timeframe? I see the release schedule for 3.11 says that 3.11b1 is scheduled for 2022-05-06. Is the point that we need to have this discussion resolved and all the implementation finished by? Or is the very fact that this discussion is still underway blocking other folks right now?

(And for my general education: is the point of the remaining beta releases that they're supposed to be bugfix-only?)

@JelleZijlstra:

Oh yikes, I didn't realise. To be clear, what's the timeframe? I see the release schedule for 3.11 says that 3.11b1 is scheduled for 2022-05-06. Is the point that we need to have this discussion resolved and all the implementation finished by? Or is the very fact that this discussion is still underway blocking other folks right now?

Yes, we should not be changing any features after the feature freeze in early May.

(And for my general education: is the point of the remaining beta releases that they're supposed to be bugfix-only?)

That's right. This should enable people to test without having to worry about new features.

@mrahtz:

Alright, by the power invested in me as lead author of the PEP, I say let's go with implementing full substitution support (or as close to full as we can reasonably get it).

The reasoning is that a) I think we're closer to a working, consistent implementation with full substitution support compared to the "partially-evaluated substitution" approach; and b) Serhiy's point about what to do about args etc has got me worried; a "partially-evaluated substitution" would be a new kind of thing in the typing system, and I think it could easily have further implications we won't have time to discover/work through before 3.11b1. It's true that the logic will be a bit complicated, and that as Jelle says it could complicate implementation of future typing features, but I think the latter is partly an inevitable consequence of the fact that variadic generics simply are complicated, and overall I still think the tradeoff is worth it.

So unless anyone has a reason to veto this, let's proceed as follows:

Merge Serhiy's C code for substitution in #31828

Discuss #32341 until we're happy with what we think the behaviour should be (there are still a few in there I'm not entirely sure about myself), and use that as a basis for fixing the remaining issues

Reserve the right to raise NotImplementedError for the trickiest cases (e.g. splitting of *tuple[int, ...] across multiple type variables). Ideally it'd be nice to raise a message saying something like "If you think you need this, please leave a comment with your use case at " - is there any precedent for something like this being possible?

@serhiy-storchaka:

Discuss #32341 until we're happy with what we think the behaviour should be (there are still a few in there I'm not entirely sure about myself), and use that as a basis for fixing the remaining issues

There were many changes in the code related to ParamSpec and Concatenate during the beta stage of 3.10 and even after the release. The current implementation of ParamSpec substitution is based on "analogies" and "reading between lines" rather of a concrete specification.

Reserve the right to raise NotImplementedError for the trickiest cases (e.g. splitting of *tuple[int, ...] across multiple type variables). Ideally it'd be nice to raise a message saying something like "If you think you need this, please leave a comment with your use case at " - is there any precedent for something like this being possible?

Callable[Concatenate[int, P], str][...] raises a type error because the result cannot be expressed in a form which would not contradict existing PEPs.

gvanrossum · 2022-04-11T19:36:25Z

I feel I need to add this same remark here:

@mrahtz @JelleZijlstra @serhiy-storchaka Is it okay if I unsubscribe from these conversations and let you all come up with a compromise? I feel that while we ought to have a policy formulated and mostly implemented by beta 1 (May 6), tweaks of both the policy and the implementation during the beta period until RC 1 (Aug/Sept?) should be allowable.

mrahtz · 2022-04-15T09:28:09Z

Ok, thinking about things more in #32341, I would propose the following spec:

In general, we aim to accurately evaluate type variable substitution, supporting as many valid aliases as we can.
Exception: if an unpacked arbitrary-length tuple (e.g. *tuple[int, ...]) is used in a generic alias type args list, then:
- It must be the only type argument
  - Alias[str, *tuple[int, ...]] would not be valid
  - Alias[*tuple[int, ...], *tuple[str, ...]] would not be valid
  - Only Alias[*tuple[int, ...]] would be valid
- The alias's sole type parameter must be an unpacked TypeVarTuple
  - Alias = tuple[T, *Ts]; Alias[*tuple[int, ...]] would not be valid
  - Alias = tuple[T1, T2]; Alias[*tuple[int, ...]] would not be valid
  - Only Alias = tuple[*Ts]; Alias[*tuple[int, ...]] would be valid

mrahtz · 2022-05-21T15:49:51Z

[Pradeep]

Sorry for the slow reply too. Busy week :(

The PEP defines the behavior for (a) MultiDeviceArray by saying that it behaves the same as (b) MultiDeviceArray[*tuple[Any, ...]]

I've just realised - is this actually what we said in the PEP? https://peps.python.org/pep-0646/#behaviour-when-type-parameters-are-not-specified says: (emphasis mine)

When a generic class parameterised by a type variable tuple is used without any type parameters, it behaves as if the type variable tuple was substituted with Tuple[Any, ...].

The section on aliases, https://peps.python.org/pep-0646/#aliases, says similarly:

If the type parameter list is omitted entirely, the unspecified type variable tuples are treated as Tuple[Any, ...]

That is, only the TypeVarTuple is substituted with Tuple[Any, ...]. Iiuc, this is different from saying that the whole argument list is set to *Tuple[Any, ...].

Admittedly we maybe could have been clearer on this in the PEP. I don't think we ever say explicitly what type is assigned to the other type parameters in a case like this.

But I think it's natural to assume that TypeVars in the parameter list should be assigned to individual Anys. So in this case:

DType = TypeVar('DType')
Shape = TypeVarTuple('Shape')
class Array(Generic[DType, *Shape]): ...

MultiDeviceArray = Array[DType, *Shape]

I think that MultiDeviceArray should actually behave like Array[Any, *tuple[Any, ...]], rather than Array[*tuple[Any, ...]].

Is this analysis correct?

gvanrossum · 2022-05-21T16:22:19Z

Looks like the only interpretation that makes sense. If you write dict without parameters it has two Any parameters by default. It does not become dict[*Any].

pradeep90 · 2022-05-21T23:55:16Z

[Matthew]

Is this analysis correct?

Yes, that is indeed how Pyre handles aliases with missing type arguments. T is replaced with Any and Ts is replaced with tuple[Any, ...]. I misspoke about the analogy between MultiDeviceArray and MultiDeviceArray[*tuple[Any, ...]], even though they both produce the same result.

[Matthew]

Admittedly we maybe could have been clearer on this in the PEP. I don't think we ever say explicitly what type is assigned to the other type parameters in a case like this.

That's not true. We explicitly mention the behavior for Foo[*tuple[Any, ...]] in the following paragraph and code snippet from the PEP:

Array[*Tuple[Any, ...]] stands for an array with an arbitrary number of dimensions of type Any. This means that, in the call to expect_variadic_array, Batch is bound to Any and Shape is bound to Tuple[Any, ...]

class Array(Generic[*Shape]): ...  # I'm including the definition of Array for completeness.

y: Array[*Tuple[Any, ...]] = read_from_file()

def expect_variadic_array(
    x: Array[Batch, *Shape]
) -> None: ...

expect_variadic_array(y)  # OK

In this case, the user sees that passing an Array[*tuple[Any, ...]] to an expected Array[Batch, *Shape] leads to Batch=Any, Shape=tuple[Any, ...]. So, the concept of splitting an unbounded tuple across Batch, *Shape is specified in the PEP.

The exact same thing goes for passing an Array or an Array[*tuple[int, ...], str] to an expected Array[Batch, *Shape]. These too require splitting an unbounded tuple across a mix of TypeVars and TypeVarTuples.
However, if the user defined a perfectly-valid alias, MultiDeviceArray = Array[Batch, *Shape], and tried to use MultiDeviceArray[*tuple[Any, ...]], then it would fail at runtime (as per your proposal).

This would be really surprising. There is no justification why (1) is able to split *tuple[Any, ...] across Batch, *Shape, but (2) fails at runtime.

In other words, the splitting behavior of unbounded tuples over a mix of TypeVar and TypeVarTuple is important and specified in general, not just for the niche case of runtime alias application. Restricting the splitting for aliases at runtime, while allowing it everywhere else would be a bad, inconsistent user experience.

Lmk if that makes sense. I'm happy to hop on a VC call this week, if this is something you'd like to discuss further. Might be faster than waiting for each other's replies :)

Road ahead

We're back to the original question of what to do about Array[T, *Ts][*tuple[int, ...], str]. I see two choices ahead:

Defer its implementation because it's too much work now - this seems reasonable.

If you're curious about how Pyre (and TypeScript) handle cases like Array[T, *Ts, T2][*tuple[int, ...], bool, str], we bind the TypeVars from either side till we reach an unpacked element. So, we would bind T = int and T2 = str. Then we align the remaining middle portion - Ts = tuple[*tuple[int, ...], bool]. Happy to discuss in more detail if needed.

(I hope we agree that this is doable at runtime but involved.)
Forbid it in the PEP because the runtime computation would be involved - this seems less than ideal. We seem to be going in the opposite direction from the runtime being more lenient than the type checker.

If we really are going that way, we should probably discuss what it implies for future PEPs. Will they also have to restrict things because the runtime is trying to compute exact values (for generic aliases, etc.)?

I favor option (1). Curious to hear your thoughts, since it's not clear which way you are leaning.

Footnote

If the question is why would anyone specify MultiDeviceArray[*tuple[int, ...]] explicitly, we address that in the PEP:

This allows users to handle dynamic code gracefully while still explicitly marking the code as unsafe (by using y: Array[*Tuple[Any, ...]]). Otherwise, users would face noisy errors from the type checker every time they tried to use the variable y, which would hinder them when migrating a legacy code base to use TypeVarTuple.

mrahtz · 2022-05-22T00:22:33Z

[Pradeep]

Lmk if that makes sense. I'm happy to hop on a VC call this week, if this is something you'd like to discuss further. Might be faster than waiting for each other's replies :)

For this kind of thing, I think text works better for me - it's helpful to have time to think. Thanks for offering, though :)

Admittedly we maybe could have been clearer on this in the PEP. I don't think we ever say explicitly what type is assigned to the other type parameters in a case like this.

That's not true. We explicitly mention the behavior for Foo[*tuple[Any, ...]] in the following paragraph and code snippet from the PEP:

Array[*Tuple[Any, ...]] stands for an array with an arbitrary number of dimensions of type Any. This means that, in the call to expect_variadic_array, Batch is bound to Any and Shape is bound to Tuple[Any, ...]

class Array(Generic[*Shape]): ...  # I'm including the definition of Array for completeness.


y: Array[*Tuple[Any, ...]] = read_from_file()

def expect_variadic_array(
    x: Array[Batch, *Shape]
) -> None: ...

expect_variadic_array(y)  # OK

Isn't Batch here a type, rather than a TypeVar?

If the question is why would anyone specify MultiDeviceArray[*tuple[int, ...]] explicitly, we address that in the PEP:

This allows users to handle dynamic code gracefully while still explicitly marking the code as unsafe (by using y: Array[*Tuple[Any, ...]]). Otherwise, users would face noisy errors from the type checker every time they tried to use the variable y, which would hinder them when migrating a legacy code base to use TypeVarTuple.

Didn't we write that assuming that Array was generic in only a TypeVarTuple? In the case of MultiDeviceArray, if we wanted to explicitly mark it as unsafe, wouldn't we write something like MultiDeviceArray[tf.dtype, *tuple[int, ...]]?

We're back to the original question of what to do about Array[T, *Ts][*tuple[int, ...], str].
...
I favor option (1). Curious to hear your thoughts, since it's not clear which way you are leaning.

Ah, sorry if I wasn't explicit enough on this earlier - yeah, for this case, I'm also on board with deferring its implementation at runtime until someone finds a use-case for it - both because it could take a while to implement and test, and because I think Jelle has a remit to try and keep complexity out of typing.py.

I definitely agree it's not worth banning in the PEP.

gvanrossum · 2022-05-22T00:51:43Z

I hesitate to add anything, but here's something nevertheless.

I feel that there's something missing from our notation, and that is how to indicate that the number of dimensions is Any (and shouldn't be complained about). This same problem occurs for varargs functions, e.g. this is accepted:

def f(a: int, *b: int) -> int: ...

t: tuple[int, ...] = ()

f(*t)  # OK in mypy, fails at runtime

even though f(*()) is rejected.

We should probably just accept this. Certainly it seems PEP 646 accepted this for some cases already (the quote by Pradeep), so it seems fair to accept it in the case of a type alias as well.

OT: While reading these deliberations, am beginning to find the notation Foo[A, *tuple[B, ...], C] pretty tedious. Maybe in a future PEP we can allow simplifying this to Foo[A, B, ..., C]? IIRC we never went there because we worried that there might be other interpretations (like B, ... might mean a B plus zero or more Anys, rather than zero or more Bs), but maybe we should just pick a meaning and stick to it. Users could also be confused by tuple[B, ...] but we've defined it and move on.

mrahtz · 2022-05-22T22:10:28Z

[Guido]

I feel that there's something missing from our notation, and that is how to indicate that the number of dimensions is Any (and shouldn't be complained about).

Assuming that:

DType = TypeVar('DType')
Shape = TypeVarTuple('Shape')
class ArrayShapeOnly(Generic[*Shape]): ...
class ArrayDTypeAndShape(Generic[DType, *Shape]]): ...

Doesn't ArrayShapeOnly[*tuple[SomeType, ...]] / ArrayDTypeAndShape[SomeDType, *tuple[SomeType, ...]] address this case?

Or is your point that there should be some nicer syntax for specifying this? In practice, I'd expect most library authors to cater for this with an alias along the lines of:

ArrayWithArbitraryNumDimensions = ArrayDTypeAndShape[DType, *tuple[Any, ...]]

So maybe special syntax wouldn't be necessary?

This same problem occurs for varargs functions, e.g. this is accepted:

def f(a: int, *b: int) -> int: ...

t: tuple[int, ...] = ()

f(*t)  # OK in mypy, fails at runtime even though f(*()) is rejected.

To check that I understand:

f(*t) fails at runtime, because there aren't enough elements in t to actually assign to a and b, but is ok in mypy, because mypy "tries to find a way to make it work" by assuming the arbitrary-length tuple is of the right size to proceed
f(*()) fails at runtime, but also in mypy, because here mypy doesn't have an arbitrary number of types to play with

I might just be tired, but could you expand on the connection between this case and the "arbitrary number of dimensions" case?

OT: While reading these deliberations, am beginning to find the notation Foo[A, *tuple[B, ...], C] pretty tedious. Maybe in a future PEP we can allow simplifying this to Foo[A, B, ..., C]? IIRC we never went there because we worried that there might be other interpretations (like B, ... might mean a B plus zero or more Anys, rather than zero or more Bs), but maybe we should just pick a meaning and stick to it. Users could also be confused by tuple[B, ...] but we've defined it and move on.

I hear you that it's a bit of a tedious syntax. And yes, you're right - the reason I'm hesitant to use Foo[A, B, ..., C] is that I think we might want to make Foo[A, B, ..., C] mean "A, then B, then an arbitrary number of Any, then C". I continue to think the latter meaning would be much more useful for array typing purposes (and also more intuitive, but I appreciate that's subjective).

But I don't think we should discuss it yet, because it's a decision that'll impact a lot of future users of array typing who don't exist yet and who therefore can't weigh in on the discussion. Ideally we'd talk about it once, say, NumPy has adopted PEP 646, and people have been using it for NumPy annotations for a good 6 months or so.

pradeep90 · 2022-05-23T06:16:09Z

[Pradeep]

Lmk if that makes sense. I'm happy to hop on a VC call this week, if this is something you'd like to discuss further. Might be faster than waiting for each other's replies :)

For this kind of thing, I think text works better for me - it's helpful to have time to think. Thanks for offering, though :)

Fine by me!

class Array(Generic[*Shape]): ...  # I'm including the definition of Array for completeness.


y: Array[*Tuple[Any, ...]] = read_from_file()

def expect_variadic_array(
    x: Array[Batch, *Shape]
) -> None: ...

expect_variadic_array(y)  # OK

Isn't Batch here a type, rather than a TypeVar?

Batch is a TypeVar. The PEP text mentions that "Batch is bound to Any and Shape is bound to Tuple[Any, ...]". Maybe it should be called BatchVar to distinguish from the named dimension :|

The idea I was getting across was that even if Array is only generic in *Shape, a function could expect y: Array[N1, N2, *Shape, N3]. Now, if someone passes y: Array to the function, we would need to split the unbounded tuple *tuple[Any, ...] across N1, N2, *Shape, N3. That would work.

But if users tried to define a generic alias MyArray = Array[N1, N2, *Shape, N3] and tried to use y: MyArray[*tuple[Any, ...]], that would fail, even though it's the same splitting as above.

That is why I was arguing against forbidding this. Since we agree on not forbidding this in the PEP, no worries.

If the question is why would anyone specify MultiDeviceArray[*tuple[int, ...]] explicitly, we address that in the PEP:

This allows users to handle dynamic code gracefully while still explicitly marking the code as unsafe (by using y: Array[*Tuple[Any, ...]]). Otherwise, users would face noisy errors from the type checker every time they tried to use the variable y, which would hinder them when migrating a legacy code base to use TypeVarTuple.
Didn't we write that assuming that Array was generic in only a TypeVarTuple?
In the case of MultiDeviceArray, if we wanted to explicitly mark it as unsafe, wouldn't we write something like MultiDeviceArray[tf.dtype, *tuple[int, ...]]?

If we had an alias such as MyArray = Array[N1, N2, *Shape, N3], we would have to write MyArray[int, int, *tuple[int, ...], int] (as per the proposal in this thread). Otherwise, MyArray[*tuple[int, ...] would be a runtime error.

So, yes, it is always feasible to add type arguments separately for each TypeVar in a generic alias. It would be a bad user experience to get a runtime error first, but I think it's reasonable to punt on the implementation for now.

I favor option (1). Curious to hear your thoughts, since it's not clear which way you are leaning.

Ah, sorry if I wasn't explicit enough on this earlier - yeah, for this case, I'm also on board with deferring its implementation at runtime until someone finds a use-case for it - both because it could take a while to implement and test, and because I think Jelle has a remit to try and keep complexity out of typing.py.

I definitely agree it's not worth banning in the PEP.

+1. Hope there's nothing else blocking this thread, then.

gvanrossum · 2022-05-23T20:24:21Z

First responding to Pradeep: What does "punting on the implementation" mean in practice? Does it just mean that repr(MyArray[*tuple[int, ...]]) might be Array[N1, N2, *Shape, N3][*tuple[int, ...]] rather than Array[int, int, *tuple[int, ...], int]? If so, I am fine with that.

Next to Matthew, in reverse order:

(c) I will happily await discussing the semantics of A[T, ..., S] until there's actual experience with PEP 646 (i.e., well after 3.11 is released).

(b) Yes, you understand my claim about f(*t) vs. f(*()). The reason I connect vararg variable substitutions with vararg function calls is that in both cases we have a single notation that means "arbitrary number" which is generally understood to mean "zero or more" but which is also acceptable in situations where the requirement is "N or more, for some N larger than zero". (We emphasize that an empty tuple is a valid match for tuple[int, ...].)

(a) I would be happy with the syntax A[X, *tuple[X, ...]] to indicate "A with 1 or more X parameters". However, we accept some other type whose meaning is "A with 0 or more X parameters" as a valid substitution. So really it appears that as soon as we have a type whose parameter count has no upper bound, we appear to drop the lower bound -- we cannot distinguish between "count >= 0" and "count >= 1". I drew the analogy with function calls because they have the same issue and we seem to be fine with that.

pradeep90 · 2022-05-23T20:39:10Z

[Guido]

First responding to Pradeep: What does "punting on the implementation" mean in practice? Does it just mean that repr(MyArray[*tuple[int, ...]]) might be Array[N1, N2, *Shape, N3][*tuple[int, ...]] rather than Array[int, int, *tuple[int, ...], int]? If so, I am fine with that.

Right now, MyArray[*tuple[int, ...]] gives a runtime error. IIUC Matthew was suggesting that we leave it as such - giving a runtime error - until we get user requests for this form. The reasoning for deferring this was that this needs some more work to implement, and there exist long-form ways to represent the type MyArray[*tuple[int, ...]], i.e., Array[int, int, *tuple[int, ...], int].

Will defer to Matthew in case he wants to clarify.

gvanrossum · 2022-05-23T20:53:44Z

Leaving it a runtime error does go against the general trend to be more lenient at runtime than in the static checker, though. So if there's another approach that's not too much effort I'd rather not have the runtime error. If we leave it a runtime error in 3.11.0, could we fix it in 3.11.1, or would we have to wait until 3.12? If 3.11.1, that would be acceptable.

mrahtz · 2022-05-23T22:41:37Z

Long thread is long

The issue of `tuple[T, Ts][tuple[int, ...]]`

[Pradeep]

Isn't Batch here a type, rather than a TypeVar?

Batch is a TypeVar. The PEP text mentions that "Batch is bound to Any and Shape is bound to Tuple[Any, ...]". Maybe it should be called BatchVar to distinguish from the named dimension :|

Yeah, I was wondering about that. I think we made a typo. In the immediately preceding context, I'm pretty sure we were talking about a function that expected a shape beginning with the Batch type:

def process_batch_channels(
    x: Array[Batch, *Tuple[Any, ...], Channels]
) -> None:
    ...

x: Array[Batch, Height, Width, Channels]
process_batch_channels(x)  # OK
y: Array[Batch, Channels]
process_batch_channels(y)  # OK
z: Array[Batch]
process_batch_channels(z)  # Error: Expected Channels.

Thinking about it some more, I realise that in the example in question...

y: Array[*Tuple[Any, ...]] = read_from_file()

def expect_variadic_array(
    x: Array[Batch, *Shape]
) -> None: ...

expect_variadic_array(y)  # OK

...you're right, in that even if we do assume Batch to be a type here, we do in some sense have to 'match up' the Batch with Any. But of course, this is a different thing from type substitution.

So yeah, I think the current phrasing in the PEP is inaccurate:

Array[*Tuple[Any, ...]] stands for an array with an arbitrary number of dimensions of type Any. This means that, in the call to expect_variadic_array, Batch is bound to Any and Shape is bound to Tuple[Any, ...]. In the call to expect_precise_array, the variables Batch, Height, Width, and Channels are all bound to Any.

I think it should be something like:

Array[*Tuple[Any, ...]] stands for an array with an arbitrary number of dimensions of type Any. This means that, in the call to expect_variadic_array, we're checking the Batch type for compatibility with Any, and binding Shape to Tuple[Any, ...]. In the call to expect_precise_array, we're similarly checking the Batch, Height, Width and Channels types for compatibility with Any.

What do you reckon?

[Pradeep]

The idea I was getting across was that even if Array is only generic in *Shape, a function could expect y: Array[N1, N2, *Shape, N3]. Now, if someone passes y: Array to the function, we would need to split the unbounded tuple *tuple[Any, ...] across N1, N2, *Shape, N3. That would work.

But if users tried to define a generic alias MyArray = Array[N1, N2, *Shape, N3] and tried to use y: MyArray[*tuple[Any, ...]], that would fail, even though it's the same splitting as above.

To check whether I understand - I think we're talking about the following:

Shape = TypeVarTuple('Ts')
N1 = TypeVar('N1')
N2 = TypeVar('N2')
N3 = TypeVar('N3')

class Array(Generic[*Shape]): ...

def foo(y: Array[N1, N2, *Shape, N3]): ...

y1: Array
# This is fine conceptually, both because we've decreed that plain `Array`
# should be backwards-compatible with `Array[anything]`, and because it's
# the same as `Array[*tuple[Any, ...]]`, and we can happily assign `Any` to
# `N1`, `N2` and `N3`, and `*tuple[Any, ...]` to `Shape`.
# It's also fine in our current runtime implementation.
foo(y1)

y2: Array[*tuple[Any, ...]]
# This is also fine conceptually, because it means the same thing as `y: Array`.
# It's also fine in our current runtime implementation.
foo(y2)

MyArray = Array[N1, N2, *Shape, N3]
# This is the same thing as what happens when we do foo(y2) -
# in both cases, we're comparing `N1, N2, *Shape, N3` and `*tuple[Any, ...]` -
# but this one won't work in our current runtime implementation.
y3: MyArray[*tuple[Any, ...]]

...ok, this example has persuaded me. It's true that if a function which expects a generic type with type parameters N1, N2, *Shape, N3 can accept type arguments *Tuple[Any, ...], then a generic alias with type parameters N1, N2, *Shape, N3 should also be able to accept type arguments *Tuple[Any, ...]. The only reason the former works and not the latter in the runtime currently is that in the former case we don't actually need to compute the assignment of types to type parameters explicitly. They're both conceptually sound.

So I'll work on getting my PR fixed up.

The issue of `tuple[T, Ts][tuple[int, ...], str]`

[Pradeep]

IIUC Matthew was suggesting that we leave it as such - giving a runtime error - until we get user requests for this form. The reasoning for deferring this was that this needs some more work to implement, and there exist long-form ways to represent the type MyArray[*tuple[int, ...]], i.e., Array[int, int, *tuple[int, ...], int].

Yeah, so I am suggesting that we leave tuple[T, *Ts][*tuple[int, ...], str] as a runtime error for now.

[Guido]

Leaving it a runtime error does go against the general trend to be more lenient at runtime than in the static checker, though. So if there's another approach that's not too much effort I'd rather not have the runtime error. If we leave it a runtime error in 3.11.0, could we fix it in 3.11.1, or would we have to wait until 3.12? If 3.11.1, that would be acceptable.

I agree it's not ideal, but for this case specifically, I'm not sure there's a simple alternative. I don't think the idea of leaving it partially evaluated - such that tuple[T, *Ts][*tuple[int, ...], str] evaluated to tuple[T, *Ts][*tuple[int, ...], str] and nothing simpler - is possible, for reasons that Serhiy pointed out: what would __args__ and __params__ be in this case? What should things that rely on __args__ and __params__ (Serhiy mentioned dataclasses as one such example) do?

I think we're pretty unlikely to see anyone do this though. It's a super weird thing to do. I wouldn't bet my life on it, but I'd be surprised.

The issue of what `...` should mean

...I'll reply to this one in the next few days :)

gvanrossum · 2022-05-23T23:29:02Z

So yeah, I think the current phrasing in the PEP is inaccurate:

Array[*Tuple[Any, ...]] stands for an array with an arbitrary number of dimensions of type Any. This means that, in the call to expect_variadic_array, Batch is bound to Any and Shape is bound to Tuple[Any, ...]. In the call to expect_precise_array, the variables Batch, Height, Width, and Channels are all bound to Any.

I think it should be something like:

Array[*Tuple[Any, ...]] stands for an array with an arbitrary number of dimensions of type Any. This means that, in the call to expect_variadic_array, we're checking the Batch type for compatibility with Any, and binding Shape to Tuple[Any, ...]. In the call to expect_precise_array, we're similarly checking the Batch, Height, Width and Channels types for compatibility with Any.

What do you reckon?

Agreed. Batch does not have to be a typevar here for the examples to clarify the issues.

Regarding punting on the implementation, your proposal is to only punt on the case where there is an extra type (str in your example) in the actual parameters following *tuple[int, ...], right? I can live with that then, though if Serhiy manages to fix it I wouldn't stop him.

pradeep90 · 2022-05-24T01:28:07Z

...ok, this example has persuaded me.

🎉

Batch is a TypeVar. The PEP text mentions that "Batch is bound to Any and Shape is bound to Tuple[Any, ...]". Maybe it should be called BatchVar to distinguish from the named dimension :|

Yeah, I was wondering about that. I think we made a typo
What do you reckon?

Actually, I meant Batch to be a TypeVar in that example, precisely to demonstrate how unbounded tuples are split across TypeVars and TypeVarTuples. Otherwise, type checker implementers wouldn't have any code snippets showing what each variable is bound to in that use case, which is very common in Tensor code.

The change would have to be making it BatchVar or just T.

I'll put up a PR to clarify the code and wording (assuming it's ok to update the PEP, since this is an oversight in the naming).

mrahtz · 2022-05-28T10:57:02Z

[Guido]

(c) I will happily await discussing the semantics of A[T, ..., S] until there's actual experience with PEP 646 (i.e., well after 3.11 is released).

Ok, great :)

Lower bounds on the number of type arguments

(b) Yes, you understand my claim about f(t) vs. f(()). The reason I connect vararg variable substitutions with vararg function calls is that in both cases we have a single notation that means "arbitrary number" which is generally understood to mean "zero or more" but which is also acceptable in situations where the requirement is "N or more, for some N larger than zero". (We emphasize that an empty tuple is a valid match for tuple[int, ...].)

(a) I would be happy with the syntax A[X, *tuple[X, ...]] to indicate "A with 1 or more X parameters". However, we accept some other type whose meaning is "A with 0 or more X parameters" as a valid substitution. So really it appears that as soon as we have a type whose parameter count has no upper bound, we appear to drop the lower bound -- we cannot distinguish between "count >= 0" and "count >= 1". I drew the analogy with function calls because they have the same issue and we seem to be fine with that.

Ok, so if I understand right, you're saying that, regarding whether we should allow...

T = TypeVar('T')
Shape = TypeVarTuple('Shape')

class Array(Generic[*Shape]): ...
def foo(x: Array[int, *tuple[int, ...]): ...

x: Array[*tuple[int, ...]]
foo(x)

...given that we're passing "Zero or more int" to something that seems to require "One or more int", you're arguing that it should be allowed, by analogy with the following, which is fine in mypy...

def bar(a: int, *b: int): ...

t: tuple[int, ...]

foo(*t)

...and furthermore noting that, if we do allow this, one downside is that it prevents us from being able to properly enforce a lower bound of the number of type arguments in cases where we don't have an upper bound on the number of type arguments, like in foo above. Is that right?

If so - hmm, interesting point. I agree the validity of the bar example suggests we should also be fine with corresponding foo example. But it hadn't occurred to me that this requires us to give up lower bound verification.

I guess what that means for arrays is that we couldn't catch something like this:

class Array(Generic[*Shape]): ...

# Argument must have at least one dimension - scalar arrays like Array[()] not allowed!
def only_accepts_arrays(x: Array[Any, *tuple[Any, ...]]): ...

def returns_scalar_or_array() → Array[*tuple[Any, ...]]: ...

x = returns_scalar_or_array()
only_accepts_array(x)  # x might be Array[()], so this might not be ok!

This is mildly annoying, but seems like an ok sacrifice to make in order to enable gradual typing - allowing Array == Array[*tuple[Any, ...]] to be a 'universal' array.

What we're punting on

Regarding punting on the implementation, your proposal is to only punt on the case where there is an extra type (str in your example) in the actual parameters following *tuple[int, ...], right? I can live with that then, though if Serhiy manages to fix it I wouldn't stop him.

Close. There's a similar case where the unpacked arbitrary-length tuple lines up exactly with a TypeVarTuple and all other types line up exactly with TypeVars, which is easy to evaluate, and which the current runtime therefore evaluates fine:

class Array(Generic[*Ts]): pass
Alias = Array[*Ts, T]
Alias[*tuple[int, ...], str]  # Evaluates to Array[*tuple[int, ...], str]

So I think the actual logic would be something like:

If the type argument list consists of a single unpacked arbitrary-length tuple *tuple[int, ...], then we're fine.
Else if the type argument list "lines up exactly" with the type arguments - such that the type argument list is the same length as the type parameter list, and the TypeVarTuple in the type parameter list lines up with the unpacked arbitrary-length tuple in the type argument list (and there are no other unpacked arbitrary-length tuples in the type argument list), then we're fine.
Else, error.

To put it concisely: we're punting on the case where it's a valid substitution, but the unpacked arbitrary-length tuple in the arguments doesn't line up with the TypeVarTuple in the parameters.

mrahtz · 2022-05-28T10:58:45Z

[Pradeep]

Actually, I meant Batch to be a TypeVar in that example, precisely to demonstrate how unbounded tuples are split across TypeVars and TypeVarTuples. Otherwise, type checker implementers wouldn't have any code snippets showing what each variable is bound to in that use case, which is very common in Tensor code.

Oh! Sorry, I didn't realise you intended Batch to be a TypeVar here. Ok, a PR clarifying this sounds good :)

gvanrossum · 2022-05-28T15:29:24Z

Good, we’re in full agreement.

On Sat, May 28, 2022 at 03:57 Matthew Rahtz ***@***.***> wrote: [Guido] (c) I will happily await discussing the semantics of A[T, ..., S] until there's actual experience with PEP 646 (i.e., well after 3.11 is released). Ok, great :) Lower bounds on the number of type arguments (b) Yes, you understand my claim about f(*t) vs. f(*()). The reason I connect vararg variable substitutions with vararg function calls is that in both cases we have a single notation that means "arbitrary number" which is generally understood to mean "zero or more" but which is also acceptable in situations where the requirement is "N or more, for some N larger than zero". (We emphasize that an empty tuple is a valid match for tuple[int, ...].) (a) I would be happy with the syntax A[X, *tuple[X, ...]] to indicate "A with 1 or more X parameters". However, we accept some other type whose meaning is "A with 0 or more X parameters" as a valid substitution. So really it appears that as soon as we have a type whose parameter count has no upper bound, we appear to drop the lower bound -- we cannot distinguish between "count >= 0" and "count >= 1". I drew the analogy with function calls because they have the same issue and we seem to be fine with that. Ok, so if I understand right, you're saying that, regarding whether we should allow... T = TypeVar('T') Shape = TypeVarTuple('Shape') class Array(Generic[*Shape]): ... def foo(x: Array[int, *tuple[int, ...]): ... x: Array[*tuple[int, ...]] foo(x) ...given that we're passing "Zero or more int" to something that seems to require "One or more int", you're arguing that it *should* be allowed, by analogy with the following, which is fine in mypy... def bar(a: int, *b: int): ... t: tuple[int, ...] foo(*t) ...and furthermore noting that, if we *do* allow this, one downside is that it prevents us from being able to properly enforce a *lower* bound of the number of type arguments in cases where we don't have an *upper* bound on the number of type arguments, like in foo above. Is that right? If so - hmm, interesting point. I agree the validity of the bar example suggests we should also be fine with corresponding foo example. But it hadn't occurred to me that this requires us to give up lower bound verification. I guess what that means for arrays is that we couldn't catch something like this: class Array(Generic[*Shape]): ... # Argument must have at least one dimension - scalar arrays like Array[()] not allowed! def only_accepts_arrays(x: Array[Any, *tuple[Any, ...]]): ... def returns_scalar_or_array() → Array[*tuple[Any, ...]]: ... x = returns_scalar_or_array() only_accepts_array(x) # x might be Array[()], so this might not be ok! This is mildly annoying, but seems like an ok sacrifice to make in order to enable gradual typing - allowing Array == Array[*tuple[Any, ...]] to be a 'universal' array. What we're punting on Regarding punting on the implementation, your proposal is to only punt on the case where there is an extra type (str in your example) in the actual parameters following *tuple[int, ...], right? I can live with that then, though if Serhiy manages to fix it I wouldn't stop him. Close. There's a similar case where the unpacked arbitrary-length tuple lines up exactly with a TypeVarTuple and all other types line up exactly with TypeVars, which is easy to evaluate, and which the current runtime therefore evaluates fine: class Array(Generic[*Ts]): pass Alias = Array[*Ts, T] Alias[*tuple[int, ...], str] # Evaluates to Array[*tuple[int, ...], str] So I think the actual logic would be something like: - If the type argument list consists of a single unpacked arbitrary-length tuple *tuple[int, ...], then we're fine. - Else if the type argument list "lines up exactly" with the type arguments - such that the type argument list is the same length as the type parameter list, and the TypeVarTuple in the type parameter list lines up with the unpacked arbitrary-length tuple in the type argument list (and there are no other unpacked arbitrary-length tuples in the type argument list), then we're fine. - Else, error. To put it concisely: we're punting on the case where it's a valid substitution, but the unpacked arbitrary-length tuple in the arguments doesn't line up with the TypeVarTuple in the parameters. — Reply to this email directly, view it on GitHub <#91162 (comment)>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/AAWCWMRGEZXLUTYMYFELOKLVMH3YVANCNFSM5TEGYM5Q> . You are receiving this because you were mentioned.Message ID: ***@***.***>

-- --Guido (mobile)

…able-size tuple For example: A[T, *Ts][*tuple[int, ...]] -> A[int, *tuple[int, ...]] A[*Ts, T][*tuple[int, ...]] -> A[*tuple[int, ...], int]

serhiy-storchaka · 2022-05-29T07:21:34Z

I have wrote the Python implementation and now working on the C code. Please look whether the Python implementation works as you expect.

For simplicity, it forbids substitution of multiple unpacked var-tuples, e.g. A[*Ts][*tuple[int, ...], *tuple[str, ...]]. It was forbidden by PEP 646, but accepted at runtime for simplicity. I can allow this again, but it will make the code slightly more complicated.

serhiy-storchaka · 2022-05-29T07:25:33Z

The following examples now work:

A[T, *Ts][*tuple[int, ...]] -> A[int, *tuple[int, ...]]
A[*Ts, T][*tuple[int, ...]] -> A[*tuple[int, ...], int]
A[T, str, *Ts][*tuple[int, ...]] -> A[int, str, *tuple[int, ...]]
A[*Ts, str, T][*tuple[int, ...]] -> A[*tuple[int, ...], str, int]
A[list[T], *Ts][*tuple[int, ...]] -> A[list[int], *tuple[int, ...]]
A[*Ts, list[T]][*tuple[int, ...]] -> A[*tuple[int, ...], list[int]]

mrahtz · 2022-05-29T08:46:51Z

Oh no :( I already did a bunch of work on the Python side of this yesterday in #93318. I'm upset that I wasted a Saturday on this. Please check more carefully next time whether someone else is already working on it before starting.

Edit: ah, sorry, I thought you'd merged it already - I didn't realise it was a WIP PR. Still, unfortunate that we ended up duplicating each other's work here :(

But yes, the behaviour looks correct. And thanks for implementing the C version. I also looked at this yesterday but got a bit lost, so I appreciate you taking care of it.

serhiy-storchaka · 2022-05-30T06:27:19Z

I am sorry. I promised to work on it almost a month ago, but I only had the time and inspiration to do it last weekend. GitHub does not send notifications about the linked PRs by email, so I was unaware of your work.

#93330 is now ready for review. I have also another, simpler, version, which moves a lot of the C code to Python, but I need more time to polish it.

pythonGH-92335) (cherry picked from commit 9d25db9) Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>

…es (GH-92335) (#92484) * gh-91162: Fix substitution of unpacked tuples in generic aliases (GH-92335) (cherry picked from commit 9d25db9) Co-authored-by: Serhiy Storchaka <storchaka@gmail.com> * Regenerate ABI file Co-authored-by: Serhiy Storchaka <storchaka@gmail.com> Co-authored-by: Pablo Galindo <pablogsal@gmail.com>

serhiy-storchaka · 2022-06-01T08:58:58Z

#93412 is an alternative implementation which does complex things in Python and calls the Python code from C. Seems it can also simplify the code of collections.abc.Callable (because the code is more generic now), but I left a clean up to a separate PR.

mrahtz · 2022-06-01T18:02:48Z

I am sorry. I promised to work on it almost a month ago, but I only had the time and inspiration to do it last weekend. GitHub does not send notifications about the linked PRs by email, so I was unaware of your work.

Oh, fair enough. In that case I'll just say: thank you for your continued work on this :)

…ypeVar and TypeVarTuple parameters (alt) (GH-93412) For example: A[T, *Ts][*tuple[int, ...]] -> A[int, *tuple[int, ...]] A[*Ts, T][*tuple[int, ...]] -> A[*tuple[int, ...], int]

…over TypeVar and TypeVarTuple parameters (alt) (pythonGH-93412) For example: A[T, *Ts][*tuple[int, ...]] -> A[int, *tuple[int, ...]] A[*Ts, T][*tuple[int, ...]] -> A[*tuple[int, ...], int] (cherry picked from commit 3473817) Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>

JelleZijlstra added the 3.11 label Mar 13, 2022

JelleZijlstra self-assigned this Mar 13, 2022

JelleZijlstra added the 3.11 label Mar 13, 2022

JelleZijlstra self-assigned this Mar 13, 2022

gvanrossum added release-blocker type-bug labels Apr 7, 2022

ezio-melotti transferred this issue from another repository Apr 10, 2022

AlexWaygood added the expert-typing label Apr 13, 2022

mrahtz mentioned this issue May 28, 2022

gh-91162: Implement splitting of *tuple[int, ...] over T, *Ts in typing.py #93318

Closed

serhiy-storchaka mentioned this issue May 29, 2022

gh-91162: Support substitution of TypeVar with an unpacked variable-size tuple #93330

Closed

pablogsal pushed a commit to miss-islington/cpython that referenced this issue May 31, 2022

pythongh-91162: Fix substitution of unpacked tuples in generic aliases (

0b78801

pythonGH-92335) (cherry picked from commit 9d25db9) Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>

serhiy-storchaka mentioned this issue Jun 1, 2022

gh-91162: Support splitting of unpacked arbitrary-length tuple over TypeVar and TypeVarTuple parameters (alt) #93412

Merged

PEP 646: Decide on substitution behavior #91162

PEP 646: Decide on substitution behavior #91162

Comments

JelleZijlstra commented Mar 13, 2022

JelleZijlstra commented Mar 13, 2022

mrahtz mannequin commented Mar 13, 2022

mrahtz mannequin commented Mar 13, 2022

JelleZijlstra commented Mar 13, 2022

serhiy-storchaka commented Mar 19, 2022

gvanrossum commented Mar 20, 2022

JelleZijlstra commented Mar 20, 2022

serhiy-storchaka commented Mar 21, 2022

mrahtz mannequin commented Mar 21, 2022

mrahtz mannequin commented Mar 21, 2022

gvanrossum commented Mar 22, 2022

serhiy-storchaka commented Mar 22, 2022

Alias2 should be C[int, *tuple[int, ...]]

serhiy-storchaka commented Mar 22, 2022

mrahtz mannequin commented Apr 4, 2022

mrahtz mannequin commented Apr 5, 2022

mrahtz mannequin commented Apr 5, 2022

gvanrossum commented Apr 7, 2022

mrahtz commented Apr 10, 2022

gvanrossum commented Apr 11, 2022

mrahtz commented Apr 15, 2022

mrahtz commented May 21, 2022

gvanrossum commented May 21, 2022 • edited

pradeep90 commented May 21, 2022

Road ahead

Footnote

mrahtz commented May 22, 2022 • edited by gvanrossum

gvanrossum commented May 22, 2022

mrahtz commented May 22, 2022

pradeep90 commented May 23, 2022

gvanrossum commented May 23, 2022

pradeep90 commented May 23, 2022

gvanrossum commented May 23, 2022

mrahtz commented May 23, 2022 • edited

The issue of tuple[T, *Ts][*tuple[int, ...]]

The issue of tuple[T, *Ts][*tuple[int, ...], str]

The issue of what ... should mean

gvanrossum commented May 23, 2022

pradeep90 commented May 24, 2022

mrahtz commented May 28, 2022

Lower bounds on the number of type arguments

What we're punting on

mrahtz commented May 28, 2022

gvanrossum commented May 28, 2022

serhiy-storchaka commented May 29, 2022

serhiy-storchaka commented May 29, 2022

mrahtz commented May 29, 2022 • edited

serhiy-storchaka commented May 30, 2022

serhiy-storchaka commented Jun 1, 2022

mrahtz commented Jun 1, 2022

gvanrossum commented May 21, 2022 •

edited

mrahtz commented May 22, 2022 •

edited by gvanrossum

mrahtz commented May 23, 2022 •

edited

The issue of `tuple[T, Ts][tuple[int, ...]]`

The issue of `tuple[T, Ts][tuple[int, ...], str]`

The issue of what `...` should mean

mrahtz commented May 29, 2022 •

edited