TYP: how to annotate DataFrame.__getitem__

I was about to create a PR, but I realized that annotating `DataFrame.__getitem__` might be impossible without making some simplifications.

There are two problems with `__getitem__`:

- We allow any `Hashable` key (one would expect that this always returns a `Series`), but a `slice` (which is also `Hashable`) returns a `DataFrame`.
- Columns can be a `MultiIndex`, in which case `df["a"]` can return a `DataFrame`.
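Both cases are easy to reproduce at runtime. The snippet below is a minimal illustration, not part of any proposed patch; the frame names `flat` and `multi` and their contents are made up for the example.

```py
import pandas as pd

# Flat columns: a scalar key returns a Series, a slice returns a DataFrame.
flat = pd.DataFrame({"a": [1, 2], "b": [3, 4]})
print(type(flat["a"]).__name__)   # Series
print(type(flat[0:1]).__name__)   # DataFrame (row slice)

# MultiIndex columns: the same scalar key selects a sub-frame instead.
columns = pd.MultiIndex.from_tuples([("a", "x"), ("a", "y"), ("b", "x")])
multi = pd.DataFrame([[1, 2, 3]], columns=columns)
print(type(multi["a"]).__name__)  # DataFrame with columns "x" and "y"
```

In both frames the key `"a"` has the same static type, so the return type cannot be decided from the key alone.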


The MS stubs seem to make two assumptions: 1) column labels can only be of type `str` (and maybe a few more types, but not `Hashable`) and 2) `MultiIndex` columns don't exist. In practice, this will cover almost all cases.

I don't think there is a solution for the `MultiIndex` issue. Even if we make `DataFrame` generic to carry the type of the column index, there is no `Not[MultiIndex]` type, so we will always end up with incompatible and overlapping overloads.

The `Hashable` issue can be partly addressed:

```py
# cover most common cases that return a Series
@overload
def __getitem__(self, key: Scalar) -> Series:
    ...

# cover most common cases that return a DataFrame
@overload
def __getitem__(self, key: list[HashableT] | np.ndarray | slice | Index | Series) -> DataFrame:
    ...

# everything else
@overload
def __getitem__(self, key: Hashable) -> Any:  # or Series | DataFrame (but that might create many errors; typeshed also uses Any in some cases to avoid unions)
    ...
```
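To make the overload layout concrete, here is a self-contained sketch of the same three-tier pattern on toy classes. `ToySeries` and `ToyFrame` are hypothetical stand-ins invented for this example (they are not pandas types), and the key types are narrowed to `str | int` and `list | slice` to keep it short. It only shows how the overloads sit on top of a single runtime implementation; it says nothing about how pandas itself dispatches.

```py
from __future__ import annotations

from collections.abc import Hashable
from typing import Any, overload


class ToySeries:
    """Hypothetical stand-in for Series: one column of values."""

    def __init__(self, values: list[Any]) -> None:
        self.values = values


class ToyFrame:
    """Hypothetical stand-in for DataFrame: a mapping of label -> column."""

    def __init__(self, columns: dict[Hashable, list[Any]]) -> None:
        self._columns = columns

    # common keys that select a single column
    @overload
    def __getitem__(self, key: str | int) -> ToySeries: ...

    # common keys that select several columns or a range of rows
    @overload
    def __getitem__(self, key: list[Hashable] | slice) -> ToyFrame: ...

    # everything else: the checker gives up and reports Any
    @overload
    def __getitem__(self, key: Hashable) -> Any: ...

    def __getitem__(self, key: Any) -> Any:
        if isinstance(key, slice):
            # a slice selects rows in every column
            return ToyFrame({k: v[key] for k, v in self._columns.items()})
        if isinstance(key, list):
            # a list of labels selects a subset of columns
            return ToyFrame({k: self._columns[k] for k in key})
        # any other hashable key is treated as a single column label
        return ToySeries(self._columns[key])


frame = ToyFrame({"a": [1, 2, 3], "b": [4, 5, 6]})
column = frame["a"]        # first overload: ToySeries
subset = frame[["a"]]      # second overload: ToyFrame
rows = frame[0:2]          # second overload: ToyFrame
other = frame[("a", "x")] if False else None  # a tuple key would fall through to Any
```

The ordering matters: the specific overloads come first, and the `Hashable` fallback returns `Any` so that it stays compatible with the narrower return types above it.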

Do you see a way to cover all cases of `__getitem__`, and if not, which assumptions are you willing to make? @simonjayhawkins @Dr-Irv

TYP: how to annotate DataFrame.__getitem__ #46616