
DOC: documentation on .loc[] needs some clarification #46620

Closed
@kwhkim

Description


Pandas version checks

  • I have checked that the issue still exists on the latest versions of the docs on main here

Location of the documentation

https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.loc.html

Documentation problem

The original documentation reads as follows.

===

import pandas as pd

tuples = [
   ('cobra', 'mark i'), ('cobra', 'mark ii'),
   ('sidewinder', 'mark i'), ('sidewinder', 'mark ii'),
   ('viper', 'mark ii'), ('viper', 'mark iii')
]
index = pd.MultiIndex.from_tuples(tuples)
values = [[12, 2], [0, 4], [10, 20],
          [1, 4], [7, 1], [16, 36]]
df = pd.DataFrame(values, columns=['max_speed', 'shield'], index=index)

Single index tuple. Note this returns a Series.

df.loc[('cobra', 'mark ii')]
max_speed    0
shield       4
Name: (cobra, mark ii), dtype: int64

Single label for row and column. Similar to passing in a tuple, this returns a Series.

df.loc['cobra', 'mark i']
max_speed    12
shield        2
Name: (cobra, mark i), dtype: int64

===

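However, df.loc['cobra', 'mark i'] is identical to df.loc[('cobra', 'mark i')], because the parentheses of a tuple can be omitted inside [ ]: Python passes the key 'cobra', 'mark i' to .loc as the single tuple ('cobra', 'mark i').
So the second example is not necessarily a single label for the row and one for the column; it can just as well be a single MultiIndex row label.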
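The point about omitting parentheses is plain Python syntax, not anything pandas-specific: a comma-separated key inside [ ] is packed into one tuple before `__getitem__` sees it. A minimal sketch using a hypothetical `KeyRecorder` class (not part of pandas) that simply returns whatever key it receives:

```python
# Hypothetical helper: returns the exact key Python passes to __getitem__.
class KeyRecorder:
    def __getitem__(self, key):
        return key


r = KeyRecorder()

# The comma inside [] builds a tuple, so both spellings hand over the same key.
print(r['cobra', 'mark i'])    # ('cobra', 'mark i')
print(r[('cobra', 'mark i')])  # ('cobra', 'mark i')
print(r['cobra', 'mark i'] == r[('cobra', 'mark i')])  # True
```

Since `.loc` cannot tell the two spellings apart, any difference in behaviour has to come from how pandas interprets the tuple afterwards.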

In conclusion, df.loc[A, B] can mean either df.loc[(A, B), :] or df.loc[(A,), B], and pandas seems to try the first interpretation first.
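The first interpretation can be checked directly. A short sketch that rebuilds the example DataFrame (column names taken from the outputs shown above) and compares the bare key with the explicit df.loc[(A, B), :] form:

```python
import pandas as pd

# Rebuild the example; column names come from the doc output above.
tuples = [
    ('cobra', 'mark i'), ('cobra', 'mark ii'),
    ('sidewinder', 'mark i'), ('sidewinder', 'mark ii'),
    ('viper', 'mark ii'), ('viper', 'mark iii'),
]
index = pd.MultiIndex.from_tuples(tuples)
values = [[12, 2], [0, 4], [10, 20], [1, 4], [7, 1], [16, 36]]
df = pd.DataFrame(values, columns=['max_speed', 'shield'], index=index)

# Bare form: .loc receives the tuple ('cobra', 'mark i').
bare = df.loc['cobra', 'mark i']
# Explicit form: (A, B) is one row label, ':' selects every column.
explicit = df.loc[('cobra', 'mark i'), :]

print(bare.tolist())          # [12, 2]
print(bare.equals(explicit))  # True
```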

Suggested fix for documentation

Single index tuple. Note this returns a Series.

df.loc[('cobra', 'mark ii')]
max_speed    0
shield       4
Name: (cobra, mark ii), dtype: int64

The parentheses around a single index tuple can be omitted. Written that way, the key can mean either a MultiIndex row label or a single label for the row and one for the column. pandas tries the MultiIndex row label first and, if that fails, falls back to treating the key as a row label and a column label. For example:

df.loc['cobra', 'mark ii']
max_speed    0
shield       4
Name: (cobra, mark ii), dtype: int64

df.loc['cobra', 'mark i']
max_speed    12
shield        2
Name: (cobra, mark i), dtype: int64
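The fallback half of this description shows up with a key that is not a full row label. In the sketch below (same example data; 'shield' is an existing column), ('cobra', 'shield') matches no row in the MultiIndex, so pandas ends up selecting the 'cobra' rows and the 'shield' column instead:

```python
import pandas as pd

tuples = [
    ('cobra', 'mark i'), ('cobra', 'mark ii'),
    ('sidewinder', 'mark i'), ('sidewinder', 'mark ii'),
    ('viper', 'mark ii'), ('viper', 'mark iii'),
]
index = pd.MultiIndex.from_tuples(tuples)
values = [[12, 2], [0, 4], [10, 20], [1, 4], [7, 1], [16, 36]]
df = pd.DataFrame(values, columns=['max_speed', 'shield'], index=index)

# ('cobra', 'shield') is not a label in the row MultiIndex...
print(('cobra', 'shield') in df.index)  # False

# ...so the same key shape is read as row 'cobra', column 'shield'.
shield = df.loc['cobra', 'shield']
print(shield.tolist())  # [2, 4]
```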

Now let's give the DataFrame column labels that coincide with the second-level index labels.

df.columns = ['mark i', 'mark ii']

Compare the following three selections:

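from pandas import IndexSlice as ix

df.loc[('cobra', 'mark i')]          # tuple as one MultiIndex row label
df.loc[('cobra',), 'mark i']         # partial row key ('cobra',) plus column 'mark i'
df.loc[('cobra', ix[:]), 'mark i']   # all 'cobra' rows via a slice, column 'mark i'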
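With the renamed columns the two readings give different answers, which is why the order in which pandas tries them matters. A sketch on the same example data with columns set to ['mark i', 'mark ii']; `pd.IndexSlice['cobra', :]` is just a readable way to write ('cobra', slice(None)):

```python
import pandas as pd

tuples = [
    ('cobra', 'mark i'), ('cobra', 'mark ii'),
    ('sidewinder', 'mark i'), ('sidewinder', 'mark ii'),
    ('viper', 'mark ii'), ('viper', 'mark iii'),
]
index = pd.MultiIndex.from_tuples(tuples)
values = [[12, 2], [0, 4], [10, 20], [1, 4], [7, 1], [16, 36]]
# Column labels now coincide with the second-level index labels.
df = pd.DataFrame(values, columns=['mark i', 'mark ii'], index=index)

# The bare key still matches the row ('cobra', 'mark i') first.
as_row = df.loc['cobra', 'mark i']
print(as_row.tolist())  # [12, 2]

# Spelling the row selection as a slice forces the column reading:
# every 'cobra' row, column 'mark i'.
as_column = df.loc[pd.IndexSlice['cobra', :], 'mark i']
print(as_column.tolist())  # [12, 0]
```

Making the row key explicit with a slice (or a nested tuple) removes the ambiguity, since pandas no longer has a bare two-element key to guess about.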
