Note: in pandas version 0.20.0 and above, ix is deprecated and the use of loc and iloc is encouraged instead (ix was removed entirely in pandas 1.0.0). I have left the parts of this answer that describe ix intact as a reference for users of earlier versions of pandas. Examples have been added below showing alternatives to ix.
Label vs. Location
The main distinction between the three methods is:
loc gets rows (and/or columns) with particular labels from the index.
iloc gets rows (and/or columns) at particular integer positions in the index (so it takes positions, not labels).
ix usually tries to behave like loc but falls back to behaving like iloc if a label is not present in the index.
It's important to note some subtleties that can make ix slightly tricky to use:
if the index is of integer type, ix will only use label-based indexing and not fall back to position-based indexing. If the label is not in the index, an error is raised.
if the index does not contain only integers, then given an integer, ix will immediately use position-based indexing rather than label-based indexing. If, however, ix is given another type (e.g. a string), it can use label-based indexing.
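Because ix no longer exists in current pandas, the two rules above can't be tried directly. The following is a rough sketch, not the real ix implementation. It uses a hypothetical helper, ix_like, that reproduces those rules for a single scalar key using loc and iloc. The type checks use pandas.api.types.is_integer_dtype and is_integer, which is an assumption about how to approximate what ix did internally. The real ix also handled slices, lists and DataFrame tuples.

```python
import pandas as pd
from pandas.api.types import is_integer, is_integer_dtype


def ix_like(obj, key):
    """Hypothetical helper: rough emulation of the legacy .ix rules for ONE scalar key.

    Not the real .ix implementation; it only mirrors the two subtleties above.
    """
    if is_integer_dtype(obj.index):
        # Integer index: label-based only, so a missing label raises KeyError
        # instead of falling back to a position.
        return obj.loc[key]
    if is_integer(key):
        # Index is not all integers and the key is an integer: use the position.
        return obj.iloc[key]
    # Any other key type (e.g. a string) is treated as a label.
    return obj.loc[key]


int_indexed = pd.Series(list("abc"), index=[10, 20, 30])
str_indexed = pd.Series(list("abc"), index=["x", "y", "z"])
mixed = pd.Series(list("pqrs"), index=["a", "b", 1, 2])  # object index, mixed types

print(ix_like(int_indexed, 10))   # label 10
print(ix_like(str_indexed, 0))    # position 0
print(ix_like(str_indexed, "z"))  # label 'z'
print(ix_like(mixed, 1), mixed.loc[1])  # position 1 vs. label 1
```

With the mixed index, ix_like(mixed, 1) picks the row at position 1 even though 1 is also a label in the index. That ambiguity is exactly what makes ix hard to reason about.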
To demonstrate, consider a series s of characters with a non-monotonic integer index:
>>> import pandas as pd
>>> s = pd.Series(list("abcdef"), index=[49, 48, 47, 0, 1, 2])
>>> s
49    a
48    b
47    c
0     d
1     e
2     f
The same integer means different things to the two indexers: loc treats it as a label, iloc as a position. Note also that a label slice with loc includes its end label, whereas a position slice with iloc excludes its end position, as with ordinary Python slicing:
>>> s.loc[0]    # value at index label 0
'd'
>>> s.iloc[0]   # value at index location 0
'a'
>>> s.loc[0:1]  # rows at index labels between 0 and 1 (inclusive)
0    d
1    e
>>> s.iloc[0:1] # rows at index locations between 0 and 1 (exclusive)
49    a
Here are some of the differences/similarities between s.loc and s.iloc when passed various objects:
<object> | description | s.loc[<object>] | s.iloc[<object>] |
---|---|---|---|
0 | single item | Value at index label 0 (the string 'd') | Value at index location 0 (the string 'a') |
0:1 | slice | Two rows (labels 0 and 1) | One row (first row at location 0) |
[2, 0] | integer list | Two rows with given labels | Two rows with given locations |
s > 'e' | Bool series (indicating which values have the property) | One row (containing 'f') | NotImplementedError |
(s>'e').values | Bool array | One row (containing 'f') | Same as loc |
999 | int object not in index | KeyError | IndexError (out of bounds) |
-1 | int object not in index | KeyError | Returns last value in s |
lambda x: x.index[3] | callable applied to series (here returning the item at position 3 in the index, i.e. the label 0) | s.loc[s.index[3]] | s.iloc[s.index[3]] |
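The rows of the table can be reproduced with a short script. This is a sketch that assumes a recent pandas release. The helper outcome is hypothetical: it returns either the selected values or the name of the exception raised, so the error rows are reported instead of stopping the loop.

```python
import pandas as pd

s = pd.Series(list("abcdef"), index=[49, 48, 47, 0, 1, 2])


def outcome(indexer, key):
    """Hypothetical helper: selected values as a list, or the exception's type name."""
    try:
        result = indexer[key]
    except Exception as exc:  # report the error type rather than crash
        return type(exc).__name__
    return result.tolist() if isinstance(result, pd.Series) else result


# One entry per row of the table above.
keys = {
    "0": 0,
    "0:1": slice(0, 1),
    "[2, 0]": [2, 0],
    "s > 'e'": s > "e",
    "(s > 'e').values": (s > "e").values,
    "999": 999,
    "-1": -1,
    "lambda x: x.index[3]": lambda x: x.index[3],
}

for label, key in keys.items():
    print(f"{label:22} loc -> {outcome(s.loc, key)!s:20} iloc -> {outcome(s.iloc, key)}")
```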
loc's label-querying capabilities extend well beyond integer indexes, and it's worth highlighting a couple of additional examples.
Here's a Series where the index contains string objects:
>>> s2 = pd.Series(s.index, index=s.values)
>>> s2
a    49
b    48
c    47
d     0
e     1
f     2
Since loc is label-based, it can fetch the first value in the Series using s2.loc['a']. It can also slice with non-integer objects:
>>> s2.loc['c':'e']  # all rows lying between 'c' and 'e' (inclusive)
c    47
d     0
e     1
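As a quick check of the inclusive label slice, the sketch below compares s2.loc['c':'e'] with positional slices. It rebuilds s and s2 from the answer so that the block runs on its own. To include 'e', iloc needs an end position of 5, because positional slices stop before their end.

```python
import pandas as pd

# Rebuild the two Series from the answer so this block runs on its own.
s = pd.Series(list("abcdef"), index=[49, 48, 47, 0, 1, 2])
s2 = pd.Series(s.index, index=s.values)   # letters become the labels

first = s2.loc["a"]            # label lookup
by_label = s2.loc["c":"e"]     # label slice: 'c', 'd' and 'e' (end included)
too_short = s2.iloc[2:4]       # positions 2 and 3 only: 'c' and 'd' (end excluded)
by_position = s2.iloc[2:5]     # positions 2, 3 and 4: same rows as the label slice

print(by_label.equals(by_position))  # True
```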
For DateTime indexes, we don't need to pass the exact date/time to fetch by label. For example:
>>> s3 = pd.Series(list('abcde'), pd.date_range('now', periods=5, freq='M'))
>>> s3
2021-01-31 16:41:31.879768    a
2021-02-28 16:41:31.879768    b
2021-03-31 16:41:31.879768    c
2021-04-30 16:41:31.879768    d
2021-05-31 16:41:31.879768    e
Then to fetch the row(s) for March/April 2021 we only need:
>>> s3.loc['2021-03':'2021-04']
2021-03-31 16:41:31.879768    c
2021-04-30 16:41:31.879768    d
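A self-contained variant of the DateTime example is sketched below. To keep the output independent of the current date, it uses fixed month-end dates built with pd.to_datetime instead of pd.date_range('now', ...). That substitution is an assumption for reproducibility. Recent pandas versions also deprecate the 'M' frequency alias in favour of 'ME', which this version avoids.

```python
import pandas as pd

# Assumption: fixed month-end dates instead of pd.date_range('now', periods=5, freq='M').
s3 = pd.Series(
    list("abcde"),
    index=pd.to_datetime(
        ["2021-01-31", "2021-02-28", "2021-03-31", "2021-04-30", "2021-05-31"]
    ),
)

march_april = s3.loc["2021-03":"2021-04"]   # partial strings cover whole months, both ends included
march = s3.loc["2021-03"]                   # a month string returns every row in that month
whole_year = s3.loc["2021"]                 # coarser still: every row in 2021
exact = s3.loc[pd.Timestamp("2021-03-31")]  # an exact timestamp returns the scalar value
```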
Rows and Columns
loc and iloc work the same way with DataFrames as they do with Series. It's useful to note that both methods can address columns and rows together.
When given a tuple, the first element is used to index the rows and, if it exists, the second element is used to index the columns.
Consider the DataFrame defined below:
>>> import numpy as np
>>> df = pd.DataFrame(np.arange(25).reshape(5, 5),
                      index=list('abcde'),
                      columns=['x','y','z', 8, 9])
>>> df
    x   y   z   8   9
a   0   1   2   3   4
b   5   6   7   8   9
c  10  11  12  13  14
d  15  16  17  18  19
e  20  21  22  23  24
Then, for example:
>>> df.loc['c':, :'z']  # rows 'c' and onwards AND columns up to 'z'
x y z
c 10 11 12
d 15 16 17
e 20 21 22
>>> df.iloc[:, 3]  # all rows, but only the column at index location 3
a     3
b     8
c    13
d    18
e    23
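The tuple rule can be checked on the same DataFrame. The sketch below selects a single cell and a block, each both by label and by position. It also confirms that the column labelled 8 sits at position 3. It is a minimal illustration, not an exhaustive list of what the tuple can contain.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(25).reshape(5, 5),
                  index=list("abcde"),
                  columns=["x", "y", "z", 8, 9])

# A tuple key is (rows, columns) for both indexers.
cell_by_label = df.loc["b", "y"]     # row label 'b', column label 'y'
cell_by_position = df.iloc[1, 1]     # second row, second column

# The same block selected two ways: label slices include the end, position slices do not.
block_loc = df.loc["c":, :"z"]
block_iloc = df.iloc[2:, :3]

# Here 8 is a column *label*; that column happens to be at position 3.
col_label_8 = df.loc[:, 8]
col_position_3 = df.iloc[:, 3]
```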
As general advice, if you're only indexing using labels, or only indexing using integer positions, stick with loc or iloc to avoid unexpected results; try not to use ix.
Combining position-based and label-based indexing
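Sometimes we want to mix label and positional indexing methods for the rows and columns, somehow combining the capabilities of loc and iloc.
For example, consider the following DataFrame. How best to slice the rows up to and including 'c' and take the first four columns?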
>>> import numpy as np
>>> df = pd.DataFrame(np.arange(25).reshape(5, 5),
                      index=list('abcde'),
                      columns=['x','y','z', 8, 9])
>>> df
    x   y   z   8   9
a   0   1   2   3   4
b   5   6   7   8   9
c  10  11  12  13  14
d  15  16  17  18  19
e  20  21  22  23  24
In earlier versions of pandas (before 0.20.0), ix lets you do this quite neatly: we can slice the rows by label and the columns by position (note that for the columns, ix will default to position-based slicing since 4 is not a column name):
>>> df.ix[:'c', :4]
    x   y   z   8
a   0   1   2   3
b   5   6   7   8
c  10  11  12  13
In later versions of pandas, we can achieve this result using iloc and the help of another method:
>>> df.iloc[:df.index.get_loc('c') + 1, :4]
    x   y   z   8
a   0   1   2   3
b   5   6   7   8
c  10  11  12  13
get_loc() is an index method meaning "get the position of the label in this index". Note that since slicing with iloc is exclusive of its endpoint, we must add 1 to this value if we want row 'c' as well.
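Two ways of mixing the approaches are sketched below on the same DataFrame. The first converts the row label to a position (the get_loc route described above). The second converts the column positions to labels with df.columns[:4] and stays in loc. The last lines go beyond the original answer: they use Index.get_indexer, which maps several labels to positions at once and returns -1 for any label that is missing.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(25).reshape(5, 5),
                  index=list("abcde"),
                  columns=["x", "y", "z", 8, 9])

# Route 1 (as in the text): row label -> position, then slice everything with iloc.
stop = df.index.get_loc("c") + 1     # +1 because iloc excludes the slice end
via_iloc = df.iloc[:stop, :4]

# Route 2: column positions -> labels, then slice everything with loc.
via_loc = df.loc[:"c", df.columns[:4]]

# Several row labels at once: get_indexer returns their positions.
rows = df.index.get_indexer(["a", "c"])
picked = df.iloc[rows, [0, 3]]       # rows 'a' and 'c', columns 'x' and 8
```

Route 2 is handy when the positional part is the columns. Route 1 is handy when it is the rows.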
There are further examples in pandas' documentation here.