incorrect number of rows returned specified
Parham
| <object> | description | s.loc[<object>] | s.iloc[<object>] |
|---|---|---|---|
| 0 | single item | Value at index label 0 (the string 'd') | Value at index location 0 (the string 'a') |
| 0:1 | slice | Two rows (labels 0 and 1) | One row (first row at location 0) |
| 1:47 | slice with out-of-bounds end | Zero rows (empty Series) | Five rows (location 1 onwards) |
| 1:47:-1 | slice with negative step | Three rows (labels 1 back to 47) | Zero rows (empty Series) |
| [2, 0] | integer list | Two rows with given labels | Two rows with given locations |
| s > 'e' | Bool series (indicating which values have the property) | One row (containing 'f') | NotImplementedError |
| (s>'e').values | Bool array | One row (containing 'f') | Same as loc |
| 999 | int object not in index | KeyError | IndexError (out of bounds) |
| -1 | int object not in index | KeyError | Returns last value in s |
| lambda x: x.index[3] | callable applied to series (here returning the item at position 3 in the index) | s.loc[s.index[3]] | s.iloc[s.index[3]] |
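
The two slice rows are the least obvious entries in the table, so here is a minimal sketch that checks them. It assumes s is the six-character Series with the non-monotonic integer index used in the answer; the variable names are only illustrative.

```python
import pandas as pd

# Assumption: the same Series the answer's table refers to.
s = pd.Series(list("abcdef"), index=[49, 48, 47, 0, 1, 2])

# loc slices by label, following the order in which labels appear in the index.
forward_labels = s.loc[1:47]       # label 1 sits after label 47, so nothing is selected
backward_labels = s.loc[1:47:-1]   # walks backwards from label 1 to label 47

# iloc slices by position, like an ordinary Python list.
forward_positions = s.iloc[1:47]       # the end is clipped to the length of the Series
backward_positions = s.iloc[1:47:-1]   # cannot walk backwards from position 1 to 47

print(len(forward_labels), list(backward_labels))
print(list(forward_positions), len(backward_positions))
```

The backward label slice returns the rows labelled 1, 0 and 47 in that order, which is why the table lists three rows for it.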
Remove info about ix as it is very out-of-date now. Generally improve answer to emphasise differences in loc/iloc.
Alex Riley


Note: in pandas version 0.20.0 and above, ix is deprecated and the use of loc and iloc is encouraged instead. I have left the parts of this answer that describe ix intact as a reference for users of earlier versions of pandas. Examples have been added below showing alternatives to ix.


First, here's a recap of the three methods:

  • loc gets rows (or columns) with particular labels from the index.
  • iloc gets rows (or columns) at particular positions in the index (so it only takes integers).
  • ix usually tries to behave like loc but falls back to behaving like iloc if a label is not present in the index.

It's important to note some subtleties that can make ix slightly tricky to use:

  • if the index is of integer type, ix will only use label-based indexing and not fall back to position-based indexing. If the label is not in the index, an error is raised.

  • if the index does not contain only integers, then given an integer, ix will immediately use position-based indexing rather than label-based indexing. If however ix is given another type (e.g. a string), it can use label-based indexing.


 

To illustrate the differences between the three methods, consider the following Series:

>>> s = pd.Series(np.nan, index=[49,48,47,46,45, 1, 2, 3, 4, 5])
>>> s
49   NaN
48   NaN
47   NaN
46   NaN
45   NaN
1    NaN
2    NaN
3    NaN
4    NaN
5    NaN

We'll look at slicing with the integer value 3.

In this case, s.iloc[:3] returns the first 3 rows (since it treats 3 as a position) and s.loc[:3] returns the first 8 rows (since it treats 3 as a label):

>>> s.iloc[:3] # slice the first three rows
49   NaN
48   NaN
47   NaN

>>> s.loc[:3] # slice up to and including label 3
49   NaN
48   NaN
47   NaN
46   NaN
45   NaN
1    NaN
2    NaN
3    NaN

>>> s.ix[:3] # the integer is in the index so s.ix[:3] works like loc
49   NaN
48   NaN
47   NaN
46   NaN
45   NaN
1    NaN
2    NaN
3    NaN

Notice s.ix[:3] returns the same Series as s.loc[:3] since it looks for the label first rather than working on the position (and the index for s is of integer type).

What if we try with an integer label that isn't in the index (say 6)?

Here s.iloc[:6] returns the first 6 rows of the Series as expected. However, s.loc[:6] raises a KeyError since 6 is not in the index.

>>> s.iloc[:6]
49   NaN
48   NaN
47   NaN
46   NaN
45   NaN
1    NaN

>>> s.loc[:6]
KeyError: 6

>>> s.ix[:6]
KeyError: 6

As per the subtleties noted above, s.ix[:6] now raises a KeyError because it tries to work like loc but can't find a 6 in the index. Because our index is of integer type ix doesn't fall back to behaving like iloc.

If, however, our index were of mixed type, then given an integer, ix would behave like iloc immediately instead of raising a KeyError:

>>> s2 = pd.Series(np.nan, index=['a','b','c','d','e', 1, 2, 3, 4, 5])
>>> s2.index.is_mixed() # index is mix of different types
True
>>> s2.ix[:6] # now behaves like iloc given integer
a   NaN
b   NaN
c   NaN
d   NaN
e   NaN
1   NaN

Keep in mind that ix can still accept non-integers and behave like loc:

>>> s2.ix[:'c'] # behaves like loc given non-integer
a   NaN
b   NaN
c   NaN

As general advice, if you're only indexing using labels, or only indexing using integer positions, stick with loc or iloc to avoid unexpected results - try not to use ix.
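
ix was removed entirely in pandas 1.0, so the ix calls above cannot be run on current versions. The loc and iloc halves of those examples still can, though. This is a minimal sketch that rebuilds the NaN-filled Series from this section and checks the row counts and the KeyError described above; the variable names are only illustrative.

```python
import numpy as np
import pandas as pd

# The ten-row Series from the examples above.
s = pd.Series(np.nan, index=[49, 48, 47, 46, 45, 1, 2, 3, 4, 5])

by_position = s.iloc[:3]   # 3 is a position: first three rows
by_label = s.loc[:3]       # 3 is a label: everything up to and including label 3
print(len(by_position), len(by_label))

# 6 is a valid position bound but not a label in the index.
first_six = s.iloc[:6]
try:
    s.loc[:6]
    missing_label_raised = False
except KeyError:
    missing_label_raised = True
print(len(first_six), missing_label_raised)
```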


Combining position-based and label-based indexing

Sometimes given a DataFrame, you will want to mix label and positional indexing methods for the rows and columns.

>>> df = pd.DataFrame(np.nan, 
                      index=list('abcde'),
                      columns=['x','y','z', 8, 9])
>>> df
    x   y   z   8   9
a NaN NaN NaN NaN NaN
b NaN NaN NaN NaN NaN
c NaN NaN NaN NaN NaN
d NaN NaN NaN NaN NaN
e NaN NaN NaN NaN NaN

In earlier versions of pandas (before 0.20.0) ix lets you do this quite neatly - we can slice the rows by label and the columns by position (note that for the columns, ix will default to position-based slicing since 4 is not a column name):

>>> df.ix[:'c', :4]
    x   y   z   8
a NaN NaN NaN NaN
b NaN NaN NaN NaN
c NaN NaN NaN NaN

In later versions of pandas, we can achieve this result using iloc and the help of another method:

>>> df.iloc[:df.index.get_loc('c') + 1, :4]
    x   y   z   8
a NaN NaN NaN NaN
b NaN NaN NaN NaN
c NaN NaN NaN NaN

get_loc() is an index method meaning "get the position of the label in this index". Note that since slicing with iloc is exclusive of its endpoint, we must add 1 to this value if we want row 'c' as well.

There are further examples in pandas' documentation here.

Label vs. Location

The main distinction between the two methods is:

  • loc gets rows (and/or columns) with particular labels.

  • iloc gets rows (and/or columns) at integer locations.

To demonstrate, consider a series s of characters with a non-monotonic integer index:

>>> import pandas as pd
>>> s = pd.Series(list("abcdef"), index=[49, 48, 47, 0, 1, 2])
>>> s
49    a
48    b
47    c
0     d
1     e
2     f

>>> s.loc[0]    # value at index label 0
'd'

>>> s.iloc[0]   # value at index location 0
'a'

>>> s.loc[0:1]  # rows at index labels between 0 and 1 (inclusive)
0    d
1    e

>>> s.iloc[0:1] # rows at index location between 0 and 1 (exclusive)
49    a

Here are some of the differences/similarities between s.loc and s.iloc when passed various objects:

| <object> | description | s.loc[<object>] | s.iloc[<object>] |
|---|---|---|---|
| 0 | single item | Value at index label 0 (the string 'd') | Value at index location 0 (the string 'a') |
| 0:1 | slice | Two rows (labels 0 and 1) | One row (first row at location 0) |
| [2, 0] | integer list | Two rows with given labels | Two rows with given locations |
| s > 'e' | Bool series (indicating which values have the property) | One row (containing 'f') | NotImplementedError |
| (s>'e').values | Bool array | One row (containing 'f') | Same as loc |
| 999 | int object not in index | KeyError | IndexError (out of bounds) |
| -1 | int object not in index | KeyError | Returns last value in s |
| lambda x: x.index[3] | callable applied to series (here returning the item at position 3 in the index) | s.loc[s.index[3]] | s.iloc[s.index[3]] |
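
Several rows of this table can be checked directly. The following is a small sketch, assuming the same Series s as above; the helper variable names are not from the original answer.

```python
import pandas as pd

s = pd.Series(list("abcdef"), index=[49, 48, 47, 0, 1, 2])

# Boolean selection: loc accepts the boolean Series, iloc needs the plain array.
mask = s > "e"
loc_masked = s.loc[mask]
iloc_masked = s.iloc[mask.values]

# A missing label and an out-of-range position fail with different exceptions.
try:
    s.loc[999]
    loc_error = None
except KeyError as exc:
    loc_error = type(exc).__name__
try:
    s.iloc[999]
    iloc_error = None
except IndexError as exc:
    iloc_error = type(exc).__name__

# Negative integers count from the end for iloc only.
last_value = s.iloc[-1]

# A callable receives the Series and returns the key to look up.
# Here x.index[3] is 0, used as a label by loc and as a position by iloc.
loc_callable = s.loc[lambda x: x.index[3]]
iloc_callable = s.iloc[lambda x: x.index[3]]

print(list(loc_masked), list(iloc_masked))
print(loc_error, iloc_error, last_value)
print(loc_callable, iloc_callable)
```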

loc's label-querying capabilities extend well beyond integer indexes, and it's worth highlighting a couple of additional examples.

Here's a Series where the index contains string objects:

>>> s2 = pd.Series(s.index, index=s.values)
>>> s2
a    49
b    48
c    47
d     0
e     1
f     2

Since loc is label-based, it can fetch the first value in the Series using s2.loc['a']. It can also slice with non-integer objects:

>>> s2.loc['c':'e']  # all rows lying between 'c' and 'e' (inclusive)
c    47
d     0
e     1

For DateTime indexes, we don't need to pass the exact date/time to fetch by label. For example:

>>> s3 = pd.Series(list('abcde'), pd.date_range('now', periods=5, freq='M')) 
>>> s3
2021-01-31 16:41:31.879768    a
2021-02-28 16:41:31.879768    b
2021-03-31 16:41:31.879768    c
2021-04-30 16:41:31.879768    d
2021-05-31 16:41:31.879768    e

Then to fetch the row(s) for March/April 2021 we only need:

>>> s3.loc['2021-03':'2021-04']
2021-03-31 16:41:31.879768    c
2021-04-30 16:41:31.879768    d
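
Because 'now' changes on every run, the output above is not reproducible. A minimal sketch with fixed month-end dates (chosen here for illustration, not taken from the original session) shows the same partial-string lookup:

```python
import pandas as pd

# Assumption: explicit month-end dates stand in for date_range('now', ...).
dates = pd.to_datetime(
    ["2021-01-31", "2021-02-28", "2021-03-31", "2021-04-30", "2021-05-31"]
)
s3 = pd.Series(list("abcde"), index=dates)

# A partial date string selects every row that falls inside that period.
march_april = s3.loc["2021-03":"2021-04"]
march_only = s3.loc["2021-03"]

print(march_april)
print(march_only)
```

Building the index with pd.to_datetime also sidesteps the frequency alias, whose spelling differs between pandas versions.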

Rows and Columns

loc and iloc work the same way with DataFrames as they do with Series. It's useful to note that both methods can address columns and rows together.

When given a tuple, the first element is used to index the rows and, if it exists, the second element is used to index the columns.

Consider the DataFrame defined below:

>>> import numpy as np 
>>> df = pd.DataFrame(np.arange(25).reshape(5, 5),  
                      index=list('abcde'), 
                      columns=['x','y','z', 8, 9])
>>> df
    x   y   z   8   9
a   0   1   2   3   4
b   5   6   7   8   9
c  10  11  12  13  14
d  15  16  17  18  19
e  20  21  22  23  24

Then for example:

>>> df.loc['c': , :'z']  # rows 'c' and onwards AND columns up to 'z'
    x   y   z
c  10  11  12
d  15  16  17
e  20  21  22

>>> df.iloc[:, 3]        # all rows, but only the column at index location 3
a     3
b     8
c    13
d    18
e    23
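
The integer column names 8 and 9 make the label/position distinction visible for columns as well. Here is a short sketch, assuming the same df as above; the variable names are only illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(25).reshape(5, 5),
                  index=list("abcde"),
                  columns=["x", "y", "z", 8, 9])

by_label = df.loc[:, 8]       # the column whose label is the integer 8
by_position = df.iloc[:, 3]   # the column at position 3, which is that same column
same_column = by_label.equals(by_position)

# Scalar lookups take (row, column) in the same way.
cell_by_label = df.loc["c", "z"]
cell_by_position = df.iloc[2, 2]

# 8 is a valid column label but not a valid column position.
try:
    df.iloc[:, 8]
    out_of_bounds = False
except IndexError:
    out_of_bounds = True

print(same_column, cell_by_label, cell_by_position, out_of_bounds)
```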

Sometimes we want to mix label and positional indexing methods for the rows and columns, somehow combining the capabilities of loc and iloc.

>>> import numpy as np 
>>> df = pd.DataFrame(np.arange(25).reshape(5, 5),  
                      index=list('abcde'), 
                      columns=['x','y','z', 8, 9])
>>> df
    x   y   z   8   9
a   0   1   2   3   4
b   5   6   7   8   9
c  10  11  12  13  14
d  15  16  17  18  19
e  20  21  22  23  24

For example, to take the rows up to and including label 'c' together with the first four columns, we can use iloc with the help of another method:

>>> df.iloc[:df.index.get_loc('c') + 1, :4]
    x   y   z   8
a   0   1   2   3
b   5   6   7   8
c  10  11  12  13

get_loc() is an index method meaning "get the position of the label in this index". Note that since slicing with iloc is exclusive of its endpoint, we must add 1 to this value if we want row 'c' as well.
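
For comparison, here is a sketch of two other ways to get the same selection next to the get_loc() approach. These alternatives are not part of the original answer; they use only standard loc/iloc calls and assume the df defined above.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(25).reshape(5, 5),
                  index=list("abcde"),
                  columns=["x", "y", "z", 8, 9])

# The approach from the answer: translate the row label into a position.
via_get_loc = df.iloc[: df.index.get_loc("c") + 1, :4]

# Alternative 1: slice rows by label first, then columns by position.
chained = df.loc[:"c"].iloc[:, :4]

# Alternative 2: translate the column positions into labels and use loc only.
via_labels = df.loc[:"c", df.columns[:4]]

print(via_get_loc)
print(via_get_loc.equals(chained), via_get_loc.equals(via_labels))
```

Chaining is the shortest to read, but it performs two indexing steps, so it is better suited to reading values than to assigning them.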

Bounty Awarded with 150 reputation awarded by cs95
Earlier revisions, all by Alex Riley:

  • added 184 characters in body
  • update following .ix deprecation
  • added 113 characters in body
  • added 113 characters in body
  • oops... wrong way round
  • deleted 5 characters in body
  • added 1 character in body
  • added 976 characters in body
  • added 976 characters in body
  • added 56 characters in body
  • added 63 characters in body
  • added 341 characters in body
  • added 26 characters in body
  • added 26 characters in body
  • (no edit summary)