Skip to content

sort_index not sorting when multi-index made by different categorical types #24271

Closed
@wiso

Description

@wiso

Code Sample, a copy-pastable example if possible

This is the shorter code to reproduce the problem

import pandas as pd
from pandas.api.types import CategoricalDtype
number_type = CategoricalDtype(['one', 'two', 'three', 'four', 'five'], ordered=True)
day_type = CategoricalDtype(['monday', 'tuesday', 'wednesday', 'thursday', 'friday'], ordered=True)

dd = pd.DataFrame([('two', 'tuesday', 'one', 'up', 10),
                   ('one', 'wednesday', 'two', 'up', 20),
                   ('five', 'monday', 'two', 'up', 1),
                   ('three', 'tuesday', 'three', 'up', 1),
                   ('four', 'monday', 'one', 'up', 2),
                   ('one', 'friday', 'one', 'up', 2),
                   
                   ('two', 'tuesday', 'one', 'down', 10),
                   ('one', 'wednesday', 'two', 'down', 20),
                   ('five', 'monday', 'two', 'down', 1),
                   ('three', 'tuesday', 'three', 'down', 1),
                   ('four', 'monday', 'one', 'down', 2),
                   ('one', 'friday', 'one', 'down', 2),
                  ])
dd = dd.set_index([0, 1, 2, 3])
dd = dd.unstack(3)[4]


dd.index = dd.index.set_levels(dd.index.levels[0].astype(number_type), 0)
dd.index = dd.index.set_levels(dd.index.levels[1].astype(day_type), 1)
dd.index = dd.index.set_levels(dd.index.levels[2].astype(number_type), 2)

print (dd.sort_index())

Problem description

The dataframe is not sorted. I get

0     1         2              
five  monday    two       1   1
four  monday    one       2   2
one   friday    one       2   2
      wednesday two      20  20
three tuesday   three     1   1
two   tuesday   one      10  10

which is exactly what you get without any sorting.

If all the index levels have the same categorical type it seems to work.

It works if I reset the index:

print(dd.reset_index().set_index([0, 1, 2]).sort_index())

Expected Output

0     1         2              
one   wednesday two      20  20
      friday    one       2   2
two   tuesday   one      10  10
three tuesday   three     1   1
four  monday    one       2   2
five  monday    two       1   1

Output of pd.show_versions()

[paste the output of pd.show_versions() here below this line]
INSTALLED VERSIONS

commit: None
python: 3.7.0.final.0
python-bits: 64
OS: Linux
OS-release: 4.18.16-300.fc29.x86_64
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: it_IT.UTF-8
LOCALE: it_IT.UTF-8

pandas: 0.23.4
pytest: 4.0.0
pip: 18.1
setuptools: 40.5.0
Cython: None
numpy: 1.15.1
scipy: 1.1.0
pyarrow: None
xarray: None
IPython: 7.1.1
sphinx: None
patsy: None
dateutil: 2.7.5
pytz: 2018.5
blosc: None
bottleneck: 1.2.1
tables: 3.4.4
numexpr: 2.6.6
feather: None
matplotlib: 3.0.1
openpyxl: None
xlrd: 1.0.0
xlwt: 1.1.2
xlsxwriter: None
lxml: None
bs4: 4.6.3
html5lib: 1.0.1
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: 2.10
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None

Metadata

Metadata

Assignees

No one assigned

    Type

    No type

    Projects

    No projects

    Milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions