
Setting a time offset as the window parameter doesn't work for the rolling corr function #28266

Closed
@TusakaRin

Description


Code Sample, a copy-pastable example if possible

import pandas as pd
df = pd.DataFrame({'B': [0, 1, 2, 4, 3], 'A': [7, 4, 6, 9, 3]},
                  index=[pd.Timestamp('20130101 09:00:00'),
                         pd.Timestamp('20130102 09:00:02'),
                         pd.Timestamp('20130103 09:00:03'),
                         pd.Timestamp('20130105 09:00:05'),
                         pd.Timestamp('20130106 09:00:06')])
print(df.corr())
'''
df.corr()
Out[53]: 
         B        A
B  1.00000  0.19868
A  0.19868  1.00000
'''
print(df.rolling(window='3d').corr())
'''
                              B         A
2013-01-01 09:00:00 B       NaN       NaN
                    A       NaN       NaN
2013-01-02 09:00:02 B  1.000000 -1.000000
                    A -1.000000  1.000000
2013-01-03 09:00:03 B  1.000000 -0.327327
                    A -0.327327  1.000000
2013-01-05 09:00:05 B  1.000000  0.609449
                    A  0.609449  1.000000
2013-01-06 09:00:06 B  1.000000  0.198680
                    A  0.198680  1.000000
'''

Problem description

Due to some conflicts I'm not able to test this on pandas 0.25, so I'm not sure whether the problem has been fixed there. Note that the last correlation value, 0.198680, is exactly the same as the result of calling corr() on the whole DataFrame. No matter how I change the time offset (the window parameter), rolling(window='<offset string>').corr() returns the cumulative correlation instead of the correlation within the window.
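One way to check the "cumulative" diagnosis is to compare the reported rolling output with an expanding correlation, which by construction uses every row up to the current timestamp. This is a minimal sketch that rebuilds the same DataFrame; like the rolling result above, `df.expanding().corr()` returns a frame indexed by (timestamp, column), and `cumulative_ba` is just an illustrative variable name.

```python
import pandas as pd

# Same data as in the code sample above.
df = pd.DataFrame(
    {'B': [0, 1, 2, 4, 3], 'A': [7, 4, 6, 9, 3]},
    index=[pd.Timestamp(s) for s in ['20130101 09:00:00', '20130102 09:00:02',
                                     '20130103 09:00:03', '20130105 09:00:05',
                                     '20130106 09:00:06']],
)

# Expanding = cumulative: the entry for timestamp t uses every row up to and including t.
cumulative = df.expanding().corr()

# Keep only the off-diagonal B/A entry per timestamp (level 1 holds the column names).
cumulative_ba = cumulative.xs('B', level=1)['A']
print(cumulative_ba)
```

The expanding values at 2013-01-05 09:00:05 (0.609449) and 2013-01-06 09:00:06 (0.198680) match the rolling output reported above, which is consistent with the offset window being ignored.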

The expected output contains so many ones because rows that do not fall inside the time window should not be included in the calculation. For example, the ones in the last row are calculated from the fourth and fifth (last) rows only, because those are the only rows whose indices lie between 2013-01-03 09:00:06 and 2013-01-06 09:00:06 (the window parameter is 3d), and the correlation of these two point pairs is 1.
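The window arithmetic for the last row can be checked by hand. A minimal sketch, assuming the default right-closed behavior of offset-based windows, i.e. the rows kept for timestamp t are those with index in (t - 3d, t]:

```python
import pandas as pd

# Same data as in the code sample above.
df = pd.DataFrame(
    {'B': [0, 1, 2, 4, 3], 'A': [7, 4, 6, 9, 3]},
    index=[pd.Timestamp(s) for s in ['20130101 09:00:00', '20130102 09:00:02',
                                     '20130103 09:00:03', '20130105 09:00:05',
                                     '20130106 09:00:06']],
)

t = df.index[-1]                  # 2013-01-06 09:00:06
start = t - pd.Timedelta('3d')    # 2013-01-03 09:00:06, itself excluded (right-closed)
in_window = df[(df.index > start) & (df.index <= t)]
print(in_window)                  # only the 2013-01-05 and 2013-01-06 rows remain

# B goes 4 -> 3 and A goes 9 -> 3: both decrease, so the two points give r = +1.
r = in_window['B'].corr(in_window['A'])
print(r)
```

The 2013-01-03 09:00:03 row falls three seconds short of the window start, which is why it is dropped.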

Expected Output

                              B         A
2013-01-01 09:00:00 B       NaN       NaN
                    A       NaN       NaN
2013-01-02 09:00:02 B  1.000000 -1.000000
                    A -1.000000  1.000000
2013-01-03 09:00:03 B  1.000000 -0.327327
                    A -0.327327  1.000000
2013-01-05 09:00:05 B  1.000000  1.000000
                    A  1.000000  1.000000
2013-01-06 09:00:06 B  1.000000  1.000000
                    A  1.000000  1.000000
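The expected output can be reproduced by brute force: for each timestamp t, select the rows in (t - 3d, t] and call corr() on that slice. This is a minimal sketch for checking small examples only (it is O(n²)); `windowed_corr` is a hypothetical helper, not a pandas API.

```python
import pandas as pd

# Same data as in the code sample above.
df = pd.DataFrame(
    {'B': [0, 1, 2, 4, 3], 'A': [7, 4, 6, 9, 3]},
    index=[pd.Timestamp(s) for s in ['20130101 09:00:00', '20130102 09:00:02',
                                     '20130103 09:00:03', '20130105 09:00:05',
                                     '20130106 09:00:06']],
)


def windowed_corr(frame, window):
    """Hypothetical helper: pairwise corr over the right-closed window (t - window, t]."""
    offset = pd.Timedelta(window)
    pieces = {}
    for t in frame.index:
        mask = (frame.index > t - offset) & (frame.index <= t)
        pieces[t] = frame[mask].corr()
    # Stack into the same (timestamp, column) layout that rolling().corr() produces.
    return pd.concat(pieces)


expected = windowed_corr(df, '3d')
print(expected)
```

If the offset window is honored, `df.rolling(window='3d').corr()` should agree with this frame, which can be checked with `pandas.testing.assert_frame_equal`.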

Output of pd.show_versions()

------------------
commit: None
python: 3.7.3.final.0
python-bits: 64
OS: Windows
OS-release: 10
machine: AMD64
processor: Intel64 Family 6 Model 142 Stepping 10, GenuineIntel
byteorder: little
LC_ALL: None
LANG: zh_CN
LOCALE: None.None

pandas: 0.24.2
pytest: 4.3.1
pip: 19.0.3
setuptools: 40.8.0
Cython: 0.29.6
numpy: 1.16.2
scipy: 1.2.1
pyarrow: None
xarray: None
IPython: 7.4.0
sphinx: 1.8.5
patsy: 0.5.1
dateutil: 2.8.0
pytz: 2018.9
blosc: None
bottleneck: 1.2.1
tables: 3.5.1
numexpr: 2.6.9
feather: None
matplotlib: 3.0.3
openpyxl: 2.6.1
xlrd: 1.2.0
xlwt: 1.3.0
xlsxwriter: 1.1.5
lxml.etree: 4.3.2
bs4: 4.7.1
html5lib: 1.0.1
sqlalchemy: 1.3.1
pymysql: None
psycopg2: None
jinja2: 2.10
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None
gcsfs: None
