Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Center rolling window for time offset #38780

Merged
merged 61 commits into from Apr 9, 2021
Merged

Conversation

@luholgit
Copy link
Contributor

@luholgit luholgit commented Dec 29, 2020

  • closes #20012
  • tests added / passed
  • passes black pandas
  • passes git diff upstream/master -u -- "*.py" | flake8 --diff
  • whatsnew entry

Added center functionality for VariableWindowIndexer. Note - I am unsure if the NotImplementedError in lines 1966-1969 in rolling.py still correctly raises an error for offset based windows.

Finalizes previous PR #36097

Copy link
Contributor

@jreback jreback left a comment

needs tests

@jreback jreback added the Window label Dec 29, 2020
@luholgit
Copy link
Contributor Author

@luholgit luholgit commented Dec 29, 2020

take

@@ -11,7 +11,7 @@ def calculate_variable_window_bounds(
int64_t num_values,
int64_t window_size,
object min_periods, # unused but here to match get_window_bounds signature
object center, # unused but here to match get_window_bounds signature
object center,
Copy link
Member

@mroeschke mroeschke Dec 29, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should be bint center now if supported

Copy link
Contributor Author

@luholgit luholgit Dec 30, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks for your quick response, this is done now. :)

@@ -450,7 +457,36 @@ def homogeneous_func(values: np.ndarray):
if values.size == 0:
return values.copy()

def _calculate_center_offset(window) -> int:
Copy link
Member

@mroeschke mroeschke Dec 29, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

None of this code below should be needed.

The center calculation should just adjust the indexers return in calculate_variable_window_bounds and everything else should work correctly.

@@ -334,6 +334,13 @@ def _get_window_indexer(self) -> BaseIndexer:
"""
if isinstance(self.window, BaseIndexer):
return self.window

if self.is_freq_type:
Copy link
Member

@mroeschke mroeschke Dec 29, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This clause shouldn't be needed either. self,center should just be passed to VariableWindowIndexer in the clause below

Copy link
Contributor Author

@luholgit luholgit Dec 30, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is corrected too

or isinstance(
self._on, (ABCDatetimeIndex, ABCTimedeltaIndex, ABCPeriodIndex)
)
) and isinstance(self.window, (str, BaseOffset, timedelta)):

self._validate_monotonic()

# we don't allow center
if self.center:
# we don't allow center for offset based windows
Copy link
Member

@mroeschke mroeschke Dec 29, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This entire clause should be removed once center is supported

Copy link
Contributor Author

@luholgit luholgit Dec 30, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

removed this now, too! :)

_calculate_center_offset(self.window)
if self.center
and not isinstance(
self._get_window_indexer(self.window), VariableWindowIndexer
Copy link
Member

@arw2019 arw2019 Dec 29, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think you should be passing an argument here, just self._get_window_indexer()

Copy link
Contributor Author

@luholgit luholgit Dec 30, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I removed the argument here, too. Thanks for the feedback!

@jreback jreback changed the title Center window Center rolling window for time offset Dec 29, 2020
Copy link
Contributor

@jreback jreback left a comment

you need tests to validate each path that is added.

@@ -74,14 +78,27 @@ def calculate_variable_window_bounds(
# right endpoint is open
else:
end[0] = 0
if center_window:
for j in range(0, num_values+1):
if (index[j] == index[0] + index_growth_sign*window_size/2 and
Copy link
Contributor

@jreback jreback Dec 30, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use spaces around operators & parens to format this (we don't lint these files automatically)

def _center_window(self, result: np.ndarray, offset: int) -> np.ndarray:
"""
Center the result in the window for weighted rolling aggregations.
"""
if self.axis > result.ndim - 1:
raise ValueError("Requested axis is larger then no. of argument dimensions")

if offset > 0:
lead_indexer = [slice(None)] * result.ndim
lead_indexer[self.axis] = slice(offset, None)
result = np.copy(result[tuple(lead_indexer)])
return result

Copy link
Contributor Author

@luholgit luholgit Jan 5, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In an effort to make the tests run again before introducing new ones, I encountered an error with "'Rolling' object has no attribute '_center_window'".
I tried to solve the error by adding the _center_window() method to the BaseWindow class.
However, this resulted in new errors.

I am not sure whether it is correct to follow this debugging path. Could you quickly comment @jreback , @mroeschke or @arw2019?

Meanwhile, I am working on adding the changes you asked for. :)

Copy link
Member

@mroeschke mroeschke Jan 5, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As mentioned in #38780 (comment), I think you shouldn't need any additional code in this _apply method.

The changes should just need to occur in pandas/_libs/window/indexers.pyx in the calculate_variable_window_bounds function to make it return indexes that "center" the window.

Copy link
Member

@mroeschke mroeschke Jan 5, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For example, for non time offests, center is handled when calculating the indexes of each window

Copy link
Contributor Author

@luholgit luholgit Jan 6, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As mentioned in #38780 (comment), I think you shouldn't need any additional code in this _apply method.

The changes should just need to occur in pandas/_libs/window/indexers.pyx in the calculate_variable_window_bounds function to make it return indexes that "center" the window.

Just removed the unnecessary lines from rolling.py and that made the tests work again, thanks for that hint! At first I understood I had to migrate those lines to the pandas/_libs/window/indexers.pyx instead of ditching them all together.

@@ -60,6 +62,8 @@ def calculate_variable_window_bounds(

if index[num_values - 1] < index[0]:
index_growth_sign = -1
if center:
Copy link
Member

@mroeschke mroeschke Jan 6, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sine we typed center as bint. This isnt needed anymore

@@ -334,9 +334,12 @@ def _get_window_indexer(self) -> BaseIndexer:
"""
if isinstance(self.window, BaseIndexer):
return self.window

Copy link
Member

@mroeschke mroeschke Jan 6, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you undo this white space?

@@ -451,6 +454,7 @@ def homogeneous_func(values: np.ndarray):
return values.copy()

def calc(x):

Copy link
Member

@mroeschke mroeschke Jan 6, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you undo this white space?

@@ -1880,20 +1884,14 @@ def validate(self):
# we allow rolling on a datetimelike index
if (
self.obj.empty
# TODO: add "or self.is_datetimelike"?
Copy link
Member

@mroeschke mroeschke Jan 6, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't need this TODO

Copy link
Contributor

@jreback jreback left a comment

@luholgit we need actual tests that lock down whether the bounds are correct including variable size windows. There are some center tests for fixed windows, follow those examples.

@pep8speaks
Copy link

@pep8speaks pep8speaks commented Mar 29, 2021

Hello @luholgit! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2021-04-09 12:15:28 UTC

Copy link
Contributor

@jreback jreback left a comment

pls also add a whatsnew note (enh is ok), as well as see if any updates are needed in the https://pandas.pydata.org/pandas-docs/dev/user_guide/window.html, also the doc-string of rolling might need a small update as well

pandas/_libs/window/indexers.pyx Show resolved Hide resolved
pandas/tests/window/test_rolling.py Show resolved Hide resolved

tm.assert_frame_equal(result, expected)


def test_closed_fixed_binary_col():
def test_datetimelike_centered_selections(closed, arithmetic_win_operators):
Copy link
Contributor

@jreback jreback Apr 2, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you paramterize on closed (e.g. directly put your window selection up there with the closed parameter), eg.. don't use the fixture

Copy link
Contributor Author

@luholgit luholgit Apr 7, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea, this is done, too.

+ [77.0] * 2
)

else:
Copy link
Contributor

@jreback jreback Apr 2, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if doing this pls paramterize outside the function

Copy link
Contributor Author

@luholgit luholgit Apr 7, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done!

@jreback jreback added this to the 1.3 milestone Apr 7, 2021
used (:issue:`38780`).
For example:

.. code-block:: ipython
Copy link
Contributor

@jreback jreback Apr 7, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use ipython:: python so the code runs (don't need the prompts or output just the code)

@@ -134,6 +134,30 @@ a copy will no longer be made (:issue:`32960`)
The default behavior when not passing ``copy`` will remain unchanged, i.e.
a copy will be made.

Centered Datetime-Like Rolling Windows
Copy link
Contributor

@jreback jreback Apr 7, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would not object to an example in window.rst as well

Copy link
Contributor Author

@luholgit luholgit Apr 8, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added the same example also to the window.rst now

pandas/_libs/window/indexers.pyx Show resolved Hide resolved
index=date_range("2020", periods=5, freq="1D")
)
In [2]: df.rolling("2D", center=True).mean()
Copy link
Member

@mroeschke mroeschke Apr 7, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't need the In Out annotations when using .. ipython:: python (the code will execute line by line

def test_even_number_window_alignment():
# see discussion in GH 38780
s = Series(range(3), index=date_range(start="2020-01-01", freq="D", periods=3))

# behavior of index- and datetime-based windows differs here!
# s.rolling(window=2, min_periods=1, center=True).mean()

result = s.rolling(window="2D", min_periods=1, center=True).mean()

expected = Series([0.5, 1.5, 2], index=s.index)

tm.assert_series_equal(result, expected)
Copy link
Contributor Author

@luholgit luholgit Apr 8, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mroeschke I added this test to check for the behavior when operating with even sized windows. As mentioned by @sevberg this currently does behave differently than the classic index-based windows.

Is that what you had in mind here?

Copy link
Contributor Author

@luholgit luholgit Apr 8, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test would also work with N instead of D as frequency, therefore I am not sure if this is what you meant when referring to nano seconds?

Copy link
Member

@mroeschke mroeschke Apr 8, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes I believe this test exercises the difference in behavior which is good.

Copy link
Contributor Author

@luholgit luholgit Apr 9, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great, thanks!

Copy link
Contributor

@jreback jreback left a comment

just a minor doc comment, ping on green (I think you addressed all of @mroeschke comments but if not pls do)

doc/source/user_guide/window.rst Show resolved Hide resolved
df = pd.DataFrame(
{"A": [0, 1, 2, 3, 4]}, index=pd.date_range("2020", periods=5, freq="1D")
)
Copy link
Contributor

@jreback jreback Apr 8, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same comment as above

Copy link
Contributor Author

@luholgit luholgit Apr 9, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@@ -156,6 +156,16 @@ By default the labels are set to the right edge of the window, but a
s.rolling(window=5).mean()
s.rolling(window=5, center=True).mean()
This can also be applied to datetime-like indices.

Copy link
Contributor

@jreback jreback Apr 8, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you add a versionadded 1.3 tag here

Copy link
Contributor Author

@luholgit luholgit Apr 9, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done, too!

jreback
jreback approved these changes Apr 9, 2021
@jreback jreback merged commit 2e341ab into pandas-dev:master Apr 9, 2021
22 of 23 checks passed
@jreback
Copy link
Contributor

@jreback jreback commented Apr 9, 2021

thanks @luholgit very nice!

@luholgit
Copy link
Contributor Author

@luholgit luholgit commented Apr 13, 2021

Big thanks to you, @jreback and @mroeschke! I've learned a lot while working on this. :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Linked issues

Successfully merging this pull request may close these issues.

9 participants