-
-
Notifications
You must be signed in to change notification settings - Fork 18.5k
Fix inconsistent results for empty datasets describe #41716
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix inconsistent results for empty datasets describe #41716
Conversation
this currently fails on master right? |
Hello @weikhor! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:
Comment last updated at 2021-06-08 16:24:04 UTC |
|
This pull request is stale because it has been open for thirty days with no activity. Please update or respond to this comment if you're still interested in working on this. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the PR! I don't understand why the changing of errors being raised in _cython_transform
and _cython_agg_general
, can you explain why that's needed?
@@ -675,7 +682,7 @@ def nunique(self, dropna: bool = True) -> Series: | |||
@doc(Series.describe) | |||
def describe(self, **kwargs): | |||
result = self.apply(lambda x: x.describe(**kwargs)) | |||
if self.axis == 1: | |||
if self.axis == 1 or not isinstance(result.index, MultiIndex): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As @jreback commented, can check that it's not empty. I think the code would be more readable if we added it as a separate check without the transpose - I couldn't figure out why you'd want to take the transpose before realizing the frame is empty.
if result.empty:
return result
elif self.axis == 1:
return result.T
@@ -1850,4 +1861,4 @@ def func(df): | |||
|
|||
return self._python_apply_general(func, self._obj_with_exclusions) | |||
|
|||
boxplot = boxplot_frame_groupby | |||
boxplot = boxplot_frame_groupby |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add a newline here
@@ -1206,3 +1206,10 @@ def test_groupby_sum_below_mincount_nullable_integer(): | |||
result = grouped.sum(min_count=2) | |||
expected = DataFrame({"b": [pd.NA] * 3, "c": [pd.NA] * 3}, dtype="Int64", index=idx) | |||
tm.assert_frame_equal(result, expected) | |||
|
|||
def test_groupby_empty_dataset(): | |||
# 41575 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you prefix with GH
or GH #
or GH#
? We're not consistent, but I believe there is usually some prefix.
def test_groupby_empty_dataset(): | ||
# 41575 | ||
df = DataFrame(columns=["A", "B", "C"]) | ||
result = df.groupby("A").B.describe().reset_index(drop=True) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you remove the reset index? The test should just test the behavior of describe.
Thanks for the PR, but it appears that this has gone stale and needs to address the reviews. Closing due to inactivity but happy to reopen if you're interested in continuing. |
GroupBy.describe
produces inconsistent results for empty datasets #41575