Skip to content

Fix inconsistent results for empty datasets describe #41716

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 20 commits into from
Closed

Fix inconsistent results for empty datasets describe #41716

wants to merge 20 commits into from

Conversation

weikhor
Copy link
Contributor

@weikhor weikhor commented May 29, 2021

@weikhor weikhor changed the title Resolve about inconsistent results for empty datasets describe Fix inconsistent results for empty datasets describe May 29, 2021
@weikhor
Copy link
Contributor Author

weikhor commented May 29, 2021

Hi @jreback This PR is about fixing #41575. Thank

@jreback
Copy link
Contributor

jreback commented May 31, 2021

this currently fails on master right?

@pep8speaks
Copy link

pep8speaks commented Jun 1, 2021

Hello @weikhor! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

Line 1864:36: W292 no newline at end of file

Line 1210:1: E302 expected 2 blank lines, found 1

Comment last updated at 2021-06-08 16:24:04 UTC

@weikhor
Copy link
Contributor Author

weikhor commented Jun 1, 2021

this currently fails on master right?
Yes

@github-actions
Copy link
Contributor

github-actions bot commented Jul 9, 2021

This pull request is stale because it has been open for thirty days with no activity. Please update or respond to this comment if you're still interested in working on this.

@github-actions github-actions bot added the Stale label Jul 9, 2021
Copy link
Member

@rhshadrach rhshadrach left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR! I don't understand why the changing of errors being raised in _cython_transform and _cython_agg_general, can you explain why that's needed?

@@ -675,7 +682,7 @@ def nunique(self, dropna: bool = True) -> Series:
@doc(Series.describe)
def describe(self, **kwargs):
result = self.apply(lambda x: x.describe(**kwargs))
if self.axis == 1:
if self.axis == 1 or not isinstance(result.index, MultiIndex):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As @jreback commented, can check that it's not empty. I think the code would be more readable if we added it as a separate check without the transpose - I couldn't figure out why you'd want to take the transpose before realizing the frame is empty.

if result.empty:
    return result
elif self.axis == 1:
    return result.T

@@ -1850,4 +1861,4 @@ def func(df):

return self._python_apply_general(func, self._obj_with_exclusions)

boxplot = boxplot_frame_groupby
boxplot = boxplot_frame_groupby
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add a newline here

@@ -1206,3 +1206,10 @@ def test_groupby_sum_below_mincount_nullable_integer():
result = grouped.sum(min_count=2)
expected = DataFrame({"b": [pd.NA] * 3, "c": [pd.NA] * 3}, dtype="Int64", index=idx)
tm.assert_frame_equal(result, expected)

def test_groupby_empty_dataset():
# 41575
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you prefix with GH or GH # or GH#? We're not consistent, but I believe there is usually some prefix.

def test_groupby_empty_dataset():
# 41575
df = DataFrame(columns=["A", "B", "C"])
result = df.groupby("A").B.describe().reset_index(drop=True)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you remove the reset index? The test should just test the behavior of describe.

@mroeschke
Copy link
Member

Thanks for the PR, but it appears that this has gone stale and needs to address the reviews. Closing due to inactivity but happy to reopen if you're interested in continuing.

@mroeschke mroeschke closed this Aug 17, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

BUG: GroupBy.describe produces inconsistent results for empty datasets
5 participants