Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix segfault with printing dataframe #47097

Draft
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

roberthdevries
Copy link
Contributor

@roberthdevries roberthdevries commented May 23, 2022

GH 46848

@roberthdevries
Copy link
Contributor Author

@roberthdevries roberthdevries commented May 23, 2022

I have a fix that prevents the segfault, but I am not sure how a regression test could be made as the failure depends on the width of the terminal screen. I deem this not a workable regression test, but I am at a loss how to trigger the issue otherwise.
As far as I can determine the crash only happens when printing a dataframe requires the use of dots to indicate missing values when the screen is not wide enough.
Apparently there is some logic to determine this situation, and then take another code path. This other code path then takes care of the injection of the dots, but also triggers the code path that produces the segfault.
Ideally we can make a regression test that provokes the code path that produces the segfault without the print logic.

@roberthdevries roberthdevries marked this pull request as draft May 23, 2022
@roberthdevries roberthdevries force-pushed the 46848-fix-segfault-when-printing-dataframe branch from eb6301c to 9981fb1 Compare May 23, 2022
@rhshadrach
Copy link
Member

@rhshadrach rhshadrach commented May 23, 2022

Does the segfault arise in one of the lines of the diff of this PR? Or is it only when this result is used later on?

@simonjayhawkins simonjayhawkins added this to the 1.4.3 milestone May 26, 2022
@simonjayhawkins simonjayhawkins added Regression Output-Formatting labels May 26, 2022
@@ -183,8 +183,15 @@ def take_2d_axis1_{{name}}_{{dest}}(ndarray[{{c_type_in}}, ndim=2] values,
{{if c_type_in == "uint8_t" and c_type_out == "object"}}
out[i, j] = True if values[i, idx] > 0 else False
{{else}}
{{if c_type_in == "object"}} # GH46848
if values[i][idx] is None:
out[i, j] = None
Copy link
Member

@jbrockmendel jbrockmendel May 26, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

any idea why this makes a difference? this looks deeply weird

@roberthdevries
Copy link
Contributor Author

@roberthdevries roberthdevries commented May 26, 2022

The segfault is caused on the original line 186.
The code generated by cython tries to increment the refcount of the object on the rhs of the assignment expression. However that is a NULL pointer.
Question is where that NULL value comes from.
The extra check for None prevents this call to increase the refcount.

@jreback
Copy link
Contributor

@jreback jreback commented May 27, 2022

can you add a test which replicates

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Output-Formatting Regression
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants