Fix segfault with printing dataframe #47097

roberthdevries · 2022-05-23T17:34:21Z

GH 46848

closes #46848
Tests added and passed if fixing a bug or adding a new feature
All code checks passed.
Added an entry in the latest doc/source/whatsnew/vX.X.X.rst file if fixing a bug or adding a new feature.

roberthdevries · 2022-05-23T17:35:03Z

I have a fix that prevents the segfault, but I am not sure how to write a regression test, because the failure depends on the width of the terminal. A test that depends on terminal width is not workable, and I have not found another way to trigger the issue.
As far as I can determine, the crash only happens when the screen is too narrow to print the whole dataframe, so that dots are inserted to mark the omitted values.
Apparently some logic detects this situation and takes a different code path. That path injects the dots, but it also reaches the code that produces the segfault.
Ideally, the regression test would exercise the segfaulting code path directly, without going through the print logic.
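One possible way to take the terminal out of the picture is to force the truncated repr through display options. The sketch below is only an illustration: the frame contents and column names are made up, and it has not been confirmed that this reaches the same code path as the crash in #46848.

```python
import pandas as pd

# Hypothetical frame: 3 rows, 20 object-dtype columns, with None in the first row.
df = pd.DataFrame(
    {f"col{i}": [None, "a", "b"] for i in range(20)},
    dtype=object,
)

# A non-zero display.max_columns makes the repr truncate at a fixed column
# count instead of measuring the terminal, so the "..." column appears
# regardless of the real screen width.
with pd.option_context("display.max_columns", 4, "display.width", 80):
    text = repr(df)

print(text)
```

If the truncated repr does hit the faulty path, a test built this way would not depend on the size of the terminal it runs in.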

GH 46848

rhshadrach · 2022-05-23T21:00:43Z

Does the segfault arise in one of the lines of the diff of this PR? Or is it only when this result is used later on?

jbrockmendel · 2022-05-26T14:54:20Z

pandas/_libs/algos_take_helper.pxi.in

@@ -183,8 +183,15 @@ def take_2d_axis1_{{name}}_{{dest}}(ndarray[{{c_type_in}}, ndim=2] values,
                {{if c_type_in == "uint8_t" and c_type_out == "object"}}
                out[i, j] = True if values[i, idx] > 0 else False
                {{else}}
+                {{if c_type_in == "object"}} # GH46848
+                if values[i][idx] is None:
+                    out[i, j] = None


Any idea why this makes a difference? This looks deeply weird.

roberthdevries · 2022-05-26T18:55:09Z

The segfault occurs on the original line 186.
The code generated by Cython tries to increment the refcount of the object on the right-hand side of the assignment, but that object is a NULL pointer.
The open question is where that NULL value comes from.
The extra check for None avoids the call that increments the refcount.
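To show where the guard sits in the loop, here is a rough pure-Python model of a take along axis 1. It is not the Cython template: the name `take_2d_axis1_model` is hypothetical, and the convention that an index of -1 selects the fill value is an assumption made for this sketch. Pure Python has no NULL pointers, so the `is None` branch changes nothing here; it only marks the spot where the patch skips the plain assignment.

```python
def take_2d_axis1_model(values, indexer, fill_value=None):
    """Pure-Python model of taking columns from a 2D list of rows."""
    out = []
    for row in values:
        out_row = []
        for idx in indexer:
            if idx == -1:
                # Assumed convention: -1 means "no source column, use the fill value".
                out_row.append(fill_value)
            elif row[idx] is None:
                # Position of the guard added in the diff: store None explicitly
                # instead of copying the element through the generic assignment.
                out_row.append(None)
            else:
                out_row.append(row[idx])
        out.append(out_row)
    return out


result = take_2d_axis1_model([[1, None, 3], [4, 5, 6]], [2, -1, 1])
print(result)
```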

jreback · 2022-05-27T12:55:39Z

Can you add a test that replicates the segfault?

roberthdevries marked this pull request as draft May 23, 2022

Fix segfault with printing dataframe

9981fb1

GH 46848

roberthdevries force-pushed the 46848-fix-segfault-when-printing-dataframe branch from eb6301c to 9981fb1 Compare May 23, 2022

simonjayhawkins added this to the 1.4.3 milestone May 26, 2022

simonjayhawkins added Regression Output-Formatting labels May 26, 2022

jbrockmendel reviewed May 26, 2022

jbrockmendel mentioned this pull request May 26, 2022

BUG: Segfault when printing dataframe #46848

Open

3 tasks

pandas-dev / pandas Public
