CI: py 3.10 build failing #41935

Open
jbrockmendel opened this issue Jun 10, 2021 · 19 comments
Labels
Blocked Build Python 3.10 Upstream issue

Comments


@jbrockmendel jbrockmendel commented Jun 10, 2021

@seberg this build is using numpy 1.22dev; it looks like a bunch of the failures are raised in np.iinfo(np.int64).max

    return np.iinfo(np.int64).max
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

self = <[AttributeError("'iinfo' object has no attribute 'kind'") raised in repr()] iinfo object at 0x7f986b9aa830>
int_type = <class 'numpy.int64'>

    def __init__(self, int_type):
        try:
            self.dtype = numeric.dtype(int_type)
        except TypeError:
>           self.dtype = numeric.dtype(type(int_type))
E           TypeError: 'numpy.dtype[bool_]' object is not callable

/opt/hostedtoolcache/Python/3.10.0-beta.2/x64/lib/python3.10/site-packages/numpy/core/getlimits.py:518: TypeError
@jbrockmendel jbrockmendel added Bug Needs Triage labels Jun 10, 2021

@seberg seberg commented Jun 11, 2021

Confusing; nothing quite adds up. The first branch should not fail. Then the second branch fails, even though it operates on a DType class (before the dtype object is created?), and the error mentions a boolean DType...

The only way I can make sense of the error (and reproduce it) is by "monkeypatching" numeric.dtype first:

np.core.numeric.dtype = np.dtype("?")

Is there anything else to this? I just tried on a fresh compile of the 3.10 python branch (3.10b2+) and the numpy master branch and np.iinfo(np.int64).max runs fine.
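
The effect of the monkeypatch above can be illustrated without numpy. The following is only a minimal sketch: `fake_numeric`, `FakeDType`, and `make_info` are hypothetical stand-ins (not numpy APIs) that mimic the try/except shape seen in the traceback. Once the module global is rebound from the class to an instance, both branches fail with a "not callable" TypeError.

```python
import types

# Hypothetical stand-in for numpy.core.numeric; this is NOT the real module.
fake_numeric = types.ModuleType("fake_numeric")


class FakeDType:
    """Minimal stand-in for np.dtype: calling the class builds an instance."""

    def __init__(self, spec):
        self.spec = spec


fake_numeric.dtype = FakeDType


def make_info(int_type):
    # Mirrors the try/except structure shown in the traceback above.
    try:
        return fake_numeric.dtype(int_type)
    except TypeError:
        return fake_numeric.dtype(type(int_type))


# Works while the module global still points at the class.
ok = make_info("int64")

# Rebind the global to an *instance*, as the monkeypatch above does.
fake_numeric.dtype = FakeDType("bool")

try:
    make_info("int64")
    error_message = None
except TypeError as exc:
    # Both branches call a non-callable instance, so the second one propagates.
    error_message = str(exc)

print(error_message)
```

The point of the sketch is only that the reported error is consistent with the global having been rebound; it says nothing about what rebinds it.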

@lithomas1 lithomas1 added CI and removed Needs Triage labels Jun 11, 2021

@seberg seberg commented Jun 11, 2021

Not sure how to get a py3.10 running with pandas (I got numpy). If you got that running, I would suggest running pytest with --pdb and looking into what np.core.numeric.dtype (or even np.core.getlimits.numeric.dtype) actually is (it should just point to np.dtype, which itself is just np.core._multiarray_umath.dtype)

Something is weird, but if np.dtype itself was the culprit, I would expect things to fail much much worse.
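
As an alternative to inspecting the value by hand in pdb, a small guard could flag the moment a module global is rebound. This is a sketch under assumptions: `assert_global_unchanged` is a hypothetical helper, and `demo_module` stands in for `np.core.numeric` so the example runs without numpy.

```python
import contextlib
import types


@contextlib.contextmanager
def assert_global_unchanged(module, name):
    """Raise AssertionError if ``module.<name>`` is rebound inside the block."""
    before = getattr(module, name)
    yield
    after = getattr(module, name)
    if after is not before:
        raise AssertionError(
            f"{module.__name__}.{name} changed from {before!r} to {after!r}"
        )


# Hypothetical stand-in for np.core.numeric.
demo_module = types.ModuleType("demo_module")
demo_module.dtype = int

# Well-behaved code leaves the global alone, so nothing is raised.
with assert_global_unchanged(demo_module, "dtype"):
    pass

# Simulate the leak: the guard reports it when the block exits.
try:
    with assert_global_unchanged(demo_module, "dtype"):
        demo_module.dtype = None
    leak_detected = False
except AssertionError:
    leak_detected = True
```

With numpy installed, one might wrap a suspect test body in `with assert_global_unchanged(np.core.numeric, "dtype"):` to narrow down which test leaks.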


@lithomas1 lithomas1 commented Oct 7, 2021

@jbrockmendel I think I've figured it out. So it turns out that sys.setprofile, which is called in our tests for read_csv, is somehow changing the value of np.core.numeric.dtype. In #43910, where I skip this test, the Python 3.10 tests all pass.

One explanation might be that we are not resetting sys.setprofile back correctly, but the sys.setprofile(None) call should be the correct way to reset it back.
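
To rule out the reset itself as the cause, the profiler could be installed through a context manager that restores whatever was set before, even if the body raises. This is only a sketch: `temporary_profile` and `noop_profiler` are hypothetical names, while `sys.getprofile` and `sys.setprofile` are the standard-library calls.

```python
import contextlib
import sys


@contextlib.contextmanager
def temporary_profile(func):
    """Install ``func`` as the profile function, then restore the previous one."""
    previous = sys.getprofile()
    sys.setprofile(func)
    try:
        yield
    finally:
        # Restore the earlier profiler (often None) even if the body raised.
        sys.setprofile(previous)


def noop_profiler(frame, event, arg):
    return None


before = sys.getprofile()
with temporary_profile(noop_profiler):
    during = sys.getprofile()
after = sys.getprofile()
```

This only addresses the "not resetting correctly" explanation; it does not change what happens while the profiler is active.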

I will continue looking into this.

cc @mzeitlin11


@mzeitlin11 mzeitlin11 commented Oct 7, 2021

Thanks for looking into this @lithomas1. The only other possibility I can think of, beyond what you mention, is that there's an actual bug here in locals() being manipulated. Maybe the solution in #41105 should be more aggressive (like a deeper copy, or just not using locals() at all).
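
The two options mentioned here can be sketched generically. This does not reproduce what #41105 actually changed; the function names and parameters below are hypothetical and only illustrate "copy the locals() snapshot right away" versus "build the keyword dict explicitly".

```python
def read_with_locals_copy(path, sep=",", header="infer"):
    # Copy the mapping immediately so later changes to the frame's locals
    # cannot affect the collected keyword arguments.
    kwds = dict(locals())
    del kwds["path"]
    return kwds


def read_with_explicit_kwargs(path, sep=",", header="infer"):
    # Avoid locals() entirely: name every forwarded argument by hand.
    kwds = {"sep": sep, "header": header}
    return kwds


copied = read_with_locals_copy("data.csv")
explicit = read_with_explicit_kwargs("data.csv")
```

The explicit form is more verbose when a function has many parameters, but it does not depend on how the interpreter or a tracer treats the frame's locals.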


@jbrockmendel jbrockmendel commented Oct 8, 2021

@seberg is there any plausible way sys.setprofile would affect np.iinfo(np.int64).max?


@seberg seberg commented Oct 8, 2021

I certainly don't see it. Unless there is something else that modifies numpy global state for any reason? Would be interesting to know, but I don't have an idea for a lead.

@lithomas1 lithomas1 removed this from the 1.3.4 milestone Oct 13, 2021
@lithomas1 lithomas1 added this to the Contributions Welcome milestone Oct 13, 2021
@lithomas1 lithomas1 removed their assignment Nov 1, 2021

@jbrockmendel jbrockmendel commented Dec 12, 2021

@lithomas1 did you make any progress figuring this out? Looks like some of the strict=False xfails have been removed.

@jbrockmendel jbrockmendel added the Closing Candidate label Dec 12, 2021
@lithomas1 lithomas1 removed the Closing Candidate label Dec 12, 2021

@lithomas1 lithomas1 commented Dec 12, 2021

Test is still skipped. Didn't have time to look into it more, but we should probably fix for Python 3.11 at least.

@pytest.mark.skipif(
    PY310,
    reason="GH41935 This test is leaking only on Python 3.10,"
    "causing other tests to fail with a cryptic error.",
)
@pytest.mark.parametrize("read_func", ["read_csv", "read_table"])
def test_read_csv_and_table_sys_setprofile(all_parsers, read_func):
    # GH#41069
    parser = all_parsers
    data = "a b\n0 1"
    sys.setprofile(lambda *a, **k: None)
    result = getattr(parser, read_func)(StringIO(data))
    sys.setprofile(None)
    expected = DataFrame({"a b": ["0 1"]})
    tm.assert_frame_equal(result, expected)


@skuam skuam commented Dec 17, 2021

For me, it fails in this line of code:

import pandas as pd

df = pd.DataFrame([1, 2, 3, 4, 5])
df.plot()


@tadeu tadeu commented Jan 20, 2022

I've run into this problem while trying to debug (in PyCharm) some code that uses pandas 1.3.5, and I was able to create a minimal reproducible example:

import sys
import numpy as np
import pandas as pd

from numpy.core import numeric

def trace(frame, event, arg):
    return trace


sys.settrace(trace)  # This call isn't necessary when debugging.

arrays = [np.array([1, 2]), np.array([3, 4])]
index = pd.MultiIndex.from_arrays(arrays, names=["iA", "iB"])

dtype_class = numeric.dtype
print(f"Before DataFrame:\n  {numeric.dtype=}\n  {type(numeric.dtype)=}")

a = pd.DataFrame(
    data={"C1": np.array([10.0, 20.0]), "C2": np.array([30.0, 40.0])},
    index=index,
)

# This import fails:
# import scipy.linalg.lapack

# But this check is simpler:
print(f"After DataFrame:\n  {numeric.dtype=}\n  {type(numeric.dtype)=}")
assert numeric.dtype is dtype_class

Note that pandas is changing the value of numpy.core.numeric.dtype, which originally is a class:

Before DataFrame:
  numeric.dtype=<class 'numpy.dtype'>
  type(numeric.dtype)=<class 'numpy._DTypeMeta'>
After DataFrame:
  numeric.dtype=dtype('bool')
  type(numeric.dtype)=<class 'numpy.dtype[bool_]'>

If we comment out sys.settrace(trace) and debug the code, the output is a little bit different:

Before DataFrame:
  numeric.dtype=<class 'numpy.dtype'>
  type(numeric.dtype)=<class 'numpy._DTypeMeta'>
After DataFrame:
  numeric.dtype=None
  type(numeric.dtype)=<class 'NoneType'>

If we uncomment # import scipy.linalg.lapack, the output is a little bit more complex (this is the first error I got; it is similar to the error in the original report above, and it was also reported in this question on Stack Overflow):

Traceback (most recent call last):
  File "C:\dev\bug\.venv\lib\site-packages\numpy\core\getlimits.py", line 649, in __init__
    self.dtype = numeric.dtype(int_type)
TypeError: 'NoneType' object is not callable

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<frozen importlib._bootstrap>", line 1027, in _find_and_load
  File "<frozen importlib._bootstrap>", line 992, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
  File "C:\dev\bug\.venv\lib\site-packages\scipy\linalg\__init__.py", line 195, in <module>
    from .misc import *
  File "C:\dev\bug\.venv\lib\site-packages\scipy\linalg\misc.py", line 4, in <module>
    from .lapack import get_lapack_funcs
  File "C:\dev\bug\.venv\lib\site-packages\scipy\linalg\lapack.py", line 990, in <module>
    _int32_max = _np.iinfo(_np.int32).max
  File "C:\dev\bug\.venv\lib\site-packages\numpy\core\getlimits.py", line 651, in __init__
    self.dtype = numeric.dtype(type(int_type))
TypeError: 'NoneType' object is not callable

Using a custom trace function, I've pinpointed that the global dtype is changed right after this call: https://github.com/pandas-dev/pandas/blob/v1.3.5/pandas/core/indexes/base.py#L6411

This call goes into Cython code and, looking at it, I've found this suspicious assignment that may be the cause (but I'm not sure, as there's also the weird issue that this only happens when there's a tracer or a profiler): https://github.com/pandas-dev/pandas/blob/v1.3.5/pandas/_libs/lib.pyx#L2655 (also another one here: https://github.com/pandas-dev/pandas/blob/v1.3.5/pandas/_libs/lib.pyx#L1429).

(Also, I'm not sure why an assignment like that would change a global in a numpy module; a bug in Cython, perhaps?)
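
The custom trace function itself is not shown in this comment. One way such a tracer could look is sketched below; it is an assumption, not the code that was actually used. `find_first_change` is a hypothetical helper, and `watched` stands in for `numpy.core.numeric` so the example runs without numpy or pandas.

```python
import sys
import types


def find_first_change(func, module, name):
    """Run ``func`` under a tracer and report where ``module.<name>`` first differs.

    Returns (function name, line number, event) for the first trace event at
    which the change is visible, or None if the global never changes.
    """
    original = getattr(module, name)
    seen = []

    def tracer(frame, event, arg):
        if not seen and getattr(module, name) is not original:
            seen.append((frame.f_code.co_name, frame.f_lineno, event))
        return tracer  # keep receiving line events for this frame

    previous = sys.gettrace()
    sys.settrace(tracer)
    try:
        func()
    finally:
        sys.settrace(previous)
    return seen[0] if seen else None


# Hypothetical stand-in for numpy.core.numeric.
watched = types.ModuleType("watched")
watched.dtype = int


def culprit():
    watched.dtype = None  # the rebinding we want to locate
    return "done"


first_seen = find_first_change(culprit, watched, "dtype")
print(first_seen)
```

The tracer can only report the first event *after* the change, so the culprit is the code that ran just before the reported location. Calls implemented in C produce no Python frames and are invisible to it.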


@seberg seberg commented Jan 21, 2022

> Using a custom trace function, I've pinpointed that the global dtype is changed right after this call: https://github.com/pandas-dev/pandas/blob/v1.3.5/pandas/core/indexes/base.py#L6411

Great debugging there! Still utterly puzzling :). Just to note, I can reproduce the example in python3.10.1, but not python3.9.0 (maybe we knew that long ago). Further, it does not matter whether I run a Python compiled for debugging, and valgrind does not find anything (not that I would have expected it to).

So we know that this is sensitive to python3.10 and has to do with tracing being active? We also know it is probably related to Cython. And I feel I have heard about tricky changes in Python 3.10 that affected cython? It feels like it is probably time to open either a python or cython issue about this?


@seberg seberg commented Jan 21, 2022

Aha, I think I have a lead... To cut it down more, this line is sufficient to trigger the issue:

pd._libs.lib.maybe_convert_objects(np.array([None], dtype=object))

And that should end up side-stepping almost all code in maybe_convert_objects.

Not feeling like getting pandas dev setup running right now, but there is one line here:

mask = np.full(n, False)

which is called from cython using:

  /* "pandas/_libs/lib.pyx":2441
 *     uints = np.empty(n, dtype='u8')
 *     bools = np.empty(n, dtype=np.uint8)
 *     mask = np.full(n, False)             # <<<<<<<<<<<<<<
 * 
 *     if convert_datetime:
 */
  __Pyx_GetModuleGlobalName(__pyx_t_6, __pyx_n_s_np); if (unlikely(!__pyx_t_6)) __PYX_ERR(0, 2441, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_t_6);
  __pyx_t_5 = __Pyx_PyObject_GetAttrStr(__pyx_t_6, __pyx_n_s_full); if (unlikely(!__pyx_t_5)) __PYX_ERR(0, 2441, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_t_5);
  __Pyx_DECREF(__pyx_t_6); __pyx_t_6 = 0;
  __pyx_t_6 = PyInt_FromSsize_t(__pyx_v_n); if (unlikely(!__pyx_t_6)) __PYX_ERR(0, 2441, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_t_6);
  __pyx_t_2 = NULL;
  __pyx_t_8 = 0;
  if (CYTHON_UNPACK_METHODS && unlikely(PyMethod_Check(__pyx_t_5))) {
    __pyx_t_2 = PyMethod_GET_SELF(__pyx_t_5);
    if (likely(__pyx_t_2)) {
      PyObject* function = PyMethod_GET_FUNCTION(__pyx_t_5);
      __Pyx_INCREF(__pyx_t_2);
      __Pyx_INCREF(function);
      __Pyx_DECREF_SET(__pyx_t_5, function);
      __pyx_t_8 = 1;
    }
  }
  #if CYTHON_FAST_PYCALL
  if (PyFunction_Check(__pyx_t_5)) {
    PyObject *__pyx_temp[3] = {__pyx_t_2, __pyx_t_6, Py_False};
    __pyx_t_15 = __Pyx_PyFunction_FastCall(__pyx_t_5, __pyx_temp+1-__pyx_t_8, 2+__pyx_t_8); if (unlikely(!__pyx_t_15)) __PYX_ERR(0, 2441, __pyx_L1_error)
    __Pyx_XDECREF(__pyx_t_2); __pyx_t_2 = 0;
    __Pyx_GOTREF(__pyx_t_15);
    __Pyx_DECREF(__pyx_t_6); __pyx_t_6 = 0;
  } else
  #endif
  #if CYTHON_FAST_PYCCALL
  if (__Pyx_PyFastCFunction_Check(__pyx_t_5)) {
    PyObject *__pyx_temp[3] = {__pyx_t_2, __pyx_t_6, Py_False};
    __pyx_t_15 = __Pyx_PyCFunction_FastCall(__pyx_t_5, __pyx_temp+1-__pyx_t_8, 2+__pyx_t_8); if (unlikely(!__pyx_t_15)) __PYX_ERR(0, 2441, __pyx_L1_error)
    __Pyx_XDECREF(__pyx_t_2); __pyx_t_2 = 0;
    __Pyx_GOTREF(__pyx_t_15);
    __Pyx_DECREF(__pyx_t_6); __pyx_t_6 = 0;
  } else
  #endif
  {
    __pyx_t_1 = PyTuple_New(2+__pyx_t_8); if (unlikely(!__pyx_t_1)) __PYX_ERR(0, 2441, __pyx_L1_error)
    __Pyx_GOTREF(__pyx_t_1);
    if (__pyx_t_2) {
      __Pyx_GIVEREF(__pyx_t_2); PyTuple_SET_ITEM(__pyx_t_1, 0, __pyx_t_2); __pyx_t_2 = NULL;
    }
    __Pyx_GIVEREF(__pyx_t_6);
    PyTuple_SET_ITEM(__pyx_t_1, 0+__pyx_t_8, __pyx_t_6);
    __Pyx_INCREF(Py_False);
    __Pyx_GIVEREF(Py_False);
    PyTuple_SET_ITEM(__pyx_t_1, 1+__pyx_t_8, Py_False);
    __pyx_t_6 = 0;
    __pyx_t_15 = __Pyx_PyObject_Call(__pyx_t_5, __pyx_t_1, NULL); if (unlikely(!__pyx_t_15)) __PYX_ERR(0, 2441, __pyx_L1_error)
    __Pyx_GOTREF(__pyx_t_15);
    __Pyx_DECREF(__pyx_t_1); __pyx_t_1 = 0;
  }
  __Pyx_DECREF(__pyx_t_5); __pyx_t_5 = 0;
  __pyx_v_mask = __pyx_t_15;
  __pyx_t_15 = 0;

Now, that may be nothing, but np.full lives in the numeric module. And it does use a dtype which is the boolean dtype we end up with here. Obviously, that also should not mess with the module scope, but at least the np.core.numeric module gets involved there.

EDIT: Continuing down the rabbit hole a bit. In fact the value is mutated by the time the trace function says that np.full is called (or by the time tracing reports it). No call before it seems to happen at all (np.empty, etc. are all C implemented though, so maybe that is why).

EDIT2: I opened a Python issue here: https://bugs.python.org/issue46451


@jbrockmendel jbrockmendel commented Jan 30, 2022

Could the ctypedef class numpy.dtype [object PyArray_Descr] near the top of lib.pyx be a suspect? Seems like if a numpy dtype global is being affected, this would be worth a look. (It may no longer be necessary with newer numpy/cython)


@seberg seberg commented Jan 30, 2022

I honestly suspect now that it is a Python bug. The global gets changed during the call to np.full (as well as upon entering that call). My suspicion is that Python optimized something around the frame object, and there is a bug together with tracing when you call into Python directly from C (as Cython does).
(The modification of the module global coincides with the change of the function local with the same name dtype. But maybe I need to figure out the right words for the BPO report to catch attention of the right people ;))
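
For contrast, this is the behavior one would normally expect when a function local shares its name with a module global. The sketch is illustrative only: `full_sketch` is a hypothetical function that loosely imitates a signature with a `dtype` parameter, and it is not numpy's implementation.

```python
# Module-level name, playing the role of the global ``dtype`` in the numpy module.
dtype = "module-level dtype"


def full_sketch(fill_value, dtype=None):
    # Assigning to the parameter rebinds the *local* name only.
    if dtype is None:
        dtype = type(fill_value).__name__
    return dtype


local_result = full_sketch(False)
global_after_call = dtype
```

In correct operation the module-level `dtype` is untouched after the call. The symptom described in this thread is that, with tracing active, the local's new value also showed up in the module global.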


@WarrenWeckesser WarrenWeckesser commented Jan 31, 2022


@seberg seberg commented Feb 1, 2022

Mark Shannon asked for a repro, and I had another look and it seems like Cython generates somewhat complicated stuff (PyEval...). So I moved it to cython/cython#4609. On the plus side, there is really nothing fancy about it and you can trivially reproduce this without pandas/numpy and just Cython. (I still have no idea if it is Cython or Python going wrong.)


@seberg seberg commented Feb 1, 2022

It looks a bit like I failed to regenerate with newer Cython. I think (I did not double check) that the Cython pre-release fixes the issue.


@arian-f arian-f commented May 3, 2022

I keep running into this or a very similar issue outside of CI: numpy.core.numeric.dtype gets changed from numpy.dtype to something else.

Building packages with a newer Cython seems to fix this issue, right? I'd like to check whether it also fixes my issue. If it doesn't, I guess I have another issue. Are there builds around with the newer Cython?

I'm observing this with Python 3.10 on Windows 10, with numpy 1.22.3 and pandas 1.4.2, which seem to be up to date.


@seberg seberg commented May 3, 2022

> building packages with newer cython seems to fix this issue, right? I'd like to check whether it also fixes my issue. If it doesn't I guess I have another issue. Are there builds around with the newer cython?

The important part is whether tracing is enabled (i.e. typically a debugger or profiler is being used). In that case you will run into this issue. Check also cython/cython#4609

Basically, your options are to upgrade Cython (to the non-released version as of now), to use the Cython 3 alpha, or to use the correct compile time option to disable the faulty paths.
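
To check whether tracing is enabled in a given process, the standard-library hooks can be queried directly. This is a minimal sketch: `tracing_or_profiling_active` is a hypothetical helper name. It only sees hooks installed through `sys.settrace` and `sys.setprofile`, so tools that use other mechanisms (for example `sys.monitoring` on Python 3.12+) would not be detected.

```python
import sys


def tracing_or_profiling_active():
    """Return True if a trace or profile function is installed in this thread."""
    return sys.gettrace() is not None or sys.getprofile() is not None


print(tracing_or_profiling_active())
```

Running this under a debugger or profiler versus a plain interpreter gives a quick indication of whether the affected code paths are likely to be hit.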

@lithomas1 lithomas1 removed this from the Contributions Welcome milestone May 6, 2022
@lithomas1 lithomas1 added this to the 1.4.3 milestone May 6, 2022
@lithomas1 lithomas1 added the Blocked label May 6, 2022
@lithomas1 lithomas1 removed this from the 1.4.3 milestone May 6, 2022

9 participants