pandas-dev / pandas Public
CI: py 3.10 build failing #41935
Comments
Confusing, nothing feels like it quite adds up. The first branch should not fail, then the second branch fails but works with a DType (before the dtype object is created?). And then it mentions a boolean DType... The only way I can make sense of the error (and reproduce it) is by "monkeypatching" np.core.numeric.dtype = np.dtype("?"). Is there anything else to this? I just tried on a fresh compile of the 3.10 python branch ( |
Not sure how to get a py3.10 running with pandas (I got numpy). If you got that running, I would suggest running pytest with Something is weird, but if |
@jbrockmendel I think I've figured it out. So it turns out that sys.setprofile, which is called in our tests for read_csv, is somehow changing the value of np.core.numeric.dtype. In #43910, where I skip this test, the Python 3.10 tests all pass. One explanation might be that we are not resetting sys.setprofile back correctly, but the sys.setprofile(None) call should be the correct way to reset it back. I will continue looking into this. cc @mzeitlin11 |
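For reference, a minimal sketch of installing a profile function and then restoring whatever was active before, instead of unconditionally calling sys.setprofile(None). The helper name `temporary_profile` and the `record_calls` profiler are hypothetical and are not taken from the pandas test suite; only `sys.getprofile` and `sys.setprofile` from the standard library are used.

```python
import sys
from contextlib import contextmanager


@contextmanager
def temporary_profile(func):
    """Install `func` as the profile function, then restore the previous one."""
    previous = sys.getprofile()  # None when no profiler is active
    sys.setprofile(func)
    try:
        yield
    finally:
        sys.setprofile(previous)


events = []


def record_calls(frame, event, arg):
    # Hypothetical profiler: only remembers which event kinds were seen.
    events.append(event)


def work():
    return sum(range(10))


before = sys.getprofile()
with temporary_profile(record_calls):
    result = work()
after = sys.getprofile()
```

Restoring the saved value rather than None keeps an outer profiler (if any) intact, which makes it easier to rule out the reset itself as the source of leaked state.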
Thanks for looking into this @lithomas1. Only other possibility I can think of beyond what you mention is there's an actual bug here in |
@seberg any plausible way sys.setprofile would affect |
I certainly don't see it. Unless there is something else that modifies numpy global state for any reason? Would be interesting to know, but I don't have an idea for a lead. |
@lithomas1 did you make any progress figuring this out? Looks like some of the |
Test is still skipped. I didn't have time to look into it more, but we should probably fix this for Python 3.11 at least.
pandas/pandas/tests/io/parser/common/test_common_basic.py Lines 679 to 695 in 193ca73
|
For me, it fails in this line of code:
|
I've run into this problem while trying to debug (in PyCharm) some code that uses pandas 1.3.5, and I was able to create a minimal reproducible example:
import sys
import numpy as np
import pandas as pd
from numpy.core import numeric
def trace(frame, event, arg):
    return trace
sys.settrace(trace) # This call isn't necessary when debugging.
arrays = [np.array([1, 2]), np.array([3, 4])]
index = pd.MultiIndex.from_arrays(arrays, names=["iA", "iB"])
dtype_class = numeric.dtype
print(f"Before DataFrame:\n {numeric.dtype=}\n {type(numeric.dtype)=}")
a = pd.DataFrame(
    data={"C1": np.array([10.0, 20.0]), "C2": np.array([30.0, 40.0])},
    index=index,
)
# This import fails:
# import scipy.linalg.lapack
# But this check is simpler:
print(f"After DataFrame:\n {numeric.dtype=}\n {type(numeric.dtype)=}")
assert numeric.dtype is dtype_class
Note that pandas is changing the value of
If we comment out
If we uncomment
Traceback (most recent call last):
  File "C:\dev\bug\.venv\lib\site-packages\numpy\core\getlimits.py", line 649, in __init__
    self.dtype = numeric.dtype(int_type)
TypeError: 'NoneType' object is not callable
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "<frozen importlib._bootstrap>", line 1027, in _find_and_load
  File "<frozen importlib._bootstrap>", line 992, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
  File "C:\dev\bug\.venv\lib\site-packages\scipy\linalg\__init__.py", line 195, in <module>
    from .misc import *
  File "C:\dev\bug\.venv\lib\site-packages\scipy\linalg\misc.py", line 4, in <module>
    from .lapack import get_lapack_funcs
  File "C:\dev\bug\.venv\lib\site-packages\scipy\linalg\lapack.py", line 990, in <module>
    _int32_max = _np.iinfo(_np.int32).max
  File "C:\dev\bug\.venv\lib\site-packages\numpy\core\getlimits.py", line 651, in __init__
    self.dtype = numeric.dtype(type(int_type))
TypeError: 'NoneType' object is not callable
Using a custom trace function, I've pinpointed that the global This call goes into Cython code and, looking at it, I've found this suspicious assignment that may be the cause (but I'm not sure, as there's also the weird issue that this only happens when there's a tracer or a profiler): https://github.com/pandas-dev/pandas/blob/v1.3.5/pandas/_libs/lib.pyx#L2655 (also another one here: https://github.com/pandas-dev/pandas/blob/v1.3.5/pandas/_libs/lib.pyx#L1429). (Also, not sure why an assignment like that would change a global in a |
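To illustrate the "custom trace function" approach mentioned above, here is a minimal sketch of a watcher that reports the first trace event at which a watched attribute no longer holds its original object. Everything in it is an assumption made for illustration: `fake_module` stands in for a module such as numpy.core.numeric, and `make_watcher`, `innocent`, `culprit`, and `run` are hypothetical names. It is not the tracer that was actually used in this investigation.

```python
import sys
import types

# Stand-in for a module such as numpy.core.numeric (hypothetical namespace).
fake_module = types.SimpleNamespace(dtype=int)


def make_watcher(namespace, attr, log):
    """Build a trace function that logs where `namespace.attr` first changes."""
    original = getattr(namespace, attr)

    def watcher(frame, event, arg):
        if not log and getattr(namespace, attr) is not original:
            # The change happened just before this event was reported.
            log.append((frame.f_code.co_name, frame.f_lineno, event))
        return watcher  # keep tracing nested frames and lines

    return watcher


def innocent():
    return 1


def culprit():
    fake_module.dtype = None  # simulates the unexpected mutation
    return 2


def run():
    innocent()
    culprit()


changes = []
previous = sys.gettrace()
sys.settrace(make_watcher(fake_module, "dtype", changes))
try:
    run()
finally:
    sys.settrace(previous)

print(changes)
```

A Python-level tracer like this only sees events for Python frames, so when the mutation happens inside compiled code it can narrow things down to the surrounding Python call but not to a line inside the extension.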
Great debugging there! Still utterly puzzling :). Just to note, I can reproduce the example in Python 3.10.1, but not in Python 3.9.0 (maybe we knew that long ago). Further, it does not matter whether I run a Python compiled for debugging. So we know that this is sensitive to Python 3.10 and has to do with tracing being active? We also know it is probably related to Cython. And I feel I have heard about tricky changes in Python 3.10 that affected Cython? It feels like it is probably time to open either a Python or Cython issue about this? |
Aha, I think I have a lead... To cut it down more, this line is sufficient to trigger the issue:
pd._libs.lib.maybe_convert_objects(np.array([None], dtype=object))
And that should end up side-stepping almost all code in
Not feeling like getting pandas dev setup running right now, but there is one line here:
which is called from cython using:
/* "pandas/_libs/lib.pyx":2441
 * uints = np.empty(n, dtype='u8')
 * bools = np.empty(n, dtype=np.uint8)
 * mask = np.full(n, False) # <<<<<<<<<<<<<<
 *
 * if convert_datetime:
 */
  __Pyx_GetModuleGlobalName(__pyx_t_6, __pyx_n_s_np); if (unlikely(!__pyx_t_6)) __PYX_ERR(0, 2441, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_t_6);
  __pyx_t_5 = __Pyx_PyObject_GetAttrStr(__pyx_t_6, __pyx_n_s_full); if (unlikely(!__pyx_t_5)) __PYX_ERR(0, 2441, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_t_5);
  __Pyx_DECREF(__pyx_t_6); __pyx_t_6 = 0;
  __pyx_t_6 = PyInt_FromSsize_t(__pyx_v_n); if (unlikely(!__pyx_t_6)) __PYX_ERR(0, 2441, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_t_6);
  __pyx_t_2 = NULL;
  __pyx_t_8 = 0;
  if (CYTHON_UNPACK_METHODS && unlikely(PyMethod_Check(__pyx_t_5))) {
    __pyx_t_2 = PyMethod_GET_SELF(__pyx_t_5);
    if (likely(__pyx_t_2)) {
      PyObject* function = PyMethod_GET_FUNCTION(__pyx_t_5);
      __Pyx_INCREF(__pyx_t_2);
      __Pyx_INCREF(function);
      __Pyx_DECREF_SET(__pyx_t_5, function);
      __pyx_t_8 = 1;
    }
  }
  #if CYTHON_FAST_PYCALL
  if (PyFunction_Check(__pyx_t_5)) {
    PyObject *__pyx_temp[3] = {__pyx_t_2, __pyx_t_6, Py_False};
    __pyx_t_15 = __Pyx_PyFunction_FastCall(__pyx_t_5, __pyx_temp+1-__pyx_t_8, 2+__pyx_t_8); if (unlikely(!__pyx_t_15)) __PYX_ERR(0, 2441, __pyx_L1_error)
    __Pyx_XDECREF(__pyx_t_2); __pyx_t_2 = 0;
    __Pyx_GOTREF(__pyx_t_15);
    __Pyx_DECREF(__pyx_t_6); __pyx_t_6 = 0;
  } else
  #endif
  #if CYTHON_FAST_PYCCALL
  if (__Pyx_PyFastCFunction_Check(__pyx_t_5)) {
    PyObject *__pyx_temp[3] = {__pyx_t_2, __pyx_t_6, Py_False};
    __pyx_t_15 = __Pyx_PyCFunction_FastCall(__pyx_t_5, __pyx_temp+1-__pyx_t_8, 2+__pyx_t_8); if (unlikely(!__pyx_t_15)) __PYX_ERR(0, 2441, __pyx_L1_error)
    __Pyx_XDECREF(__pyx_t_2); __pyx_t_2 = 0;
    __Pyx_GOTREF(__pyx_t_15);
    __Pyx_DECREF(__pyx_t_6); __pyx_t_6 = 0;
  } else
  #endif
  {
    __pyx_t_1 = PyTuple_New(2+__pyx_t_8); if (unlikely(!__pyx_t_1)) __PYX_ERR(0, 2441, __pyx_L1_error)
    __Pyx_GOTREF(__pyx_t_1);
    if (__pyx_t_2) {
      __Pyx_GIVEREF(__pyx_t_2); PyTuple_SET_ITEM(__pyx_t_1, 0, __pyx_t_2); __pyx_t_2 = NULL;
    }
    __Pyx_GIVEREF(__pyx_t_6);
    PyTuple_SET_ITEM(__pyx_t_1, 0+__pyx_t_8, __pyx_t_6);
    __Pyx_INCREF(Py_False);
    __Pyx_GIVEREF(Py_False);
    PyTuple_SET_ITEM(__pyx_t_1, 1+__pyx_t_8, Py_False);
    __pyx_t_6 = 0;
    __pyx_t_15 = __Pyx_PyObject_Call(__pyx_t_5, __pyx_t_1, NULL); if (unlikely(!__pyx_t_15)) __PYX_ERR(0, 2441, __pyx_L1_error)
    __Pyx_GOTREF(__pyx_t_15);
    __Pyx_DECREF(__pyx_t_1); __pyx_t_1 = 0;
  }
  __Pyx_DECREF(__pyx_t_5); __pyx_t_5 = 0;
  __pyx_v_mask = __pyx_t_15;
  __pyx_t_15 = 0;
Now, that may be nothing, but
EDIT: Continuing down the rabbit hole a bit. In fact the value is mutated by the time the trace function says that
EDIT2: I opened a Python issue here: https://bugs.python.org/issue46451 |
Could the |
I honestly suspect now that it is a Python bug. The global gets changed during the call to |
FYI: Question on stackoverflow: In debug, using pandas before importing from Scipy generates Type error on import |
Mark Shannon asked for a repro, and I had another look and it seems like Cython generates somewhat complicated stuff ( |
It looks a bit like I failed to regenerate with newer Cython. I think (I did not double check) that the Cython pre-release fixes the issue. |
I keep running into this or a very similar issue outside of CI. Building packages with a newer Cython seems to fix this issue, right? I'd like to check whether it also fixes my issue; if it doesn't, I guess I have another issue. Are there builds around with the newer Cython? I'm observing this with Python 3.10 on Windows 10, |
The important part is whether tracing is enabled (i.e. typically a debugger or profiler is being used). In that case you will run into this issue. Check also cython/cython#4609 Basically, your options are to upgrade Cython (to the non-released version as of now), to use the Cython 3 alpha, or to use the correct compile time option to disable the faulty paths. |
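As a quick way to check the condition described above, here is a minimal sketch that reports whether a trace or profile function is currently installed in the running thread. The helper `tracing_active` and the `noop_profiler` are hypothetical names; the check relies only on `sys.gettrace` and `sys.getprofile`, so a tool that hooks in by other means would not be detected.

```python
import sys


def tracing_active():
    """Return True if a trace or profile function is installed in this thread."""
    return sys.gettrace() is not None or sys.getprofile() is not None


def noop_profiler(frame, event, arg):
    # Hypothetical profiler that ignores every event.
    return None


previous = sys.getprofile()
sys.setprofile(noop_profiler)
try:
    active_with_profiler = tracing_active()
finally:
    sys.setprofile(previous)
```

Calling `tracing_active()` right before the failing import or pandas call should tell you whether you are in the affected situation (debugger or profiler attached) or looking at a different problem.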
@seberg this build is using numpy 1.22dev, looks like a bunch of the failures are raising in
np.iinfo(np.int64).max