[MRG] BUG ensure object array are properly casted when dtype=object #16076

Merged
merged 32 commits into from Jan 15, 2020

Conversation

Contributor

@alexshacked alexshacked commented Jan 9, 2020

closes #16036

Fix a bug where calling np.array(..., dtype=object) on a sequence whose items all have the same length creates an N-D array, while the algorithms expect a 1-D array with objects inside (similar to a list).
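
To make the description concrete, here is a small sketch of the NumPy behavior in question. The variable names are made up for illustration and are not taken from the PR diff.

```python
import numpy as np

# Every neighborhood has the same length, so NumPy infers a second dimension
# and returns a 2-D array of scalars instead of a 1-D array of sequences.
same_length = [[0, 1], [1, 2], [0, 2]]
stacked = np.array(same_length, dtype=object)
print(stacked.shape)  # (3, 2)

# With ragged neighborhoods the same call yields the 1-D array of objects
# that the downstream code expects, which is why the bug is easy to miss.
ragged = [[0, 1], [1], [0, 1, 2]]
nested = np.array(ragged, dtype=object)
print(nested.shape)  # (3,)
```

The result therefore depends on the data: the shape only changes when all items happen to share a length.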

Member

@thomasjpfan thomasjpfan left a comment

Thank you for the PR @alexshacked !

@glemaitre glemaitre changed the title [MRG] Using dbscan with precomputed neighbors gives an error in 0.22.… [MRG] BUG ensure object array are properly casted when dtype=object Jan 9, 2020
Contributor

@glemaitre glemaitre left a comment

Looks good. A couple of changes.

Please add an entry to the change log at doc/whats_new/v0.20.rst under bug fixes. Like the other entries there, please reference this pull request with :issue: and credit yourself (and other contributors if applicable) with :user:

Member

@jnothman jnothman commented Jan 9, 2020

Contributor

@glemaitre glemaitre commented Jan 9, 2020

Oh I see, I was searching for np.array(..., dtype=object).
Then I agree to either keep it where it is now (I would rename it _to_object_array) or move it to utils if we have something similar in other files. I will look into it.

Contributor

@glemaitre glemaitre commented Jan 9, 2020

So we need to call _to_object_array for sklearn.neighbors._base: l.945-948; l.952-953

NB: I searched for the pattern [:] = and kept only the matches that were preceded by the creation of a NumPy object array.
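
For readers following along, a minimal sketch of what such a helper could look like, assuming it simply wraps the "create an object array, then assign with [:] =" pattern described above. The name to_object_array_sketch is hypothetical, and the actual body of _to_object_array is not shown in this thread.

```python
import numpy as np


def to_object_array_sketch(sequence):
    """Hypothetical helper: pack `sequence` into a 1-D object array."""
    # Allocate the 1-D container first so that NumPy cannot infer extra
    # dimensions from the items, even when they all have the same length.
    out = np.empty(len(sequence), dtype=object)
    out[:] = sequence
    return out


neigh_ind = to_object_array_sketch([np.array([0, 1]), np.array([1, 2])])
print(neigh_ind.shape)  # (2,)
```

Each element of neigh_ind stays an independent array, so code that iterates over it like a list keeps working.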

Member

@jnothman jnothman commented Jan 9, 2020

So we need to call _to_object_array for sklearn.neighbors._base: l.945-948; l.952-953

I see similar patterns here:

sklearn/preprocessing/tests/test_label.py=435=def test_multilabel_binarizer_non_integer_labels():
sklearn/preprocessing/tests/test_label.py:436:    tuple_classes = np.empty(3, dtype=object)
sklearn/preprocessing/tests/test_label.py-437-    tuple_classes[:] = [(1,), (2,), (3,)]
--
sklearn/neighbors/_classification.py:541:            pred_labels = np.zeros(len(neigh_ind), dtype=object)
sklearn/neighbors/_classification.py-542-            pred_labels[:] = [_y[ind, k] for ind in neigh_ind]

but otherwise agree it's all in radius_neighbors
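
As a rough illustration of why the test_label.py snippet fills a pre-allocated array instead of calling np.array directly, the sketch below reuses the tuple labels from the grep output; the variable name direct is made up here.

```python
import numpy as np

labels = [(1,), (2,), (3,)]

# Direct conversion unpacks the tuples into a second axis.
direct = np.array(labels, dtype=object)
print(direct.shape)  # (3, 1)

# Pre-allocating and assigning keeps each tuple as a single element.
tuple_classes = np.empty(3, dtype=object)
tuple_classes[:] = labels
print(tuple_classes.shape)  # (3,)
print(tuple_classes[0])  # (1,)
```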

Contributor Author

@alexshacked alexshacked commented Jan 9, 2020

@glemaitre change log is in v0.20.rst? I thought v0.23.rst

Contributor

@glemaitre glemaitre commented Jan 9, 2020

v0.23.rst

Contributor

@glemaitre glemaitre commented Jan 9, 2020

Oops, my automatic answering is broken :)

Contributor Author

@alexshacked alexshacked commented Jan 9, 2020

ok. v0.23 then. Thanks @glemaitre


alexshacked and others added 11 commits Jan 9, 2020
Co-Authored-By: Guillaume Lemaitre <g.lemaitre58@gmail.com>
Contributor

@glemaitre glemaitre left a comment

We will need to apply the _to_object_array function to the following lines:

sklearn/preprocessing/tests/test_label.py=435=def test_multilabel_binarizer_non_integer_labels():
sklearn/preprocessing/tests/test_label.py:436:    tuple_classes = np.empty(3, dtype=object)
sklearn/preprocessing/tests/test_label.py-437-    tuple_classes[:] = [(1,), (2,), (3,)]

I propose to add a docstring (as you did earlier) and move the _to_object_array function to sklearn/utils/__init__.py. Then, we can import it in neighbors and preprocessing.

We just need to add a small test in sklearn/utils/tests/test_utils.py to check the expected behavior:

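import numpy as np
import pytest

from sklearn.utils import _to_object_array


@pytest.mark.parametrize(
    "sequence",
    [[np.array(1), np.array(2)], [[1, 2], [3, 4]]]
)
def test_to_object_array(sequence):
    out = _to_object_array(sequence)
    # The result must be a 1-D object array, whatever the shape of the items.
    assert isinstance(out, np.ndarray)
    assert out.dtype.kind == 'O'
    assert out.ndim == 1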

alexshacked and others added 2 commits Jan 10, 2020
Co-Authored-By: Guillaume Lemaitre <g.lemaitre58@gmail.com>
alexshacked and others added 3 commits Jan 10, 2020
Co-Authored-By: Guillaume Lemaitre <g.lemaitre58@gmail.com>
Contributor Author

@alexshacked alexshacked commented Jan 10, 2020

Hi @glemaitre. Moved function to_object_array() to sklearn.utils and changed the message in the change log of v0.23.rst

Contributor

@glemaitre glemaitre left a comment

Apart from making the function private, LGTM. @alexshacked, you can accept my suggestion and that would be enough.

Contributor Author

@alexshacked alexshacked commented Jan 10, 2020

Sorry about this @glemaitre. I thought a single underscore meant private inside the class, not private inside the package. I will restore the underscore.


alexshacked and others added 11 commits Jan 10, 2020
Co-Authored-By: Guillaume Lemaitre <g.lemaitre58@gmail.com>
@glemaitre glemaitre added this to the 0.22.2 milestone Jan 13, 2020
Contributor

@glemaitre glemaitre commented Jan 13, 2020

LGTM. @jnothman @thomasjpfan Could you have a look? I added the regression tag and tagged it as a candidate for 0.22.2.

TomDLT approved these changes Jan 13, 2020
Member

@TomDLT TomDLT left a comment

LGTM

Contributor Author

@alexshacked alexshacked commented Jan 13, 2020

Thanks for your comments @TomDLT. Will apply them all.


@TomDLT TomDLT merged commit c4ea377 into scikit-learn:master Jan 15, 2020
11 of 17 checks passed
Member

@TomDLT TomDLT commented Jan 15, 2020

Thanks @alexshacked !


ogrisel added a commit that referenced this issue Feb 28, 2020
* FIX ensure object array are properly casted when dtype=object (#16076)

* DOC Docstring example of classifier should import classifier (#16430)

* MNT Update nightly build URL and release staging config (#16435)

* BUG ensure that estimator_name is properly stored in the ROC display (#16500)

* BUG ensure that name is properly stored in the precision/recall display (#16505)

* ENH Perform KNN imputation without O(n^2) memory cost (#16397)

* bump scikit-learn version for binder

* bump version to 0.22.2

* MNT Skips failing SpectralCoclustering doctest (#16232)

* TST Updates test for deprecation in pandas.SparseArray (#16040)

* move 0.22.2 what's new entries (#16586)

* add 0.22.2 in the news of the web site frontpage

* skip test_ard_accuracy_on_easy_problem

Co-authored-by: alexshacked <al.shacked@gmail.com>
Co-authored-by: Oleksandr Pavlyk <oleksandr-pavlyk@users.noreply.github.com>
Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org>
Co-authored-by: Guillaume Lemaitre <g.lemaitre58@gmail.com>
Co-authored-by: Joel Nothman <joel.nothman@gmail.com>
Co-authored-by: Thomas J Fan <thomasjpfan@gmail.com>