[MRG] BUG ensure object array are properly casted when dtype=object #16076

Merged
merged 32 commits into from Jan 15, 2020

Conversation

@alexshacked
Contributor

@alexshacked alexshacked commented Jan 9, 2020

closes #16036

Fix a bug where calling np.array(..., dtype=object) creates an N-D array when the algorithms expect a 1-D array with objects inside (similar to a list).
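
The ambiguity described above can be reproduced with NumPy alone. The following is a minimal sketch; the variable names are illustrative and are not taken from the scikit-learn code base. When every inner array has the same length, np.array stacks them into a 2-D array, while a ragged input stays 1-D with one object per slot.

```python
import numpy as np

# Equal-length inner arrays: np.array infers the deepest regular shape,
# so the inner arrays are unpacked into a second dimension.
equal_length = [np.array([0, 1]), np.array([1, 2])]
stacked = np.array(equal_length, dtype=object)
print(stacked.shape)  # (2, 2)

# Ragged inner arrays: no regular 2-D shape exists, so each inner array
# is kept as a single object in a 1-D array.
ragged = [np.array([0, 1]), np.array([1, 2, 3])]
nested = np.array(ragged, dtype=object)
print(nested.shape)  # (2,)
```

Code that iterates over the result expecting one array per sample only works reliably in the second case, which appears to be the inconsistency this PR addresses.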

Member

@thomasjpfan thomasjpfan left a comment

Thank you for the PR @alexshacked !

@glemaitre glemaitre changed the title [MRG] Using dbscan with precomputed neighbors gives an error in 0.22.… [MRG] BUG ensure object array are properly casted when dtype=object Jan 9, 2020
Contributor

@glemaitre glemaitre left a comment

Looks good. A couple of changes.

Please add an entry to the change log at doc/whats_new/v0.20.rst under bug fixes. Like the other entries there, please reference this pull request with :issue: and credit yourself (and other contributors if applicable) with :user:

@jnothman
Member

@jnothman jnothman commented Jan 9, 2020

@glemaitre
Contributor

@glemaitre glemaitre commented Jan 9, 2020

Oh I see, I was searching for np.array(..., dtype=object).
Then I agree to keep the helper, either where it is now (I would rename it _to_object_array) or in utils if we have something similar in another file. I will look into it.

@glemaitre
Contributor

@glemaitre glemaitre commented Jan 9, 2020

So we need to call _to_object_array for sklearn.neighbors._base: l.945-948; l.952-953

NB: I searched for the pattern [:] = and kept only the matches that were preceded by the creation of a NumPy object array.
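
The exact search command is not given in the thread. One way to approximate it is sketched below, assuming the object array is created on the line directly before the slice assignment. The function name and both regular expressions are hypothetical and are not part of scikit-learn.

```python
import re

# Hypothetical patterns: an object array created with np.empty or np.zeros,
# and a full-slice assignment of the form "name[:] = ...".
OBJECT_ARRAY = re.compile(r"np\.(empty|zeros)\(.*dtype=object")
SLICE_ASSIGN = re.compile(r"\[:\]\s*=")


def find_object_slice_assignments(source):
    """Return 1-based line numbers of slice assignments that directly
    follow the creation of an object array."""
    lines = source.splitlines()
    hits = []
    for i in range(1, len(lines)):
        if SLICE_ASSIGN.search(lines[i]) and OBJECT_ARRAY.search(lines[i - 1]):
            hits.append(i + 1)
    return hits


example = (
    "tuple_classes = np.empty(3, dtype=object)\n"
    "tuple_classes[:] = [(1,), (2,), (3,)]\n"
    "x[:] = 0\n"
)
print(find_object_slice_assignments(example))  # [2]
```

A line-based check like this misses cases where other statements sit between the two lines, so any hits would still need a manual review.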

@jnothman
Member

@jnothman jnothman commented Jan 9, 2020

So we need to call _to_object_array for sklearn.neighbors._base: l.945-948; l.952-953

I see similar here:

sklearn/preprocessing/tests/test_label.py=435=def test_multilabel_binarizer_non_integer_labels():
sklearn/preprocessing/tests/test_label.py:436:    tuple_classes = np.empty(3, dtype=object)
sklearn/preprocessing/tests/test_label.py-437-    tuple_classes[:] = [(1,), (2,), (3,)]
--
sklearn/neighbors/_classification.py:541:            pred_labels = np.zeros(len(neigh_ind), dtype=object)
sklearn/neighbors/_classification.py-542-            pred_labels[:] = [_y[ind, k] for ind in neigh_ind]

but otherwise agree it's all in radius_neighbors

@alexshacked
Contributor Author

@alexshacked alexshacked commented Jan 9, 2020

@glemaitre change log is in v0.20.rst? I thought v0.23.rst

@glemaitre
Contributor

@glemaitre glemaitre commented Jan 9, 2020

v0.23.rst

@glemaitre
Contributor

@glemaitre glemaitre commented Jan 9, 2020

Oops, my automatic answering is broken :)

@alexshacked
Contributor Author

@alexshacked alexshacked commented Jan 9, 2020

ok. v0.23 then. Thanks @glemaitre

alexshacked and others added 11 commits Jan 9, 2020
Co-Authored-By: Guillaume Lemaitre <g.lemaitre58@gmail.com>
Contributor

@glemaitre glemaitre left a comment

We will need to apply the _to_object_array function to the following lines:

sklearn/preprocessing/tests/test_label.py=435=def test_multilabel_binarizer_non_integer_labels():
sklearn/preprocessing/tests/test_label.py:436:    tuple_classes = np.empty(3, dtype=object)
sklearn/preprocessing/tests/test_label.py-437-    tuple_classes[:] = [(1,), (2,), (3,)]

I propose to add a docstring (as you did earlier) and move the _to_object_array function to sklearn/utils/__init__.py. Then, we can import it in neighbors and preprocessing.

We just need to add a small test in sklearn/utils/tests/test_utils.py to check the expected behavior:

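import numpy as np
import pytest

from sklearn.utils import _to_object_array


@pytest.mark.parametrize(
    "sequence",
    [[np.array(1), np.array(2)], [[1, 2], [3, 4]]]
)
def test_to_object_array(sequence):
    # The helper should always return a 1-D array of object dtype.
    out = _to_object_array(sequence)
    assert isinstance(out, np.ndarray)
    assert out.dtype.kind == 'O'
    assert out.ndim == 1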
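
For reference, here is a minimal sketch of what such a helper could look like, based on the np.empty plus [:] = pattern quoted earlier in the thread. The name to_object_array_sketch is hypothetical, and the merged implementation may differ in its details and docstring.

```python
import numpy as np


def to_object_array_sketch(sequence):
    """Convert a sequence to a 1-D NumPy array of object dtype.

    Hypothetical illustration: preallocating the output fixes its shape
    to (len(sequence),), so equal-length inner arrays are not stacked
    into a 2-D array.
    """
    out = np.empty(len(sequence), dtype=object)
    out[:] = sequence
    return out


# Equal-length and ragged inputs both give a 1-D object array.
same = to_object_array_sketch([np.array([0]), np.array([1])])
ragged = to_object_array_sketch([np.array([0]), np.array([1, 2])])
print(same.shape, ragged.shape)  # (2,) (2,)
```

Fixing the output shape up front removes the dependence on whether the inner sequences happen to share a length.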
alexshacked and others added 2 commits Jan 10, 2020
Co-Authored-By: Guillaume Lemaitre <g.lemaitre58@gmail.com>
@alexshacked
Contributor Author

@alexshacked alexshacked commented Jan 10, 2020

Hi @glemaitre. Moved function to_object_array() to sklearn.utils and changed the message in the change log of v0.23.rst

Contributor

@glemaitre glemaitre left a comment

Apart from making the function private, LGTM. @alexshacked you can accept my suggestion and that will be enough.

@alexshacked
Contributor Author

@alexshacked alexshacked commented Jan 10, 2020

Sorry about this @glemaitre . I thought one underscore means private inside the class, not private inside the package. Will restore the underscore

alexshacked and others added 11 commits Jan 10, 2020
Co-Authored-By: Guillaume Lemaitre <g.lemaitre58@gmail.com>
@glemaitre glemaitre added this to the 0.22.2 milestone Jan 13, 2020
@glemaitre
Contributor

@glemaitre glemaitre commented Jan 13, 2020

LGTM. @jnothman @thomasjpfan Could you have a look? I added the regression tag and tagged it as a candidate for 0.22.2.

@TomDLT
TomDLT approved these changes Jan 13, 2020
Member

@TomDLT TomDLT left a comment

LGTM

@alexshacked
Contributor Author

@alexshacked alexshacked commented Jan 13, 2020

Thanks for your comments @TomDLT. Will apply them all.

@TomDLT TomDLT merged commit c4ea377 into scikit-learn:master Jan 15, 2020
11 of 17 checks passed
scikit-learn.scikit-learn Build #20200115.1 failed
scikit-learn.scikit-learn (Windows py35_pip_openblas_32bit) Windows py35_pip_openblas_32bit failed
scikit-learn.scikit-learn (Windows py37_conda_mkl) Windows py37_conda_mkl failed
scikit-learn.scikit-learn (macOS pylatest_conda_mkl_no_openmp) macOS pylatest_conda_mkl_no_openmp failed
ci/circleci: doc CircleCI is running your tests
ci/circleci: doc-min-dependencies CircleCI is running your tests
LGTM analysis: C/C++ No code changes detected
LGTM analysis: JavaScript No code changes detected
LGTM analysis: Python No new or fixed alerts
ci/circleci: lint Your tests passed on CircleCI!
scikit-learn.scikit-learn (Linting) Linting succeeded
scikit-learn.scikit-learn (Linux py35_conda_openblas) Linux py35_conda_openblas succeeded
scikit-learn.scikit-learn (Linux py35_ubuntu_atlas) Linux py35_ubuntu_atlas succeeded
scikit-learn.scikit-learn (Linux pylatest_pip_openblas_pandas) Linux pylatest_pip_openblas_pandas succeeded
scikit-learn.scikit-learn (Linux32 py35_ubuntu_atlas_32bit) Linux32 py35_ubuntu_atlas_32bit succeeded
scikit-learn.scikit-learn (Linux_Runs pylatest_conda_mkl) Linux_Runs pylatest_conda_mkl succeeded
scikit-learn.scikit-learn (macOS pylatest_conda_mkl) macOS pylatest_conda_mkl succeeded
@TomDLT
Member

@TomDLT TomDLT commented Jan 15, 2020

Thanks @alexshacked !

thomasjpfan added a commit to thomasjpfan/scikit-learn that referenced this pull request Feb 22, 2020
jeremiedbb added a commit to jeremiedbb/scikit-learn that referenced this pull request Feb 28, 2020
ogrisel added a commit that referenced this pull request Feb 28, 2020
* FIX ensure object array are properly casted when dtype=object (#16076)

* DOC Docstring example of classifier should import classifier (#16430)

* MNT Update nightly build URL and release staging config (#16435)

* BUG ensure that estimator_name is properly stored in the ROC display (#16500)

* BUG ensure that name is properly stored in the precision/recall display (#16505)

* ENH Perform KNN imputation without O(n^2) memory cost (#16397)

* bump scikit-learn version for binder

* bump version to 0.22.2

* MNT Skips failing SpectralCoclustering doctest (#16232)

* TST Updates test for deprecation in pandas.SparseArray (#16040)

* move 0.22.2 what's new entries (#16586)

* add 0.22.2 in the news of the web site frontpage

* skip test_ard_accuracy_on_easy_problem

Co-authored-by: alexshacked <al.shacked@gmail.com>
Co-authored-by: Oleksandr Pavlyk <oleksandr-pavlyk@users.noreply.github.com>
Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org>
Co-authored-by: Guillaume Lemaitre <g.lemaitre58@gmail.com>
Co-authored-by: Joel Nothman <joel.nothman@gmail.com>
Co-authored-by: Thomas J Fan <thomasjpfan@gmail.com>
panpiort8 pushed a commit to panpiort8/scikit-learn that referenced this pull request Mar 3, 2020
5 participants