
ENH make CalibratedClassifierCV accept on fit_params #18170

Conversation

@BenjaminBossan (Contributor) commented Aug 16, 2020

Reference Issues/PRs

Re-creating PR #15218, the reason being that I probably messed up the rebase
there, leading to 865 files being changed. Here is a clean PR.

Partly addresses #12384

What does this implement/fix? Explain your changes.

This PR makes it possible to pass fit_params to the fit method of the
CalibratedClassifierCV, which are then routed to the underlying base
estimator.

Note: I implemented predict_proba on CheckingClassifier so that the new
unit test works.
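
For illustration, here is a minimal sketch of the usage this enables, assuming the scikit-learn 1.1 API (where the constructor parameter is still named base_estimator); ExtraParamLogReg and extra_weight are made-up names for this example, not part of the PR:

import numpy as np
from sklearn.calibration import CalibratedClassifierCV
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Illustrative base estimator: its fit accepts an extra sample-aligned
# keyword so we can observe that CalibratedClassifierCV forwards it.
class ExtraParamLogReg(LogisticRegression):
    def fit(self, X, y, extra_weight=None):
        # record what was routed through, then fit as usual
        self.received_extra_weight_ = extra_weight
        return super().fit(X, y)

X, y = make_classification(n_samples=120, random_state=0)
extra = np.ones(len(y))  # fit params must be sample-aligned (checked against y)

calib = CalibratedClassifierCV(base_estimator=ExtraParamLogReg(), cv=3)
calib.fit(X, y, extra_weight=extra)  # forwarded to ExtraParamLogReg.fit per CV fold
proba = calib.predict_proba(X)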

BenjaminBossan and others added 13 commits Oct 12, 2019
Partly addresses scikit-learn#12325

This PR makes it possible to pass fit_params to the fit method of the
CalibratedClassifierCV, which are then routed to the underlying base
estimator.
After adding the predict_proba method on CheckingClassifier, the test
coverage decreased. Therefore, a test for predict_proba was
added. More tests for CheckingClassifier will be added in a separate
PR.
* remove unnecessary check for empty dict
* rename variable from T to the canonical X
Use _safe_indexing to index into fit_params in case they are not numpy
arrays.
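
As a rough sketch of the idea (illustrative helper, not the PR's actual code): sample-aligned fit_params are sliced to each training fold, and _safe_indexing handles lists and pandas objects as well as numpy arrays.

from sklearn.utils import _safe_indexing

def slice_fit_params(fit_params, train_idx):
    # restrict every sample-aligned fit_param to the current training fold
    return {key: _safe_indexing(value, train_idx)
            for key, value in fit_params.items()}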
@BenjaminBossan (Contributor, Author) commented Aug 16, 2020

@cmarmo Thank you for reminding me, I forgot about the PR.

As mentioned above, I recreated the PR because of some rebasing issues. The original PR had already been approved by ogrisel.

Apart from addressing the last reviewer comment by jnothman, the main work here was to integrate my changes with a recent PR that introduced parallel fitting in CalibratedClassifierCV via joblib, which necessitated a considerable rewrite of my changes.

@BenjaminBossan (Contributor, Author) commented Aug 16, 2020

An unrelated proposal from my side: We could add a return self after line 244:

            self.calibrated_classifiers_.append(calibrated_classifier)
        else:
            X, y = self._validate_data(

Using this early return, we can remove the else from line 245 and save one level of indentation spanning 50 LOC starting from line 246. If this is in line with the sklearn coding style, I could add that change on top of this PR.
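
A generic before/after sketch of the two styles (illustrative only, not the actual calibration.py code):

def fit_with_else(prefit):
    if prefit:
        result = "append prefitted calibrated classifier"
    else:
        # ~50 lines of cross-validated fitting live here, one level deeper
        result = "cross-validate and fit calibrated classifiers"
    return result

def fit_with_early_return(prefit):
    if prefit:
        return "append prefitted calibrated classifier"
    # the same ~50 lines, one indentation level shallower
    return "cross-validate and fit calibrated classifiers"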

@thomasjpfan (Member) commented Aug 16, 2020

Using this early return, we can remove the else from line 245 and save one level of indentation spanning 50 LOC starting from line 246. If this is in line with the sklearn coding style, I could add that change on top of this PR.

I think doing this style change in another PR would be better, so we do not increase the diff of this PR. On the proposal, I tend to prefer doing early returns for the same reason you stated.

@BenjaminBossan (Contributor, Author) commented Aug 16, 2020

@thomasjpfan I updated the whats_new. While doing that, I also removed the sklearn.calibrator section, which was probably intended to be sklearn.calibration.

            estimator_name = type(base_estimator).__name__
            warnings.warn("Since %s does not support sample_weights, "
                          "sample weights will only be used for the "
                          "calibration itself." % estimator_name)
        else:
            sample_weight_base_estimator = sample_weight

@thomasjpfan (Member) commented Aug 17, 2020

In this specific case, is sample_weight_base_estimator always equal to sample_weight? If this is the case, I do not think we need a sample_weight_base_estimator parameter in _fit_calibrated_classifer.

        for key, val in fit_params.items():
            check_consistent_length(y, val)

@thomasjpfan (Member) commented Aug 17, 2020

Can we do the check_consistent_length all at once in fit? This way the parallel calls do not need to validate anymore.

BenjaminBossan and others added 2 commits Aug 18, 2020
* don't use sample_weight_base_estimator, since it's the same as
  sample_weight
* perform check_consistent_length only once, inside fit

In addition:

* added a test for check_consistent_length
@BenjaminBossan (Contributor, Author) commented Aug 18, 2020

@thomasjpfan Very good comments, I addressed the issues you raised.

In this specific case, is sample_weight_base_estimator always equal to sample_weight?

Indeed. I guess it would be possible to have different values if we allowed

calibrated_classifier_cv.fit(X, y, sample_weight=sample_weight, base_estimator__sample_weight=other_sample_weight)

But that is probably overkill.

@BenjaminBossan requested a review from thomasjpfan Aug 23, 2020
@thomasjpfan (Member) left a comment

Otherwise LGTM

        else:
            base_estimator_uses_sw = True

@thomasjpfan (Member) commented Aug 23, 2020

I think removing this if statement and defining base_estimator_supports_sw above as follows would lower the complexity:

base_estimator_uses_sw = sample_weight is not None and base_estimator_supports_sw

@BenjaminBossan (Contributor, Author) commented Aug 23, 2020

👍

BenjaminBossan added 3 commits Aug 23, 2020
* Clearer way to define base_estimator_uses_sw
Make sure that sample_weight is correctly passed to the base_estimator.
For this, CheckingClassifier had to be extended to include a check for
sample_weight.
@BenjaminBossan (Contributor, Author) commented Aug 23, 2020

@thomasjpfan I addressed your comment.

On top of that, I added two more tests to check whether sample_weight is correctly routed to the base estimator. This in turn required extending CheckingClassifier with an option to check for sample_weight.
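
For reference, a rough sketch of what such a check could look like (RecordingClassifier is an illustrative stand-in, not the actual CheckingClassifier API, and it assumes a scikit-learn version with this PR merged):

import numpy as np
from sklearn.base import BaseEstimator, ClassifierMixin
from sklearn.calibration import CalibratedClassifierCV

class RecordingClassifier(ClassifierMixin, BaseEstimator):
    """Records the sample_weight it receives so a test can assert on it."""

    def fit(self, X, y, sample_weight=None):
        self.classes_ = np.unique(y)
        self.received_sample_weight_ = sample_weight
        return self

    def predict_proba(self, X):
        # uninformative but valid probabilities, enough for the calibrator
        n_classes = len(self.classes_)
        return np.full((len(X), n_classes), 1.0 / n_classes)

rng = np.random.RandomState(0)
X = rng.randn(30, 2)
y = np.array([0, 1] * 15)
sample_weight = np.linspace(0.5, 1.5, 30)

calib = CalibratedClassifierCV(RecordingClassifier(), cv=3)
calib.fit(X, y, sample_weight=sample_weight)
# each per-fold clone of the base estimator should have seen a
# train-fold-sized slice of sample_weight rather than None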

@BenjaminBossan (Contributor, Author) commented Aug 23, 2021

Although it would be awesome to be on the contributors list for 1.0, there are too many merge conflicts and open questions for a quick resolution. There are probably more pressing changes than this :)

@glemaitre (Contributor) commented Aug 30, 2021

I will review this today (and probably push something to solve at least the merge conflicts)

@BenjaminBossan (Contributor, Author) commented Aug 30, 2021

Thanks @glemaitre. I can also take a look at the merge conflicts, but that wouldn't be today anymore (more likely sometime before the end of the week).

@adrinjalali removed this from the 1.0 milestone Sep 7, 2021
@adrinjalali added this to the 1.1 milestone Sep 7, 2021
@fingoldo commented Dec 13, 2021

When is this expected to be ready? I am currently having a problem when trying to pass eval_set to a CatBoostClassifier wrapped into CalibratedClassifierCV.
Without CalibratedClassifierCV, pipe.fit(X_train, y_train, est__eval_set=(X_test_val, y_test_val), est__plot=True) works fine.
With it, both est__eval_set and est__base_estimator__eval_set throw "TypeError: fit() got an unexpected keyword argument." I believe this PR will solve my problem... A lot of work has been done on this already; please let's not lose that effort and get it fixed and approved )

@jjerphan (Member) left a comment

A few suggestions; apart from those, the core of this PR LGTM.

doc/whats_new/v1.1.rst (outdated review thread, resolved)
sklearn/calibration.py (outdated review thread, resolved)
        for sample_aligned_params in fit_params.values():
            check_consistent_length(y, sample_aligned_params)

        self.calibrated_classifiers_ = []

@jjerphan (Member) commented Dec 16, 2021

Why is self.calibrated_classifiers_ defined here?

@glemaitre (Contributor) commented Dec 16, 2021

Maybe an error from solving the merge conflict. I will have a look.

sklearn/calibration.py (outdated review thread, resolved)
glemaitre and others added 4 commits Dec 16, 2021
Co-authored-by: Julien Jerphanion <git@jjerphan.xyz>
…com:BenjaminBossan/scikit-learn into pr/BenjaminBossan/18170
@glemaitre changed the title from "CalibratedClassifierCV passes on fit_params" to "ENH make CalibratedClassifierCV accept on fit_params" Dec 16, 2021
@glemaitre (Contributor) left a comment

I think that I am fine with the current implementation. @jjerphan do you want to have a new look?

@jjerphan (Member) left a comment

LGTM. Thank you for initiating this work, @BenjaminBossan; thank you, @glemaitre, for pursuing it.

sklearn/tests/test_calibration.py (outdated review thread, resolved)
Co-authored-by: Julien Jerphanion <git@jjerphan.xyz>
@jjerphan merged commit a3e0d4d into scikit-learn:main Dec 17, 2021
31 checks passed
@kabartay commented Mar 10, 2022

When is this expected to be ready? I am currently having a problem when trying to pass eval_set to a CatBoostClassifier wrapped into CalibratedClassifierCV. Without CalibratedClassifierCV, pipe.fit(X_train, y_train, est__eval_set=(X_test_val, y_test_val), est__plot=True) works fine. With it, both est__eval_set and est__base_estimator__eval_set throw "TypeError: fit() got an unexpected keyword argument." I believe this PR will solve my problem... A lot of work has been done on this already; please let's not lose that effort and get it fixed and approved )

I have a similar issue; it didn't work for me either. It worked without CalibratedClassifierCV but not with it.

@kabartay commented Mar 10, 2022

@glemaitre @thomasjpfan @jjerphan @BenjaminBossan
I need some help with the proper usage of fit_params in Pipeline and CalibratedClassifierCV (my sklearn version is v0.24.2).

For example, a Pipeline with simple preprocessing and CalibratedClassifierCV with some ANN as the base_estimator:

model_preprocessing = ("preprocessing", ColumnTransformer([
        ('categorical', 'passthrough', categoricals),]), categoricals),  
        ('numerical', Pipeline([("scaler", RobustScaler()),("imputer", SimpleImputer())]), numericals)),
        ('drop', 'drop', drops),
    ], remainder='drop'))
calibrated_classifier = ("calibrated_classifier", CalibratedClassifierCV(
    base_estimator=model_class(**model_parameters), method='isotonic', cv=5))
pipeline = Pipeline([model_preprocessing, calibrated_classifier])

Some grid search with GridSearchCV
model = GridSearchCV(estimator=pipeline,param_grid=param_grid,cv=5, scoring='accuracy',refit=False, verbose=2)

Then I try to add fit_params, for example epochs, patience, etc.

fit_params = {}
fit_params['calibrated_classifier__base_estimator__epochs'] = 100
fit_params['calibrated_classifier__base_estimator__patience'] = 10

And do the fitting, which raises an error:
model.fit(x_train, y_train, **fit_params)

with TypeError: fit() got an unexpected keyword argument 'epochs':
File "/home/utilisateur/anaconda3/lib/python3.7/site-packages/sklearn/utils/validation.py", line 63, in inner_f return f(*args, **kwargs)
File "/home/utilisateur/anaconda3/lib/python3.7/site-packages/sklearn/model_selection/_search.py", line 880, in fit self.best_estimator_.fit(X, y, **fit_params)
File "/home/utilisateur/anaconda3/lib/python3.7/site-packages/sklearn/pipeline.py", line 346, in fit self._final_estimator.fit(Xt, y, **fit_params_last_step)

I cannot understand why it doesn't properly pass fit_params to the underlying estimator.
Without CalibratedClassifierCV it runs properly, for example if I use this kind of pipeline:

model_preprocessing = ("preprocessing", ColumnTransformer([
        ('categorical', 'passthrough', categoricals),]), categoricals),
        ('numerical', Pipeline([("scaler", RobustScaler()),("imputer", SimpleImputer())]), numericals)),
        ('drop', 'drop', drops),
    ], remainder='drop'))
basic_classifier = ("basic_classifier", model_class(**model_parameters))
pipeline = Pipeline([model_preprocessing,basic_classifier])

Might I be using the wrong sklearn version, or passing a wrong name (not sure the issue is there, I tried various things)?
Thanks for the help.

@glemaitre (Contributor) commented Mar 10, 2022

fit_params will only be supported in scikit-learn 1.1, which has not been released yet and will be released in the coming month.

@kabartay commented Mar 11, 2022

fit_params will only be supported in scikit-learn 1.1, which has not been released yet and will be released in the coming month.

@glemaitre thanks for letting me know. Is there an approximate time, such as end of March, mid-April, or end of April?
I am now trying to test v1.1.dev0 by cloning, building, and trying to use it.

I am getting these errors once I import CalibratedClassifierCV:

Traceback (most recent call last):
  File "main/__main__.py", line 22, in <module>
    from main.train_nn_pipe_dev_cc import train_nn_pipe_dev_cc_model
  File "/home/main/pipe_dev.py", line 45, in <module>
    from scikit.sklearn.calibration import CalibratedClassifierCV
  File "/home/main/scikit/sklearn/calibration.py", line 48, in <module>
    from .svm import LinearSVC
  File "/home/main/scikit/sklearn/svm/__init__.py", line 13, in <module>
    from ._classes import SVC, NuSVC, SVR, NuSVR, OneClassSVM, LinearSVC, LinearSVR
  File "/home/main/scikit/sklearn/svm/_classes.py", line 6, in <module>
    from ..linear_model._base import LinearClassifierMixin, SparseCoefMixin, LinearModel
  File "/home/main/scikit/sklearn/linear_model/__init__.py", line 11, in <module>
    from ._least_angle import (
  File "/home/main/scikit/sklearn/linear_model/_least_angle.py", line 28, in <module>
    from ..model_selection import check_cv
  File "/home/main/scikit/sklearn/model_selection/__init__.py", line 23, in <module>
    from ._validation import cross_val_score
  File "/home/main/scikit/sklearn/model_selection/_validation.py", line 31, in <module>
    from ..metrics import check_scoring
  File "/home/main/scikit/sklearn/metrics/__init__.py", line 41, in <module>
    from . import cluster
  File "/home/main/scikit/sklearn/metrics/cluster/__init__.py", line 22, in <module>
    from ._unsupervised import silhouette_samples
  File "/home/main/scikit/sklearn/metrics/cluster/_unsupervised.py", line 16, in <module>
    from ..pairwise import pairwise_distances_chunked
  File "/home/main/scikit/sklearn/metrics/pairwise.py", line 35, in <module>
    from ._pairwise_distances_reduction import PairwiseDistancesArgKmin
  File "sklearn/metrics/_pairwise_distances_reduction.pyx", line 1, in init sklearn.metrics._pairwise_distances_reduction
ModuleNotFoundError: No module named 'sklearn.utils._heap'

I have no idea why the ModuleNotFoundError: No module named 'sklearn.utils._heap' is raised; the relative paths seem correct in sklearn.metrics._pairwise_distances_reduction.

Any clue?

@kabartay commented Mar 11, 2022

@glemaitre what could be the reason that certain tests fail with TypeError?
For example,

  • test_linear_regression_positive

    • sklearn/linear_model/tests/test_base.py:276:
      TypeError: _nnls.nnls() missing required argument 'n' (pos 3)
    • sklearn/linear_model/_base.py:687: TypeError
  • test_linear_regression_positive_multiple_outcome

    • sklearn/linear_model/tests/test_base.py:301:
      sklearn/linear_model/_base.py:690: in fit
      /home/.local/lib/python3.8/site-packages/joblib/parallel.py:1041: in __call__
      /home/.local/lib/python3.8/site-packages/joblib/parallel.py:859: in dispatch_one_batch
      /home/.local/lib/python3.8/site-packages/joblib/parallel.py:777: in _dispatch
      /home/.local/lib/python3.8/site-packages/joblib/_parallel_backends.py:572: in __init__
      /home/.local/lib/python3.8/site-packages/joblib/parallel.py:262: in __call__
      TypeError: _nnls.nnls() missing required argument 'n' (pos 3) -> finally lead to this
    • sklearn/utils/fixes.py:118: TypeError
  • test_linear_regression_positive_vs_nonpositive

    • sklearn/linear_model/tests/test_base.py:315:
      TypeError: _nnls.nnls() missing required argument 'n' (pos 3)
    • sklearn/linear_model/_base.py:687: TypeError
  • test_linear_regression_positive_vs_nonpositive_when_positive

    • sklearn/linear_model/tests/test_base.py:331:
      TypeError: _nnls.nnls() missing required argument 'n' (pos 3)
    • sklearn/linear_model/_base.py:687: TypeError
  • test_estimatorclasses_positive_constraint

    • sklearn/linear_model/tests/test_least_angle.py:609:
    • sklearn/linear_model/_least_angle.py:2237: in fit
    • sklearn/linear_model/_least_angle.py:2286: in _estimate_noise_variance
      TypeError: _nnls.nnls() missing required argument 'n' (pos 3) -> finally lead to this
    • sklearn/linear_model/_base.py:687: TypeError

In total, 5 tests failed.
