ENH make CalibratedClassifierCV accept on fit_params #18170
Conversation
Partly addresses scikit-learn#12325. This PR makes it possible to pass fit_params to the fit method of the CalibratedClassifierCV, which are then routed to the underlying base estimator.
After adding the predict_proba method on CheckingClassifier, the test coverage decreased. Therefore, a test for predict_proba was added. More tests for CheckingClassifier will be added in a separate PR.
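The routing described above can be sketched in plain Python; `LoggingClassifier` and `CalibratedClassifierCVSketch` are hypothetical stand-ins (not scikit-learn classes) that only illustrate how extra fit keyword arguments are forwarded to the base estimator:

```python
# Hypothetical sketch of the fit_params routing this PR enables.
# LoggingClassifier stands in for any base estimator whose fit()
# accepts extra keyword arguments such as sample_weight.

class LoggingClassifier:
    def fit(self, X, y, **fit_params):
        # Record which fit_params actually reached the base estimator.
        self.received_fit_params_ = sorted(fit_params)
        return self

class CalibratedClassifierCVSketch:
    def __init__(self, base_estimator):
        self.base_estimator = base_estimator

    def fit(self, X, y, **fit_params):
        # Route every fit_param through to the underlying estimator.
        self.base_estimator.fit(X, y, **fit_params)
        return self

clf = CalibratedClassifierCVSketch(LoggingClassifier())
clf.fit([[0], [1]], [0, 1], sample_weight=[0.5, 0.5])
print(clf.base_estimator.received_fit_params_)  # ['sample_weight']
```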
* remove unnecessary check for empty dict
* rename variable name to canonical X from T
Use _safe_indexing to index into fit_params in case they are not numpy arrays.
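The behavior referred to here can be sketched with a simplified helper; `safe_indexing_sketch` is a hypothetical, cut-down analogue of scikit-learn's `_safe_indexing`, not the real implementation:

```python
def safe_indexing_sketch(values, indices):
    # Simplified sketch of sklearn's _safe_indexing helper: fit_params
    # entries may be numpy arrays or plain Python lists, so per-fold
    # indexing must work for both.
    if hasattr(values, "take"):            # array-like with fancy indexing
        return values.take(indices)
    return [values[i] for i in indices]    # plain sequence fallback

train_idx = [0, 2, 3]
weights = [0.1, 0.2, 0.3, 0.4]
print(safe_indexing_sketch(weights, train_idx))  # [0.1, 0.3, 0.4]
```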
…/github.com/BenjaminBossan/scikit-learn into calibratedclassifiercv-passes-on-fit-params
@cmarmo Thank you for reminding me, I had forgotten about the PR. As mentioned above, I recreated the PR because of some rebasing issues. The original PR was already approved by @ogrisel. Apart from addressing the last reviewer comment by @jnothman, the main work here was to integrate my changes with a recent PR that introduced parallel fitting.
An unrelated proposal from my side: we could add an early return at sklearn/calibration.py, lines 244 to 246 (in 41d648e). Using this early return, we can remove the …
I think doing this style change in another PR would be better, so we do not increase the diff of this PR. On the proposal, I tend to prefer early returns, for the same reason you stated.
@thomasjpfan I updated the …
sklearn/calibration.py
Outdated
        estimator_name = type(base_estimator).__name__
        warnings.warn("Since %s does not support sample_weights, "
                      "sample weights will only be used for the "
                      "calibration itself." % estimator_name)
    else:
        sample_weight_base_estimator = sample_weight
In this specific case, is sample_weight_base_estimator always equal to sample_weight? If this is the case, I do not think we need a sample_weight_base_estimator parameter in _fit_calibrated_classifer.
sklearn/calibration.py
Outdated
    for key, val in fit_params.items():
        check_consistent_length(y, val)
Can we do the check_consistent_length all at once in fit? This way the parallel calls do not need to validate anymore.
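The suggestion of validating once up front can be sketched as follows; `check_consistent_length_sketch` is a hypothetical, simplified analogue of scikit-learn's `check_consistent_length`, applied to all sample-aligned fit_params in one pass:

```python
def check_consistent_length_sketch(y, fit_params):
    # Validate every sample-aligned fit_param once, up front in fit(),
    # so the parallel per-fold fits do not need to re-validate.
    n_samples = len(y)
    for key, value in fit_params.items():
        if len(value) != n_samples:
            raise ValueError(
                f"fit_param {key!r} has length {len(value)}, "
                f"expected {n_samples}"
            )

y = [0, 1, 1]
check_consistent_length_sketch(y, {"sample_weight": [1.0, 1.0, 1.0]})  # passes
try:
    check_consistent_length_sketch(y, {"sample_weight": [1.0]})
except ValueError as exc:
    print(exc)
```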
* don't use sample_weight_base_estimator, since it's the same as sample_weight
* perform check_consistent_length only once, inside fit

In addition:
* added a test for check_consistent_length
@thomasjpfan Very good comments, I addressed the issues you raised.
Indeed. I guess it would be possible to have different values if we allowed calibrated_classifier_cv.fit(X, y, sample_weight=sample_weight, base_estimator__sample_weight=other_sample_weight), but that is probably overkill.
sklearn/calibration.py
Outdated
    else:
        base_estimator_uses_sw = True
I think removing this if statement and defining base_estimator_uses_sw above as follows would lower the complexity:
base_estimator_uses_sw = sample_weight is not None and base_estimator_supports_sw
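How base_estimator_supports_sw could be computed can be sketched with signature inspection; `supports_sample_weight`, `WithSW`, and `WithoutSW` below are hypothetical names (scikit-learn itself provides a `has_fit_parameter` helper for this purpose):

```python
import inspect

def supports_sample_weight(estimator):
    # Simplified sketch of sklearn's has_fit_parameter helper: check
    # whether the estimator's fit signature accepts sample_weight.
    return "sample_weight" in inspect.signature(estimator.fit).parameters

class WithSW:
    def fit(self, X, y, sample_weight=None):
        return self

class WithoutSW:
    def fit(self, X, y):
        return self

sample_weight = [1.0, 1.0]
base_estimator_supports_sw = supports_sample_weight(WithSW())
# The reviewer's suggested single-expression form:
base_estimator_uses_sw = sample_weight is not None and base_estimator_supports_sw
print(base_estimator_uses_sw)               # True
print(supports_sample_weight(WithoutSW()))  # False
```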
* Clearer way to define base_estimator_uses_sw
Make sure that sample_weight is correctly passed to the base_estimator. For this, CheckingClassifier had to be extended to include a check for sample_weight.
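The test helper described above can be sketched as follows; `CheckingClassifierSketch` is a hypothetical stand-in for the extended CheckingClassifier, asserting inside fit that sample_weight was actually forwarded:

```python
class CheckingClassifierSketch:
    # Simplified sketch of the extended test helper: a classifier whose
    # fit() asserts that sample_weight was actually forwarded to it.
    def __init__(self, expect_sample_weight=False):
        self.expect_sample_weight = expect_sample_weight

    def fit(self, X, y, sample_weight=None, **fit_params):
        if self.expect_sample_weight:
            assert sample_weight is not None, "sample_weight not forwarded"
        self.fitted_ = True
        return self

clf = CheckingClassifierSketch(expect_sample_weight=True)
clf.fit([[0], [1]], [0, 1], sample_weight=[0.5, 0.5])
print(clf.fitted_)  # True
```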
@thomasjpfan I addressed your comment. On top of that, I added two more tests to check if …
Although it would be awesome to be on the contributors list for 1.0, there are too many merge conflicts and open questions for a quick resolution. There are probably more pressing changes than this :)
I will review this today (and probably push something to solve at least the merge conflicts).
Thanks @glemaitre. I can also take a look at the merge conflicts, but that wouldn't be today anymore (more likely sometime before the end of the week).
When is this expected to be ready? I am currently having a problem when trying to pass eval_set to a CatboostClassifier wrapped in CalibratedClassifierCV.
A few suggestions; apart from those, the core of this PR LGTM.
sklearn/calibration.py
Outdated
    for sample_aligned_params in fit_params.values():
        check_consistent_length(y, sample_aligned_params)

    self.calibrated_classifiers_ = []
Why is self.calibrated_classifiers_ defined here?
Maybe an error from solving the merge conflict. I will have a look.
Co-authored-by: Julien Jerphanion <git@jjerphan.xyz>
…com:BenjaminBossan/scikit-learn into pr/BenjaminBossan/18170
LGTM. Thank you for initiating this work, @BenjaminBossan; thank you, @glemaitre, for pursuing it.
Co-authored-by: Julien Jerphanion <git@jjerphan.xyz>
I have a similar issue; it didn't work for me either. Similarly, it worked without CalibratedClassifierCV but not with it.
@glemaitre @thomasjpfan @jjerphan @BenjaminBossan For example, a …
Some grid search with … Then I try to add …
And do the fitting, which errors with … I cannot understand why it doesn't properly pass …
Might I be using an improper sklearn version or passing a wrong name (not sure the issue is there; I tried various things)?
@glemaitre thanks for letting me know. Is there an approximate time, such as end of March, mid-April, or end of April? I am getting these errors once I import CalibratedClassifierCV: …
I have no idea why the … Any clue?
@glemaitre what can be the reason that causes certain tests to fail with a TypeError? In total, 5 failed.
Reference Issues/PRs

Re-creating PR #15218, the reason being that I probably messed up the rebase there, leading to 865 files being changed. Here is a clean PR.

Partly addresses #12384

What does this implement/fix? Explain your changes.

This PR makes it possible to pass fit_params to the fit method of the CalibratedClassifierCV, which are then routed to the underlying base estimator.

Note: I implemented predict_proba on CheckingClassifier for the new unit test to work.