FEA add d2_tweedie_score #17036
Conversation
Hi @lorentzenchr, will you be able to synchronize with upstream? I'm wondering whether this one could be milestoned for 0.24? Thanks!
Right! Sorry for the noise!
sklearn/metrics/_regression.py
if _num_samples(y_pred) < 2:
    msg = "D^2 score is not well-defined with less than two samples."
    warnings.warn(msg, UndefinedMetricWarning)
    return float('nan')
I would probably raise a `ValueError` instead.
This is the exact same behavior as for `r2_score`. I think there is nothing wrong with a single sample, so I'll remove that warning.
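For reference, here is a small illustration (my own sketch, not part of this PR) of the `r2_score` precedent mentioned above: with fewer than two samples it emits an `UndefinedMetricWarning` and returns `nan` rather than raising.

```python
import math
import warnings

from sklearn.metrics import r2_score

# Sketch of r2_score's existing single-sample behavior: it warns that the
# score is not well-defined and returns nan instead of raising an error.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    score = r2_score([2.5], [2.4])

print(math.isnan(score))  # True
print(len(caught) >= 1)   # True: at least one warning was emitted
```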
But a single sample does not make sense, so `ValueError` it is.
Should we also change from warning to raising `ValueError` for `r2_score`?
Raising `ValueError`, however, produces a failure in `test_single_sample`/`check_single_sample`, which has the following comment:

    # Non-regression test: scores should work with a single sample.
    # This is important for leave-one-out cross validation.

So I revert to returning `nan` for both `r2_score` and `d2_score`.
Hum... Ok, but then we have a design issue for LOO, no?
I'm not sure. `r2_score` is defined for one sample; it just doesn't make sense. Therefore, we'd like to either warn or raise `ValueError`. As for LOO, one should use MSE instead of R2. The same applies to D2 and Tweedie deviances.
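To make the LOO point concrete, here is a small sketch (my own illustration, not part of this PR): each leave-one-out test fold contains exactly one sample, so a per-fold R2 is not meaningful, whereas an error metric such as MSE is well-defined per fold.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import LeaveOneOut, cross_val_score

# With LeaveOneOut, every test fold has a single sample, so we score with
# (negated) MSE, which is well-defined for one sample, instead of R^2.
X, y = make_regression(n_samples=10, n_features=2, random_state=0)
scores = cross_val_score(
    LinearRegression(), X, y,
    cv=LeaveOneOut(),
    scoring="neg_mean_squared_error",
)
print(scores.shape)  # (10,): one score per left-out sample
```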
I can also change `test_single_sample` and let both R2 and D2 raise `ValueError`. In either case, a note somewhere in the docstring or user guide concerning LOO would be good.
So, what should we do:
- warn (and keep the behavior of `r2_score`)
- raise `ValueError` (and change `test_single_sample`)
Pinging @lorentzenchr and @rth: this needs to pick up pace if you want it in 1.0.
@adrinjalali Picking up pace... But it's not the most pressing PR to merge either.
Just to be sure: accepting this PR means that we go for individual "d2 scores" for each (meaningful) metric, e.g. the first option mentioned in #15244 (comment).
I am fine with implementing both options together (overcomplete APIs for the win!). But this can be done in later PRs. No need to put everything in this one.
I agree.
The current status is that we need to decide what R2 and D2 return in the case of a single sample point.
+1 for option 1. I think if a metric is undefined it should return a nan, rather than raising an error and expecting that the user code will somehow deal with it.
@rth I reverted to option 2:
Thanks @lorentzenchr! LGTM. Please add a changelog entry (I guess for 1.0) and feel free to merge. There is one doctest failure that needs fixing, though.
sklearn/metrics/_regression.py
arbitrarily worse). A model that always predicts a constant value for the expected value of y, disregarding the input features, would get a D^2 score of 0.0.
What do you mean by "the expected value of y"? Any constant value would work, or does it mean the mean of y_true?
Yes, it means a model that constantly uses the empirical mean of the observed y as its prediction. I thought this one would be even clearer than the corresponding sentence for R2. I'll make it more precise.
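The point above can be sketched numerically (my own illustration, not the PR code): with D^2 defined as `1 - dev(y_true, y_pred) / dev(y_true, mean(y_true))`, a model that always predicts the empirical mean of `y_true` scores exactly 0.

```python
import numpy as np
from sklearn.metrics import mean_tweedie_deviance

# Sketch of the D^2 definition under discussion:
# D^2 = 1 - dev(y_true, y_pred) / dev(y_true, mean(y_true)),
# so the constant-mean "null model" gets a score of 0 by construction.
def d2_sketch(y_true, y_pred, power=0):
    y_true = np.asarray(y_true, dtype=float)
    dev = mean_tweedie_deviance(y_true, y_pred, power=power)
    dev_null = mean_tweedie_deviance(
        y_true, np.full_like(y_true, y_true.mean()), power=power
    )
    return 1.0 - dev / dev_null

y = [1.0, 2.0, 3.0, 4.0]
const_mean = np.full(4, np.mean(y))  # always predict the empirical mean
print(d2_sketch(y, const_mean, power=1.5))  # 0.0 for the constant-mean model
```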
Reference Issues/PRs
Resolves #15244.
What does this implement/fix? Explain your changes.
Add `d2_tweedie_score` metric.

Open questions
- Besides `mean_tweedie_deviance`, there are also the two specialized versions `mean_poisson_deviance` and `mean_gamma_deviance`. Do we also want the corresponding `d2_poisson_score` and `d2_gamma_score`?