
FEA add d2_tweedie_score #17036

Merged: 18 commits merged into scikit-learn:main from d2_score on Sep 4, 2021

Conversation

@lorentzenchr (Member) commented Apr 25, 2020

Reference Issues/PRs

Resolves #15244.

What does this implement/fix? Explain your changes.

Add d2_tweedie_score metric.
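
For orientation, D² is the deviance analogue of R²: it replaces squared error with the Tweedie deviance. A minimal sketch of the intended computation in terms of the existing mean_tweedie_deviance (the name d2_tweedie_score_sketch is illustrative; the merged implementation also handles sample weights and input validation):

import numpy as np
from sklearn.metrics import mean_tweedie_deviance

def d2_tweedie_score_sketch(y_true, y_pred, *, power=0):
    """Fraction of Tweedie deviance explained, analogous to R^2."""
    y_true = np.asarray(y_true, dtype=float)
    numerator = mean_tweedie_deviance(y_true, y_pred, power=power)
    # Null model: always predict the empirical mean of the observed targets.
    y_null = np.full_like(y_true, y_true.mean())
    denominator = mean_tweedie_deviance(y_true, y_null, power=power)
    return 1 - numerator / denominator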

Open questions

  • For mean_tweedie_deviance, there are also the two specialized versions mean_poisson_deviance and mean_gamma_deviance. Do we also want the corresponding d2_poisson_score and d2_gamma_score?

@lorentzenchr lorentzenchr changed the title from [WIP] ENH add d2_tweedie_score to [MRG] ENH add d2_tweedie_score Apr 25, 2020
@cmarmo (Member) commented Oct 30, 2020

Hi @lorentzenchr, will you be able to synchronize with upstream? I'm wondering if this one could be milestoned for 0.24. Thanks!

@lorentzenchr (Member, author) commented Oct 30, 2020

@cmarmo Thanks for looking into this. The underlying issue #15244 needs a decision before this PR can be merged. But syncing with master doesn't hurt😏

@cmarmo (Member) commented Oct 30, 2020

The underlying issue #15244 needs a decision before this PR can be merged

Right! Sorry for the noise!

@cmarmo cmarmo added this to the 1.0 milestone Nov 3, 2020
Base automatically changed from master to main Jan 22, 2021
ogrisel approved these changes Jul 6, 2021

@ogrisel (Member) left a comment

LGTM. Just a few comments/suggestions:

if _num_samples(y_pred) < 2:
    msg = "D^2 score is not well-defined with less than two samples."
    warnings.warn(msg, UndefinedMetricWarning)
    return float('nan')
@ogrisel (Member) commented Jul 6, 2021

I would probably raise a ValueError instead.

@lorentzenchr (Member, author) commented Aug 22, 2021

This is the exact same behavior as for r2_score. I think there is nothing wrong with a single sample. So I'll remove that warning.

@lorentzenchr (Member, author) commented Aug 22, 2021

But a single sample does not make sense, so ValueError it is.

@lorentzenchr (Member, author) commented Aug 22, 2021

Should we also change from warning to raising ValueError for r2_score?

@lorentzenchr (Member, author) commented Aug 23, 2021

Raising ValueError, however, produces a failure in test_single_sample/check_single_sample, which has the following comment:

# Non-regression test: scores should work with a single sample.
# This is important for leave-one-out cross validation.

So I revert to returning nan for both r2_score and d2_score.
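
For reference, the preserved behavior can be checked against the released r2_score (a quick sketch; the numbers are arbitrary):

import warnings
from sklearn.metrics import r2_score

# With fewer than two samples, r2_score warns and returns nan instead of raising.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    score = r2_score([3.0], [2.5])
print(score)              # nan
print(caught[0].message)  # R^2 score is not well-defined with less than two samples.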

@ogrisel (Member) commented Aug 23, 2021

Hum... Ok, but then we have a design issue for LOO, no?

@lorentzenchr (Member, author) commented Aug 23, 2021

I'm not sure. r2_score is defined for one sample; it just doesn't make sense. Therefore, we'd like to either warn or raise a ValueError. As to LOO, one should use MSE instead of R2. The same applies to D2 and Tweedie deviances.
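
To illustrate the LOO point: with leave-one-out, each test fold holds a single sample, so a per-sample loss such as (negative) mean squared error is well-defined per fold where R² is not. A minimal sketch (estimator and data are arbitrary placeholders):

from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import LeaveOneOut, cross_val_score

X, y = make_regression(n_samples=30, n_features=4, random_state=0)
# Score each left-out sample with squared error, then average across folds.
scores = cross_val_score(Ridge(), X, y, cv=LeaveOneOut(),
                         scoring="neg_mean_squared_error")
print(-scores.mean())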

@lorentzenchr (Member, author) commented Aug 23, 2021

I can also change test_single_sample and let both R2 and D2 raise ValueError. In either case, a note somewhere in the docstring or user guide concerning LOO would be good.

@lorentzenchr (Member, author) commented Aug 23, 2021

So, what should we do:

  1. warn (and keep behavior of r2_score)
  2. raise ValueError (and change test_single_sample)

Resolved/outdated review threads on:

  • doc/modules/model_evaluation.rst
  • sklearn/metrics/_regression.py
  • sklearn/metrics/tests/test_regression.py
@adrinjalali (Member) commented Aug 22, 2021

Pinging @lorentzenchr and @rth: this needs to pick up pace if you want it in 1.0.

@lorentzenchr (Member, author) commented Aug 22, 2021

@adrinjalali Picking up pace... But it's not the most pressing PR to merge either 😏

@lorentzenchr (Member, author) commented Aug 23, 2021

Just to be sure: accepting this PR means that we go for individual "d2 scores" for each (meaningful) metric, e.g. the first option mentioned in #15244 (comment).
Further meaningful D^2 scores that could be handy are logloss, absolute error and pinball loss (and maybe more).

@ogrisel (Member) commented Aug 23, 2021

Just to be sure: Accepting this PR means that we go for individual "d2 scores" for each (meaningful) metric, e.g. first option mentioned in #15244 (comment).

I am fine with implementing both options together (overcomplete APIs for the win!). But this can be done in later PRs. No need to put everything in this one.

Further meaningful D^2 scores that could be handy are logloss, absolute error and pinball loss (and maybe more).

I agree.
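
For concreteness, such a follow-up could look like this hypothetical D² for the pinball loss, where the natural null model predicts the alpha-quantile of y_true rather than its mean (d2_pinball_sketch is illustrative, not an existing API; it assumes sklearn's mean_pinball_loss):

import numpy as np
from sklearn.metrics import mean_pinball_loss

def d2_pinball_sketch(y_true, y_pred, *, alpha=0.5):
    """Hypothetical D^2 for the pinball loss at quantile level alpha."""
    y_true = np.asarray(y_true, dtype=float)
    numerator = mean_pinball_loss(y_true, y_pred, alpha=alpha)
    # Null model: the constant alpha-quantile of the observed targets.
    y_null = np.full_like(y_true, np.quantile(y_true, alpha))
    denominator = mean_pinball_loss(y_true, y_null, alpha=alpha)
    return 1 - numerator / denominator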

@lorentzenchr (Member, author) commented Sep 3, 2021

The current status is that we need to decide what R2 and D2 return in the case of a single sample (n_samples < 2):

  1. Current behavior of R2: return float("nan")
  2. Or raise ValueError("R^2 score is not well-defined with less than two samples.")

@lorentzenchr lorentzenchr removed this from the 1.0 milestone Sep 3, 2021
@lorentzenchr lorentzenchr added this to the 1.1 milestone Sep 3, 2021
@rth (Member) commented Sep 3, 2021

+1 for option 1. I think if a metric is undefined it should return nan, rather than raise an error and expect that the user code will somehow deal with it.

@lorentzenchr (Member, author) commented Sep 4, 2021

@rth I reverted to option 1: return float("nan"), as is done for R2. Some more minor fixes. Good to merge, IMO, if there is a further +1 on the horizon.

rth approved these changes Sep 4, 2021

@rth (Member) left a comment

Thanks @lorentzenchr! LGTM. Please add a changelog entry, I guess for 1.0, and feel free to merge. There is one doctest failure that needs fixing, though.

arbitrarily worse). A model that always predicts a constant value for the expected
value of y, disregarding the input features, would get a D^2 score of 0.0.
@rth (Member) commented Sep 4, 2021

What do you mean by "the expected value of y"? Any constant value would work, or does it mean the mean of y_true?

@lorentzenchr (Member, author) commented Sep 4, 2021

Yes, it means a model that constantly uses the empirical mean of the observed y as its prediction. I thought this one would be even clearer than the sentence for R2. I'll make it more precise.
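
This can be sanity-checked against the merged metric (a quick sketch assuming the released d2_tweedie_score; the data values are arbitrary):

import numpy as np
from sklearn.metrics import d2_tweedie_score

y_true = np.array([0.5, 1.0, 2.5, 7.0])
# A model that always predicts the empirical mean of y_true scores exactly 0.
y_const = np.full_like(y_true, y_true.mean())
print(d2_tweedie_score(y_true, y_const, power=1))  # 0.0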

@lorentzenchr lorentzenchr removed this from the 1.1 milestone Sep 4, 2021
@lorentzenchr lorentzenchr added this to the 1.0 milestone Sep 4, 2021
@lorentzenchr lorentzenchr changed the title from [MRG] ENH add d2_tweedie_score to FEA add d2_tweedie_score Sep 4, 2021
@lorentzenchr lorentzenchr merged commit 9061ff9 into scikit-learn:main Sep 4, 2021
33 checks passed
@lorentzenchr lorentzenchr deleted the d2_score branch Sep 4, 2021
adrinjalali pushed a commit to adrinjalali/scikit-learn that referenced this issue Sep 5, 2021
adrinjalali pushed a commit that referenced this issue Sep 6, 2021
samronsin pushed a commit to samronsin/scikit-learn that referenced this issue Nov 30, 2021