
FEA add d2_tweedie_score #17036

Merged: 18 commits merged into scikit-learn:main from d2_score on Sep 4, 2021

Conversation

@lorentzenchr (Member) commented Apr 25, 2020

Reference Issues/PRs

Resolves #15244.

What does this implement/fix? Explain your changes.

Add d2_tweedie_score metric.
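
For orientation, D² is the deviance analogue of R²: it replaces squared error with the Tweedie deviance. A minimal sketch of the intended computation in terms of the existing mean_tweedie_deviance (the name d2_tweedie_score_sketch is illustrative; the merged implementation also handles sample weights and input validation):

import numpy as np
from sklearn.metrics import mean_tweedie_deviance

def d2_tweedie_score_sketch(y_true, y_pred, *, power=0):
    """Fraction of Tweedie deviance explained, analogous to R^2."""
    y_true = np.asarray(y_true, dtype=float)
    numerator = mean_tweedie_deviance(y_true, y_pred, power=power)
    # Null model: always predict the empirical mean of the observed targets.
    y_null = np.full_like(y_true, y_true.mean())
    denominator = mean_tweedie_deviance(y_true, y_null, power=power)
    return 1 - numerator / denominator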

Open questions

  • For mean_tweedie_deviance, there are also the two specialized versions mean_poisson_deviance and mean_gamma_deviance. Do we also want the corresponding d2_poisson_score and d2_gamma_score?

@lorentzenchr lorentzenchr changed the title from [WIP] ENH add d2_tweedie_score to [MRG] ENH add d2_tweedie_score Apr 25, 2020
@cmarmo (Member) commented Oct 30, 2020

Hi @lorentzenchr, will you be able to synchronize with upstream? I'm wondering if this one could be milestoned for 0.24. Thanks!

@lorentzenchr (Member, author) commented Oct 30, 2020

@cmarmo Thanks for looking into this. The underlying issue #15244 needs a decision before this PR can be merged. But syncing with master doesn't hurt😏

@cmarmo (Member) commented Oct 30, 2020

The underlying issue #15244 needs a decision before this PR can be merged

Right! Sorry for the noise!

@cmarmo cmarmo added this to the 1.0 milestone Nov 3, 2020
Base automatically changed from master to main Jan 22, 2021
ogrisel approved these changes Jul 6, 2021

@ogrisel (Member) left a comment

LGTM. Just a few comments/suggestions:

if _num_samples(y_pred) < 2:
    msg = "D^2 score is not well-defined with less than two samples."
    warnings.warn(msg, UndefinedMetricWarning)
    return float('nan')
@ogrisel (Member) commented Jul 6, 2021

I would probably raise a ValueError instead.

@lorentzenchr (Member, author) commented Aug 22, 2021

This is the exact same behavior as for r2_score. I think there is nothing wrong with a single sample. So I'll remove that warning.

@lorentzenchr (Member, author) commented Aug 22, 2021

But a single sample does not make sense, so ValueError it is.

@lorentzenchr (Member, author) commented Aug 22, 2021

Should we also change from warning to raising ValueError for r2_score?

@lorentzenchr (Member, author) commented Aug 23, 2021

Raising ValueError, however, produces a failure in test_single_sample/check_single_sample, which has the following comment:

# Non-regression test: scores should work with a single sample.
# This is important for leave-one-out cross validation.

So I revert to returning nan for both r2_score and d2_score.
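
For reference, the preserved behavior can be checked against the released r2_score (a quick sketch; the numbers are arbitrary):

import warnings
from sklearn.metrics import r2_score

# With fewer than two samples, r2_score warns and returns nan instead of raising.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    score = r2_score([3.0], [2.5])
print(score)              # nan
print(caught[0].message)  # R^2 score is not well-defined with less than two samples.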

@ogrisel (Member) commented Aug 23, 2021

Hum... Ok, but then we have a design issue for LOO, no?

@lorentzenchr (Member, author) commented Aug 23, 2021

I'm not sure. r2_score is defined for one sample; it just doesn't make sense. Therefore, we'd like to either warn or raise a ValueError. As to LOO, one should use MSE instead of R2. The same applies to D2 and Tweedie deviances.
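
To illustrate the LOO point: with leave-one-out, each test fold holds a single sample, so a per-sample loss such as (negative) mean squared error is well-defined per fold where R² is not. A minimal sketch (estimator and data are arbitrary placeholders):

from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import LeaveOneOut, cross_val_score

X, y = make_regression(n_samples=30, n_features=4, random_state=0)
# Score each left-out sample with squared error, then average across folds.
scores = cross_val_score(Ridge(), X, y, cv=LeaveOneOut(),
                         scoring="neg_mean_squared_error")
print(-scores.mean())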

@lorentzenchr (Member, author) commented Aug 23, 2021

I can also change test_single_sample and let both R2 and D2 raise ValueError. In either case, a note somewhere in the docstring or user guide concerning LOO would be good.

@lorentzenchr (Member, author) commented Aug 23, 2021

So, what should we do:

  1. warn (and keep behavior of r2_score)
  2. raise ValueError (and change test_single_sample)

Resolved/outdated review threads on:

  • doc/modules/model_evaluation.rst
  • sklearn/metrics/_regression.py
  • sklearn/metrics/tests/test_regression.py
@adrinjalali (Member) commented Aug 22, 2021

Pinging @lorentzenchr and @rth: this needs to pick up pace if you want it in 1.0.

@lorentzenchr (Member, author) commented Aug 22, 2021

@adrinjalali Picking up pace... But it's not the most pressing PR to merge either 😏

@lorentzenchr (Member, author) commented Aug 23, 2021

Just to be sure: accepting this PR means that we go for individual "d2 scores" for each (meaningful) metric, e.g. the first option mentioned in #15244 (comment).
Further meaningful D^2 scores that could be handy are logloss, absolute error and pinball loss (and maybe more).

@ogrisel (Member) commented Aug 23, 2021

Just to be sure: Accepting this PR means that we go for individual "d2 scores" for each (meaningful) metric, e.g. first option mentioned in #15244 (comment).

I am fine with implementing both options together (overcomplete APIs for the win!). But this can be done in later PRs. No need to put everything in this one.

Further meaningful D^2 scores that could be handy are logloss, absolute error and pinball loss (and maybe more).

I agree.
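
For concreteness, such a follow-up could look like this hypothetical D² for the pinball loss, where the natural null model predicts the alpha-quantile of y_true rather than its mean (d2_pinball_sketch is illustrative, not an existing API; it assumes sklearn's mean_pinball_loss):

import numpy as np
from sklearn.metrics import mean_pinball_loss

def d2_pinball_sketch(y_true, y_pred, *, alpha=0.5):
    """Hypothetical D^2 for the pinball loss at quantile level alpha."""
    y_true = np.asarray(y_true, dtype=float)
    numerator = mean_pinball_loss(y_true, y_pred, alpha=alpha)
    # Null model: the constant alpha-quantile of the observed targets.
    y_null = np.full_like(y_true, np.quantile(y_true, alpha))
    denominator = mean_pinball_loss(y_true, y_null, alpha=alpha)
    return 1 - numerator / denominator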

@lorentzenchr (Member, author) commented Sep 3, 2021

The current status is that we need to decide what R2 and D2 return in the case of a single sample (n_samples < 2):

  1. Current behavior of R2: return float("nan")
  2. Or raise ValueError("R^2 score is not well-defined with less than two samples.")

@lorentzenchr lorentzenchr removed this from the 1.0 milestone Sep 3, 2021
@lorentzenchr lorentzenchr added this to the 1.1 milestone Sep 3, 2021
@rth (Member) commented Sep 3, 2021

+1 for option 1. I think if a metric is undefined it should return nan, rather than raise an error and expect that the user code will somehow deal with it.

@lorentzenchr (Member, author) commented Sep 4, 2021

@rth I reverted to option 1: return float("nan"), as is done for R2. Some more minor fixes. Good to merge, IMO, if there is a further +1 on the horizon.

rth approved these changes Sep 4, 2021

@rth (Member) left a comment

Thanks @lorentzenchr! LGTM. Please add a changelog entry, I guess for 1.0, and feel free to merge. There is one doctest failure that needs fixing, though.

arbitrarily worse). A model that always predicts a constant value for the expected
value of y, disregarding the input features, would get a D^2 score of 0.0.
@rth (Member) commented Sep 4, 2021

What do you mean by "the expected value of y"? Any constant value would work, or does it mean the mean of y_true?

@lorentzenchr (Member, author) commented Sep 4, 2021

Yes, it means a model that constantly uses the empirical mean of the observed y as its prediction. I thought this one would be even clearer than the sentence for R2. I'll make it more precise.
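
This can be sanity-checked against the merged metric (a quick sketch assuming the released d2_tweedie_score; the data values are arbitrary):

import numpy as np
from sklearn.metrics import d2_tweedie_score

y_true = np.array([0.5, 1.0, 2.5, 7.0])
# A model that always predicts the empirical mean of y_true scores exactly 0.
y_const = np.full_like(y_true, y_true.mean())
print(d2_tweedie_score(y_true, y_const, power=1))  # 0.0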

@lorentzenchr lorentzenchr removed this from the 1.1 milestone Sep 4, 2021
@lorentzenchr lorentzenchr added this to the 1.0 milestone Sep 4, 2021
@lorentzenchr lorentzenchr changed the title from [MRG] ENH add d2_tweedie_score to FEA add d2_tweedie_score Sep 4, 2021
@lorentzenchr lorentzenchr merged commit 9061ff9 into scikit-learn:main Sep 4, 2021
33 checks passed
@lorentzenchr lorentzenchr deleted the d2_score branch Sep 4, 2021
adrinjalali pushed a commit to adrinjalali/scikit-learn that referenced this issue Sep 5, 2021
adrinjalali pushed a commit that referenced this issue Sep 6, 2021
samronsin pushed a commit to samronsin/scikit-learn that referenced this issue Nov 30, 2021