[MRG] Add `drop_intermediate` kwarg to `metrics.precision_recall_curve` #24668

dberenbaum · 2022-10-15T02:11:51Z

Reference Issues/PRs

What does this implement/fix? Explain your changes.

Adds a `drop_intermediate` kwarg to `metrics.precision_recall_curve`, similar to the one that already exists for `metrics.roc_curve`. When enabled, it removes points that are unnecessary for drawing the curve, which reduces its size.

betatim · 2022-10-17T07:58:27Z

sklearn/metrics/_ranking.py

+        # with the same tps value have the same recall and thus x coordinate.
+        # They appear as a vertical line on the plot.
+        optimal_idxs = np.where(
+            np.r_[True, np.logical_or(np.diff(tps[:-1]), np.diff(tps[1:])), True]


For my education: why does taking the "second derivative" in roc_curve work, but here it doesn't?

More precisely, why does using the second derivative work for roc_curve? What we are looking for is two (or more) points where there is no change, so the first derivative seems like the natural thing to use :-/

Thinking about this some more, could we use np.r_[True, np.diff(tps, 2), True] instead?

For tps = [1, 2, 3, 3, 3, 5, 6] we'd get [1, 3, 3, 5, 6] (the 2 gets dropped because it is on the line between 1 and 3). For tps = [1, 2.1, 3, 3, 3, 5, 6] we get [1., 2.1, 3., 3., 5., 6.].

I guess for plotting purposes it is fine to remove the 2?! Is there a reason to have different behaviour regarding the removal of points in roc_curve and this (with np.logical_or(np.diff(tps[:-1]), np.diff(tps[1:])) the 2 is kept)?
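To make the comparison above concrete, here is a minimal sketch that applies both masks to the two example arrays. The helper names `keep_first_diff` and `keep_second_diff` are hypothetical and exist only for this illustration; the first wraps the expression quoted from the diff, the second wraps the proposed `np.diff(tps, 2)` alternative.

```python
import numpy as np


def keep_first_diff(tps):
    """Indices kept by the first-difference mask quoted from the diff.

    A point is dropped only if tps is unchanged on both sides of it.
    """
    tps = np.asarray(tps)
    return np.where(
        np.r_[True, np.logical_or(np.diff(tps[:-1]), np.diff(tps[1:])), True]
    )[0]


def keep_second_diff(tps):
    """Indices kept by the second-difference mask proposed in the review.

    A point is dropped whenever the step before it equals the step after it.
    """
    tps = np.asarray(tps)
    return np.where(np.r_[True, np.diff(tps, 2), True])[0]


a = np.array([1, 2, 3, 3, 3, 5, 6])
b = np.array([1, 2.1, 3, 3, 3, 5, 6])

print(a[keep_second_diff(a)].tolist())  # [1, 3, 3, 5, 6]
print(a[keep_first_diff(a)].tolist())   # [1, 2, 3, 3, 5, 6]
print(b[keep_second_diff(b)].tolist())  # [1.0, 2.1, 3.0, 3.0, 5.0, 6.0]
```

The two masks differ only on the 2 in the first array: it sits on an evenly spaced run of `tps`, so the second difference is zero there, while the first differences on either side are non-zero.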

dberenbaum · 2022-10-17 (edited)

The difference is that both axes of an ROC curve have constant denominators:

fpr = fps / fps[-1] (linearly correlated with fps)

tpr = tps / tps[-1] (linearly correlated with tps)

By contrast, precision has a non-constant denominator (note that recall = tpr):

precision = tps / (tps + fps) (not linearly correlated with either tps or fps)
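The three definitions above can be sketched in a few lines of NumPy. The counts below are hypothetical values chosen only for illustration; they are not taken from the PR.

```python
import numpy as np

# Hypothetical cumulative counts at successive thresholds (illustrative only).
tps = np.array([1, 2, 3, 4])
fps = np.array([0, 1, 1, 3])

# ROC axes: each count is divided by a single constant (its final value).
fpr = fps / fps[-1]
tpr = tps / tps[-1]  # identical to recall

# Precision: the denominator changes from point to point.
precision = tps / (tps + fps)

# tpr is tps times one constant, so straight lines in (fps, tps) stay straight.
tpr_is_scaled_tps = np.allclose(tpr / tps, 1 / tps[-1])
# precision / tps equals 1 / (tps + fps), which varies along the curve.
precision_is_scaled_tps = np.allclose(precision / tps, (precision / tps)[0])

print(tpr_is_scaled_tps, precision_is_scaled_tps)  # True False
```

Because the ROC axes are just rescaled counts, a test on differences of `tps` and `fps` carries over to the plotted ROC curve; the same reasoning does not carry over to precision.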

If you extend your example by one (and leave out the 2), so that tps = [1, 3, 3, 3, 5, 6, 7], then you will get:

tps = [1, 3, 3, 3, 5, 6, 7]

fps = [0, 0, 1, 2, 2, 2, 2]

tpr = [1/7, 3/7, 3/7, 3/7, 5/7, 6/7, 7/7]

fpr = [0/2, 0/2, 1/2, 2/2, 2/2, 2/2, 2/2]

precision = [1/1, 3/3, 3/4, 3/5, 5/7, 6/8, 7/9]

np.r_[True, np.logical_or(np.diff(tps[:-1]), np.diff(tps[1:])), True] results in [1, 3, 3, 5, 6, 7].

np.r_[True, np.diff(tps, 2), True] results in [1, 3, 3, 5, 7].

The second method incorrectly drops the 6, which is not actually on a line in the precision-recall curve.
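This can be checked numerically. The sketch below uses the `tps` and `fps` arrays from the example and tests whether the point with tps = 6 is collinear with its two neighbours, first in ROC space and then in precision-recall space. The helper `twice_triangle_area` is a hypothetical name introduced for this illustration; it is a plain cross product, not part of scikit-learn.

```python
import numpy as np

tps = np.array([1, 3, 3, 3, 5, 6, 7])
fps = np.array([0, 0, 1, 2, 2, 2, 2])

fpr, tpr = fps / fps[-1], tps / tps[-1]
recall, precision = tpr, tps / (tps + fps)


def twice_triangle_area(x, y, i):
    """Cross product for points i-1, i, i+1; zero means they are collinear."""
    return (x[i] - x[i - 1]) * (y[i + 1] - y[i - 1]) - (y[i] - y[i - 1]) * (
        x[i + 1] - x[i - 1]
    )


i = 5  # position of tps == 6
roc_area = twice_triangle_area(fpr, tpr, i)
pr_area = twice_triangle_area(recall, precision, i)

# The two masks discussed above, applied to the same tps.
kept_first = tps[
    np.where(np.r_[True, np.logical_or(np.diff(tps[:-1]), np.diff(tps[1:])), True])[0]
]
kept_second = tps[np.where(np.r_[True, np.diff(tps, 2), True])[0]]

print(roc_area == 0)         # True: collinear on the ROC curve
print(abs(pr_area) > 1e-4)   # True: not collinear on the precision-recall curve
print(kept_first.tolist())   # [1, 3, 3, 5, 6, 7]
print(kept_second.tolist())  # [1, 3, 3, 5, 7]
```

In ROC space the three points share fpr = 1, so they lie on a vertical line and the 6 is redundant. In precision-recall space the slope changes at the 6, so dropping it would alter the curve.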

betatim · 2022-10-18

Today I learnt! Thanks for taking the time to explain it.

betatim

Looks good to me.

Does this need an entry in "what's new"?

dberenbaum · 2022-10-17T20:31:07Z

> Looks good to me.
>
> Does this need an entry in "what's new"?

Not sure if this question is to me? I'm not sure what justifies a "what's new" entry but happy to provide one if needed.

betatim · 2022-10-18T16:25:07Z

> Not sure if this question is to me? I'm not sure what justifies a "what's new" entry but happy to provide one if needed.

It was aimed at someone "in the know", because I also don't know the inclusion criteria.

Add drop_intermediate kwarg to metrics.precision_recall_curve

498806d

github-actions bot added the module:metrics label Oct 15, 2022

betatim reviewed Oct 17, 2022

betatim approved these changes Oct 17, 2022

betatim approved these changes Oct 18, 2022
