[MRG+1] Partial AUC #3840
Conversation
This seems like a useful addition. What happened to it?
This could be useful for anomaly/outlier/novelty detection as well, as we are especially interested in the performance of the scoring function (e.g.
I stopped working on this since I wasn't sure how to test this properly. If anybody could give me some hints I would pick it up again.
@@ -138,6 +138,12 @@ def test_roc_curve():
    assert_equal(fpr.shape, tpr.shape)
    assert_equal(fpr.shape, thresholds.shape)

    # test partial ROC
This does not belong in `test_roc_curve`.
Make a new function
y_true = np.array([0, 0, 1, 1])
y_scores = np.array([0.1, 0, 0.1, 0.01])
assert_equal(roc_auc_score(y_true, y_scores, max_fpr=0.3), 0.5)
assert_equal(roc_auc_score(y_true, y_true, max_fpr=0.7), 1)
This varies in both data and max_fpr from the previous line. A better test would consider minimal variations to the parameter and the data and show that the changes affect the metric in the expected way. I'd demonstrate for instance that max_fpr=1 equates to leaving it out, then reduce it.
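A sketch of the kind of test being suggested here, with illustrative data that is not taken from the PR:

```python
import numpy as np
from sklearn.metrics import roc_auc_score

y_true = np.array([0, 0, 1, 1])
y_scores = np.array([0.1, 0.4, 0.35, 0.8])

# max_fpr=1 should be exactly equivalent to leaving the parameter out...
assert roc_auc_score(y_true, y_scores, max_fpr=1) == roc_auc_score(y_true, y_scores)

# ...and reducing max_fpr should still yield a valid standardized score.
assert 0 < roc_auc_score(y_true, y_scores, max_fpr=0.5) <= 1
```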
Works fine with 0.18.2. File is ranking.py.txt

How can this be brought to a finish? @Alexander-N: would you prefer to edit some testing code into this yourself, or are you no longer interested?

@jnothman probably meant that you should add a new function to the test file rather than extending `test_roc_curve`.

Sorry for being unresponsive. I will look into it this weekend.

I addressed the comments; this is ready for further review. Just wanted to make that clear to attract reviewers :-).
Can you add this as a metric for common testing in metrics/tests/test_common.py? This way you can check its basic properties and invariances.
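For illustration, a minimal sketch of how such a registration could look; the dict name `THRESHOLDED_METRICS` and the metric name are assumptions about test_common.py, not verified details:

```python
from functools import partial
from sklearn.metrics import roc_auc_score

# Stand-in for one of the metric registries that test_common.py iterates
# over when checking basic properties and invariances of every metric.
THRESHOLDED_METRICS = {}

# Registering a fixed-max_fpr variant lets the common tests exercise the
# partial AUC code path like any other score function.
THRESHOLDED_METRICS["partial_roc_auc"] = partial(roc_auc_score, max_fpr=0.5)
```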
sklearn/metrics/ranking.py
@@ -257,6 +258,9 @@ def roc_auc_score(y_true, y_score, average="macro", sample_weight=None):
    sample_weight : array-like of shape = [n_samples], optional
        Sample weights.
    max_fpr : float, optional If not ``None``, the standardized partial AUC
        over the range [0, max_fpr] is returned.
Include a citation of the reference
Please insert a newline:

    max_fpr : float, optional
        If not ``None``, the standardized partial AUC over the range [0, max_fpr] is returned.

Also, maybe add here that it should be in [0, 1] and raise a ValueError in the code if it's not?
sklearn/metrics/ranking.py
    if max_fpr is None or max_fpr == 1:
        return auc(fpr, tpr)

    idx = np.where(fpr <= max_fpr)[0]
fpr is increasing, so it would be clearer if we just did something like `stop = np.searchsorted(fpr, max_fpr, 'right')` and below used `tpr[:stop]` instead of `tpr[idx]`, which also avoids having both names `idx_in` and `idx_out`.
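A small sketch contrasting the two approaches (arrays are illustrative); since fpr is sorted, both select the same prefix:

```python
import numpy as np

fpr = np.array([0.0, 0.1, 0.4, 1.0])
tpr = np.array([0.0, 0.5, 0.8, 1.0])
max_fpr = 0.4

# Boolean-mask version as in the PR:
idx = np.where(fpr <= max_fpr)[0]

# Reviewer's suggestion: binary search for the cut point in the sorted fpr.
stop = np.searchsorted(fpr, max_fpr, 'right')

assert np.array_equal(fpr[:stop], fpr[idx])
assert np.array_equal(tpr[:stop], tpr[idx])
```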
sklearn/metrics/ranking.py
        return auc(fpr, tpr)

    idx = np.where(fpr <= max_fpr)[0]
    # Linearly interpolate the ROC curve until max_fpr
perhaps this should be clearer to say we're just adding a single point at max_fpr by interpolation
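That single-point interpolation could be sketched like this (variable names are illustrative):

```python
import numpy as np

fpr = np.array([0.0, 0.1, 0.4, 1.0])
tpr = np.array([0.0, 0.5, 0.8, 1.0])
max_fpr = 0.3

# Keep the curve up to max_fpr, then append exactly one point at max_fpr
# whose tpr is linearly interpolated between the surrounding ROC points.
stop = np.searchsorted(fpr, max_fpr, 'right')
tpr_at_max = np.interp(max_fpr, fpr, tpr)      # 0.7 for this data
fpr_part = np.append(fpr[:stop], max_fpr)      # [0.0, 0.1, 0.3]
tpr_part = np.append(tpr[:stop], tpr_at_max)   # [0.0, 0.5, 0.7]
```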
A partial review. For the tests, I was told very recently by @jnothman that we should use `assert` instead of `assert_equal` to check equality.
sklearn/metrics/ranking.py
    return auc(fpr, tpr)
    if max_fpr is None or max_fpr == 1:
        return auc(fpr, tpr)
WDYT about adding `if max_fpr == 0: return 0`, as the AUC is equal to 0 when max_fpr = 0?
Since setting `max_fpr` to zero makes no sense, I think it might be best to raise a ValueError.
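A sketch of that validation (the helper name and the exact message wording are assumptions):

```python
def _check_max_fpr(max_fpr):
    # Reject values outside (0, 1]; max_fpr=0 would make the partial
    # AUC degenerate, so an explicit error beats returning 0 or NaN.
    if max_fpr is not None and not 0 < max_fpr <= 1:
        raise ValueError(
            "Expected max_fpr in range (0, 1], got: %r" % max_fpr)
```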
for max_fpr in np.linspace(0, 1, 5):
    assert_almost_equal(
        roc_auc_score(y_true, y_pred, max_fpr=max_fpr),
        _partial_roc_auc_score(y_true, y_pred, max_fpr))
Also test the behavior when max_fpr is not in [0, 1]. If we raise a ValueError, then test that the ValueError is raised with an informative message.
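Such a check might look like this, assuming pytest-style tests (the match pattern and test name are assumptions):

```python
import numpy as np
import pytest
from sklearn.metrics import roc_auc_score

def test_roc_auc_score_rejects_bad_max_fpr():
    y_true = np.array([0, 0, 1, 1])
    y_scores = np.array([0.1, 0.4, 0.35, 0.8])
    # Values at or below 0, and above 1, should all be rejected.
    for bad_value in (-0.5, 0, 1.5):
        with pytest.raises(ValueError, match="max_fpr"):
            roc_auc_score(y_true, y_scores, max_fpr=bad_value)
```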
y_true = np.array([0, 0, 1, 1])
assert_equal(roc_auc_score(y_true, y_true, max_fpr=1), 1)
assert_equal(roc_auc_score(y_true, y_true, max_fpr=0.001), 1)
assert_equal(np.isnan(roc_auc_score(y_true, y_true, max_fpr=0)), True)
Just use `assert np.isnan(roc_auc_score(y_true, y_true, max_fpr=0))`.
But I would be in favor of returning 0 when max_fpr=0 (see my comment above). WDYT?
sklearn/metrics/ranking.py
    fpr = np.append(fpr[idx], max_fpr)
    partial_auc = auc(fpr, tpr)

    # McClish correction: standardize result to lie between 0.5 and 1
It seems that the computation of `min_area` assumes that the estimator is always better than a random guess. In practice, for a very bad estimator (doing worse than random guessing), `partial_auc` could be less than `min_area` and the result will not necessarily be in [0.5, 1]. It will always be lower than 1 but can be less than 0.5, which would mean that we are doing worse than a random-guess estimator. I think we should keep this standardization, but we should explain somewhere that the result is not necessarily in [0.5, 1].
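For reference, a sketch of the McClish standardization under discussion (the input values and variable names are illustrative):

```python
max_fpr = 0.5
partial_auc = 0.25  # e.g. from auc() over the truncated curve

# Area under the chance diagonal (tpr = fpr) up to max_fpr, and under a
# perfect ROC curve (tpr = 1 everywhere) up to max_fpr.
min_area = 0.5 * max_fpr ** 2
max_area = max_fpr

# Rescale so a random classifier maps to 0.5 and a perfect one to 1; as
# noted above, a worse-than-random classifier can still land below 0.5.
standardized = 0.5 * (1 + (partial_auc - min_area) / (max_area - min_area))
```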
Great comments, thanks! Except for returning 0 when max_fpr is 0, everything was done as suggested.
I would have returned 0 when max_fpr is 0, even if it does not make sense; that's what the output should be in theory. But LGTM. Thanks @Alexander-N!
Please add an entry to the change log at doc/whats_new/v0.20.rst.
max_fpr=0 might have a theoretical value, but it is a nonsense setting for an evaluation metric. Error is good.
Fair enough.
Force-pushed from 1d16f37 to 089f58d
Ok, done.
Please add commits rather than amending, to make it easier to review the changes.
Ah ok, sorry. I amended to correct the original commit message. The only change is the addition of the entry in doc/whats_new/v0.20.rst.
I was going to merge this, but I think it deserves a mention in doc/modules/model_evaluation.rst when discussing the AUC score. Try to help users understand why this may be a better/worse metric for their task.
partial AUC: Integrate the ROC curve until the chosen maximal FPR and standardize the result to be 0.5 if non-discriminant and 1 if maximal. Closes scikit-learn#3273
I could add something along the lines of:

    In applications where a high false positive rate is not tolerable the
    parameter ``max_fpr`` can be used to summarize the ROC curve up to the
    given limit.

I feel that this might not be very helpful. Do you have something specific in mind?
Yes, that sounds good. Or perhaps precede with "one critique of AUC ROC is that it gives undue weight to operating points with impractically large fpr" or something with better wording!
Ok, done.
I had been suggesting you put "one critique" first as a way to improve cohesion. Now it reads a bit strangely. (And I didn't really like my own wording here)
Perhaps you should just drop my words and leave your own.
Ok, done :-)
This needs better tests. Any ideas how to write them?
Issue #3273
partial AUC: integrate the ROC curve until the chosen maximal FPR and standardize it to lie between 0.5 and 1.
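Putting the pieces from this thread together, a self-contained reference sketch (not the verbatim scikit-learn code; the function name and error message are illustrative):

```python
import numpy as np
from sklearn.metrics import auc, roc_curve

def partial_roc_auc(y_true, y_score, max_fpr=None):
    """Standardized partial AUC over [0, max_fpr] (McClish correction)."""
    fpr, tpr, _ = roc_curve(y_true, y_score)
    if max_fpr is None or max_fpr == 1:
        return auc(fpr, tpr)
    if not 0 < max_fpr <= 1:
        raise ValueError("Expected max_fpr in range (0, 1], got: %r" % max_fpr)

    # Truncate the curve and add one interpolated point at exactly max_fpr.
    stop = np.searchsorted(fpr, max_fpr, 'right')
    tpr_at_max = np.interp(max_fpr, fpr, tpr)
    partial = auc(np.append(fpr[:stop], max_fpr),
                  np.append(tpr[:stop], tpr_at_max))

    # McClish correction: 0.5 for a chance-level curve, 1 for a perfect one.
    min_area = 0.5 * max_fpr ** 2
    max_area = max_fpr
    return 0.5 * (1 + (partial - min_area) / (max_area - min_area))
```

On scikit-learn >= 0.20 this should agree with `roc_auc_score(y_true, y_score, max_fpr=...)` up to floating-point error.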