New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[FEAT] Zero division nan #23183
base: main
Are you sure you want to change the base?
[FEAT] Zero division nan #23183
Conversation
# Conflicts: # doc/whats_new/v0.21.rst # sklearn/metrics/classification.py # sklearn/metrics/tests/test_classification.py
- F-score only warns if both prec and rec are ill-defined - new private method to simplify _prf_divide
# Conflicts: # sklearn/metrics/_classification.py # sklearn/metrics/tests/test_classification.py
- add weights casting to np.array
# Conflicts: # sklearn/metrics/_classification.py # sklearn/metrics/tests/test_classification.py
def _nan_average(scores: np.ndarray, weights: Optional[np.ndarray]): | ||
""" | ||
Wrapper for np.average, with np.nan values being ignored from the average | ||
This is similar to np.nanmean, but allowing to pass weights as in np.average |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
submitted an issue to numpy for this: numpy/numpy#21375, but let me know if there's a better solution than this wrapper!
@thomasjpfan and everyone interested, this is now ready to review :) |
Thank you for the PR!
I think it would be good to see what @jnothman thinks of this np.nan
behavior.
sklearn/metrics/_classification.py
Outdated
if (weights == 0).all(): | ||
return np.average(scores) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Checking for weights == 0
adds more computation for an edge case. Can we pass weights directly into np.average
and not do this check?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
unfortunately
ZeroDivisionError
When all weights along axis are zero.
see https://numpy.org/doc/stable/reference/generated/numpy.average.html. But will change into try/except
sklearn/metrics/_classification.py
Outdated
Note that if zero_division is np.nan, such values will be excluded | ||
from the average. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we move this into zero_division
. Currently, when reading the zero_division
description, the reader needs to scroll up to see what zero_division=np.nan
does.
sklearn/metrics/_classification.py
Outdated
Note that if zero_division is np.nan, such values will be excluded | ||
from the average. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same here, I think we can place this in the zero_division
description.
Similar comment for the other docstrings.
- move comment to zero_division - try/except in nan_average
@jnothman can I get your |
Reference Issues/PRs
Fixes #22625
What does this implement/fix? Explain your changes.
This is an extension of #14900, where I added the parameter
zero_division
for precision, recall, and f1. Afterwards, it was added for jaccard as well.Here, we add the ability to set zero_division to np.nan, so that np.nan is returned when the metric is undefined. In addition to this:
Specifically:
Precision:
Recall:
If true_sum = 0, undefined
If average != None, ignore from average any class metric being np.nan
F-score:
if beta=inf, return recall, and beta=0, return precision
elif precision=0 or recall=0 (or both), return 0. <------------- this is a change
else return zero_division
If average != None, ignore from average any metric being np.nan
Jaccard:
if all labels and pred are 0, return zero_division
If average != None, ignore from average any metric being np.nan
Any other comments?