Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[FEAT] Zero division nan #23183

Open
wants to merge 26 commits into
base: main
Choose a base branch
from
Open

Conversation

marctorsoc
Copy link
Contributor

@marctorsoc marctorsoc commented Apr 21, 2022

Reference Issues/PRs

Fixes #22625

What does this implement/fix? Explain your changes.

This is an extension of #14900, where I added the parameter zero_division for precision, recall, and f1. Afterwards, it was added for jaccard as well.

Here, we add the ability to set zero_division to np.nan, so that np.nan is returned when the metric is undefined. In addition to this:

  • when there is an average the numbers that are np.nan (due to undefined and then zero_division) are excluded from the average.
  • when beta=0, return precision
  • when just one of (precision, recall) is defined and it's 0, return fscore=0. Even if the other metric is undefined.

Specifically:

Precision:

  • If pred_sum = 0, undefined
  • If average != None, ignore from average any metric being np.nan

Recall:

If true_sum = 0, undefined
If average != None, ignore from average any class metric being np.nan

F-score:

if beta=inf, return recall, and beta=0, return precision
elif precision=0 or recall=0 (or both), return 0. <------------- this is a change
else return zero_division
If average != None, ignore from average any metric being np.nan

Jaccard:

if all labels and pred are 0, return zero_division
If average != None, ignore from average any metric being np.nan

Any other comments?

marctorsoc and others added 16 commits Sep 7, 2019
# Conflicts:
#	doc/whats_new/v0.21.rst
#	sklearn/metrics/classification.py
#	sklearn/metrics/tests/test_classification.py
- F-score only warns if both prec and rec are ill-defined
- new private method to simplify _prf_divide
# Conflicts:
#	sklearn/metrics/_classification.py
#	sklearn/metrics/tests/test_classification.py
- add weights casting to np.array
# Conflicts:
#	sklearn/metrics/_classification.py
#	sklearn/metrics/tests/test_classification.py
def _nan_average(scores: np.ndarray, weights: Optional[np.ndarray]):
"""
Wrapper for np.average, with np.nan values being ignored from the average
This is similar to np.nanmean, but allowing to pass weights as in np.average
Copy link
Contributor Author

@marctorsoc marctorsoc Apr 21, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

submitted an issue to numpy for this: numpy/numpy#21375, but let me know if there's a better solution than this wrapper!

@marctorsoc marctorsoc changed the title [WIP] [FEAT] Zero division nan [FEAT] Zero division nan Apr 22, 2022
@marctorsoc
Copy link
Contributor Author

@marctorsoc marctorsoc commented Apr 22, 2022

@thomasjpfan and everyone interested, this is now ready to review :)

Copy link
Member

@thomasjpfan thomasjpfan left a comment

Thank you for the PR!

I think it would be good to see what @jnothman thinks of this np.nan behavior.

if (weights == 0).all():
return np.average(scores)
Copy link
Member

@thomasjpfan thomasjpfan Apr 22, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Checking for weights == 0 adds more computation for an edge case. Can we pass weights directly into np.average and not do this check?

Copy link
Contributor Author

@marctorsoc marctorsoc Apr 23, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

unfortunately

ZeroDivisionError
When all weights along axis are zero. 

see https://numpy.org/doc/stable/reference/generated/numpy.average.html. But will change into try/except

Note that if zero_division is np.nan, such values will be excluded
from the average.
Copy link
Member

@thomasjpfan thomasjpfan Apr 22, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we move this into zero_division. Currently, when reading the zero_division description, the reader needs to scroll up to see what zero_division=np.nan does.

Note that if zero_division is np.nan, such values will be excluded
from the average.
Copy link
Member

@thomasjpfan thomasjpfan Apr 22, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here, I think we can place this in the zero_division description.

Similar comment for the other docstrings.

- move comment to zero_division
- try/except in nan_average
@marctorsoc
Copy link
Contributor Author

@marctorsoc marctorsoc commented May 28, 2022

Thank you for the PR!

I think it would be good to see what @jnothman thinks of this np.nan behavior.

@jnothman can I get your 👀 here? and @thomasjpfan if you're happy with current state, maybe approve? :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants