[FEAT] Zero division nan #23183

marctorsoc · 2022-04-21T18:19:25Z

Reference Issues/PRs

What does this implement/fix? Explain your changes.

This is an extension of #14900, where I added the parameter zero_division for precision, recall, and f1. Afterwards, it was added for jaccard as well.

Here, we add the ability to set zero_division to np.nan, so that np.nan is returned when the metric is undefined. In addition to this:

When an average is requested, per-class scores that are np.nan (because the metric is undefined and zero_division=np.nan) are excluded from the average.
When beta=0, the F-score returns precision.
When only one of precision and recall is defined and it is 0, the F-score is 0, even if the other metric is undefined.

Specifically:

Precision:

If pred_sum = 0, precision is undefined.
If average is not None, exclude any class metric that is np.nan from the average.

Recall:

If true_sum = 0, recall is undefined.
If average is not None, exclude any class metric that is np.nan from the average.
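
The precision and recall rules above can be sketched with plain NumPy. This is only an illustration, not the code in the PR: `divide_or_fill` and the per-class counts are hypothetical names and values chosen for the example, and the macro average is taken with `np.nanmean`.

```python
import numpy as np


def divide_or_fill(numerator, denominator, zero_division=np.nan):
    """Element-wise ratio; entries with a zero denominator get `zero_division`.

    Hypothetical helper for illustration, not a scikit-learn function.
    """
    numerator = np.asarray(numerator, dtype=float)
    denominator = np.asarray(denominator, dtype=float)
    # Pre-fill with the fallback, then overwrite only where the ratio is defined.
    result = np.full(denominator.shape, float(zero_division))
    np.divide(numerator, denominator, out=result, where=denominator != 0)
    return result


# Made-up per-class counts for a 3-class problem.
tp_sum = np.array([2, 0, 0])    # true positives
pred_sum = np.array([4, 0, 1])  # predicted positives (class 1 is never predicted)
true_sum = np.array([2, 3, 0])  # actual positives (class 2 never occurs)

precision = divide_or_fill(tp_sum, pred_sum)  # class 1 is undefined
recall = divide_or_fill(tp_sum, true_sum)     # class 2 is undefined

# Macro averages that skip the undefined (np.nan) classes.
macro_precision = np.nanmean(precision)
macro_recall = np.nanmean(recall)
```

With `zero_division=0` instead, the undefined classes would count as 0 and pull the macro average down, which is the behavior the np.nan option is meant to avoid.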

F-score:

If beta=inf, return recall; if beta=0, return precision.
Else if precision=0 or recall=0 (or both), return 0. <------------- this is a change
Else, if the score is still undefined, return zero_division.
If average is not None, exclude any class metric that is np.nan from the average.
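
A minimal sketch of the F-score decision order listed above, for a single class. It is not the PR's implementation: the function name is hypothetical, `None` is used here as an assumed marker for an undefined precision or recall, and returning `zero_division` when beta selects an undefined metric is also an assumption.

```python
import math
from typing import Optional


def fbeta_with_rules(
    precision: Optional[float],
    recall: Optional[float],
    beta: float = 1.0,
    zero_division: float = math.nan,
) -> float:
    """F-beta for one class; None marks an undefined precision or recall."""
    # beta=inf reduces to recall, beta=0 reduces to precision.
    if math.isinf(beta):
        return zero_division if recall is None else recall
    if beta == 0:
        return zero_division if precision is None else precision
    # A defined zero on either side forces the score to 0,
    # even when the other metric is undefined.
    if precision == 0 or recall == 0:
        return 0.0
    # Nothing defined to fall back on: use the configured fill value.
    if precision is None or recall is None:
        return zero_division
    beta2 = beta ** 2
    return (1 + beta2) * precision * recall / (beta2 * precision + recall)
```

For example, `fbeta_with_rules(0.0, None)` gives 0.0 under these rules, while `fbeta_with_rules(None, None)` gives the `zero_division` value.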

Jaccard:

If all labels and predictions are 0, return zero_division.
If average is not None, exclude any class metric that is np.nan from the average.
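
The Jaccard rule can be illustrated the same way. This sketch assumes the per-label score is computed from true-positive, false-positive, and false-negative counts; the helper names and the counts are hypothetical.

```python
import math


def jaccard_from_counts(tp, fp, fn, zero_division=math.nan):
    """Per-label Jaccard score; falls back to `zero_division` when the
    label appears in neither the true labels nor the predictions."""
    denominator = tp + fp + fn
    if denominator == 0:
        return zero_division
    return tp / denominator


def mean_ignoring_nan(values):
    """Macro average over the defined scores only."""
    defined = [v for v in values if not math.isnan(v)]
    return sum(defined) / len(defined) if defined else math.nan


# Made-up counts for three labels; the second label never appears.
per_label = [
    jaccard_from_counts(2, 1, 1),
    jaccard_from_counts(0, 0, 0),
    jaccard_from_counts(3, 0, 0),
]
macro = mean_ignoring_nan(per_label)
```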

Any other comments?

marctorsoc · 2022-04-21T18:36:30Z

sklearn/metrics/_classification.py

+def _nan_average(scores: np.ndarray, weights: Optional[np.ndarray]):
+    """
+    Wrapper for np.average, with np.nan values being ignored from the average
+    This is similar to np.nanmean, but allowing to pass weights as in np.average


submitted an issue to numpy for this: numpy/numpy#21375, but let me know if there's a better solution than this wrapper!
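
For readers who want to see what such a wrapper might look like, here is a rough sketch of a weighted average that skips np.nan scores. It is an assumption about the general approach, not the body of `_nan_average` from the diff; the name `nan_weighted_average` is hypothetical, and the all-zero-weights case is not handled here.

```python
from typing import Optional

import numpy as np


def nan_weighted_average(scores, weights: Optional[np.ndarray] = None) -> float:
    """Average of `scores` with np.nan entries (and their weights) dropped."""
    scores = np.asarray(scores, dtype=float)
    keep = ~np.isnan(scores)
    if not keep.any():
        # Every score is undefined, so the average is undefined too.
        return float("nan")
    if weights is None:
        return float(np.mean(scores[keep]))
    weights = np.asarray(weights, dtype=float)
    # np.average raises ZeroDivisionError if the remaining weights sum to zero.
    return float(np.average(scores[keep], weights=weights[keep]))


scores = np.array([0.5, np.nan, 1.0])
unweighted = nan_weighted_average(scores)  # matches np.nanmean(scores)
weighted = nan_weighted_average(scores, np.array([3, 5, 1]))
```

Dropping the weight together with its score matters: the weight 5 attached to the undefined class must not influence the result.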

marctorsoc · 2022-04-22T07:26:04Z

@thomasjpfan and everyone interested, this is now ready to review :)

thomasjpfan

Thank you for the PR!

I think it would be good to see what @jnothman thinks of this np.nan behavior.

thomasjpfan · 2022-04-22T12:39:29Z

sklearn/metrics/_classification.py

+    if (weights == 0).all():
+        return np.average(scores)


Checking for weights == 0 adds more computation for an edge case. Can we pass weights directly into np.average and not do this check?

marctorsoc · 2022-04-23

Unfortunately not. According to the np.average documentation, it raises:

ZeroDivisionError: When all weights along axis are zero.

See https://numpy.org/doc/stable/reference/generated/numpy.average.html. I will change this into a try/except instead.
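
The two options discussed in this thread can be compared side by side. The sketch below is illustrative only: both function names are hypothetical, and falling back to the unweighted average when all weights are zero mirrors the snippet under review rather than any final implementation.

```python
import numpy as np


def average_with_precheck(scores, weights):
    """Variant from the reviewed snippet: test the weights up front."""
    scores = np.asarray(scores, dtype=float)
    weights = np.asarray(weights, dtype=float)
    if (weights == 0).all():
        return float(np.average(scores))
    return float(np.average(scores, weights=weights))


def average_with_try_except(scores, weights):
    """Variant suggested in review: only pay for the edge case when it occurs."""
    scores = np.asarray(scores, dtype=float)
    weights = np.asarray(weights, dtype=float)
    try:
        return float(np.average(scores, weights=weights))
    except ZeroDivisionError:
        # np.average could not normalize the weights; ignore them.
        return float(np.average(scores))


scores = [0.25, 0.75]
zero_weights = [0, 0]
normal_weights = [1, 3]
```

The try/except form avoids the extra comparison over `weights` on the common path, which is the cost the reviewer was concerned about.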

thomasjpfan · 2022-04-22T12:39:29Z

sklearn/metrics/_classification.py

+        Note that if zero_division is np.nan, such values will be excluded
+        from the average.


I think we should move this into the zero_division description. Currently, a reader of the zero_division description has to scroll up to find out what zero_division=np.nan does.

thomasjpfan · 2022-04-22T12:39:29Z

sklearn/metrics/_classification.py

+    Note that if zero_division is np.nan, such values will be excluded
+    from the average.


Same here, I think we can place this in the zero_division description.

Similar comment for the other docstrings.

marctorsoc · 2022-05-28T15:43:27Z

> Thank you for the PR!
>
> I think it would be good to see what @jnothman thinks of this np.nan behavior.

@jnothman can I get your 👀 here? and @thomasjpfan if you're happy with current state, maybe approve? :)

marctorsoc and others added 16 commits Sep 7, 2019

temp commit to checkout sklearn master

b197bbe

Merge branch 'sklearn_master' into marc_master

974ebe3

# Conflicts: # doc/whats_new/v0.21.rst # sklearn/metrics/classification.py # sklearn/metrics/tests/test_classification.py

- Changed whats_new to 0.22

538e599

- F-score only warns if both prec and rec are ill-defined - new private method to simplify _prf_divide

flake8 warnings

42b895d

Merge branch 'main' of https://github.com/scikit-learn/scikit-learn

a184e7c

# Conflicts: # sklearn/metrics/_classification.py # sklearn/metrics/tests/test_classification.py

first commit. Changes made and tests passing

7ccdd88

- fix linting

e41858a

- add weights casting to np.array

add PR number to whats_new

75309ec

run black

5d5112f

tmp commit

3a11007

remove all instances of None

8b25d83

tests fixed

c0f9e99

isort

12c0af2

Merge branch 'sklearn-main'

c885d3b

Merge branch 'main' into zero_division_nan

d74646d

# Conflicts: # sklearn/metrics/_classification.py # sklearn/metrics/tests/test_classification.py

merged main into this

e55e896

github-actions bot added the module:metrics label Apr 21, 2022

marctorsoc added 3 commits Apr 21, 2022

Merge branch 'sklearn-main' into zero_division_nan

8e1051e

remove change to v0.22.rst

39b954c

remove change to v0.22.rst

5016d55

marctorsoc reviewed Apr 21, 2022

marctorsoc added 5 commits Apr 21, 2022

fix linting and other errors in CI

113d565

fix linting again...

e41a6c9

apply black

f64a877

apply black2

ebcbc38

fix docstring examples

ac0e960

marctorsoc changed the title ~~[WIP] [FEAT] Zero division nan~~ [FEAT] Zero division nan Apr 22, 2022

thomasjpfan reviewed Apr 22, 2022

PR comments:

614d1d2

- move comment to zero_division - try/except in nan_average

Merge branch 'sklearn-main' into zero_division_nan

c474657
