FIX avoid overflow in adjusted_rand_score with large amount data #20312

divyanshudeoli · 2021-06-21T05:59:41Z

Follow up #20305
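For large inputs, adjusted_rand_score() gives wrong values (outside the range of 0 to 1).
Converted the variables from numpy.int64 to Python int to handle large values: Python ints have arbitrary precision, so the intermediate products no longer overflow.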
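A minimal sketch of why the conversion helps, assuming NumPy is available. The two counts below are illustrative values (not taken from the PR) of the size the pair-count terms reach with about 100,000 samples, since those terms grow roughly with the square of the number of samples:

```python
import numpy as np

# Illustrative pair-count sums of the size reached with ~1e5 samples
# (pair counts grow roughly like n_samples ** 2).
tp_plus_fn = np.int64(5_000_000_000)
fn_plus_tn = np.int64(5_000_000_000)

# With numpy.int64 the product exceeds 2**63 - 1 and wraps around.
with np.errstate(over="ignore"):  # silence the "overflow encountered" warning
    wrapped = tp_plus_fn * fn_plus_tn

# Converting to Python int first gives the exact, arbitrary-precision result.
exact = int(tp_plus_fn) * int(fn_plus_tn)

print(wrapped)  # a wrong, wrapped-around value
print(exact)    # 25000000000000000000
```

Without the errstate context, NumPy reports this as a RuntimeWarning, similar to the "overflow encountered in long_scalars" message in #21055.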

glemaitre · 2021-06-23T16:07:47Z

You might want to merge main into your branch. We will also need a non-regression test to ensure that the fix works as intended.

divyanshudeoli · 2021-06-27T07:05:31Z

@glemaitre
I am new here.
I tried merging a pull request, and 3 checks failed in Azure Pipelines with plenty of errors in:

Linux pylatest_pip_openblas_pandas Test Library
126 failed, 20993 passed, 168 skipped, 58 xfailed, 36 xpassed, 3317 warnings in 618.68s (0:10:18)

Windows py37_pip_openblas_32bit Test Library
1 failed, 19794 passed, 1126 skipped, 59 xfailed, 35 xpassed, 3702 warnings in 1012.61s (0:16:52)

Any help or pointers would be appreciated.

glemaitre · 2021-07-22T09:44:27Z

I synced with main and added a non-regression test to check the CI. Basically, the builds were no longer available because the CIs had last run a long time ago. Let's see whether the CIs are still failing and, if so, why.
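A non-regression test along these lines could look like the following sketch. It is not the test added in this PR: to stay self-contained it does not import scikit-learn, and pair_counts_int64 and ari_python_int are hypothetical helpers. The first keeps the ordered-pair counts as numpy.int64 (as before the fix); the second applies the Python-int conversion before combining them. The 100,000 samples are chosen so that the denominator terms exceed the int64 range.

```python
import warnings

import numpy as np


def pair_counts_int64(labels_true, labels_pred):
    """Hypothetical helper: ordered-pair counts (tn, fp, fn, tp) as int64.

    Assumes labels are small non-negative integers.
    """
    n = np.int64(labels_true.shape[0])
    n_pred = labels_pred.max() + 1
    # Joint counts n_ij of (true class i, predicted cluster j).
    joint = np.bincount(labels_true * n_pred + labels_pred).astype(np.int64)
    sum_sq = (joint ** 2).sum()
    a = np.bincount(labels_true).astype(np.int64)  # true class sizes
    b = np.bincount(labels_pred).astype(np.int64)  # predicted cluster sizes
    tp = sum_sq - n                  # same class, same cluster
    fn = (a ** 2).sum() - sum_sq     # same class, different clusters
    fp = (b ** 2).sum() - sum_sq     # different classes, same cluster
    tn = n * n - fp - fn - sum_sq    # different classes, different clusters
    return tn, fp, fn, tp


def ari_python_int(tn, fp, fn, tp):
    """Hypothetical helper: ARI from pair counts, using Python ints."""
    tn, fp, fn, tp = int(tn), int(fp), int(fn), int(tp)
    if fn == 0 and fp == 0:  # identical partitions
        return 1.0
    return 2.0 * (tp * tn - fn * fp) / (
        (tp + fn) * (fn + tn) + (tp + fp) * (fp + tn)
    )


def test_ari_large_input_no_overflow():
    rng = np.random.default_rng(0)
    y_true = rng.integers(0, 2, size=100_000)
    y_pred = rng.integers(0, 2, size=100_000)
    counts = pair_counts_int64(y_true, y_pred)

    # Sanity check: this input really is in the int64-overflow regime.
    tn, fp, fn, tp = (int(c) for c in counts)
    assert (tp + fn) * (fn + tn) > np.iinfo(np.int64).max

    with warnings.catch_warnings():
        warnings.simplefilter("error")  # any RuntimeWarning fails the test
        score = ari_python_int(*counts)
    assert -0.5 <= score <= 1.0
    assert abs(score) < 0.01  # independent random labelings score near 0


test_ari_large_input_no_overflow()
```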

glemaitre · 2021-07-22T09:45:40Z

I would like to have the thoughts of @jeremiedbb on this matter :)

jeremiedbb

I'm fine with that.

Another solution would be to split all terms, compute all the ratios fn / tn, fn / tp, ... first, and then combine those ratios to reconstruct the result. This should also prevent overflow, but I don't think it's necessary here, and it would make the code more complex.
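A sketch of one way to read the ratio-based alternative, assuming ARI is computed from the four pair counts as 2 * (tp * tn - fn * fp) / ((tp + fn) * (fn + tn) + (tp + fp) * (fp + tn)). Dividing numerator and denominator by tp * tn turns every term into a ratio of counts, so no large products are formed. The function name ari_from_ratios is hypothetical and this is not code from the PR; it also needs tp > 0 and tn > 0, which illustrates the extra complexity mentioned in the comment.

```python
def ari_from_ratios(tn, fp, fn, tp):
    """Hypothetical ratio-based ARI: divide both sides of the fraction by tp * tn."""
    if fn == 0 and fp == 0:  # identical partitions
        return 1.0
    if tp == 0 or tn == 0:
        # The tp * tn scaling is undefined here; a real implementation
        # would need another branch, which is part of the added complexity.
        raise ValueError("this sketch requires tp > 0 and tn > 0")
    fn_tp, fp_tp = fn / tp, fp / tp
    fn_tn, fp_tn = fn / tn, fp / tn
    # 2 * (tp * tn - fn * fp) / (tp * tn)
    numerator = 2.0 * (1.0 - fn_tp * fp_tn)
    # ((tp + fn) * (fn + tn) + (tp + fp) * (fp + tn)) / (tp * tn)
    denominator = (1.0 + fn_tp) * (1.0 + fn_tn) + (1.0 + fp_tp) * (1.0 + fp_tn)
    return numerator / denominator
```

For counts that fit comfortably in floats, this agrees with the exact integer formula up to rounding error, but it trades overflow for a subtraction (1 - fn_tp * fp_tn) that can lose relative precision when the score is close to 0.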

glemaitre · 2021-07-22T16:04:26Z

Thanks @divyanshudeoli

…kit-learn#20312) Co-authored-by: Guillaume Lemaitre <g.lemaitre58@gmail.com>

FIX: adjusted_rand_score() for large input

9703000

github-actions bot added the module:metrics label Jun 21, 2021

ogrisel mentioned this pull request Jun 22, 2021

FIX: adjusted_rand_score() #20310

Closed

glemaitre mentioned this pull request Jun 23, 2021

FIX : adjusted_rand_score() for large input #20311

Closed

glemaitre self-requested a review Jul 22, 2021

glemaitre added 2 commits Jul 22, 2021

TST add non-regression test

10c35cf

Merge remote-tracking branch 'origin/main' into pr/divyanshudeoli/20312

e06b60f

glemaitre changed the title from "FIX: adjusted_rand_score() for large input" to "FIX avoid overflow in adjusted_rand_score with large amount data" Jul 22, 2021

glemaitre removed their request for review Jul 22, 2021

jeremiedbb approved these changes Jul 22, 2021

glemaitre approved these changes Jul 22, 2021

glemaitre merged commit 07bc459 into scikit-learn:main Jul 22, 2021
28 checks passed

TomDLT added a commit to TomDLT/scikit-learn that referenced this issue Jul 29, 2021

FIX avoid overflow in adjusted_rand_score with large amount data (sci…

4c3349b

…kit-learn#20312) Co-authored-by: Guillaume Lemaitre <g.lemaitre58@gmail.com>

glemaitre mentioned this pull request Sep 15, 2021

adjusted_rand_index : overflow encountered in long_scalars #21055

Closed

samronsin added a commit to samronsin/scikit-learn that referenced this issue Nov 30, 2021

FIX avoid overflow in adjusted_rand_score with large amount data (sci…

3d54e65

…kit-learn#20312) Co-authored-by: Guillaume Lemaitre <g.lemaitre58@gmail.com>

scikit-learn / scikit-learn Public
