Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

FIX avoid overflow in adjusted_rand_score with large amount data #20312

Merged
merged 3 commits into from Jul 22, 2021

Conversation

@divyanshudeoli
Copy link
Contributor

@divyanshudeoli divyanshudeoli commented Jun 21, 2021

Follow up #20305
for large input adjusted_rand_score() give wrong values (outside the range of 0 to 1)
converted variables from numpy.int64 to int to handle large value

Reference Issues/PRs

What does this implement/fix? Explain your changes.

Any other comments?

for large input adjusted_rand_score() give wrong values (outside the range of 0 to 1)
converted variables from numpy.int64 to int to handle large value
@glemaitre
Copy link
Contributor

@glemaitre glemaitre commented Jun 23, 2021

You might want to merge main into your branch. We will also need a non-regression test for this matter to ensure that what we intend to correct is working as expected.

@divyanshudeoli
Copy link
Contributor Author

@divyanshudeoli divyanshudeoli commented Jun 27, 2021

@glemaitre
I am a new here
I tried merging a pull request and failed 3 checks in azure pipelines and got plenty of errors in

Linux pylatest_pip_openblas_pandas Test Library
126 failed, 20993 passed, 168 skipped, 58 xfailed, 36 xpassed, 3317 warnings in 618.68s (0:10:18)
and Windows py37_pip_openblas_32bit Test Library
= 1 failed, 19794 passed, 1126 skipped, 59 xfailed, 35 xpassed, 3702 warnings in 1012.61s (0:16:52)

If you can help or give any lead it will be helpful

@glemaitre glemaitre self-requested a review Jul 22, 2021
@glemaitre glemaitre changed the title FIX: adjusted_rand_score() for large input FIX avoid overflow in adjusted_rand_score with large amount data Jul 22, 2021
@glemaitre
Copy link
Contributor

@glemaitre glemaitre commented Jul 22, 2021

I sync with main and added a non-regression test to check the CI. Basically, the builds were not anymore available since it was a long time that the CIs ran. Let see if the CIs are still failing and what is the reason.

@glemaitre glemaitre removed their request for review Jul 22, 2021
@glemaitre
Copy link
Contributor

@glemaitre glemaitre commented Jul 22, 2021

I would like to have the though of @jeremiedbb on this matter :)

Copy link
Member

@jeremiedbb jeremiedbb left a comment

I'm fine with that.

Another solution would be to split all terms and compute all fn / tn, fn / tp, ... first and then combine those ratios to reconstruct the result. This should also prevent overflowing, but I don't think it's necessary here and would make the code more complex.

@glemaitre glemaitre merged commit 07bc459 into scikit-learn:main Jul 22, 2021
28 checks passed
@glemaitre
Copy link
Contributor

@glemaitre glemaitre commented Jul 22, 2021

TomDLT added a commit to TomDLT/scikit-learn that referenced this issue Jul 29, 2021
samronsin added a commit to samronsin/scikit-learn that referenced this issue Nov 30, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked issues

Successfully merging this pull request may close these issues.

None yet

3 participants