Skip to content

Increased speed by adding cv and n_jobs params plot_multi_metric_evaluation.py #21626

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 6 commits into from Feb 23, 2022
Merged

Conversation

ghost
Copy link

@ghost ghost commented Nov 10, 2021

#21598 @adrinjalali @sply88

Increased speed significantly by adding the parameters cv and n_jobs. I set cv=3 and n_jobs=-1. By setting n_jobs=-1 the available number of cpu cores is picked automatically to optimize calculations.

multimetric_before
multimetric_after

Copy link
Member

@adrinjalali adrinjalali left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@@ -49,6 +49,8 @@
param_grid={"min_samples_split": range(2, 403, 10)},
scoring=scoring,
refit="AUC",
cv=3,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we want to reduce CV to 3, especially since people tend to copy/paste code.

Did you check the effect of reducing the number of samples?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@adrinjalali Agreed, we want to keep it generic :)

Removed the cv param and decreased the sample number from 8000 to 6000: result is slightly worse but still 2X faster, so a good compromise I'd say
mm_before
mm_after

@adrinjalali adrinjalali changed the title Increased speed by adding cv and n_jobs params Increased speed by adding cv and n_jobs params plot_multi_metric_evaluation.py Nov 12, 2021
Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org>
@adrinjalali adrinjalali mentioned this pull request Nov 12, 2021
41 tasks
Copy link
Member

@adrinjalali adrinjalali left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, you may also check if reducing the number of samples makes much of a difference, my guess would be that n_jobs=2 is giving most of the speed up here.

@ghost
Copy link
Author

ghost commented Nov 23, 2021

@adrinjalali You are right, reducing the sample number below 6000 makes the example plot worse. Reducing from 8000 to 6000 however was good. Indeed, the n_jobs param is doing the heavy lifting here.

@adrinjalali
Copy link
Member

@ogrisel wanna have a second look at this one?

Copy link
Member

@jeremiedbb jeremiedbb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of reducing the number of samples, I suggest to reduce the number of min_samples_split values. That way the plot will be the same, with just a little less points.

Copy link
Member

@jeremiedbb jeremiedbb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

time is now 7sec instead of 30sec. LGTM. Thanks @sveneschlbeck !

@jeremiedbb jeremiedbb merged commit 5137abf into scikit-learn:main Feb 23, 2022
thomasjpfan pushed a commit to thomasjpfan/scikit-learn that referenced this pull request Mar 1, 2022
…uation.py (scikit-learn#21626)

Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org>
Co-authored-by: Jérémie du Boisberranger <34657725+jeremiedbb@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants