-
-
Notifications
You must be signed in to change notification settings - Fork 25.9k
Increased speed by adding cv and n_jobs params plot_multi_metric_evaluation.py #21626
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @sveneschlbeck
@@ -49,6 +49,8 @@ | |||
param_grid={"min_samples_split": range(2, 403, 10)}, | |||
scoring=scoring, | |||
refit="AUC", | |||
cv=3, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we want to reduce CV to 3, especially since people tend to copy/paste code.
Did you check the effect of reducing the number of samples?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@adrinjalali Agreed, we want to keep it generic :)
Removed the cv
param and decreased the sample number from 8000 to 6000: result is slightly worse but still 2X faster, so a good compromise I'd say
Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, you may also check if reducing the number of samples makes much of a difference, my guess would be that n_jobs=2
is giving most of the speed up here.
@adrinjalali You are right, reducing the sample number below 6000 makes the example plot worse. Reducing from 8000 to 6000 however was good. Indeed, the |
@ogrisel wanna have a second look at this one? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of reducing the number of samples, I suggest to reduce the number of min_samples_split values. That way the plot will be the same, with just a little less points.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
time is now 7sec instead of 30sec. LGTM. Thanks @sveneschlbeck !
…uation.py (scikit-learn#21626) Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org> Co-authored-by: Jérémie du Boisberranger <34657725+jeremiedbb@users.noreply.github.com>
#21598 @adrinjalali @sply88
Increased speed significantly by adding the parameters
cv
andn_jobs
. I setcv=3
andn_jobs=-1
. By settingn_jobs=-1
the available number of cpu cores is picked automatically to optimize calculations.