Adapted the number of splits in shuffle split to increase speed in plot_learning_curve.py #21628

Merged
merged 5 commits into from Nov 16, 2021

Conversation


@ghost ghost commented Nov 10, 2021

#21598 @adrinjalali @sply88

Cut the number of splits in half twice. Didn't really change the outcome but time dropped significantly.
(figures: learning-curve plots before the change, `lc_before`, and after, `lc_after`)
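The change described above can be sketched as follows. This is a minimal stand-in, not the example's actual code: the dataset, `n_splits=50`, and the `train_sizes` grid are illustrative values, since the PR only says the split count was halved twice.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import ShuffleSplit, learning_curve
from sklearn.naive_bayes import GaussianNB

# Small synthetic stand-in for the example's dataset.
X, y = make_classification(n_samples=300, random_state=0)

# Halving n_splits cuts the number of fits proportionally, while the
# averaged curves stay visually similar (50 here is illustrative).
cv = ShuffleSplit(n_splits=50, test_size=0.2, random_state=0)
train_sizes, train_scores, test_scores = learning_curve(
    GaussianNB(), X, y, cv=cv, train_sizes=np.linspace(0.1, 1.0, 5)
)
# train_scores and test_scores have shape (n_ticks, n_splits) = (5, 50)
```

Each extra split adds one fit per training-size tick, so the example's run time scales roughly linearly with `n_splits`.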

@TomDLT TomDLT changed the title Adapted the number of splits in shuffle split to increase speed Adapted the number of splits in shuffle split to increase speed in example Nov 10, 2021
@ogrisel ogrisel left a comment

LGTM, could you please just propagate the ylim param to the last plot to make results comparable across models?


ogrisel commented Nov 12, 2021

Please also pass random_state=0 to the call to learning_curve to get reproducible results and avoid plots such as:

(screenshot of a misleading learning-curve plot omitted)

which can be very confusing. I will open a dedicated issue.

Edit: actually, the shuffle parameter is set to False by default, so the results should be deterministic. But since run times are not deterministic, we might still get the problem at random... Not sure what we can do about it.

@adrinjalali

we could exclude the very fast runs from the example maybe?

@adrinjalali adrinjalali changed the title Adapted the number of splits in shuffle split to increase speed in example Adapted the number of splits in shuffle split to increase speed in plot_learning_curve.py Nov 12, 2021
Added ``random_state=0`` to call of ``learning_curve``

ghost commented Nov 12, 2021

@ogrisel Thanks for the input :) I added the random_state. Not sure though what you meant concerning the ylim param... I see that it is currently set for both calls of plot_learning_curve. Do you want me to adapt one of them, or remove them? Isn't it comparable now?


ogrisel commented Nov 12, 2021

> @ogrisel Thanks for the input :) I added the random_state.

As I explained in my edited comment, the random_state param of learning_curve has no impact when shuffle=False which is the default.
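The point can be demonstrated with a small check. The dataset here is a synthetic stand-in, not the example's data: with the default `shuffle=False`, `learning_curve` draws training subsets deterministically and `random_state` changes nothing, which is why the PR later added `shuffle=True` alongside `random_state=0`.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import learning_curve
from sklearn.naive_bayes import GaussianNB

X, y = make_classification(n_samples=200, random_state=0)

# With the default shuffle=False, random_state has no effect: the training
# subsets are taken deterministically, so both calls give identical scores.
_, s0, _ = learning_curve(GaussianNB(), X, y, cv=5, random_state=0)
_, s42, _ = learning_curve(GaussianNB(), X, y, cv=5, random_state=42)
print(np.allclose(s0, s42))  # True

# Only with shuffle=True does random_state become meaningful.
_, s_shuf, _ = learning_curve(
    GaussianNB(), X, y, cv=5, shuffle=True, random_state=0
)
```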

> Not sure though what you meant concerning the ylim param... I see that it is currently set for both calls of plot_learning_curve. Do you want me to adapt one of them or to remove them? Isn't it comparable now?

I want the y axis (score values) of both models to be on the same scale on the last row as they are on the first row.
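What this amounts to in matplotlib terms is applying one shared `ylim` to both models' score axes. A hypothetical sketch, not the example's actual code; the `(0.7, 1.01)` range is illustrative:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend for this sketch
import matplotlib.pyplot as plt

# One shared ylim makes the two models' score curves directly comparable,
# as requested in the review. The range below is illustrative only.
fig, (ax1, ax2) = plt.subplots(1, 2)
ylim = (0.7, 1.01)
for ax in (ax1, ax2):
    ax.set_ylim(*ylim)
```

Passing the same `ylim` tuple into both `plot_learning_curve` calls achieves the same thing without touching the axes directly.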


ogrisel commented Nov 12, 2021

> we could exclude the very fast runs from the example maybe?

Maybe. Or we can keep the dataset large enough for the fastest model (NB) and subsample it only for the slowest model (SVC).

Or we can just live with it.

@adrinjalali adrinjalali mentioned this pull request Nov 12, 2021
Added ``shuffle=True`` for ``random_state`` to make an impact
Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org>
@adrinjalali adrinjalali left a comment

The new plot doesn't look more odd than what we already have. We can merge this one and leave the fix for a separate PR.

@adrinjalali adrinjalali merged commit 31a75c0 into scikit-learn:main Nov 16, 2021

ghost commented Nov 16, 2021

@adrinjalali Alright

@ghost ghost deleted the speed_increased_example_learningcurve branch November 16, 2021 14:09
glemaitre pushed a commit to glemaitre/scikit-learn that referenced this pull request Nov 22, 2021
* Adapted the number of splits

* Update plot_learning_curve.py

* Update plot_learning_curve.py

Added ``random_state=0`` to call of ``learning_curve``

* Update plot_learning_curve.py

Added ``shuffle=True`` for ``random_state`` to make an impact

* Update examples/model_selection/plot_learning_curve.py

Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org>

Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org>
glemaitre pushed a commit to glemaitre/scikit-learn that referenced this pull request Nov 29, 2021
samronsin pushed a commit to samronsin/scikit-learn that referenced this pull request Nov 30, 2021
glemaitre pushed a commit to glemaitre/scikit-learn that referenced this pull request Dec 24, 2021
glemaitre pushed a commit that referenced this pull request Dec 25, 2021
3 participants