Adapted the number of splits in shuffle split to increase speed in plot_learning_curve.py #21628

Merged
merged 5 commits into from Nov 16, 2021

Conversation


@ghost ghost commented Nov 10, 2021

#21598 @adrinjalali @sply88

Cut the number of splits in half twice. Didn't really change the outcome but time dropped significantly.
(figures: learning-curve plots before the change, `lc_before`, and after, `lc_after`)
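The change described above can be sketched as follows. This is a minimal stand-in, not the example's actual code: the dataset, `n_splits=50`, and the `train_sizes` grid are illustrative values, since the PR only says the split count was halved twice.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import ShuffleSplit, learning_curve
from sklearn.naive_bayes import GaussianNB

# Small synthetic stand-in for the example's dataset.
X, y = make_classification(n_samples=300, random_state=0)

# Halving n_splits cuts the number of fits proportionally, while the
# averaged curves stay visually similar (50 here is illustrative).
cv = ShuffleSplit(n_splits=50, test_size=0.2, random_state=0)
train_sizes, train_scores, test_scores = learning_curve(
    GaussianNB(), X, y, cv=cv, train_sizes=np.linspace(0.1, 1.0, 5)
)
# train_scores and test_scores have shape (n_ticks, n_splits) = (5, 50)
```

Each extra split adds one fit per training-size tick, so the example's run time scales roughly linearly with `n_splits`.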

@TomDLT TomDLT changed the title Adapted the number of splits in shuffle split to increase speed Adapted the number of splits in shuffle split to increase speed in example Nov 10, 2021
@ogrisel ogrisel left a comment

LGTM, could you please just propagate the ylim param to the last plot to make results comparable across models?


ogrisel commented Nov 12, 2021

Please also pass random_state=0 to the call to learning_curve to get reproducible results and avoid plots such as:

(screenshot of a misleading learning-curve plot omitted)

which can be very confusing. I will open a dedicated issue.

Edit: actually, the shuffle parameter is set to False by default, so the results should be deterministic. But since run times are not deterministic, we might still get the problem at random... Not sure what we can do about it.

@adrinjalali

we could exclude the very fast runs from the example maybe?

@adrinjalali adrinjalali changed the title Adapted the number of splits in shuffle split to increase speed in example Adapted the number of splits in shuffle split to increase speed in plot_learning_curve.py Nov 12, 2021
Added ``random_state=0`` to call of ``learning_curve``

ghost commented Nov 12, 2021

@ogrisel Thanks for the input :) I added the random_state. Not sure though what you meant concerning the ylim param... I see that it is currently set for both calls of plot_learning_curve. Do you want me to adapt one of them, or remove them? Isn't it comparable now?


ogrisel commented Nov 12, 2021

> @ogrisel Thanks for the input :) I added the random_state.

As I explained in my edited comment, the random_state param of learning_curve has no impact when shuffle=False which is the default.
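The point can be demonstrated with a small check. The dataset here is a synthetic stand-in, not the example's data: with the default `shuffle=False`, `learning_curve` draws training subsets deterministically and `random_state` changes nothing, which is why the PR later added `shuffle=True` alongside `random_state=0`.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import learning_curve
from sklearn.naive_bayes import GaussianNB

X, y = make_classification(n_samples=200, random_state=0)

# With the default shuffle=False, random_state has no effect: the training
# subsets are taken deterministically, so both calls give identical scores.
_, s0, _ = learning_curve(GaussianNB(), X, y, cv=5, random_state=0)
_, s42, _ = learning_curve(GaussianNB(), X, y, cv=5, random_state=42)
print(np.allclose(s0, s42))  # True

# Only with shuffle=True does random_state become meaningful.
_, s_shuf, _ = learning_curve(
    GaussianNB(), X, y, cv=5, shuffle=True, random_state=0
)
```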

> Not sure though what you meant concerning the ylim param... I see that it is currently set for both calls of plot_learning_curve. Do you want me to adapt one of them or to remove them? Isn't it comparable now?

I want the y axis (score values) of both models to be on the same scale on the last row as they are on the first row.
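What this amounts to in matplotlib terms is applying one shared `ylim` to both models' score axes. A hypothetical sketch, not the example's actual code; the `(0.7, 1.01)` range is illustrative:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend for this sketch
import matplotlib.pyplot as plt

# One shared ylim makes the two models' score curves directly comparable,
# as requested in the review. The range below is illustrative only.
fig, (ax1, ax2) = plt.subplots(1, 2)
ylim = (0.7, 1.01)
for ax in (ax1, ax2):
    ax.set_ylim(*ylim)
```

Passing the same `ylim` tuple into both `plot_learning_curve` calls achieves the same thing without touching the axes directly.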


ogrisel commented Nov 12, 2021

> we could exclude the very fast runs from the example maybe?

Maybe. Or we can keep the dataset large enough for the fastest model (NB) and subsample it only for the slowest model (SVC).

Or we can just live with it.

@adrinjalali adrinjalali mentioned this pull request Nov 12, 2021
Added ``shuffle=True`` for ``random_state`` to make an impact
Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org>
@adrinjalali adrinjalali left a comment

The new plot doesn't look more odd than what we already have. We can merge this one and leave the fix for a separate PR.

@adrinjalali adrinjalali merged commit 31a75c0 into scikit-learn:main Nov 16, 2021

ghost commented Nov 16, 2021

@adrinjalali Alright

@ghost ghost deleted the speed_increased_example_learningcurve branch November 16, 2021 14:09
glemaitre pushed a commit to glemaitre/scikit-learn that referenced this pull request Nov 22, 2021
* Adapted the number of splits

* Update plot_learning_curve.py

* Update plot_learning_curve.py

Added ``random_state=0`` to call of ``learning_curve``

* Update plot_learning_curve.py

Added ``shuffle=True`` for ``random_state`` to make an impact

* Update examples/model_selection/plot_learning_curve.py

Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org>

Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org>
glemaitre pushed a commit to glemaitre/scikit-learn that referenced this pull request Nov 29, 2021
samronsin pushed a commit to samronsin/scikit-learn that referenced this pull request Nov 30, 2021
glemaitre pushed a commit to glemaitre/scikit-learn that referenced this pull request Dec 24, 2021
glemaitre pushed a commit that referenced this pull request Dec 25, 2021
3 participants