
TST Speed-up test suite when using pytest-xdist #25918

Merged: 20 commits merged into scikit-learn:main on Mar 22, 2023

Conversation

@jeremiedbb (Member, author) commented Mar 20, 2023

[EDITED]

  • pytest_runtest_setup is called once per test, but _openmp_effective_n_threads and threadpool_limits are not cheap, which adds significant overhead to very quick tests. This PR uses a fixture with session scope instead (see the sketch after this list). I tried pytest_sessionstart, but it was not run (I don't know exactly why, maybe because we run from a test folder).
  • The joblib min version is 1.1.1, which exposes only_physical_cores in cpu_count. Restricting the number of OpenMP threads to the number of physical cores will likely speed things up. At least it does on my laptop, which has an Intel CPU with hyper-threading. Even where it brings no speed-up, I'm pretty sure it won't bring a slow-down.
  • Previously only the number of OpenMP threads was limited, but limiting the number of BLAS threads to 1 as well is beneficial since we set the number of xdist workers equal to the number of cores.
  • Since we have 3 cores on the macOS jobs, let's use 3 xdist workers.
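
A minimal sketch of the session-scoped fixture idea from the first bullet (illustrative only; the fixture name and the limit of 1 thread are assumptions, not the exact code merged in this PR):

```python
# conftest.py -- illustrative sketch, not the exact scikit-learn code.
import pytest
from threadpoolctl import threadpool_limits


@pytest.fixture(scope="session", autouse=True)
def limit_threadpools():
    # Runs once per pytest session instead of once per test, so the cost of
    # setting up threadpool_limits is paid a single time rather than before
    # every quick test.
    with threadpool_limits(limits=1):
        yield
```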

@ogrisel (Member) left a comment:
Thanks for working on this, looking forward to seeing the impacts of those changes.

I agree that using only physical cores by default is probably a good idea.

```python
max_n_threads = min(
    omp_get_max_threads(),
    cpu_count(only_physical_cores=only_physical_cores),
)
```
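
For reference, joblib's cpu_count distinguishes logical from physical cores; for example, on a hypothetical 4-core CPU with hyper-threading:

```python
from joblib import cpu_count

cpu_count()                          # 8 logical cores on this hypothetical machine
cpu_count(only_physical_cores=True)  # 4 physical cores
```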
Member:
This change impacts scikit-learn in general, not just the test suite. I think we should document it in the changelog.

@jeremiedbb (Member, author):

I set the default to False so it has no impact on scikit-learn. We can discuss whether or not we want to change the default, but I think that requires being more careful and running some experiments (IIRC some discussions in histgbt weren't clear about that).

Member:

Indeed I had missed that. +1 for keeping the PR focused on the tests for now.

Member:

We just found a case of this problem here: #25714 (comment).

@jeremiedbb (Member, author):

> looking forward to seeing the impacts of those changes.

Well, I think it will be hard to see in a single CI run because the fluctuations are huge (several minutes from one run to another).
That said, 28 min on the Windows job is something I've rarely seen.

For reference, on my laptop it reduced the duration from 14 min to 11 min 30 s.

@jeremiedbb (Member, author) commented Mar 20, 2023

And somehow the macOS jobs and some of the Linux ones are crazy high...

@jeremiedbb (Member, author) commented Mar 21, 2023

Starting to look pretty good (I need to do some clean-up now).

@jeremiedbb (Member, author):

It's now a fixture with session scope, but it's not used if we don't run the test suite for the whole package. It seems to be the same issue we had with global_random_seed. We probably need to make a plugin if we want it to work in all situations.

@ogrisel (Member) commented Mar 21, 2023

> It's now a fixture with session scope, but it's not used if we don't run the test suite for the whole package.

I tried it locally and it seems to work even when limiting the tests to a given submodule / test pattern. I tried:

pytest -n 4 -v sklearn/ensemble -k test_map_to_bins

and I could check that it actually limits the number of threads to 2 (on a machine with 8 cores) by raising an exception containing the output of threadpoolctl.threadpool_info() inside a test.
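
For illustration, a quick check along those lines might look like this (hypothetical test, not the exact check used here):

```python
from threadpoolctl import threadpool_info


def test_threadpools_are_limited():
    # Each entry describes one loaded threadpool (OpenMP, OpenBLAS, MKL, ...)
    # and reports its current number of threads.
    for module in threadpool_info():
        assert module["num_threads"] <= 2, module
```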

@ogrisel (Member) left a comment:

LGTM. It's cleaner than before, and it seems to yield a measurable average speed-up when running the full test suite, although there is a lot of variability between CI runs.

@ogrisel added the Build / CI and Quick Review labels on Mar 21, 2023
@ogrisel (Member) commented Mar 22, 2023

Here are the Azure timings of this PR (left) vs the last run of #25930 (right):

[screenshot: Azure CI timings, this PR vs #25930]

This is a uniform improvement, although the slowest macOS run was not improved (but not degraded either).

@ogrisel (Member) commented Mar 22, 2023

Smaller but also uniform improvements on the Cirrus CI runs (this PR on the left vs #25930 on the right).

[screenshot: Cirrus CI timings, this PR vs #25930]

@ogrisel (Member) commented Mar 22, 2023

For the wheel builder workflow on GitHub Actions, I do not understand those numbers: the individual "Build wheel for..." jobs do not seem to be uniformly improved; many have the same duration and some are degraded.

@ogrisel (Member) commented Mar 22, 2023

I redid the experiment described in #25918 (comment) locally with the new method (using pytest_configure instead of the session-scoped fixture) and it still works as expected.
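
For illustration, a pytest_configure hook of roughly this shape could apply the limit (a sketch, not the exact conftest.py code merged here; it assumes pytest-xdist's PYTEST_XDIST_WORKER_COUNT environment variable is available in the worker processes):

```python
# conftest.py -- illustrative sketch, not the exact scikit-learn code.
import os

from joblib import cpu_count
from threadpoolctl import threadpool_limits


def pytest_configure(config):
    # Runs once at pytest start-up, even when only a subset of the test
    # suite is selected, unlike a package-level fixture.
    n_physical_cores = cpu_count(only_physical_cores=True)
    # pytest-xdist exposes the worker count in its worker processes;
    # fall back to 1 when running without xdist.
    n_workers = int(os.environ.get("PYTEST_XDIST_WORKER_COUNT", 1))
    threadpool_limits(max(n_physical_cores // n_workers, 1))
```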

@lesteve I think your concern was taken into account and based on the above timing results I think we can merge.

@jeremiedbb (Member, author) commented Mar 22, 2023

Let me just check if the conftest is discovered in the wheel builder

@lesteve (Member) commented Mar 22, 2023

> Let me just check if the conftest is discovered in the wheel builder

All builds are red in #25930, so that shows that pytest_configure is run for the wheels as well, and we can merge, right?

@jeremiedbb (Member, author):

Yep.

Actually, I just realized that the wheel builder jobs don't use pytest-xdist, so it's expected that the durations are not impacted: the configuration done in pytest_configure does the same thing as the previous configuration with the environment variables.

@lesteve (Member) commented Mar 22, 2023

OK let's merge this one, thanks a lot!

@lesteve merged commit d92c769 into scikit-learn:main on Mar 22, 2023 (43 of 44 checks passed).
@thomasjpfan (Member):

With this PR, most CI jobs will be running with single-threaded OpenMP/BLAS. I think only the wheel-building jobs still run with multiple threads because they do not install pytest-xdist.

Can we remove pytest-xdist from one of the CI jobs to have a little more coverage for running with multiple threads?

@jeremiedbb (Member, author):

> With this PR, most CI jobs will be running with single-threaded OpenMP/BLAS

It was already the case for OpenMP, since _openmp_effective_n_threads returned 2 and the number of xdist workers is >= 2, so the ratio was already 1 thread per worker.
This PR treats BLAS similarly to avoid oversubscription.

I'm not opposed to removing pytest-xdist from one job, but we should do it in a fast job, because the reason for this PR was that the CI duration was becoming more and more frustrating :)
I guess Linux pylatest_pip_openblas_pandas is a good candidate.

@thomasjpfan (Member):

> I guess Linux pylatest_pip_openblas_pandas is a good candidate

I'm okay with removing pytest-xdist from this one job. (I'm also curious to see how the runtime changes).

@ogrisel (Member) commented Mar 22, 2023

+1 but let's be very explicit (in a comment) as to why we remove it then.

@jeremiedbb (Member, author):

I opened #25943
