TST Speed-up test suite when using pytest-xdist #25918
Conversation
Thanks for working on this, looking forward to seeing the impacts of those changes.
I agree that using only physical cores by default is probably a good idea.
```python
max_n_threads = min(
    omp_get_max_threads(),
    cpu_count(only_physical_cores=only_physical_cores)
)
```
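A minimal self-contained sketch of the capping logic in the snippet above. It uses `os.cpu_count()` as a stand-in for loky's `cpu_count(only_physical_cores=...)` and takes the OpenMP maximum as a plain argument in place of `omp_get_max_threads()`; the halving heuristic for physical cores is an assumption for illustration, not how the real helper works:

```python
import os

def max_n_threads(omp_max_threads, only_physical_cores=False):
    """Cap the OpenMP thread budget at the number of available cores."""
    n_cores = os.cpu_count() or 1  # logical cores; loky can report physical ones
    if only_physical_cores:
        # Crude stand-in for a physical-core count on a typical
        # hyper-threaded CPU; the real helper inspects the CPU topology.
        n_cores = max(1, n_cores // 2)
    return min(omp_max_threads, n_cores)
```

With `only_physical_cores=False` (the default chosen in this PR), the behavior is unchanged from simply capping at the logical core count.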
This change impacts scikit-learn in general, not just the test suite. I think we should document it in the changelog.
I set the default to False, so it has no impact on scikit-learn. We can discuss whether or not we want to change the default, but I think that requires more care and some experiments (IIRC some discussions in histgbt weren't conclusive about that).
Indeed I had missed that. +1 for keeping the PR focused on the tests for now.
We just found a case of this problem here: #25714 (comment).
Well, I think it'll be hard to see in a single CI run because the fluctuations are huge (several minutes from one run to another). For reference, on my laptop it reduced the duration from 14min to 11min30s.
And somehow the macOS jobs and some of the Linux ones are crazy high...
Starting to look pretty good (need to do some clean-up now).
It's now a fixture with a session scope, but it's not used if we don't run the test suite for the whole package. It seems to be the same issue that we had for
I tried it locally and it seems to work even when limiting the tests to a given submodule / test pattern. I tried:
and I could check that it is actually limiting the number of threads to 2 (on a machine with 8 cores) by raising an exception with the content of
LGTM. It's both cleaner than before and seems to yield a measurable average speed-up when running the full test suite, although there is a lot of variability between CI runs.
Here are the Azure timings of this PR (left) vs the last run of #25930 (right):

This is a uniform improvement, although the slowest macOS run was not improved (but not degraded either).
Smaller but also uniform improvements on the Cirrus CI runs (this PR left vs #25930 right).
For the wheel builder workflow on GitHub Actions:

although I do not understand those numbers, because the individual "Build wheel for..." jobs do not seem to be uniformly improved: many have the same duration and some are degraded.
I redid the experiment described in #25918 (comment) locally with the new method (using

@lesteve I think your concern was taken into account, and based on the above timing results I think we can merge.
Let me just check if the conftest is discovered in the wheel builder.
All builds are red in #25930, so that shows that
Yep. Actually I just realized that the jobs of the wheel builder don't use pytest-xdist, so it's expected that the durations are not impacted, since the configuration in `pytest_configure` does the same thing as the previous configuration with the env vars.
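For illustration, the env-var configuration being referred to can be sketched like this. The variable names are the standard OpenMP/OpenBLAS/MKL thread controls; the function name is hypothetical, not scikit-learn's actual conftest code:

```python
import os

def limit_native_threads(n_threads):
    # These variables are only honored if set before the native runtimes
    # (OpenMP, OpenBLAS, MKL) are loaded, which is why a
    # pytest_configure-style hook has to run this as early as possible.
    for var in ("OMP_NUM_THREADS", "OPENBLAS_NUM_THREADS", "MKL_NUM_THREADS"):
        os.environ[var] = str(n_threads)

limit_native_threads(1)
```

Once a process has already initialized its BLAS/OpenMP runtime, these variables have no effect, which is where threadpoolctl's runtime limits come in instead.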
OK, let's merge this one, thanks a lot!
With this PR, most CI jobs will be running with single-threaded OpenMP/BLAS. I think only the wheel building jobs still run with multiple threads, because they do not install

Can we remove
It was already the case for OpenMP, since `_openmp_effective_n_threads` returned 2 and the number of xdist workers is >= 2 (the ratio being 1). I'm not opposed to removing pytest-xdist from one job, but we must do it on a fast job, because the reason for this PR was that the CI duration was becoming more and more frustrating :)
I'm okay with removing
+1 but let's be very explicit (in a comment) as to why we remove it then. |
I opened #25943 |
[EDITED]

`pytest_runtest_setup` is called once per test, but `_openmp_effective_n_threads` and `threadpool_limits` are not cheap. This brings a significant overhead for very quick tests. This PR uses a fixture with a session scope instead. I tried to use `pytest_sessionstart` but it was not run (I don't know exactly why, maybe because we run from a test folder).

`only_physical_cores` in `cpu_count`: restricting the number of OpenMP threads to the number of physical cores will probably speed things up. At least it does on my laptop, where I have an Intel CPU with hyper-threading. Anyway, even when it doesn't bring a speed-up, I'm pretty sure it won't bring a slow-down.
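A hedged sketch of what such a session-scoped fixture could look like in a `conftest.py`. This is an illustration, not the actual scikit-learn conftest: the fixture name is made up, it assumes threadpoolctl is available, and the limit of 2 mirrors the value checked earlier in the discussion:

```python
# conftest.py (sketch, assuming threadpoolctl is installed)
import pytest

@pytest.fixture(scope="session", autouse=True)
def _limit_threadpools():
    # Runs once per test session instead of once per test, avoiding the
    # per-test overhead of calling threadpool_limits repeatedly.
    try:
        from threadpoolctl import threadpool_limits
    except ImportError:
        yield  # nothing to cap if threadpoolctl is not installed
        return
    with threadpool_limits(limits=2):
        yield  # the limit stays in effect for the whole session
```

Because the fixture lives in a `conftest.py`, it only applies to tests collected under that directory, which matches the observation above that it is not used when running a submodule from elsewhere.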