BLD Reduce binary size of wheels #25063
Conversation
The Linux binaries do not use clang but gcc, though. The size problem is only for the Linux wheels, right?
It would be great to trigger an ASV run of this PR vs `main`. Alternatively we can use another machine, but it needs to be dedicated to the benchmark for a few hours to make the results reliable enough.
Actually I found a machine, let me try to do this.
Looking at the log from the 1.1.X build, they … To be safe, I reverted this PR back to … Edit: the revert is in …
I ran a bunch of benchmarks, both on Linux with gcc and on macOS with clang, on various Cython / C++ intensive estimators, and I do not observe a significant performance degradation when building scikit-learn from this PR. It could even be the case that there is some slight improvement (although there is some variability in the results from run to run, especially for the hist gradient boosting models). EDIT: this comment is about the state of the PR that used the …
But even in 1.1.3 the Linux binaries are already much bigger than for macOS and Windows. I checked that the number of files is the same between a macOS wheel and a Linux wheel. The only difference is the shipped dynamic libraries.
Here are the sizes in bytes of the biggest files: …
The vendored OpenMP runtimes are comparatively small.
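(Editor's note: since a wheel is a plain zip archive, this kind of per-file size audit is easy to reproduce with the standard library alone. A minimal sketch, where the wheel filename is only an example:)

```python
# Minimal sketch: list the largest members of a wheel. A wheel is a plain
# zip archive, so zipfile from the standard library is enough.
# The wheel filename below is illustrative, not a real artifact.
import zipfile

def biggest_files(wheel_path, n=10):
    with zipfile.ZipFile(wheel_path) as wheel:
        members = sorted(wheel.infolist(), key=lambda m: m.file_size, reverse=True)
        for m in members[:n]:
            # file_size is the uncompressed size; compress_size would give
            # the on-disk contribution to the wheel instead.
            print(f"{m.file_size:>12,}  {m.filename}")

biggest_files("scikit_learn-1.1.3-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl")
```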
In that case, I do like having …
I now realize that the benchmarking I did for macOS was with the …
A note on …
Hmm, this must be something else: the generated wheels for Linux in the last CI build are still much bigger than for macOS and Windows... 30 MB or more for Linux vs less than 10 MB for the others. Maybe this is related to the configuration of debug symbols?
Actually, when I look at the build output of … For instance, on my machine on Linux with gcc, I get lines such as: …
Stripping the debug symbols does not make a big difference: …
I checked, and on Windows I see the `-O3` flag. IIRC, flags that appear later in the command override flags that appear earlier, so `-O3` should be used instead of `-O2` (which appears earlier in the command above because it is the default set by setuptools). However, `-O3` is not the appropriate flag for MSVC; it should be …
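(Editor's note: to make the ordering argument concrete, here is a hedged setuptools sketch, not scikit-learn's actual build code; the module and file names are hypothetical. Flags passed via `extra_compile_args` end up after the interpreter's default `CFLAGS` in the compile command, so for gcc/clang the last `-O` flag wins, while MSVC needs its own `/O`-style spelling:)

```python
# Sketch of why the -O2 override works with setuptools: extra_compile_args
# is appended *after* the default flags inherited from sysconfig, and for
# gcc/clang the last -O flag on the command line takes precedence.
# Module and source names are illustrative; MSVC would need /O-style flags.
from setuptools import Extension, setup

setup(
    name="example",
    version="0.1",
    ext_modules=[
        Extension(
            "example._mod",
            sources=["example/_mod.c"],
            extra_compile_args=["-O2"],  # lands after the default -O2/-O3
        )
    ],
)
```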
For reference, here are the artifacts and the total size of the compressed artifacts: …
In all cases, the Linux wheels are much bigger than the macOS and Windows ones. The large Linux wheel size has existed for many versions, including PyPI 0.24.2, PyPI 1.0.2, and PyPI 1.1.3.

@jeremiedbb From looking at your output and the CI for Windows, maybe the … On the wheel builder CI in this PR for Linux, the default is … In summary, if we want …
Indeed.
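(Editor's note: one way to check which optimization flags a given interpreter hands to setuptools by default is to read the sysconfig build variables it inherits; a small sketch, with output varying by platform:)

```python
# Print the build variables that setuptools inherits by default.
# Values vary per Python build; on many Linux CPython builds OPT
# contains -O2 or -O3, while on Windows these may be None.
import sysconfig

for var in ("CC", "CFLAGS", "OPT"):
    print(f"{var} = {sysconfig.get_config_var(var)!r}")
```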
I think this is already a net improvement, even if it does not explain the big increase in size of the Linux binaries compared to the other systems and older scikit-learn releases.
We can probably explore later how to improve this further (e.g. maybe try to use clang under Linux? Unfortunately, just setting clang in the `CC` and `CXX` variables does not seem to work).
Also, after merging this, we should check the impact on the next point on https://scikit-learn.org/scikit-learn-benchmarks/#ensemble.HistGradientBoostingClassifierBenchmark.time_fit and friends, to see if the switch to …
On a 16-core machine on Linux, I ran `asv continuous -b HistGradientBoostingClassifier upstream/main HEAD` and got the following results:

This PR + `-O2`: …

This PR + `-O3`: …
In all cases, the …
This is consistent with what I observed. I also tried SVC and KMeans and likewise saw no significant difference.
@ogrisel I think you also need …
LGTM. Thanks @thomasjpfan
Indeed, I forgot about that. We can try that in a follow-up PR experiment. I am not sure about the performance impact and potential compatibility problems w.r.t. multithreading and OpenMP when scikit-learn is imported in a Python process that also imports other OpenMP / multithreaded libraries...
This PR reduces the wheel size back to the sizes seen in 1.1.3. The changes include:

- Removes the `.c`, `.cpp`, and `.pyx` files from the wheel (a sanity-check sketch follows below).
- Uses `-O2` optimization to reduce the size of the compiled files. It is documented that `-O3` is not great: …

CC @jeremiedbb