New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
DOC Rework plot_hashing_vs_dict_vectorizer.py example #23266
Conversation
Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org>
Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org>
Thanks very much @ArturoAmorQ, this notebook is much nicer than the original benchmark script.
Here is a final batch of suggestions for improvement:
Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org>
…nto compare_vectorizers
…earn into compare_vectorizers
Thank you, @ArturoAmorQ.
I think one should use other terms to make this example more accurate.
This is for instance the case of:
- "frequency" which can be replace by "occurence (counts)" (to respect the the definition)
- "speed" which can be replaced by "data processing rate" (to respect the unit (bytes/sec))
Here are some comments and formatting fixes.
Edit: not related to this PR, but #23004 might come with new changes for this example then.
Co-authored-by: Julien Jerphanion <git@jjerphan.xyz>
Thank you, @ArturoAmorQ.
Edit: I let @ogrisel merge if everything LGTH.
LGTM again, just a final batch of nitpicks + a formatting fix.
Merged, thank you very much for the nice contribution @ArturoAmorQ! |
Reference Issues/PRs
Related to #22928
What does this implement/fix? Explain your changes.
In #22928 we remove the use of
HashingVectorizer
from the plot_document_classification_20newsgroups.py example for the sake of simplicity.A comparison of the performance of hashers and vectorizers can be moved to this existing example.
Any other comments?
Side effect: Implements notebook style as intended in #22406