DOC Rework plot_hashing_vs_dict_vectorizer.py example #23266

Merged
29 commits merged into scikit-learn:main on May 30, 2022

Contributor

@ArturoAmorQ ArturoAmorQ commented May 3, 2022

Reference Issues/PRs

Related to #22928

What does this implement/fix? Explain your changes.

In #22928 we remove the use of HashingVectorizer from the plot_document_classification_20newsgroups.py example for the sake of simplicity.
A comparison of the performance of hashers and vectorizers can be moved to this existing example.
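
For context, the kind of comparison being moved can be sketched as follows. This is a minimal illustration rather than the code from the example itself: the toy documents and the `token_counts` helper are hypothetical, and `n_features=2**10` is an arbitrary choice.

```python
from collections import Counter

from sklearn.feature_extraction import DictVectorizer, FeatureHasher

# Hypothetical toy corpus, used here only to keep the sketch self-contained.
docs = [
    "the quick brown fox",
    "the lazy dog and the quick cat",
]


def token_counts(doc):
    """Map each whitespace-separated token to its occurrence count."""
    return Counter(doc.split())


# DictVectorizer learns an explicit vocabulary: one column per distinct token.
dict_vectorizer = DictVectorizer()
X_dict = dict_vectorizer.fit_transform(token_counts(d) for d in docs)

# FeatureHasher is stateless: the column index is derived from a hash of the
# token, so the output width is fixed in advance and collisions are possible.
hasher = FeatureHasher(n_features=2**10)
X_hashed = hasher.transform(token_counts(d) for d in docs)

print(X_dict.shape)    # (n_docs, n_distinct_tokens)
print(X_hashed.shape)  # (n_docs, n_features)
```

The trade-off this is meant to show is that the dictionary-based vectorizer keeps a token-to-column mapping in memory, while the hasher does not need to be fitted at all.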

Any other comments?

Side effect: Implements notebook style as intended in #22406

@lesteve lesteve added the Quick Review label May 11, 2022
@lesteve lesteve removed the Quick Review label May 13, 2022
Member

@ogrisel ogrisel left a comment

Thanks for the PR, here is a batch of feedback.

examples/text/plot_hashing_vs_dict_vectorizer.py (9 inline comments, outdated and resolved)
ArturoAmorQ and others added 3 commits May 19, 2022
Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org>
Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org>
@ArturoAmorQ ArturoAmorQ changed the title [WIP] DOC Rework plot_hashing_vs_dict_vectorizer.py example DOC Rework plot_hashing_vs_dict_vectorizer.py example May 20, 2022
Member

@ogrisel ogrisel left a comment

Thanks very much @ArturoAmorQ, this notebook is much nicer than the original benchmark script.

Here is a final batch of suggestions for improvement:

examples/text/plot_hashing_vs_dict_vectorizer.py (5 inline comments, outdated and resolved)
ArturoAmorQ and others added 3 commits May 23, 2022
Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org>
Member

@jjerphan jjerphan left a comment

Thank you, @ArturoAmorQ.

I think one should use other terms to make this example more accurate.

This is for instance the case of:

  • "frequency", which can be replaced by "occurrence (counts)" (to respect the definition)
  • "speed" which can be replaced by "data processing rate" (to respect the unit (bytes/sec))
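
To make the two distinctions concrete, here is a small sketch using only the standard library. The toy documents are hypothetical and the timing loop merely stands in for whatever tokenization is being benchmarked.

```python
from collections import Counter
from time import perf_counter

# Hypothetical toy documents standing in for a real corpus.
docs = ["to be or not to be", "that is the question"]

# Occurrence counts are raw integers per token...
counts = Counter(docs[0].split())
# ...whereas frequencies are counts normalized by the total number of tokens.
total = sum(counts.values())
frequencies = {token: n / total for token, n in counts.items()}

# A data processing rate divides the volume of data by the elapsed time,
# so it carries a unit (bytes per second) instead of being a bare "speed".
data_size_bytes = sum(len(doc.encode("utf-8")) for doc in docs)
start = perf_counter()
for doc in docs:
    Counter(doc.split())
# Guard against a zero reading on very coarse timers.
duration = max(perf_counter() - start, 1e-9)
rate_mb_per_s = data_size_bytes / 1e6 / duration

print(counts["to"], round(frequencies["to"], 3))
print(f"{rate_mb_per_s:.3f} MB/s")
```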

Here are some comments and formatting fixes.


Edit: not related to this PR, but #23004 might come with new changes for this example then.

examples/text/plot_hashing_vs_dict_vectorizer.py (10 inline comments, outdated and resolved)
Co-authored-by: Julien Jerphanion <git@jjerphan.xyz>
Contributor Author

@ArturoAmorQ ArturoAmorQ commented May 30, 2022

Thanks @ogrisel and @jjerphan. This notebook is much clearer thanks to your comments.

Member

@jjerphan jjerphan left a comment

Thank you, @ArturoAmorQ.

Edit: I'll let @ogrisel merge if everything LGTM.

Member

@ogrisel ogrisel left a comment

LGTM again, just a final batch of nitpicks + a formatting fix.

examples/text/plot_hashing_vs_dict_vectorizer.py (7 inline comments, outdated and resolved)
@ogrisel ogrisel merged commit 6ff214c into scikit-learn:main May 30, 2022
30 checks passed
Member

@ogrisel ogrisel commented May 30, 2022

Merged, thank you very much for the nice contribution @ArturoAmorQ!

4 participants