Skip to content

ENH FeatureUnion: Add verbose_feature_names_out parameter #25991

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 21 commits into from
Feb 16, 2024

Conversation

jiawei-zhang-a
Copy link
Contributor

Reference Issues/PRs

Fixes #25889.

What does this implement/fix? Explain your changes.

To have a verbose_feature_names_out attribute in FeatureUnion.
If True, get_feature_names_out will prefix all feature names with the name of the transformer that generated that feature. If False, get_feature_names_out will not prefix any feature names and will error if feature names are not unique.

  • Add verbose_feature_names_out attribute to class FeatureUnion
  • Modify get_feature_names_out to not add prefix automatically
  • Add function _add_prefix_for_feature_names_out to add prefix when verbose_feature_names_out is True and check that names are all unique without a prefix if False
  • Add test function test_feature_union_passthrough_get_feature_names_out_false to check for feature_names_out for verbose_feature_names_out=False
  • Add test_feature_union_passthrough_get_feature_names_out_false_errors to test when collisions happen, there are names the same without a prefix. To make the test cover _add_prefix_for_feature_names_out function

Any other comments?

@jiawei-zhang-a
Copy link
Contributor Author

jiawei-zhang-a commented Mar 29, 2023

Add test_feature_union_passthrough_get_feature_names_out_false_errors_overlap_over_5 to test when collisions happen, there are names the same without a prefix overlapping over 5. To cover this part of code
if top_6_overlap:
....

@jiawei-zhang-a
Copy link
Contributor Author

jiawei-zhang-a commented Mar 29, 2023

@thomasjpfan Mr.Fan. This PR right now has test cases covering all the new code and passes all the official tests and is ready to merge. You might be interested in taking a look at it :)

Copy link
Member

@thomasjpfan thomasjpfan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Although the implementation is not very DRY because it is a copy of ColumnTransformer, I am okay with it. I do not see a great home to refactor the code right now. Another reviewer may feel different about this.

I left some comments on ways to simplify the tests.

@jiawei-zhang-a
Copy link
Contributor Author

Thank you so much for the suggestion! I will modify based on this

jiawei-zhang-a and others added 6 commits April 27, 2023 16:06
Co-authored-by: Thomas J. Fan <thomasjpfan@gmail.com>
Co-authored-by: Thomas J. Fan <thomasjpfan@gmail.com>
Co-authored-by: Thomas J. Fan <thomasjpfan@gmail.com>
Co-authored-by: Thomas J. Fan <thomasjpfan@gmail.com>
@jiawei-zhang-a
Copy link
Contributor Author

Dear Mr.Fan, I have removed the previous version of the complex version that construct a column transformer and all switched to using pandas to test:)

Copy link
Member

@thomasjpfan thomasjpfan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! I am okay with repeating code here from ColumnTransformer. Let's see what a second reviewer thinks.

@thomasjpfan thomasjpfan added the Waiting for Second Reviewer First reviewer is done, need a second one! label Apr 28, 2023
@jiawei-zhang-a
Copy link
Contributor Author

Thanks Mr.Fan!

Copy link

✔️ Linting Passed

All linting checks passed. Your pull request is in excellent shape! ☀️

Generated for commit: 4196d08. Link to the linter CI: here

@thomasjpfan thomasjpfan merged commit 91332dc into scikit-learn:main Feb 16, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
module:pipeline Waiting for Second Reviewer First reviewer is done, need a second one!
Projects
None yet
Development

Successfully merging this pull request may close these issues.

FeatureUnion: Add verbose_feature_names_out parameter
3 participants