Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

MAINT Consistent cython types from _typedefs #25942

Merged
merged 11 commits into from Mar 29, 2023

Conversation

jeremiedbb
Copy link
Member

Towards #25572
Follow-up of #25810

This PR replaces the last occurences of old defined types from utils._typedefs.

Copy link
Member

@jjerphan jjerphan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you, @jeremiedbb.

I just have few comments.

sklearn/metrics/_dist_metrics.pxd.tp Outdated Show resolved Hide resolved
sklearn/utils/_typedefs.pxd Show resolved Hide resolved
sklearn/metrics/_dist_metrics.pxd.tp Show resolved Hide resolved
Copy link
Member

@ogrisel ogrisel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for moving forward with the concrete implementation of this RFC. Here are some feedbacks from a first quick scan:

sklearn/metrics/_dist_metrics.pxd.tp Outdated Show resolved Hide resolved
sklearn/utils/_typedefs.pxd Show resolved Hide resolved
sklearn/utils/_typedefs.pxd Show resolved Hide resolved
sklearn/metrics/_dist_metrics.pxd.tp Outdated Show resolved Hide resolved
sklearn/metrics/_dist_metrics.pxd.tp Outdated Show resolved Hide resolved
@pytest.mark.parametrize(
"dtype_t, value, expected_dtype",
[
("bool_t", 1, np.uint8),
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

bool_t was probably a bad choice since there's no np.bool. We probably should have named it uint8_t. It's only used in hierarchical_fast so I can change it quickly.

@jjerphan, you mentionned that during the review of a previous PR where I changed the uses of int8 in _hierarchical_fast. With this new context, I changed my mind and now think that we should use ìnt8 consistently. What do you think ?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do not remember exactly, do you mean uint8 or do you mean int8?

I think we tend to use uint8 in the code-base generally for booleans.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The current proposal for boolean on this line looks good to me.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do you mean uint8 or do you mean int8?

I meant uint8 sorry. I was wondering if we should define ctypedef unsigned char uint8_t instead of bool_t since numpy does not define a bool_t type and to make the name match between to compile-time type and the runtime dtype.

Copy link
Member

@ogrisel ogrisel Mar 27, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 for a more precise uint8_t type alias instead of bool_t which can be misleading.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thinking again, I have a slight preference for uint8_t over bool_t.

Copy link
Member

@jjerphan jjerphan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM once the last comments regarding uint8_t and bool_t are treated.

Thank you, @jeremiedbb.

sklearn/utils/_typedefs.pxd Outdated Show resolved Hide resolved
@pytest.mark.parametrize(
"dtype_t, value, expected_dtype",
[
("bool_t", 1, np.uint8),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thinking again, I have a slight preference for uint8_t over bool_t.

jeremiedbb and others added 3 commits March 27, 2023 19:00
Co-authored-by: Julien Jerphanion <git@jjerphan.xyz>
Copy link
Member

@ogrisel ogrisel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Once the following is addressed, this PR looks good to me!

@@ -1,20 +1,19 @@
#!python
# _typedefs is a declaration only module
# The functions implemented here are for testing purpose.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should rename this _typedefs.pyx file to _typedefs_testing.pyx or _typedefs_test_helper.pyx (and update the build setup and test accordingly) to avoid any confusion. In particular I wouldn't want contributors to be confused as to when to se the type_t fused type (the answer should be "never").

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since you can't have a pxd only, it would mean we have an empty _typedefs.pyx and an additional _typedefs_testing.pyx. As discussed irl, I just made it more explicit that the definitions in the pyx are for testing purpose only.

@ogrisel ogrisel merged commit 70c489f into scikit-learn:main Mar 29, 2023
26 checks passed
@ogrisel
Copy link
Member

ogrisel commented Mar 29, 2023

Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants