
MAINT Use check_scalar in BaseGradientBoosting #21632

Merged

Conversation


@genvalen genvalen commented Nov 11, 2021

Reference Issues/PRs

This PR requires #21990 to be merged in first.
Addresses #20724 and #21927
#DataUmbrella

What does this implement/fix? Explain your changes.

Summary of changes to BaseGradientBoosting:

  • Add tests to ensure GradientBoostingClassifier and GradientBoostingRegressor raise proper errors when invalid arguments are passed in.
  • Use the helper function check_scalar from sklearn.utils to validate the scalar parameters.
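The error-checking test pattern can be sketched in pure Python. This is an illustration only: the estimator below is a stub, and the helper stands in for the pytest.raises-based tests actually added in the PR.

```python
# Sketch of the error-checking test pattern (pure Python; StubEstimator is a
# hypothetical stand-in for GradientBoostingClassifier/Regressor).
class StubEstimator:
    def __init__(self, n_estimators=100):
        self.n_estimators = n_estimators

    def fit(self):
        # Mimics a check_scalar-style validation at fit time.
        if not isinstance(self.n_estimators, int) or self.n_estimators < 1:
            raise ValueError(f"n_estimators == {self.n_estimators}, must be >= 1.")
        return self


def assert_raises_message(exc_type, message, fn):
    """Assert that fn() raises exc_type with `message` in the error text."""
    try:
        fn()
    except exc_type as e:
        assert message in str(e), str(e)
        return
    raise AssertionError(f"{exc_type.__name__} not raised")


assert_raises_message(ValueError, "must be >= 1",
                      StubEstimator(n_estimators=0).fit)
```

The real tests parametrize over (parameter, invalid value, expected error) triples for both estimators.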

Test and validation progress:

In both estimators

  • learning_rate
  • n_estimators
  • min_samples_split
  • min_samples_leaf
  • min_weight_fraction_leaf
  • max_depth
  • min_impurity_decrease
  • subsample
  • max_features
  • ccp_alpha
  • verbose
  • max_leaf_nodes
  • warm_start
  • validation_fraction
  • n_iter_no_change
  • tol

In GradientBoostingRegressor

  • alpha

References

  1. check_scalar docs
  2. PR #20723

Any other comments?

For the unchecked tasks, validation comes from BaseDecisionTree; however, tests have been added for them here.
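The validation pattern this PR adopts can be sketched in pure Python. The helper below is a minimal mimic of check_scalar-style behavior (the name `check_scalar_sketch` is hypothetical; it is not sklearn's implementation):

```python
import numbers


def check_scalar_sketch(x, name, target_type, min_val=None, max_val=None,
                        include_boundaries="both"):
    """Minimal mimic of a check_scalar-style validator (not sklearn's code)."""
    if not isinstance(x, target_type):
        raise TypeError(f"{name} must be an instance of {target_type}, got {type(x)}.")
    left_closed = include_boundaries in ("left", "both")
    right_closed = include_boundaries in ("right", "both")
    if min_val is not None and (x < min_val or (x == min_val and not left_closed)):
        raise ValueError(f"{name} == {x}, must be {'>=' if left_closed else '>'} {min_val}.")
    if max_val is not None and (x > max_val or (x == max_val and not right_closed)):
        raise ValueError(f"{name} == {x}, must be {'<=' if right_closed else '<'} {max_val}.")
    return x


# learning_rate must be a float in (0.0, inf): 0.0 itself is rejected.
check_scalar_sketch(0.1, "learning_rate", numbers.Real,
                    min_val=0.0, include_boundaries="neither")
```

Each parameter is validated once at fit time with its type, bounds, and boundary inclusiveness.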


@genvalen genvalen commented Dec 13, 2021

Notes for parameter ranges in Trees
(param: type, range)
learning_rate: float, (0.0, inf)
n_estimators: int, [1, inf)
min_samples_split: int -> [2, inf), float -> (0.0, 1.0]
min_samples_leaf: int -> [1, inf), float -> (0.0, 1.0]
min_weight_fraction_leaf: float, [0.0, 0.5]
max_depth: int, if not None, then [1, inf)
min_impurity_decrease: float, [0.0, inf)
subsample: float, (0.0, 1.0]
alpha: float, (0.0, 1.0)
max_features: int -> [1, number of features], float -> (0.0, 1.0], or string (not checked with check_scalar)
ccp_alpha: float, [0.0, inf)
verbose: int, [0, inf), or np.bool_
max_leaf_nodes: int, if not None, then [2, inf)
warm_start: check that it is int, (-inf, inf), or np.bool_
validation_fraction: float, (0.0, 1.0)
n_iter_no_change: int, [1, inf)
tol: float, (0.0, inf)
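The ranges listed above can be encoded as a small table-driven check. This is a pure-Python sketch for illustration (the table covers only a few of the parameters, and `validate` is a hypothetical helper; the PR itself calls sklearn.utils.check_scalar per parameter):

```python
import numbers

# (type, lower, upper, lower_closed, upper_closed), drawn from the notes above
PARAM_RANGES = {
    "learning_rate": (numbers.Real, 0.0, None, False, False),       # (0.0, inf)
    "n_estimators": (numbers.Integral, 1, None, True, False),       # [1, inf)
    "subsample": (numbers.Real, 0.0, 1.0, False, True),             # (0.0, 1.0]
    "validation_fraction": (numbers.Real, 0.0, 1.0, False, False),  # (0.0, 1.0)
}


def validate(name, value):
    typ, lo, hi, lo_closed, hi_closed = PARAM_RANGES[name]
    if not isinstance(value, typ):
        raise TypeError(f"{name} must be {typ.__name__}, got {type(value).__name__}")
    if lo is not None and (value < lo or (value == lo and not lo_closed)):
        raise ValueError(f"{name} == {value} is outside the allowed range")
    if hi is not None and (value > hi or (value == hi and not hi_closed)):
        raise ValueError(f"{name} == {value} is outside the allowed range")
    return value


validate("subsample", 1.0)  # passes: the upper bound of (0.0, 1.0] is closed
```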


@genvalen genvalen commented Jan 8, 2022

Hi @glemaitre: just want to note that for the 7 remaining params in the PR task list (and also max_features), the validation is coming from BaseDecisionTree.



I have added tests for these parameters in BaseGradientBoosting; they fail right now, but should pass once BaseDecisionTree #21990 gets merged in. (They are commented out for now.)

For these remaining params, please let me know if it would be helpful to include the validation explicitly within BaseGradientBoosting, too. Thank you!

@genvalen genvalen changed the title [WIP] MAINT Use check_scalar in BaseGradientBoosting [MRG] MAINT Use check_scalar in BaseGradientBoosting Jan 31, 2022
@genvalen genvalen marked this pull request as ready for review Jan 31, 2022

@genvalen genvalen commented Jan 31, 2022

Hi @glemaitre and @ogrisel, please review this at your convenience. Thank you!

sklearn/ensemble/_gb.py (review thread, resolved)
sklearn/ensemble/_gb.py (review thread, outdated, resolved)
sklearn/ensemble/_gb.py (review thread, resolved)
genvalen and others added 5 commits Jan 31, 2022
Co-authored-by: Guillaume Lemaitre <g.lemaitre58@gmail.com>
Co-authored-by: Guillaume Lemaitre <g.lemaitre58@gmail.com>
Co-authored-by: Guillaume Lemaitre <g.lemaitre58@gmail.com>

@glemaitre glemaitre left a comment

LGTM

doc/whats_new/v1.1.rst (review thread, outdated, resolved)
Co-authored-by: Guillaume Lemaitre <g.lemaitre58@gmail.com>
max_val=self.n_features_in_,
include_boundaries="both",
)

@thomasjpfan thomasjpfan commented Feb 7, 2022

Suggested change:

-max_val=self.n_features_in_,
-include_boundaries="both",
+include_boundaries="left",
 )
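The effect of the suggestion can be illustrated with the boundary-flag semantics. This is a pure-Python sketch of check_scalar-style interval membership (`in_range` is a hypothetical helper, not sklearn's implementation):

```python
def in_range(x, min_val=None, max_val=None, include_boundaries="both"):
    """Sketch of interval membership under check_scalar-style boundary flags."""
    if min_val is not None:
        if x < min_val or (x == min_val and include_boundaries not in ("left", "both")):
            return False
    if max_val is not None:
        if x > max_val or (x == max_val and include_boundaries not in ("right", "both")):
            return False
    return True


# With "both", a value equal to max_val passes the closed upper bound;
# with "left", only the lower boundary is treated as closed.
print(in_range(5, min_val=1, max_val=5, include_boundaries="both"))  # True
print(in_range(5, min_val=1, max_val=5, include_boundaries="left"))  # False
```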

genvalen and others added 2 commits Feb 7, 2022
@thomasjpfan thomasjpfan changed the title [MRG] MAINT Use check_scalar in BaseGradientBoosting MAINT Use check_scalar in BaseGradientBoosting Feb 7, 2022

@thomasjpfan thomasjpfan left a comment

LGTM

@thomasjpfan thomasjpfan merged commit b80138f into scikit-learn:main Feb 7, 2022
29 checks passed
glemaitre added a commit to glemaitre/scikit-learn that referenced this issue Feb 9, 2022
Co-authored-by: Guillaume Lemaitre <g.lemaitre58@gmail.com>
Co-authored-by: Thomas J. Fan <thomasjpfan@gmail.com>
thomasjpfan added a commit to thomasjpfan/scikit-learn that referenced this issue Mar 1, 2022
Co-authored-by: Guillaume Lemaitre <g.lemaitre58@gmail.com>
Co-authored-by: Thomas J. Fan <thomasjpfan@gmail.com>