ENH Monotonic Constraints for Tree-based models #13649
base: main
Conversation
DecisionTrees RandomForests GradientBoosting
Very interested in this, please ping me when you have something!
I added a few tests to check that the implementation actually enforces the monotonicity constraints, but they are failing...
Two issues:
@samronsin LMK when you need reviews. I've implemented monotonic constraints for the hist-GBDTs in #15582. It's still under review, so your comments are more than welcome there. I think you'll also find some tests worth using; they helped me a great deal in debugging my code.
Yes. Maybe this could be done by just reversing the constraint? Not sure.
You don't seem to be constraining the values of the children based on the average value of a pair of siblings. You'll need to introduce lower and upper bounds on the node values for monotonic constraints to be properly enforced (see https://github.com/scikit-learn/scikit-learn/pull/15582/files#diff-9bd2ee07fb6817a0aee54f959d32de8aR139).
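For context, here is a minimal sketch of the bound propagation idea described above, assuming a monotonic increase constraint on the split feature; the function and variable names are illustrative and do not come from either PR:

def propagate_bounds(parent_lower, parent_upper, left_value, right_value):
    # Midpoint of the two sibling values: with an increase constraint, the
    # left (lower) child must stay below it and the right (upper) child above
    # it, and both children also inherit the bounds of their parent.
    mid = (left_value + right_value) / 2
    left_bounds = (parent_lower, mid)
    right_bounds = (mid, parent_upper)
    return left_bounds, right_bounds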
Some tests aren't passing for reasons given in this thread: https://github.com/scikit-learn/scikit-learn/pull/13649/files#r973630373 Olivier and I agreed that this support can be postponed and integrated in another pull request. Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org>
Prefer:
- "monotonicity constraints" to refer to all of the constraints (encoded by -1, 0, 1), which replaces "monotonic constraints"
- "monotonic increase constraints" (encoded by 1), which replaces "positive constraints", "monotonically increasing" and "monotonically increase"
- "monotonic decrease constraints" (encoded by -1), which replaces "negative constraints", "monotonically decreasing" and "monotonically decrease"

Note: "monotonically {de,in}creasing" is valid when using continuous tenses but was improperly used before.
See: scikit-learn#13649 (comment)
Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org>
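As an illustration of this encoding (one value per feature; the assignment below is only an example):

monotonic_cst = [1, -1, 0]  # feature 0: increase, feature 1: decrease, feature 2: unconstrained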
Note that I have just
Co-authored-by: Christian Lorentzen <lorentzen.ch@gmail.com>
Co-authored-by: Julien Jerphanion <git@jjerphan.xyz>
…!= 0" This reverts commit bfe94af.
This does not reallocate placeholder null'ed numpy arrays, and performs operations only when needed. Co-authored-by: Christian Lorentzen <lorentzen.ch@gmail.com> Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org>
Also simplify the logic and its comment. Note: we could factor this duplicated logic into a private function or a private method, but we would need to use pointers to modify variables on the program stack because we can't simply return several values. This factorization might be more complex than the current duplication, hence we propose leaving it as is. See: https://github.com/scikit-learn/scikit-learn/pull/13649/files#r973633867 Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org>
I think there are just two tasks to perform before merging this PR:
1. testing assert_1d_reg_tree_children_monotonic_bounded and assert_nd_reg_tree_children_monotonic_bounded on cases violating constraints
2. checking for any significant performance regression using the ASV benchmarking suite
@ogrisel: I am doing 2.; are you interested in doing 1. following your proposal in https://github.com/scikit-learn/scikit-learn/pull/13649/files#r1006829892?
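As an illustration of task 1., here is a sketch of the kind of end-to-end monotonicity check such a test could include; it assumes the monotonic_cst parameter added by this PR and does not use the PR's helper functions, whose exact signatures are not shown in this thread:

import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.RandomState(0)
X = rng.uniform(-5, 5, size=(100, 1))
y = X.ravel() ** 3 + rng.normal(scale=1.0, size=100)

# Fit with a monotonic increase constraint on the single feature.
reg = DecisionTreeRegressor(monotonic_cst=[1], random_state=0).fit(X, y)

# Predictions evaluated on sorted inputs must be non-decreasing.
X_test = np.linspace(-6, 6, 200).reshape(-1, 1)
pred = reg.predict(X_test)
assert np.all(np.diff(pred) >= 0)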
No performance regression observed with asv. Full asv report
Working on a test (that fails). Here is a reproducer:
import numpy as np
from sklearn.tree import DecisionTreeRegressor
from sklearn.ensemble import HistGradientBoostingRegressor
import matplotlib.pyplot as plt
X = np.linspace(-5, 5, 5).reshape(-1, 1)
y = X.ravel() ** 3
X_test = np.linspace(-6, 6, 100).reshape(-1, 1)
reg = DecisionTreeRegressor(max_depth=None, random_state=0).fit(X, y)
plt.plot(X_test.ravel(), reg.predict(X_test))

The plot looks good, and is naturally monotonic as expected. Let's add the monotonicity constraint:
reg = DecisionTreeRegressor(max_depth=None, random_state=0, monotonic_cst=[1]).fit(X, y)
plt.plot(X_test.ravel(), reg.predict(X_test))

This seems overly constrained! The previous decision function should be admissible on this training set. Let's see if we have the same problem with the implementation in the hist gradient boosting trees:
reg = HistGradientBoostingRegressor(min_samples_leaf=1, max_iter=1, random_state=0).fit(
X, y
)
plt.plot(X_test.ravel(), reg.predict(X_test))

All good. Let's add the constraint:

reg = HistGradientBoostingRegressor(
min_samples_leaf=1, max_iter=1, monotonic_cst=[1], random_state=0
).fit(X, y)
plt.plot(X_test.ravel(), reg.predict(X_test))

I think the bound propagation logic of this PR is too restrictive, as discovered in the failing test I just pushed. I need more time to investigate how the 2 implementations differ.
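To quantify the difference, here is a small follow-up sketch reusing X, y, X_test and the imports from the reproducer above (this is not part of the original comment):

# Fit both constrained models and compare their predictions on the test grid.
pred_tree = DecisionTreeRegressor(
    max_depth=None, monotonic_cst=[1], random_state=0
).fit(X, y).predict(X_test)
pred_hgb = HistGradientBoostingRegressor(
    min_samples_leaf=1, max_iter=1, monotonic_cst=[1], random_state=0
).fit(X, y).predict(X_test)

print("tree predictions monotonic:", np.all(np.diff(pred_tree) >= 0))
print("hgb predictions monotonic: ", np.all(np.diff(pred_hgb) >= 0))
print("max absolute difference:   ", np.max(np.abs(pred_tree - pred_hgb)))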
Co-authored-by: Julien Jerphanion <git@jjerphan.xyz>
Thanks a lot @jjerphan and @ogrisel for all the work on this PR! Regarding your latest findings about the overly constrained implementation, @ogrisel: what is currently implemented guarantees that the trees are monotonic, i.e. it is a sufficient condition for the monotonicity of the trees. However, this condition is not necessary, as your example shows, so it is expected that some valid trees are discarded. I don't think that a necessary and sufficient condition can be implemented while reusing the current structures of the tree building, as argued here: #13649 (comment). I hope to have a bit of time in the coming weeks to think about it!
I agree with your analysis that the constraints implemented for trees are not guaranteed to be minimal to ensure monotonicity, but I have the feeling that on such toy 1d problems we could do better (as is the case in hist-gradient boosted trees). I started to investigate with the following debug print statements. For the trees in this PR, with the code above, I get (both with depth-first and best-first):
while for the HGBRT I get:
The meaning of the discretized thresholds for the HGBRT can be recovered with:

>>> reg._bin_mapper.bin_thresholds_[0]
array([-3.75, -1.25, 1.25, 3.75])

So in conclusion, the first 2 splits are identical and happen in the same order, but the computed bounds are vastly different. But it's true that they do not predict the same targets. I think a deeper investigation (logging) of when the constraints hold or not for each kind of model is needed. However, I need to work on something else today, so I will stop here for now. Feel free to have a look. In the meantime, I will remove the 1.2 milestone because I am not sure we can resolve this in time and I don't want to delay the release process.
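The debug output referenced above is not reproduced here; as a standalone illustration of how the per-node splits and values of the constrained tree can be inspected (not the actual debug statements used above), one can walk the public tree_ attribute:

import numpy as np
from sklearn.tree import DecisionTreeRegressor

X = np.linspace(-5, 5, 5).reshape(-1, 1)
y = X.ravel() ** 3
reg = DecisionTreeRegressor(monotonic_cst=[1], random_state=0).fit(X, y)

tree = reg.tree_
for node_id in range(tree.node_count):
    if tree.children_left[node_id] != tree.children_right[node_id]:
        # Internal node: report the split feature and threshold.
        print(f"node {node_id}: split on feature {tree.feature[node_id]} "
              f"at threshold {tree.threshold[node_id]:.3f}")
    else:
        # Leaf node: report the predicted value.
        print(f"node {node_id}: leaf value {tree.value[node_id].ravel()[0]:.3f}")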
Looks like it was not resolved indeed :) Moving it to 1.3
Continuation of PR #7266 addressing issue #6656.
TODO:
- implement logic for new trees in [MRG+2] Faster Gradient Boosting Decision Trees with binned features #12807 (already done by [MRG] Monotonic constraints for GBDT #15582)
- decide on the API (separate increasing and decreasing parameters vs a single array) => let's go with the monotonic_cst array
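For illustration, a minimal sketch of how the chosen monotonic_cst array reads in practice, assuming the parameter as proposed in this PR (the data and constraint values below are examples only):

import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.RandomState(0)
X = rng.uniform(size=(200, 3))
# Target increases with feature 0, decreases with feature 1 and ignores feature 2.
y = 2 * X[:, 0] - 3 * X[:, 1] + rng.normal(scale=0.1, size=200)

# One entry per feature: 1 = monotonic increase, -1 = monotonic decrease, 0 = unconstrained.
reg = DecisionTreeRegressor(monotonic_cst=[1, -1, 0], random_state=0).fit(X, y)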