
ENH Monotonic Constraints for Tree-based models #13649

Open · wants to merge 172 commits into base: main
Conversation

@samronsin (Contributor) commented Apr 15, 2019

Continuation of PR #7266 addressing issue #6656.

TODO:

@NicolasHug (Member):

Very interested in this, please ping me when you have something!

Review thread on sklearn/ensemble/forest.py (outdated, resolved)
@samronsin (Contributor, Author):

I added a few tests to check that the implementation actually enforced the monotonicity constraints, but they are failing...
Unless my tests are wrong, this suggests that the current implementation is not correct.
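
For context, a minimal version of such a test might look like the sketch below (illustrative only, not the actual test from this branch; it assumes the monotonic_cst parameter introduced in this PR, where 1 requests a monotonic increase on a feature):

import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.RandomState(0)
X = rng.uniform(-1, 1, size=(1000, 1))
y = X.ravel() + 0.1 * rng.normal(size=1000)

# monotonic_cst=[1] requests a monotonic increase constraint on feature 0.
reg = DecisionTreeRegressor(monotonic_cst=[1], random_state=0).fit(X, y)

# Predictions along a sorted grid must then be non-decreasing.
X_grid = np.linspace(-1, 1, 1000).reshape(-1, 1)
pred = reg.predict(X_grid)
assert np.all(np.diff(pred) >= 0), "monotonic increase constraint violated"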

@samronsin (Contributor, Author):

Two issues:

  • monotonicity for classification is with respect to class 0 => switch to class 1?
  • a small fraction of rows (~5% for 1000 rows, which is hidden in the current tests...) fail to satisfy the monotonicity constraints -- I suspect this could be due to unfortunate float64 to float32 conversions during the predict call (see the small sketch below)
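
To illustrate the second bullet: a float64-to-float32 cast can collapse an ordering that held in float64, which is the suspected failure mode here (the values below are arbitrary, picked only to fall within float32 rounding distance of each other):

import numpy as np

a = np.float64(1.0000001)
b = np.float64(1.00000015)
print(a < b)                          # True: the values differ in float64
print(np.float32(a) < np.float32(b))  # False: both round to the same float32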

@NicolasHug (Member):

@samronsin LMK when you need reviews.

I've implemented monotonic constraints for the hist-GBDTs in #15582

It's still under review, so your comments are more than welcome there. I think you'll also find some tests worth reusing; they helped me a great deal in debugging my code.

@NicolasHug (Member) commented Jan 29, 2020

monotonicity for classification is with respect to class 0 => switch to class 1?

Yes. Maybe this can be done by just reversing the constraint? Not sure.

a small fraction of rows (~5% for 1000 rows, which is hidden in the current tests...) fail to satisfy the monotonicity constraints -- I suspect this could be due to unfortunate float64 to float32 conversions during the predict call

You don't seem to be constraining the values of the children based on the average value of a pair of siblings. You'll need to introduce lower and upper bounds on the node values for monotonic constraints to be properly enforced (see https://github.com/scikit-learn/scikit-learn/pull/15582/files#diff-9bd2ee07fb6817a0aee54f959d32de8aR139).
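
For reference, a rough sketch of that bound-propagation scheme on a toy node structure (illustrative only, not the actual code of either PR): each node value is clipped into [lower, upper], and under a monotonic increase constraint the two children of a split are separated at the midpoint of their values.

from dataclasses import dataclass
from typing import Optional

@dataclass
class Node:
    value: float
    left: "Optional[Node]" = None
    right: "Optional[Node]" = None

def propagate_bounds(node, lower=float("-inf"), upper=float("inf")):
    # Clip the node value into the bounds inherited from its ancestors.
    node.value = min(max(node.value, lower), upper)
    if node.left is None:  # leaf
        return
    # Monotonic increase constraint: the left subtree must stay below the
    # midpoint of the two children values, the right subtree above it.
    mid = (node.left.value + node.right.value) / 2
    propagate_bounds(node.left, lower, mid)
    propagate_bounds(node.right, mid, upper)

In the actual implementations, these bounds are maintained while the tree is grown rather than applied after the fact.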

jjerphan and others added 4 commits October 28, 2022 15:08
Some tests aren't passing for reasons given in this thread:
https://github.com/scikit-learn/scikit-learn/pull/13649/files#r973630373

Olivier and I agreed that this support can be postponed and
integrated in another pull request.

Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org>
Prefer:
 - "monotonicity constraints" to refer to all of constraints
   (encoded by -1, 0, 1) which replaces "monotonic constraints"
 - "monotonic increase constraints" (encoded by 1) which replaces
   "positive constraints", "monotonically increasing" and "monotonically
   increase"
 - "monotonic decrease constraints" (encoded by -1) which replaces
   "negative constraints", "monotonically decreasing" and "monotonically
   decrease"

Note: "monotonically {de,in}creasing" is valid when using continuous
tenses but was improperly used before.

See:
scikit-learn#13649 (comment)

Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org>
@jjerphan (Member):

Note that I have just force-pushed (with lease), after approval IRL from @ogrisel, to resolve the introduction of several merge commits of main after two concurrent merges of main into this PR branch by Olivier and myself.

jjerphan and others added 10 commits October 28, 2022 15:14
Co-authored-by: Christian Lorentzen <lorentzen.ch@gmail.com>
Co-authored-by: Julien Jerphanion <git@jjerphan.xyz>
This avoids reallocating placeholder null'ed numpy arrays,
and performs operations only when needed.

Co-authored-by: Christian Lorentzen <lorentzen.ch@gmail.com>
Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org>
Also simplify the logic and its comment.

Note: we could factor this duplicated logic into a private function
or a private method, but we would need to use pointers to modify
variables on the program stack because we can't simply return
several values. This factorization might be more complex than
the current duplication, hence we propose leaving it as is.

See:
https://github.com/scikit-learn/scikit-learn/pull/13649/files#r973633867

Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org>
@jjerphan (Member) left a comment:

I think there are just two tasks to perform before merging this PR:

  1. testing assert_1d_reg_tree_children_monotonic_bounded and assert_nd_reg_tree_children_monotonic_bounded on cases violating constraints (a brute-force sketch of that kind of check follows below)
  2. checking for any significant performance regression using the ASV benchmarking suite

@ogrisel: I am doing 2.; are you interested in doing 1. following your proposal in https://github.com/scikit-learn/scikit-learn/pull/13649/files#r1006829892?
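
For illustration, a brute-force 1-d version of the property that assert_1d_reg_tree_children_monotonic_bounded is meant to verify could look like this (a sketch against the public tree_ arrays of a regressor fit on a single feature, not the actual helper from this PR):

import numpy as np

def brute_force_assert_1d_monotonic(tree, monotonic_sign):
    # Collect the values of all nodes in the subtree rooted at node_id.
    def subtree_values(node_id):
        left = tree.children_left[node_id]
        if left == -1:  # leaf
            return [tree.value[node_id][0, 0]]
        right = tree.children_right[node_id]
        return subtree_values(left) + subtree_values(right)

    for node_id in range(tree.node_count):
        left = tree.children_left[node_id]
        if left == -1:
            continue
        right = tree.children_right[node_id]
        # With a single feature, a monotonic increase constraint requires
        # every left-subtree value to be <= every right-subtree value.
        if monotonic_sign == 1:
            assert max(subtree_values(left)) <= min(subtree_values(right))
        else:
            assert min(subtree_values(left)) >= max(subtree_values(right))

This would be called e.g. as brute_force_assert_1d_monotonic(reg.tree_, 1) after fitting a DecisionTreeRegressor with monotonic_cst=[1] on a single feature.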

Review threads on sklearn/tree/_tree.pyx (outdated, resolved)
@jjerphan (Member) commented Nov 2, 2022

No performance regression observed with asv.

Full asv report
$ asv continuous -b RandomF -e main monotonic-trees | tee ~/asv-bench-monotonic-tree.log                                                                                                                                                                                       
· Creating environments
· Discovering benchmarks
·· Uninstalling from conda-py3.9-cython-joblib-numpy-pandas-scipy-threadpoolctl
·· Building 3558bfd0 <monotonic-trees> for conda-py3.9-cython-joblib-numpy-pandas-scipy-threadpoolctl
·· Installing 3558bfd0 <monotonic-trees> into conda-py3.9-cython-joblib-numpy-pandas-scipy-threadpoolctl
· Running 12 total benchmarks (2 commits * 1 environments * 6 benchmarks)
[  0.00%] · For scikit-learn commit 3558bfd0 <monotonic-trees> (round 1/1):
[  0.00%] ·· Benchmarking conda-py3.9-cython-joblib-numpy-pandas-scipy-threadpoolctl
[  8.33%] ··· Setting up ensemble:24                                          ok
[  8.33%] ··· ....RandomForestClassifierBenchmark.peakmem_fit                 ok
[  8.33%] ··· ================ ======
              --               n_jobs
              ---------------- ------
               representation    1   
              ================ ======
                   dense        188M 
                   sparse       438M 
              ================ ======

[ 16.67%] ··· ...domForestClassifierBenchmark.peakmem_predict                 ok
[ 16.67%] ··· ================ ======
              --               n_jobs
              ---------------- ------
               representation    1   
              ================ ======
                   dense        191M 
                   sparse       418M 
              ================ ======

[ 25.00%] ··· ...ble.RandomForestClassifierBenchmark.time_fit                 ok
[ 25.00%] ··· ================ ============
              --                  n_jobs   
              ---------------- ------------
               representation       1      
              ================ ============
                   dense         5.92±0s   
                   sparse       10.2±0.01s 
              ================ ============

[ 33.33%] ··· ...RandomForestClassifierBenchmark.time_predict                 ok
[ 33.33%] ··· ================ ===========
              --                  n_jobs  
              ---------------- -----------
               representation       1     
              ================ ===========
                   dense        177±0.4ms 
                   sparse        1.19±0s  
              ================ ===========

[ 41.67%] ··· ...omForestClassifierBenchmark.track_test_score                 ok
[ 41.67%] ··· ================ ====================
              --                      n_jobs       
              ---------------- --------------------
               representation           1          
              ================ ====================
                   dense        0.7539314095652613 
                   sparse       0.8656423941766682 
              ================ ====================

[ 50.00%] ··· ...mForestClassifierBenchmark.track_train_score                 ok
[ 50.00%] ··· ================ ====================
              --                      n_jobs       
              ---------------- --------------------
               representation           1          
              ================ ====================
                   dense        0.9963562897954773 
                   sparse       0.9996123288718864 
              ================ ====================

[ 50.00%] · For scikit-learn commit b7f3fd68 <main> (round 1/1):
[ 50.00%] ·· Building for conda-py3.9-cython-joblib-numpy-pandas-scipy-threadpoolctl
[ 50.00%] ·· Benchmarking conda-py3.9-cython-joblib-numpy-pandas-scipy-threadpoolctl
[ 58.33%] ··· Setting up ensemble:24                                          ok
[ 58.33%] ··· ....RandomForestClassifierBenchmark.peakmem_fit                 ok
[ 58.33%] ··· ================ ======
              --               n_jobs
              ---------------- ------
               representation    1   
              ================ ======
                   dense        184M 
                   sparse       433M 
              ================ ======

[ 66.67%] ··· ...domForestClassifierBenchmark.peakmem_predict                 ok
[ 66.67%] ··· ================ ======
              --               n_jobs
              ---------------- ------
               representation    1   
              ================ ======
                   dense        187M 
                   sparse       414M 
              ================ ======

[ 75.00%] ··· ...ble.RandomForestClassifierBenchmark.time_fit                 ok
[ 75.00%] ··· ================ =========
              --                 n_jobs 
              ---------------- ---------
               representation      1    
              ================ =========
                   dense        5.91±0s 
                   sparse       10.3±0s 
              ================ =========

[ 83.33%] ··· ...RandomForestClassifierBenchmark.time_predict                 ok
[ 83.33%] ··· ================ ===========
              --                  n_jobs  
              ---------------- -----------
               representation       1     
              ================ ===========
                   dense        177±0.4ms 
                   sparse        1.19±0s  
              ================ ===========

[ 91.67%] ··· ...omForestClassifierBenchmark.track_test_score                 ok
[ 91.67%] ··· ================ ====================
              --                      n_jobs       
              ---------------- --------------------
               representation           1          
              ================ ====================
                   dense        0.7539314095652613 
                   sparse       0.8656423941766682 
              ================ ====================

[100.00%] ··· ...mForestClassifierBenchmark.track_train_score                 ok
[100.00%] ··· ================ ====================
              --                      n_jobs       
              ---------------- --------------------
               representation           1          
              ================ ====================
                   dense        0.9963562897954773 
                   sparse       0.9996123288718864 
              ================ ====================


BENCHMARKS NOT SIGNIFICANTLY CHANGED.

@ogrisel (Member) commented Nov 4, 2022

Working on a test (that fails) for assert_nd_reg_tree_children_monotonic_bounded in 5926abe, I discovered that the current implementation of monotonic_cst for trees is over-constrained and rejects valid splits.

Here is a reproducer:

  • Setup:
import numpy as np
from sklearn.tree import DecisionTreeRegressor
from sklearn.ensemble import HistGradientBoostingRegressor
import matplotlib.pyplot as plt

X = np.linspace(-5, 5, 5).reshape(-1, 1)
y = X.ravel() ** 3

X_test = np.linspace(-6, 6, 100).reshape(-1, 1)
  • single tree without constraint on a 5-datapoint training set:
reg = DecisionTreeRegressor(max_depth=None, random_state=0).fit(X, y)
plt.plot(X_test.ravel(), reg.predict(X_test))

[figure: unconstrained tree predictions on X_test]

looks good, and naturally monotonic as expected. Let's add the monotonicity constraint:

  • single tree with constraint:
reg = DecisionTreeRegressor(max_depth=None, random_state=0, monotonic_cst=[1]).fit(X, y)
plt.plot(X_test.ravel(), reg.predict(X_test))

[figure: tree predictions with monotonic_cst=[1], visibly over-constrained]

This seems overly constrained! The previous decision function should be admissible on this training set.

Let's see if we have the same problem with the implementation in the hist gradient boosting trees:

  • first without the constraint:
reg = HistGradientBoostingRegressor(min_samples_leaf=1, max_iter=1, random_state=0).fit(
    X, y
)
plt.plot(X_test.ravel(), reg.predict(X_test))

[figure: unconstrained HistGradientBoostingRegressor predictions on X_test]

all good. Let's add the constraint:

reg = HistGradientBoostingRegressor(
    min_samples_leaf=1, max_iter=1, monotonic_cst=[1], random_state=0
).fit(X, y)
plt.plot(X_test.ravel(), reg.predict(X_test))

[figure: HistGradientBoostingRegressor predictions with monotonic_cst=[1], not over-constrained]

So the monotonic_cst implementation of HistGradientBoostingRegressor does not have the problem.

I think the bound propagation logic of this PR is too restrictive as discovered in the failing test I just pushed.

I need more time to investigate how the two implementations differ.

Co-authored-by: Julien Jerphanion <git@jjerphan.xyz>
@samronsin (Contributor, Author):

Thanks a lot @jjerphan and @ogrisel for all the work on this PR!

Regarding your latest findings about the overly constrained implementation @ogrisel: what is currently implemented guarantees that the trees are monotonic, i.e. it enforces sufficient conditions for the monotonicity of the trees. However, these conditions are not necessary, as your example shows, so it is expected that some valid trees are discarded. I don't think that a necessary and sufficient condition can be implemented while reusing the current structure of the tree building.

As argued here: #13649 (comment)
if we relax these conditions, we can end up mistakenly building a non-monotonic tree. Also, unless I made a mistake, the implementation of the constraints in HistGradientBoosting does not enforce sufficient conditions for monotonicity, although it looks like it works well in practice. Working on an example where it fails to discard a non-monotonic tree (as you suggested here: #13649 (comment)) would be quite useful.

I hope to have a bit of time in the coming weeks to think about it!

@ogrisel (Member) commented Nov 7, 2022

I agree with your analysis that the constraints implemented for trees are not guaranteed to be minimal for ensuring monotonicity, but I have the feeling that on such toy 1-d problems we could do better (as is the case for the hist-gradient boosted trees).

I started to investigate with some debug print statements (output below).

For the trees in this PR with the code above I get (both with depth-first and best-first):

value: 0.0000, threshold: -3.7500,  bounds: -inf | 0.0000 || 0.0000 | inf
value: 31.2500, threshold: 3.7500,  bounds: 0.0000 | 31.2500 || 31.2500 | inf

while for the HGBRT I get:

value 0.0000, threshold: 0, bounds: -inf | -46.8750 || -46.8750 | inf
value 31.2500, threshold: 3, bounds: -46.8750 | 62.5000 || 62.5000 | inf
value -0.0000, threshold: 1, bounds: -46.8750 | -3.9062 || -3.9062 | 62.5000
value 7.8125, threshold: 2, bounds: -3.9062 | 7.8125 || 7.8125 | 62.5000

The meaning of the discretized thresholds for the HGBRT can be recovered with:

>>> reg._bin_mapper.bin_thresholds_[0]
array([-3.75, -1.25,  1.25,  3.75])

So in conclusion, the first two splits are identical and happen in the same order, but the computed bounds are vastly different. It's also true that the two models do not predict the same targets. I think deeper investigation (logging) of when the constraints hold or not for each kind of model is needed. However, I need to work on something else today, so I will stop here for now. Feel free to have a look.
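
For anyone reproducing this trace: the bounds above came from temporary print statements inside the Cython splitter and are not publicly exposed, but the value/threshold part can be read from the fitted tree in Python, e.g.:

import numpy as np
from sklearn.tree import DecisionTreeRegressor

X = np.linspace(-5, 5, 5).reshape(-1, 1)
y = X.ravel() ** 3

reg = DecisionTreeRegressor(random_state=0, monotonic_cst=[1]).fit(X, y)
tree = reg.tree_
for node_id in range(tree.node_count):
    if tree.children_left[node_id] != -1:  # internal (split) node
        print(f"value: {tree.value[node_id][0, 0]:.4f}, "
              f"threshold: {tree.threshold[node_id]:.4f}")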

In the meantime, I will remove the 1.2 milestone because I am not sure we can resolve this in time and I don't want to delay the release process.

@jeremiedbb (Member):

In the meantime, I will remove the 1.2 milestone because I am not sure we can resolve this in time and I don't want to delay the release process.

Looks like it was not resolved indeed :) Moving it to 1.3.
