-
-
Notifications
You must be signed in to change notification settings - Fork 25.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ENH Adding estimators_samples_ attribute to forest models #26736
Conversation
Signed-off-by: Adam Li <adam2392@gmail.com>
Signed-off-by: Adam Li <adam2392@gmail.com>
Signed-off-by: Adam Li <adam2392@gmail.com>
Kay this is ready for review! |
Signed-off-by: Adam Li <adam2392@gmail.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the PR @adam2392!
… into est_samples
Co-authored-by: Omar Salman <omar.salman@arbisoft.com>
Signed-off-by: Adam Li <adam2392@gmail.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the updates @adam2392. A few additional comments.
Co-authored-by: Omar Salman <omar.salman@arbisoft.com>
This attribute is already public on The goal of this attribute is to make it easier to inspect the result of the learning process via public API. +1 for exposing it as a public attribute. |
@adam2392 maybe you can clarify what kind of use you make of this attribute? |
@ogrisel I plan on using this attribute for inspecting the training/testing samples used in fitting each individual tree. Right now, this is really hard to do as the fitting the forest process is entirely internal. More broadly, this would allow someone to implement "honest trees" (ref: #19710) very easily.
|
Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org>
Hi I resolved the few docstring comments, but left #26736 (comment) and #26736 (comment) as those seem like they would require changing the behavior in Happy to do so though if you think it's warranted. |
Hi @ogrisel just wanted to gently follow-up here to prevent it getting lost. Is there anything related to the current feature that you think should be still changed? And a follow-up, do you think the deprecation and renaming of the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Otherwise LGTM on my side.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Otherwise LGTM on my side.
Co-authored-by: Guillaume Lemaitre <g.lemaitre58@gmail.com>
Co-authored-by: Guillaume Lemaitre <g.lemaitre58@gmail.com>
Signed-off-by: Adam Li <adam2392@gmail.com>
|
||
assert isinstance(estimator_samples, list) | ||
assert len(estimators_samples) == len(estimators) | ||
assert estimators_samples[0].dtype == np.int32 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch @glemaitre . One part of the if statement led to it being np.int64
instead of np.int32
Thanks for the review! I've addressed the comments left by you. Lmk if there's anything I missed. |
Co-authored-by: Guillaume Lemaitre <g.lemaitre58@gmail.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. I will merge once the doc build passed.
The failures were transient due to GitHub. |
…rn#26736) Signed-off-by: Adam Li <adam2392@gmail.com> Co-authored-by: Omar Salman <omar.salman@arbisoft.com> Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org> Co-authored-by: Guillaume Lemaitre <g.lemaitre58@gmail.com>
Reference Issues/PRs
Fixes: #26716
What does this implement/fix? Explain your changes.
estimators_samples_
property to theBaseForest
class, which allows any forest-method to recompute the indices of the training samples per tree in the forestAny other comments?
Note this is essentially very similar to
Bagging*
except we only care about the samples considered rather than samples and feature indices.This is useful for example for potentially allowing the trees to get closer to enabling things like i) honesty and ii) analysis of the trained samples used for the tree.