Optimizing decision tree

Question

I have a question about which technique or technology could be applied to the following problem:

Suppose I have a rule-based tree (a decision tree) that predicts a variable Y from the variables A, B and C. This tree is not trained on any data; it is built by hand because it models the 'real' system (think of it as a physiologically inspired tree).

                           NODE 1: Is A > 10?
                          /                  \
                     YES /                    \ NO
                        /                      \
           NODE 2: Is B > 5?              NODE 5: Is C < 8?
            /            \                  /            \
       YES /              \ NO         YES /              \ NO
          /                \              /                \
    NODE 3: Y = 4     NODE 4: Y = 2   NODE 6: Y = 9     NODE 7: Y = 6

So this is a 'generalized' tree that I want to optimize against data, e.g. using a table of new data points:

| A | B | C | Y |
|---|---|---|---|
| 5 | 9 | 8 | 10|
| 4 | 7 | 7 | 7 |

etc.
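To optimize the numbers, the fixed tree can be written as a function whose structure never changes and whose numbers are passed in as parameters. A minimal Python sketch, assuming hypothetical parameter names (`a_thr`, `b_thr`, `c_thr` for the split thresholds and `y3`, `y4`, `y6`, `y7` for the leaf values) and mean squared error as the fit criterion:

```python
def predict(a, b, c, params):
    """Evaluate the fixed expert tree; only the numbers in `params` can change."""
    if a > params["a_thr"]:            # NODE 1: Is A > 10?
        if b > params["b_thr"]:        # NODE 2: Is B > 5?
            return params["y3"]        # NODE 3
        return params["y4"]            # NODE 4
    if c < params["c_thr"]:            # NODE 5: Is C < 8?
        return params["y6"]            # NODE 6
    return params["y7"]                # NODE 7


# Hypothetical names for the numbers in the expert tree above.
ORIGINAL = {"a_thr": 10.0, "b_thr": 5.0, "c_thr": 8.0,
            "y3": 4.0, "y4": 2.0, "y6": 9.0, "y7": 6.0}

# The two example rows from the table: (A, B, C, Y)
DATA = [(5, 9, 8, 10), (4, 7, 7, 7)]


def mse(params, data):
    """Mean squared error of the tree's predictions against the observed Y."""
    errors = [(predict(a, b, c, params) - y) ** 2 for a, b, c, y in data]
    return sum(errors) / len(errors)


print(mse(ORIGINAL, DATA))
```

On the two rows shown, the original tree predicts 6 and 9 against observed values of 10 and 7, so the MSE is 10.0; an optimizer only has to search for numbers that make this value smaller.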

So, basically, I want the NUMBERS (or parameters) in my generalized decision tree to be optimized against the new data points, and I want to decide how far the new parameter values may deviate from the original ones.
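How far the numbers may deviate can be expressed in two common ways: as hard box bounds around each original value, or as a soft penalty added to the data-fit loss. A minimal sketch, assuming hypothetical parameter names and a relative tolerance and penalty weight chosen by the modeller:

```python
# Hypothetical names for the numbers in the expert tree above.
ORIGINAL = {"a_thr": 10.0, "b_thr": 5.0, "c_thr": 8.0,
            "y3": 4.0, "y4": 2.0, "y6": 9.0, "y7": 6.0}


def deviation_bounds(original, rel_tol):
    """Hard limit: each number may move at most rel_tol * |original value|."""
    return {name: (v - rel_tol * abs(v), v + rel_tol * abs(v))
            for name, v in original.items()}


def deviation_penalty(params, original, weight):
    """Soft limit: weighted sum of squared relative deviations from the expert values."""
    return weight * sum(((params[k] - v) / v) ** 2 for k, v in original.items())


bounds = deviation_bounds(ORIGINAL, rel_tol=0.2)
print(bounds["a_thr"])  # allowed range for the A threshold

shifted = dict(ORIGINAL, c_thr=8.8)  # C threshold moved by 10%
print(deviation_penalty(shifted, ORIGINAL, weight=1.0))
```

Hard bounds are what a bounded optimizer such as DEoptim in R expects (its `lower` and `upper` arguments). The penalty version instead adds to the data-fit loss, so the data can pull a number further from its expert value only when the improvement in fit outweighs `weight`.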

Is this a clear question?

Thank you! Regards

After some research I think the answer lies in 'differential evolution' optimization. — DannyV, Commented Oct 26, 2023 at 13:39
This does not provide an answer to the question. To critique or request clarification from an author, leave a comment below their post. - From Review — prashant0598, Commented Nov 10, 2023 at 14:36
I think it does provide an answer to the question. The DEoptim function in R lets you optimize parameters (in the question, the A, B and C parameters) within a given, fixed structure (the decision tree above). Kind regards — DannyV, Commented Nov 10, 2023 at 16:08
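Because the tree's output only changes when a threshold crosses a data value, the loss is a step function of the thresholds, so gradient-based methods do not apply and a derivative-free method such as differential evolution is a natural fit. R's DEoptim and SciPy's `scipy.optimize.differential_evolution` are library implementations; the sketch below is a minimal from-scratch DE/rand/1/bin in plain Python, assuming a hypothetical parameter order, a ±30% deviation box, and MSE as the loss. Clipping trial values back into the box is one of several possible boundary-handling choices.

```python
import random

# Hypothetical parameter vector: [a_thr, b_thr, c_thr, y3, y4, y6, y7]
ORIGINAL = [10.0, 5.0, 8.0, 4.0, 2.0, 9.0, 6.0]
DATA = [(5, 9, 8, 10), (4, 7, 7, 7)]  # (A, B, C, Y) rows from the question
# Assumption: each number may move at most 30% away from its expert value.
BOUNDS = [(0.7 * v, 1.3 * v) for v in ORIGINAL]


def predict(a, b, c, p):
    """Fixed tree structure; only the numbers in p are optimized."""
    if a > p[0]:                                 # NODE 1
        return p[3] if b > p[1] else p[4]        # NODE 2 -> NODE 3 / NODE 4
    return p[5] if c < p[2] else p[6]            # NODE 5 -> NODE 6 / NODE 7


def mse(p, data):
    return sum((predict(a, b, c, p) - y) ** 2 for a, b, c, y in data) / len(data)


def differential_evolution(loss, bounds, pop_size=30, f=0.8, cr=0.9,
                           generations=200, seed=0):
    """Minimal DE/rand/1/bin that keeps every trial candidate inside `bounds`."""
    rng = random.Random(seed)
    dim = len(bounds)
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    scores = [loss(x) for x in pop]
    for _ in range(generations):
        for i in range(pop_size):
            # Three distinct members other than i form the mutant vector.
            r1, r2, r3 = rng.sample([k for k in range(pop_size) if k != i], 3)
            j_rand = rng.randrange(dim)  # at least one coordinate comes from the mutant
            trial = []
            for j, (lo, hi) in enumerate(bounds):
                if j == j_rand or rng.random() < cr:
                    v = pop[r1][j] + f * (pop[r2][j] - pop[r3][j])
                else:
                    v = pop[i][j]
                trial.append(min(max(v, lo), hi))  # clip back into the allowed box
            s = loss(trial)
            if s <= scores[i]:  # greedy one-to-one selection
                pop[i], scores[i] = trial, s
    best = min(range(pop_size), key=scores.__getitem__)
    return pop[best], scores[best]


best, score = differential_evolution(lambda p: mse(p, DATA), BOUNDS)
print("original MSE:", mse(ORIGINAL, DATA))
print("optimized MSE:", round(score, 3))
print("optimized parameters:", [round(x, 2) for x in best])
```

With only the two rows from the question, both samples have A below every allowed value of the A threshold, so the data can only inform NODE 5's threshold and its two leaf values; the other numbers stay wherever the search leaves them inside their bounds. A real calibration would need rows that reach every branch of the tree.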

spectre · Accepted Answer · 2023-10-24 15:52:58Z

You have your new data points, i.e. A, B and C, and you have their ground truth Y. There are a couple of things you can do to optimise your decision tree for the new data points:

1. Train your model on the old data plus the new data points. This increases the dataset size, which typically improves the model's accuracy. It also makes the model more robust, since it is trained on new data that may show trends not seen in the old data.
2. Tune the model's hyperparameters with GridSearchCV or RandomizedSearchCV. Keep in mind that this should be done after training the model on the new + old data. This will help you choose the best hyperparameters for your new model.

PS: You could also train different models and see which one gives the best results, but since you specifically asked about decision trees, I am assuming that is your best model.

Cheers!

Hi @spectre, thank you for your answer. My question was not clear enough: I did not train the initial model on data. This tree was set up by a team based on clinical results. It is not trained on any data and just encodes some physiological rules, so I cannot 'retrain' this model. I have edited the question to include this. — DannyV, Commented Oct 24, 2023 at 17:06
