LightGBM uses a leaf-wise tree growth algorithm.
- Advantages: converges faster.
- Disadvantages: tends to over-fit.
Parameters to tune
- num_leaves
- min_data_in_leaf
(ordered by importance, descending, in my opinion of course)
- num_leaves
The main parameter to control the complexity of the tree model (XGBoost, by contrast, controls complexity through tree depth).
Values smaller than 2^(max_depth) could be better choices.
- min_data_in_leaf
The parameter to deal with over-fitting in a leaf-wise tree.
Its value depends on the number of training samples and on num_leaves.
Setting it somewhere in the range of 200 to 999 should be enough for a large dataset.
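For illustration, here is a minimal sketch of how these two parameters could be passed to the lightgbm Python API; the synthetic data and the concrete values (num_leaves=31, min_data_in_leaf=200) are assumptions for the example, not the settings from this experiment.

```python
import lightgbm as lgb
import numpy as np

# Synthetic stand-in for the real training data (assumption for illustration).
rng = np.random.default_rng(0)
X = rng.normal(size=(10_000, 20))
y = (X[:, 0] + rng.normal(scale=0.5, size=10_000) > 0).astype(int)

train_set = lgb.Dataset(X, label=y)

params = {
    "objective": "binary",
    "metric": "auc",
    # Main complexity control of a leaf-wise tree; keep it below
    # 2^(max_depth) of a comparable depth-wise tree.
    "num_leaves": 31,
    # Main over-fitting control: every leaf must cover at least this many rows.
    "min_data_in_leaf": 200,
}

booster = lgb.train(params, train_set, num_boost_round=100)
```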
- learning_rate
The initial learning rate I set was 0.1. After 140 rounds, I found the AUC peaked at 0.688 and then started to decrease.
To improve accuracy, I chose a lower learning rate and enlarged the number of boosting iterations accordingly, since with a lower learning rate it takes more rounds to converge. The AUC of 0.688 serves as a target, indicating that the earlier run had settled into a local minimum.
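A rough sketch of that adjustment, assuming a held-out validation split and early stopping on the validation AUC; the learning rate of 0.02, the 2000 rounds, and the split itself are illustrative assumptions, not the values from the run above.

```python
import lightgbm as lgb
import numpy as np
from sklearn.model_selection import train_test_split

# Same synthetic stand-in data as in the previous sketch (assumption).
rng = np.random.default_rng(0)
X = rng.normal(size=(10_000, 20))
y = (X[:, 0] + rng.normal(scale=0.5, size=10_000) > 0).astype(int)

X_train, X_valid, y_train, y_valid = train_test_split(
    X, y, test_size=0.2, random_state=42
)

train_set = lgb.Dataset(X_train, label=y_train)
valid_set = lgb.Dataset(X_valid, label=y_valid, reference=train_set)

params = {
    "objective": "binary",
    "metric": "auc",
    "num_leaves": 31,
    "min_data_in_leaf": 200,
    # A lower learning rate than the initial 0.1 ...
    "learning_rate": 0.02,
}

# ... compensated by more boosting rounds, with early stopping on the
# validation AUC so the extra rounds do not over-fit.
booster = lgb.train(
    params,
    train_set,
    num_boost_round=2000,
    valid_sets=[valid_set],
    callbacks=[lgb.early_stopping(stopping_rounds=100)],
)
print("best iteration:", booster.best_iteration)
```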
- max_depth
The parameter to limit the tree depth.
Actually, depth is a less meaningful concept for a leaf-wise tree, since there is no direct mapping from the number of leaves to the depth of the tree.
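Even so, a hard cap can be kept as a safeguard against very deep trees; a minimal sketch of the parameter dict, assuming the same setup as above (the value 8 is an arbitrary illustration; -1 is the LightGBM default, meaning no limit):

```python
params = {
    "objective": "binary",
    "metric": "auc",
    "num_leaves": 31,         # still the primary complexity control
    "min_data_in_leaf": 200,
    "learning_rate": 0.02,
    # A cap such as 8 only bounds how deep the leaf-wise tree may grow;
    # it does not determine the number of leaves.
    "max_depth": 8,
}
```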
### How do tree methods deal with NaN?
During EDA, I found many missing values.