LightGBM uses a leaf-wise tree growth algorithm.
- Advantages: converges faster.
- Disadvantages: tends to over-fit.
Parameters to tune
- num_leaves
- min_data_in_leaf
(ordered by importance, descending, in my opinion of course)
- num_leaves
The main parameter to control the complexity of the tree model (XGBoost, by contrast, controls complexity through tree depth).
Values smaller than 2^(max_depth) could be better choices.
- min_data_in_leaf
The parameter to deal with over-fitting in a leaf-wise tree.
Its value depends on the number of training samples and on num_leaves.
Setting it somewhere in the range of 200 to 999 should be enough for a large dataset.
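For illustration, here is a minimal sketch of how these two parameters could be passed to the lightgbm Python API; the synthetic data and the concrete values (num_leaves=31, min_data_in_leaf=200) are assumptions for the example, not the settings from this experiment.

```python
import lightgbm as lgb
import numpy as np

# Synthetic stand-in for the real training data (assumption for illustration).
rng = np.random.default_rng(0)
X = rng.normal(size=(10_000, 20))
y = (X[:, 0] + rng.normal(scale=0.5, size=10_000) > 0).astype(int)

train_set = lgb.Dataset(X, label=y)

params = {
    "objective": "binary",
    "metric": "auc",
    # Main complexity control of a leaf-wise tree; keep it below
    # 2^(max_depth) of a comparable depth-wise tree.
    "num_leaves": 31,
    # Main over-fitting control: every leaf must cover at least this many rows.
    "min_data_in_leaf": 200,
}

booster = lgb.train(params, train_set, num_boost_round=100)
```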
- learning_rate
The initial learning rate I set was 0.1. After 140 rounds, I found the AUC peaked at 0.688 and then started to decrease.
To improve accuracy, I chose a lower learning rate and enlarged the number of boosting iterations accordingly, since with a lower learning rate it takes more rounds to converge. The AUC of 0.688 serves as a target, indicating that the earlier run had settled into a local minimum.
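A rough sketch of that adjustment, assuming a held-out validation split and early stopping on the validation AUC; the learning rate of 0.02, the 2000 rounds, and the split itself are illustrative assumptions, not the values from the run above.

```python
import lightgbm as lgb
import numpy as np
from sklearn.model_selection import train_test_split

# Same synthetic stand-in data as in the previous sketch (assumption).
rng = np.random.default_rng(0)
X = rng.normal(size=(10_000, 20))
y = (X[:, 0] + rng.normal(scale=0.5, size=10_000) > 0).astype(int)

X_train, X_valid, y_train, y_valid = train_test_split(
    X, y, test_size=0.2, random_state=42
)

train_set = lgb.Dataset(X_train, label=y_train)
valid_set = lgb.Dataset(X_valid, label=y_valid, reference=train_set)

params = {
    "objective": "binary",
    "metric": "auc",
    "num_leaves": 31,
    "min_data_in_leaf": 200,
    # A lower learning rate than the initial 0.1 ...
    "learning_rate": 0.02,
}

# ... compensated by more boosting rounds, with early stopping on the
# validation AUC so the extra rounds do not over-fit.
booster = lgb.train(
    params,
    train_set,
    num_boost_round=2000,
    valid_sets=[valid_set],
    callbacks=[lgb.early_stopping(stopping_rounds=100)],
)
print("best iteration:", booster.best_iteration)
```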
- max_depth
The parameter to limit the tree depth.
Actually, depth is a less meaningful concept for a leaf-wise tree, since there is no direct mapping from the number of leaves to the depth of the tree.
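Even so, a hard cap can be kept as a safeguard against very deep trees; a minimal sketch of the parameter dict, assuming the same setup as above (the value 8 is an arbitrary illustration; -1 is the LightGBM default, meaning no limit):

```python
params = {
    "objective": "binary",
    "metric": "auc",
    "num_leaves": 31,         # still the primary complexity control
    "min_data_in_leaf": 200,
    "learning_rate": 0.02,
    # A cap such as 8 only bounds how deep the leaf-wise tree may grow;
    # it does not determine the number of leaves.
    "max_depth": 8,
}
```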
### How do tree methods deal with NaN?
During EDA, I found many missing values.