Knowing which algorithm to apply is only part of machine learning. Beyond algorithms, two things distinguish masters from the rest: Feature Creation and Feature Selection.
This article covers the feature engineering techniques I've tried.
What makes a good feature?
A good feature is:
- Informative
- Independent
- Simple
When should you apply feature scaling?
Feature scaling, also known as data normalisation, is a technique for standardising the range of features. The two most common variants are min-max scaling, x' = (x - min) / (max - min), which maps each feature into [0, 1], and standardisation, x' = (x - mean) / std, which gives each feature zero mean and unit variance.
Algorithm | Apply scaling? | Why |
---|---|---|
Gradient Descent | Yes | Gradient descent converges much faster with scaled input. |
K-Means, or other classifiers based on Euclidean distance | Yes | When computing Euclidean distances, features with a larger scale dominate the result even if they are not more important. |
SVM | Yes | SVMs are distance-based, and scaling also reduces the time needed to find support vectors. |
Linear Discriminant Analysis | No | The algorithm handles scale differences by design. |
Naive Bayes | No | Naive Bayes handles scale differences by design. |
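For illustration, here is a minimal scikit-learn sketch of the two most common scalers; the toy age/income array is made up for the example:

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler, StandardScaler

# Two features on very different scales: age (years), income (dollars).
X = np.array([[25.0, 40_000.0],
              [32.0, 95_000.0],
              [47.0, 60_000.0],
              [51.0, 120_000.0]])

# Standardisation: each column gets zero mean and unit variance.
X_std = StandardScaler().fit_transform(X)

# Min-max scaling: each column is mapped into the [0, 1] range.
X_minmax = MinMaxScaler().fit_transform(X)

print(X_std.mean(axis=0).round(2))                 # [0. 0.]
print(X_minmax.min(axis=0), X_minmax.max(axis=0))  # [0. 0.] [1. 1.]
```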
A comparison of common data imputation methods
I hate it when data are missing. It's like having all the cookware ready and plenty of beef, only to find out we're out of pepper.
Missing values can be imputed with:
- A constant value that has meaning within the domain, such as 0, distinct from all other values.
- A value from another randomly selected record.
- A mean, median or mode value for the column.
- A value estimated by another predictive model.
How we treat missing values depends heavily on domain knowledge and experience. When you are not confident in the imputation method you chose, simply try several and compare them, whether by feature importance or by model score, as in the sketch below.
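Here is a minimal sketch of that "just try several" approach, built on scikit-learn's SimpleImputer; the synthetic dataset and the random-forest model are assumptions for illustration, and cross-validated accuracy stands in for the feature-importance check mentioned above:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.impute import SimpleImputer
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

# Synthetic data: 5 features, a target driven by the first two,
# and roughly 10% of the values knocked out at random.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = (X[:, 0] + X[:, 1] > 0).astype(int)
X[rng.random(X.shape) < 0.1] = np.nan

# Compare imputation strategies by cross-validated accuracy.
for strategy in ("mean", "median", "most_frequent", "constant"):
    kwargs = {"fill_value": 0} if strategy == "constant" else {}
    pipeline = make_pipeline(SimpleImputer(strategy=strategy, **kwargs),
                             RandomForestClassifier(random_state=0))
    score = cross_val_score(pipeline, X, y, cv=5).mean()
    print(f"{strategy:>13}: {score:.3f}")
```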
Feature Selection Methods:
- Filter: score each feature with a statistical test, independently of any model.
- Wrapper: search for the feature subset that gives a chosen model its best performance.
- Embedded: let the model select features during training, e.g. through L1 regularisation.
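For concreteness, a minimal sketch with one representative from each family; the breast-cancer toy dataset and the particular estimators (ANOVA F-test, RFE with logistic regression, L1-penalised logistic regression) are my choices for illustration, not the only options:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import RFE, SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression

X, y = load_breast_cancer(return_X_y=True)

# Filter: score features with a statistical test (here the ANOVA F-test)
# and keep the 10 best, without training any model.
filter_mask = SelectKBest(f_classif, k=10).fit(X, y).get_support()

# Wrapper: repeatedly train a model and drop the weakest features
# until 10 remain (recursive feature elimination).
wrapper_mask = RFE(LogisticRegression(max_iter=5000),
                   n_features_to_select=10).fit(X, y).get_support()

# Embedded: the L1 penalty shrinks uninformative coefficients to
# exactly zero during training, selecting features as a side effect.
l1_model = LogisticRegression(penalty="l1", solver="liblinear",
                              C=0.5).fit(X, y)
embedded_mask = l1_model.coef_[0] != 0

print(filter_mask.sum(), wrapper_mask.sum(), embedded_mask.sum())
```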
Sometimes, less is more!
By feeding models with only the important features, you can:
- reduce training time
- reduce model complexity, making the model easier to interpret
- improve accuracy
- reduce overfitting, since noise in the data has been filtered out
Filter-based feature selection is independent of the learning algorithm: features are scored with statistical tests of their correlation with the target variable. The following table summarises which test to use for each combination of feature and target type:
Feature \ Target | Numerical | Categorical |
---|---|---|
Numerical | Pearson's Correlation | LDA |
Categorical | ANOVA | Chi-Square |
Pearson's Correlation quantifies linear dependence between two numerical variables.
Linear Discriminant Analysis looks for a linear combination of features that best separates two or more classes of a categorical variable.
ANOVA tests whether the means of several groups are equal.
Chi-Square evaluates the likelihood of association between two categorical variables using their frequency distributions.
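A minimal sketch of running two of these tests with SciPy; the toy data and the contingency table are made up for illustration:

```python
import numpy as np
from scipy.stats import chi2_contingency, pearsonr

rng = np.random.default_rng(0)

# Numerical feature vs numerical target: Pearson's correlation.
x = rng.normal(size=100)
y = 2 * x + rng.normal(scale=0.5, size=100)
r, p = pearsonr(x, y)
print(f"Pearson r = {r:.2f}, p = {p:.3g}")

# Categorical feature vs categorical target: chi-square test on a
# contingency table of observed frequencies (rows = feature levels,
# columns = target classes).
table = np.array([[30, 10],
                  [15, 45]])
chi2, p, dof, expected = chi2_contingency(table)
print(f"chi-square = {chi2:.2f}, p = {p:.3g}")
```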
When choosing a test from the table, first identify each variable's type:
Numerical data:
- Interval
- Ratio
Categorical data:
- Binary
- Nominal
- Ordinal
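A minimal pandas sketch, with a made-up DataFrame, showing one way to split columns by type before picking a test:

```python
import pandas as pd

# A made-up DataFrame mixing numerical and categorical columns.
df = pd.DataFrame({
    "age": [25, 32, 47],                 # numerical (interval)
    "income": [40_000, 95_000, 60_000],  # numerical (ratio)
    "gender": ["F", "M", "F"],           # categorical (binary)
    "city": ["Paris", "Oslo", "Rome"],   # categorical (nominal)
})

numerical_cols = df.select_dtypes(include="number").columns
categorical_cols = df.select_dtypes(exclude="number").columns
print(list(numerical_cols), list(categorical_cols))
```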