Knowing which algorithm to apply is only part of machine learning. Beyond algorithms, two things distinguish masters from the rest: Feature Creation and Feature Selection.
This article covers the feature engineering techniques I've tried.
What makes a good feature?
A good feature is:
- Informative
- Independent
- Simple
When should you apply feature scaling?
Feature scaling, also known as data normalisation, is a technique for standardising the range of features. The two most common variants are min-max scaling, x' = (x - min) / (max - min), which maps each feature into [0, 1], and standardisation, x' = (x - mean) / std, which gives each feature zero mean and unit variance.
Algorithm | Apply scaling? | Why |
---|---|---|
Gradient Descent | Yes | Gradient descent converges much faster with scaled input. |
K-Means, or other classifiers based on Euclidean distance | Yes | When computing Euclidean distances, features with a larger scale dominate the result even if they are not more important. |
SVM | Yes | SVMs are distance-based, and scaling also reduces the time needed to find support vectors. |
Linear Discriminant Analysis | No | The algorithm handles scale differences by design. |
Naive Bayes | No | Naive Bayes handles scale differences by design. |
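For illustration, here is a minimal scikit-learn sketch of the two most common scalers; the toy age/income array is made up for the example:

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler, StandardScaler

# Two features on very different scales: age (years), income (dollars).
X = np.array([[25.0, 40_000.0],
              [32.0, 95_000.0],
              [47.0, 60_000.0],
              [51.0, 120_000.0]])

# Standardisation: each column gets zero mean and unit variance.
X_std = StandardScaler().fit_transform(X)

# Min-max scaling: each column is mapped into the [0, 1] range.
X_minmax = MinMaxScaler().fit_transform(X)

print(X_std.mean(axis=0).round(2))                 # [0. 0.]
print(X_minmax.min(axis=0), X_minmax.max(axis=0))  # [0. 0.] [1. 1.]
```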
A comparison of common data imputation methods
I hate it when data are missing. It's like having all the cookware ready and plenty of beef, only to find out we're out of pepper.
Missing values can be imputed with:
- A constant value that has meaning within the domain, such as 0, distinct from all other values.
- A value from another randomly selected record.
- A mean, median or mode value for the column.
- A value estimated by another predictive model.
How we treat missing values depends heavily on domain knowledge and experience. When you are not confident in the imputation method you chose, simply try several and compare them, whether by feature importance or by model score, as in the sketch below.
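Here is a minimal sketch of that "just try several" approach, built on scikit-learn's SimpleImputer; the synthetic dataset and the random-forest model are assumptions for illustration, and cross-validated accuracy stands in for the feature-importance check mentioned above:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.impute import SimpleImputer
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

# Synthetic data: 5 features, a target driven by the first two,
# and roughly 10% of the values knocked out at random.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = (X[:, 0] + X[:, 1] > 0).astype(int)
X[rng.random(X.shape) < 0.1] = np.nan

# Compare imputation strategies by cross-validated accuracy.
for strategy in ("mean", "median", "most_frequent", "constant"):
    kwargs = {"fill_value": 0} if strategy == "constant" else {}
    pipeline = make_pipeline(SimpleImputer(strategy=strategy, **kwargs),
                             RandomForestClassifier(random_state=0))
    score = cross_val_score(pipeline, X, y, cv=5).mean()
    print(f"{strategy:>13}: {score:.3f}")
```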
Feature Selection Methods:
- Filter: score each feature with a statistical test, independently of any model.
- Wrapper: search for the feature subset that gives a chosen model its best performance.
- Embedded: let the model select features during training, e.g. through L1 regularisation.
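For concreteness, a minimal sketch with one representative from each family; the breast-cancer toy dataset and the particular estimators (ANOVA F-test, RFE with logistic regression, L1-penalised logistic regression) are my choices for illustration, not the only options:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import RFE, SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression

X, y = load_breast_cancer(return_X_y=True)

# Filter: score features with a statistical test (here the ANOVA F-test)
# and keep the 10 best, without training any model.
filter_mask = SelectKBest(f_classif, k=10).fit(X, y).get_support()

# Wrapper: repeatedly train a model and drop the weakest features
# until 10 remain (recursive feature elimination).
wrapper_mask = RFE(LogisticRegression(max_iter=5000),
                   n_features_to_select=10).fit(X, y).get_support()

# Embedded: the L1 penalty shrinks uninformative coefficients to
# exactly zero during training, selecting features as a side effect.
l1_model = LogisticRegression(penalty="l1", solver="liblinear",
                              C=0.5).fit(X, y)
embedded_mask = l1_model.coef_[0] != 0

print(filter_mask.sum(), wrapper_mask.sum(), embedded_mask.sum())
```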
Sometimes, less is more!
By feeding models with only the important features, you can:
- reduce training time
- reduce model complexity, making the model easier to interpret
- improve accuracy
- reduce overfitting, since noise in the data has been filtered out
Filter-based feature selection is independent of the learning algorithm: features are scored with statistical tests of their correlation with the target variable. The following table summarises which test to use for each combination of feature and target type:
Feature \ Target | Numerical | Categorical |
---|---|---|
Numerical | Pearson's Correlation | LDA |
Categorical | ANOVA | Chi-Square |
Pearson's Correlation quantifies linear dependence between two numerical variables.
Linear Discriminant Analysis looks for a linear combination of features that best separates two or more classes of a categorical variable.
ANOVA tests whether the means of several groups are equal.
Chi-Square evaluates the likelihood of association between two categorical variables using their frequency distributions.
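A minimal sketch of running two of these tests with SciPy; the toy data and the contingency table are made up for illustration:

```python
import numpy as np
from scipy.stats import chi2_contingency, pearsonr

rng = np.random.default_rng(0)

# Numerical feature vs numerical target: Pearson's correlation.
x = rng.normal(size=100)
y = 2 * x + rng.normal(scale=0.5, size=100)
r, p = pearsonr(x, y)
print(f"Pearson r = {r:.2f}, p = {p:.3g}")

# Categorical feature vs categorical target: chi-square test on a
# contingency table of observed frequencies (rows = feature levels,
# columns = target classes).
table = np.array([[30, 10],
                  [15, 45]])
chi2, p, dof, expected = chi2_contingency(table)
print(f"chi-square = {chi2:.2f}, p = {p:.3g}")
```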
When choosing a test from the table, first identify each variable's type:
Numerical data:
- Interval
- Ratio
Categorical data:
- Binary
- Nominal
- Ordinal
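A minimal pandas sketch, with a made-up DataFrame, showing one way to split columns by type before picking a test:

```python
import pandas as pd

# A made-up DataFrame mixing numerical and categorical columns.
df = pd.DataFrame({
    "age": [25, 32, 47],                 # numerical (interval)
    "income": [40_000, 95_000, 60_000],  # numerical (ratio)
    "gender": ["F", "M", "F"],           # categorical (binary)
    "city": ["Paris", "Oslo", "Rome"],   # categorical (nominal)
})

numerical_cols = df.select_dtypes(include="number").columns
categorical_cols = df.select_dtypes(exclude="number").columns
print(list(numerical_cols), list(categorical_cols))
```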