Knowing which algorithm to apply is only part of machine learning. Beyond algorithms, two skills distinguish masters from the rest: feature creation and feature selection.

This article covers the feature engineering techniques I've tried.

What makes a good feature?

  1. Informative

  2. Independent

  3. Simple

When to apply feature scaling?

Feature scaling, also known as data normalisation, is a technique for standardising the range of features.

| Algorithm | Apply scaling? | Why |
| --- | --- | --- |
| Gradient descent | Yes | Gradient descent converges much faster with scaled input. |
| K-Means (or other classifiers based on Euclidean distance) | Yes | When calculating Euclidean distance, features with a larger scale have more influence on the clustering results, even if they are not that important. |
| SVM | Yes | Scaling reduces the time needed to find support vectors. |
| Linear Discriminant Analysis | No | LDA is invariant to feature scaling by design. |
| Naive Bayes | No | Naive Bayes treats each feature independently, so scaling has no effect by design. |
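For the algorithms above that do benefit from scaling, here is a minimal sketch using scikit-learn's StandardScaler and MinMaxScaler; the toy two-feature matrix is a hypothetical stand-in for real data:

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler, StandardScaler

# Two features on very different scales (hypothetical values)
X = np.array([[1.0, 2000.0],
              [2.0, 3000.0],
              [3.0, 6000.0]])

# Standardisation: zero mean, unit variance per feature
print(StandardScaler().fit_transform(X))

# Min-max normalisation: rescale each feature to [0, 1]
print(MinMaxScaler().fit_transform(X))
```

Without scaling, the second feature would dominate any Euclidean distance computed on this data.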

A comparison of data imputation methods

I hate cases where some data are missing. It's like having all the cookware prepared and plenty of beef, only to be told we are out of pepper.

  • A constant value that has meaning within the domain, such as 0, distinct from all other values.

  • A value from another randomly selected record.

  • A mean, median or mode value for the column.

  • A value estimated by another predictive model.

How we treat missing values depends heavily on domain knowledge and experience. When you are not confident in the imputation method you chose, simply try several and compare them using feature importance.
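The column-statistic and constant-value strategies above map directly onto scikit-learn's SimpleImputer; here is a minimal sketch on a hypothetical toy matrix:

```python
import numpy as np
from sklearn.impute import SimpleImputer

# Toy matrix with missing entries (hypothetical data)
X = np.array([[1.0, 2.0],
              [np.nan, 3.0],
              [7.0, np.nan],
              [4.0, 5.0]])

# Mean, median, or mode value for the column
for strategy in ("mean", "median", "most_frequent"):
    print(strategy)
    print(SimpleImputer(strategy=strategy).fit_transform(X))

# A constant value with meaning in the domain, distinct from all others
print(SimpleImputer(strategy="constant", fill_value=0).fit_transform(X))
```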

Feature Selection Methods:

  1. Filter
  2. Wrapper
  3. Embedded
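As a rough sketch of how these three families look in practice, here is one scikit-learn example of each; the synthetic dataset and parameter choices (k=3, the L1 penalty strength) are illustrative assumptions:

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE, SelectFromModel, SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=200, n_features=10,
                           n_informative=3, random_state=0)

# Filter: score features with a statistical test, keep the best k
X_filter = SelectKBest(f_classif, k=3).fit_transform(X, y)

# Wrapper: repeatedly fit a model and drop the weakest features
rfe = RFE(LogisticRegression(max_iter=1000), n_features_to_select=3)
X_wrapper = rfe.fit_transform(X, y)

# Embedded: selection happens inside model training (L1 penalty)
lasso = LogisticRegression(penalty="l1", solver="liblinear", C=0.1)
X_embedded = SelectFromModel(lasso).fit_transform(X, y)

print(X_filter.shape, X_wrapper.shape, X_embedded.shape)
```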

Sometimes, less is more!

By feeding models only the important features, you can:

  1. reduce training time
  2. reduce model complexity, making the model easier to interpret
  3. improve accuracy
  4. reduce overfitting, because noise in the data has been filtered out

Filter-based feature selection is independent of the learning algorithm: features are scored with statistical tests of their correlation with the target variable. The following table summarises which test to use for each combination of feature and target type:

| Feature \ Target | Numerical | Categorical |
| --- | --- | --- |
| Numerical | Pearson's Correlation | LDA |
| Categorical | ANOVA | Chi-Square |

Pearson's Correlation quantifies linear dependence between two numerical variables.

Linear Discriminant Analysis looks for a linear combination of features that separates two or more classes of a categorical variable.

ANOVA conducts a statistical test of whether the means of several groups are equal.

Chi-Square evaluates the likelihood of association between categorical variables using their frequency distributions.
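All of these tests are available off the shelf in SciPy and scikit-learn. Below is a minimal sketch on hypothetical toy data; for the numerical-feature/categorical-target cell, the ANOVA F-score from f_classif is used as a simple stand-in for a full LDA fit:

```python
import numpy as np
from scipy.stats import pearsonr
from sklearn.feature_selection import chi2, f_classif

rng = np.random.default_rng(0)
x_num = rng.normal(size=100)                 # numerical feature
x_cat = rng.integers(0, 3, size=100)         # categorical feature (3 levels)
y_num = 0.8 * x_num + rng.normal(size=100)   # numerical target
y_cat = rng.integers(0, 2, size=100)         # categorical target

# Numerical feature vs numerical target: Pearson's correlation
r, p = pearsonr(x_num, y_num)
print(f"Pearson r={r:.2f} (p={p:.3f})")

# Numerical feature vs categorical target: F-score (stand-in for LDA)
F, p = f_classif(x_num.reshape(-1, 1), y_cat)
print(f"F={F[0]:.2f} (p={p[0]:.3f})")

# Categorical feature vs numerical target: ANOVA, grouping the
# numerical variable by the levels of the categorical feature
F, p = f_classif(y_num.reshape(-1, 1), x_cat)
print(f"ANOVA F={F[0]:.2f} (p={p[0]:.3f})")

# Categorical feature vs categorical target: chi-square
# (chi2 expects non-negative feature values)
chi, p = chi2(x_cat.reshape(-1, 1), y_cat)
print(f"chi2={chi[0]:.2f} (p={p[0]:.3f})")
```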

Numerical data:

  1. Interval
  2. Ordinal

Categorical data:

  1. Binary
  2. Nominal
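These types map naturally onto pandas dtypes; here is a minimal sketch with hypothetical column names:

```python
import pandas as pd

df = pd.DataFrame({
    "temperature_c": [18.5, 21.0, 19.2],        # interval (numerical)
    "satisfaction": ["low", "high", "medium"],  # ordinal
    "is_member": [True, False, True],           # binary
    "city": ["Paris", "Tokyo", "Lima"],         # nominal
})

# Ordinal: an ordered categorical preserves rank for comparisons
df["satisfaction"] = pd.Categorical(
    df["satisfaction"], categories=["low", "medium", "high"], ordered=True
)

# Nominal: no inherent order, so one-hot encode before modelling
df = pd.get_dummies(df, columns=["city"])
print(df.dtypes)
```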
