Zena M. Hira, Duncan F. Gillies.
Abstract
We summarise various ways of performing dimensionality reduction on high-dimensional microarray data. Many different feature selection and feature extraction methods exist, and they are widely used. All these methods aim to remove redundant and irrelevant features so that classification of new instances will be more accurate. A popular source of data is microarrays, a biological platform for gathering gene expressions. Analysing microarrays can be difficult due to the size of the data they provide. In addition, the complicated relations among the different genes make analysis more difficult, and removing excess features can improve the quality of the results. We present some of the most popular methods for selecting significant features and provide a comparison between them. Their advantages and disadvantages are outlined in order to provide a clearer idea of when to use each one of them, saving computational time and resources.
Entities:
Year: 2015 PMID: 26170834 PMCID: PMC4480804 DOI: 10.1155/2015/198363
Source DB: PubMed Journal: Adv Bioinformatics ISSN: 1687-8027
Figure 1: Comparison between EWUSC, USC, and SC on breast cancer data [14].
Figure 2: Comparison between ReliefF, Information Gain, Information Gain Ratio, and the χ² test on the ALL and MLL leukaemia datasets [21].
Deterministic versus randomised wrappers.
| Deterministic | Randomised |
|---|---|
| Small overfitting risk | High overfitting risk |
| Prone to local optima | Less prone to local optima |
| Classifier dependent | Classifier dependent |
| — | Computationally intensive |
Comparison between deterministic and randomised wrappers.
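The pseudocode for the wrappers discussed here did not survive extraction, so the sketches that follow are illustrative reconstructions rather than the authors' listings. First, the deterministic side of the comparison: greedy sequential forward selection, assuming a generic scikit-learn classifier (the k-NN base learner, the `n_select` budget, and the function name are placeholder choices):

```python
# Sequential forward selection: a deterministic wrapper.
# Greedily adds the single feature that most improves
# cross-validated accuracy, stopping after n_select features.
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

def forward_selection(X, y, n_select, cv=5):
    selected, remaining = [], list(range(X.shape[1]))
    clf = KNeighborsClassifier(n_neighbors=3)  # illustrative base classifier
    for _ in range(n_select):
        # Score every candidate feature added to the current set.
        scores = [
            (np.mean(cross_val_score(clf, X[:, selected + [f]], y, cv=cv)), f)
            for f in remaining
        ]
        _, best_f = max(scores)
        selected.append(best_f)
        remaining.remove(best_f)
    return selected
```

Because the search is greedy and exhaustive at each step, it is reproducible and carries a small overfitting risk, but it can stall in a local optimum, exactly the trade-off in the table above.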
Algorithm 1: Genetic algorithm.
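A minimal sketch of a GA wrapper for feature selection, assuming boolean masks as chromosomes and cross-validated linear-SVM accuracy as the fitness; the operators here (truncation selection, one-point crossover, bit-flip mutation) are generic choices, not necessarily those of the paper:

```python
# Genetic algorithm wrapper: evolves bit-mask chromosomes, where
# bit i = 1 means feature i is kept; fitness = classifier accuracy.
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

rng = np.random.default_rng(0)

def fitness(mask, X, y):
    if not mask.any():                              # empty mask: worst score
        return 0.0
    return cross_val_score(SVC(kernel="linear"), X[:, mask], y, cv=3).mean()

def ga_select(X, y, pop_size=20, generations=30, p_mut=0.01):
    n = X.shape[1]
    pop = rng.random((pop_size, n)) < 0.5           # random initial masks
    for _ in range(generations):
        scores = np.array([fitness(m, X, y) for m in pop])
        order = np.argsort(scores)[::-1]
        parents = pop[order[: pop_size // 2]]       # truncation selection
        children = []
        for _ in range(pop_size - len(parents)):
            a, b = parents[rng.integers(len(parents), size=2)]
            cut = rng.integers(1, n)                # one-point crossover
            child = np.concatenate([a[:cut], b[cut:]])
            child ^= rng.random(n) < p_mut          # bit-flip mutation
            children.append(child)
        pop = np.vstack([parents, children])
    best = max(pop, key=lambda m: fitness(m, X, y))
    return np.flatnonzero(best)                     # indices of kept features
```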
Algorithm 2: Simulated annealing algorithm.
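A matching sketch of simulated annealing for feature selection, reusing the `fitness` function from the GA sketch; the single-bit-flip neighbourhood, geometric cooling schedule, and parameter defaults are assumptions:

```python
# Simulated annealing wrapper: perturbs the current feature mask by
# flipping one bit; worse solutions are accepted with probability
# exp(delta / T), and T is cooled geometrically each iteration.
import math
import numpy as np

def sa_select(X, y, fitness, n_iter=500, T0=1.0, cooling=0.99, seed=0):
    rng = np.random.default_rng(seed)
    n = X.shape[1]
    mask = rng.random(n) < 0.5                      # random starting mask
    score, T = fitness(mask, X, y), T0
    best_mask, best_score = mask.copy(), score
    for _ in range(n_iter):
        cand = mask.copy()
        cand[rng.integers(n)] ^= True               # flip one random bit
        cand_score = fitness(cand, X, y)
        delta = cand_score - score
        if delta > 0 or rng.random() < math.exp(delta / T):
            mask, score = cand, cand_score          # accept the move
            if score > best_score:
                best_mask, best_score = mask.copy(), score
        T *= cooling                                # cool down
    return np.flatnonzero(best_mask)
```

The random restarts implicit in the acceptance rule are what make randomised wrappers less prone to local optima, at the cost of many more fitness evaluations.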
Feature selection methods applied on microarray data.
| Method | Type | Supervised | Linear | Description |
|---|---|---|---|---|
| t-test | Filter | — | Yes | It finds features with a maximal difference of mean value between groups and a minimal variability within each group |
| Correlation-based feature selection (CFS) | Filter | — | Yes | It finds features that are highly correlated with the class but uncorrelated with each other |
| Bayesian networks | Filter | Yes | No | They determine the causal relationships among features and remove the ones that have no causal relationship with the class |
| Information gain (IG) | Filter | No | Yes | It measures how common a feature is in a class compared to all other classes |
| Genetic algorithms (GA) | Wrapper | Yes | No | They find the smallest set of features for which the optimization criterion (classification accuracy) does not deteriorate |
| Sequential search | Wrapper | — | — | A heuristic-based search algorithm that finds the features with the highest criterion value (classification accuracy) by adding one new feature to the set at a time |
| SVM method of recursive feature elimination (RFE) | Embedded | Yes | Yes | It constructs the SVM classifier and eliminates features based on their "weight" in the constructed classifier |
| Random forests | Embedded | Yes | Yes | They create a number of decision trees using different samples of the original data and use different averaging algorithms to improve accuracy |
| Least absolute shrinkage and selection operator (LASSO) | Embedded | Yes | Yes | It constructs a linear model that sets many of the feature coefficients to zero and uses the nonzero ones as the selected features |
Different feature selection methods and their characteristics.
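Several of the methods in this table have standard scikit-learn counterparts; the sketch below shows one filter (information gain via mutual information) and three embedded selectors on synthetic data standing in for an expression matrix. The variable names and hyperparameter values are illustrative, and regressing the class labels directly with `Lasso` is a simplification of how LASSO-based selection is usually set up:

```python
# Filter and embedded selectors from the table, via scikit-learn.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE, mutual_info_classif
from sklearn.svm import SVC
from sklearn.linear_model import Lasso
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in: 100 samples, 500 "genes".
X, y = make_classification(n_samples=100, n_features=500, random_state=0)

# Information gain (mutual information) filter: rank and keep the top 20.
mi = mutual_info_classif(X, y, random_state=0)
ig_top = np.argsort(mi)[::-1][:20]

# SVM-RFE (embedded): repeatedly drop the features with smallest |weight|.
rfe = RFE(SVC(kernel="linear"), n_features_to_select=20, step=10).fit(X, y)
rfe_top = np.flatnonzero(rfe.support_)

# LASSO (embedded): nonzero coefficients are the selected features.
lasso = Lasso(alpha=0.05).fit(X, y)
lasso_top = np.flatnonzero(lasso.coef_)

# Random forests (embedded): rank features by impurity-based importance.
rf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)
rf_top = np.argsort(rf.feature_importances_)[::-1][:20]
```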
Figure 3: Linear versus nonlinear classification problems.
Figure 4: Dimensionality reduction using linear matrix factorization: projecting the data onto a lower-dimensional linear subspace.
Figure 5: Visualisation of a leukaemia dataset with PCA, manifold LLE, and manifold Isomap [34].
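The three projections compared in Figure 5 all have direct scikit-learn counterparts; a minimal sketch that maps a synthetic stand-in for an expression matrix down to two components for plotting (the `n_neighbors=10` settings are assumptions):

```python
# Feature extraction: one linear and two manifold (nonlinear) projections.
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.manifold import LocallyLinearEmbedding, Isomap

X, _ = make_classification(n_samples=100, n_features=500, random_state=0)

X_pca = PCA(n_components=2).fit_transform(X)        # linear subspace
X_lle = LocallyLinearEmbedding(n_components=2,
                               n_neighbors=10).fit_transform(X)
X_iso = Isomap(n_components=2, n_neighbors=10).fit_transform(X)
```

Unlike selection, each of these produces new composite features, which is where the interpretability loss noted in the next table comes from.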
Advantages and disadvantages between feature selection and feature extraction.
| Method | Advantages | Disadvantages |
|---|---|---|
| Selection | Preserves data characteristics for interpretability; shorter training times; reduces overfitting | Lower discriminative power |
| Extraction | Higher discriminative power; controls overfitting when it is unsupervised | Loss of data interpretability; the transformation may be expensive |
A comparison between feature selection and feature extraction methods.