| Literature DB >> 21982277 |
Jenny Önskog, Eva Freyhult, Mattias Landfors, Patrik Rydén, Torgeir R Hvidsten.
Abstract
BACKGROUND: Machine learning is a powerful approach for describing and predicting classes in microarray data. Although several comparative studies have investigated the relative performance of various machine learning methods, these often do not account for the fact that performance (e.g. error rate) is a result of a series of analysis steps of which the most important are data normalization, gene selection and machine learning.Entities:
Year: 2011 PMID: 21982277 PMCID: PMC3229535 DOI: 10.1186/1471-2105-12-390
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Overview of the data sets and the methods used in this study
| Data set (D) | Classes* | No. of genes** |
|---|---|---|
| Alizadeh | DLBCL (68), other samples (65) | 7806 (7430) |
| Finak | Epithelial (34), stromal tissue (32) | 33491 |
| Galland | Invasive NFPAs (22), non-invasive NFPAs (18) | 40475 (40291) |
| Herschkowitz | High ER expression (58), low ER expression (46) | 19718 |
| Jones | Cancerous samples (72), non-cancerous samples (19) | 40233 (39746) |
| Sørlie | High ER expression (55), low ER expression (18) | 8033 (7734) |
| Ye | Metastatic (65), non-metastatic (22) | 8911 |
Normalization methods

| Method | Fixed parameters | Optimized parameters |
|---|---|---|
| No 0 | Raw data | |
| No 1 | Print-tip MA-loess, no background correction | |
| No 2 | Print-tip MA-loess, background correction | |
| No 3 | Global MA-loess, no background correction | |
| No 4 | Global MA-loess, background correction | |

Gene selection methods

| Method | Fixed parameters | Optimized parameters |
|---|---|---|
| T-test | Two-sided | |
| Relief | Threshold = 0, nosample = # obs. in data set | |
| Paired distance | Euclidean distance | |

Numbers of genes

2, 12, 22, 32, 42, 52, 62, 72, 82, 92, 100, 150, 200, 300, 400, 500, 600, 700, 800, 900, 1000

Machine learning methods

| Method | Fixed parameters | Optimized parameters |
|---|---|---|
| DT Gini | Decision tree, splitting index = Gini | |
| DT Information | Decision tree, splitting index = Information | |
| NN One layer | Neural network, one hidden layer, decay = 0.001, rang = 0.1, maxit = 100 | size = [2-5] |
| NN No layer | Neural network, no hidden layer, decay = 0.001, rang = 0.1, maxit = 100, skip = TRUE, size = 0 | |
| SVM Linear | Support vector machine, linear kernel, type = nu-svc, cross = 10, nu = 0.2, scaled = FALSE | |
| SVM Poly2 | Support vector machine, polynomial kernel, degree 2, type = nu-svc, cross = 10, nu = 0.2, scaled = FALSE | |
| SVM Poly3 | Support vector machine, polynomial kernel, degree 3, type = nu-svc, cross = 10, nu = 0.2, scaled = FALSE | |
| SVM Rb | Support vector machine, radial basis kernel, type = nu-svc, cross = 10, nu = 0.2, scaled = FALSE | sigma = [2^-14, 2^14] |
Acronyms defined here are used throughout the paper. "Fixed parameters" in the methods were given fixed values, while "Optimized parameters" were optimized in the inner cross-validation using a grid search. *The number of samples belonging to each class is given in parentheses. **Dimensions after background-corrected normalization (No 2 and No 4) are given in parentheses.
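The SVM Rb row tunes the radial-basis parameter sigma over a grid of powers of two from 2^-14 to 2^14 (our reading of the table entry). A minimal numpy sketch of that kernel and grid follows; it assumes a kernlab-style parameterization in which sigma multiplies the squared distance, which may differ from the library the authors actually used:

```python
import numpy as np

def rbf_kernel(X, Y, sigma):
    """RBF Gram matrix K[i, j] = exp(-sigma * ||X[i] - Y[j]||^2).

    kernlab-style parameterization (sigma multiplies the squared
    distance); other libraries divide by 2*sigma**2 instead.
    """
    sq = (np.sum(X**2, axis=1)[:, None]
          + np.sum(Y**2, axis=1)[None, :]
          - 2.0 * X @ Y.T)
    return np.exp(-sigma * np.maximum(sq, 0.0))  # clamp rounding noise

# Candidate grid: powers of two from 2^-14 to 2^14 (assumed reading
# of the table's optimized-parameter range for SVM Rb).
sigma_grid = [2.0**k for k in range(-14, 15)]

X = np.random.default_rng(0).normal(size=(5, 3))
K = rbf_kernel(X, X, sigma_grid[14])  # sigma = 2^0 = 1
```

In the study this grid would be searched in the inner cross-validation; here it is only enumerated.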
Figure 1. Method overview. The figure illustrates the analysis pipeline used to induce and validate models. First, data were normalized (or raw data were used: No 0). Then a 5-fold cross-validation (CV) was conducted to divide the data into training and test sets. The training sets were used to train the models (red box), while the test sets were used to validate their classification power. In order to induce a classification model, some parameters had to be tuned. For example, the different support vector machines (SVMs) employed different kernels with one or more parameters. The parameter sigma in the radial basis kernel was tuned by conducting a grid search and choosing the value with the lowest error rate during a 10-fold CV. The selected parameter value was finally used to induce a model from the training set in the outer CV and to classify the observations in the corresponding test set. The outer 5-fold CV was performed 10 times, resulting in 50 test sets from which we evaluated 50 different models trained on 50 different training sets. As a measure of classification performance, we used the average fraction of misclassified observations (i.e. error rate) across these 50 test sets.
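The nested cross-validation described above can be sketched as follows. A plain k-NN classifier, a 3-fold inner CV and 2 outer repeats stand in for the paper's classifiers, 10-fold inner CV and 10 repeats, purely to keep the example small:

```python
import numpy as np

rng = np.random.default_rng(1)

def knn_error(Xtr, ytr, Xte, yte, k):
    """Error rate of a k-nearest-neighbour vote (illustrative stand-in
    for the paper's classifiers; only the CV structure matters here)."""
    errs = 0
    for x, t in zip(Xte, yte):
        d = np.sum((Xtr - x) ** 2, axis=1)
        votes = ytr[np.argsort(d)[:k]]
        errs += np.bincount(votes).argmax() != t
    return errs / len(yte)

def folds(n, n_folds, rng):
    """Random partition of range(n) into n_folds index arrays."""
    return np.array_split(rng.permutation(n), n_folds)

# Toy two-class data; the real study used microarray expression matrices.
X = rng.normal(size=(60, 10))
y = (rng.random(60) < 0.5).astype(int)
X[y == 1, 0] += 2.0  # make class 1 partly separable on one "gene"

k_grid = [1, 3, 5]
outer_errors = []
for repeat in range(2):                      # paper: 10 repeats
    for test_idx in folds(len(y), 5, rng):   # outer 5-fold CV
        train_idx = np.setdiff1d(np.arange(len(y)), test_idx)
        Xtr, ytr = X[train_idx], y[train_idx]
        # Inner CV grid search over the tuned parameter.
        inner_scores = []
        for k in k_grid:
            errs = []
            for inner_test in folds(len(ytr), 3, rng):
                inner_train = np.setdiff1d(np.arange(len(ytr)), inner_test)
                errs.append(knn_error(Xtr[inner_train], ytr[inner_train],
                                      Xtr[inner_test], ytr[inner_test], k))
            inner_scores.append(np.mean(errs))
        best_k = k_grid[int(np.argmin(inner_scores))]
        # Refit on the full outer training set, score the held-out fold.
        outer_errors.append(knn_error(Xtr, ytr, X[test_idx], y[test_idx], best_k))

error_rate = float(np.mean(outer_errors))
```

With the paper's settings this loop yields 50 outer test sets; here it yields 10.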
Figure 2. Overall error rates for each data set. Box plots showing the error rates resulting from all the different combinations of methods applied to the seven data sets.
Figure 3. Classification performance relative to normalization method. Box plots showing the error rates resulting from all method-combinations utilizing a specific normalization method. Since the Finak and Galland data sets could not be normalized with methods No 1 and No 2, these methods are absent from the relevant plots.
Figure 4. Classification performance relative to gene selection method. Box plots showing the error rates resulting from all method-combinations utilizing a specific gene selection method.
Figure 5. Classification performance relative to different numbers of genes. Box plots showing the error rates resulting from all method-combinations utilizing a specific number of genes.
Figure 6. Classification performance relative to machine learning method. Box plots showing the error rates resulting from all method-combinations utilizing a specific machine learning method.
Figure 7. The predictive performance of individual methods across data sets. The heatmap visualizes the number of data sets in which one method (row) performed significantly better than another method (column). The Wilcoxon signed-rank test was used to compare the error rates of all combinations containing one method against the error rates of all combinations containing the other method. Significance was determined using a Bonferroni-corrected p-value threshold (i.e. 0.05 divided by the number of tests).
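One cell of such a comparison can be sketched with scipy. The error rates and the number of tests below are synthetic placeholders, not values from the study; only the test-plus-Bonferroni logic mirrors the paper:

```python
import numpy as np
from scipy.stats import wilcoxon

rng = np.random.default_rng(2)

# Paired error rates of two methods over the same combinations of the
# remaining analysis steps (synthetic data for illustration).
errs_a = rng.uniform(0.10, 0.30, size=40)
errs_b = errs_a + rng.normal(0.05, 0.01, size=40)  # B consistently worse

# Wilcoxon signed-rank test on the paired differences.
stat, p = wilcoxon(errs_a, errs_b)

n_tests = 9 * 8                     # illustrative number of ordered pairs
bonferroni_alpha = 0.05 / n_tests   # Bonferroni-corrected threshold
a_beats_b = bool(p < bonferroni_alpha and np.median(errs_a - errs_b) < 0)
```

In the figure, each significant outcome of this kind increments the count in the corresponding heatmap cell for a given data set.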
Differentially expressed genes in the data sets
| Data set (D) | No. significant genes | Significance threshold |
|---|---|---|
| Alizadeh | 787 | 6.40e-06 |
| Finak | 2145 | 1.49e-06 |
| Galland | 209 | 1.24e-06 |
| Herschkowitz | 324 | 2.54e-06 |
| Jones | 6282 | 1.24e-06 |
| Sørlie | 0 | 6.22e-06 |
| Ye | 47 | 5.61e-06 |
Number of significant genes in each data set and the corresponding significance thresholds. The t-test was used to compute a p-value for each gene, and the Bonferroni correction was used to judge significance (i.e. the significance threshold is 0.05 divided by the number of genes).
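The per-gene t-test with Bonferroni correction can be sketched as follows, on synthetic expression data in which the first 50 genes are truly differential (all sizes and effect magnitudes are illustrative):

```python
import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(3)

n_genes, n1, n2 = 1000, 20, 20
expr1 = rng.normal(size=(n_genes, n1))   # class 1 samples (genes x samples)
expr2 = rng.normal(size=(n_genes, n2))   # class 2 samples
expr2[:50] += 3.0                        # first 50 genes truly differential

# Two-sided t-test per gene (one test per row).
_, pvals = ttest_ind(expr1, expr2, axis=1)

threshold = 0.05 / n_genes               # Bonferroni-corrected threshold
n_significant = int(np.sum(pvals < threshold))
```

Applied to the real data sets, this procedure produced the counts in the table above (e.g. zero significant genes for Sørlie despite non-trivial classification performance).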
Figure 8. The predictive performance of machine learning and normalization methods across data sets. The heatmap visualizes the number of data sets in which one pair of methods (row) performed significantly better than another pair of methods (column). The Wilcoxon signed-rank test was used to compare the error rates of all combinations containing one pair against the error rates of all combinations containing the other pair. Significance was determined using a p-value threshold of 0.05.
Figure 9. The predictive performance of machine learning and gene selection methods across data sets. See figure text of Figure 8.
Figure 10. The predictive performance of gene selection and normalization methods across data sets. See figure text of Figure 8.
Figure 11. Permutation tests. The plots show the spread of adjusted error rates resulting from randomly shuffling the class labels in each data set. The results were obtained using the method combination No 3, T-test, 150 genes and SVM Rb, and each data set was permuted 300 times. Adjusted error rates from the original class labels are marked by red crosses.
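The label-permutation idea can be sketched with a lightweight stand-in classifier (a leave-one-out nearest-centroid rule replaces the paper's No 3 / T-test / 150 genes / SVM Rb combination, and 100 permutations replace the paper's 300, for speed):

```python
import numpy as np

rng = np.random.default_rng(4)

def centroid_error(X, y):
    """Leave-one-out error of a nearest-centroid rule (illustrative
    stand-in for the full analysis pipeline used in the paper)."""
    errs = 0
    for i in range(len(y)):
        mask = np.arange(len(y)) != i
        m0 = X[mask & (y == 0)].mean(axis=0)
        m1 = X[mask & (y == 1)].mean(axis=0)
        pred = int(np.sum((X[i] - m1) ** 2) < np.sum((X[i] - m0) ** 2))
        errs += pred != y[i]
    return errs / len(y)

# Synthetic data with a real class signal.
X = rng.normal(size=(40, 30))
y = np.repeat([0, 1], 20)
X[y == 1] += 0.8

observed = centroid_error(X, y)

# Null distribution: shuffle labels and re-estimate the error rate.
null = [centroid_error(X, rng.permutation(y)) for _ in range(100)]
p_value = (1 + sum(e <= observed for e in null)) / (1 + len(null))
```

A small p-value indicates that the observed error rate sits well below the spread obtained under shuffled labels, as the red crosses in Figure 11 illustrate for the real data sets.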