Abstract
BACKGROUND: Although Linear Discriminant Analysis (LDA) is commonly used for classification, it may not be directly applicable in genomics studies because of the large p, small n problem: the number of predictors far exceeds the number of samples. Several versions of sparse LDA have been proposed to address this challenge. An implicit assumption of LDA-based methods is that the covariance matrices are the same across classes. However, rewiring of genetic networks (and therefore differing covariance matrices) across diseases has been observed in many genomics studies, which suggests that LDA and its variants may be suboptimal for disease classification. It remains unclear whether accounting for differing genetic networks across diseases can improve classification in genomics studies.
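The equal-covariance assumption can be illustrated with a minimal sketch. This is not the SQDA method from the article; it is a plain diagonal discriminant rule, fitted once with variances pooled across classes (DLDA-style) and once with separate variances per class (DQDA-style). The function names, the simulated data (two classes with identical means but standard deviations of 1 and 3), and the sample sizes are all assumptions chosen for illustration.

```python
import math
import random
from statistics import fmean, variance


def fit_diagonal(X, y, pooled):
    """Fit class means and diagonal variances.

    pooled=True shares one variance vector across classes (DLDA-style);
    pooled=False keeps a separate variance vector per class (DQDA-style).
    """
    classes = sorted(set(y))
    p = len(X[0])
    model = {}
    for c in classes:
        rows = [x for x, label in zip(X, y) if label == c]
        mean = [fmean(r[j] for r in rows) for j in range(p)]
        var = [variance([r[j] for r in rows]) for j in range(p)]
        model[c] = {"n": len(rows), "mean": mean, "var": var}
    if pooled:
        # Pool the per-class variances, weighting by degrees of freedom.
        dof = len(X) - len(classes)
        shared = [
            sum((m["n"] - 1) * m["var"][j] for m in model.values()) / dof
            for j in range(p)
        ]
        for m in model.values():
            m["var"] = shared
    for m in model.values():
        m["log_prior"] = math.log(m["n"] / len(X))
    return model


def predict(model, x):
    """Return the class with the largest Gaussian log-discriminant score."""
    def score(m):
        return m["log_prior"] - 0.5 * sum(
            math.log(v) + (xj - mu) ** 2 / v
            for xj, mu, v in zip(x, m["mean"], m["var"])
        )
    return max(model, key=lambda c: score(model[c]))


def error_rate(model, X, y):
    """Fraction of samples whose predicted class differs from the label."""
    wrong = sum(predict(model, x) != label for x, label in zip(X, y))
    return wrong / len(y)


def simulate(rng, n_per_class, p=5):
    """Two classes with equal means (0) but different spreads (assumed toy data)."""
    X, y = [], []
    for label, sd in ((0, 1.0), (1, 3.0)):
        for _ in range(n_per_class):
            X.append([rng.gauss(0.0, sd) for _ in range(p)])
            y.append(label)
    return X, y


rng = random.Random(0)
X_train, y_train = simulate(rng, 200)
X_test, y_test = simulate(rng, 200)
dlda_error = error_rate(fit_diagonal(X_train, y_train, pooled=True), X_test, y_test)
dqda_error = error_rate(fit_diagonal(X_train, y_train, pooled=False), X_test, y_test)
print(f"pooled variances (DLDA-style):    {dlda_error:.3f}")
print(f"per-class variances (DQDA-style): {dqda_error:.3f}")
```

In this toy setting the classes differ only in their spread, so a rule that pools variances has little to separate them with, while a rule with class-specific variances can. Real genomic data would also involve correlations between genes, which a diagonal rule ignores.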
Year: 2015 PMID: 25886892 PMCID: PMC4355996 DOI: 10.1186/s12859-014-0443-6
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Figure 1. The effects of block size and error margin on SQDA. This figure shows, on simulated data, how two important parameters, block size and error margin, affect SQDA.
Figure 2. The effect of sample size on the seven classification methods. This figure shows, on simulated data, the effect of sample size on SQDA and six other classification methods.
Comparisons of seven classification methods on simulated data
| Method | | | | |
|---|---|---|---|---|
| DLDA | 0.048 (0.015, 50) | 0.083 (0.015, 100) | 0.228 (0.02, 1025) | 0.217 (0.04, 1175) |
| DQDA | 0.049 (0.021, 50) | 0.013 (0.007, 50) | 0.243 (0.025, 1400) | 0.214 (0.032, 825) |
| NN | 0.056 (0.02, 100) | 0.424 (0.021, 50) | 0.27 (0.034, 575) | 0.112 (0.061, 475) |
| SVM | 0.054 (0.029, 50) | 0.095 (0.024, 100) | 0.127 (0.047, 500) | 0.255 (0.05, 1050) |
| SCRDA | 0.019 (0.036, 651) | 0.024 (0.012, 2089) | 0.217 (0.041, 587) | 0.241 (0.069, 317) |
| RF | 0.109 (0.012, NA) | 0.038 (0.009, NA) | 0.262 (0.018, NA) | 0.21 (0.041, NA) |
| SQDA | 0.005 (0.002, 300) | 0.001 (0.001, 300) | 0.108 (0.042, 200) | 0.001 (0.002, 400) |
| DLDA2 | 0.002 (0.001, 600) | 0.04 (0.005, 500) | 0.224 (0.03, 700) | 0.217 (0.055, 600) |
| DQDA2 | 0.003 (0.001, 500) | 0 (0.001, 600) | 0.231 (0.033, 600) | 0.224 (0.058, 400) |
Each table entry has the form a (b, c), where a is the average prediction error, b is its standard deviation, and c is the median number of predictors selected.
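As a small sketch of how an entry of this form could be assembled from repeated runs, the snippet below summarizes per-run prediction errors and predictor counts. The function name `summarize` and the input values are hypothetical. The use of the sample standard deviation is also an assumption, since the table note does not say which estimator was used.

```python
from statistics import fmean, median, stdev


def summarize(errors, n_predictors=None):
    """Format replicate results as 'a (b, c)'.

    a: average prediction error across runs
    b: sample standard deviation of the errors (assumed estimator)
    c: median number of predictors selected, or 'NA' if not applicable
    """
    a = fmean(errors)
    b = stdev(errors)
    c = "NA" if n_predictors is None else f"{median(n_predictors):g}"
    return f"{a:.3f} ({b:.3f}, {c})"


# Hypothetical results from three runs of one method.
entry = summarize([0.1, 0.2, 0.3], [50, 100, 150])
# A method without explicit predictor selection reports NA for c.
entry_na = summarize([0.1, 0.2, 0.3])
print(entry)
print(entry_na)
```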
Comparisons of seven classification methods on TCGA data
| Method | | | |
|---|---|---|---|
| DLDA | 0.013 (0.007, 50) | 0.035 (0.027, 50) | 0.1 (0.042, 50) |
| DQDA | 0.008 (0.006, 50) | 0.018 (0.019, 50) | 0.08 (0.047, 50) |
| NN | 0.014 (0.01, 50) | 0.027 (0.016, 50) | 0.085 (0.053, 50) |
| SVM | 0.01 (0.007, 50) | 0.024 (0.017, 50) | 0.088 (0.041, 50) |
| SCRDA | 0.039 (0.031, 128) | 0.044 (0.026, 95) | 0.122 (0.068, 502) |
| RF | 0.007 (0.002, NA) | 0.018 (0.009, NA) | 0.08 (0.039, NA) |
| SQDA | 0.003 (0.003, 1900) | 0.011 (0.009, 1900) | 0.036 (0.021, 2900) |
| DLDA2 | 0.017 (0.005, 12100) | 0.031 (0.013, 8900) | 0.114 (0.038, 2300) |
| DQDA2 | 0.008 (0.004, 10000) | 0.023 (0.008, 8300) | 0.107 (0.05, 5800) |

| Method | | | |
|---|---|---|---|
| DLDA | 0.125 (0.024, 50) | 0.034 (0.012, 50) | 0.055 (0.017, 50) |
| DQDA | 0.11 (0.022, 50) | 0.03 (0.016, 50) | 0.045 (0.021, 50) |
| NN | 0.094 (0.029, 50) | 0.032 (0.013, 50) | 0.051 (0.015, 50) |
| SVM | 0.116 (0.031, 150) | 0.037 (0.023, 50) | 0.04 (0.014, 50) |
| SCRDA | 0.094 (0.037, 1989) | 0.039 (0.021, 2200) | 0.069 (0.026, 56) |
| RF | 0.11 (0.013, NA) | 0.033 (0.013, NA) | 0.048 (0.018, NA) |
| SQDA | 0.206 (0.134, 1300) | 0.021 (0.015, 2200) | 0.04 (0.041, 500) |
| DLDA2 | 0.128 (0.026, 3400) | 0.033 (0.01, 6600) | 0.068 (0.02, 7800) |
| DQDA2 | 0.205 (0.066, 3100) | 0.049 (0.022, 5900) | 0.089 (0.027, 6100) |

| Method | | | |
|---|---|---|---|
| DLDA | 0.035 (0.017, 50) | 0.028 (0.018, 50) | 0.006 (0.008, 50) |
| DQDA | 0.018 (0.009, 50) | 0.037 (0.03, 50) | 0.004 (0.006, 50) |
| NN | 0.021 (0.013, 50) | 0.031 (0.019, 50) | 0.005 (0.009, 50) |
| SVM | 0.018 (0.012, 50) | 0.028 (0.018, 50) | 0.004 (0.006, 50) |
| SCRDA | 0.045 (0.019, 452) | 0.047 (0.011, 78) | 0.023 (0.014, 49) |
| RF | 0.027 (0.013, NA) | 0.025 (0.014, NA) | 0.011 (0.011, NA) |
| SQDA | 0.021 (0.008, 2800) | 0.009 (0.005, 6100) | 0.007 (0.008, 5900) |
| DLDA2 | 0.036 (0.015, 8000) | 0.039 (0.007, 10200) | 0.02 (0.014, 11700) |
| DQDA2 | 0.069 (0.033, 7400) | 0.045 (0.035, 9600) | 0.021 (0.014, 10200) |
Each table entry has the form a (b, c), where a is the average prediction error, b is its standard deviation, and c is the median number of predictors selected.
Figure 3. Workflow of SQDA. This figure describes the general workflow of SQDA using a toy example that classifies tumor and normal samples.