Mohammad Manir Hossain Mollah, Rahman Jamal, Norfilza Mohd Mokhtar, Roslan Harun, Md Nurul Haque Mollah.
Abstract
BACKGROUND: Identifying genes that are differentially expressed (DE) between two or more conditions with multiple patterns of expression is one of the primary objectives of gene expression data analysis. Several statistical approaches, including one-way analysis of variance (ANOVA), are used to identify DE genes. However, most of these methods provide misleading results for two or more conditions with multiple patterns of expression in the presence of outlying genes. In this paper, an attempt is made to develop a hybrid one-way ANOVA approach that unifies the robustness and efficiency of estimation using the minimum β-divergence method to overcome some problems that arise in the existing robust methods for both small- and large-sample cases with multiple patterns of expression.
Year: 2015 PMID: 26413858 PMCID: PMC4587675 DOI: 10.1371/journal.pone.0138810
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Fig 1. Predicted distribution of β weights.
Predicted (solid curve) and simulated (histogram) observed distributions of the β weights of Eq (5): (a) without outlying gene expressions and (b) with 5% outlying gene expressions.
Performance evaluation based on simulated gene expression profiles with m = 2 conditions/groups.
|
| ||||||||||||||||
| Methods | TPR | FPR | TNR | FNR | FDR | MER | AUC | pAUC | TPR | FPR | TNR | FNR | FDR | MER | AUC | pAUC |
|
|
| |||||||||||||||
| ANOVA | 0.939 | 0.001 | 0.998 | 0.072 | 0.072 | 0.004 | 0.915 | 0.184 | 0.475 | 0.011 | 0.989 | 0.525 | 0.525 | 0.021 | 0.474 | 0.095 |
| SAM | 0.955 | 0.001 | 0.999 | 0.045 | 0.045 | 0.002 | 0.954 | 0.191 | 0.490 | 0.010 | 0.990 | 0.510 | 0.510 | 0.020 | 0.491 | 0.098 |
| LIMMA | 0.958 | 0.001 | 0.999 | 0.042 | 0.042 | 0.002 | 0.959 | 0.192 | 0.485 | 0.011 | 0.989 | 0.515 | 0.515 | 0.021 | 0.484 | 0.097 |
| eLNN | 0.932 | 0.001 | 0.999 | 0.068 | 0.068 | 0.003 | 0.931 | 0.186 | 0.372 | 0.013 | 0.987 | 0.627 | 0.627 | 0.025 | 0.371 | 0.074 |
| EBarrays | 0.938 | 0.001 | 0.999 | 0.062 | 0.062 | 0.002 | 0.939 | 0.188 | 0.307 | 0.014 | 0.986 | 0.692 | 0.692 | 0.028 | 0.306 | 0.061 |
| BetaEB | 0.938 | 0.001 | 0.999 | 0.062 | 0.062 | 0.002 | 0.937 | 0.188 | 0.940 | 0.001 | 0.999 | 0.060 | 0.060 | 0.002 | 0.941 | 0.188 |
| KW | 0.958 | 0.000 | 1.000 | 0.042 | 0.042 | 0.001 | 0.977 | 0.196 | 0.497 | 0.010 | 0.990 | 0.502 | 0.502 | 0.020 | 0.496 | 0.100 |
| Proposed | 0.939 | 0.001 | 0.998 | 0.072 | 0.072 | 0.004 | 0.915 | 0.184 | 0.936 | 0.001 | 0.998 | 0.064 | 0.064 | 0.004 | 0.914 | 0.182 |
|
|
| |||||||||||||||
| ANOVA | 0.318 | 0.014 | 0.986 | 0.682 | 0.682 | 0.027 | 0.318 | 0.063 | 0.087 | 0.019 | 0.981 | 0.912 | 0.912 | 0.036 | 0.086 | 0.018 |
| SAM | 0.323 | 0.014 | 0.986 | 0.677 | 0.677 | 0.027 | 0.324 | 0.064 | 0.087 | 0.019 | 0.981 | 0.912 | 0.912 | 0.036 | 0.086 | 0.018 |
| LIMMA | 0.318 | 0.014 | 0.986 | 0.682 | 0.682 | 0.027 | 0.317 | 0.064 | 0.087 | 0.019 | 0.981 | 0.912 | 0.912 | 0.036 | 0.088 | 0.018 |
| eLNN | 0.250 | 0.015 | 0.985 | 0.750 | 0.750 | 0.030 | 0.251 | 0.050 | 0.025 | 0.020 | 0.980 | 0.975 | 0.975 | 0.039 | 0.026 | 0.005 |
| EBarrays | 0.210 | 0.016 | 0.984 | 0.790 | 0.790 | 0.032 | 0.211 | 0.042 | 0.025 | 0.020 | 0.980 | 0.975 | 0.975 | 0.039 | 0.024 | 0.005 |
| BetaEB | 0.940 | 0.001 | 0.999 | 0.060 | 0.060 | 0.002 | 0.941 | 0.188 | 0.025 | 0.020 | 0.980 | 0.975 | 0.975 | 0.039 | 0.024 | 0.005 |
| KW | 0.325 | 0.014 | 0.986 | 0.675 | 0.675 | 0.027 | 0.326 | 0.065 | 0.087 | 0.019 | 0.981 | 0.912 | 0.912 | 0.036 | 0.088 | 0.018 |
| Proposed | 0.932 | 0.001 | 0.998 | 0.098 | 0.098 | 0.004 | 0.907 | 0.181 | 0.920 | 0.001 | 0.998 | 0.080 | 0.080 | 0.004 | 0.887 | 0.177 |
|
| ||||||||||||||||
| Methods | TPR | FPR | TNR | FNR | FDR | MER | AUC | pAUC | TPR | FPR | TNR | FNR | FDR | MER | AUC | pAUC |
|
|
| |||||||||||||||
| ANOVA | 0.971 | 0.002 | 0.998 | 0.026 | 0.026 | 0.001 | 0.972 | 0.195 | 0.560 | 0.009 | 0.991 | 0.440 | 0.440 | 0.018 | 0.561 | 0.112 |
| SAM | 0.978 | 0.000 | 1.000 | 0.022 | 0.022 | 0.001 | 0.979 | 0.196 | 0.613 | 0.008 | 0.992 | 0.388 | 0.388 | 0.015 | 0.612 | 0.122 |
| LIMMA | 0.978 | 0.000 | 1.000 | 0.022 | 0.022 | 0.001 | 0.977 | 0.196 | 0.562 | 0.009 | 0.991 | 0.438 | 0.438 | 0.018 | 0.563 | 0.112 |
| eLNN | 0.975 | 0.001 | 0.999 | 0.025 | 0.025 | 0.001 | 0.974 | 0.195 | 0.850 | 0.003 | 0.997 | 0.150 | 0.150 | 0.006 | 0.849 | 0.170 |
| EBarrays | 0.973 | 0.001 | 0.999 | 0.028 | 0.028 | 0.001 | 0.974 | 0.195 | 0.562 | 0.009 | 0.991 | 0.438 | 0.438 | 0.018 | 0.561 | 0.112 |
| BetaEB | 0.973 | 0.001 | 0.999 | 0.028 | 0.028 | 0.001 | 0.975 | 0.195 | 0.973 | 0.001 | 0.999 | 0.028 | 0.028 | 0.001 | 0.972 | 0.195 |
| KW | 0.973 | 0.001 | 0.999 | 0.028 | 0.028 | 0.001 | 0.973 | 0.194 | 0.911 | 0.002 | 0.998 | 0.089 | 0.089 | 0.008 | 0.801 | 0.160 |
| Proposed | 0.971 | 0.002 | 0.998 | 0.026 | 0.026 | 0.001 | 0.973 | 0.195 | 0.975 | 0.001 | 0.999 | 0.025 | 0.025 | 0.001 | 0.976 | 0.195 |
|
|
| |||||||||||||||
| ANOVA | 0.420 | 0.012 | 0.988 | 0.580 | 0.580 | 0.023 | 0.420 | 0.084 | 0.312 | 0.014 | 0.986 | 0.688 | 0.688 | 0.028 | 0.311 | 0.062 |
| SAM | 0.497 | 0.010 | 0.990 | 0.502 | 0.502 | 0.020 | 0.497 | 0.099 | 0.347 | 0.013 | 0.987 | 0.652 | 0.652 | 0.026 | 0.347 | 0.069 |
| LIMMA | 0.425 | 0.012 | 0.988 | 0.575 | 0.575 | 0.023 | 0.425 | 0.085 | 0.347 | 0.013 | 0.987 | 0.652 | 0.652 | 0.026 | 0.348 | 0.069 |
| eLNN | 0.885 | 0.002 | 0.998 | 0.115 | 0.115 | 0.005 | 0.885 | 0.177 | 0.855 | 0.003 | 0.997 | 0.145 | 0.145 | 0.006 | 0.856 | 0.171 |
| EBarrays | 0.427 | 0.012 | 0.988 | 0.573 | 0.573 | 0.023 | 0.427 | 0.085 | 0.228 | 0.016 | 0.984 | 0.772 | 0.772 | 0.031 | 0.227 | 0.045 |
| BetaEB | 0.973 | 0.001 | 0.999 | 0.028 | 0.028 | 0.001 | 0.973 | 0.195 | 0.225 | 0.016 | 0.984 | 0.775 | 0.775 | 0.031 | 0.224 | 0.045 |
| KW | 0.851 | 0.004 | 0.996 | 0.149 | 0.149 | 0.008 | 0.807 | 0.161 | 0.808 | 0.009 | 0.991 | 0.192 | 0.192 | 0.002 | 0.936 | 0.187 |
| Proposed | 0.973 | 0.001 | 0.999 | 0.028 | 0.028 | 0.001 | 0.973 | 0.195 | 0.978 | 0.000 | 1.000 | 0.022 | 0.022 | 0.001 | 0.979 | 0.196 |
Average performance results of eight methods (ANOVA, SAM, LIMMA, eLNN, EBarrays, BetaEB, KW and Proposed) based on 100 datasets generated using a one-way ANOVA model with m = 2 groups/conditions and σ² = 0.05 for both sample sizes, n1 = n2 = 3 and n1 = n2 = 15. Each dataset contained 300 true DE genes and 19700 true EE genes. The performance measures TPR, FPR, TNR, FNR, FDR, MER and AUC were calculated for each method from its top 300 estimated DE genes, treating all other genes in the dataset as estimated EE genes. The performance measure ‘pAUC’ was calculated at FPR = 0.2 for each method and for each dataset.
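The top-k evaluation described in this caption can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `top_k_metrics`, the toy scores and the definition of MER as (FP + FN) / total genes are assumptions. Note that when k equals the number of true DE genes, FP = FN, so FDR and FNR coincide, which matches the identical FNR and FDR columns in the table.

```python
import numpy as np

def top_k_metrics(scores, is_de, k):
    """Rates obtained when the k highest-scoring genes are called DE.

    scores : larger means stronger evidence of differential expression
    is_de  : ground-truth flags (True for a true DE gene)
    """
    scores = np.asarray(scores, dtype=float)
    is_de = np.asarray(is_de, dtype=bool)
    called = np.zeros(is_de.size, dtype=bool)
    called[np.argsort(-scores, kind="stable")[:k]] = True

    tp = int(np.sum(called & is_de))
    fp = int(np.sum(called & ~is_de))
    fn = int(np.sum(~called & is_de))
    tn = int(np.sum(~called & ~is_de))
    return {
        "TPR": tp / (tp + fn),
        "FPR": fp / (fp + tn),
        "TNR": tn / (fp + tn),
        "FNR": fn / (tp + fn),
        "FDR": fp / (tp + fp),
        # Assumed definition: misclassification error rate over all genes.
        "MER": (fp + fn) / is_de.size,
    }

# Toy data (hypothetical): 10 genes, the first 3 are truly DE.
toy_scores = [9, 8, 1, 7, 2, 3, 6, 0.5, 0.2, 0.1]
toy_truth = [True, True, True] + [False] * 7
example = top_k_metrics(toy_scores, toy_truth, k=3)
print(example)
```

In the simulation setting of the table, `k` would be 300 and `is_de` would flag the 300 simulated DE genes among 20000.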
Performance evaluation based on simulated gene expression profiles with m ≥ 3 conditions with multiple patterns.
| Methods | |||||
|---|---|---|---|---|---|
| PM/PI | ANOVA | SAM | LIMMA | KW | Proposed |
|
| |||||
|
| |||||
| FDR | 0.0967 | 0.0915 | 0.0833 | 0.8600 | 0.0967 |
| AUC | 0.9012 | 0.9089 | 0.9105 | 0.1394 | 0.9012 |
| pAUC | 0.1816 | 0.1825 | 0.1854 | 0.0274 | 0.1816 |
|
| |||||
| FDR | 0.5467 | 0.4400 | 0.5400 | 0.9033 | 0.1067 |
| AUC | 0.4533 | 0.5600 | 0.4599 | 0.0962 | 0.8933 |
| pAUC | 0.0906 | 0.1120 | 0.0919 | 0.0188 | 0.1786 |
|
| |||||
| FDR | 0.6967 | 0.7633 | 0.6933 | 0.9200 | 0.1100 |
| AUC | 0.3033 | 0.2362 | 0.3066 | 0.0796 | 0.8900 |
| pAUC | 0.0606 | 0.0469 | 0.0613 | 0.0156 | 0.1780 |
|
| |||||
|
| |||||
| FDR | 0.0200 | 0.0200 | 0.0200 | 0.0200 | 0.0233 |
| AUC | 0.9800 | 0.9800 | 0.9800 | 0.9800 | 0.9767 |
| pAUC | 0.1960 | 0.1960 | 0.1960 | 0.1960 | 0.1953 |
|
| |||||
| FDR | 0.3833 | 0.3367 | 0.3800 | 0.0567 | 0.0233 |
| AUC | 0.6165 | 0.6633 | 0.6198 | 0.9433 | 0.9767 |
| pAUC | 0.1231 | 0.1326 | 0.1238 | 0.1886 | 0.1953 |
|
| |||||
| FDR | 0.4933 | 0.4633 | 0.4267 | 0.0833 | 0.0233 |
| AUC | 0.5063 | 0.5366 | 0.5727 | 0.9166 | 0.9767 |
| pAUC | 0.1010 | 0.1073 | 0.1140 | 0.1833 | 0.1953 |
Average performance results for five methods (ANOVA, SAM, LIMMA, KW and Proposed) based on 100 datasets generated using a one-way ANOVA model for both small- and large-sample cases with m = 4 groups/conditions and σ² = 0.05. The performance measure/index (PM/PI) FDR was calculated for each method based on the top 300 estimated DE genes, under the assumption that the other estimated genes were EE genes for each dataset (recall that each dataset contained 300 true DE genes and the remainder were EE genes). The performance measure ‘pAUC’ was calculated at FPR = 0.2 for each method and for each dataset.
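A partial AUC truncated at FPR = 0.2 is bounded above by 0.2, which is why the best pAUC values in the table sit just below that figure. The sketch below shows one standard way to compute it from a ranking by trapezoidal integration of the ROC curve; the helper names `roc_points` and `partial_auc` and the toy data are assumptions, and the paper may have used a different routine.

```python
import numpy as np

def roc_points(scores, is_de):
    """Empirical ROC curve: (FPR, TPR) after each gene in ranked order."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(is_de, dtype=bool)[np.argsort(-scores, kind="stable")]
    tp = np.concatenate(([0], np.cumsum(labels)))
    fp = np.concatenate(([0], np.cumsum(~labels)))
    return fp / fp[-1], tp / tp[-1]

def partial_auc(fpr, tpr, max_fpr=0.2):
    """Area under the ROC curve restricted to 0 <= FPR <= max_fpr."""
    # Index of the first ROC point strictly beyond the cut-off.
    idx = int(np.searchsorted(fpr, max_fpr, side="right"))
    if idx < fpr.size:
        # Linear interpolation of TPR at the cut-off.
        frac = (max_fpr - fpr[idx - 1]) / (fpr[idx] - fpr[idx - 1])
        tpr_cut = tpr[idx - 1] + frac * (tpr[idx] - tpr[idx - 1])
    else:
        tpr_cut = tpr[-1]
    x = np.concatenate((fpr[:idx], [max_fpr]))
    y = np.concatenate((tpr[:idx], [tpr_cut]))
    # Trapezoidal rule written out explicitly.
    return float(np.sum(np.diff(x) * (y[:-1] + y[1:]) / 2.0))

# Toy data (hypothetical): a perfect ranking of 2 DE and 5 EE genes.
toy_scores = [0.9, 0.8, 0.1, 0.2, 0.3, 0.05, 0.15]
toy_truth = [True, True, False, False, False, False, False]
fpr, tpr = roc_points(toy_scores, toy_truth)
pauc = partial_auc(fpr, tpr, max_fpr=0.2)
full_auc = partial_auc(fpr, tpr, max_fpr=1.0)
print(pauc, full_auc)
```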
Performance evaluation based on simulated gene expression profiles with m ≥ 3 conditions with multiple patterns based on raw p-values and adjusted p-values (controlling FDR and FWER) at 1% for a small sample size.
| METHODS | Without outlying expressions | With 5% outlying expressions in 5% genes | With 5% outlying expressions in 10% genes |
|---|---|---|---|
|
| |||
|
| 0.944 | 0.467 | 0.314 |
| (0.012) | (0.012) | (0.011) | |
|
| 0.947 | 0.553 | 0.220 |
| (0.003) | (0.001) | (0.006) | |
|
| 0.955 | 0.471 | 0.315 |
| (0.009) | (0.012) | (0.012) | |
|
| 0.000 | 0.000 | 0.000 |
| (0.000) | (0.000) | (0.000) | |
|
| 0.944 | 0.941 | 0.939 |
| (0.012) | (0.012) | (0.013) | |
|
| |||
|
| 0.893 | 0.431 | 0.284 |
| (0.000) | (0.000) | (0.000) | |
|
| 0.440 | 0.390 | 0.000 |
| (0.000) | (0.000) | (0.000) | |
|
| 0.905 | 0.438 | 0.291 |
| (0.000) | (0.000) | (0.000) | |
|
| 0.000 | 0.000 | 0.000 |
| (0.000) | (0.000) | (0.000) | |
|
| 0.893 | 0.873 | 0.875 |
| (0.000) | (0.000) | (0.000) | |
|
| |||
|
| 0.891 | 0.383 | 0.256 |
| (0.000) | (0.000) | (0.000) | |
|
| 0.623 | 0.328 | 0.222 |
| (0.000) | (0.000) | (0.000) | |
|
| 0.901 | 0.399 | 0.264 |
| (0.000) | (0.000) | (0.000) | |
|
| 0.000 | 0.000 | 0.000 |
| (0.000) | (0.000) | (0.000) | |
|
| 0.891 | 0.890 | 0.888 |
| (0.000) | (0.000) | (0.000) | |
Average performance results for five methods (ANOVA, SAM, LIMMA, KW and Proposed) based on 100 datasets generated using a one-way ANOVA model for the small-sample (n1 = n2 = n3 = n4 = 3) case with m = 4 groups/conditions and σ² = 0.05, where each dataset contained 300 true DE genes and the remaining 19700 were true EE genes. The values are the average TPRs based on raw p-values and adjusted p-values for the ANOVA, SAM, LIMMA, KW and Proposed methods in both the absence and the presence of outlying expressions. The values in parentheses are the average FPRs. The adjusted p-values were calculated using the Benjamini-Hochberg (BH) and Bonferroni correction methods.
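For readers unfamiliar with the two corrections named in this caption, the following is a minimal sketch of the standard Benjamini-Hochberg and Bonferroni adjustments. The function names and the toy p-values are assumptions; the source does not say which software the authors used.

```python
import numpy as np

def bonferroni(pvals):
    """Bonferroni adjustment (controls the FWER): multiply by the number of tests."""
    p = np.asarray(pvals, dtype=float)
    return np.minimum(p * p.size, 1.0)

def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjustment (controls the FDR)."""
    p = np.asarray(pvals, dtype=float)
    n = p.size
    order = np.argsort(p)
    # p_(i) * n / i for the i-th smallest p-value.
    scaled = p[order] * n / np.arange(1, n + 1)
    # Running minimum from the largest p-value down keeps the result monotone.
    adj_sorted = np.minimum.accumulate(scaled[::-1])[::-1]
    adj = np.empty(n)
    adj[order] = np.minimum(adj_sorted, 1.0)
    return adj

# Toy p-values (hypothetical), tested at the 1% level used in the table.
raw = np.array([0.001, 0.004, 0.003, 0.0005])
alpha = 0.01
bh_adj = benjamini_hochberg(raw)
bonf_adj = bonferroni(raw)
print("BH calls:        ", bh_adj < alpha)
print("Bonferroni calls:", bonf_adj < alpha)
```

On this toy input BH calls all four genes while Bonferroni calls only two, the same direction of difference as between the adjusted-p-value blocks of the table.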
Performance evaluation based on simulated gene expression profiles with m ≥ 3 conditions with multiple patterns based on raw p-values and adjusted p-values (controlling FDR and FWER) at 1% for a large sample size.
| METHODS | Without outlying expressions | With 5% outlying expressions in 5% genes | With 5% outlying expressions in 10% genes |
|---|---|---|---|
|
| |||
|
| 0.981 | 0.631 | 0.503 |
| (0.010) | (0.009) | (0.009) | |
|
| 0.980 | 0.841 | 0.719 |
| (0.004) | (0.032) | (0.056) | |
|
| 0.983 | 0.635 | 0.508 |
| (0.010) | (0.009) | (0.009) | |
|
| 0.982 | 0.975 | 0.975 |
| (0.008) | (0.008) | (0.008) | |
|
| 0.983 | 0.981 | 0.981 |
| (0.037) | (0.036) | (0.036) | |
|
| |||
|
| 0.975 | 0.522 | 0.381 |
| (0.000) | (0.000) | (0.000) | |
|
| 0.957 | 0.602 | 0.450 |
| (0.000) | (0.000) | (0.000) | |
|
| 0.977 | 0.525 | 0.386 |
| (0.000) | (0.000) | (0.000) | |
|
| 0.972 | 0.872 | 0.855 |
| (0.000) | (0.000) | (0.000) | |
|
| 0.977 | 0.968 | 0.975 |
| (0.002) | (0.002) | (0.002) | |
|
| |||
|
| 0.965 | 0.485 | 0.345 |
| (0.000) | (0.000) | (0.000) | |
|
| 0.937 | 0.477 | 0.333 |
| (0.000) | (0.000) | (0.000) | |
|
| 0.970 | 0.485 | 0.345 |
| (0.000) | (0.000) | (0.000) | |
|
| 0.941 | 0.581 | 0.469 |
| (0.000) | (0.000) | (0.000) | |
|
| 0.969 | 0.955 | 0.968 |
| (0.000) | (0.000) | (0.000) | |
Average performance results for five methods (ANOVA, SAM, LIMMA, KW and Proposed) based on 100 datasets generated using a one-way ANOVA model for the large-sample (n1 = n2 = n3 = n4 = 15) case with m = 4 groups/conditions and σ² = 0.05, where each dataset contained 300 true DE genes and the remaining 19700 were true EE genes. The values are the average TPRs based on raw p-values and adjusted p-values for the ANOVA, SAM, LIMMA, KW and Proposed methods in both the absence and the presence of outlying expressions. The values in parentheses are the average FPRs. The adjusted p-values were calculated using the Benjamini-Hochberg (BH) and Bonferroni correction methods.
Performance evaluation in multiple comparison tests using four methods (ANOVA, LIMMA, KW and Proposed) for a small sample size.
|
|
|
| |
|---|---|---|---|
| Without outlying expressions | |||
|
| (0.9379) | - | - |
| {1.0000} | - | - | |
| [0.8808] | - | - | |
| ⟨0.5537⟩ | - | - | |
|
| (0.0003) | (0.0004) | - |
| {0.0000} | {0.0000} | - | |
| [0.0364] | [0.0419] | - | |
| ⟨0.0001⟩ | ⟨0.0001⟩ | - | |
|
| (0.0003) | (0.0006) | (0.9110) |
| {0.0000} | {0.0001} | {1.0000} | |
| [0.0396] | [0.0450] | [0.8796] | |
| ⟨0.0001⟩ | ⟨0.0002⟩ | ⟨0.5166⟩ | |
| With 5% outlying expressions | |||
|
| (0.9343) | - | - |
| {1.0000} | - | - | |
| [0.7964] | - | - | |
| ⟨0.5498⟩ | - | - | |
|
| (0.3865) | (0.4527) | - |
| {0.6605} | {0.6610} | - | |
| [0.4104] | [0.4738] | - | |
| ⟨0.0002⟩ | ⟨0.0002⟩ | - | |
|
| (0.3759) | (0.4393) | (0.9026) |
| {0.6057} | {0.6313} | {1.0000} | |
| [0.3717] | [0.4564] | [0.8553] | |
| ⟨0.0002⟩ | ⟨0.0002⟩ | ⟨0.5163⟩ | |
Average p-values for multiple comparison tests using four methods (ANOVA, LIMMA, KW and Proposed) based on 100 sets of expressions for a DE gene with the pattern μ1 = μ2 ≠ μ3 = μ4 for small samples of size (n1 = n2 = n3 = n4 = 3) with m = 4 conditions and σ² = 0.05. The expression profiles for the gene were generated using a one-way ANOVA model. The p-value was calculated for each method and each dataset. A larger p-value indicates equality between the mean expressions of two groups. The bracket types (), {}, [] and ⟨⟩ indicate the average gene p-values obtained using ANOVA, LIMMA, KW and the proposed methods, respectively.
Performance evaluation in multiple comparison tests using four methods (ANOVA, LIMMA, KW and Proposed) for a large sample size.
|
|
|
| |
|---|---|---|---|
| Without outlying expressions | |||
|
| (0.9944) | - | - |
| {1.0000} | - | - | |
| [0.9118] | - | - | |
| ⟨0.5709⟩ | - | - | |
|
| (0.0000) | (0.0000) | - |
| {0.0000} | {0.0000} | - | |
| [0.0000] | [0.0000] | - | |
| ⟨0.0000⟩ | ⟨0.0000⟩ | - | |
|
| (0.0000) | (0.0000) | (0.0000) |
| {0.0000} | {0.0000} | {0.0000} | |
| [0.0000] | [0.0000] | [0.0000] | |
| ⟨0.0000⟩ | ⟨0.0000⟩ | ⟨0.0000⟩ | |
| With 5% outlying expressions | |||
|
| (0.9213) | - | - |
| {1.0000} | - | - | |
| [0.8504] | - | - | |
| ⟨0.5007⟩ | - | - | |
|
| (0.4452) | (0.4481) | - |
| {0.7440} | {0.7639} | - | |
| [0.0003] | [0.0001] | - | |
| ⟨0.0000⟩ | ⟨0.0000⟩ | - | |
|
| (0.2645) | (0.2219) | (0.6475) |
| {0.4775} | {0.4008} | {0.9199} | |
| [0.0000] | [0.0000] | [0.0026] | |
| ⟨0.0000⟩ | ⟨0.0000⟩ | ⟨0.0000⟩ | |
Average p-values for multiple comparison tests using four methods (ANOVA, LIMMA, KW and Proposed) based on 100 sets of expressions for a DE gene with the pattern μ1 = μ2 ≠ μ3 ≠ μ4 for large samples of size (n1 = n2 = n3 = n4 = 15) with m = 4 conditions and σ² = 0.05. The expression profiles for the gene were generated using a one-way ANOVA model. The p-value was calculated for each method and each dataset. A larger p-value indicates equality between the mean expressions of two groups. The bracket types (), {}, [] and ⟨⟩ indicate the average gene p-values obtained using ANOVA, LIMMA, KW and the proposed methods, respectively.
Performance evaluation in pairwise comparison tests using four methods (ANOVA, LIMMA, KW and Proposed) for the small-sample case.
|
| ||||||
|---|---|---|---|---|---|---|
| Pair of groups | Predicted DR | True DR | Correctly predicted DR | Predicted UR | True UR | Correctly predicted UR |
|
| ||||||
| G1 vs. G2 | {49, 51, 50, | 50 | {49, 50, 50, | {101, 100, 98, | 100 | {100, 99, 98, |
| G1 vs. G3 | {130, 129, 130, | 130 | {130, 129, 130, | {99, 102, 101, | 100 | {99, 100, 100, |
| G1 vs. G4 | {50, 49, 49, | 50 | {50, 49, 48, | {101, 100, 99, | 100 | {100, 99, 99, |
| G2 vs. G3 | {100, 101, 102, | 100 | {99, 100, 100, | {20, 19, 20, | 20 | {20, 19, 20, |
| G2 vs. G4 | {51, 49, 47, | 50 | {50, 49, 47, | {49, 50, 49, | 50 | {49, 50, 49, |
| G3 vs. G4 | {50, 50, 51, | 50 | {49, 48, 50, | {132, 128, 129, | 130 | {130, 128, 129, |
|
| ||||||
| G1 vs. G2 | {37, 31, 41, | 50 | {37, 31, 41, | {83, 74, 84, | 100 | {83, 74, 84, |
| G1 vs. G3 | {107, 90, 112, | 130 | {107, 90, 112, | {83, 76, 84, | 100 | {83, 76, 84, |
| G1 vs. G4 | {43, 38, 43, | 50 | {43, 38, 43, | {83, 69, 83, | 100 | {83, 69, 83, |
| G2 vs. G3 | {84, 69, 87, | 100 | {84, 69, 87, | {15, 12, 16, | 20 | {15, 12, 16, |
| G2 vs. G4 | {43, 36, 43, | 50 | {43, 36, 43, | {37, 27, 43, | 50 | {37, 27, 43, |
| G3 vs. G4 | {43, 37, 43, | 50 | {43, 37, 43, | {107, 87, 117, | 130 | {107, 87, 117, |
We generated 300 DE genes out of 20,000 total genes for m = 4 conditions with different patterns for a small-sample case (n1 = n2 = n3 = n4 = 6) and σ 2 = 0.05, with a 2-fold change in expression between the groups, to investigate the pattern-detection performance of the proposed method in comparison with the others. The values reported in the form {x, x, x, x} in this table represent the numbers of downregulated (DR) or upregulated (UR) differentially expressed (DE) genes estimated by the ANOVA, LIMMA, KW and proposed (Bold) methods, respectively. Note that indicates significant 2-fold downregulation and indicates significant 2-fold upregulation.
Performance evaluation based on Spike gene expression profiles with 2 conditions for the sample case (n1 = n2 = 9).
|
| ||||||||
| Methods | TPR | FPR | TNR | FNR | FDR | MER | AUC | pAUC |
| ANOVA | 0.8189 | 0.0210 | 0.9790 | 0.1811 | 0.1811 | 0.0376 | 0.8165 | 0.1613 |
| SAM | 0.8338 | 0.0193 | 0.9807 | 0.1662 | 0.1662 | 0.0345 | 0.8319 | 0.1648 |
| LIMMA | 0.8297 | 0.0197 | 0.9803 | 0.1703 | 0.1703 | 0.0354 | 0.8283 | 0.1645 |
| eLNN | 0.8071 | 0.0224 | 0.9776 | 0.1929 | 0.1929 | 0.0401 | 0.8057 | 0.1600 |
| EBarrays | 0.8292 | 0.0198 | 0.9802 | 0.1708 | 0.1708 | 0.0355 | 0.8275 | 0.1641 |
| BetaEB | 0.8066 | 0.0201 | 0.9799 | 0.1934 | 0.1934 | 0.0360 | 0.8247 | 0.1634 |
| KW | 0.8097 | 0.0221 | 0.9779 | 0.1903 | 0.1903 | 0.0396 | 0.8053 | 0.1576 |
| Proposed | 0.8098 | 0.0195 | 0.9805 | 0.1902 | 0.1902 | 0.0348 | 0.8225 | 0.1647 |
|
| ||||||||
| ANOVA | 0.7618 | 0.0276 | 0.9724 | 0.2382 | 0.2382 | 0.0495 | 0.7592 | 0.1498 |
| SAM | 0.7731 | 0.0263 | 0.9737 | 0.2269 | 0.2269 | 0.0471 | 0.7704 | 0.1519 |
| LIMMA | 0.7644 | 0.0273 | 0.9727 | 0.2356 | 0.2356 | 0.0490 | 0.7623 | 0.1508 |
| eLNN | 0.7757 | 0.0260 | 0.9740 | 0.2243 | 0.2243 | 0.0466 | 0.7673 | 0.1467 |
| EBarrays | 0.7567 | 0.0282 | 0.9718 | 0.2433 | 0.2433 | 0.0506 | 0.7554 | 0.1500 |
| BetaEB | 0.8061 | 0.0202 | 0.9798 | 0.1939 | 0.1939 | 0.0361 | 0.8242 | 0.1633 |
| KW | 0.7536 | 0.0286 | 0.9714 | 0.2464 | 0.2464 | 0.0512 | 0.7490 | 0.1461 |
| Proposed | 0.8090 | 0.0199 | 0.9801 | 0.1910 | 0.1910 | 0.0363 | 0.8215 | 0.1636 |
For each method, we took the top 1944 estimated genes and intersected them with the designated ‘DE gene-set’ to calculate the summary statistics (TPR, TNR, FPR, FNR, FDR, MER, AUC and pAUC) for performance evaluation in the Spike gene expression profiles.
Fig 2. Venn diagram and outlier gene expression profile for colon cancer data.
Comparison of the results on the colon cancer gene expression dataset. (a) Venn diagram of the top 100 genes estimated by KW, BetaEB and the proposed method. (b) Outlying DE genes detected by the proposed method only. The results for the control group are plotted below the lines, and the results for the cancer group are plotted above the lines.
Fig 3. Venn diagram and outlier gene expression profile for pancreatic cancer data.
(a) Venn diagram of the DE genes estimated by all four methods (ANOVA, LIMMA, KW and Proposed) based on pairwise comparisons of CTC vs T, CTC vs P, CTC vs G, T vs P, T vs G and G vs P. (b) Frequency distributions of β-weights for each expression of the 8152 genes in 24 samples. (c) Scatter plot of the smallest β-weight for each of the 8152 genes vs. the gene index, where the smallest value represents the minimum value of 24 β-weights from 24 samples for each gene. The red circles between the two gray lines represent moderate/noisy outliers, whereas the other red circles, corresponding to β-weights of less than 0.2, represent extreme outliers. (d) Plot of ordered smallest β-weights in (c) for 8152 genes. (e) The 80 DE genes detected by the proposed method only, as shown in (a). Seventeen out of 80 DE genes were detected as extreme outlying genes using the β-weight function. The results for the T, P, G and CTC groups are plotted above the lines with four different colors. The outlying samples are indicated by circles above them.
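The outlier screening summarised in panels (c) and (d) can be sketched as below. This is a hedged illustration only: the page does not reproduce Eq (5), so the Gaussian-type weight exp(-β(x - μ)² / (2σ²)) is assumed here as the form commonly used with the minimum β-divergence method, and β = 0.2, the per-gene median as μ, the fixed σ² and the toy matrix are all assumptions. Only the 0.2 cut-off for extreme outliers comes from the caption.

```python
import numpy as np

def beta_weights(expr, mu, sigma2, beta=0.2):
    """Assumed beta-weight: close to 1 for typical values, close to 0 for outliers."""
    expr = np.asarray(expr, dtype=float)
    return np.exp(-beta * (expr - mu) ** 2 / (2.0 * sigma2))

# Toy matrix (hypothetical): 2 genes x 4 samples, one gross outlier in gene 2.
expr = np.array([
    [0.10, -0.10, 0.00, 0.05],
    [0.00,  0.10, -0.10, 3.00],
])

# Assumption: a robust per-gene location (median) and a fixed variance stand in
# for the iteratively estimated parameters of the actual method.
mu = np.median(expr, axis=1, keepdims=True)
sigma2 = 0.05

weights = beta_weights(expr, mu, sigma2)
smallest = weights.min(axis=1)       # smallest weight per gene, as in panel (c)
extreme_outlier = smallest < 0.2     # cut-off quoted in the caption
print(smallest, extreme_outlier)
```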
Pairwise comparison analysis by all 4 methods with their corresponding selected significant DE genes.
| Numbers of DR and UR genes | ||||
|---|---|---|---|---|
| Corrected p-value < 0.05 | ||||
| Pair of groups | Predicted DR | Overlapped predicted DR | Predicted UR | Overlapped predicted UR |
| CTC vs T | {1570, 1909, 1533, | 1367 | {1392, 1663, 1294, | 1215 |
| CTC vs P | {1589, 1904, 1590, | 1413 | {1446, 1738, 1323, | 1247 |
| CTC vs G | {612, 911, 463, | 354 | {744, 938, 618, | 534 |
| T vs P | {20, 0, 25, | 0 | {51, 7, 72, | 7 |
| T vs G | {1001, 1179, 1125, | 983 | {890, 1132, 1107, | 856 |
| P vs G | {1081, 1281, 1215, | 1048 | {898, 1191, 1165, | 857 |
The values reported in the form {x, x, x, x} in this table represent the numbers of downregulated (DR) or upregulated (UR) differentially expressed (DE) genes estimated by the ANOVA, LIMMA, KW and proposed (Bold) methods, respectively. Note that indicates significant 2-fold downregulation and indicates significant 2-fold upregulation.
KEGG pathways for the 17 DE genes identified by the proposed method only.
| KEGG ID | KEGG pathway description | No. of genes (%) | p-value | Adjusted p-value |
|---|---|---|---|---|
| hsa00410 | beta-Alanine metabolism | 2 (11.76) | 2.25e-05 | 6.75e-05 |
| hsa04010 | MAPK signaling pathway | 2 (11.76) | 0.0033 | 0.0099 |
| hsa01100 | metabolic pathways | 2 (11.76) | 0.0507 | 0.1521 |
KEGG terms that are significantly enriched in the 17 pancreatic-cancer-related genes identified by the proposed method only. The p-values were estimated using the hypergeometric test and then adjusted via the Bonferroni multiple testing correction (adjusted p-values).
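The enrichment calculation named in this caption can be sketched with the standard library alone. The adjusted values in the table are exactly three times the raw p-values, which is what a Bonferroni correction over three tested pathways gives; the last lines reproduce that. The background size and pathway size in the enrichment call are hypothetical, since the source does not report them, and the function names are assumptions.

```python
from math import comb

def hypergeom_upper_tail(k, n_drawn, n_annotated, n_total):
    """P(X >= k): chance of seeing at least k pathway genes among n_drawn genes
    drawn without replacement from n_total genes, n_annotated of which belong
    to the pathway."""
    upper = min(n_drawn, n_annotated)
    favourable = sum(
        comb(n_annotated, i) * comb(n_total - n_annotated, n_drawn - i)
        for i in range(k, upper + 1)
    )
    return favourable / comb(n_total, n_drawn)

def bonferroni(pvals):
    """Multiply each p-value by the number of tests, capped at 1."""
    return [min(1.0, p * len(pvals)) for p in pvals]

# Hypothetical inputs: 2 of the 17 genes fall in a pathway of 30 genes,
# against a background of 20000 genes (neither size is given in the source).
p_enrich = hypergeom_upper_tail(k=2, n_drawn=17, n_annotated=30, n_total=20000)
print(p_enrich)

# Raw p-values as listed in the table; Bonferroni over the three pathways.
raw_table = [2.25e-05, 0.0033, 0.0507]
adj_table = bonferroni(raw_table)
print(adj_table)
```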