Y-h Taguchi, Yoshiki Murakami.
Abstract
BACKGROUND: The selection of disease biomarkers is often difficult because their identification is unstable: the biomarkers selected depend heavily on the set of samples analyzed, and an independent set of samples often yields a completely different set of biomarkers. However, if a fixed set of disease biomarkers could be identified for the diagnosis of multiple diseases, the difficulties of biomarker selection could be reduced.
Year: 2014 PMID: 25176111 PMCID: PMC4161864 DOI: 10.1186/1756-0500-7-581
Source DB: PubMed Journal: BMC Res Notes ISSN: 1756-0500
Figure 1. Accuracies achieved by various discrimination methods and FEs. (a) Dependence upon diseases and methods. (b) Boxplot of accuracies.
Performance of UDB with PCA-based LDA and SVM
| Diseases | Accuracy | Sensitivity | Specificity |
|---|---|---|---|
| PCA-based LDA | | | |
| AD | 0.829 | 0.833 | 0.818 |
| Carcinoma | 0.768 | 0.730 | 0.800 |
| CAD | 0.846 | 0.846 | 0.846 |
| NPC | 0.740 | 0.806 | 0.632 |
| HCC | 0.700 | 0.700 | 0.700 |
| BC | 0.870 | 0.813 | 0.955 |
| AML | 0.784 | 0.769 | 0.846 |
| Mean | 0.791 | 0.785 | 0.800 |
| Mean of previous study [23] | 0.784 | 0.750 | 0.800 |
| SVM | | | |
| AD | 0.914 | 0.917 | 0.909 |
| Carcinoma | 0.786 | 0.867 | 0.692 |
| CAD | 0.769 | 0.769 | 0.769 |
| NPC | 0.720 | 0.806 | 0.579 |
| HCC | 0.725 | 0.550 | 0.900 |
| BC | 0.852 | 0.813 | 0.909 |
| AML | 0.938 | 0.981 | 0.769 |
| Mean | 0.815 | 0.815 | 0.800 |
AD, Alzheimer’s disease; CAD, coronary artery disease; NPC, nasopharyngeal carcinoma; HCC, hepatocellular carcinoma; BC, breast cancer; AML, acute myeloid leukemia; UDB, universal disease biomarker; SVM, support vector machine; LDA, linear discriminant analysis; PCA, principal component analysis. Data from previous study [23] are also shown for comparison.
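As a reading aid for the accuracy, sensitivity, and specificity columns, the following minimal sketch computes the three quantities from true and predicted labels. It assumes the usual convention that patients are the positive class and normal controls the negative class; the function name and the toy labels are hypothetical and are not taken from the study.

```python
def confusion_metrics(y_true, y_pred, positive=1):
    """Return (accuracy, sensitivity, specificity) for binary labels.

    Assumption: `positive` marks patients; every other label is a control.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    accuracy = (tp + tn) / len(pairs)
    sensitivity = tp / (tp + fn)  # fraction of patients correctly identified
    specificity = tn / (tn + fp)  # fraction of controls correctly identified
    return accuracy, sensitivity, specificity


# Hypothetical toy labels: 1 = patient, 0 = normal control.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 0]
acc, sens, spec = confusion_metrics(y_true, y_pred)
print(f"accuracy={acc:.3f} sensitivity={sens:.3f} specificity={spec:.3f}")
```

When the two classes are of equal size, accuracy is the simple average of sensitivity and specificity; with unbalanced classes it is weighted toward the larger class.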
Performance of lasso-based discrimination
| Diseases | Accuracy | Sensitivity | Specificity | Optimal s |
|---|---|---|---|---|
| AD | 0.928 | 0.979 | 0.818 | 0.09 |
| Carcinoma | 0.818 | 0.867 | 0.760 | 0.9 |
| CAD | 0.884 | 0.769 | 1.000 | 0.24 |
| NPC | 0.900 | 0.935 | 0.842 | 1 |
| HCC | 0.825 | 0.650 | 1.000 | 0.03 |
| BC | 0.925 | 0.906 | 0.955 | 0.46 |
| AML | 0.985 | 1.000 | 0.923 | 0.64 |
| Mean | 0.895 | 0.872 | 0.900 | |
AD, Alzheimer’s disease; CAD, coronary artery disease; NPC, nasopharyngeal carcinoma; HCC, hepatocellular carcinoma; BC, breast cancer; AML, acute myeloid leukemia. s (fraction) is used for the predict.lars function (see Methods).
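For orientation, the lars R package documents that when predict.lars is called with mode = "fraction", s is a number between 0 and 1 giving the L1 norm of the fitted coefficient vector relative to its L1 norm at the end of the regularization path. That the values in the table were obtained with exactly this mode is an assumption based on the footnote above; the symbols below are introduced only for illustration.

$$
s = \frac{\lVert \hat{\beta}(s) \rVert_1}{\lVert \hat{\beta}_{\mathrm{full}} \rVert_1}, \qquad 0 \le s \le 1
$$

Under this reading, s = 1 corresponds to the least-penalized fit on the path, while a small value such as s = 0.03 corresponds to strong shrinkage in which most coefficients are zero.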
Figure 2. Stabilities achieved by UDB, PCA-based FE and lasso. Since no selection is required for UDB, the stability of UDB is set to 1 by definition.
The number of miRNAs that exhibit significant differences between normal controls and patients for each disease
| Diseases | Significant | Not significant |
|---|---|---|
| AD | 4 | 498 |
| Carcinoma | 7 | 558 |
| CAD | 0 | 746 |
| NPC | 264 | 622 |
| HCC | 0 | 255 |
| BC | 86 | 188 |
| AML | 6 | 122 |
AD, Alzheimer’s disease; CAD, coronary artery disease; NPC, nasopharyngeal carcinoma; HCC, hepatocellular carcinoma; BC, breast cancer; AML, acute myeloid leukemia. For more details, see Methods.
Performance of miRNAs selected by PCA-based FE with PCA-based LDA and SVM
| Diseases | Accuracy | Sens. | Spec. | Number of miRNAs * | Number of PCs + | Threshold # |
|---|---|---|---|---|---|---|
| PCA-based LDA | | | | | | |
| AD | 0.886 | 0.917 | 0.818 | 22 | 16 | 2.5 |
| Carcinoma | 0.857 | 0.846 | 0.867 | 36 | 2 | 7 |
| CAD | 0.885 | 0.923 | 0.846 | 16 | 14 | 9 |
| NPC | 0.720 | 0.806 | 0.579 | 28 | 18 | 5 |
| HCC | 0.650 | 0.600 | 0.700 | 8 | 1 | 7 |
| BC | 1.000 | 1.000 | 1.000 | 18 | 13 | 6 |
| AML | 0.862 | 0.846 | 0.923 | 11 | 8 | 7 |
| Mean | 0.837 | 0.848 | 0.819 | | | |
| Mean of previous study [23] | 0.784 | 0.750 | 0.800 | | | |
| SVM | | | | | | |
| AD | 0.843 | 0.833 | 0.864 | 22 | | |
| Carcinoma | 0.786 | 0.807 | 0.767 | 36 | | |
| CAD | 0.807 | 0.615 | 1.000 | 16 | | |
| NPC | 0.720 | 0.774 | 0.632 | 28 | | |
| HCC | 0.770 | 0.550 | 0.850 | 8 | | |
| BC | 0.963 | 1.000 | 0.938 | 18 | | |
| AML | 0.969 | 1.000 | 0.846 | 11 | | |
| Mean | 0.837 | 0.797 | 0.842 | | | |
*number of miRNAs selected by PCA-based FE, +optimal number of PCs estimated by LOOCV, #threshold value of PCA-based FE. Data from previous study [23] are also shown for comparison. AD, Alzheimer’s disease; CAD, coronary artery disease; NPC, nasopharyngeal carcinoma; HCC, hepatocellular carcinoma; BC, breast cancer; AML, acute myeloid leukemia; UDB, universal disease biomarker; SVM, support vector machine; LDA, linear discriminant analysis; PCA, principal component analysis.
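The footnote states that the number of PCs was chosen by leave-one-out cross-validation (LOOCV). The sketch below illustrates only the LOOCV selection loop. It is not the authors' pipeline: a nearest-centroid rule stands in for PCA-based LDA, "use the first k features" stands in for "use the first k PCs", and all names and data are hypothetical.

```python
def loocv_accuracy(X, y, fit_predict):
    """Leave-one-out CV: hold out each sample once and train on the rest."""
    correct = 0
    for i in range(len(X)):
        train_X = X[:i] + X[i + 1:]
        train_y = y[:i] + y[i + 1:]
        if fit_predict(train_X, train_y, X[i]) == y[i]:
            correct += 1
    return correct / len(X)


def nearest_centroid(k):
    """Stand-in classifier that looks only at the first k features."""
    def fit_predict(train_X, train_y, x):
        best_label, best_dist = None, float("inf")
        for label in sorted(set(train_y)):
            rows = [r[:k] for r, t in zip(train_X, train_y) if t == label]
            centroid = [sum(col) / len(rows) for col in zip(*rows)]
            dist = sum((a - b) ** 2 for a, b in zip(x[:k], centroid))
            if dist < best_dist:
                best_label, best_dist = label, dist
        return best_label
    return fit_predict


# Hypothetical toy data: feature 1 separates the classes, feature 2 is noise.
X = [[0.1, 5.0], [0.2, -4.0], [0.0, 3.0], [1.0, -5.0], [0.9, 4.0], [1.1, -3.0]]
y = [0, 0, 0, 1, 1, 1]

# Score each candidate k by LOOCV accuracy and keep the best one.
scores = {k: loocv_accuracy(X, y, nearest_centroid(k)) for k in (1, 2)}
best_k = max(scores, key=scores.get)
print(scores, best_k)
```

Because the held-out sample never contributes to the fit that classifies it, the selected k is not tuned on the sample being predicted; this is the property that makes LOOCV usable with small cohorts such as those in the table.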
KEGG pathway analysis of 12 miRNAs included in the UDB using DIANA-mirpath [25]
| | KEGG pathway | p-value | # of genes | # of miRNAs |
|---|---|---|---|---|
| 1 | | 0.00e+00 | 6 | 2 |
| 2 | | 3.00e-13 | 39 | 6 |
| 3 | PI3K-Akt signaling pathway ∗ | 1.07e-11 | 43 | 4 |
| 4 | TGF-beta signaling pathway ∗ | 5.98e-10 | 14 | 4 |
| 5 | | 1.56e-09 | 27 | 5 |
| 6 | Ribosome | 6.04e-09 | 22 | 1 |
| 7 | | 6.33e-09 | 17 | 5 |
| 8 | | 1.02e-08 | 9 | 6 |
| 9 | Ribosome biogenesis in eukaryotes | 2.02e-08 | 20 | 1 |
| 10 | p53 signaling pathway ∗ | 6.75e-08 | 16 | 5 |
| 11 | RNA transport | 1.19e-07 | 28 | 1 |
| 12 | Cell cycle ∗ | 1.56e-07 | 22 | 3 |
| 13 | | 2.75e-07 | 11 | 5 |
| 14 | | 1.17e-06 | 12 | 5 |
| 15 | | 4.64e-06 | 16 | 5 |
| 16 | | 6.94e-06 | 7 | 4 |
| 17 | | 2.01e-05 | 8 | 4 |
| 18 | | 1.28e-04 | 19 | 5 |
| 19 | Protein export | 3.29e-04 | 9 | 1 |
| 20 | | 3.69e-04 | 6 | 5 |
| 21 | | 1.43e-03 | 11 | 3 |
| 22 | | 1.46e-03 | 6 | 3 |
| 23 | | 1.71e-03 | 7 | 5 |
| 24 | | 1.22e-02 | 9 | 3 |
| 25 | Oocyte meiosis | 1.36e-02 | 14 | 1 |
| 26 | Focal adhesion ∗ | 1.50e-02 | 8 | 3 |
| 27 | | 2.45e-02 | 19 | 3 |
Bold: tumors/cancers; bold italic: other diseases; italic: tumor/cancer related; ∗: part of “Pathways in cancer” and surrounded by a blue rectangle in Figure 3.
Figure 3. KEGG pathway: “Pathways in cancer”. Yellow: genes targeted by one miRNA included in the UDB in this study. Orange: genes targeted by more than one miRNA included in the UDB in this study. Pathways surrounded by blue rectangles are listed in Table 5.
Details of data normalization
| GEO ID | Disease | Data set names/Data retrieval methods | Data normalization timing | Data normalization methods |
|---|---|---|---|---|
| GSE46579 | AD | GSE46579_AD_ngs_data_summarized.xls.gz | before FE | zero mean/variance is one |
| GSE37472 | carcinoma | getGEO | before FE | zero mean/variance is one |
| GSE49823 | CAD | getGEO | after FE | zero mean/variance is one ∗ |
| GSE43329 | NPC | getGEO | before FE | zero mean/variance is one + |
| GSE50013 | HCC | getGEO | before FE | zero mean/variance is one ∗ |
| GSE41922 | BC | GSE41922_series_matrix.txt.gz | after FE | zero mean/variance is one ∗ |
| GSE49665 | AML | getGEO | after FE | zero mean/variance is one ∗ |
*no normalization for SVM/lasso, +no normalization for SVM with PCA-based FE, #after FE for PCA-based LDA with universal features. All normalizations were sample-based; i.e., each sample was normalized to have zero mean and unit variance. AD, Alzheimer’s disease; CAD, coronary artery disease; NPC, nasopharyngeal carcinoma; HCC, hepatocellular carcinoma; BC, breast cancer; AML, acute myeloid leukemia. Data retrieval methods/data set names were used to name files and for analysis. getGEO indicates that individual sample profiles whose file names started with “GEO” were downloaded by the getGEO command in R.
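The sample-based normalization described in the footnote can be sketched as follows. This is a minimal illustration, not the authors' R code: each sample's expression profile is shifted to zero mean and scaled to unit variance independently of the other samples. The use of the population standard deviation (rather than the n − 1 version) is an assumption, as are the function names and the toy profiles.

```python
import statistics


def normalize_sample(values):
    """Scale one sample's profile to zero mean and unit variance."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)  # assumption: population standard deviation
    if sd == 0:
        raise ValueError("constant profile cannot be scaled to unit variance")
    return [(v - mean) / sd for v in values]


def normalize_samples(profiles):
    """Normalize every sample independently (sample-based, not feature-based)."""
    return {name: normalize_sample(vals) for name, vals in profiles.items()}


# Hypothetical expression profiles: one list of miRNA values per sample.
profiles = {
    "sample_A": [1.0, 2.0, 3.0, 4.0],
    "sample_B": [10.0, 10.0, 30.0, 30.0],
}
normalized = normalize_samples(profiles)
for name, vals in normalized.items():
    print(name, [round(v, 3) for v in vals])
```

Normalizing per sample rather than per miRNA means that a held-out sample can be processed without reference to any other sample, which fits the per-sample timing choices ("before FE" or "after FE") listed in the table.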