Gokmen Zararsiz, Dincer Goksuluk, Bernd Klaus, Selcuk Korkmaz, Vahap Eldem, Erdem Karabulut, Ahmet Ozturk.
Abstract
RNA-Seq is a recent and efficient technique that uses next-generation sequencing technology to characterize and quantify transcriptomes. One important task with gene-expression data is to identify a small subset of genes that can be used to build diagnostic classifiers, particularly for cancers. Microarray-based classifiers are not directly applicable to RNA-Seq data because of its discrete nature. Overdispersion is another problem, and it requires careful modeling of the mean-variance relationship of RNA-Seq data. In this study, we present the voomDDA classifiers: variance modeling at the observational level (voom) extensions of the nearest shrunken centroids (NSC) and diagonal discriminant classifiers. VoomNSC is one of these classifiers and brings the voom and NSC approaches together for gene-expression based classification. To this end, we propose weighted statistics and plug them into the NSC algorithm. VoomNSC is a sparse classifier that models the mean-variance relationship using the voom method and incorporates voom's precision weights into the NSC classifier via these weighted statistics. A comprehensive simulation study was designed and four real datasets were used for performance assessment. The overall results indicate that voomNSC is the sparsest classifier. It also provides the most accurate results, together with power-transformed Poisson linear discriminant analysis, rlog-transformed support vector machines and random forests. Beyond prediction, the voomNSC classifier can be used to identify potential diagnostic biomarkers for a condition of interest. Through this work, statistical learning methods proposed for microarrays can be reused for RNA-Seq data. An interactive web application is freely available at http://www.biosoft.hacettepe.edu.tr/voomDDA/.
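
The idea of feeding precision weights into NSC through weighted statistics can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`fit_weighted_nsc`, `predict`), the toy data, and the exact form of the weighted pooled standard deviation are assumptions made for illustration. The sketch assumes the expression values are already on a log scale and that the precision weights were computed beforehand (for example by a voom-style step); weights for test samples are omitted for brevity.

```python
import math
from statistics import median


def soft_threshold(d, delta):
    """Shrink d toward zero by delta; anything within [-delta, delta] becomes 0."""
    return math.copysign(max(abs(d) - delta, 0.0), d)


def fit_weighted_nsc(X, W, y, delta):
    """Fit a weighted nearest-shrunken-centroid model (illustrative only).

    X: n samples x p genes, W: precision weights of the same shape,
    y: class labels, delta: shrinkage threshold.
    """
    n, p = len(X), len(X[0])
    classes = sorted(set(y))
    rows = {k: [i for i in range(n) if y[i] == k] for k in classes}

    def wmean(idx, j):
        # Weighted mean of gene j over the samples in idx.
        return sum(W[i][j] * X[i][j] for i in idx) / sum(W[i][j] for i in idx)

    overall = [wmean(range(n), j) for j in range(p)]
    cent = {k: [wmean(rows[k], j) for j in range(p)] for k in classes}

    # Weighted pooled within-class SD per gene (assumed normalization:
    # it reduces to the usual n - K denominator when all weights are 1).
    sd = []
    for j in range(p):
        num = sum(W[i][j] * (X[i][j] - cent[y[i]][j]) ** 2 for i in range(n))
        den = sum(W[i][j] for i in range(n)) * (n - len(classes)) / n
        sd.append(math.sqrt(num / den))
    s0 = median(sd)  # offset that damps genes with very small variance

    shrunken, d_shrunk = {}, {}
    for k in classes:
        m_k = math.sqrt(1.0 / len(rows[k]) - 1.0 / n)
        scales = [m_k * (sd[j] + s0) for j in range(p)]
        d_shrunk[k] = [soft_threshold((cent[k][j] - overall[j]) / scales[j], delta)
                       for j in range(p)]
        shrunken[k] = [overall[j] + scales[j] * d_shrunk[k][j] for j in range(p)]

    # A gene is selected if at least one class keeps a nonzero shrunken difference.
    selected = [j for j in range(p) if any(d_shrunk[k][j] != 0.0 for k in classes)]
    priors = {k: len(rows[k]) / n for k in classes}
    return {"centroids": shrunken, "sd": sd, "s0": s0,
            "priors": priors, "selected": selected}


def predict(model, x):
    """Assign x to the class with the smallest standardized distance score."""
    def score(k):
        dist = sum((x[j] - model["centroids"][k][j]) ** 2
                   / (model["sd"][j] + model["s0"]) ** 2 for j in range(len(x)))
        return dist - 2.0 * math.log(model["priors"][k])
    return min(model["priors"], key=score)


# Toy data: gene 0 separates the classes, genes 1 and 2 are noise.
X = [[5.0, 1.0, 3.0], [5.2, 1.1, 2.9], [4.8, 0.9, 3.1],
     [1.0, 1.0, 3.0], [1.2, 0.9, 3.1], [0.8, 1.1, 2.9]]
W = [[1.0] * 3, [2.0] * 3, [1.0] * 3, [1.0] * 3, [2.0] * 3, [1.0] * 3]
y = ["a", "a", "a", "b", "b", "b"]

model = fit_weighted_nsc(X, W, y, delta=2.0)
print("selected genes:", model["selected"])
print("prediction:", predict(model, [4.9, 1.0, 3.0]))
```

Raising `delta` removes more genes from the model, which is how the threshold controls sparsity; in practice it would be chosen by cross-validation rather than fixed by hand.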
Keywords: Diagnostic biomarker discovery; Diagonal discriminant analysis; Gene-expression based classification; Machine learning; Nearest shrunken centroids; Voom transformation
Year: 2017 PMID: 29018623 PMCID: PMC5633036 DOI: 10.7717/peerj.3890
Source DB: PubMed Journal: PeerJ ISSN: 2167-8359 Impact factor: 2.984
Figure 1. A flowchart of the steps of the voomNSC algorithm.
Figure 2. Selection of the voomNSC threshold parameter for the cervical data.
Figure 3. Misclassification errors of classifiers for the simulation scenario K = 2, e = 5%, σ = 0.1.
Figure 4. Sparsities of classifiers for the simulation scenario K = 2, e = 5%, σ = 0.1.
Misclassification errors of classifiers for real datasets.
| Classifier | Cervical | Alzheimer | Renal cell cancer | Lung cancer |
|---|---|---|---|---|
| DLDA | 0.149(0.015) | 0.197(0.012) | 0.140(0.003) | 0.098(0.002) |
| DQDA | 0.140(0.012) | 0.188(0.012) | 0.135(0.003) | 0.098(0.002) |
| NBLDA | 0.198(0.014) | 0.139(0.003) | 0.098(0.002) | |
| NSC | 0.108(0.011) | 0.201(0.012) | 0.140(0.003) | 0.097(0.002) |
| PLDA1 | 0.287(0.029) | 0.317(0.014) | 0.756(0.044) | 0.262(0.028) |
| PLDA2 | 0.111(0.011) | 0.223(0.013) | 0.143(0.003) | 0.100(0.002) |
| RF | 0.135(0.012) | 0.204(0.013) | 0.077(0.002) | 0.062(0.002) |
| SVM | 0.101(0.010) | | | |
| voomDLDA1 | 0.148(0.015) | 0.210(0.012) | 0.141(0.003) | 0.097(0.002) |
| voomDLDA2 | 0.211(0.019) | 0.228(0.015) | 0.139(0.003) | 0.097(0.002) |
| voomDLDA3 | 0.146(0.015) | 0.203(0.012) | 0.142(0.003) | 0.097(0.002) |
| voomDQDA1 | 0.164(0.014) | 0.181(0.012) | 0.134(0.002) | 0.097(0.002) |
| voomDQDA2 | 0.165(0.013) | 0.139(0.010) | 0.138(0.003) | 0.098(0.002) |
| voomDQDA3 | 0.153(0.014) | 0.170(0.011) | 0.137(0.003) | 0.095(0.002) |
| voomNSC1 | 0.119(0.013) | 0.227(0.010) | 0.181(0.002) | 0.097(0.002) |
| voomNSC2 | 0.111(0.010) | 0.226(0.018) | 0.192(0.003) | 0.097(0.002) |
| voomNSC3 | 0.112(0.012) | 0.233(0.012) | 0.184(0.002) | 0.092(0.002) |
Notes.
Values are misclassification errors, calculated from 50 repetitions and expressed as mean (standard error). The best-performing methods are indicated in bold in each column.
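
As a small illustration of the "mean (standard error)" format used in these tables, the sketch below summarizes a list of per-repetition values. It assumes the standard error is the sample standard deviation divided by the square root of the number of repetitions; the helper names and the four example values are hypothetical and are not taken from the study.

```python
import math
from statistics import mean, stdev


def mean_se(values):
    """Return the mean and its standard error (sample SD / sqrt(n))."""
    return mean(values), stdev(values) / math.sqrt(len(values))


def format_mean_se(values, digits=3):
    """Format values as 'mean(se)', matching the table layout."""
    m, se = mean_se(values)
    return f"{m:.{digits}f}({se:.{digits}f})"


# Hypothetical misclassification errors from four repetitions.
errors = [0.10, 0.12, 0.08, 0.10]
print(format_mean_se(errors))
```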
Sparsities of classifiers for real datasets.
| Classifier | Cervical | Alzheimer | Renal cell cancer | Lung cancer |
|---|---|---|---|---|
| NSC | 194.18(27.40) | 333.06(19.04) | 1989.00(7.32) | 1685.22(47.73) |
| PLDA1 | 290.44(40.01) | 606.82(112.40) | 1339.90(112.54) | |
| PLDA2 | 126.66(29.13) | 228.97(22.53) | 1640.47(81.59) | 1060.84(70.93) |
| voomNSC1 | 48.06(10.78) | 85.04(39.34) | | |
| voomNSC2 | 59.16(13.60) | 140.32(20.22) | 700.90(114.63) | 122.44(33.22) |
| voomNSC3 | 63.34(13.94) | 30.02(8.10) | 208.22(42.35) | |
Notes.
Values are the number of genes selected in each model, calculated from 50 repetitions and expressed as mean (standard error). The best-performing methods are indicated in bold in each column.
Summary of voomNSC models and selected genes in real datasets.
| Dataset | Misclassification error | Number of features | Selected features |
|---|---|---|---|
| Cervical | 2/58 | 14 | |
| Alzheimer | 13/70 | 3 | |
| Renal cell cancer | 87/1,020 | 87 | |
| Lung cancer | 96/1,118 | 6 | |
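
The misclassification errors in this summary are reported as counts (misclassified samples over total samples). To compare them with the proportions in the earlier error table, they can be converted to rates, as in this short sketch; the helper name `error_rate` is an illustrative assumption, and the fractions are copied from the table above.

```python
def error_rate(fraction_text):
    """Convert text such as '87/1,020' into a misclassification rate."""
    wrong, total = (int(part.replace(",", "")) for part in fraction_text.split("/"))
    return wrong / total


# Fractions as reported in the summary table.
reported = {
    "Cervical": "2/58",
    "Alzheimer": "13/70",
    "Renal cell cancer": "87/1,020",
    "Lung cancer": "96/1,118",
}
rates = {name: round(error_rate(text), 3) for name, text in reported.items()}
for name, rate in rates.items():
    print(f"{name}: {rate:.3f}")
```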
Figure 5. Illustration of the voomDDA web-tool.
Figure 6. A Venn diagram displaying the number of selected miRNAs.