Wenlong Tang, Junbo Duan, Ji-Gang Zhang, Yu-Ping Wang.
Abstract
In clinical practice, many diseases, such as glioblastoma, leukemia, diabetes, and prostate cancer, have multiple subtypes. Classifying subtypes accurately using genomic data will enable individualized treatments that target specific disease subtypes. However, it is often difficult to obtain satisfactory classification accuracy using only one type of data, because the subtypes of a disease can exhibit similar patterns in one data type. Fortunately, multiple types of genomic data are often available due to the rapid development of genomic techniques. This raises the question of whether classification performance can be significantly improved by combining multiple types of genomic data. In this article, we classified four subtypes of glioblastoma multiforme (GBM) with multiple types of genome-wide data (e.g., mRNA and miRNA expression) from The Cancer Genome Atlas (TCGA) project. We proposed a multi-class compressed sensing-based detector (MCSD) for this study. The MCSD was trained with data from TCGA and then applied to subtype GBM patients using an independent testing dataset. We performed the classification on the same patient subjects with three data types, i.e., miRNA expression data, mRNA (gene expression) data, and their combination. The classification accuracy is 69.1% with the miRNA expression data, 52.7% with the mRNA expression data, and 90.9% with the combination of both mRNA and miRNA expression data. In addition, some biomarkers identified by the integrated approach have been confirmed by results in the published literature. These results indicate that the combined analysis can significantly improve the accuracy of classifying GBM subtypes and identify potential biomarkers for disease diagnosis.
Year: 2013 PMID: 23311594 PMCID: PMC3651309 DOI: 10.1186/1687-4153-2013-2
Source DB: PubMed Journal: EURASIP J Bioinform Syst Biol ISSN: 1687-4145
GBM subtypes and their corresponding samples used for the training and the testing
| Subtype | Training samples | Testing samples |
| --- | --- | --- |
| Pro-neural | 15 | 17 |
| Neural | 15 | 3 |
| Classical | 15 | 17 |
| Mesenchymal | 15 | 18 |
These datasets are publicly available from the TCGA project.
Figure 1. Each probability density function is a one-dimensional normal distribution (the area under each curve integrates to 1).
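The unit-area property mentioned in the caption can be checked numerically. The following is a minimal sketch, not part of the original study: the function names and the (mean, standard deviation) pairs are hypothetical and are not read from the figure.

```python
import math


def normal_pdf(x, mean, std):
    """Density of a one-dimensional normal distribution at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def area_under_pdf(mean, std, half_width=10.0, steps=20000):
    """Trapezoidal estimate of the area over mean +/- half_width * std."""
    lo = mean - half_width * std
    hi = mean + half_width * std
    h = (hi - lo) / steps
    total = 0.5 * (normal_pdf(lo, mean, std) + normal_pdf(hi, mean, std))
    for i in range(1, steps):
        total += normal_pdf(lo + i * h, mean, std)
    return total * h


# Hypothetical (mean, std) pairs chosen for illustration only.
params = [(0.0, 1.0), (2.0, 0.5), (-1.0, 3.0)]
areas = [area_under_pdf(m, s) for m, s in params]
```

Each entry of `areas` should be close to 1 regardless of the mean and standard deviation chosen, which is the property the caption refers to.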
Comparison of classification accuracy between MCSD and non-compressed detector using combined and single data type
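| Data type | MCSD accuracy (%) | Number of selected features | Non-compressed detector accuracy (%) |
| --- | --- | --- | --- |
| Combined analysis | 90.9 | 121 | 32.7 |
| miRNA | 69.1 | 54 | 41.8 |
| Gene expression | 52.7 | 432 | 32.7 |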
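As a rough consistency check, the MCSD accuracies reported in the abstract can be related to the size of the testing set. The sketch below assumes that the second numeric column of the sample table gives the testing counts (17, 3, 17, and 18, i.e., 55 samples in total) and that each accuracy is the fraction of those samples classified correctly. The implied counts are derived by arithmetic under those assumptions and are not stated in the source.

```python
# Assumption: testing samples per subtype, taken from the sample table.
testing_counts = {"Pro-neural": 17, "Neural": 3, "Classical": 17, "Mesenchymal": 18}
n_test = sum(testing_counts.values())

# MCSD accuracies (%) as reported in the abstract.
reported_accuracy = {"Combined analysis": 90.9, "miRNA": 69.1, "Gene expression": 52.7}

implied_correct = {}
for data_type, accuracy in reported_accuracy.items():
    # Nearest whole number of correctly classified testing samples.
    count = round(accuracy / 100.0 * n_test)
    implied_correct[data_type] = count
    recomputed = 100.0 * count / n_test
    print(f"{data_type}: about {count}/{n_test} correct ({recomputed:.1f}%)")
```

Under these assumptions, each reported percentage matches a whole number of correctly classified samples out of 55 to one decimal place.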
Figure 2. Comparison of the classification accuracies of the combined analysis and the single data type analyses. All analyses used the MCSD method to classify the four subtypes of GBM. Note that the combined analysis achieves a significant improvement in classification accuracy.
Figure 3. Selected features distinguishing the four subtypes of GBM, i.e., pro-neural (P), neural (N), classical (C), and mesenchymal (M), for the testing dataset (a) and the training dataset (b). A total of 121 features (3 miRNA expression probes at the top, followed by 118 mRNA expression probes) were chosen from the miRNA and mRNA expression data. Each row represents a feature and each column represents a sample/patient. Each feature is normalized by the largest value in its row. Samples marked with arrows were misclassified into the subtype indicated by the arrow.
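The row-wise normalization described in the caption can be sketched as follows. This is an illustrative example only: the function name and the toy matrix are hypothetical, and the handling of rows whose largest value is zero is an assumption, since the source does not say how such rows were treated.

```python
def normalize_rows_by_max(matrix):
    """Divide every value in a row (feature) by that row's largest value."""
    normalized = []
    for row in matrix:
        peak = max(row)
        if peak == 0:
            # Assumption: leave the row unchanged to avoid dividing by zero.
            normalized.append(list(row))
        else:
            normalized.append([value / peak for value in row])
    return normalized


# Toy data: rows are features, columns are samples (values are made up).
toy_features = [
    [2.0, 4.0, 8.0],
    [1.0, 0.5, 0.25],
    [3.0, 3.0, 6.0],
]
scaled = normalize_rows_by_max(toy_features)
```

After this step the largest entry in every row equals 1, so features measured on different scales (such as miRNA and mRNA probes) can be shown in a single heat map.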