Komal S Rathi, Sherjeel Arif, Mateusz Koptyra, Ammar S Naqvi, Deanne M Taylor, Phillip B Storm, Adam C Resnick, Jo Lynne Rokita, Pichai Raman.
Abstract
Medulloblastoma is a highly heterogeneous pediatric brain tumor with five molecular subtypes defined by the World Health Organization: Sonic Hedgehog TP53-mutant, Sonic Hedgehog TP53-wildtype, WNT, Group 3, and Group 4. Tumors are currently classified into these subtypes by immunostaining, methylation, and/or genetics. We surveyed the literature and identified RNA-Seq and microarray datasets with which to develop, train, test, and validate a robust classifier that identifies medulloblastoma molecular subtypes from transcriptomic profiling data. We have developed a GPL-3 licensed R package and a Shiny application that let users quickly and robustly classify medulloblastoma samples using transcriptomic data. The classifier draws on a large composite microarray dataset (15 individual datasets), an individual microarray study, and an RNA-Seq dataset, and uses gene ratios rather than gene expression measures as model features. Discriminating features were identified using the limma R package, and samples were classified using an unweighted mean of normalized scores. We used two training datasets and applied the classifier to 15 separate datasets. We observed a minimum accuracy of 85.71% in the smallest dataset and a maximum accuracy of 100% in four datasets, with an overall median accuracy of 97.8% across the 15 datasets; most misclassifications occurred between the heterogeneous Group 3 and Group 4 subtypes. We anticipate that this medulloblastoma transcriptomic subtype classifier will be broadly applicable to the cancer research and clinical communities.
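To make the idea of gene-ratio features and an unweighted mean of normalized scores concrete, the following Python sketch shows one possible reading of that scheme. It is not the package's implementation: the gene names, the signature pairs, the use of log2 ratios, and the choice of z-scoring each ratio across samples are all assumptions made for illustration.

```python
import numpy as np

def log_ratio_features(expr, pairs):
    """Return a (n_pairs, n_samples) array of log2(gene_a / gene_b)."""
    return np.vstack([np.log2(expr[a] / expr[b]) for a, b in pairs])

def classify(expr, signatures):
    """Assign each sample the subtype with the highest mean normalized score.

    expr: dict mapping gene name -> array of positive expression values.
    signatures: dict mapping subtype -> list of (gene_a, gene_b) ratio pairs
                (hypothetical; the real discriminating pairs are not given here).
    """
    subtypes = list(signatures)
    scores = []
    for subtype in subtypes:
        feats = log_ratio_features(expr, signatures[subtype])
        # Normalize each ratio feature across samples (z-score, an assumption).
        mean = feats.mean(axis=1, keepdims=True)
        std = feats.std(axis=1, keepdims=True)
        std[std == 0] = 1.0  # guard against constant features
        z = (feats - mean) / std
        # Unweighted mean over the subtype's features, one score per sample.
        scores.append(z.mean(axis=0))
    scores = np.vstack(scores)
    return [subtypes[i] for i in scores.argmax(axis=0)]

# Toy data: four hypothetical genes measured in three samples.
expr = {
    "A": np.array([8.0, 1.0, 2.0]),
    "B": np.array([1.0, 8.0, 2.0]),
    "C": np.array([2.0, 2.0, 8.0]),
    "D": np.array([2.0, 2.0, 1.0]),
}
signatures = {
    "subtype_1": [("A", "B")],
    "subtype_2": [("B", "A")],
    "subtype_3": [("C", "D")],
}
calls = classify(expr, signatures)
print(calls)
```

One property that makes ratios attractive as features: a multiplicative per-sample factor cancels in every ratio, so rescaling a whole sample leaves its features, and hence its call, unchanged.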
Year: 2020 PMID: 33119584 PMCID: PMC7654754 DOI: 10.1371/journal.pcbi.1008263
Source DB: PubMed Journal: PLoS Comput Biol ISSN: 1553-734X Impact factor: 4.475
Confusion Matrix, Accuracy, and other evaluation metrics obtained after combining 15 test MB datasets followed by applying the classifier on the combined dataset (N = 1,286 samples).
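Confusion matrix (rows: predicted subtype; columns: reference subtype)

| | Ref_Group3 | Ref_Group4 | Ref_WNT | Ref_SHH |
|---|---|---|---|---|
| Pred_Group3 | 210 | 10 | 0 | 1 |
| Pred_Group4 | 5 | 478 | 0 | 0 |
| Pred_WNT | 0 | 0 | 110 | 2 |
| Pred_SHH | 3 | 7 | 0 | 392 |

Overall statistics

| Metric | Value |
|---|---|
| Accuracy | 97.70% |
| Kappa | 96.70% |
| Accuracy CI (lower) | 96.70% |
| Accuracy CI (upper) | 98.50% |
| No Information Rate | 40.60% |
| P-value (Accuracy > No Information Rate) | 0.00E+00 |
| McNemar's test P-value | NaN |

Statistics by class

| Metric | Group3 | Group4 | WNT | SHH |
|---|---|---|---|---|
| Sensitivity | 96.30% | 96.60% | 100% | 99.20% |
| Specificity | 98.90% | 99.30% | 99.80% | 98.80% |
| Positive Predictive Value | 95.00% | 99% | 98.20% | 97.50% |
| Negative Predictive Value | 99.20% | 97.70% | 100% | 99.60% |
| Precision | 95.00% | 99% | 98.20% | 97.50% |
| Recall | 96.30% | 96.60% | 100% | 99.20% |
| F1 | 95.70% | 97.80% | 99.10% | 98.40% |
| Prevalence | 17.90% | 40.64% | 9.03% | 32.43% |
| Detection Rate | 17.24% | 39.24% | 9.03% | 32.18% |
| Detection Prevalence | 18.10% | 39.70% | 9.20% | 33.00% |
| Balanced Accuracy | 97.60% | 98% | 99.90% | 99.00% |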
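
The overall and per-class figures can be recomputed from the four-by-four count matrix. The sketch below is a minimal Python check, assuming the rows of the matrix are predicted subtypes and the columns are reference subtypes, both in the order Group3, Group4, WNT, SHH; the variable names are illustrative only.

```python
import numpy as np

labels = ["Group3", "Group4", "WNT", "SHH"]
# Rows: predicted subtype; columns: reference subtype (assumed orientation).
cm = np.array([
    [210,  10,   0,   1],
    [  5, 478,   0,   0],
    [  0,   0, 110,   2],
    [  3,   7,   0, 392],
])

n = cm.sum()
pred_totals = cm.sum(axis=1)  # samples predicted as each subtype
ref_totals = cm.sum(axis=0)   # samples truly in each subtype

# Overall accuracy: fraction of samples on the diagonal.
accuracy = np.trace(cm) / n

# Cohen's kappa: agreement corrected for chance agreement.
expected = (pred_totals * ref_totals).sum() / n**2
kappa = (accuracy - expected) / (1 - expected)

# One-vs-rest counts for each subtype.
tp = np.diag(cm)
fn = ref_totals - tp
fp = pred_totals - tp
tn = n - tp - fn - fp

sensitivity = tp / (tp + fn)
specificity = tn / (tn + fp)
balanced_accuracy = (sensitivity + specificity) / 2

print(f"accuracy={accuracy:.4f} kappa={kappa:.4f}")
for name, se, sp in zip(labels, sensitivity, specificity):
    print(f"{name}: sensitivity={se:.3f} specificity={sp:.3f}")
```

Under that orientation the counts reproduce the reported accuracy of 97.70% and kappa of 96.70%, and the largest off-diagonal cells are the Group 3 and Group 4 confusions.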
Subtype-specific Sensitivity, Specificity and overall Accuracy across 15 test MB datasets.
| Study | Sample_Size | Group3_Sensitivity | Group3_Specificity | Group4_Sensitivity | Group4_Specificity | SHH_Sensitivity | SHH_Specificity | WNT_Sensitivity | WNT_Specificity | Accuracy |
|---|---|---|---|---|---|---|---|---|---|---|
| | 19 | 100% | 100% | 100% | 100% | 100% | 100% | NA | 100% | 100% |
| | 62 | 100% | 98% | 100% | 100% | 92.90% | 100% | 100% | 100% | 98.30% |
| | 40 | 87.50% | 96.80% | 95% | 94.70% | 100% | 100% | 100% | 100% | 94.90% |
| | 103 | 95.70% | 98.60% | 94.30% | 98.30% | 100% | 98.50% | 100% | 100% | 96.80% |
| | 30 | 100% | 92.60% | 87.50% | 100% | 100% | 100% | 100% | 100% | 93.30% |
| | 50 | 100% | 100% | 100% | 100% | 90% | 100% | NA | 97.80% | 97.80% |
| | 19 | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% |
| | 58 | NA | 100% | NA | 100% | 98.30% | NA | NA | 98.30% | 98.30% |
| | 24 | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% |
| | 12 | NA | 100% | NA | 100% | 100% | NA | NA | 100% | 100% |
| | 8 | 100% | 80% | 66.70% | 100% | 100% | 100% | 100% | 100% | 85.71% |
| | 22 | 88.90% | 100% | 100% | 100% | 100% | 94.10% | 100% | 100% | 95.50% |
| | 46 | 66.70% | 100% | 100% | 91.40% | 100% | 100% | 100% | 100% | 93.30% |
| | 30 | 100% | 92.30% | 90.90% | 100% | 100% | 100% | 100% | 100% | 95% |
| | 763 | 98.50% | 99.30% | 96.80% | 100% | 100% | 98.50% | 100% | 100% | 98.40% |
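
The summary figures quoted in the abstract can be checked against the per-dataset values. The Python sketch below assumes that the final value reported for each of the 15 datasets is its overall accuracy and that datasets are listed in the same order as the sample sizes; both are assumptions, made because they agree with the minimum and maximum stated in the abstract.

```python
from statistics import median

# Per-dataset sample sizes and overall accuracies (%), in table order.
sample_sizes = [19, 62, 40, 103, 30, 50, 19, 58, 24, 12, 8, 22, 46, 30, 763]
accuracies = [100, 98.3, 94.9, 96.8, 93.3, 97.8, 100, 98.3,
              100, 100, 85.71, 95.5, 93.3, 95, 98.4]

total_samples = sum(sample_sizes)
median_accuracy = median(accuracies)
min_accuracy = min(accuracies)
n_perfect = sum(1 for a in accuracies if a == 100)

# Accuracy of the dataset with the fewest samples.
smallest_idx = sample_sizes.index(min(sample_sizes))
smallest_accuracy = accuracies[smallest_idx]

print(f"total samples:       {total_samples}")
print(f"median accuracy:     {median_accuracy}%")
print(f"minimum accuracy:    {min_accuracy}% (smallest dataset: {smallest_accuracy}%)")
print(f"datasets at 100%:    {n_perfect}")
```

With these inputs the median is 97.8%, the minimum of 85.71% falls in the 8-sample dataset, four datasets reach 100%, and the sample sizes sum to 1,286.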