Sivan Gershanov1, Shreyas Madiwale2,3, Galina Feinberg-Gorenshtein2, Igor Vainer1, Tamar Nehushtan1, Shalom Michowiz3,4, Nitza Goldenberg-Cohen5,6,7, Yehudit Birger2, Helen Toledano3,8, Mali Salmon-Divon1,9.
Abstract
As treatment protocols for medulloblastoma (MB) are becoming subgroup-specific, there is a timely need for means of reliably distinguishing between its subgroups. Currently available methods include immunohistochemical stains, which are subjective and often inconclusive, and molecular techniques (e.g., NanoString, microarrays, or DNA methylation assays), which are time-consuming, expensive, and not widely available. Quantitative PCR (qPCR) provides a good alternative to these methods, but the current NanoString panel, which includes 22 genes, is impractical for qPCR. Here, we applied machine-learning-based classifiers to extract reliable, concise gene sets for distinguishing between the four MB subgroups, and we compared the accuracy of these gene sets to that of the known NanoString 22-gene set. We validated our results using an independent microarray-based dataset of 92 samples of all four subgroups. In addition, we performed a qPCR validation on a cohort of 18 patients diagnosed with SHH, Group 3, and Group 4 MB. We found that the 22-gene set can be reduced to only six genes (IMPG2, NPR3, KHDRBS2, RBM24, WIF1, and EMX2) without compromising accuracy. The identified gene set is sufficiently small to make a qPCR-based MB subgroup classification easily accessible to clinicians, even in developing, poorly equipped countries.
Keywords: biomarkers; gene expression; machine learning; medulloblastoma; subgroup classification
Year: 2021 PMID: 34178626 PMCID: PMC8223061 DOI: 10.3389/fonc.2021.637482
Source DB: PubMed Journal: Front Oncol ISSN: 2234-943X Impact factor: 6.244
The accuracies of the sets of attributes selected for classification by each algorithm, based on the GSE85217 dataset (n = 763 MB samples).
| Algorithm | Input¹ | Accuracy (%) | Attributes required for classification (output)² | Number of attributes required for classification |
|---|---|---|---|---|
| Decision tree³ | All attributes | 95.5 | | 9 |
| Decision tree³ | 22 genes | 94.5 | | 10 |
| Decision rules³ | All attributes | 94.2 | | 10 |
| Decision rules³ | 22 genes | 94 | | 13 |
| Random forest | All attributes | 97.8 | | 21,641 |
| Random forest | 22 genes | 97.1 | | 22 |
| SVM-SMO | All attributes | 98.4 | | 21,641 |
| SVM-SMO | 22 genes | 97.8 | | 22 |
¹ Attribute sets that were used as inputs for the algorithm.
² Attributes chosen by each algorithm for classification.
³ Detailed results obtained from these algorithms can be found in and .
Figure 1. Accuracy of the smallest best-performing gene sets output by the SARC classifier, applied to the GSE85217 dataset (n = 763 samples), (A) when introducing all 21,641 attributes as input, and (B) when introducing the NanoString 22-gene set as input.
The accuracies of the top set of attributes selected for classification by the SARC algorithm for each input, based on the GSE85217 dataset (n = 763 MB samples).
| Input¹ | Accuracy (%) | Attributes required for classification (output)² | Number of attributes required for classification |
|---|---|---|---|
| All attributes | 98.6 | | 32 |
| 22 genes | 98.3 | | 15 |
¹ Attribute sets that were used as inputs for the algorithm.
² Attributes chosen by the algorithm for classification.
Classification accuracy of the reduced gene sets (12 genes or fewer), as compared with the full 22-gene NanoString set, on the independent validation datasets GSE37418 and GSE41842 (n = 92 MB samples altogether).
| Number of attributes | Accuracy (%) | Input set for validation¹ |
|---|---|---|
| 22 | 91.30 | |
| 12 | 96.74 | |
| 8 | 90.22 | |
| 7 | 93.48 | |
| 6 | 93.48 | |
| 5 | 82.61 | |
| 4 | 81.52 | |
¹ Attribute sets that were used as input for the validation based on the SARC classifier output, chosen from the GSE85217 dataset ( ).
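The validation accuracies above can be read as counts of correctly classified samples out of 92. The following is a minimal sketch of that arithmetic, assuming accuracy is simply correct calls divided by total samples (the source does not state the formula explicitly). It also picks the smallest gene set whose accuracy is at least that of the 22-gene baseline; the variable names are illustrative only.

```python
# Accuracies (%) copied from the validation table, keyed by number of genes.
accuracy_pct = {22: 91.30, 12: 96.74, 8: 90.22, 7: 93.48, 6: 93.48, 5: 82.61, 4: 81.52}
N_SAMPLES = 92  # GSE37418 and GSE41842 combined

# Assumption: accuracy = correctly classified samples / total samples.
correct = {genes: round(pct / 100 * N_SAMPLES) for genes, pct in accuracy_pct.items()}

# Sanity check: the implied counts reproduce the reported percentages
# when rounded to two decimals.
consistent = all(
    round(100 * n / N_SAMPLES, 2) == accuracy_pct[genes]
    for genes, n in correct.items()
)

# Smallest gene set that is at least as accurate as the full 22-gene panel.
baseline = accuracy_pct[22]
smallest = min(genes for genes, pct in accuracy_pct.items() if pct >= baseline)

print(correct)
print(consistent, smallest)
```

Under this reading, the six-gene set is the smallest one that does not fall below the 22-gene panel on the validation data, which matches the reduction reported in the abstract.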
Figure 2. Validation of the predicted classification set outputs created by the SARC classifier. Expression t-SNE of the independent datasets GSE37418 and GSE41842 (n = 92) based on (A) the 22-gene NanoString panel set, (B) 12 genes out of the 22-gene NanoString panel, and (C) six genes out of the 22-gene NanoString panel.
Figure 3. Demographic and clinical data of the patient cohort used for qPCR validation (n = 18). BMT, bone marrow transplantation; YA, young adult; N/A, not available. ¹ At first diagnosis. ² As of the completion of this study. More detailed information in .
Figure 4. qPCR-based classification of an independent cohort, using the reduced six-gene set out of the 22-gene NanoString set (IMPG2, NPR3, KHDRBS2, RBM24, WIF1, and EMX2). An unsupervised hierarchical clustering of gene expression levels was generated using qPCR (dCt) values. (A) A cohort of 16 patients who were classified by NanoString as having either SHH, Group 3, or Group 4 MBs (n = 5, 3, and 8, respectively; see and ). (B) The same cohort, but with the addition of two patients who were classified as having a non-WNT/SHH MB. The height (y axis) is a measure of closeness of either individual data points or clusters.
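To illustrate how dCt values can feed a hierarchical clustering of this kind, here is a minimal sketch. Everything specific in it is an assumption rather than the authors' pipeline: the Ct values are invented, `REF` stands in for an unspecified reference gene, only two of the six genes are used, and Euclidean distance with average linkage is chosen because the source does not name a distance or linkage method.

```python
from itertools import combinations
from math import dist

# Hypothetical raw Ct values (NOT from the study): sample -> gene -> Ct.
# "REF" is a placeholder for an unspecified reference (housekeeping) gene.
raw_ct = {
    "S1": {"REF": 20.0, "WIF1": 30.0, "EMX2": 31.0},
    "S2": {"REF": 21.0, "WIF1": 31.5, "EMX2": 32.0},
    "S3": {"REF": 20.0, "WIF1": 24.0, "EMX2": 33.0},
}
genes = ["WIF1", "EMX2"]

# dCt = Ct(target gene) - Ct(reference gene), computed per sample.
dct = {
    name: [ct[g] - ct["REF"] for g in genes]
    for name, ct in raw_ct.items()
}

def linkage_height(a, b, profiles):
    """Average Euclidean distance between all members of clusters a and b."""
    total = sum(dist(profiles[x], profiles[y]) for x in a for y in b)
    return total / (len(a) * len(b))

def average_linkage(profiles):
    """Agglomerative clustering; returns merges as (cluster_a, cluster_b, height)."""
    clusters = [(name,) for name in profiles]
    merges = []
    while len(clusters) > 1:
        # Merge the pair of clusters with the smallest linkage height.
        a, b = min(
            combinations(clusters, 2),
            key=lambda pair: linkage_height(pair[0], pair[1], profiles),
        )
        merges.append((a, b, linkage_height(a, b, profiles)))
        clusters = [c for c in clusters if c not in (a, b)] + [a + b]
    return merges

merges = average_linkage(dct)
for a, b, height in merges:
    print(a, b, round(height, 3))
```

Each merge height corresponds to the y axis of a dendrogram: samples with similar dCt profiles join at low heights, and dissimilar groups join higher up.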