Yu-Dong Cai, Shiqi Zhang, Yu-Hang Zhang, Xiaoyong Pan, KaiYan Feng, Lei Chen, Tao Huang, Xiangyin Kong.
Abstract
Gliomas are a common type of brain cancer derived from glial cells. They include three subtypes: glioblastoma, diffuse astrocytoma, and anaplastic astrocytoma. These subtypes have distinctive clinical features but are closely related to each other. A glioblastoma can arise from an early-stage diffuse astrocytoma, and a diffuse astrocytoma can also transform into an anaplastic astrocytoma. Because these dynamic processes are complex, single-cell gene expression profiles are extremely helpful for understanding what defines each subtype. We used advanced machine learning methods to analyze single-cell gene expression profiles from three tissue types: 5057 cells from anaplastic astrocytoma, 261 cells from diffuse astrocytoma, and 1023 cells from glioblastoma. Specifically, we used the Monte Carlo feature selection (MCFS) method, a powerful feature selection technique, to analyze the gene expression profiles of the cells and produce a ranked feature list. We then applied the incremental feature selection (IFS) method to this list, together with a support vector machine (SVM), to extract key features (genes) and build an optimal SVM classifier. Several key biomarker genes were identified, including IGFBP2, IGF2BP3, PRDX1, NOV, NEFL, HOXA10, GNG12, SPRY4, and BCL11A. In addition, we used the Johnson reducer algorithm to derive rules for classifying the three subtypes. We found that PRDX1 is highly expressed in diffuse astrocytoma but has a low expression level in glioblastoma. These rules reveal the differences among the three subtypes and indicate how the subtypes form and transform. These genes are not only biomarkers for glioma subtypes but also candidate drug targets that may switch clinical features or even reverse tumor progression.
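The IFS step described above can be sketched in a few lines. This sketch makes two assumptions. First, a generic `evaluate` callback stands in for the study's step of training an SVM on the top-k genes and scoring it by cross-validation. Second, the function name, its parameters, and the toy scores are all hypothetical; this is not the authors' code.

```python
from typing import Callable, List, Optional, Sequence, Tuple


def incremental_feature_selection(
    ranked_features: Sequence[str],
    evaluate: Callable[[Sequence[str]], float],
    start: int = 1,
    stop: Optional[int] = None,
    step: int = 1,
) -> Tuple[int, float, List[Tuple[int, float]]]:
    """Score the top-k prefixes of a ranked feature list for
    k = start, start + step, ... (up to stop) and return the best k,
    its score, and the whole (k, score) curve."""
    if stop is None:
        stop = len(ranked_features)
    # Each candidate feature set is the first k features of the ranking.
    curve = [(k, evaluate(ranked_features[:k]))
             for k in range(start, stop + 1, step)]
    if not curve:
        raise ValueError("no feature-set sizes to evaluate")
    # max() keeps the first maximum, so ties favor the smaller feature set.
    best_k, best_score = max(curve, key=lambda point: point[1])
    return best_k, best_score, curve


# Toy usage: made-up scores stand in for the cross-validated SVM performance.
toy_scores = {1: 0.40, 2: 0.71, 3: 0.75, 4: 0.73}
ranking = ["IGFBP2", "PRDX1", "NOV", "NEFL"]
best_k, best, curve = incremental_feature_selection(
    ranking, lambda feats: toy_scores[len(feats)])
print(best_k, best)  # 3 0.75
```

A coarse-to-fine search can reuse the same function. First call it with a large `step` over the whole list. Then call it with `step=1` inside the most promising interval. The step sizes the study actually used are not stated here.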
Keywords: Johnson reducer algorithm; Monte Carlo feature selection; gene expression; glioma; support vector machine
Year: 2018; PMID: 30322114; PMCID: PMC6210469; DOI: 10.3390/jcm7100350
Source DB: PubMed; Journal: J Clin Med; ISSN: 2077-0383; Impact factor: 4.241
Figure 1. Flowchart of the procedures of the method. The gene expression profiles were analyzed with the Monte Carlo feature selection method, yielding a feature list. Some of the top-ranked features were used to produce classification rules with the Johnson reducer algorithm. The incremental feature selection method, together with a support vector machine, used the feature list to extract the optimal features and construct the optimal classifier.
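In rough-set software, the Johnson reducer is a greedy heuristic for finding a reduct. A reduct is a small attribute subset that still distinguishes every pair of objects with different decisions. The sketch below implements the textbook version under three assumptions:

* Expression values have already been discretized into levels. The study's discretization is not described here.
* The toy data are made up.
* The function names are hypothetical.

This is not the authors' implementation.

```python
from collections import Counter
from itertools import combinations


def discernibility_sets(rows, decisions):
    """For every pair of objects with different decisions, collect the set of
    attributes whose (discretized) values differ between the two objects."""
    sets = []
    for i, j in combinations(range(len(rows)), 2):
        if decisions[i] != decisions[j]:
            diff = frozenset(a for a in rows[i] if rows[i][a] != rows[j][a])
            if diff:  # an empty set means the pair is inconsistent
                sets.append(diff)
    return sets


def johnson_reduct(rows, decisions):
    """Greedy Johnson heuristic: repeatedly take the attribute that occurs in
    the most remaining discernibility sets, then drop every set it covers."""
    remaining = discernibility_sets(rows, decisions)
    reduct = []
    while remaining:
        counts = Counter(a for s in remaining for a in s)
        # Break ties alphabetically so the result is deterministic.
        best = min(counts, key=lambda a: (-counts[a], a))
        reduct.append(best)
        remaining = [s for s in remaining if best not in s]
    return reduct


def rules_from_reduct(rows, decisions, reduct):
    """Map each distinct pattern of reduct values to the decisions seen with it,
    i.e. one IF-THEN rule per pattern."""
    rules = {}
    for row, label in zip(rows, decisions):
        key = tuple((a, row[a]) for a in reduct)
        rules.setdefault(key, set()).add(label)
    return rules


# Made-up toy table; DA/GBM/AA abbreviate the three subtypes.
rows = [
    {"XIST": "high", "NRCAM": "high", "TCF12": "high"},
    {"XIST": "high", "NRCAM": "low", "TCF12": "high"},
    {"XIST": "low", "NRCAM": "low", "TCF12": "low"},
    {"XIST": "low", "NRCAM": "high", "TCF12": "high"},
]
decisions = ["DA", "DA", "GBM", "AA"]
reduct = johnson_reduct(rows, decisions)
print(reduct)  # ['XIST', 'NRCAM']
for pattern, labels in rules_from_reduct(rows, decisions, reduct).items():
    print(" AND ".join(f"{g}={v}" for g, v in pattern), "=>", labels)
```

Because the heuristic is greedy, the reduct it returns is small but not guaranteed to be minimal.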
The 24 rules detected for classifying the three glioma subtypes.
| Rule | Criteria | Glioma subtype |
|---|---|---|
| Rule1 | XIST ≥ 2.725 | diffuse astrocytoma |
| Rule2 | XIST ≥ 3.588 | diffuse astrocytoma |
| Rule3 | XIST ≥ 3.132 | diffuse astrocytoma |
| Rule4 | XIST ≥ 2.601 | diffuse astrocytoma |
| Rule5 | XIST ≥ 2.395 | diffuse astrocytoma |
| Rule6 | XIST ≥ 2.395 | diffuse astrocytoma |
| Rule7 | XIST ≥ 2.395 | diffuse astrocytoma |
| Rule8 | XIST ≥ 3.021 | diffuse astrocytoma |
| Rule9 | PCDHB7 ≥ 3.827 | diffuse astrocytoma |
| Rule10 | RHOB ≥ 6.545 | diffuse astrocytoma |
| Rule11 | RPSAP58 ≤ 1.280 | glioblastoma |
| Rule12 | TCF12 ≤ 4.952 | glioblastoma |
| Rule13 | NRCAM ≤ 0.999 | glioblastoma |
| Rule14 | RPSAP58 ≤ 1.414 | glioblastoma |
| Rule15 | NRCAM ≤ 2.392 | glioblastoma |
| Rule16 | FAM110B ≤ 2.527 | glioblastoma |
| Rule17 | FAM110B ≤ 2.607 | glioblastoma |
| Rule18 | TCF12 ≤ 4.215 | glioblastoma |
| Rule19 | RIA2 ≤ 3.045 | glioblastoma |
| Rule20 | NRCAM ≤ 1.090 | glioblastoma |
| Rule21 | SMOC1 ≤ 1.959 | glioblastoma |
| Rule22 | NRCAM ≤ 0.548 | glioblastoma |
| Rule23 | NRCAM ≤ 0.548 | glioblastoma |
| Rule24 | Other conditions | anaplastic astrocytoma |
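The sketch below reads the rule table as an ordered decision list to make the rule format concrete. The first rule whose conditions all hold assigns the subtype, and Rule24 acts as the default. This first-match reading is an assumption made for illustration. Rule-based classifiers built with the Johnson reducer may combine matching rules differently, for example by voting. Each printed criterion may also be only part of a longer rule; Rules 5 to 7 and Rules 22 and 23 print identical criteria. The names `RULES`, `single`, and `classify` are hypothetical.

```python
import operator
from typing import Dict, List, Tuple

Condition = Tuple[str, str, float]        # (gene, comparison, threshold)
Rule = Tuple[Tuple[Condition, ...], str]  # (conditions that must all hold, subtype)

DA, GBM, AA = "diffuse astrocytoma", "glioblastoma", "anaplastic astrocytoma"
OPS = {">=": operator.ge, "<=": operator.le}


def single(gene: str, op: str, cut: float, subtype: str) -> Rule:
    """Wrap one printed criterion as a one-condition rule."""
    return ((gene, op, cut),), subtype


# Rules 1-23, encoding only the criterion printed in the table, in table order.
RULES: List[Rule] = [
    single("XIST", ">=", 2.725, DA), single("XIST", ">=", 3.588, DA),
    single("XIST", ">=", 3.132, DA), single("XIST", ">=", 2.601, DA),
    single("XIST", ">=", 2.395, DA), single("XIST", ">=", 2.395, DA),
    single("XIST", ">=", 2.395, DA), single("XIST", ">=", 3.021, DA),
    single("PCDHB7", ">=", 3.827, DA), single("RHOB", ">=", 6.545, DA),
    single("RPSAP58", "<=", 1.280, GBM), single("TCF12", "<=", 4.952, GBM),
    single("NRCAM", "<=", 0.999, GBM), single("RPSAP58", "<=", 1.414, GBM),
    single("NRCAM", "<=", 2.392, GBM), single("FAM110B", "<=", 2.527, GBM),
    single("FAM110B", "<=", 2.607, GBM), single("TCF12", "<=", 4.215, GBM),
    single("RIA2", "<=", 3.045, GBM), single("NRCAM", "<=", 1.090, GBM),
    single("SMOC1", "<=", 1.959, GBM), single("NRCAM", "<=", 0.548, GBM),
    single("NRCAM", "<=", 0.548, GBM),
]


def classify(expression: Dict[str, float], rules: List[Rule] = RULES) -> str:
    """Return the subtype of the first rule whose conditions all hold.

    A condition on a gene missing from `expression` counts as not holding.
    If no rule fires, fall back to Rule24 ("Other conditions")."""
    for conditions, subtype in rules:
        if all(gene in expression and OPS[op](expression[gene], cut)
               for gene, op, cut in conditions):
            return subtype
    return AA


print(classify({"XIST": 3.0}))                # diffuse astrocytoma
print(classify({"XIST": 1.0, "NRCAM": 0.5}))  # glioblastoma
print(classify({"XIST": 1.0, "NRCAM": 5.0}))  # anaplastic astrocytoma
```

Representing a rule as a tuple of conditions lets the same `classify` handle multi-condition rules once their full criteria are known. Because this sketch keeps only one criterion per rule, it is not expected to reproduce the study's classification results.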
Figure 2. Confusion matrix of 10-fold cross-validation using the 24 detected rules to classify the three glioma subtypes. The numbers were pooled from three runs of 10-fold cross-validation on the training data. Darker colors indicate higher proportions.
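Pooling repeated cross-validation into one confusion matrix, as the caption describes, can be sketched as follows. This sketch rests on four assumptions:

* Folds are plain shuffled splits. The caption does not say whether they were stratified.
* `fit_predict` is a hypothetical stand-in for whatever classifier is evaluated, here the rule-based one.
* The color scale is row-normalized.
* The toy majority-class classifier and data are made up.

```python
import random
from collections import Counter
from typing import Callable, Hashable, List, Sequence

FitPredict = Callable[[list, list, list], list]


def kfold_indices(n: int, k: int, rng: random.Random) -> List[List[int]]:
    """Shuffle 0..n-1 and deal the indices into k folds of near-equal size."""
    idx = list(range(n))
    rng.shuffle(idx)
    return [idx[f::k] for f in range(k)]


def pooled_confusion(X: Sequence, y: Sequence[Hashable], labels: Sequence[Hashable],
                     fit_predict: FitPredict, k: int = 10, repeats: int = 3,
                     seed: int = 0) -> List[List[int]]:
    """Run k-fold CV `repeats` times and add every test-fold prediction to one
    confusion matrix (rows = true class, columns = predicted class)."""
    pos = {label: i for i, label in enumerate(labels)}
    cm = [[0] * len(labels) for _ in labels]
    rng = random.Random(seed)
    for _ in range(repeats):
        for test in kfold_indices(len(y), k, rng):
            held_out = set(test)
            train = [i for i in range(len(y)) if i not in held_out]
            preds = fit_predict([X[i] for i in train], [y[i] for i in train],
                                [X[i] for i in test])
            for i, pred in zip(test, preds):
                cm[pos[y[i]]][pos[pred]] += 1
    return cm


def row_proportions(cm: List[List[int]]) -> List[List[float]]:
    """Convert each row (true class) to proportions for color scaling."""
    out = []
    for row in cm:
        total = sum(row)
        out.append([c / total if total else 0.0 for c in row])
    return out


def majority_class(train_X, train_y, test_X):
    """Toy stand-in for a real classifier: always predict the commonest label."""
    top = Counter(train_y).most_common(1)[0][0]
    return [top] * len(test_X)


# Made-up data: 30 samples of three classes with one dummy feature each.
y = ["AA"] * 20 + ["GBM"] * 6 + ["DA"] * 4
X = [[0.0] for _ in y]
cm = pooled_confusion(X, y, ["AA", "GBM", "DA"], majority_class)
print(cm)  # [[60, 0, 0], [18, 0, 0], [12, 0, 0]]
```

Each sample is predicted once per repeat, so the pooled counts sum to the number of samples times the number of repeats.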
Figure 3. Incremental feature selection (IFS) curves obtained with the IFS method and the support vector machine (SVM) classifier. The X-axis is the number of features used to build each classifier; the Y-axis is the corresponding Matthews correlation coefficient (MCC). (A) IFS curve for X-values from 10 to 23,686. The selected interval, 300 to 600 features, is marked by two vertical lines. (B) IFS curve for X-values from 300 to 600 for the SVM classifier. The highest MCC (0.889) was obtained with 539 features.
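The IFS curves are scored with the MCC. With three subtypes, a multiclass generalization is needed. A common choice is Gorodkin's R_K statistic, which is also what scikit-learn's `matthews_corrcoef` computes for multiclass input. The caption does not say which variant was used, so the following is a sketch under that assumption. The function name is hypothetical.

```python
import math
from typing import Sequence


def multiclass_mcc(cm: Sequence[Sequence[int]]) -> float:
    """Multiclass MCC (Gorodkin's R_K) from a confusion matrix whose rows are
    true classes and whose columns are predicted classes."""
    s = sum(sum(row) for row in cm)            # total number of samples
    c = sum(cm[k][k] for k in range(len(cm)))  # correctly classified samples
    t = [sum(row) for row in cm]               # true count per class
    p = [sum(col) for col in zip(*cm)]         # predicted count per class
    num = c * s - sum(pk * tk for pk, tk in zip(p, t))
    den = math.sqrt((s * s - sum(pk * pk for pk in p)) *
                    (s * s - sum(tk * tk for tk in t)))
    return num / den if den else 0.0


print(round(multiclass_mcc([[5, 1], [2, 4]]), 4))           # 0.5071
print(multiclass_mcc([[3, 0, 0], [0, 2, 0], [0, 0, 5]]))    # 1.0
```

For two classes this reduces to the familiar binary formula (TP·TN − FP·FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)). Perfect classification gives 1.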
Top nine genes ranked by the Monte Carlo feature selection (MCFS) method.
| Rank | Gene Symbol | Description | Relative importance (RI) |
|---|---|---|---|
| 1 | IGFBP2 | Insulin-Like Growth Factor Binding Protein 2 | 0.1375 |
| 2 | PRDX1 | Peroxiredoxin 1 | 0.1226 |
| 3 | NOV | Nephroblastoma Overexpressed | 0.1194 |
| 4 | NEFL | Neurofilament Light | 0.1100 |
| 5 | HOXA10 | Homeobox A10 | 0.1059 |
| 6 | GNG12 | G Protein Subunit Gamma 12 | 0.0942 |
| 7 | IGF2BP3 | Insulin Like Growth Factor 2 mRNA Binding Protein 3 | 0.0891 |
| 8 | SPRY4 | Sprouty RTK Signaling Antagonist 4 | 0.0865 |
| 9 | BCL11A | B Cell CLL/Lymphoma 11A | 0.0847 |
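For reference, the relative importance (RI) of a gene g in the original MCFS formulation is commonly written as follows. This is the general definition. The study's settings for s, t, u, and v are not given here.

$$
\mathrm{RI}_{g}=\sum_{\tau=1}^{s\,t}\left(\mathrm{wAcc}_{\tau}\right)^{u}\sum_{n_{g}(\tau)}\mathrm{IG}\!\left(n_{g}(\tau)\right)\left(\frac{\text{no. in }n_{g}(\tau)}{\text{no. in }\tau}\right)^{v},
\qquad
\mathrm{wAcc}=\frac{1}{c}\sum_{i=1}^{c}\frac{n_{ii}}{n_{i1}+n_{i2}+\cdots+n_{ic}}
$$

The symbols are defined as follows:

* s is the number of random feature subsets.
* t is the number of decision trees trained per subset, each on a random training/test split.
* wAcc_τ is the weighted accuracy of tree τ.
* n_g(τ) ranges over the nodes of tree τ that split on gene g.
* IG is the information gain of such a node.
* "no. in" counts the samples in that node and in the whole tree τ.
* c is the number of classes.
* n_ij is the number of class-i samples predicted as class j.
* u and v are user-set weighting exponents.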
Figure 4. Heatmap illustrating the expression levels of the top nine genes in the three glioma subtypes.
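The caption does not say how the heatmap values were prepared. A common approach is sketched below: order the cells by subtype, then z-score each gene across all cells so that genes on different scales share one color scale. This is an assumption, not the study's documented procedure. The function name and toy values are hypothetical.

```python
import statistics
from typing import Dict, List, Sequence, Tuple


def heatmap_matrix(
    cells: Sequence[Tuple[str, Dict[str, float]]],
    genes: Sequence[str],
    subtype_order: Sequence[str],
) -> Tuple[List[str], List[List[float]]]:
    """Group cells by subtype (stable sort) and z-score each gene across all
    cells. Returns each row's subtype label and a cells x genes matrix."""
    ordered = sorted(cells, key=lambda cell: subtype_order.index(cell[0]))
    columns = []
    for gene in genes:
        values = [expr[gene] for _, expr in ordered]
        mu = statistics.mean(values)
        sd = statistics.pstdev(values) or 1.0  # constant gene: avoid dividing by 0
        columns.append([(v - mu) / sd for v in values])
    rows = [list(r) for r in zip(*columns)]
    return [label for label, _ in ordered], rows


# Made-up toy values for two cells and one gene.
order = ["diffuse astrocytoma", "anaplastic astrocytoma", "glioblastoma"]
cells = [("glioblastoma", {"PRDX1": 1.0}), ("diffuse astrocytoma", {"PRDX1": 3.0})]
labels, rows = heatmap_matrix(cells, ["PRDX1"], order)
print(labels, rows)
```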