Zhandong Li, Deling Wang, Wei Guo, Shiqi Zhang, Lei Chen, Yu-Hang Zhang, Lin Lu, XiaoYong Pan, Tao Huang, Yu-Dong Cai.
Abstract
Mammalian cortical interneurons (CINs) can be classified into more than two dozen cell types that possess diverse electrophysiological and molecular characteristics and participate in various essential biological processes in the human nervous system. However, the mechanism that generates diversity in CINs remains controversial. This study aimed to predict CIN diversity in the mouse embryo by using single-cell transcriptomics and machine learning methods. Single-cell transcriptome sequencing data from 2,669 cells were employed. The 2,669 cells were classified into three categories, caudal ganglionic eminence (CGE) cells, dorsal medial ganglionic eminence (dMGE) cells, and ventral medial ganglionic eminence (vMGE) cells, corresponding to the three regions of the mouse subpallium from which the cells were collected. The transcriptomic profiles were first analyzed by the minimum redundancy and maximum relevance method. The resulting feature list was then fed into incremental feature selection, incorporating two classification algorithms (random forest and repeated incremental pruning to produce error reduction), to extract key genes and construct powerful classifiers and classification rules. The optimal classifier achieved an MCC of 0.725 and category-specific prediction accuracies of 0.958, 0.760, and 0.737 for the CGE, dMGE, and vMGE cells, respectively. The related genes and rules may provide helpful information for deepening the understanding of CIN diversity.
Keywords: cortical interneuron diversity; embryo; ganglionic eminences; machine learning; rule learning
Year: 2022 PMID: 35911980 PMCID: PMC9337837 DOI: 10.3389/fnins.2022.841145
Source DB: PubMed Journal: Front Neurosci ISSN: 1662-453X Impact factor: 5.152
FIGURE 1 Overall workflow of the study.
FIGURE 2 The IFS curves of RIPPER and RF. RIPPER can reach the highest MCC of 0.577, whereas RF can reach the highest MCC of 0.725.
Performance of some key classifiers.
| Classification algorithm | Number of features | ACC | MCC |
| RIPPER | 1,650 | 0.718 | 0.577 |
| RF | 240 | 0.816 | 0.725 |
| RF | 120 | 0.807 | 0.711 |
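The ACC and MCC columns above summarize a three-class confusion matrix. The sketch below shows one way such values could be computed, assuming the generalized multi-class form of the Matthews correlation coefficient; the source does not state which definition was used. The confusion matrix is invented for illustration and is not data from the study, and the function names are hypothetical.

```python
from math import sqrt


def accuracy(confusion):
    """Overall accuracy: correctly classified samples divided by all samples."""
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[k][k] for k in range(len(confusion)))
    return correct / total


def per_class_accuracy(confusion):
    """Fraction of each true class (row) that was predicted correctly."""
    return [row[k] / sum(row) for k, row in enumerate(confusion)]


def multiclass_mcc(confusion):
    """Generalized multi-class MCC (assumed definition, see lead-in).

    confusion[i][j] is the number of samples whose true class is i
    and whose predicted class is j.
    """
    n = len(confusion)
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[k][k] for k in range(n))
    true_counts = [sum(confusion[k]) for k in range(n)]
    pred_counts = [sum(confusion[i][k] for i in range(n)) for k in range(n)]
    numerator = correct * total - sum(
        t * p for t, p in zip(true_counts, pred_counts)
    )
    denominator = sqrt(
        (total ** 2 - sum(p * p for p in pred_counts))
        * (total ** 2 - sum(t * t for t in true_counts))
    )
    # By convention, return 0 when the denominator vanishes.
    return numerator / denominator if denominator else 0.0


# Invented toy matrix; rows are true classes, columns are predicted classes.
toy_confusion = [
    [9, 1, 0],
    [1, 7, 2],
    [0, 2, 8],
]

toy_acc = accuracy(toy_confusion)
toy_mcc = multiclass_mcc(toy_confusion)
toy_class_acc = per_class_accuracy(toy_confusion)
```

Because MCC accounts for how predictions are distributed across all classes, it can be noticeably lower than ACC for the same classifier, as in the table above.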
FIGURE 3 Individual accuracies of some key classifiers. The two RF classifiers perform almost equally and are superior to the RIPPER classifier.
FIGURE 4 The IFS curve of RF with 10 to 500 features. The RF classifier with the top 120 features yielded a slightly lower MCC than the classifier with the top 240 features, which is the optimal RF classifier.
FIGURE 5 Box plot showing the performance of the RF classifier with the top 120 features under 10-fold cross-validation repeated 10 times. ACC and MCC vary within a small range, indicating the stability of the classifier.
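The IFS curves in the figures plot classifier performance against the number of top-ranked features. A minimal sketch of that loop follows, assuming a fixed step size and a caller-supplied scoring function. In the study the score would presumably be a cross-validated MCC from RF or RIPPER; here a toy scorer stands in for it, and all names (`incremental_feature_selection`, `toy_score`, the gene labels) are hypothetical.

```python
def incremental_feature_selection(ranked_features, score_subset, step=10,
                                  max_features=None):
    """Score the top-k features for k = step, 2*step, ... and keep the best k.

    ranked_features: feature names ordered from most to least important
                     (for example, an mRMR ranking).
    score_subset:    callable taking a list of features and returning a score
                     such as a cross-validated MCC.
    Returns (best_k, best_score, curve), where curve is a list of (k, score).
    """
    limit = len(ranked_features)
    if max_features is not None:
        limit = min(limit, max_features)

    curve = []
    for k in range(step, limit + 1, step):
        subset = ranked_features[:k]
        curve.append((k, score_subset(subset)))

    # max() keeps the first maximum, so ties go to the smaller feature set.
    best_k, best_score = max(curve, key=lambda point: point[1])
    return best_k, best_score, curve


# Toy stand-in for a real classifier evaluation: informative features raise
# the score, and every uninformative feature costs a small penalty.
INFORMATIVE = {"g1", "g2", "g3", "g4"}


def toy_score(subset):
    hits = sum(1 for f in subset if f in INFORMATIVE)
    noise = len(subset) - hits
    return hits / len(INFORMATIVE) - 0.01 * noise


ranking = ["g1", "g2", "g3", "g4", "n1", "n2", "n3", "n4"]
best_k, best_score, curve = incremental_feature_selection(
    ranking, toy_score, step=2
)
```

Returning the full curve as well as the best point makes it possible to pick a smaller feature set whose score is only slightly below the maximum, as was done with the 120-feature RF classifier.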
The 22 classification rules generated by RIPPER for predicting cell types.
| Index | Rule | Label |
| 1 | (NKX2-1 ≤ 3.6524) and (MT-TM ≥ 1.4699) and (MEIS2 ≥ 0.7558) and (H3F3B ≥ 2019.223223) | Caudal ganglionic eminence |
| 2 | (FOXP2 ≥ 0.0047) and (NKX2-1 ≤ 5.5415) and (TMSB10 ≤ 76.7268) and (NR2F1 ≥ 25.2511) | Caudal ganglionic eminence |
| 3 | (NKX2-1 ≤ 12.9594) and (RPS20 ≤ 17.1104) and (SLC7A11 ≥ 0.0538) and (PID1 ≥ 0.4191) and (LHX8 ≤ 0.0839) | Caudal ganglionic eminence |
| 4 | (NR2F2 ≥ 0.2374) and (LHX6 ≤ 0.3856) and (5730494M16RIK ≥ 14.4383) and (LHX8 ≤ 7.0592) | Caudal ganglionic eminence |
| 5 | (FOXP2 ≥ 0.1294) and (EPHA5 ≥ 25.8453) and (RPS9 ≤ 159.5439) and (DCX ≥ 49.7284) | Caudal ganglionic eminence |
| 6 | (NR2F2 ≥ 2.2717) and (EPHA5 ≥ 3.4019) and (LHX8 ≤ 0) and (MEIS2 ≥ 18.9392) and (NCAPH ≤ 2.4094) | Caudal ganglionic eminence |
| 7 | (ENC1 ≤ 1.8862) and (NKX2-1 ≤ 12.0905) and (STX7 ≥ 0.3228) and (CALM1 ≤ 546.2493) and (SOX6 ≤ 2.8190) | Caudal ganglionic eminence |
| 8 | (GM6180 ≥ 36.7994) and (FOXP2 ≥ 3.5745) and (GM13340 ≥ 34.469118) | Caudal ganglionic eminence |
| 9 | (EPHA5 ≥ 0.1196) and (LHX8 ≤ 0.1496) and (GM15266 ≤ 19.2334) and (GM1821 ≤ 89.839694) and (CALM1 ≤ 744.1502) | Caudal ganglionic eminence |
| 10 | (BIRC6 ≤ 0) and (NKX2-1 ≤ 10.7393) and (CRIP2 ≥ 400.8077) and (ZFP238 ≥ 6.1810) and (GM10039 ≥ 12.3788) | Caudal ganglionic eminence |
| 11 | (ARRDC3 ≥ 368.6154) and (PTPRS ≤ 24.5126) and (ZFP238 ≤ 70.9705) | Caudal ganglionic eminence |
| 12 | (ZFP503 ≥ 0.1111) and (MAP4K4 ≤ 7.0016) | Caudal ganglionic eminence |
| 13 | (LHX8 ≥ 1.4487) and (ZIC1 ≥ 0.0111) | Ventral medial ganglionic eminence |
| 14 | (LHX8 ≥ 0.3736) and (NR2F1 ≤ 9.4921) and (ZSWIM5 ≤ 0.0334) | Ventral medial ganglionic eminence |
| 15 | (NR2F1 ≤ 15.6811) and (CD24A ≥ 2.7069) and (BASP1 ≤ 66.0865) and (PKIA ≤ 1.8492) | Ventral medial ganglionic eminence |
| 16 | (PAK3 ≤ 0.7899) and (RPS18 ≥ 4.9001) and (PID1 ≥ 1.2578) | Ventral medial ganglionic eminence |
| 17 | (YWHAZ ≥ 1.8915) and (SEPT11 ≤ 7.8189) and (GM13604 ≤ 0.2081) and (H2AFV ≥ 1305.9678) | Ventral medial ganglionic eminence |
| 18 | (PAK3 ≤ 0.0474) and (CFL1 ≥ 31.1042) and (GTF2A1 ≤ 20.8911) and (LHX6 ≤ 5.6025) | Ventral medial ganglionic eminence |
| 19 | (GM15266 ≤ 63.0475) and (MT-RNR2 ≥ 9387.3828) | Ventral medial ganglionic eminence |
| 20 | (CITED2 ≤ 35.2427) and (GM10718 ≤ 1.0898) and (2610017I09RIK ≤ 2442.7505) and (GSK3B ≤ 1.5164) | Ventral medial ganglionic eminence |
| 21 | (UBE2QL1 ≥ 1.5601) and (3110003A17RIK ≥ 20.2789) and (RTN1 ≤ 4.7837) | Ventral medial ganglionic eminence |
| 22 | (GM3511 ≥ 22.9860) and (PTMA ≥ 2.2359) and (LHX6 ≤ 47.9178) | Ventral medial ganglionic eminence |
| 23 | Others | Dorsal medial ganglionic eminence |
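The table can be read as an ordered decision list: a cell receives the label of the first rule whose conditions all hold, and cells matching no rule fall through to the default label. A minimal sketch of that reading follows. It assumes first-match-wins evaluation, which is typical of RIPPER rule sets but not confirmed by the source, and it treats genes missing from a profile as having expression 0, which is an assumption made only for this illustration. Only rules 12 and 13 are encoded; the expression profiles are invented.

```python
import operator

# Comparison operators used in the rule table.
OPS = {"<=": operator.le, ">=": operator.ge}

# Each rule: (list of (gene, operator, threshold) conditions, label).
# Rules 12 and 13 are copied from the table; the others are omitted for brevity.
RULES = [
    ([("ZFP503", ">=", 0.1111), ("MAP4K4", "<=", 7.0016)],
     "Caudal ganglionic eminence"),
    ([("LHX8", ">=", 1.4487), ("ZIC1", ">=", 0.0111)],
     "Ventral medial ganglionic eminence"),
]

# Rule 23 ("Others") acts as the default label.
DEFAULT_LABEL = "Dorsal medial ganglionic eminence"


def classify(expression, rules=RULES, default=DEFAULT_LABEL):
    """Return the label of the first rule whose conditions all hold.

    expression: dict mapping gene symbol to expression value.
    Genes absent from the dict are treated as 0.0 (illustrative assumption).
    """
    for conditions, label in rules:
        if all(
            OPS[op](expression.get(gene, 0.0), threshold)
            for gene, op, threshold in conditions
        ):
            return label
    return default


# Invented expression profiles, not data from the study.
cell_a = {"ZFP503": 0.5, "MAP4K4": 3.0}
cell_b = {"ZFP503": 0.0, "MAP4K4": 3.0, "LHX8": 2.0, "ZIC1": 0.5}
cell_c = {"LHX8": 0.0}

labels = [classify(cell) for cell in (cell_a, cell_b, cell_c)]
```

Under this reading, rule order matters: a cell that satisfies both a CGE rule and a vMGE rule is labeled by whichever rule appears first in the list.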
Details of essential genes.
| Gene symbol | Description | Rank in the feature list |
| Lhx8 | LIM homeobox 8 | 2 |
| Calm1 | Calmodulin 1 | 4 |
| Hmgb1 | High mobility group box 1 | 9 |
| Meis2 | Meis homeobox 2 | 12 |
| Nr2f2 | Nuclear receptor subfamily 2 group F member 2 | 22 |
| Basp1 | Brain abundant membrane attached signal protein 1 | 7 |
| Actb | Actin beta | 23 |
| Zic1 | Zic family member 1 | 30 |
| Nr2f1 | Nuclear receptor subfamily 2 group F member 1 | 15 |
| Rps29 | Ribosomal protein S29 | 18 |