Xiaoyong Pan1,2, Xiaohua Hu3, Yu Hang Zhang4, Kaiyan Feng5, Shao Peng Wang6, Lei Chen7, Tao Huang8, Yu Dong Cai9.
Abstract
Atrioventricular septal defect (AVSD) is a clinically significant subtype of congenital heart disease (CHD) that severely affects the health of newborns and is associated with Down syndrome (DS). Thus, exploring the differences in functional genes between DS samples with and without AVSD is a critical way to investigate the complex association between AVSD and DS. In this study, we present a computational method to distinguish DS patients with AVSD from those without AVSD using the newly proposed self-normalizing neural network (SNN). First, each patient was encoded by the copy numbers of probes on chromosome 21. The encoded features were ranked by the Monte Carlo feature selection (MCFS) method to obtain a ranked feature list. Based on this feature list, we used a two-stage incremental feature selection to construct two series of feature subsets and applied SNNs to build classifiers and identify optimal features. Results show that 2737 optimal features were obtained, and the optimal SNN classifier built on these features yielded a Matthews correlation coefficient (MCC) of 0.748. For comparison, random forest was also used to build classifiers and uncover optimal features. This method achieved an optimal MCC of 0.582 when the top 132 features were used. Finally, we analyzed several key features among the optimal SNN features and found literature support that further reveals their essential roles.
Keywords: Down syndrome; Monte Carlo feature selection; atrioventricular septal defect; random forest; self-normalizing neural network
Year: 2018 PMID: 29649131 PMCID: PMC5924550 DOI: 10.3390/genes9040208
Source DB: PubMed Journal: Genes (Basel) ISSN: 2073-4425 Impact factor: 4.096
Figure 1. Flowchart of our proposed pipeline. A total of 236 patients with both Down syndrome (DS) and atrioventricular septal defect (AVSD) (positive samples) and 290 patients with only DS (negative samples) were measured by the copy numbers of 52,842 probes on chromosome 21. All features were then evaluated by the Monte Carlo feature selection (MCFS) method, resulting in a feature list and several informative features. The feature list was used in the incremental feature selection method to construct an optimal self-normalizing neural network (SNN) classifier and extract optimal features. The informative features were fed into the Johnson Reducer algorithm to extract decision rules.
Figure 2. The procedure of the RIPPER algorithm.
Figure 3. Incremental feature selection (IFS) curves derived from the IFS method and the SNN algorithm. The X-axis is the number of features used to build the classifier for each feature subset. The Y-axis is the corresponding Matthews correlation coefficient (MCC) or area under the curve (AUC) value. (A) IFS curve with X-values from 10 to 5000. The feature interval selected for the SNN algorithm is [2001, 4999], which is marked with two vertical lines. (B) IFS curve with X-values from 2001 to 4999 for the SNN algorithm. When the first 2737 features in the feature list were considered, the optimal MCC value reached 0.748, which is marked by a red diamond.
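The two-stage IFS search described in this caption can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function names, the coarse step size, and the toy scoring function are all assumptions made for the example. In the study, the score for each subset would come from training and evaluating an SNN classifier on the top-k features; here a hypothetical `toy_score` stands in for that step.

```python
from typing import Callable, Iterable, List, Sequence, Tuple

Curve = List[Tuple[int, float]]


def ifs_curve(ranked: Sequence[str],
              score_fn: Callable[[Sequence[str]], float],
              sizes: Iterable[int]) -> Curve:
    """Score the top-k prefix of a ranked feature list for each k in sizes."""
    return [(k, score_fn(ranked[:k])) for k in sizes]


def two_stage_ifs(ranked: Sequence[str],
                  score_fn: Callable[[Sequence[str]], float],
                  coarse_sizes: Iterable[int],
                  fine_interval: Tuple[int, int]):
    """Stage 1 scans a coarse grid of subset sizes; stage 2 scans every
    size inside a chosen interval and returns the best-scoring size.

    The interval is passed in explicitly (assumption): this sketch does
    not try to guess how the interval was chosen from the coarse curve.
    """
    coarse = ifs_curve(ranked, score_fn, coarse_sizes)
    lo, hi = fine_interval
    fine = ifs_curve(ranked, score_fn, range(lo, hi + 1))
    best_k, best_score = max(fine, key=lambda point: point[1])
    return coarse, fine, best_k, best_score


# Toy demonstration with 50 hypothetical feature names and a made-up
# score that peaks at 27 features (a stand-in for a cross-validated MCC).
ranked_features = [f"feature_{i}" for i in range(1, 51)]


def toy_score(subset: Sequence[str]) -> float:
    return -abs(len(subset) - 27)


coarse, fine, best_k, best_score = two_stage_ifs(
    ranked_features, toy_score, range(10, 51, 10), (21, 49))
print(best_k, best_score)
```

With the values in the caption, the first stage would cover subset sizes from 10 to 5000 and the second stage every size in [2001, 4999]; the coarse step is not stated here, so it is left as a parameter.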
Optimal number of features and MCC values yielded from the optimal SNN and RF classifiers.
| Classification Algorithm | Number of Features | MCC | AUC |
|---|---|---|---|
| SNN | 2737 | 0.748 | 0.915 |
| Random forest | 132 | 0.582 | 0.834 |
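For readers unfamiliar with the MCC values reported in this table, the sketch below computes the Matthews correlation coefficient from the four confusion-matrix counts using its standard definition. The counts in the example are invented for illustration and are not taken from the study; returning 0 when the denominator is zero is a common convention, assumed here.

```python
import math


def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient from confusion-matrix counts.

    MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN))
    Returns 0.0 if any marginal total is zero (assumed convention).
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


# Made-up counts, purely illustrative.
print(round(mcc(tp=6, tn=3, fp=1, fn=2), 3))
```

MCC ranges from -1 (total disagreement) through 0 (no better than chance) to 1 (perfect prediction), which makes it a reasonable summary for a two-class problem with moderately unbalanced classes such as 236 positive versus 290 negative samples.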
Figure 4. The receiver operating characteristic (ROC) curves for the optimal SNN and random forest classifiers.
Figure 5. IFS curves derived from the IFS method and the RF algorithm. The X-axis is the number of features used to build the classifier for each feature subset. The Y-axis is the corresponding MCC or AUC value. When the first 132 features in the feature list were considered, the optimal MCC value reached 0.582, which is marked by a triangle.
Three decision rules extracted from the informative features.
| Classification | Rules | Features | Criteria |
|---|---|---|---|
| With AVSD | Rule 1 | A_16_P41408273 | ≤−0.00593 |
| With AVSD | Rule 2 | A_16_P03593084 | ≥−0.0164 |
| | | A_16_P03593084 | ≤0.075 |
| | | A_16_P41408273 | ≥0.0248 |
| Without AVSD | Rule 3 | Other conditions | |
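The three rules in this table can be read as a small ordered classifier, sketched below. Two things are assumptions rather than statements from the table: that the three criteria listed under Rule 2 must all hold together, and that the rules are tried in order with Rule 3 as the default, read here as the negative class (DS without AVSD). The function name and the dictionary input format are hypothetical.

```python
from typing import Mapping


def classify_by_rules(sample: Mapping[str, float]) -> str:
    """Apply the three extracted decision rules to one sample.

    `sample` maps probe names to their copy-number feature values.
    Assumes Rule 2's criteria are conjunctive and rules fire in order.
    """
    p41408273 = sample["A_16_P41408273"]
    p03593084 = sample["A_16_P03593084"]

    # Rule 1: A_16_P41408273 <= -0.00593
    if p41408273 <= -0.00593:
        return "With AVSD"

    # Rule 2: -0.0164 <= A_16_P03593084 <= 0.075 and A_16_P41408273 >= 0.0248
    if -0.0164 <= p03593084 <= 0.075 and p41408273 >= 0.0248:
        return "With AVSD"

    # Rule 3: all other conditions (default rule)
    return "Without AVSD"


# Made-up probe values, purely illustrative.
print(classify_by_rules({"A_16_P41408273": -0.01, "A_16_P03593084": 0.0}))
print(classify_by_rules({"A_16_P41408273": 0.0, "A_16_P03593084": 0.0}))
```

Only two probes appear in the rules, so a sample can be classified from two copy-number values, which is what makes rule sets like this easier to inspect than the 2737-feature SNN classifier.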
Optimal features analyzed in detail in Section 4.2.
| No. | Feature Name | Gene Name |
|---|---|---|
| 1 | A_16_P03593084 | |
| 2 | A_16_P03583086 | |
| 3 | A_16_P03587947 | |
| 4 | A_16_P21251330 | |
| 5 | A_16_P41466725 | |
| 6 | A_16_P41430034 | |