Xiong Li (1, 2), Liyue Liu (3), Juan Zhou (3), Che Wang (3).
Abstract
Understanding the genetic mechanisms of complex diseases is a serious challenge. Existing methods often neglect the heterogeneity of complex diseases, resulting in a lack of power or low reproducibility. Addressing heterogeneity when detecting epistatic single nucleotide polymorphisms (SNPs) can enhance the power of association studies and improve the predictive performance of complex disease diagnosis. In this study, we propose a three-stage framework based on deep learning, comprising epistasis detection, clustering and prediction, that addresses both epistasis and heterogeneity in complex diseases. The epistasis detection stage applies a multi-objective optimization method to find several candidate sets of epistatic SNPs that contribute to different subtypes of complex diseases. A K-means clustering algorithm is then used to define subtypes within the case group. Finally, a deep learning model is trained on a graphics processing unit (GPU) for disease prediction. Experimental results on pure and heterogeneous datasets show that our method has potential practical value and can serve as an alternative to other methods. Therefore, when epistasis and heterogeneity are present at the same time, our method is especially suitable for the diagnosis of complex diseases.
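The clustering stage can be illustrated with a minimal sketch. It assumes, beyond what the abstract states, that each case is represented by its genotypes at the SNPs in the candidate epistatic sets found in the first stage, coded as minor-allele counts (0, 1 or 2). It uses plain Lloyd's K-means with a deterministic farthest-point initialization chosen here for reproducibility; the authors' feature encoding, initialization and choice of K may differ, and the genotype matrix in the usage example is hypothetical.

```python
from typing import List, Tuple

Vector = List[float]


def sq_dist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two genotype vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points: List[Vector], k: int, max_iter: int = 100) -> Tuple[List[int], List[Vector]]:
    """Lloyd's K-means with deterministic farthest-point initialization."""
    # Initialization: start from the first case, then repeatedly add the case
    # farthest from every centroid chosen so far.
    centroids = [list(points[0])]
    while len(centroids) < k:
        far = max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        centroids.append(list(far))

    labels: List[int] = []
    for _ in range(max_iter):
        # Assignment step: give each case the label of its nearest centroid.
        new_labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points]
        if new_labels == labels:
            break  # assignments unchanged, so the algorithm has converged
        labels = new_labels
        # Update step: move each centroid to the mean of the cases assigned to it.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


if __name__ == "__main__":
    # Hypothetical cases: columns 0-1 hold SNPs from candidate set A,
    # columns 2-3 hold SNPs from candidate set B (minor-allele counts).
    cases = [
        [2, 2, 0, 0], [2, 1, 0, 0], [1, 2, 0, 0],
        [0, 0, 2, 2], [0, 0, 1, 2], [0, 0, 2, 1],
    ]
    labels, _ = kmeans(cases, k=2)
    print(labels)  # [0, 0, 0, 1, 1, 1]
```

Each resulting label would then play the role of a putative disease subtype; in this toy input the two groups of cases separate cleanly by which candidate SNP set carries minor alleles.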
Year: 2018 PMID: 29670206 PMCID: PMC5906634 DOI: 10.1038/s41598-018-24588-5
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Figure 1. The three stages of DPEH.
Figure 2. The framework of the DLM.
Figure 3. Prediction accuracy on pure datasets: (a) all pure datasets generated with 2 epistatic SNPs; (b) all pure datasets generated with 3 epistatic SNPs.
Figure 4. Prediction accuracy under disease model H1: (a) all datasets generated with 2 epistatic SNPs; (b) all datasets generated with 3 epistatic SNPs.
Figure 5. Prediction accuracy under disease model H2: (a) all datasets generated with 2 epistatic SNPs; (b) all datasets generated with 3 epistatic SNPs.
Figure 6. Prediction accuracy under disease models H1 and H2: (a) all datasets composed of two disease models, each involving 2 epistatic SNPs; (b) all datasets composed of two disease models, each involving 3 epistatic SNPs.
Configurations of the experimental datasets.
| Data ID | Sample size | MAF (minor allele frequency) of epistatic SNPs | Heterogeneity proportion |
|---|---|---|---|
| Pure1 | 1000 | (0.2, 0.2) | 1.0 |
| Pure2 | 2000 | (0.2, 0.2) | 1.0 |
| Pure3 | 3000 | (0.2, 0.2) | 1.0 |
| Pure4 | 4000 | (0.2, 0.2) | 1.0 |
| Pure5 | 8000 | (0.2, 0.2) | 1.0 |
| Pure6 | 1000 | (0.2, 0.2, 0.2) | 1.0 |
| Pure7 | 2000 | (0.2, 0.2, 0.2) | 1.0 |
| Pure8 | 3000 | (0.2, 0.2, 0.2) | 1.0 |
| Pure9 | 4000 | (0.2, 0.2, 0.2) | 1.0 |
| Pure10 | 8000 | (0.2, 0.2, 0.2) | 1.0 |
| Hete1 | 1000 | (0.2, 0.2); (0.3, 0.3) | H1 = 50%, H2 = 50% |
| Hete2 | 2000 | (0.2, 0.2); (0.3, 0.3) | H1 = 50%, H2 = 50% |
| Hete3 | 3000 | (0.2, 0.2); (0.3, 0.3) | H1 = 50%, H2 = 50% |
| Hete4 | 4000 | (0.2, 0.2); (0.3, 0.3) | H1 = 50%, H2 = 50% |
| Hete5 | 8000 | (0.2, 0.2); (0.3, 0.3) | H1 = 50%, H2 = 50% |
| Hete6 | 1000 | (0.2, 0.2, 0.2); (0.3, 0.3, 0.3) | H1 = 50%, H2 = 50% |
| Hete7 | 2000 | (0.2, 0.2, 0.2); (0.3, 0.3, 0.3) | H1 = 50%, H2 = 50% |
| Hete8 | 3000 | (0.2, 0.2, 0.2); (0.3, 0.3, 0.3) | H1 = 50%, H2 = 50% |
| Hete9 | 4000 | (0.2, 0.2, 0.2); (0.3, 0.3, 0.3) | H1 = 50%, H2 = 50% |
| Hete10 | 8000 | (0.2, 0.2, 0.2); (0.3, 0.3, 0.3) | H1 = 50%, H2 = 50% |
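The table specifies each dataset by the MAFs of its epistatic SNPs. As a hedged illustration of what those values imply, the sketch below assumes genotypes are drawn under Hardy-Weinberg equilibrium, so a SNP with minor-allele frequency q has genotype frequencies (1 - q)^2, 2q(1 - q) and q^2 for 0, 1 and 2 minor alleles. The table does not say which simulator or penetrance models produced the data, so the helper names here are hypothetical and case/control status under H1 or H2 is deliberately not modeled.

```python
import random
from typing import Dict, List


def hwe_genotype_freqs(maf: float) -> Dict[int, float]:
    """Genotype frequencies under Hardy-Weinberg equilibrium.

    Keys are minor-allele counts (0, 1, 2); maf is the minor-allele frequency q.
    """
    q = maf
    p = 1.0 - q
    return {0: p * p, 1: 2 * p * q, 2: q * q}


def sample_genotypes(mafs: List[float], n: int, seed: int = 0) -> List[List[int]]:
    """Draw n individuals' genotypes at SNPs with the given MAFs.

    SNPs are sampled independently of one another and of disease status;
    assigning cases from an epistatic penetrance model is not done here.
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        row = []
        for maf in mafs:
            freqs = hwe_genotype_freqs(maf)
            row.append(rng.choices([0, 1, 2], weights=[freqs[0], freqs[1], freqs[2]])[0])
        rows.append(row)
    return rows


if __name__ == "__main__":
    for maf in (0.2, 0.3):  # the two MAF settings that appear in the table
        f = hwe_genotype_freqs(maf)
        print(maf, {g: round(v, 2) for g, v in f.items()})
    # 0.2 {0: 0.64, 1: 0.32, 2: 0.04}
    # 0.3 {0: 0.49, 1: 0.42, 2: 0.09}
```

Under this assumption, only about 4% of individuals are minor-allele homozygotes at a MAF of 0.2 versus about 9% at 0.3, which is one reason the two heterogeneous sub-models can differ in how easily their epistatic SNPs are detected.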