
Utilizing Deep Learning and Genome Wide Association Studies for Epistatic-Driven Preterm Birth Classification in African-American Women.

Paul Fergus, Casimiro Curbelo Montanez, Basma Abdulaimma, Paulo Lisboa, Carl Chalmers, Beth Pineles.   

Abstract

Genome-Wide Association Studies (GWAS) are used to identify statistically significant genetic variants in case-control studies. The main objective is to find single nucleotide polymorphisms (SNPs) that influence a particular phenotype (i.e., disease trait). GWAS typically use a p-value threshold of 5 × 10^-8 to identify highly ranked SNPs. While this approach has proven useful for detecting disease-susceptible SNPs, evidence has shown that many of these are, in fact, false positives. Consequently, there is some ambiguity about the most suitable threshold for claiming genome-wide significance. Many believe that using less stringent p-value thresholds will allow us to investigate the joint epistatic interactions between SNPs and provide better insights into phenotype expression. One example that uses this approach is multifactor dimensionality reduction (MDR), which identifies combinations of SNPs that interact to influence a particular outcome. However, computational complexity grows exponentially with the order of the combinations considered, making approaches like MDR difficult to implement. Even so, understanding epistatic interactions in complex diseases is a fundamental component of robust genotype-phenotype mapping. In this paper, we propose a novel framework that combines GWAS quality control and logistic regression with deep learning stacked autoencoders to abstract higher-order SNP interactions from large, complex genotyped data for case-control classification tasks in GWAS analysis. We focus on the challenging problem of classifying preterm births, which have a strong genetic component, with unexplained heritability reportedly between 20 and 40 percent. A GWAS data set obtained from dbGaP is utilised; it comprises predominantly urban, low-income African-American women who had normal and preterm deliveries.
Epistatic interactions from original SNP sequences were extracted through a deep learning stacked autoencoder model and used to fine-tune a classifier for discriminating between term and preterm birth observations. All models are evaluated using standard binary classifier performance metrics. The findings show that important information pertaining to SNPs and epistasis can be extracted from 4,666 raw SNPs selected using logistic regression (p-value = 5 × 10^-3) and used to fit a highly accurate classifier model. The following results (Sen = 0.9562, Spec = 0.8780, Gini = 0.9490, Logloss = 0.5901, AUC = 0.9745, and MSE = 0.2010) were obtained using 50 hidden nodes and (Sen = 0.9289, Spec = 0.9591, Gini = 0.9651, Logloss = 0.3080, AUC = 0.9825, and MSE = 0.0942) using 500 hidden nodes. The results were compared with a Support Vector Machine (SVM), a Random Forest (RF), and a Fisher's Linear Discriminant Analysis classifier, none of which improved on the deep learning approach.
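
The abstract reports six binary classifier metrics without defining them. The sketch below shows how such metrics are commonly computed, assuming the standard definitions: sensitivity and specificity at a 0.5 decision threshold, AUC as the fraction of correctly ranked case/control pairs, Gini = 2 × AUC − 1, and log loss and MSE computed on predicted probabilities. These definitions are assumptions rather than the authors' stated procedure, although the Gini relation agrees with the reported pairs (AUC = 0.9745 gives 0.9490; AUC = 0.9825 gives 0.9650, matching the reported 0.9651 to within rounding). The labels and probabilities are made-up toy values, and the function names are hypothetical.

```python
import math

def auc_score(y_true, y_prob):
    """AUC as the share of (case, control) pairs ranked correctly; ties count 0.5."""
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    wins = sum(1.0 if pp > pn else 0.5 if pp == pn else 0.0
               for pp in pos for pn in neg)
    return wins / (len(pos) * len(neg))

def evaluate(y_true, y_prob, threshold=0.5):
    """Return the six metrics named in the abstract (assumed standard definitions)."""
    tp = fp = tn = fn = 0
    for y, p in zip(y_true, y_prob):
        pred = 1 if p >= threshold else 0
        if y == 1 and pred == 1:
            tp += 1
        elif y == 1:
            fn += 1
        elif pred == 1:
            fp += 1
        else:
            tn += 1
    n = len(y_true)
    eps = 1e-15  # clip probabilities so log() never sees 0 or 1
    logloss = -sum(
        y * math.log(min(max(p, eps), 1 - eps))
        + (1 - y) * math.log(1 - min(max(p, eps), 1 - eps))
        for y, p in zip(y_true, y_prob)
    ) / n
    mse = sum((y - p) ** 2 for y, p in zip(y_true, y_prob)) / n
    auc = auc_score(y_true, y_prob)
    return {
        "sen": tp / (tp + fn),    # true positive rate
        "spec": tn / (tn + fp),   # true negative rate
        "auc": auc,
        "gini": 2 * auc - 1,      # assumed relation between Gini and AUC
        "logloss": logloss,
        "mse": mse,
    }

# Toy data only (1 = preterm, 0 = term); not taken from the study.
y_true = [1, 1, 1, 0, 0, 0]
y_prob = [0.9, 0.8, 0.4, 0.6, 0.2, 0.1]
metrics = evaluate(y_true, y_prob)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

Sensitivity and specificity depend on the chosen threshold, whereas AUC and Gini depend only on how the predictions are ranked, which is why the two groups of figures can move independently between the 50-node and 500-node models.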

Year: 2018; PMID: 30183645; DOI: 10.1109/TCBB.2018.2868667

Source DB: PubMed; Journal: IEEE/ACM Trans Comput Biol Bioinform; ISSN: 1545-5963; Impact factor: 3.710


