Imputing missing genotypic data of single-nucleotide polymorphisms using neural networks.

Abstract

With advances in high-throughput single-nucleotide polymorphism (SNP) genotyping, the amount of genotype data available for genetic studies is steadily increasing, and with it come new abilities to study multigene interactions as well as to develop higher dimensional genetic models that more closely represent the polygenic nature of common disease risk. The combined impact of even small amounts of missing data on a multi-SNP analysis may be considerable. In this study, we present a neural network method for imputing missing SNP genotype data. We compared its imputation accuracy with fastPHASE and an expectation-maximization algorithm implemented in HelixTree. In a simulation data set of 1000 SNPs and 1000 subjects, 1, 5 and 10% of genotypes were randomly masked. Four levels of linkage disequilibrium (LD) were examined (R² < 0.2, R² < 0.5, R² < 0.8 and no LD threshold) to evaluate the impact of LD on imputation accuracy. All three methods are capable of imputing most missing genotypes accurately (accuracy >86%). The neural network method accurately predicted 92.0-95.9% of the missing genotypes. In a real data set comparison with 419 subjects and 126 SNPs from chromosome 2, the neural network method achieved the highest imputation accuracies (>83.1%) at missing rates from 1 to 5%. Using 90 HapMap subjects with 1962 SNPs, fastPHASE had the highest accuracy (approximately 97%) while the other two methods had >95% accuracy. These results indicate that the neural network model is an accurate and convenient tool, requiring minimal parameter tuning for SNP data recovery, and provides a valuable alternative to the usual complete-case analysis.
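
The masking-and-scoring protocol described in the abstract can be illustrated with a minimal sketch. It is not the authors' code and does not reproduce the neural network, fastPHASE or HelixTree: a per-SNP most-common-genotype imputer stands in as a hypothetical baseline, genotypes are assumed to be coded 0/1/2, and the simulated SNPs are independent (no LD), so the printed accuracies are not comparable to the figures reported above. All function names and the `MISSING` sentinel are assumptions made for illustration.

```python
import random
from collections import Counter

MISSING = -1  # sentinel for a masked genotype (an assumption of this sketch)


def simulate_genotypes(n_subjects, n_snps, rng):
    """Draw independent genotypes coded 0/1/2 (minor-allele counts)."""
    freqs = [rng.uniform(0.05, 0.5) for _ in range(n_snps)]
    return [
        [(rng.random() < p) + (rng.random() < p) for p in freqs]
        for _ in range(n_subjects)
    ]


def mask_genotypes(genotypes, rate, rng):
    """Return a masked copy plus the (subject, snp) positions that were hidden."""
    n_subjects, n_snps = len(genotypes), len(genotypes[0])
    cells = [(i, j) for i in range(n_subjects) for j in range(n_snps)]
    hidden = rng.sample(cells, round(rate * len(cells)))
    masked = [row[:] for row in genotypes]
    for i, j in hidden:
        masked[i][j] = MISSING
    return masked, hidden


def impute_mode(masked):
    """Baseline imputer: fill each gap with that SNP's most common observed genotype."""
    filled = [row[:] for row in masked]
    for j in range(len(masked[0])):
        observed = [row[j] for row in masked if row[j] != MISSING]
        mode = Counter(observed).most_common(1)[0][0] if observed else 0
        for row in filled:
            if row[j] == MISSING:
                row[j] = mode
    return filled


def imputation_accuracy(truth, imputed, hidden):
    """Fraction of masked genotypes that were recovered exactly."""
    correct = sum(truth[i][j] == imputed[i][j] for i, j in hidden)
    return correct / len(hidden)


if __name__ == "__main__":
    rng = random.Random(0)
    truth = simulate_genotypes(200, 50, rng)
    for rate in (0.01, 0.05, 0.10):  # the three missing rates named in the abstract
        masked, hidden = mask_genotypes(truth, rate, rng)
        acc = imputation_accuracy(truth, impute_mode(masked), hidden)
        print(f"missing rate {rate:.0%}: accuracy {acc:.3f}")
```

Any imputation method can be substituted for `impute_mode` as long as it takes the masked matrix and returns a filled one; the accuracy is then computed only over the positions that were deliberately hidden.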

Year: 2008 PMID: 18197192 DOI: 10.1038/sj.ejhg.5201988

Source DB: PubMed Journal: Eur J Hum Genet ISSN: 1018-4813 Impact factor: 4.246

Cited

11 in total

1. Application of machine learning algorithms to predict coronary artery calcification with a sibship-based design.

Authors: Yan V Sun; Lawrence F Bielak; Patricia A Peyser; Stephen T Turner; Patrick F Sheedy; Eric Boerwinkle; Sharon L R Kardia
Journal: Genet Epidemiol Date: 2008-05 Impact factor: 2.135

2. Utilizing genotype imputation for the augmentation of sequence data.

Authors: Brooke L Fridley; Gregory Jenkins; Matthew E Deyo-Svendsen; Scott Hebbring; Robert Freimuth
Journal: PLoS One Date: 2010-06-08 Impact factor: 3.240

3. Candidate gene analysis using imputed genotypes: cell cycle single-nucleotide polymorphisms and ovarian cancer risk.

Authors: Ellen L Goode; Brooke L Fridley; Robert A Vierkant; Julie M Cunningham; Catherine M Phelan; Stephanie Anderson; David N Rider; Kristin L White; V Shane Pankratz; Honglin Song; Estrid Hogdall; Susanne K Kjaer; Alice S Whittemore; Richard DiCioccio; Susan J Ramus; Simon A Gayther; Joellen M Schildkraut; Paul P D Pharaoh; Thomas A Sellers
Journal: Cancer Epidemiol Biomarkers Prev Date: 2009-03-03 Impact factor: 4.254

4. Machine learning and complex biological data.

Authors: Chunming Xu; Scott A Jackson
Journal: Genome Biol Date: 2019-04-16 Impact factor: 13.583

5. A deep learning approach for staging embryonic tissue isolates with small data.

Authors: Adam Joseph Ronald Pond; Seongwon Hwang; Berta Verd; Benjamin Steventon
Journal: PLoS One Date: 2021-01-08 Impact factor: 3.240

6. Application of machine learning missing data imputation techniques in clinical decision making: taking the discharge assessment of patients with spontaneous supratentorial intracerebral hemorrhage as an example.

Authors: Huimin Wang; Jianxiang Tang; Mengyao Wu; Xiaoyu Wang; Tao Zhang
Journal: BMC Med Inform Decis Mak Date: 2022-01-13 Impact factor: 2.796

7. A deep learning framework for characterization of genotype data.

Authors: Kristiina Ausmees; Carl Nettelblad
Journal: G3 (Bethesda) Date: 2022-03-04 Impact factor: 3.154

8. Comparison of multiple imputation algorithms and verification using whole-genome sequencing in the CMUH genetic biobank.

Authors: Ting-Yuan Liu; Chih-Fan Lin; Hsing-Tsung Wu; Ya-Lun Wu; Yu-Chia Chen; Chi-Chou Liao; Yu-Pao Chou; Dysan Chao; Ya-Sian Chang; Hsing-Fang Lu; Jan-Gowth Chang; Kai-Cheng Hsu; Fuu-Jen Tsai
Journal: Biomedicine (Taipei) Date: 2021-12-01

9. Simpute: an efficient solution for dense genotypic data.

Authors: Yen-Jen Lin; Chun-Tien Chang; Chuan Yi Tang; Wen-Ping Hsieh
Journal: Biomed Res Int Date: 2013-02-03 Impact factor: 3.411

10. Imputation of missing genotypes: an empirical evaluation of IMPUTE.

Authors: Zhenming Zhao; Nadia Timofeev; Stephen W Hartley; David Hk Chui; Supan Fucharoen; Thomas T Perls; Martin H Steinberg; Clinton T Baldwin; Paola Sebastiani
Journal: BMC Genet Date: 2008-12-12 Impact factor: 2.797