Efficient genomewide selection of PCA-correlated tSNPs for genotype imputation.

Asif Javed, Petros Drineas, Michael W Mahoney, Peristera Paschou.

Abstract

The linkage disequilibrium structure of the human genome allows identification of small sets of tag single nucleotide polymorphisms (SNPs), or tSNPs, that efficiently represent dense sets of markers. This structure can be translated into linear algebraic terms, as evidenced by the well-documented principal components analysis (PCA)-based methods. Here we apply, for the first time, PCA-based methodology for efficient genome-wide tSNP selection and explore the linear algebraic structure of the human genome. Our algorithm divides the genome into contiguous nonoverlapping windows of high linear structure. Coupling this novel window definition with a PCA-based tSNP selection method, we analyze 2.5 million SNPs from the HapMap phase 2 dataset. We show that 10-25% of these SNPs suffice to predict the remaining genotypes with over 95% accuracy. A comparison with other popular methods in the ENCODE regions indicates significant genotyping savings. We evaluate the portability of genome-wide tSNPs across a diverse set of populations (HapMap phase 3 dataset). Interestingly, African populations are good reference populations for the rest of the world. Finally, we demonstrate the applicability of our approach in a real genome-wide disease association study. The chosen tSNP panels can be used for genotype imputation with either a simple regression-based algorithm or more sophisticated genotype imputation methods.
© 2011 The Authors Annals of Human Genetics © 2011 Blackwell Publishing Ltd/University College London.
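
The abstract describes two steps that a short example may make more concrete: choosing tSNPs that correlate with the top principal components of a genotype window, and predicting the remaining SNPs from them by regression. The following is a minimal Python sketch under stated assumptions, not the authors' implementation. It assumes genotypes are coded 0/1/2 in an individuals-by-SNPs matrix, uses leverage scores on the top-k right singular vectors as a simplified stand-in for the PCA-based selection, and uses ordinary least squares for the regression step. The function names (select_tsnps, fit_imputer, impute), the parameters k and n_tsnps, and the toy data are all hypothetical.

```python
import numpy as np


def select_tsnps(G, k, n_tsnps):
    """Return sorted column indices of the n_tsnps SNPs with the largest
    leverage scores on the top-k principal directions (assumed criterion)."""
    Gc = G - G.mean(axis=0)                      # center each SNP column
    _, _, Vt = np.linalg.svd(Gc, full_matrices=False)
    leverage = np.sum(Vt[:k] ** 2, axis=0)       # one score per SNP
    top = np.argsort(leverage)[::-1][:n_tsnps]   # highest scores first
    return np.sort(top)


def fit_imputer(G_ref, tsnp_idx):
    """Least-squares map from tSNP genotypes (plus intercept) to all SNPs,
    fitted on a reference panel."""
    X = np.column_stack([np.ones(G_ref.shape[0]), G_ref[:, tsnp_idx]])
    coef, *_ = np.linalg.lstsq(X, G_ref, rcond=None)
    return coef


def impute(G_tsnps, coef):
    """Predict all SNPs from typed tSNPs, rounded to genotype calls 0/1/2."""
    X = np.column_stack([np.ones(G_tsnps.shape[0]), G_tsnps])
    return np.clip(np.rint(X @ coef), 0, 2).astype(int)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic toy window: 8 SNPs driven by 2 hidden factors plus rare noise.
    latent = rng.integers(0, 2, size=(200, 2))
    weights = rng.integers(0, 2, size=(2, 8))
    noise = rng.random((200, 8)) < 0.02
    G = np.clip(latent @ weights + noise, 0, 2)

    ref, target = G[:150], G[150:]               # reference vs. held-out panel
    tsnps = select_tsnps(ref, k=2, n_tsnps=3)
    coef = fit_imputer(ref, tsnps)
    pred = impute(target[:, tsnps], coef)
    print("tSNPs:", tsnps, "accuracy:", (pred == target).mean())
```

Plain top-leverage selection can pick several redundant SNPs from the same correlated group, so a practical selector would likely need some handling of redundancy. The sketch also treats a single fixed window and leaves out the window-definition step mentioned in the abstract.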

Year: 2011    PMID: 21902678    DOI: 10.1111/j.1469-1809.2011.00673.x

Source DB: PubMed    Journal: Ann Hum Genet    ISSN: 0003-4800    Impact factor: 1.670


  1 in total

1.  rCUR: an R package for CUR matrix decomposition.

Authors:  András Bodor; István Csabai; Michael W Mahoney; Norbert Solymosi
Journal:  BMC Bioinformatics       Date:  2012-05-17       Impact factor: 3.169