Jaeil Ahn, Brian Conkright, Simina M Boca, Subha Madhavan.
Abstract
Statistical approaches for population structure estimation have been predominantly driven by a particular data type, single-nucleotide polymorphisms (SNPs). However, in the presence of weak identifiability in SNPs, population structure estimation can suffer a loss of accuracy. Copy number variations (CNVs) are genomic structural variants with loci that are commonly shared within a specific population and thus provide valuable information for estimating the ancestry of sampled populations. We develop a Bayesian joint modeling framework of SNPs and CNVs, called POPSTR, to capture population structure better than approaches that use SNPs alone. To deal with the increased data volume, we use the Metropolis-adjusted Langevin algorithm (MALA), which uses gradient information to guide sampling from the target distribution in a computationally efficient way. We illustrate applications of our approach using the HapMap 2005 project data. We carry out simulation studies and show that the performance of our approach is comparable to or better than that of popular benchmarks, STRUCTURE and ADMIXTURE. We also observe that using only CNVs can be remarkably efficient if SNP data are not available.
Keywords: Bayesian modeling; copy number variations; population structure; single-nucleotide polymorphisms
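For readers unfamiliar with the sampler named in the abstract, the following is a minimal, generic sketch of a Metropolis-adjusted Langevin step. It is not the POPSTR implementation: the target here is a toy two-dimensional standard Gaussian, and the function name `mala`, its parameters, and the step size are all assumptions chosen for illustration.

```python
import numpy as np

def mala(log_density, grad_log_density, x0, step_size, n_samples, rng):
    """Generic MALA sampler (illustrative sketch, not the POPSTR code).

    step_size is the proposal variance (often written epsilon squared).
    """
    x = np.asarray(x0, dtype=float)
    samples = np.empty((n_samples, x.size))
    n_accepted = 0

    def log_q(x_to, x_from):
        # Log proposal density of moving x_from -> x_to, up to a constant.
        mean = x_from + 0.5 * step_size * grad_log_density(x_from)
        return -np.sum((x_to - mean) ** 2) / (2.0 * step_size)

    for i in range(n_samples):
        # Langevin proposal: gradient drift plus Gaussian noise.
        noise = rng.standard_normal(x.size)
        proposal = (x + 0.5 * step_size * grad_log_density(x)
                    + np.sqrt(step_size) * noise)
        # Metropolis-Hastings correction for the asymmetric proposal.
        log_alpha = (log_density(proposal) + log_q(x, proposal)
                     - log_density(x) - log_q(proposal, x))
        if np.log(rng.uniform()) < log_alpha:
            x = proposal
            n_accepted += 1
        samples[i] = x
    return samples, n_accepted / n_samples

# Toy target (an assumption for this sketch): standard normal in 2 dimensions.
def log_density(x):
    return -0.5 * np.sum(x ** 2)

def grad_log_density(x):
    return -x

rng = np.random.default_rng(0)
samples, accept_rate = mala(log_density, grad_log_density,
                            x0=np.zeros(2), step_size=0.5,
                            n_samples=5000, rng=rng)
```

The gradient drift moves proposals toward regions of higher density, which is why such samplers can remain efficient as the parameter dimension grows; in a real model the toy density and gradient would be replaced by the log posterior and its gradient.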
Year: 2018 PMID: 29293371 PMCID: PMC5915226 DOI: 10.1089/cmb.2017.0127
Source DB: PubMed Journal: J Comput Biol ISSN: 1066-5277 Impact factor: 1.479