Haplotype and population structure inference using neural networks in whole-genome sequencing data.

Jonas Meisner, Anders Albrechtsen.

Abstract

Accurate inference of population structure is important in many studies of population genetics. Here we present HaploNet, a method for performing dimensionality reduction and clustering of genetic data. The method is based on local clustering of phased haplotypes using neural networks from whole-genome sequencing or dense genotype data. By using Gaussian mixtures in a variational autoencoder framework, we are able to learn a low-dimensional latent space in which we cluster haplotypes along the genome in a highly scalable manner. We show that we can use haplotype clusters in the latent space to infer global population structure using haplotype information by exploiting the generative properties of our framework. Based on fitted neural networks and their latent haplotype clusters, we can perform principal component analysis and estimate ancestry proportions based on a maximum likelihood framework. Using sequencing data from simulations and closely related human populations, we show that our approach is better at distinguishing closely related populations than standard admixture and principal component analysis software. We further show that HaploNet is fast and highly scalable by applying it to genotype array data of the UK Biobank.
© 2022 Meisner and Albrechtsen; Published by Cold Spring Harbor Laboratory Press.
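
As an illustration of the maximum likelihood step for ancestry proportions, the following is a minimal sketch and not the HaploNet implementation. It assumes that the fitted networks supply, for every haplotype and genomic window, a likelihood under each latent haplotype cluster, and that ancestry proportions Q and per-ancestry cluster frequencies F are estimated with an expectation-maximization (EM) scheme in the style of standard admixture models. The array `L`, the function `em_ancestry`, and the synthetic input are hypothetical names and data introduced only for this example.

```python
import numpy as np


def em_ancestry(L, K, n_iter=100, seed=0):
    """EM sketch for an admixture-style model on haplotype cluster likelihoods.

    L[i, w, c] : assumed likelihood of haplotype i in window w under latent
                 cluster c (would come from the fitted networks; synthetic here).
    K          : number of ancestral sources.
    Returns Q (N x K ancestry proportions), F (W x K x C cluster frequencies),
    and the log-likelihood evaluated at the start of each iteration.
    """
    rng = np.random.default_rng(seed)
    N, W, C = L.shape
    Q = rng.dirichlet(np.ones(K), size=N)        # (N, K), rows sum to 1
    F = rng.dirichlet(np.ones(C), size=(W, K))   # (W, K, C), last axis sums to 1
    loglik = []
    for _ in range(n_iter):
        # E-step: joint[i, w, k, c] = Q[i, k] * F[w, k, c] * L[i, w, c]
        joint = Q[:, None, :, None] * F[None, :, :, :] * L[:, :, None, :]
        total = joint.sum(axis=(2, 3))           # (N, W) marginal likelihoods
        loglik.append(np.log(total).sum())
        post = joint / total[:, :, None, None]   # posterior over (k, c)
        # M-step: average posterior ancestry over windows, renormalize F
        Q = post.sum(axis=3).mean(axis=1)        # (N, K)
        F = post.sum(axis=0)                     # (W, K, C)
        F /= F.sum(axis=2, keepdims=True)
    return Q, F, np.array(loglik)


# Synthetic, strictly positive likelihoods: 20 haplotypes, 10 windows, 4 clusters.
rng = np.random.default_rng(1)
L = rng.uniform(0.05, 1.0, size=(20, 10, 4))
Q, F, ll = em_ancestry(L, K=2)
```

Each row of `Q` is a point on the simplex of ancestry proportions for one haplotype, and `ll` should be non-decreasing across iterations, which is a convenient sanity check for any EM update of this kind.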

Year: 2022; PMID: 35794006; PMCID: PMC9435741; DOI: 10.1101/gr.276813.122

Source DB: PubMed; Journal: Genome Res; ISSN: 1088-9051; Impact factor: 9.438


