Creating artificial human genomes using generative neural networks.

Burak Yelmen, Aurélien Decelle, Linda Ongaro, Davide Marnetto, Corentin Tallec, Francesco Montinaro, Cyril Furtlehner, Luca Pagani, Flora Jay.

Abstract

Generative models have shown breakthroughs in a wide spectrum of domains due to recent advancements in machine learning algorithms and increased computational power. Despite these impressive achievements, the ability of generative models to create realistic synthetic data is still under-exploited in genetics and absent from population genetics. Yet a known limitation in the field is the reduced access to many genetic databases due to concerns about violations of individual privacy, although these databases would provide a rich resource for data mining and integration towards advancing genetic studies. In this study, we demonstrate that deep generative adversarial networks (GANs) and restricted Boltzmann machines (RBMs) can be trained to learn the complex distributions of real genomic datasets and generate novel high-quality artificial genomes (AGs) with little to no privacy loss. We show that our generated AGs replicate characteristics of the source dataset such as allele frequencies, linkage disequilibrium, pairwise haplotype distances and population structure. Moreover, they can also inherit complex features such as signals of selection. To illustrate the promising outcomes of our method, we show that imputation quality for low-frequency alleles can be improved by augmenting reference panels with AGs, and that the RBM latent space provides a relevant encoding of the data, hence allowing further exploration of the reference dataset and features for solving supervised tasks. Generative models and AGs have the potential to become valuable assets in genetic studies by providing a rich yet compact representation of existing genomes and high-quality, easy-access and anonymous alternatives for private databases.
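
The summary statistics named in the abstract (allele frequencies, linkage disequilibrium, pairwise haplotype distances) can be illustrated with a minimal sketch. This is not the authors' pipeline: it assumes haplotypes are encoded as binary matrices (rows are haplotypes, columns are sites), uses random toy data in place of real and generated genomes, and all function names are hypothetical.

```python
import numpy as np

def allele_frequencies(haps):
    """Frequency of the '1' allele at each site (column) across haplotypes (rows)."""
    return haps.mean(axis=0)

def ld_r2(haps):
    """Pairwise r^2 between sites; for 0/1 data this is the squared Pearson correlation.
    Entries involving a monomorphic site are undefined and are set to 0 here."""
    with np.errstate(invalid="ignore", divide="ignore"):
        r = np.corrcoef(haps, rowvar=False)
    return np.nan_to_num(r) ** 2

def pairwise_hamming(a, b):
    """Hamming distance between every row of a and every row of b."""
    return (a[:, None, :] != b[None, :, :]).sum(axis=2)

# Toy stand-ins (assumption): random matrices, not real or generated genomes.
rng = np.random.default_rng(0)
real = rng.integers(0, 2, size=(50, 20))
artificial = rng.integers(0, 2, size=(50, 20))

# Compare per-site allele frequencies between the two panels.
freq_corr = np.corrcoef(allele_frequencies(real), allele_frequencies(artificial))[0, 1]

# Compare LD structure via the mean absolute difference of the r^2 matrices.
ld_diff = np.abs(ld_r2(real) - ld_r2(artificial)).mean()

# Distance from each artificial haplotype to its closest real haplotype;
# a value of 0 would mean an exact copy of a real haplotype.
nearest_real = pairwise_hamming(artificial, real).min(axis=1)

print("allele-frequency correlation:", freq_corr)
print("mean |r^2 difference|:", ld_diff)
print("smallest artificial-to-real distance:", nearest_real.min())
```

With real data, the two matrices would be replaced by the source panel and the generated AGs; the nearest-neighbour distance is only a crude indicator of copying and is not a formal privacy measure.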

Year: 2021   PMID: 33539374   PMCID: PMC7861435   DOI: 10.1371/journal.pgen.1009303

Source DB: PubMed   Journal: PLoS Genet   ISSN: 1553-7390   Impact factor: 5.917


