
Accurate, scalable cohort variant calls using DeepVariant and GLnexus.

Taedong Yun, Helen Li, Pi-Chuan Chang, Michael F Lin, Andrew Carroll, Cory Y McLean.

Abstract

MOTIVATION: Population-scale sequenced cohorts are foundational resources for genetic analyses, but processing raw reads into analysis-ready cohort-level variants remains challenging.
RESULTS: We introduce an open-source cohort-calling method that uses the highly accurate caller DeepVariant and the scalable merging tool GLnexus. Using callset quality metrics based on variant recall and precision in benchmark samples and Mendelian consistency in father-mother-child trios, we optimized the method across a range of cohort sizes, sequencing methods, and sequencing depths. The resulting callsets show consistent quality improvements over those generated using existing best practices, at reduced cost. We further evaluate our pipeline on the deeply sequenced 1000 Genomes Project (1KGP) samples and show superior callset quality metrics and imputation reference panel performance compared to an independently generated GATK Best Practices pipeline.
AVAILABILITY AND IMPLEMENTATION: We publicly release the 1KGP individual-level variant calls and cohort callset (https://console.cloud.google.com/storage/browser/brain-genomics-public/research/cohort/1KGP) to foster additional development and evaluation of cohort merging methods as well as broad studies of genetic variation. Both DeepVariant (https://github.com/google/deepvariant) and GLnexus (https://github.com/dnanexus-rnd/GLnexus) are open-sourced, and the optimized GLnexus setup discovered in this study is also integrated into GLnexus public releases v1.2.2 and later.
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
© The Author(s) 2021. Published by Oxford University Press.
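
The abstract names Mendelian consistency in father-mother-child trios as one of its callset quality metrics. As a rough illustration of what such a check involves, the following is a minimal sketch and not the authors' implementation: it assumes diploid autosomal genotypes encoded as tuples of allele indices, skips sites with no-calls, and ignores sex chromosomes and genuine de novo mutations. The function names are hypothetical.

```python
from itertools import product


def is_mendelian_consistent(child, father, mother):
    """Return True if the child's diploid genotype can be formed by taking
    one allele from the father and one allele from the mother.

    Genotypes are tuples of allele indices, e.g. (0, 1) for a heterozygous call.
    """
    child_sorted = tuple(sorted(child))
    return any(
        tuple(sorted((f, m))) == child_sorted
        for f, m in product(father, mother)
    )


def mendelian_violation_rate(trio_genotypes):
    """Fraction of fully called trio sites that violate Mendelian inheritance.

    trio_genotypes is an iterable of (child, father, mother) genotype tuples.
    Any site where a genotype contains None (a no-call) is skipped.
    """
    checked = 0
    violations = 0
    for child, father, mother in trio_genotypes:
        if any(a is None for gt in (child, father, mother) for a in gt):
            continue  # skip sites with missing calls
        checked += 1
        if not is_mendelian_consistent(child, father, mother):
            violations += 1
    return violations / checked if checked else 0.0
```

A lower violation rate across trios suggests a more internally consistent callset, although this simple version treats every real de novo mutation as an error.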

Year:  2021        PMID: 33399819      PMCID: PMC8023681          DOI: 10.1093/bioinformatics/btaa1081

Source DB:  PubMed          Journal:  Bioinformatics        ISSN: 1367-4803            Impact factor:   6.937


  18 in total

1.  The emergence of supergenes from inversions in Atlantic salmon.

Authors:  Kristina Stenløkk; Marie Saitou; Live Rud-Johansen; Torfinn Nome; Michel Moser; Mariann Árnyasi; Matthew Kent; Nicola Jane Barson; Sigbjørn Lien
Journal:  Philos Trans R Soc Lond B Biol Sci       Date:  2022-06-13       Impact factor: 6.671

2.  Familial long-read sequencing increases yield of de novo mutations.

Authors:  Michelle D Noyes; William T Harvey; David Porubsky; Arvis Sulovari; Ruiyang Li; Nicholas R Rose; Peter A Audano; Katherine M Munson; Alexandra P Lewis; Kendra Hoekzema; Tuomo Mantere; Tina A Graves-Lindsay; Ashley D Sanders; Sara Goodwin; Melissa Kramer; Younes Mokrab; Michael C Zody; Alexander Hoischen; Jan O Korbel; W Richard McCombie; Evan E Eichler
Journal:  Am J Hum Genet       Date:  2022-03-14       Impact factor: 11.043

3.  A complete pedigree-based graph workflow for rare candidate variant analysis.

Authors:  Charles Markello; Charles Huang; Alex Rodriguez; Andrew Carroll; Pi-Chuan Chang; Jordan Eizenga; Thomas Markello; David Haussler; Benedict Paten
Journal:  Genome Res       Date:  2022-04-28       Impact factor: 9.438

4.  Genomic diversity of 39 samples of Pyropia species grown in Japan.

Authors:  Yukio Nagano; Kei Kimura; Genta Kobayashi; Yoshio Kawamura
Journal:  PLoS One       Date:  2021-06-09       Impact factor: 3.240

5.  Directed evolution for high functional production and stability of a challenging G protein-coupled receptor.

Authors:  Yann Waltenspühl; Jeliazko R Jeliazkov; Lutz Kummer; Andreas Plückthun
Journal:  Sci Rep       Date:  2021-04-21       Impact factor: 4.379

6.  A population-specific reference panel for improved genotype imputation in African Americans.

Authors:  Jared O'Connell; Taedong Yun; Meghan Moreno; Helen Li; Nadia Litterman; Alexey Kolesnikov; Elizabeth Noblin; Pi-Chuan Chang; Anjali Shastri; Elizabeth H Dorfman; Suyash Shringarpure; Adam Auton; Andrew Carroll; Cory Y McLean
Journal:  Commun Biol       Date:  2021-11-05

7.  Identification of Rare Loss-of-Function Genetic Variation Regulating Body Fat Distribution.

Authors:  Mine Koprulu; Yajie Zhao; Eleanor Wheeler; Liang Dong; Nuno Rocha; Chen Li; John D Griffin; Satish Patel; Marcel Van de Streek; Craig A Glastonbury; Isobel D Stewart; Felix R Day; Jian'an Luan; Nicholas Bowker; Laura B L Wittemans; Nicola D Kerrison; Lina Cai; Debora M E Lucarelli; Inês Barroso; Mark I McCarthy; Robert A Scott; Vladimir Saudek; Kerrin S Small; Nicholas J Wareham; Robert K Semple; John R B Perry; Stephen O'Rahilly; Luca A Lotta; Claudia Langenberg; David B Savage
Journal:  J Clin Endocrinol Metab       Date:  2022-03-24       Impact factor: 5.958

8.  DNA-free CRISPR-Cas9 gene editing of wild tetraploid tomato Solanum peruvianum using protoplast regeneration.

Authors:  Choun-Sea Lin; Chen-Tran Hsu; Yu-Hsuan Yuan; Po-Xing Zheng; Fu-Hui Wu; Qiao-Wei Cheng; Yu-Lin Wu; Ting-Li Wu; Steven Lin; Jin-Jun Yue; Ying-Huey Cheng; Shu-I Lin; Ming-Che Shih; Jen Sheen; Yao-Cheng Lin
Journal:  Plant Physiol       Date:  2022-03-28       Impact factor: 8.005

9.  Effective variant filtering and expected candidate variant yield in studies of rare human disease.

Authors:  Brent S Pedersen; Joe M Brown; Harriet Dashnow; Amelia D Wallace; Matt Velinder; Martin Tristani-Firouzi; Joshua D Schiffman; Tatiana Tvrdik; Rong Mao; D Hunter Best; Pinar Bayrak-Toydemir; Aaron R Quinlan
Journal:  NPJ Genom Med       Date:  2021-07-15       Impact factor: 8.617

10.  Whole genome sequencing of nearly isogenic WMI and WLI inbred rats identifies genes potentially involved in depression and stress reactivity.

Authors:  Tristan V de Jong; Panjun Kim; Victor Guryev; Megan K Mulligan; Robert W Williams; Eva E Redei; Hao Chen
Journal:  Sci Rep       Date:  2021-07-20       Impact factor: 4.996
