
Quality control and integration of genotypes from two calling pipelines for whole genome sequence data in the Alzheimer's disease sequencing project.

Adam C Naj, Honghuang Lin, Badri N Vardarajan, Simon White, Daniel Lancour, Yiyi Ma, Michael Schmidt, Fangui Sun, Mariusz Butkiewicz, William S Bush, Brian W Kunkle, John Malamon, Najaf Amin, Seung Hoan Choi, Kara L Hamilton-Nelson, Sven J van der Lee, Namrata Gupta, Daniel C Koboldt, Mohamad Saad, Bowen Wang, Alejandro Q Nato, Harkirat K Sohi, Amanda Kuzma, Li-San Wang, L Adrienne Cupples, Cornelia van Duijn, Sudha Seshadri, Gerard D Schellenberg, Eric Boerwinkle, Joshua C Bis, Josée Dupuis, William J Salerno, Ellen M Wijsman, Eden R Martin, Anita L DeStefano.

Abstract

The Alzheimer's Disease Sequencing Project (ADSP) performed whole genome sequencing (WGS) of 584 subjects from 111 multiplex families at three sequencing centers. Genotype calling of single nucleotide variants (SNVs) and insertion-deletion variants (indels) was performed centrally using GATK-HaplotypeCaller and Atlas V2. The ADSP Quality Control (QC) Working Group applied QC protocols to project-level variant call format files (VCFs) from each pipeline, and developed and implemented a novel protocol, termed "consensus calling," to combine genotype calls from both pipelines into a single high-quality set. QC was applied to autosomal bi-allelic SNVs and indels, and included pipeline-recommended QC filters, variant-level QC, and sample-level QC. Low-quality variants or genotypes were excluded, and sample outliers were noted. Quality was assessed by examining Mendelian inconsistencies (MIs) among 67 parent-offspring pairs, and MIs were used to establish additional genotype-specific filters for GATK calls. After QC, 578 subjects remained. Pipeline-specific QC excluded ~12.0% of GATK and 14.5% of Atlas SNVs. Between pipelines, ~91% of SNV genotypes across all QCed variants were concordant; 4.23% and 4.56% of genotypes were exclusive to Atlas or GATK, respectively; the remaining ~0.01% of discordant genotypes were excluded. For indels, variant-level QC excluded ~36.8% of GATK and 35.3% of Atlas indels. Between pipelines, ~55.6% of indel genotypes were concordant; 10.3% and 28.3% were exclusive to Atlas or GATK, respectively; and ~0.29% of discordant genotypes were excluded. The final WGS consensus dataset contains 27,896,774 SNVs and 3,133,926 indels and is publicly available.
Copyright © 2018. Published by Elsevier Inc.
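
The genotype categories named in the abstract (concordant, exclusive to one pipeline, discordant) and the parent-offspring MI check can be illustrated with a minimal sketch. This is not the ADSP protocol itself, whose full rules are not given here. It assumes genotypes are coded as alternate-allele counts at a bi-allelic site, that a genotype called by only one pipeline is retained (which the abstract implies but does not state), and that the function names `consensus_genotype` and `is_mendelian_inconsistent` are hypothetical.

```python
from collections import Counter
from typing import Optional, Tuple

# Assumed encoding for a bi-allelic site: alternate-allele count
# (0 = hom-ref, 1 = het, 2 = hom-alt); None = missing or removed by QC.
Genotype = Optional[int]


def consensus_genotype(gatk: Genotype, atlas: Genotype) -> Tuple[Genotype, str]:
    """Merge one sample's genotype at one site from two pipelines.

    Returns the merged genotype and a label for the category it fell into.
    """
    if gatk is None and atlas is None:
        return None, "missing"
    if gatk is None:
        return atlas, "atlas_only"   # assumption: single-pipeline calls are kept
    if atlas is None:
        return gatk, "gatk_only"     # assumption: single-pipeline calls are kept
    if gatk == atlas:
        return gatk, "concordant"
    return None, "discordant"        # conflicting calls are excluded


def is_mendelian_inconsistent(parent: Genotype, child: Genotype) -> bool:
    """Flag a parent-offspring pair that cannot share an allele.

    With only one parent observed, the impossible combinations at a
    bi-allelic site are opposite homozygotes (0 versus 2).
    """
    if parent is None or child is None:
        return False  # a missing genotype cannot be tested
    return {parent, child} == {0, 2}


# Tally categories over a few made-up (GATK, Atlas) genotype pairs.
example_calls = [(0, 0), (1, None), (None, 2), (1, 2), (None, None)]
summary = Counter(consensus_genotype(g, a)[1] for g, a in example_calls)
```

Dividing each count in `summary` by the number of non-missing genotypes would give per-category proportions of the kind reported in the abstract.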

Keywords:  Atlas; Consensus calling; GATK; Mendelian inconsistencies; Quality control; Whole genome sequencing

Year:  2018        PMID: 29857119      PMCID: PMC6397097          DOI: 10.1016/j.ygeno.2018.05.004

Source DB:  PubMed          Journal:  Genomics        ISSN: 0888-7543            Impact factor:   5.736


  9 in total

1.  Identification of risk genes for Alzheimer's disease by gene embedding.

Authors:  Yashwanth Lagisetty; Thomas Bourquard; Ismael Al-Ramahi; Carl Grant Mangleburg; Samantha Mota; Shirin Soleimani; Joshua M Shulman; Juan Botas; Kwanghyuk Lee; Olivier Lichtarge
Journal:  Cell Genom       Date:  2022-07-26

2.  Rare genetic variation implicated in non-Hispanic white families with Alzheimer disease.

Authors:  Gary W Beecham; Badri Vardarajan; Elizabeth Blue; William Bush; James Jaworski; Sandra Barral; Anita DeStefano; Kara Hamilton-Nelson; Brian Kunkle; Eden R Martin; Adam Naj; Farid Rajabli; Christiane Reitz; Timothy Thornton; Cornelia van Duijn; Allison Goate; Sudha Seshadri; Lindsay A Farrer; Eric Boerwinkle; Gerard Schellenberg; Jonathan L Haines; Ellen Wijsman; Richard Mayeux; Margaret A Pericak-Vance
Journal:  Neurol Genet       Date:  2018-11-21

3.  Alzheimer Disease Pathology-Associated Polymorphism in a Complex Variable Number of Tandem Repeat Region Within the MUC6 Gene, Near the AP2A2 Gene.

Authors:  Yuriko Katsumata; David W Fardo; Adam D Bachstetter; Sergey C Artiushin; Wang-Xia Wang; Angela Wei; Lena J Brzezinski; Bela G Nelson; Qingwei Huang; Erin L Abner; Sonya Anderson; Indumati Patel; Benjamin C Shaw; Douglas A Price; Dana M Niedowicz; Donna W Wilcock; Gregory A Jicha; Janna H Neltner; Linda J Van Eldik; Steven Estus; Peter T Nelson
Journal:  J Neuropathol Exp Neurol       Date:  2020-01-01       Impact factor: 3.685

Review 4.  The MUC6/AP2A2 Locus and Its Relevance to Alzheimer's Disease: A Review.

Authors:  Peter T Nelson; David W Fardo; Yuriko Katsumata
Journal:  J Neuropathol Exp Neurol       Date:  2020-06-01       Impact factor: 3.685

5.  Systematic analysis of dark and camouflaged genes reveals disease-relevant genes hiding in plain sight.

Authors:  Mark T W Ebbert; Tanner D Jensen; Karen Jansen-West; Jonathon P Sens; Joseph S Reddy; Perry G Ridge; John S K Kauwe; Veronique Belzil; Luc Pregent; Minerva M Carrasquillo; Dirk Keene; Eric Larson; Paul Crane; Yan W Asmann; Nilufer Ertekin-Taner; Steven G Younkin; Owen A Ross; Rosa Rademakers; Leonard Petrucelli; John D Fryer
Journal:  Genome Biol       Date:  2019-05-20       Impact factor: 13.583

6.  Empirical design of a variant quality control pipeline for whole genome sequencing data using replicate discordance.

Authors:  Robert P Adelson; Alan E Renton; Wentian Li; Nir Barzilai; Gil Atzmon; Alison M Goate; Peter Davies; Yun Freudenberg-Hua
Journal:  Sci Rep       Date:  2019-11-06       Impact factor: 4.379

7.  APOE Gene Associated with Cholesterol-Related Traits in the Hispanic Population.

Authors:  Stephanie Lozano; Victoria Padilla; Manuel Lee Avila; Mario Gil; Gladys Maestre; Kesheng Wang; Chun Xu
Journal:  Genes (Basel)       Date:  2021-11-08       Impact factor: 4.096

8.  Hadoop and PySpark for reproducibility and scalability of genomic sequencing studies.

Authors:  Nicholas R Wheeler; Penelope Benchek; Brian W Kunkle; Kara L Hamilton-Nelson; Mike Warfe; Jeremy R Fondran; Jonathan L Haines; William S Bush
Journal:  Pac Symp Biocomput       Date:  2020

9.  Association of mitochondrial variants and haplogroups identified by whole exome sequencing with Alzheimer's disease.

Authors:  Xiaoling Zhang; John J Farrell; Tong Tong; Junming Hu; Congcong Zhu; Li-San Wang; Richard Mayeux; Jonathan L Haines; Margaret A Pericak-Vance; Gerard D Schellenberg; Kathryn L Lunetta; Lindsay A Farrer
Journal:  Alzheimers Dement       Date:  2021-06-20       Impact factor: 16.655

