Bohu Pan1, Luyao Ren2,3, Vitor Onuchic4, Meijian Guan5, Rebecca Kusko6, Steve Bruinsma4, Len Trigg7, Andreas Scherer8,9, Baitang Ning1, Chaoyang Zhang10, Christine Glidewell-Kenney4, Chunlin Xiao11, Eric Donaldson12, Fritz J Sedlazeck13, Gary Schroth4, Gokhan Yavas1, Haiying Grunenwald4, Haodong Chen14, Heather Meinholz4, Joe Meehan1, Jing Wang15, Jingcheng Yang2,3, Jonathan Foox16, Jun Shang2,3, Kelci Miclaus5, Lianhua Dong15, Leming Shi2,3, Marghoob Mohiyuddin17, Mehdi Pirooznia18, Ping Gong19, Rooz Golshani4, Russ Wolfinger5, Samir Lababidi20, Sayed Mohammad Ebrahim Sahraeian17, Steve Sherry11, Tao Han1, Tao Chen1, Tieliu Shi21, Wanwan Hou2,3, Weigong Ge1, Wen Zou1, Wenjing Guo1, Wenjun Bao5, Wenzhong Xiao22, Xiaohui Fan23, Yoichi Gondo24, Ying Yu2,3, Yongmei Zhao25, Zhenqiang Su26, Zhichao Liu1, Weida Tong1, Wenming Xiao27, Justin M Zook28, Yuanting Zheng29,30, Huixiao Hong31.

1. Division of Bioinformatics and Biostatistics, National Center for Toxicological Research, US Food and Drug Administration, Jefferson, AR, 72079, USA.
2. State Key Laboratory of Genetic Engineering, School of Life Sciences and Shanghai Cancer Center, Fudan University, Shanghai, 200438, China.
3. Human Phenome Institute, Fudan University, Shanghai, 200438, China.
4. Illumina Inc., San Diego, CA, 92122, USA.
5. SAS Institute Inc., Cary, NC, 27513, USA.
6. Immuneering Corporation, Cambridge, MA, 02142, USA.
7. Real Time Genomics, Hamilton, New Zealand.
8. Institute for Molecular Medicine Finland (FIMM), University of Helsinki, Helsinki, Finland.
9. EATRIS ERIC- European Infrastructure for Translational Medicine, Amsterdam, the Netherlands.
10. School of Computing Sciences and Computer Engineering, University of Southern Mississippi, Hattiesburg, MS, 39406, USA.
11. National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, 20894, USA.
12. Center for Drug Evaluation and Research, Food and Drug Administration, Silver Spring, MD, 20993, USA.
13. Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, 77030, USA.
14. Sentieon Inc., San Jose, CA, 95134, USA.
15. Center for Advanced Measurement Science, National Institute of Metrology, Beijing, 100013, China.
16. Department of Physiology and Biophysics, Weill Cornell Medicine, New York, NY, 10021, USA.
17. Roche Sequencing Solutions, Santa Clara, CA, 95050, USA.
18. Bioinformatics and Computational Biology Laboratory, National Heart Lung and Blood Institute, National Institutes of Health, Bethesda, MD, 20892, USA.
19. Environmental Laboratory, U.S. Army Engineer Research and Development Center, Vicksburg, MS, 39180, USA.
20. Office of Health Informatics, Office of the Commissioner, US Food and Drug Administration, Silver Spring, MD, 20993, USA.
21. The Center for Bioinformatics and Computational Biology, Shanghai Key Laboratory of Regulatory Biology, Institute of Biomedical Sciences and School of Life Sciences, East China Normal University, Shanghai, 200241, China.
22. Stanford Genome Technology Center, Stanford University School of Medicine, Palo Alto, CA, 94305, USA.
23. Pharmaceutical Informatics Institute, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310058, China.
24. Department of Molecular Life Sciences, Tokai University School of Medicine, 143 Shimokasuya, Isehara, 259-1193, Japan.
25. CCR-SF Bioinformatics Group, Advanced Biomedical and Computational Sciences, Biomedical Informatics and Data Science, Frederick National Laboratory for Cancer Research, Frederick, MD, 21701, USA.
26. Takeda Pharmaceuticals, Cambridge, MA, 02139, USA.
27. Division of Molecular Genetics and Pathology, Center for Device and Radiological Health, US Food and Drug Administration, Silver Spring, MD, 20993, USA.
28. Material Measurement Laboratory, National Institute of Standards and Technology, Gaithersburg, MD, 20899, USA. justin.zook@nist.gov.
29. State Key Laboratory of Genetic Engineering, School of Life Sciences and Shanghai Cancer Center, Fudan University, Shanghai, 200438, China. zhengyuanting@fudan.edu.cn.
30. Human Phenome Institute, Fudan University, Shanghai, 200438, China. zhengyuanting@fudan.edu.cn.
31. Division of Bioinformatics and Biostatistics, National Center for Toxicological Research, US Food and Drug Administration, Jefferson, AR, 72079, USA. huixiao.hong@fda.hhs.gov.
Abstract
BACKGROUND: Reproducible detection of inherited variants with whole genome sequencing (WGS) is vital for the implementation of precision medicine, and it is a complicated process in which each step affects variant call quality. Systematically assessing the reproducibility of inherited variants detected with WGS, and the impact of each step in the process, is needed to understand and improve the quality of inherited variant calls from WGS. RESULTS: To dissect the impact of factors involved in detecting inherited variants with WGS, we sequence triplicates of eight DNA samples representing two populations on three short-read sequencing platforms using three library kits in six labs, and call variants with 56 combinations of aligners and callers. We find that bioinformatics pipelines (callers and aligners) have a larger impact on variant reproducibility than WGS platform or library preparation. Single-nucleotide variants (SNVs), particularly outside difficult-to-map regions, are more reproducible than small insertions and deletions (indels), which are least reproducible when longer than 5 bp. Increasing sequencing coverage improves indel reproducibility but has limited impact on SNV reproducibility above 30×. CONCLUSIONS: Our findings highlight sources of variability in variant detection and the need to improve bioinformatics pipelines in the era of precision medicine with WGS.
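The abstract does not state how reproducibility across replicates is scored. As a hedged illustration only, the Python sketch below assumes one simple measure: for each variant class, the fraction of variants seen in any replicate call set that are called in every replicate. The 5 bp split mirrors the indel size cut mentioned above. The `Variant` tuple, `variant_class`, `reproducibility`, and the toy call sets are hypothetical names and data, not the study's method; real comparisons would also need normalized (left-aligned, decomposed) variant representations before sets are compared.

```python
from typing import NamedTuple


class Variant(NamedTuple):
    """Hypothetical minimal variant record (assumes normalized REF/ALT)."""
    chrom: str
    pos: int
    ref: str
    alt: str


def variant_class(v: Variant) -> str:
    """Classify a variant as SNV, MNV, short indel (<=5 bp) or long indel (>5 bp).

    Assumption: indel size is the length difference between REF and ALT.
    """
    if len(v.ref) == len(v.alt):
        return "SNV" if len(v.ref) == 1 else "MNV"
    size = abs(len(v.ref) - len(v.alt))
    return "indel>5bp" if size > 5 else "indel<=5bp"


def reproducibility(replicates: list[set[Variant]]) -> dict[str, float]:
    """Per-class fraction of the union of calls that appears in all replicates.

    This is an illustrative metric, not necessarily the one used in the study.
    """
    if not replicates:
        raise ValueError("need at least one replicate call set")
    union = set().union(*replicates)           # called in any replicate
    shared = set.intersection(*replicates)     # called in every replicate
    result = {}
    for cls in sorted({variant_class(v) for v in union}):
        in_class = [v for v in union if variant_class(v) == cls]
        reproduced = [v for v in in_class if v in shared]
        result[cls] = len(reproduced) / len(in_class)
    return result


if __name__ == "__main__":
    # Toy triplicate call sets (made-up data for illustration).
    s1 = Variant("chr1", 100, "A", "G")
    s2 = Variant("chr1", 200, "C", "T")
    ins1 = Variant("chr1", 300, "A", "AT")              # 1-bp insertion
    del2 = Variant("chr1", 350, "GTT", "G")             # 2-bp deletion
    del7 = Variant("chr1", 400, "ACGTACGT", "A")        # 7-bp deletion
    rep1 = {s1, s2, ins1, del2, del7}
    rep2 = {s1, s2, ins1, del7}
    rep3 = {s1, s2, ins1, del2}
    print(reproducibility([rep1, rep2, rep3]))
    # → {'SNV': 1.0, 'indel<=5bp': 0.5, 'indel>5bp': 0.0}
```

Scoring against the union rather than a truth set keeps the sketch focused on replicate agreement; a benchmark comparison against reference calls would be a separate, accuracy-oriented measure.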