Samantha Zarate (1,2), Andrew Carroll (1), Medhat Mahmoud (3), Olga Krasheninina (3), Goo Jun (4), William J Salerno (3), Michael C Schatz (2), Eric Boerwinkle (3,4), Richard A Gibbs (3), Fritz J Sedlazeck (3)
1. DNAnexus, 1975 W El Camino Real #204, Mountain View, CA 94040, USA.
2. Department of Computer Science, Johns Hopkins University, 3400 N. Charles St., Baltimore, MD 21218, USA.
3. Human Genome Sequencing Center, Baylor College of Medicine, One Baylor Plaza, Houston, TX 77030, USA.
4. Human Genetics Center, University of Texas Health Science Center at Houston, 1200 Pressler Street, Houston, TX 77040, USA.
Abstract
BACKGROUND: Structural variants (SVs) are critical contributors to genetic diversity and genomic disease. To predict the phenotypic impact of SVs, there is a need for better estimates of both the occurrence and frequency of SVs, preferably from large, ethnically diverse cohorts. At this scale, the current standard approach relies on short paired-end reads, from which SVs remain challenging to detect, especially across hundreds to thousands of samples.
FINDINGS: We present Parliament2, a consensus SV framework that leverages multiple best-in-class methods to identify high-quality SVs from short-read DNA sequence data at scale. Parliament2 incorporates pre-installed SV callers that are optimized for efficient parallel execution to reduce overall runtime and cost. We demonstrate the accuracy of Parliament2 on data from the NovaSeq and HiSeq X platforms, using the Genome in a Bottle (GIAB) SV call set across all size classes. The reported quality score per SV is calibrated across different SV types and size classes. Parliament2 has the highest F1 score (74.27%) measured against the independent gold standard from GIAB. We illustrate the compute performance by processing all 1000 Genomes samples (2,691 samples) in <1 day on GRCh38. Parliament2 improves the runtime performance of individual methods and is open source (https://github.com/slzarate/parliament2). A Docker image and a WDL implementation are also available.
CONCLUSION: Parliament2 both provides a highly accurate single-sample SV call set from short-read DNA sequence data and enables cost-efficient application over cloud or cluster environments, processing thousands of samples.
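To make the idea of a consensus SV call set and the reported F1 metric concrete, the following is a minimal Python sketch. It is not Parliament2's actual merging or scoring logic, which the abstract does not describe. The names SVCall, same_event, consensus_calls and f1_score, the 1,000 bp breakpoint tolerance (max_dist), the two-caller support threshold (min_callers), and the caller names in the example data are all hypothetical choices made for illustration only.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class SVCall:
    caller: str  # hypothetical name of the tool that produced the call
    chrom: str
    start: int
    end: int
    svtype: str  # e.g. "DEL", "DUP", "INV"


def same_event(a: SVCall, b: SVCall, max_dist: int) -> bool:
    """Treat two calls as one event if chromosome and type agree and
    both breakpoints lie within max_dist bp of each other (assumed rule)."""
    return (
        a.chrom == b.chrom
        and a.svtype == b.svtype
        and abs(a.start - b.start) <= max_dist
        and abs(a.end - b.end) <= max_dist
    )


def consensus_calls(calls: List[SVCall], max_dist: int = 1000,
                    min_callers: int = 2) -> List[Dict]:
    """Greedily cluster calls, then keep clusters supported by at least
    min_callers distinct callers."""
    clusters: List[List[SVCall]] = []
    for call in sorted(calls, key=lambda c: (c.chrom, c.svtype, c.start, c.end)):
        for cluster in clusters:
            # Compare against the first call in the cluster (its seed).
            if same_event(cluster[0], call, max_dist):
                cluster.append(call)
                break
        else:
            clusters.append([call])

    merged = []
    for cluster in clusters:
        callers = sorted({c.caller for c in cluster})
        if len(callers) >= min_callers:
            merged.append({
                "chrom": cluster[0].chrom,
                "svtype": cluster[0].svtype,
                # Integer mean of the supporting breakpoints.
                "start": sum(c.start for c in cluster) // len(cluster),
                "end": sum(c.end for c in cluster) // len(cluster),
                "callers": callers,
            })
    return merged


def f1_score(tp: int, fp: int, fn: int) -> float:
    """Harmonic mean of precision and recall from true/false positive and
    false negative counts against a truth set."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    example = [
        SVCall("caller_a", "chr1", 10000, 15000, "DEL"),
        SVCall("caller_b", "chr1", 10040, 15010, "DEL"),
        SVCall("caller_c", "chr1", 50000, 52000, "DUP"),
    ]
    print(consensus_calls(example))
    print(f1_score(tp=3, fp=1, fn=1))
```

In this sketch, a call reported by a single tool is dropped, and two calls from the same tool do not count as independent support. Tightening max_dist or raising min_callers would trade recall for precision, which is the balance the F1 score summarizes.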