Tobias Marschall¹, Iman Hajirasouliha, Alexander Schönhuth.
¹ Centrum Wiskunde & Informatica (CWI), Life Sciences Group, Science Park 123, Amsterdam 1098 XG, The Netherlands, and Department of Computer Science, Brown University, Providence, Rhode Island 02906, USA.
Abstract
MOTIVATION: Accurately predicting and genotyping indels longer than 30 bp has remained a central challenge in next-generation sequencing (NGS) studies. While indels of up to 30 bp are reliably processed by standard read aligners and the Genome Analysis Toolkit (GATK), longer indels have still resisted proper treatment. Also, discovering and genotyping longer indels has become particularly relevant owing to the increasing attention in globally concerted projects.

RESULTS: We present MATE-CLEVER (Mendelian-inheritance-AtTEntive CLique-Enumerating Variant findER) as an approach that accurately discovers and genotypes indels longer than 30 bp from contemporary NGS reads with a special focus on family data. For enhanced quality of indel calls in family trios or quartets, MATE-CLEVER integrates statistics that reflect the laws of Mendelian inheritance. MATE-CLEVER's performance rates for indels longer than 30 bp are on a par with those of the GATK for indels shorter than 30 bp, achieving up to 90% precision overall, with >80% of calls correctly typed. In predicting de novo indels longer than 30 bp in family contexts, MATE-CLEVER even raises the standards of the GATK. MATE-CLEVER achieves precision and recall of ∼63% on indels of 30 bp and longer versus 55% in both categories for the GATK on indels of 10-29 bp. A special version of MATE-CLEVER has contributed to indel discovery, in particular for indels of 30-100 bp, the 'NGS twilight zone of indels', in the Genome of the Netherlands Project.

AVAILABILITY AND IMPLEMENTATION: http://clever-sv.googlecode.com/
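The Mendelian-inheritance constraint that the abstract refers to can be illustrated with a small sketch. This is not MATE-CLEVER's actual statistic, which the abstract does not describe; it only shows the underlying rule for a single biallelic indel in a trio. The genotype coding (0, 1 or 2 copies of the indel allele) and all function names are assumptions made for this illustration.

```python
from itertools import product

# Hypothetical coding: a genotype is the number of indel alleles carried
# (0 = homozygous reference, 1 = heterozygous, 2 = homozygous indel).
GAMETES = {0: {0}, 1: {0, 1}, 2: {1}}


def possible_child_genotypes(mother: int, father: int) -> set:
    """Return every child genotype obtainable from one allele per parent."""
    return {m + f for m, f in product(GAMETES[mother], GAMETES[father])}


def is_mendelian_consistent(mother: int, father: int, child: int) -> bool:
    """True if the child genotype can be explained by the parents' genotypes."""
    return child in possible_child_genotypes(mother, father)


def is_de_novo_candidate(mother: int, father: int, child: int) -> bool:
    """True if the child carries the indel although neither parent does."""
    return mother == 0 and father == 0 and child > 0
```

A trio call such as mother 0, father 0, child 1 violates the rule, so it is either a genotyping error or a de novo candidate; a real caller would weigh these alternatives probabilistically rather than with a hard check like this one.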