Thomas Hackl1, Rainer Hedrich2, Jörg Schultz2, Frank Förster2.
1. Department for Molecular Plant Physiology and Biophysics, University of Würzburg, Julius-von-Sachs-Platz 2, 97082 Würzburg, Germany and Department of Bioinformatics, University of Würzburg, Biocenter, Am Hubland, 97074 Würzburg, Germany.
2. Department for Molecular Plant Physiology and Biophysics, University of Würzburg, Julius-von-Sachs-Platz 2, 97082 Würzburg, Germany and Department of Bioinformatics, University of Würzburg, Biocenter, Am Hubland, 97074 Würzburg, Germany.
Abstract
MOTIVATION: Today, the base sequence of DNA is mostly determined through sequencing by synthesis, as provided by Illumina sequencers. Although highly accurate, the resulting reads are short, which makes their analysis challenging. Recently, a new technology, single molecule real-time (SMRT) sequencing, was developed that could address these challenges, as it generates reads of several thousand bases. However, its broad application has been hampered by a high error rate. Therefore, hybrid approaches that use high-quality short reads to correct erroneous SMRT long reads have been developed. Still, current implementations place great demands on hardware, work only in well-defined computing infrastructures and reject a substantial number of reads. This limits their usability considerably, especially in large sequencing projects.
RESULTS: Here we present proovread, a hybrid correction pipeline for SMRT reads that can be flexibly adapted to existing hardware and infrastructure, from a laptop to a high-performance computing cluster. On genomic and transcriptomic test cases covering Escherichia coli, Arabidopsis thaliana and human, proovread achieved accuracies of up to 99.9% and outperformed the existing hybrid correction programs. Furthermore, proovread-corrected sequences were longer and the throughput was higher. Thus, proovread combines the most accurate correction results with excellent adaptability to the available hardware. It will therefore increase the applicability and value of SMRT sequencing.
AVAILABILITY AND IMPLEMENTATION: proovread is available at the following URL: http://proovread.bioapps.biozentrum.uni-wuerzburg.de.