MOTIVATION: The rise of next-generation sequencing technologies provides an unprecedented opportunity to elucidate genetic mysteries, yet short read lengths make de novo genome assembly difficult. New protocols can now generate overlapping paired-end reads. By joining the 3' ends of each read pair, one can construct longer reads for assembly. However, joining two overlapping paired-end reads effectively remains a challenging task.
RESULT: In this article, we present an efficient tool called Connecting Overlapped Pair-End (COPE) reads, which connects overlapping paired-end reads using k-mer frequencies. We evaluated the tool on 30× simulated paired-end reads from Arabidopsis thaliana with a 1% base error rate. COPE connected over 99% of reads with 98.8% accuracy, which is 10% and 2% higher, respectively, than the recently published tool FLASH. When COPE is applied to real reads for genome assembly, the resulting contigs have fewer errors and a 14-fold improvement in N50 compared with contigs produced from unconnected reads.
AVAILABILITY AND IMPLEMENTATION: COPE is implemented in C++ and is freely available as open-source code at ftp://ftp.genomics.org.cn/pub/cope.
CONTACT: twlam@cs.hku.hk or luoruibang@genomics.org.cn
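The idea of joining an overlapping read pair at its 3' ends can be illustrated with a minimal sketch. This is not COPE's algorithm: the abstract states that COPE uses k-mer frequencies, which are not modeled here. The sketch only shows the basic overlap-and-join step, assuming read 2 is sequenced from the opposite strand and taking the longest overlap that stays under a mismatch threshold. The function names and the `min_overlap` and `max_mismatch_rate` parameters are hypothetical, chosen for illustration.

```python
# Illustrative sketch only; not the COPE implementation.
# All names and default parameter values are assumptions.

COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def connect_pair(read1, read2, min_overlap=10, max_mismatch_rate=0.1):
    """Join a read pair whose 3' ends overlap.

    read2 is assumed to come from the opposite strand, so it is
    reverse-complemented before being compared with the end of read1.
    Returns the connected read, or None if no acceptable overlap exists.
    """
    mate = reverse_complement(read2)
    longest = min(len(read1), len(mate))
    # Try the longest candidate overlap first, then shorter ones.
    for overlap in range(longest, min_overlap - 1, -1):
        tail = read1[-overlap:]   # 3' end of read 1
        head = mate[:overlap]     # start of the reoriented read 2
        mismatches = sum(a != b for a, b in zip(tail, head))
        if mismatches <= max_mismatch_rate * overlap:
            # Keep read 1 whole and append the non-overlapping rest of the mate.
            return read1 + mate[overlap:]
    return None
```

A purely overlap-based rule like this one becomes ambiguous when several overlap lengths pass the threshold, for example in repetitive sequence, which is the kind of case where additional evidence such as k-mer frequencies would be needed.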
Authors: David John Bertioli; Steven B Cannon; Lutz Froenicke; Guodong Huang; Andrew D Farmer; Ethalinda K S Cannon; Xin Liu; Dongying Gao; Josh Clevenger; Sudhansu Dash; Longhui Ren; Márcio C Moretzsohn; Kenta Shirasawa; Wei Huang; Bruna Vidigal; Brian Abernathy; Ye Chu; Chad E Niederhuth; Pooja Umale; Ana Cláudia G Araújo; Alexander Kozik; Kyung Do Kim; Mark D Burow; Rajeev K Varshney; Xingjun Wang; Xinyou Zhang; Noelle Barkley; Patrícia M Guimarães; Sachiko Isobe; Baozhu Guo; Boshou Liao; H Thomas Stalker; Robert J Schmitz; Brian E Scheffler; Soraya C M Leal-Bertioli; Xu Xun; Scott A Jackson; Richard Michelmore; Peggy Ozias-Akins Journal: Nat Genet Date: 2016-02-22 Impact factor: 38.330
Authors: Carole F S Koning-Boucoiran; G Danny Esselink; Mirjana Vukosavljev; Wendy P C van 't Westende; Virginia W Gitonga; Frans A Krens; Roeland E Voorrips; W Eric van de Weg; Dietmar Schulz; Thomas Debener; Chris Maliepaard; Paul Arens; Marinus J M Smulders Journal: Front Plant Sci Date: 2015-04-21 Impact factor: 5.753