Alkes L Price (1), Neil C Jones, Pavel A Pevzner. (1) Department of Computer Science and Engineering, University of California San Diego, La Jolla, CA 92093-0114, USA.
Abstract
MOTIVATION: De novo repeat family identification is a challenging algorithmic problem of great practical importance. As the number of genome sequencing projects increases, there is a pressing need to identify the repeat families present in large, newly sequenced genomes. We develop a new method for de novo identification of repeat families via extension of consensus seeds; our method enables a rigorous definition of repeat boundaries, a key issue in repeat analysis.
RESULTS: Our RepeatScout algorithm is more sensitive and is orders of magnitude faster than RECON, the dominant tool for de novo repeat family identification in newly sequenced genomes. Using RepeatScout, we estimate that approximately 2% of the human genome and 4% of mouse and rat genomes consist of previously unannotated repetitive sequence.
AVAILABILITY: Source code is available for download at http://www-cse.ucsd.edu/groups/bioinformatics/software.html