MOTIVATION: Ultra-high-throughput sequencing produces duplicate and near-duplicate reads, which can consume computational resources in downstream applications. A tool that collapses such reads should reduce storage and assembly complications and costs.

RESULTS: We developed Fulcrum to collapse identical and near-identical Illumina and 454 reads (such as those from PCR clones) into single error-corrected sequences; it can process paired-end as well as single-end reads. Fulcrum is customizable and can be deployed on a single machine, a local network or a commercially available MapReduce cluster, and it has been optimized to maximize ease-of-use, cross-platform compatibility and future scalability. Sequence datasets have been collapsed by up to 71%, and the reduced number and improved quality of the resulting sequences allow assemblers to produce longer contigs while using less memory.